The following is a threat model for AI systems based on the IBM AI Risk Atlas.

| Risk | Description | Threat | Context | Mitigating Controls | Question to Application Owner | Control Type |
| --- | --- | --- | --- | --- | --- | --- |
| Attribute inference attack | Inferring sensitive attributes from seemingly anonymous data. | Privacy loss and potential discrimination. | Predicting ethnicity or income level from usage patterns. | Limit model access, privacy-preserving learning, regular audits. | Does your model expose risks of revealing sensitive attributes about individuals? | Preventative |
| Confidential data in prompt | Sensitive or confidential information included in AI prompts during use. | Data leakage, insider threats, or unauthorized disclosures. | Users feeding client account numbers into LLM-based virtual assistants. | Prompt monitoring, DLP tools, input anonymization. | How do you prevent or detect confidential information being entered into AI prompts? | Preventative |
| Confidential information in data | Model training includes sensitive enterprise or client data. | Unauthorized disclosure, regulatory breaches, insider threats. | Sensitive client information embedded in foundation model training data. | Data classification, PII removal, data minimization. | How do you ensure that confidential information is excluded from your training data? | Preventative |
| Copyright infringement | Model training or outputs reproduce copyrighted content without authorization. | Legal liability and reputational risk from IP violations. | LLMs generating copyrighted text or code based on training data. | Content filtering, watermark detection, copyright scanning tools. | Is there a mechanism in place to detect copyrighted or third-party content in outputs? | Preventative |
| Dangerous use | AI system is used to perform or assist in harmful actions. | Security breaches, fraud, or misuse of tools for illegal activity. | AI-generated code used to develop malicious scripts or automate fraud. | Use restrictions, access controls, misuse monitoring. | Are controls in place to prevent your AI system from being used in harmful or unintended ways? | Preventative |
| Data contamination | Training data is corrupted with misleading or adversarial inputs. | Model accuracy and integrity are compromised, leading to bad decisions. | Poisoned inputs degrade fraud detection or risk modeling. | Data validation, anomaly detection, adversarial training. | How do you ensure training data has not been tampered with or poisoned? | Preventative |
| Data poisoning | Malicious or manipulated data used to train or retrain the AI model. | Degraded model performance or hidden backdoors. | Insider injecting corrupt training data into fraud detection model. | Data validation, anomaly detection (see sketch after the table), secure pipelines. | How do you protect your training pipelines from data poisoning attacks? | Preventative |
| Data privacy rights alignment | AI system usage may conflict with user privacy expectations or laws. | Regulatory violations and fines (e.g., GDPR, CCPA). | Models trained on user data without clear consent or opt-out mechanisms. | Privacy impact assessments, consent frameworks, legal review. | How does your AI system align with current privacy laws and user consent expectations? | Preventative |
| Data transfer restrictions | Limitations on moving data across jurisdictions or systems. | Non-compliance with regulations like GDPR or cross-border laws. | Sending user data from EU to non-EU cloud platforms for processing. | Data residency controls, encryption, regulatory assessments. | Are all data transfers associated with your AI system compliant with jurisdictional laws? | Preventative |
| Data usage restrictions | Limitations on how training or operational data can be used. | Non-compliance with data usage terms or regulatory requirements. | Using client data in AI without explicit consent or data minimization. | Data governance, consent management, policy enforcement. | Are there data usage policies in place to comply with legal and contractual obligations? | Preventative |
| Data usage rights restrictions | Restrictions on how ingested or shared data can be used in AI systems. | Violation of privacy agreements or regulatory requirements. | Shared financial data used for analytics beyond the original consent scope. | Data cataloging, access controls, data usage audits. | How do you ensure data is used in alignment with its permissions and consent terms? | Preventative |
| Evasion attack | Adversaries modify inputs to evade detection by AI systems. | Undetected fraudulent or malicious behavior. | Fraudsters bypassing transaction monitoring by exploiting input weaknesses. | Adversarial testing, robust model design, input normalization. | Have you tested your model against adversarial inputs to detect evasion tactics? | Preventative |
| Exposing personal information | Model reveals private or sensitive details during interactions. | Data breaches, privacy violations, loss of client trust. | Chatbot retrieving and disclosing historical user inputs. | Output filtering, PII scanning, memory reset mechanisms. | What safeguards exist to prevent your system from revealing personal or historical information? | Operational |
| Extraction attack | Attackers attempt to replicate or steal the underlying model through its outputs. | Loss of intellectual property or proprietary model behavior. | External APIs or chatbots could be reverse-engineered to extract valuable models. | Rate limiting (see sketch after the table), output perturbation, model watermarking. | Are you using techniques to detect or prevent model extraction through repeated queries? | Preventative |
| Hallucination | AI generates factually incorrect or misleading outputs. | Misleading internal or client-facing outputs and decision-making errors. | Chatbots generating incorrect financial product information. | Output validation, grounding, human-in-the-loop review. | How do you identify and correct AI outputs that contain hallucinated or false information? | Operational |
| Harmful code generation | AI-generated code introduces security flaws or malicious logic. | Vulnerabilities in production systems, legal liability. | Using generative AI to write scripts for internal applications without review. | Code review, static analysis, AI coding guardrails. | Is AI-generated code reviewed and tested using security best practices before deployment? | Preventative |
| Harmful output | AI system generates outputs that can directly or indirectly cause harm. | Physical, financial, or psychological impact to users or systems. | Generative model produces misleading compliance advice. | Use-case restrictions, output screening, human oversight. | How do you assess and control for potential harm in your AI model’s outputs? | Operational |
| Human exploitation | AI systems enable or reinforce harmful manipulation or exploitation of people. | Facilitation of scams, misinformation, or manipulation at scale. | AI-generated phishing campaigns or fake financial advice. | Misuse detection, access policies, ethical review processes. | Do you assess whether your AI system can be misused to exploit or manipulate users? | Operational |
| Improper data curation | Poor quality or mislabeled data leads to flawed training or outcomes. | Unintended behavior or inaccurate outputs. | Inconsistent labels in financial document categorization model. | Data quality checks, labeling guidelines, human validation. | What steps do you take to ensure high quality and consistency in your training data? | Preventative |
| Inaccessible training data | Lack of access to original training data prevents validation or retraining. | Limits explainability, compliance, and future adaptability. | Third-party model lacks transparency on how it was trained. | Vendor agreements, shadow datasets, audit rights. | Can you access or audit the training data used for this model, even if outsourced? | Operational |
| IP information in prompt | Proprietary data entered into AI tools could be leaked or misused. | Loss of competitive advantage, legal liability, or IP theft. | Employees inputting internal strategy documents into public LLMs. | Input sanitization, use policies, prompt filtering. | Do your input filters or use policies prevent entry of proprietary or IP-sensitive data? | Preventative |
| Jailbreaking | Attackers bypass safety mechanisms and content filters in LLMs. | Generation of harmful or non-compliant content through system manipulation. | Chatbots providing financial advice that violates compliance policies. | Rigorous input testing, adversarial defenses, automated response blockers. | How are you protecting your system from prompt manipulations and jailbreak attacks? | Preventative |
| Membership inference attack | Attackers deduce whether a specific data point was used in training. | Privacy violations and regulatory exposure. | Identifying customer data in training sets of fraud detection models. | Differential privacy (see sketch after the table), model regularization, audit logging. | Do you assess your models for vulnerability to membership inference attacks? | Preventative |
| Nonconsensual use | AI is applied to user data or scenarios without proper consent. | Ethical concerns, legal violations, and loss of trust. | Training a chatbot on internal employee interactions without notice. | Consent frameworks, opt-in mechanisms, usage audits. | Have you ensured that user data is only used with proper consent and transparency? | Operational |
| Personal information in data | Training data may include PII, either unintentionally or without proper consent. | Violates data protection regulations and increases risk of breaches. | PII embedded in training sets for fraud models or LLM-based chatbots. | PII scanning, data anonymization, data minimization practices. | How do you ensure that personal information is not present in training or production data? | Preventative |
| Personal information in prompt | Users input personal information into prompts unintentionally. | Exposure of PII through outputs or logs. | Employees submitting client details to LLM-powered tools. | Input redaction (see sketch after the table), user training, privacy warnings. | What measures prevent personal information from being submitted in prompts? | Preventative |
| Prompt injection attack | Manipulation of LLM prompts to override system instructions. | Could lead to unauthorized actions, data leaks, or incorrect model behavior. | Customer chatbots or LLM-integrated internal assistants may be exploited. | Input sanitization (see sketch after the table), context isolation, adversarial training. | What measures are in place to prevent prompt injection or manipulation of inputs? | Preventative |
| Prompt leaking | Prompts or histories can be exposed or cached in unintended ways. | Disclosure of sensitive prompts or business logic. | Cached conversations with customer data leaked through logs or memory. | Prompt encryption, secure logging, access restrictions. | What protections are in place to prevent stored or logged prompts from being accessed improperly? | Preventative |
| Prompt priming | Subtle manipulation of initial prompts to influence LLM behavior downstream. | Indirect control over AI behavior leading to biased or unsafe responses. | Using system prompts to bias chatbots to favor certain products or responses. | Prompt review policies, context audits, prompt integrity tools. | Are initial system prompts secured and reviewed to prevent unintended influence? | Preventative |
| Reidentification | Combining outputs with other data to reidentify individuals from anonymized datasets. | Violation of data protection laws and user privacy expectations. | Merging AI predictions with other customer data to infer identities. | K-anonymity checks (see sketch after the table), differential privacy, data minimization. | Have you assessed your system for reidentification risks using auxiliary data? | Preventative |
| Revealing confidential information | Outputs unintentionally include or infer confidential data. | Compliance risk, reputational damage, insider exposure. | LLMs revealing business strategy in casual outputs. | Context filtering, redaction, policy enforcement. | How do you monitor outputs to prevent leaks of confidential information? | Operational |
| Spreading disinformation | Model produces or amplifies false information. | User harm, loss of credibility, or market manipulation. | LLM chatbot delivering unverified or speculative investment advice. | Fact-checking integrations, restricted domains, human validation. | How do you prevent your system from generating or amplifying disinformation? | Operational |
| Spreading toxicity | Model generates toxic, offensive, or harmful content. | User harm, brand damage, or legal repercussions. | Chatbot output includes offensive or biased statements to customers. | Content moderation, toxicity filters, human review loops. | What safeguards are in place to prevent generation of toxic or offensive content? | Operational |
| Toxic output | AI produces offensive, hateful, or inappropriate responses. | Brand harm, regulatory complaints, or user backlash. | AI chatbot making discriminatory or inappropriate remarks to customers. | Toxicity filters (see sketch after the table), human moderation, response audits. | What controls are in place to prevent or detect toxic content generated by your system? | Operational |
| Uncertain data provenance | Unknown or unverifiable source of data used to train the AI model. | Compliance risks and model unreliability due to untrusted data. | Open-source datasets used without vetting their origin or content. | Provenance tracking tools, data documentation, risk classification. | Do you maintain traceability of data sources used in your model training? | Operational |
| Untraceable attribution | Model outputs lack a clear trace to data sources or logic. | Hinders root cause analysis, regulatory response, or accountability. | Risk models producing decisions with no audit trail. | Explainability tools, decision logs, model tracing. | Can your system explain how a specific output was derived and from what data? | Operational |
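
Several of the mitigating controls above lend themselves to short, illustrative sketches in Python. They are hedged examples of the general techniques, not the specific tooling any particular platform uses. For the data poisoning and data contamination rows, a first line of defense is anomaly detection on incoming training data. The sketch below uses a modified z-score (based on the median absolute deviation) to flag numeric values that sit far outside the bulk of the data; the 3.5 threshold and the sample amounts are illustrative assumptions.

```python
from statistics import median

def filter_outliers(values: list[float], threshold: float = 3.5) -> list[float]:
    """Drop values whose modified z-score (median absolute deviation based)
    exceeds the threshold; a crude screen for poisoned or corrupted training
    rows before they reach the pipeline."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return values  # No spread to measure against; pass through unchanged.
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]

if __name__ == "__main__":
    amounts = [42.0, 38.5, 41.2, 39.9, 40.7, 9_999_999.0]  # last value injected
    print(filter_outliers(amounts))  # the injected value is dropped
```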
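
For the extraction attack row, rate limiting throttles callers who query the model fast enough to suggest systematic scraping. Below is a minimal per-caller sliding-window limiter; the class name and thresholds are assumptions for illustration, and a real deployment would pair it with output perturbation and query-pattern anomaly detection.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Per-caller sliding-window limiter; the default thresholds are
    illustrative, not tuned for any real workload."""

    def __init__(self, max_requests: int = 100, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history: dict[str, deque] = defaultdict(deque)

    def allow(self, caller_id: str) -> bool:
        now = time.monotonic()
        window = self._history[caller_id]
        # Drop timestamps that fell outside the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # Caller is querying fast enough to suggest scraping.
        window.append(now)
        return True

if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
    print([limiter.allow("api-key-123") for _ in range(5)])  # [True, True, True, False, False]
```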
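
For the membership inference row, the fuller control is differentially private training (for example DP-SGD), which is too involved to sketch here. The core idea can still be shown on a released statistic: a counting query changes by at most one when a single record is added or removed, so adding Laplace noise with scale 1/ε makes the release ε-differentially private and blunts inferences about any individual record. The `dp_count` helper and the ε value below are illustrative assumptions.

```python
import random

def laplace(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records: list[dict], predicate, epsilon: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy: the query has
    sensitivity 1, so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace(1.0 / epsilon)

if __name__ == "__main__":
    customers = [{"segment": "retail"}, {"segment": "retail"}, {"segment": "corporate"}]
    print(dp_count(customers, lambda r: r["segment"] == "retail", epsilon=0.5))
```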
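
For the personal-information-in-prompt and confidential-data-in-prompt rows, input redaction can scrub obvious identifiers before a prompt is sent to the model or written to logs. The regular expressions below cover only a few common PII formats and are assumptions for illustration; a production control would use a dedicated PII-detection service with locale-aware rules.

```python
import re

# Minimal, illustrative patterns only; real deployments rely on a dedicated
# PII-detection service rather than a handful of regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,3}[ .-]?)?(?:\(?\d{3}\)?[ .-]?)\d{3}[ .-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_prompt(prompt: str) -> str:
    """Replace recognized PII spans with typed placeholders before the
    prompt is sent to the model or logged."""
    redacted = prompt
    for label, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[{label} REDACTED]", redacted)
    return redacted

if __name__ == "__main__":
    print(redact_prompt("Contact Jane at jane.doe@example.com or 555-123-4567."))
    # -> Contact Jane at [EMAIL REDACTED] or [PHONE REDACTED].
```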
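
For the prompt injection and jailbreaking rows, input sanitization and context isolation work together: screen untrusted input against known instruction-override phrasings, and keep it in its own message rather than concatenating it into the system prompt. The deny-list below is a hedged illustration only; pattern matching will not catch novel injections, so real defenses add model-based classifiers and output-side checks.

```python
import re

# Illustrative deny-list of common instruction-override phrasings.
INJECTION_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"ignore (all|any|previous|prior) instructions",
        r"disregard (the )?(system|previous) prompt",
        r"you are now (in )?developer mode",
        r"reveal (your )?(system prompt|hidden instructions)",
    )
]

def looks_like_injection(user_input: str) -> bool:
    return any(p.search(user_input) for p in INJECTION_PATTERNS)

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Keep untrusted input in its own message and never concatenate it
    into the system prompt (context isolation)."""
    if looks_like_injection(user_input):
        raise ValueError("Input rejected by prompt-injection screen")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

if __name__ == "__main__":
    print(build_messages("You are a banking assistant.", "What is my card's daily limit?"))
```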
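
For the reidentification row, a k-anonymity check flags quasi-identifier combinations shared by too few records to hide an individual before data or predictions are released. The records, quasi-identifiers, and k value below are illustrative assumptions.

```python
from collections import Counter

def violates_k_anonymity(records: list[dict], quasi_identifiers: list[str], k: int = 5) -> list[tuple]:
    """Return the quasi-identifier combinations shared by fewer than k
    records; any such group is a reidentification risk."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return [combo for combo, count in groups.items() if count < k]

if __name__ == "__main__":
    data = [
        {"zip": "10001", "age_band": "30-39", "spend": 120.0},
        {"zip": "10001", "age_band": "30-39", "spend": 310.5},
        {"zip": "94105", "age_band": "60-69", "spend": 980.0},  # unique combination
    ]
    print(violates_k_anonymity(data, ["zip", "age_band"], k=2))
    # -> [('94105', '60-69')]
```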
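
For the toxic output and spreading toxicity rows, an output-side moderation gate can block a candidate response and route it for human review before it reaches the user. The blocked-term list below is a placeholder and far too simple; in practice the `moderate_output` step would call a trained toxicity classifier or a moderation service rather than matching strings.

```python
from dataclasses import dataclass

# Placeholder terms only; a real system calls a trained toxicity classifier
# or a moderation service instead of matching a hand-written word list.
BLOCKED_TERMS = {"example_slur", "example_threat"}

@dataclass
class ModerationResult:
    allowed: bool
    reason: str = ""

def moderate_output(text: str) -> ModerationResult:
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return ModerationResult(False, f"blocked term: {term}")
    return ModerationResult(True)

def respond(candidate: str) -> str:
    verdict = moderate_output(candidate)
    if not verdict.allowed:
        # Log the blocked response for human review and return a safe fallback.
        return "Sorry, I can't share that response."
    return candidate

if __name__ == "__main__":
    print(respond("Here is the product comparison you asked for."))
```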