Cyber Risks in AI

The following is a threat model for AI systems, based on the IBM AI Risk Atlas.

| Risk | Description | Threat (Impact) | Context (Example) | Mitigating Controls | Question to Application Owner | Control Type |
| --- | --- | --- | --- | --- | --- | --- |
| Attribute inference attack | Inferring sensitive attributes from seemingly anonymous data. | Privacy loss and potential discrimination. | Predicting ethnicity or income level from usage patterns. | Limit model access, privacy-preserving learning, regular audits. | Does your model expose risks of revealing sensitive attributes about individuals? | Preventative |
| Confidential data in prompt | Sensitive or confidential info included in AI prompts during use. | Data leakage, insider threats, or unauthorized disclosures. | Users feeding client account numbers into LLM-based virtual assistants. | Prompt monitoring, DLP tools, input anonymization (see the prompt-redaction sketch below). | How do you prevent or detect confidential information being entered into AI prompts? | Preventative |
| Confidential information in data | Model training includes sensitive enterprise or client data. | Unauthorized disclosure, regulatory breaches, insider threats. | Sensitive client information embedded in foundation model training data. | Data classification, PII removal, data minimization. | How do you ensure that confidential information is excluded from your training data? | Preventative |
| Copyright infringement | Model training or outputs reproduce copyrighted content without authorization. | Legal liability and reputational risk from IP violations. | LLMs generating copyrighted text or code based on training data. | Content filtering, watermark detection, copyright scanning tools. | Is there a mechanism in place to detect copyrighted or third-party content in outputs? | Preventative |
| Dangerous use | AI system is used to perform or assist in harmful actions. | Security breaches, fraud, or misuse of tools for illegal activity. | AI-generated code used to develop malicious scripts or automate fraud. | Use restrictions, access controls, misuse monitoring. | Are controls in place to prevent your AI system from being used in harmful or unintended ways? | Preventative |
| Data contamination | Training data is corrupted with misleading or adversarial inputs. | Model accuracy and integrity are compromised, leading to bad decisions. | Poisoned inputs degrade fraud detection or risk modeling. | Data validation, anomaly detection, adversarial training. | How do you ensure training data has not been tampered with or poisoned? | Preventative |
| Data poisoning | Malicious or manipulated data used to train or retrain the AI model. | Degraded model performance or hidden backdoors. | Insider injecting corrupt training data into fraud detection model. | Data validation, anomaly detection, secure pipelines (see the training-data screening sketch below). | How do you protect your training pipelines from data poisoning attacks? | Preventative |
| Data privacy rights alignment | AI system usage may conflict with user privacy expectations or laws. | Regulatory violations and fines (e.g., GDPR, CCPA). | Models trained on user data without clear consent or opt-out mechanisms. | Privacy impact assessments, consent frameworks, legal review. | How does your AI system align with current privacy laws and user consent expectations? | Preventative |
| Data transfer restrictions | Limitations on moving data across jurisdictions or systems. | Non-compliance with regulations like GDPR or cross-border laws. | Sending user data from EU to non-EU cloud platforms for processing. | Data residency controls, encryption, regulatory assessments. | Are all data transfers associated with your AI system compliant with jurisdictional laws? | Preventative |
| Data usage restrictions | Limitations on how training or operational data can be used. | Non-compliance with data usage terms or regulatory requirements. | Using client data in AI without explicit consent or data minimization. | Data governance, consent management, policy enforcement. | Are there data usage policies in place to comply with legal and contractual obligations? | Preventative |
| Data usage rights restrictions | Restrictions on how ingested or shared data can be used in AI systems. | Violation of privacy agreements or regulatory requirements. | Shared financial data used for analytics beyond the original consent scope. | Data cataloging, access controls, data usage audits. | How do you ensure data is used in alignment with its permissions and consent terms? | Preventative |
| Evasion attack | Adversaries modify inputs to evade detection by AI systems. | Undetected fraudulent or malicious behavior. | Fraudsters bypassing transaction monitoring by exploiting input weaknesses. | Adversarial testing, robust model design, input normalization. | Have you tested your model against adversarial inputs to detect evasion tactics? | Preventative |
| Exposing personal information | Model reveals private or sensitive details during interactions. | Data breaches, privacy violations, loss of client trust. | Chatbot retrieving and disclosing historical user inputs. | Output filtering, PII scanning, memory reset mechanisms. | What safeguards exist to prevent your system from revealing personal or historical information? | Operational |
| Extraction attack | Attackers attempt to replicate or steal the underlying model through its outputs. | Loss of intellectual property or proprietary model behavior. | External APIs or chatbots could be reverse-engineered to extract valuable models. | Rate limiting, output perturbation, model watermarking (see the rate-limiting sketch below). | Are you using techniques to detect or prevent model extraction through repeated queries? | Preventative |
| Hallucination | AI generates factually incorrect or misleading outputs. | Misleading internal or client-facing outputs and decision-making errors. | Chatbots generating incorrect financial product information. | Output validation, grounding, human-in-the-loop review. | How do you identify and correct AI outputs that contain hallucinated or false information? | Operational |
| Harmful code generation | AI-generated code introduces security flaws or malicious logic. | Vulnerabilities in production systems, legal liability. | Using generative AI to write scripts for internal applications without review. | Code review, static analysis, AI coding guardrails. | Is AI-generated code reviewed and tested using security best practices before deployment? | Preventative |
| Harmful output | AI system generates outputs that can directly or indirectly cause harm. | Physical, financial, or psychological impact to users or systems. | Generative model produces misleading compliance advice. | Use-case restrictions, output screening, human oversight. | How do you assess and control for potential harm in your AI model’s outputs? | Operational |
| Human exploitation | AI systems enable or reinforce harmful manipulation or exploitation of people. | Facilitation of scams, misinformation, or manipulation at scale. | AI-generated phishing campaigns or fake financial advice. | Misuse detection, access policies, ethical review processes. | Do you assess whether your AI system can be misused to exploit or manipulate users? | Operational |
| Improper data curation | Poor quality or mislabelled data leads to flawed training or outcomes. | Unintended behavior or inaccurate outputs. | Inconsistent labels in financial document categorization model. | Data quality checks, labeling guidelines, human validation. | What steps do you take to ensure high quality and consistency in your training data? | Preventative |
| Inaccessible training data | Lack of access to original training data prevents validation or retraining. | Limits explainability, compliance, and future adaptability. | Third-party model lacks transparency on how it was trained. | Vendor agreements, shadow datasets, audit rights. | Can you access or audit the training data used for this model, even if outsourced? | Operational |
| IP information in prompt | Proprietary data entered into AI tools could be leaked or misused. | Loss of competitive advantage, legal liability, or IP theft. | Employees inputting internal strategy documents into public LLMs. | Input sanitization, use policies, prompt filtering. | Do your input filters or use policies prevent entry of proprietary or IP-sensitive data? | Preventative |
| Jailbreaking | Attackers bypass safety mechanisms and content filters in LLMs. | Generation of harmful or non-compliant content through system manipulation. | Chatbots providing financial advice that violates compliance policies. | Rigorous input testing, adversarial defenses, automated response blockers. | How are you protecting your system from prompt manipulations and jailbreak attacks? | Preventative |
| Membership inference attack | Attackers deduce whether a specific data point was used in training. | Privacy violations and regulatory exposure. | Identifying customer data in training sets of fraud detection models. | Differential privacy, model regularization, audit logging. | Do you assess your models for vulnerability to membership inference attacks? | Preventative |
| Nonconsensual use | AI is applied to user data or scenarios without proper consent. | Ethical concerns, legal violations, and loss of trust. | Training a chatbot on internal employee interactions without notice. | Consent frameworks, opt-in mechanisms, usage audits. | Have you ensured that user data is only used with proper consent and transparency? | Operational |
| Personal information in data | Training data may include PII, either unintentionally or without proper consent. | Violates data protection regulations and increases risk of breaches. | PII embedded in training sets for fraud models or LLM-based chatbots. | PII scanning, data anonymization, data minimization practices. | How do you ensure that personal information is not present in training or production data? | Preventative |
| Personal information in prompt | Users input personal information into prompts unintentionally. | Exposure of PII through outputs or logs. | Employees submitting client details to LLM-powered tools. | Input redaction, user training, privacy warnings. | What measures prevent personal information from being submitted in prompts? | Preventative |
| Prompt injection attack | Manipulation of LLM prompts to override system instructions. | Could lead to unauthorized actions, data leaks, or incorrect model behavior. | Customer chatbots or LLM-integrated internal assistants may be exploited. | Input sanitization, context isolation, adversarial training (see the injection-screening sketch below). | What measures are in place to prevent prompt injection or manipulation of inputs? | Preventative |
| Prompt leaking | Prompts or histories can be exposed or cached in unintended ways. | Disclosure of sensitive prompts or business logic. | Cached conversations with customer data leaked through logs or memory. | Prompt encryption, secure logging, access restrictions. | What protections are in place to prevent stored or logged prompts from being accessed improperly? | Preventative |
| Prompt priming | Subtle manipulation of initial prompts to influence LLM behavior downstream. | Indirect control over AI behavior leading to biased or unsafe responses. | Using system prompts to bias chatbots to favor certain products or responses. | Prompt review policies, context audits, prompt integrity tools. | Are initial system prompts secured and reviewed to prevent unintended influence? | Preventative |
| Reidentification | Combining outputs with other data to reidentify individuals from anonymized datasets. | Violation of data protection laws and user privacy expectations. | Merging AI predictions with other customer data to infer identities. | K-anonymity, differential privacy, data minimization (see the k-anonymity sketch below). | Have you assessed your system for reidentification risks using auxiliary data? | Preventative |
| Revealing confidential information | Outputs unintentionally include or infer confidential data. | Compliance risk, reputational damage, insider exposure. | LLMs revealing business strategy in casual outputs. | Context filtering, redaction, policy enforcement. | How do you monitor outputs to prevent leaks of confidential information? | Operational |
| Spreading disinformation | Model produces or amplifies false information. | User harm, loss of credibility, or market manipulation. | LLM chatbot delivering unverified or speculative investment advice. | Fact-checking integrations, restricted domains, human validation. | How do you prevent your system from generating or amplifying disinformation? | Operational |
| Spreading toxicity | Model generates toxic, offensive, or harmful content. | User harm, brand damage, or legal repercussions. | Chatbot output includes offensive or biased statements to customers. | Content moderation, toxicity filters, human review loops. | What safeguards are in place to prevent generation of toxic or offensive content? | Operational |
| Toxic output | AI produces offensive, hateful, or inappropriate responses. | Brand harm, regulatory complaints, or user backlash. | AI chatbot making discriminatory or inappropriate remarks to customers. | Toxicity filters, human moderation, response audits. | What controls are in place to prevent or detect toxic content generated by your system? | Operational |
| Uncertain data provenance | Unknown or unverifiable source of data used to train the AI model. | Compliance risks and model unreliability due to untrusted data. | Open-source datasets used without vetting their origin or content. | Provenance tracking tools, data documentation, risk classification. | Do you maintain traceability of data sources used in your model training? | Operational |
| Untraceable attribution | Model outputs lack a clear trace to data sources or logic. | Hinders root cause analysis, regulatory response, or accountability. | Risk models producing decisions with no audit trail. | Explainability tools, decision logs, model tracing. | Can your system explain how a specific output was derived and from what data? | Operational |
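
The sketches below illustrate, in Python and under explicitly stated assumptions, a few of the mitigating controls named in the table. They are conversation starters for the application-owner questions, not production implementations.

First, the prompt-redaction sketch referenced for the prompt-related risks (confidential data, personal information, or IP in prompts). It is a minimal regex-based pass over user input before it reaches the model; the pattern names, placeholder labels, and `redact_prompt` helper are illustrative assumptions, and a real deployment would typically rely on a dedicated DLP service or a purpose-built PII detector.

```python
import re

# Illustrative patterns only -- a production deployment would use a dedicated
# DLP service or PII detector covering many more identifier types and locales.
PII_PATTERNS = {
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn":      re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_prompt(prompt: str) -> tuple[str, list[str]]:
    """Replace detected identifiers with placeholders and report what was found."""
    findings = []
    redacted = prompt
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(redacted):
            findings.append(label)
            redacted = pattern.sub(f"[{label.upper()}_REDACTED]", redacted)
    return redacted, findings

if __name__ == "__main__":
    raw = "Client jane.doe@example.com, card 4111 1111 1111 1111, wants a limit increase."
    clean, found = redact_prompt(raw)
    print(clean)   # identifiers replaced with placeholders before the LLM call
    print(found)   # ['email', 'card_number'] -> can also feed monitoring/alerting
```

The list of findings can double as the detection signal the owner question asks about: log it (without the raw values) and alert when sensitive categories show up repeatedly.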
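Next, the training-data screening sketch referenced for data contamination and data poisoning. It flags rows whose features deviate sharply from the column medians using a robust z-score; the threshold and the `flag_suspect_rows` helper are assumptions, and this kind of statistical screen only catches crude poisoning. Targeted backdoors still require provenance checks, label audits, and adversarial evaluation.

```python
import numpy as np

def flag_suspect_rows(X: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Boolean mask of rows whose features deviate strongly from the column
    medians (robust z-score based on the median absolute deviation)."""
    median = np.median(X, axis=0)
    mad = np.median(np.abs(X - median), axis=0) + 1e-9   # avoid division by zero
    robust_z = 0.6745 * (X - median) / mad
    return (np.abs(robust_z) > threshold).any(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1_000, 5))   # stand-in for a tabular training set
    X[:10] += 15.0                    # simulate a small poisoned batch
    suspects = flag_suspect_rows(X)
    print(f"{suspects.sum()} rows held back for manual review before training")
```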
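The rate-limiting sketch referenced for extraction attacks throttles per-client query volume with a token bucket. The 60-queries-per-minute budget, the `handle_query` wrapper, and the `call_model` stub are all assumptions; real services pair throttling with query-pattern anomaly detection, output perturbation, and model watermarking.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    capacity: float = 60.0     # burst size (assumed: 60 queries)
    refill_rate: float = 1.0   # tokens per second (assumed: ~60 queries/minute)
    tokens: float = 60.0
    last_refill: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: defaultdict[str, TokenBucket] = defaultdict(TokenBucket)

def call_model(prompt: str) -> str:
    # Stand-in for the real model endpoint (hypothetical).
    return f"model answer for: {prompt!r}"

def handle_query(api_key: str, prompt: str) -> str:
    """Throttle per-client volume; sustained bulk querying is a common
    precursor to model extraction and should also raise a monitoring alert."""
    if not buckets[api_key].allow():
        return "429 Too Many Requests"
    return call_model(prompt)
```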
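The injection-screening sketch referenced for prompt injection (and relevant to jailbreaking) combines a coarse heuristic screen for instruction-override phrasing with context isolation, keeping untrusted input in its own message role. The regex list and the `build_messages` helper are assumptions; pattern matching alone cannot stop determined attackers and must be paired with least-privilege tool access and adversarial testing.

```python
import re

# Coarse heuristics for common instruction-override phrasing. This is an
# assumption-laden first line of defense, not a complete control.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior|above) instructions",
    r"disregard (the )?(system|developer) prompt",
    r"reveal (your|the) (system prompt|hidden instructions)",
    r"you are now (a|an|the) ",
]

def looks_like_injection(user_input: str) -> bool:
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_PATTERNS)

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Keep untrusted input in its own role so it is never concatenated into
    the system instructions (context isolation)."""
    if looks_like_injection(user_input):
        raise ValueError("Input rejected: possible prompt-injection attempt")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},   # untrusted content stays here
    ]
```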
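Finally, the k-anonymity sketch referenced for reidentification checks whether a dataset or model-output extract can be released without small, linkable groups. The quasi-identifier columns, the `safe_to_release` helper, and the required k of 5 are assumptions; a fuller assessment would also consider l-diversity and linkage against known auxiliary datasets.

```python
import pandas as pd

# Assumed quasi-identifier columns; in practice these come from a data
# classification exercise, and the required k is a policy decision.
QUASI_IDENTIFIERS = ["zip_code", "birth_year", "gender"]

def k_anonymity(df: pd.DataFrame, quasi_identifiers: list[str]) -> int:
    """Smallest group size when records are grouped by the quasi-identifiers."""
    return int(df.groupby(quasi_identifiers).size().min())

def safe_to_release(df: pd.DataFrame, required_k: int = 5) -> bool:
    achieved = k_anonymity(df, QUASI_IDENTIFIERS)
    print(f"k-anonymity = {achieved} (policy requires >= {required_k})")
    return achieved >= required_k

if __name__ == "__main__":
    records = pd.DataFrame({
        "zip_code":   ["10001", "10001", "10001", "94105"],
        "birth_year": [1980, 1980, 1980, 1975],
        "gender":     ["F", "F", "F", "M"],
        "score":      [0.91, 0.40, 0.67, 0.88],   # e.g. a model output to be shared
    })
    safe_to_release(records)   # k = 1 here, so this extract should not be released
```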
