Threat modelling is now considered a best practice in comprehensive technical approaches to dealing with AI safety issues [S+25]. Threat modelling [S14] is a structured, proactive process for identifying potential threats and vulnerabilities in a system. While its traditional focus is on cybersecurity and privacy issues, threat modelling has been extended to AI systems to account for the unique components and attack surfaces of machine learning models, data pipelines, and their interactions.
In this post, we give a high-level description of how threat modelling can be used to risk-assess AI systems. Such a process is greatly aided if the threat modeller has access to architecture artefacts, such as system and data viewpoints, as prescribed in architecture frameworks like The Open Group Architecture Framework (TOGAF) or the NATO Architecture Framework (NAF).
System Decomposition (What are we working on?): The first step is to model the AI system and its environment, often using a Data Flow Diagram to visualise components, data paths, and trust boundaries. The key is to identify AI-specific components, such as training data (both labelled and unlabelled), the machine learning model, the training / retraining pipeline, inference endpoints or APIs, and external dependencies (e.g. open-source libraries, MLaaS providers), and to map how data are transformed and moved between these components, paying particular attention to trust boundaries where different levels of trust or privilege meet.
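To make the decomposition step concrete, here is a minimal Python sketch of how components, trust zones, and data flows might be captured and trust-boundary crossings surfaced; the component names and trust zones are illustrative assumptions, not a prescribed notation.

```python
# A minimal sketch of a system decomposition for threat modelling.
# Component names, trust zones, and data flows are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    trust_zone: str  # e.g. "internet", "internal", "restricted"

@dataclass(frozen=True)
class DataFlow:
    source: Component
    destination: Component
    data: str

# Hypothetical elements of an AI system
user = Component("External user", "internet")
api = Component("Inference API", "internal")
model = Component("ML model", "internal")
pipeline = Component("Training pipeline", "restricted")
data_store = Component("Training data store", "restricted")

flows = [
    DataFlow(user, api, "prompt / query"),
    DataFlow(api, model, "pre-processed input"),
    DataFlow(data_store, pipeline, "labelled training data"),
    DataFlow(pipeline, model, "updated model weights"),
]

# Flows that cross a trust boundary deserve the closest scrutiny.
for f in flows:
    if f.source.trust_zone != f.destination.trust_zone:
        print(f"Trust boundary crossed: {f.source.name} -> "
              f"{f.destination.name} ({f.data})")
```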
Threat Identification (What can go wrong?): Once the system is modelled, we then proceed to systematically brainstorm potential threats. There are a variety of methodologies, including STRIDE, LINDDUN, PASTA, and VAST. The identification of system-specific threats can be further systematised using attack enumeration / exploration methodologies like attack trees [H+24] (sketched below) and/or by searching AI attack libraries like
- MITRE ATLAS (Adversarial Threat Landscape for AI Systems), which is a knowledge base and framework that organises tactics and techniques used to attack AI/ML systems, including data poisoning, model evasion, adversarial examples, and ML supply chain attacks. It plays a similar role to MITRE ATT&CK but in the AI domain, covering how adversaries gain access to ML pipelines, manipulate data or models, and weaponise or disrupt AI systems.
- MIT AI Risk Repository [S+25b]
- PLOT4AI (Practical Library of Threats 4 AI)
And, yes, LLMs can be quite useful for navigating and querying these large AI risk and threat repositories, and AI is itself increasingly used in threat modelling [P23]. There are also threat models of AI itself [B25].
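To illustrate the attack-tree approach of [H+24] mentioned above, here is a minimal Python sketch that propagates feasibility judgements up a tree through AND/OR gates; the goal, sub-goals, and feasibility values are illustrative assumptions only.

```python
# A minimal attack-tree sketch; the goal, sub-goals, and leaf attack
# steps below are illustrative assumptions, not a real assessment.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    gate: str = "LEAF"          # "AND", "OR", or "LEAF"
    feasible: bool = False      # analyst's judgement for leaf nodes
    children: list = field(default_factory=list)

def achievable(node: Node) -> bool:
    """Propagate feasibility up the tree through AND/OR gates."""
    if node.gate == "LEAF":
        return node.feasible
    results = [achievable(c) for c in node.children]
    return all(results) if node.gate == "AND" else any(results)

poison = Node("Poison the training data", "AND", children=[
    Node("Gain write access to the data store", feasible=False),
    Node("Insert mislabelled samples undetected", feasible=True),
])
evade = Node("Craft adversarial examples against the API", feasible=True)
root = Node("Degrade model behaviour", "OR", children=[poison, evade])

print("Root goal achievable:", achievable(root))  # True, via the evasion branch
```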
Risk Assessment and Mitigation (What are we going to do about it?): After identifying the threats, risk assessment is performed to prioritise them. The DREAD model is often used to calculate a severity score for each threat; DREAD stands for Damage Potential, Reproducibility, Exploitability, Affected Users, and Discoverability. For the highest-risk threats, appropriate security controls then need to be designed and implemented. There are standard controls that can be implemented at the data pipeline level (e.g. enforce data provenance and integrity policies to prevent poisoning; apply strict access controls to training data) and at the infrastructure level (e.g. apply strong authentication and authorisation controls at trust boundaries). There are also AI-specific controls that can be implemented at the modelling level, including the use of adversarial / robust learning methods, the incorporation of causal knowledge as priors or constraints in learning algorithms, and input sanitisation and output validation. Some of these topics are nuanced and will be covered in future posts.
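As a concrete illustration of DREAD scoring, the following minimal sketch computes a severity score as the mean of the five factor ratings (a common convention, though teams weight and scale the factors differently) and ranks two hypothetical threats; the ratings are made-up assumptions.

```python
# A minimal DREAD scoring sketch; the threats and their 0-10 factor
# ratings are illustrative assumptions, not a real assessment.
DREAD_FACTORS = ("damage", "reproducibility", "exploitability",
                 "affected_users", "discoverability")

def dread_score(ratings: dict) -> float:
    """Severity as the mean of the five DREAD factor ratings."""
    return sum(ratings[f] for f in DREAD_FACTORS) / len(DREAD_FACTORS)

threats = {
    "Training data poisoning": dict(damage=8, reproducibility=5,
                                    exploitability=4, affected_users=9,
                                    discoverability=3),
    "Prompt-based model evasion": dict(damage=6, reproducibility=8,
                                       exploitability=7, affected_users=5,
                                       discoverability=8),
}

# Rank threats so mitigation effort goes to the highest-severity ones first.
for name, ratings in sorted(threats.items(),
                            key=lambda kv: dread_score(kv[1]), reverse=True):
    print(f"{name}: {dread_score(ratings):.1f}")
```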
Odds and Ends
When should threat modelling be done and how should the results be documented? Chapter 17 of [S14] contains good guidance on how to introduce and embed threat modelling into development life cycles in different types of organisations, including the need to harmonise it with agile methodologies and more traditional project management activities. In the specific context of AI, I recommend threat modelling activities be conducted and tracked alongside the creation and maintenance of AI model cards and system cards, which is now a recommended control (ISM-2084) in the Information Security Manual. Useful guidance on AI model cards and system cards can be found in [K+25] and [M+22]. It is quite natural to articulate threat models and mitigations in widely accessible AI model and system card templates.
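As a sketch of what this might look like in practice, the fragment below records a threat-model extract alongside basic model metadata; the field names and values are assumptions for illustration, not a template prescribed by [K+25], [M+22], or the ISM.

```python
# A minimal sketch of recording threat-modelling outputs alongside a
# model card; all field names and values are illustrative assumptions.
import json

model_card_extract = {
    "model": "example-classifier-v1",          # hypothetical model name
    "threat_model": {
        "last_reviewed": "2025-01-01",
        "threats": [
            {
                "id": "T-001",
                "description": "Poisoning of the labelled training set",
                "dread_score": 5.8,
                "mitigations": ["data provenance and integrity checks",
                                "access controls on training data"],
                "status": "mitigated",
            },
        ],
    },
}

print(json.dumps(model_card_extract, indent=2))
```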
Useful threat modelling resources, including training material, software tools, and example real-world threat models can be found at https://github.com/hysnsec/awesome-threat-modelling and https://github.com/TalEliyahu/Threat_Model_Examples.
References
[S+25] Rohin Shah et al, An approach to technical AGI safety and security, arXiv:2504.01849, 2025.
[S14] Adam Shostack, Threat Modeling: Designing for Security, John Wiley & Sons, 2014.
[H+24] Seied Veria Hoseini et al, Threat modeling AI/ML with the attack tree, IEEE Access, vol 12, 2024.
[S+25b] Peter Slattery et al, The AI risk repository: A comprehensive meta-review, database, and taxonomy of risks from artificial intelligence, arXiv:2408.12622, 2025.
[P23] Pavan Paidy, Leveraging AI in Threat Modeling for Enhanced Application Security, International Journal of Artificial Intelligence, Data Science, and Machine Learning, vol 4, pages 57–66, 2023.
[B25] Adam Bales, A polycrisis threat model for AI, AI & Society, Springer, 2025.
[K+25] Anna Knack et al, Defence AI Assurance: Identifying Promising Practice and A System Card Template for Defence, The Alan Turing Institute, 2025.
[M+22] Lachlan McCalman et al, Assessing AI fairness in finance, Computer, vol 55, pages 94–97, 2022.