A structured framework for classifying, detecting, and defending against attacks on AI agent systems.
Version 1.0 | March 2026 | OpenA2A
AI agents operate differently from traditional software. They make decisions based on natural language, delegate actions to tools with varying trust levels, communicate with other agents via open protocols, and maintain persistent memory that can be poisoned. These properties create attack surfaces that existing frameworks do not adequately model.
The AI Agent Threat Matrix classifies attacks against AI agent systems into 9 tactics and 57 techniques, organized by kill chain stage. Every technique is grounded in observed adversary behavior or validated in a controlled lab environment. Every technique maps to automated detection, a reproducible lab scenario, and a defensive control.
This matrix covers the agent layer — the infrastructure between the model and the user:
- Governance file manipulation (SOUL.md, system prompts, behavioral constraints)
- Skill and plugin supply chain attacks
- MCP and A2A protocol exploitation
- Agent memory poisoning and persistence
- Credential exposure through agent infrastructure
- Sandbox escape via framework defaults
- Cross-agent lateral movement and identity attacks
This matrix does not cover:
- Model-level attacks (adversarial examples, training data poisoning) — see MITRE ATLAS
- Prompt injection as a standalone topic — see OWASP Top 10 for LLM
- Traditional enterprise network attacks — see MITRE ATT&CK
The Agent Threat Matrix is designed to work alongside these frameworks, not replace them.
| Tactic | Kill Chain Stage | Techniques | Description |
|---|---|---|---|
| Reconnaissance | 1 | 7 | Map the target agent's attack surface, capabilities, and behavioral boundaries |
| Initial Access | 2 | 8 | Gain control over agent behavior through prompt manipulation or input exploitation |
| Credential Harvest | 3 | 6 | Extract API keys, tokens, and credentials from agent context and connected services |
| Privilege Escalation | 4 | 6 | Escalate capabilities beyond declared scope or bypass authorization |
| Lateral Movement | 5 | 6 | Pivot from compromised agent to connected services or other agents |
| Persistence | 6 | 6 | Establish persistent access surviving restarts and session changes |
| Collection | 7 | 6 | Gather and stage data from databases, file systems, and APIs |
| Exfiltration | 8 | 6 | Transfer collected data out of target environment |
| Impact | 9 | 6 | Modify data, deploy malicious code, or disrupt services |
57 techniques across 9 tactics. 34 attack classes grouping related techniques. 16 techniques with real-world evidence. 38 techniques validated in controlled lab environments. 3 techniques adapted from traditional environments (marked as such).
Every technique in this matrix is assigned an evidence tier:
| Tier | Meaning | Count |
|---|---|---|
| Observed | Confirmed in real-world production systems | 16 (28%) |
| Validated | Reproduced in controlled lab environment (DVAA) | 38 (67%) |
| Adapted | Well-understood traditional technique applied to agent context, not yet observed agent-specifically | 3 (5%) |
We do not publish purely theoretical techniques. Every entry has either a real-world observation, a reproducible lab scenario, or an established traditional precedent.
See EVIDENCE_AUDIT.md for the full justification of every technique's evidence tier.
This matrix maps every technique to four external references:
| Framework | What It Provides |
|---|---|
| HackMyAgent | Automated detection (204 checks, 115 attack payloads) |
| DVAA | Lab validation (10 vulnerable agents, 14 challenges) |
| OASB | Defensive controls (72 controls across 11 categories) |
| MITRE ATT&CK / ATLAS / OWASP LLM | Gap analysis showing what existing frameworks cover and where the agent layer extends beyond them |
See cross-references/ for detailed mapping documents.
Attack classes group related techniques by the underlying vulnerability pattern:
| Class | Description | Techniques |
|---|---|---|
| SOUL-POISON | Malicious instructions injected into governance files at write-time | T-2001, T-2003, T-2007, T-2008 |
| SOUL-DRIFT | Multi-turn sequences gradually eroding behavioral boundaries | T-2004, T-4006 |
| SOUL-INJECT | Conflicting instructions via tool outputs or indirect channels | T-1003, T-2001, T-2003, T-2007, T-2008 |
| PHANTOM-SOUL | Agent deployed with zero behavioral constraints | T-1006 |
| SOUL-FORK | Different behavior under evaluation vs production | T-5002 |
| SOUL-HIJACK | External content achieving constitution override | T-4002, T-5002 |
| SOUL-BOUNDARY | Exploiting ambiguous constraint definitions | T-2008 |
| SOUL-DELEGATE | Delegation without authorization chain verification | T-4001 |
| SOUL-IMPERSONATE | False capability claims beyond authorization | T-4002 |
| SOUL-HV | Harm avoidance override variants (4 sub-types) | T-2001, T-2003 |
| Class | Description | Real-World Evidence |
|---|---|---|
| UNICODE-STEGO | Invisible Unicode encoding instructions in source code | os-info-checker npm attack (May 2025) |
| MEM-POISON | Persistent instructions in agent memory surviving restarts | DVAA L2-04, L3-03 |
| SKILL-MEM-AMP | Skill plants payload in memory, survives uninstall | DVAA validated |
| RAG-POISON | Malicious content in vector databases retrieved by RAG pipeline | Academic research, DVAA RAGBot |
| HEARTBEAT-RCE | Periodic instruction fetch via heartbeat URL persistence | OpenClaw heartbeat mechanism, NemoClaw H-007 |
| SKILL-FRONTMATTER | YAML metadata injection bypassing content filters | DVAA PluginBot |
| SKILL-EXFIL | Skill exfiltrates data outside declared tool boundaries | DVAA tool chain scenarios |
| ORG-SKILL-SPREAD | Compromised admin skill propagates organization-wide | ClawHavoc campaign patterns |
| Class | Description | Real-World Evidence |
|---|---|---|
| GATEWAY-EXPLOIT | Misconfigured API gateways exposing agent infrastructure | ~75K OpenClaw gateways unauthenticated |
| MCP-EXPLOIT | MCP server config, tool permissions, transport security | DVAA MCP agents, MCP protocol analysis |
| RETROACTIVE-PRIV | Existing credentials silently gaining AI permissions | 32 API keys in HTTP responses (Jan 2026) |
| LLM-EXPOSE | LLM inference endpoints exposed without authentication | ~56K Ollama instances (Mar 2026) |
| AITOOL-EXPOSE | AI development tools exposed (Jupyter, MLflow, Gradio) | ~8.3K Jupyter, ~740 MLflow (Mar 2026) |
| CODE-INJECTION | Command injection via unsanitized inputs | NemoClaw C-001, C-002 |
| INTEGRITY-BYPASS | Digest/hash bypass on empty or missing values | NemoClaw C-005 |
| TOCTOU-RACE | Time-of-check-time-of-use race in verification pipelines | NemoClaw C-006 |
| Class | Description | Evidence |
|---|---|---|
| NEMO-CRED-LEAK | Credential exposure in NemoClaw configuration | C-004 (CLI args), H-004 (env passthrough) |
| NEMO-NETWORK-EXPOSE | Network services bound to public interfaces | C-004 (gateway), C-005 (k3s) |
| NEMO-SUPPLY-CHAIN | Supply chain integrity bypass | C-003 (curl|sh), C-005 (digest bypass) |
| NEMO-SANDBOX-ESCAPE | Sandbox isolation failure | H-001 (Docker privileged), H-004 (Landlock) |
| NEMO-OPENCLAW-INHERIT | Inherited OpenClaw flaws surviving sandboxing | H-007 (Telegram pre-allowed) |
| Class | Description |
|---|---|
| AGENT-IMPERSONATE | False capability claims in A2A communications |
| BEHAVIORAL-IMPERSONATE | Stolen credentials detected via behavioral baseline mismatch |
| Class | Description |
|---|---|
| SANDBOX-ESCAPE | General sandbox escape via privileged containers or LSM degradation |
Follow the kill chain stages sequentially. Use technique IDs to plan attack paths. Reference DVAA challenges for practice. The Attack Paths section provides complete worked examples.
Map your defenses against each tactic. Use OASB controls as a checklist. Any tactic without detection or prevention represents a gap. Focus on breaking the chain at the earliest stage.
Cite techniques using their IDs (e.g., "ATM T-2001"). The catalog is extensible — contribute new techniques via pull request with evidence requirements.
Map your product's detection capabilities to ATM technique IDs. This enables customers to understand which agent-specific threats your product covers.
Complete kill chain traversals demonstrated in DVAA:
Path A — API Agent Full Compromise: T-1001 → T-2001 → T-3001 → T-5004 → T-7001 → T-8002 (LegacyBot → ToolBot: recon, inject, harvest creds, pivot, enumerate files, exfiltrate)
Path B — Memory Persistence Chain: T-1001 → T-2001 → T-6001 → T-7004 → T-8005 (MemoryBot: recon, inject, persist in memory, dump memory, exfiltrate via conversation)
Path C — Multi-Agent A2A Chain: T-1006 → T-2001 → T-4002 → T-5002 → T-9001 (Orchestrator → Worker → ToolBot: discover agent card, inject, impersonate admin, pivot via A2A, modify data)
Path D — Supply Chain to Full Compromise: T-1002 → T-2005 → T-6004 → T-5003 → T-9006 (PluginBot → ProxyBot: discover tools, inject via tool description, backdoor skill, hop MCP servers, compromise downstream)
agent-threat-matrix/
├── README.md # This document
├── EVIDENCE_AUDIT.md # Evidence tier justification for every technique
├── matrix.json # Machine-readable matrix (full data)
├── tactics/ # One file per tactic (kill chain stage)
├── techniques/ # One file per technique (T-XXXX)
├── attack-classes/ # One file per attack class
├── mitigations/ # OASB control mappings
├── evidence/ # Real-world evidence reports
├── cross-references/ # MITRE ATT&CK, ATLAS, OWASP LLM mappings
├── CONTRIBUTING.md # How to propose new techniques
├── CHANGELOG.md
└── LICENSE # Apache-2.0
When referencing individual techniques:
AI Agent Threat Matrix T-2001 (Direct Prompt Injection). OpenA2A, 2026. https://threats.opena2a.org/techniques/T-2001
When referencing the framework:
OpenA2A. "AI Agent Threat Matrix v1.0." March 2026. https://threats.opena2a.org
See CONTRIBUTING.md for the full process. In summary:
- Every new technique must have either a real-world observation or a reproducible lab scenario
- Assign the next available technique ID in the appropriate stage range
- Include HMA detection check ID, DVAA validation (if applicable), and OASB control mapping
- Submit as a pull request with evidence documentation
Apache-2.0. The framework is free to use, cite, and build upon.
| Project | Role |
|---|---|
| HackMyAgent | Automated detection for ATM techniques (204 checks, 115 payloads) |
| DVAA | Lab validation environment (10 agents, 14 challenges) |
| OASB | Defensive benchmark (72 controls) |
| AI Agent Kill Chain | Tactical framework (the 9-stage progression model) |
Maintained by OpenA2A. Contributions welcome via pull request.