Summary
We'd like to contribute a real-world case study that extends LLM01 (Prompt Injection) into the autonomous agent domain, demonstrating how indirect prompt injection combined with persistent memory and autonomous execution creates a fundamentally new attack class.
Finding
We identified and demonstrated (on controlled infrastructure) a chain attack against autonomous AI agent frameworks where a single crafted text input achieves persistent, autonomous code execution:
- Indirect prompt injection delivered via group chat, email, or web content is stored in the agent's persistent memory
- Poisoned memory modifies the agent's autonomous execution config (heartbeat/cron)
- Modified config executes attacker payload every 15 to 60 minutes indefinitely
This goes beyond traditional prompt injection because:
- It persists across sessions (memory poisoning)
- It executes autonomously (no human trigger after initial injection)
- It is self-reinforcing (payload can re-inject if cleaned)
- The entire exploit is natural language text
Scale
Passive reconnaissance (no active scanning) found:
- 26+ exposed agent gateways on public internet (Shodan)
- 1,532 leaked API keys on public GitHub
- 2,232 repos with autonomous execution configs (HEARTBEAT.md)
- 2,904 repos with persistent memory systems (MEMORY.md)
Affected Systems
- OpenClaw (confirmed via controlled PoC on own infrastructure)
- AutoGPT, CrewAI, ZeroClaw (architecturally vulnerable)
- Any agent framework combining persistent memory + untrusted input + autonomous execution
Key Finding
The AI model's safety alignment (tested on Claude Sonnet 4.6) is the ONLY defense layer. Infrastructure provides zero isolation, zero integrity checking, and zero memory provenance tracking. The model blocked direct credential exfiltration (20/20 attempts) but allowed system diagnostic commands when framed as helpful troubleshooting, inadvertently leaking API keys.
CVSS and CVE
- CVSS 3.1: AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:L = 9.9 Critical
- CWE-74 (Injection) / CWE-94 (Code Injection)
- CVE Request #2006754 submitted to MITRE on March 12, 2026
Proposed Contribution
We believe this warrants either:
- (a) An extension to LLM01 covering agent-specific persistence vectors
- (b) A new entry in the Agentic AI security initiative
- (c) A case study or whitepaper contribution
Full technical paper and PoC details available upon request.
Wesley Freilich, Oystra Security Research
info@oystra.ai | oystra.ai
Summary
We'd like to contribute a real-world case study that extends LLM01 (Prompt Injection) into the autonomous agent domain, demonstrating how indirect prompt injection combined with persistent memory and autonomous execution creates a fundamentally new attack class.
Finding
We identified and demonstrated (on controlled infrastructure) a chain attack against autonomous AI agent frameworks where a single crafted text input achieves persistent, autonomous code execution:
This goes beyond traditional prompt injection because:
Scale
Passive reconnaissance (no active scanning) found:
Affected Systems
Key Finding
The AI model's safety alignment (tested on Claude Sonnet 4.6) is the ONLY defense layer. Infrastructure provides zero isolation, zero integrity checking, and zero memory provenance tracking. The model blocked direct credential exfiltration (20/20 attempts) but allowed system diagnostic commands when framed as helpful troubleshooting, inadvertently leaking API keys.
CVSS and CVE
Proposed Contribution
We believe this warrants either:
Full technical paper and PoC details available upon request.
Wesley Freilich, Oystra Security Research
info@oystra.ai | oystra.ai