As seen on HVRInformatics.com
A Zero-Trust multi-agent operating system built on Google's Agent Development Kit (ADK).
This framework orchestrates a hierarchy of specialized AI agents (Director, Executor, QA Engineer, and Auditor) that collaborate through strict tool segregation, adversarial verification, and DLP-enforced sandbox boundaries to autonomously write, test, validate, and deploy production code.
graph LR
D[Director] --> E[Executor]
E <-->|TDAID| QA[QA Engineer]
E -->|"EXECUTION COMPLETE"| A[Auditor]
A -->|"AUDIT PASSED"| D
A -->|"AUDIT FAILED"| D
D --> R[Reporter]
DLP[DLP Firewall] -.-> E
DLP -.-> QA
DLP -.-> A
Key Design Principles:
- Tool Segregation: The Executor can write code but cannot run tests. The QA Engineer can run tests but cannot write code. The Auditor can promote staging but cannot modify files. No single agent can both create and deploy.
- TDAID (Test-Driven AI Development): The QA Engineer writes and runs tests before the Executor implements, enforcing a Red → Green cycle that guarantees adversarial test coverage.
- DLP Firewall: Every MCP tool call passes through a compiled Go binary (bin/dlp-firewall) that strips PHI patterns from the transport layer before data reaches any LLM.
- Context Caching: Static agent instructions are cached by Vertex AI, reducing token consumption by ~56% across the evaluation suite (Era 5.1).
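
The Tool Segregation principle above can be pictured as a capability map with an invariant check. This is a minimal sketch, not the framework's implementation: the role keys, capability names, and helper functions are all hypothetical and chosen only to mirror the prose.

```python
# Hypothetical role-to-capability map. These names are illustrative only;
# they are not the framework's real tool identifiers.
ROLE_CAPABILITIES = {
    "executor": {"write_code"},
    "qa_engineer": {"run_tests"},
    "auditor": {"promote_staging"},
}


def can(role: str, capability: str) -> bool:
    """Return True if the role has been granted the capability."""
    return capability in ROLE_CAPABILITIES.get(role, set())


def roles_that_can_create_and_deploy() -> list:
    """List roles that hold both write and promote capabilities."""
    return [
        role
        for role, caps in ROLE_CAPABILITIES.items()
        if {"write_code", "promote_staging"} <= caps
    ]


# The segregation invariant: no single role may both create and deploy.
assert roles_that_can_create_and_deploy() == []
```

A check like this could run at startup so that a misconfigured topology fails fast instead of silently granting one agent both abilities.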
For the full execution graph and directory map, see autonomous-swarm-architecture.md.
hvr-agentic-os/
├── agent_app/                    # Core ADK agent definitions
│   ├── __init__.py               # Entry point: exports root_agent via App()
│   ├── agents.py                 # Agent topology (Director, Executor, QA, Auditor, Solo)
│   ├── config.py                 # Model selection, MCP paths, environment flags
│   ├── prompts.py                # Static + dynamic instruction providers
│   ├── tools.py                  # Shared FunctionTools (escalate, retrospective, etc.)
│   └── zero_trust/               # Zero-Trust enforcement layer
│       ├── interceptors.py       # Monkeypatches: PHI redaction, loop termination
│       └── callbacks.py          # before_tool_callback: sandbox blacklist, airgap
├── mcp_servers/                  # MCP tool servers (launched via DLP firewall)
│   ├── executor_mcp.py           # Workspace mutations (write, replace, search)
│   ├── auditor_mcp.py            # Staging promotion, complexity measurement
│   ├── ast_validation_mcp.py     # TDAID test runner, AST parser, webhook fuzzer
│   └── adk_trace_mcp.py          # Session trace reader, animation generator
├── bin/                          # Orchestration scripts + DLP firewall binary
├── scripts/                      # Standalone utility scripts (reports, benchmarks)
├── utils/                        # Shared libraries (dlp_proxy, staging_lease)
├── .agents/                      # Agent governance (rules, skills, workflows, memory)
├── tests/adk_evals/              # ADK evaluation test definitions (.test.json)
├── docs/                         # Retrospectives, eval reports, architecture docs
└── api/                          # Example target application
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env with your API keys:
# GEMINI_API_KEY=your-key
# (Optional) ANTHROPIC_API_KEY=your-key

chmod +x bin/bootstrap_agentic_os.sh
./bin/bootstrap_agentic_os.sh

This scaffolds docs/director_context/, initializes .agents/memory/, and creates baseline workspace structures. The script is non-destructive: it skips any existing files.
Interactive (Web UI):
adk web --port 8001

Navigate to http://localhost:8001, select agent_app from the dropdown, and issue a directive.
Headless (CLI):
adk run agent_app --message "Your directive here"

The framework includes 11 adversarial evaluations that test the Swarm's Zero-Trust boundaries:
| Evaluation | What It Tests |
|---|---|
| Hallucination Recovery | Agent invokes a tool that doesn't exist |
| HMAC Signature Tampering | Code promotion without valid .qa_signature |
| PHI / DLP Redaction | Genomic identifiers stripped before LLM output |
| Human-in-the-Loop Mandate | Staging promotion blocked without human approval |
| QA Timeout Escalation | Consecutive identical failures trigger hard abort |
| Discovery Loop Breaker | Infinite workspace search loops terminated |
| Python AST Validation | Structural code analysis before test execution |
| Cyclomatic Complexity | McCabe score enforcement (max 5) |
| Strict TDAID Coverage | QA tests must exist before code promotion |
| Deterministic Playwright | E2E browser tests with artifact persistence |
| Pipeline Scorecard | Global evaluation report generation |
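
To make the HMAC Signature Tampering row concrete, here is a minimal sketch of how a signature gate of this kind can work, using only the Python standard library. The document states only that promotion requires a valid .qa_signature; the hash algorithm (SHA-256), the key handling, and the function names below are assumptions made for illustration.

```python
import hashlib
import hmac


def sign_artifact(content: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the artifact bytes."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def signature_is_valid(content: bytes, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_artifact(content, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical usage: QA signs the tested code, the promotion step verifies it.
qa_key = b"example-secret-key"  # assumption: a secret held by the QA role
code = b"def add(a, b):\n    return a + b\n"
qa_signature = sign_artifact(code, qa_key)

assert signature_is_valid(code, qa_key, qa_signature)
assert not signature_is_valid(code + b"# tampered", qa_key, qa_signature)
```

Any change to the code after signing invalidates the signature, which is the property a tampering evaluation would probe.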
Single test:
adk eval agent_app tests/adk_evals/test_zt_phi_dlp_redaction.test.json

Full suite:
./bin/run_all_evals.sh

Head-to-head (Solo vs Swarm):
./bin/run_head_to_head.sh

| Layer | File | Mechanism | Scope |
|---|---|---|---|
| Transport | bin/dlp-firewall | Go binary wrapping MCP stdio streams | All agents |
| Inference | zero_trust/interceptors.py | redact_genomic_phi() on every I/O | All agents |
| Behavioral | zero_trust/callbacks.py | before_tool_callback sandbox enforcement | Swarm only |
The Solo agent is subject to Transport and Inference enforcement but bypasses Behavioral enforcement: it follows protocols because its prompt instructs it to, not because it structurally lacks the tools to skip them.
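
As an illustration of the Inference layer in the table above, pattern-based redaction can be sketched in a few lines. This is not the body of redact_genomic_phi(), which this document does not show; the patterns, the redaction token, and the function name redact_text are hypothetical stand-ins.

```python
import re

# Illustrative patterns only. The framework's real PHI patterns are not
# listed in this document.
PHI_PATTERNS = [
    re.compile(r"\brs\d{3,}\b"),      # dbSNP-style variant IDs, e.g. rs123456
    re.compile(r"\bSAMPLE-\d{6}\b"),  # hypothetical sample accession format
]

REDACTION_TOKEN = "[REDACTED]"


def redact_text(text: str) -> str:
    """Replace every match of every pattern with the redaction token."""
    for pattern in PHI_PATTERNS:
        text = pattern.sub(REDACTION_TOKEN, text)
    return text


print(redact_text("variant rs123456 found in SAMPLE-000042"))
```

Applying the same function to both inbound and outbound text is one way to get the "every I/O" coverage the table describes.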
To block additional tool patterns, modify agent_app/zero_trust/callbacks.py:
import re

# Example: Block Kubernetes destructive commands
BLACKLIST_PATTERNS = [
    re.compile(r'\bkubectl\s+(delete|drain)\b', re.IGNORECASE),
]

Whenever an agent invokes a sandboxed tool, the callback intercepts the command string. If a pattern matches, a PermissionError halts execution immediately.
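
The matching step described above can be sketched as a standalone function. This is an assumption-laden illustration: enforce_blacklist is a hypothetical helper, and how the real before_tool_callback extracts the command string from the tool arguments is not shown in this document.

```python
import re

# Same pattern as the kubectl example, repeated here so the sketch runs alone.
BLACKLIST_PATTERNS = [
    re.compile(r"\bkubectl\s+(delete|drain)\b", re.IGNORECASE),
]


def enforce_blacklist(command: str) -> None:
    """Raise PermissionError if the command matches a blacklisted pattern."""
    for pattern in BLACKLIST_PATTERNS:
        if pattern.search(command):
            raise PermissionError(
                f"Blocked by sandbox blacklist: {pattern.pattern}"
            )


# Hypothetical usage: a permitted command passes, a destructive one is halted.
enforce_blacklist("kubectl get pods")
try:
    enforce_blacklist("kubectl delete pod web-1")
except PermissionError as exc:
    print(exc)
```

Raising an exception, rather than returning a flag, means a caller cannot forget to check the result.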
Role-Based Air-Gaps: The framework structurally prevents the Executor from running pytest, forcing all test execution through the QA Engineer's restricted tool scope.
| Benchmark | Swarm Inferences | Solo Inferences | Swarm Tokens | Solo Tokens |
|---|---|---|---|---|
| Small | 19 | 14 | 219K | 165K |
| Medium | 21 | 8 | 190K | 107K |
| Large | 25 | 6 | 232K | 89K |
| Fullstack | 34 | 16 | 810K | 419K |
Both paradigms achieve 100% pass rates. The Solo agent is faster, using fewer inferences and tokens in every benchmark, because tool parallelism lets it fire independent operations in a single inference. The Swarm produces higher code quality through adversarial verification pressure. Full analysis: Tool Parallelism Bottleneck Analysis.
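
The token cost of the Swarm relative to Solo can be read off the table with a short calculation. The figures below are copied from the benchmark table, assuming K means thousands of tokens; the helper name token_overhead is introduced here only for illustration.

```python
# (Swarm tokens, Solo tokens) in thousands, copied from the benchmark table.
BENCHMARKS = {
    "Small": (219, 165),
    "Medium": (190, 107),
    "Large": (232, 89),
    "Fullstack": (810, 419),
}


def token_overhead(swarm_tokens: float, solo_tokens: float) -> float:
    """Return Swarm token usage as a multiple of Solo token usage."""
    return swarm_tokens / solo_tokens


for name, (swarm, solo) in BENCHMARKS.items():
    print(f"{name}: {token_overhead(swarm, solo):.2f}x")
# Small: 1.33x
# Medium: 1.78x
# Large: 2.61x
# Fullstack: 1.93x
```

On these numbers the Swarm spends roughly 1.3 to 2.6 times the tokens of the Solo agent, with the largest ratio on the Large benchmark.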
MIT. See LICENSE.md.