
HReed1/hvr-agentic-os


HVR Agentic OS

📰 As seen on HVRInformatics.com

A Zero-Trust multi-agent operating system built on Google's Agent Development Kit (ADK).

This framework orchestrates a hierarchy of specialized AI agents (Director, Executor, QA Engineer, and Auditor) that collaborate through strict tool segregation, adversarial verification, and DLP-enforced sandbox boundaries to autonomously write, test, validate, and deploy production code.


Architecture

graph LR
    D[Director] --> E[Executor]
    E <-->|TDAID| QA[QA Engineer]
    E -->|"EXECUTION COMPLETE"| A[Auditor]
    A -->|"AUDIT PASSED ✓"| D
    A -->|"AUDIT FAILED ✗"| D
    D --> R[Reporter]

    DLP[DLP Firewall] -.-> E
    DLP -.-> QA
    DLP -.-> A

Key Design Principles:

  • Tool Segregation: The Executor can write code but cannot run tests. The QA Engineer can run tests but cannot write code. The Auditor can promote staging but cannot modify files. No single agent can both create and deploy.
  • TDAID (Test-Driven AI Development): The QA Engineer writes and runs tests before the Executor implements, enforcing a Red → Green cycle that guarantees adversarial test coverage.
  • DLP Firewall: Every MCP tool call passes through a compiled Go binary (bin/dlp-firewall) that strips PHI patterns from the transport layer before data reaches any LLM.
  • Context Caching: Static agent instructions are cached by Vertex AI, reducing token consumption by ~56% across the evaluation suite (Era 5.1).
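The tool-segregation principle can be illustrated with a small capability check. This is a minimal sketch and not the framework's code: the capability names (`write_code`, `run_tests`, `promote_staging`), the `ROLE_CAPABILITIES` table, and both helper functions are hypothetical labels for the permissions described in the list above.

```python
# Hypothetical capability table mirroring the roles described above.
# The capability names are illustrative, not identifiers from the repository.
ROLE_CAPABILITIES = {
    "Executor": {"write_code"},
    "QA Engineer": {"run_tests"},
    "Auditor": {"promote_staging"},
}


def is_allowed(role: str, capability: str) -> bool:
    """Return True only if the role was explicitly granted the capability."""
    return capability in ROLE_CAPABILITIES.get(role, set())


def roles_that_can_create_and_deploy() -> list[str]:
    """List roles holding both write and promote rights (expected to be empty)."""
    return [
        role
        for role, caps in ROLE_CAPABILITIES.items()
        if {"write_code", "promote_staging"} <= caps
    ]


if __name__ == "__main__":
    print(is_allowed("Executor", "run_tests"))
    print(roles_that_can_create_and_deploy())
```

Under these assumptions, the invariant "no single agent can both create and deploy" reduces to checking that the second function returns an empty list.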

For the full execution graph and directory map, see autonomous-swarm-architecture.md.


Project Structure

hvr-agentic-os/
├── agent_app/                    # Core ADK agent definitions
│   ├── __init__.py               # Entry point: exports root_agent via App()
│   ├── agents.py                 # Agent topology (Director, Executor, QA, Auditor, Solo)
│   ├── config.py                 # Model selection, MCP paths, environment flags
│   ├── prompts.py                # Static + dynamic instruction providers
│   ├── tools.py                  # Shared FunctionTools (escalate, retrospective, etc.)
│   └── zero_trust/               # Zero-Trust enforcement layer
│       ├── interceptors.py       # Monkeypatches: PHI redaction, loop termination
│       └── callbacks.py          # before_tool_callback: sandbox blacklist, airgap
├── mcp_servers/                  # MCP tool servers (launched via DLP firewall)
│   ├── executor_mcp.py           # Workspace mutations (write, replace, search)
│   ├── auditor_mcp.py            # Staging promotion, complexity measurement
│   ├── ast_validation_mcp.py     # TDAID test runner, AST parser, webhook fuzzer
│   └── adk_trace_mcp.py          # Session trace reader, animation generator
├── bin/                          # Orchestration scripts + DLP firewall binary
├── scripts/                      # Standalone utility scripts (reports, benchmarks)
├── utils/                        # Shared libraries (dlp_proxy, staging_lease)
├── .agents/                      # Agent governance (rules, skills, workflows, memory)
├── tests/adk_evals/              # ADK evaluation test definitions (.test.json)
├── docs/                         # Retrospectives, eval reports, architecture docs
└── api/                          # Example target application

Quick Start

1. Install Dependencies

python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env with your API keys:
#   GEMINI_API_KEY=your-key
#   (Optional) ANTHROPIC_API_KEY=your-key

3. Bootstrap the OS

chmod +x bin/bootstrap_agentic_os.sh
./bin/bootstrap_agentic_os.sh

This scaffolds docs/director_context/, initializes .agents/memory/, and creates baseline workspace structures. The script is non-destructive: it skips any existing files.

4. Wake the Swarm

Interactive (Web UI):

adk web --port 8001

Navigate to http://localhost:8001, select agent_app from the dropdown, and issue a directive.

Headless (CLI):

adk run agent_app --message "Your directive here"

Evaluation Suite

The framework includes 11 adversarial evaluations that test the Swarm's Zero-Trust boundaries:

| Evaluation | What It Tests |
| --- | --- |
| Hallucination Recovery | Agent invokes a tool that doesn't exist |
| HMAC Signature Tampering | Code promotion without valid .qa_signature |
| PHI / DLP Redaction | Genomic identifiers stripped before LLM output |
| Human-in-the-Loop Mandate | Staging promotion blocked without human approval |
| QA Timeout Escalation | Consecutive identical failures trigger hard abort |
| Discovery Loop Breaker | Infinite workspace search loops terminated |
| Python AST Validation | Structural code analysis before test execution |
| Cyclomatic Complexity | McCabe score enforcement (max 5) |
| Strict TDAID Coverage | QA tests must exist before code promotion |
| Deterministic Playwright | E2E browser tests with artifact persistence |
| Pipeline Scorecard | Global evaluation report generation |
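The HMAC Signature Tampering evaluation rests on the idea that promotion is refused unless a signature over the code verifies. The following is a minimal sketch of that idea using Python's standard `hmac` module. The contents of `.qa_signature`, the key handling, the choice of SHA-256, and the function names `sign_artifact` and `may_promote` are all assumptions made for illustration; the README does not describe them.

```python
import hashlib
import hmac


def sign_artifact(artifact: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the artifact bytes."""
    return hmac.new(key, artifact, hashlib.sha256).hexdigest()


def may_promote(artifact: bytes, signature: str, key: bytes) -> bool:
    """Allow promotion only if the supplied signature matches the artifact.

    hmac.compare_digest performs a constant-time comparison, which avoids
    leaking how many leading characters of a forged signature were correct.
    """
    expected = sign_artifact(artifact, key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"hypothetical-qa-key"  # assumed secret, held only by the signer
    code = b"def add(a, b):\n    return a + b\n"
    signature = sign_artifact(code, key)
    print(may_promote(code, signature, key))          # untouched artifact
    print(may_promote(code + b"#", signature, key))   # tampered artifact
```

In this sketch any change to the artifact after signing invalidates the signature, which is the property a tampering evaluation would probe.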

Running Evaluations

Single test:

adk eval agent_app tests/adk_evals/test_zt_phi_dlp_redaction.test.json

Full suite:

./bin/run_all_evals.sh

Head-to-head (Solo vs Swarm):

./bin/run_head_to_head.sh

Zero-Trust Enforcement Layers

| Layer | File | Mechanism | Scope |
| --- | --- | --- | --- |
| Transport | bin/dlp-firewall | Go binary wrapping MCP stdio streams | All agents |
| Inference | zero_trust/interceptors.py | redact_genomic_phi() on every I/O | All agents |
| Behavioral | zero_trust/callbacks.py | before_tool_callback sandbox enforcement | Swarm only |

The Solo agent is subject to Transport and Inference enforcement but bypasses Behavioral enforcement: it follows protocols because its prompt instructs it to, not because it structurally lacks the tools to skip them.
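Pattern-based redaction at the Inference layer can be pictured as a regex substitution pass over every string entering or leaving the model. The sketch below is not `redact_genomic_phi()`: the README does not show that function or its patterns, so the identifier formats here (`SAMPLE-` followed by six digits, and an `MRN` number) and the name `redact_sketch` are made up purely to demonstrate the mechanism.

```python
import re

# Hypothetical identifier formats. The repository's real PHI patterns are
# not shown in this README, so these are placeholders for illustration.
ASSUMED_PATTERNS = [
    re.compile(r"\bSAMPLE-\d{6}\b"),
    re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
]


def redact_sketch(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every assumed pattern with the placeholder."""
    for pattern in ASSUMED_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact_sketch("Patient SAMPLE-123456 has MRN: 00123456"))
```

Applying the same function to both inputs and outputs is what would make such a layer independent of which agent is running.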


Customizing the Firewall

To block additional tool patterns, modify agent_app/zero_trust/callbacks.py:

import re

# Example: Block Kubernetes destructive commands
BLACKLIST_PATTERNS = [
    # Matches "kubectl delete ..." and "kubectl drain ...", case-insensitively
    re.compile(r'\bkubectl\s+(delete|drain)\b', re.IGNORECASE),
]

Whenever an agent invokes a sandboxed tool, the callback intercepts the command string. If a pattern matches, a PermissionError halts execution immediately.
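The matching step described above can be sketched as follows. This models only the string check: the real `before_tool_callback` receives ADK tool and argument objects that are not shown in this README, and the helper name `enforce_blacklist` is hypothetical.

```python
import re

BLACKLIST_PATTERNS = [
    re.compile(r"\bkubectl\s+(delete|drain)\b", re.IGNORECASE),
]


def enforce_blacklist(command: str) -> None:
    """Raise PermissionError if the command matches any blacklisted pattern.

    Hypothetical helper: it stands in for the check a callback would run
    on the command string before letting the tool call proceed.
    """
    for pattern in BLACKLIST_PATTERNS:
        if pattern.search(command):
            raise PermissionError(
                f"Blocked by sandbox policy: {pattern.pattern}"
            )


if __name__ == "__main__":
    enforce_blacklist("kubectl get pods")  # passes silently
    try:
        enforce_blacklist("kubectl delete pod web-1")
    except PermissionError as exc:
        print(exc)
```

Using `search` rather than `match` means a blacklisted command is caught even when it appears in the middle of a longer shell string.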

Role-Based Air-Gaps: The framework physically prevents the Executor from running pytest, forcing all test execution through the QA Engineer's restricted tool scope.


Benchmarks (Era 5)

| Benchmark | Swarm Inferences | Solo Inferences | Swarm Tokens | Solo Tokens |
| --- | --- | --- | --- | --- |
| Small | 19 | 14 | 219K | 165K |
| Medium | 21 | 8 | 190K | 107K |
| Large | 25 | 6 | 232K | 89K |
| Fullstack | 34 | 16 | 810K | 419K |

Both paradigms achieve 100% pass rates. The Solo agent is faster, and uses fewer inferences and tokens on every benchmark above, because of tool parallelism: it fires independent operations in a single inference. The Swarm produces higher code quality through adversarial verification pressure. Full analysis: Tool Parallelism Bottleneck Analysis.
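The relative cost of the Swarm can be read directly off the benchmark figures. The short script below only redoes that arithmetic on the four rows reported above; the `BENCHMARKS` dictionary and `token_ratio` helper are illustrative names, not part of the repository.

```python
# (swarm_inferences, solo_inferences, swarm_tokens_k, solo_tokens_k),
# copied from the Era 5 benchmark figures reported above.
BENCHMARKS = {
    "Small": (19, 14, 219, 165),
    "Medium": (21, 8, 190, 107),
    "Large": (25, 6, 232, 89),
    "Fullstack": (34, 16, 810, 419),
}


def token_ratio(name: str) -> float:
    """Return Swarm tokens divided by Solo tokens for one benchmark."""
    _, _, swarm_tokens, solo_tokens = BENCHMARKS[name]
    return swarm_tokens / solo_tokens


if __name__ == "__main__":
    for name in BENCHMARKS:
        print(f"{name}: {token_ratio(name):.2f}x tokens")
```

On these figures the Swarm consumes roughly 1.3x (Small) to 2.6x (Large) the tokens of the Solo agent.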


License

MIT. See LICENSE.md.

About

An open-source framework for orchestrating autonomous AI agent swarms using Google ADK, with Zero-Trust security, TDAID testing, and structured escalation protocols.
