feat: distributed policy engine — agent-based evaluation for 'ask'-tier commands #51

@butterflysky-ai

Concept

Extend cc-toolgate from a local static evaluator to a distributed policy engine where a separate Claude Code agent (B) can evaluate "ask"-tier tool calls from a working agent (A).

Architecture

Agent A (worker)                    Agent B (judge)
    │                                    │
    ├─ tool call ──→ PreToolUse hook     │
    │                    │               │
    │              allow-listed? ─yes─→ approve (instant)
    │              deny-listed?  ─yes─→ reject (instant)
    │              "ask" tier?          │
    │                    │              │
    │                    ├──→ message bus ──→ B evaluates
    │                    │              │    (LLM reasoning
    │                    │              │     + AST analysis)
    │                    │              │         │
    │                    │              │    uncertain?
    │                    │              │    ├─→ escalate to
    │                    │              │    │   human (Discord)
    │                    │              │    │
    │              ←── verdict ─────────┘    │
    │              approve/deny              │
    ├─ continues ←─┘                         │

Decision tiers

  1. allow-listed → instant approve, no B involvement
  2. deny-listed → instant reject, no B involvement
  3. "ask" tier → B evaluates with LLM reasoning, returns verdict
  4. B uncertain → escalate to human via external messaging (Discord/DM)
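
The four tiers above can be read as a single dispatch function. The following is a minimal sketch, not cc-toolgate's actual code: the `ALLOW` and `DENY` sets, the exact-match lookup, and the `judge` and `escalate` callables are all hypothetical stand-ins for the real list matching, Agent B, and the human channel.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Verdict(Enum):
    APPROVE = "approve"
    DENY = "deny"


@dataclass
class JudgeReply:
    # verdict is None when B is uncertain and wants a human to decide.
    verdict: Optional[Verdict]
    reason: str = ""


# Hypothetical lists; real matching would be pattern- or AST-based,
# not exact string comparison.
ALLOW = {"ls", "git status"}
DENY = {"rm -rf /"}


def decide(
    command: str,
    judge: Callable[[str], JudgeReply],
    escalate: Callable[[str], Verdict],
) -> Tuple[Verdict, str]:
    """Walk the tiers in order and return (verdict, tier that decided)."""
    if command in ALLOW:
        return Verdict.APPROVE, "allow-listed"  # tier 1: B never sees it
    if command in DENY:
        return Verdict.DENY, "deny-listed"      # tier 2: B never sees it
    reply = judge(command)                      # tier 3: B evaluates
    if reply.verdict is not None:
        return reply.verdict, "ask"
    return escalate(command), "escalated"       # tier 4: human decides
```

Returning the deciding tier alongside the verdict is a choice made for this sketch; it makes it easy to log how often calls reach B or a human.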

Key design points

  • cc-toolgate already has allow/deny/ask taxonomy — this extends the "ask" path to an external evaluator
  • A already blocks synchronously on the PreToolUse hook, so routing through B adds latency but no new blocking point; B can take as long as the hook's timeout allows
  • B cross-references cc-toolgate's existing allowlist as a first pass
  • Once the agent-shell-parser port is done, B can evaluate both the raw command AND the parsed AST — e.g. distinguish rm with a glob in a sensitive path vs rm of a known temp file
  • cc-toolgate passes its environment variables to B, giving it environment-aware context (which git identity is active, which kubeconfig, etc.)
  • Executor and evaluator are separate sessions with independent context — prevents the working agent from manipulating the judge
  • Could be a new backend mode: evaluator = "local" | "agent" | "agent+escalate"
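
To make the design points above concrete, here is one possible shape for the request that cc-toolgate might hand to B. Everything named here is an assumption for illustration: the `AskRequest` fields, the `CONTEXT_VARS` selection, and the use of `shlex.split` as a stand-in for the agent-shell-parser AST, which is not yet available. Forwarding only a named subset of environment variables is a choice of this sketch; the proposal itself only says the environment is passed along.

```python
import json
import shlex
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Dict, List, Mapping


class EvaluatorMode(Enum):
    # The three values proposed for the backend mode.
    LOCAL = "local"
    AGENT = "agent"
    AGENT_ESCALATE = "agent+escalate"


# Assumption: forward a small, explicit set of variables rather than
# the whole environment, so secrets are not copied into B's context.
CONTEXT_VARS = ("GIT_AUTHOR_EMAIL", "KUBECONFIG")


@dataclass
class AskRequest:
    request_id: str
    raw_command: str
    tokens: List[str]     # placeholder for the parsed AST
    env: Dict[str, str]


def build_request(request_id: str, raw_command: str,
                  environ: Mapping[str, str]) -> AskRequest:
    """Bundle the raw command, a tokenised view, and selected env vars."""
    return AskRequest(
        request_id=request_id,
        raw_command=raw_command,
        tokens=shlex.split(raw_command),
        env={k: environ[k] for k in CONTEXT_VARS if k in environ},
    )


def has_glob(tokens: List[str]) -> bool:
    """Crude check for shell glob characters in the arguments."""
    return any(ch in tok for tok in tokens[1:] for ch in "*?[")


def to_json(request: AskRequest) -> str:
    """Serialise the request for whatever transport the bus ends up using."""
    return json.dumps(asdict(request))
```

With a real AST, `has_glob` would be replaced by a structural check, but even this token-level view lets B tell `rm -rf ~/.ssh/*` apart from `rm /tmp/a.txt`.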

Dependencies

  • Port cc-toolgate to agent-shell-parser backend (AST-level command analysis)
  • Design message bus (file watcher, Redis, or NATS; TBD)
  • PreToolUse hook integration for queue push/response wait
  • External messaging integration for human escalation (Dione/Discord)
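
Since the bus design is still open, the following sketch only illustrates the simplest of the listed options, a directory that both agents watch. The directory layout (`requests/`, `verdicts/`), the file naming, the polling interval, and the deny-on-timeout behaviour are all assumptions of this example, not decisions made in the proposal.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def push_request(bus_dir, request_id, payload):
    """Queue a request for B.

    The file is written under a temporary name and then renamed, so a
    watcher that only looks at *.json never sees a half-written request.
    """
    requests = Path(bus_dir) / "requests"
    requests.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=requests, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    final = requests / f"{request_id}.json"
    os.replace(tmp_name, final)  # atomic rename within one directory
    return final


def wait_for_verdict(bus_dir, request_id, timeout_s=30.0, poll_s=0.1):
    """Block until B's verdict file appears, or fail closed on timeout.

    Assumes B writes its verdict with the same write-then-rename pattern.
    """
    path = Path(bus_dir) / "verdicts" / f"{request_id}.json"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if path.exists():
            return json.loads(path.read_text())
        time.sleep(poll_s)
    return {"verdict": "deny", "reason": "timeout waiting for judge"}
```

The hook side would call `push_request` and then `wait_for_verdict` before returning its decision. Failing closed keeps a stalled or crashed judge from silently approving commands; Redis or NATS would replace the polling loop with a blocking receive.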

Why

Separation of executor and evaluator is a fundamental security principle. Auto-mode permissions are "trust the model to self-police." This is "separate the proposer from the approver" — same reason you don't review your own PRs. A policy engine can also encode org-specific rules beyond generic safety checks.
