External AI red-team engagement package

Generated 2026-05-05. Operator-ready package for engaging an external AI red-team firm to deliver the Phase 1 pentest deliverable from MACRO_ROADMAP.md. This document is the single hand-off surface: vendor shortlist, scoping template, RFP, kickoff checklist, decision matrix, remediation playbook.

Forward references. This package references packages/safety/security/pentest-2026-05/scope-and-methodology.md and packages/safety/compliance/TIER_C_HANDOFF.md as authoritative scope documents. Those files are scheduled deliverables of Phase 0 (target: week ending 2026-05-15) and are NOT yet checked in. Until they land, the scoping doc template below is canonical and MACRO_ROADMAP.md § Phase 1 stands in for the handoff context. Update cross-refs once those files exist.

Pricing transparency. All dollar figures are estimated ranges based on industry-typical brackets for AI red-team work (4-6 week engagements, 2-3 testers); Appendix B describes the basis and the expected deviation. Treat each figure as "research-needed": actual quotes will arrive in vendor RFP responses and SOWs.


1. Vendor comparison table

| Vendor | Scope coverage | Typical engagement size | AI-specific experience | Pricing tier (est.) | Known clients (AI space) | Source / notes |
| --- | --- | --- | --- | --- | --- | --- |
| Bishop Fox | Full-stack red team, web/API, cloud, prompt injection, model extraction | 4-8 weeks, 2-4 testers | Published research on LLM jailbreaks (2023-2025); dedicated AI/ML practice | $60k-$150k | research-needed (NDA-covered) | https://bishopfox.com/services/ai-ml-security |
| NCC Group | Web, mobile, cloud, AI/ML threat modeling, supply chain | 6-12 weeks, 3-5 testers | "AI/ML Security" service line since 2023; CHECK/CREST-certified; published OWASP LLM Top 10 contributions | $80k-$200k | UK government, financial sector (research-needed for AI-specific names) | https://www.nccgroup.com/services/ai-ml-security/ |
| Trail of Bits | Cryptography, smart-contract, ML model security, supply chain, fuzzing | 6-10 weeks, 2-3 testers | Heavy ML security publishing (PrivacyRaven, Fickling pickle scanner); deep technical bench | $100k-$250k | OpenAI (public: model weight security audit, 2023); research-needed others | https://www.trailofbits.com/services/ai-ml-assurance/ |
| Snyk Security Labs | SAST/DAST/SCA-anchored, supply chain, container, IaC; AI/ML add-on | Continuous (platform) + 2-6 week pro-services | Snyk Code AI features; less red-team-native than the above | $40k-$100k (services) on top of platform license | platform-wide; AI-specific work research-needed | https://snyk.io/product/snyk-labs/ |
| AppSec Engineer | AppSec training + consulting; AI security curriculum | 4-8 weeks, 2-3 testers | "AI Security Certified Professional" training program; consulting practice newer than red-team incumbents | $30k-$80k (smaller boutique pricing) | training clients (Microsoft, AWS; research-needed for consulting clients) | https://www.appsecengineer.com/ |
| HiddenLayer (boutique, AI-specialist) | Adversarial ML, model theft, prompt injection, MLOps supply chain | 4-6 weeks, 2-3 testers | ML/AI-native firm; product is ML detection & response; red-team services adjacent | $50k-$120k | Fortune-500 ML platforms (research-needed) | https://hiddenlayer.com/services/ |
| Robust Intelligence (boutique, AI-specialist) | LLM red-teaming, RAG security, agentic system audits | 3-6 weeks, 2-3 testers | Founded by ML researchers; AI Validation Platform; published research on LLM vulnerabilities | $60k-$150k | research-needed (Cisco acquisition 2024; services pipeline post-acq) | https://www.robustintelligence.com/ |

Selection note. For an agent-framework + skill-marketplace product (enchanter-ai), the load-bearing skill is prompt injection + agentic tool-use exploitation, not classical web app pentest. Trail of Bits, Bishop Fox, and Robust Intelligence are the strongest fits on that axis. NCC Group is the strongest fit if SOC-2/compliance auditor alignment matters more than technical depth. Snyk and AppSec Engineer fit if budget is the dominant constraint.


2. Scoping doc template

This is what we hand the firm before the RFP. It sets the in-scope boundary so quotes are comparable. Copy this template into packages/safety/security/pentest-2026-05/scope-and-methodology.md once that path exists; until then it lives here.

2.1 System under test

  • Product: enchanter-ai (agent framework + skill marketplace + capability-shield runtime)
  • Repos: enchanted-skills/ monorepo (umbrella), per-plugin sub-repos (hydra, sylph, djinn, wixie, others)
  • Surfaces:
    • Skill loader + frontmatter parser (hydra)
    • Capability-shield PreToolUse hook chain
    • Signing/verification path (cosign + Sigstore + SLSA)
    • State layer (precedent log, inference-engine artifacts.jsonl, briefings)
    • Hook injection surface (UserPromptSubmit, PostToolUse)
    • Sub-agent dispatch (delegation.md contract)
    • Inference substrate (cross-session evidence accumulation)
  • Out-of-scope:
    • Anthropic API itself (vendor surface, not ours)
    • Underlying OS / Windows shell (host responsibility)
    • Third-party MCP servers we don't ship

2.2 Threat actors in scope

  1. Malicious skill author — crafts a SKILL.md that bypasses capability-shield, exfiltrates state, or hijacks tool-use.
  2. Prompt injection via untrusted content — adversarial input in fetched web pages, user-pasted text, file contents triggering unintended tool calls.
  3. Supply-chain attacker — compromises a dependency (npm, PyPI, GitHub Actions artifact) to inject malicious code into the skill loader.
  4. Insider with repo write — attempts to backdoor capability-shield without triggering signing/verification.
  5. Model-extraction adversary — attempts to elicit system prompts, conduct hooks, or precedent log contents via prompt manipulation.

2.3 Methodology required

  • Whitebox. Full source access. We provide cloned repos + setup instructions + a test harness.
  • Threat modeling. STRIDE or LINDDUN pass on the surfaces above, delivered as a written artifact.
  • Manual exploitation. Not just automated scanners. The bar is "demonstrate exploit chain end-to-end."
  • OWASP LLM Top 10 (2025) coverage. Map findings against the framework.
  • CVSS v4.0 scoring. v3.1 acceptable as fallback if firm's tooling lags.
  • Reproducibility. Each finding ships with a reproducible POC (script, payload, or step-by-step).

2.4 Deliverables required

  • Executive summary (≤ 5 pages, non-technical)
  • Technical findings report (CVSS-scored, POC per finding, remediation guidance)
  • Threat model artifact
  • Remediation timeline recommendation (Critical: ≤ 7d, High: ≤ 30d, Medium: ≤ 90d, Low: best-effort)
  • Debrief presentation (1 hour, video-recorded, attendees: governance owner, hydra owner, AIMS owner)

3. RFP template

Ready to send to the shortlist. Replace bracketed fields before transmission.


REQUEST FOR PROPOSAL — AI Agent Framework Penetration Test

Issued by: [legal entity name], operator of enchanter-ai
Date issued: [YYYY-MM-DD]
Responses due: [YYYY-MM-DD, 21 days from issue]
Contact: [governance owner name + email]

3.1 Company background

enchanter-ai is an agent framework + skill marketplace for AI assistants (Claude Code, ChatGPT, Gemini). The platform's load-bearing security control is capability-shield, a PreToolUse hook system that gates tool calls based on signed SKILL.md frontmatter. Compromise of the skill loader, capability-shield, or signing chain is treated as catastrophic.

The platform is pre-revenue, in the SOC 2 Type II evidence-collection phase, and targeting ISO 42001 certification in Phase 3 of the macro roadmap. This pentest is the gating deliverable for Phase 1 exit.

3.2 Engagement scope

See attached scope-and-methodology.md (section 2 of this package) for full system surfaces, threat actors, and out-of-scope items. Headline asks:

  • Whitebox audit of the skill loader + capability-shield + signing chain
  • Prompt-injection and agentic tool-use exploitation
  • Supply-chain attack pathway analysis
  • State-layer integrity (precedent log, inference substrate)

3.3 Required deliverables

  1. Executive summary (≤ 5 pages, board-readable)
  2. Technical findings report with:
    • CVSS v4.0 score per finding
    • Reproducible POC per finding
    • Concrete remediation guidance with code-level recommendations
  3. Threat model artifact (STRIDE or LINDDUN, written)
  4. Remediation timeline recommendation by severity bracket
  5. Recorded 1-hour debrief with governance, hydra, AIMS owners

All deliverables in PDF + Markdown source. POCs as standalone repos or patch files.

3.4 Required attestations

Vendor must execute, prior to scope walkthrough:

  • NDA — mutual, covering source, findings, customer identities, and engagement metadata. Term: 5 years post-engagement.
  • No-data-retention clause — vendor destroys all customer data and findings artifacts 90 days post-final-deliverable, with written attestation of destruction.
  • Indemnification — vendor indemnifies for negligent harm to production systems during testing. We carry our own E&O for testing-induced outages on test environments.
  • Responsible-disclosure — vendor agrees not to publish any finding, even sanitized, without our written consent for 12 months post-engagement.
  • Insurance — proof of $5M cyber liability + $2M professional E&O.

3.5 Timeline expectations

  • RFP response: 21 days from issue
  • Vendor selection + SOW signing: 14 days from response deadline
  • Kickoff: within 30 days of SOW signing
  • Field work: 4-8 weeks (vendor proposes)
  • Draft report: within 14 days of field-work end
  • Final report + debrief: within 30 days of draft

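Total elapsed: the target is 90-120 days from RFP issue to final report. The windows above, if each is used in full, sum to 137-165 days (21 + 14 + 30 + 28-56 + 14 + 30), so the target requires selection, kickoff, draft, and final report to land ahead of their deadlines. Phase 2 of the macro roadmap depends on receiving the final report by month 3.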
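
The milestone windows in 3.5 can be summed to sanity-check the elapsed-time figure. This is a minimal sketch; the `MILESTONES` structure and the best-case assumption (any "within N days" step may complete immediately) are illustrative and not part of the RFP text.

```python
# Each milestone: (label, shortest plausible duration, contractual maximum), in days.
# Assumption: a "within N days" window can in principle close on day 0.
MILESTONES = [
    ("RFP response window", 21, 21),
    ("Vendor selection + SOW signing", 0, 14),
    ("Kickoff after SOW signing", 0, 30),
    ("Field work (4-8 weeks)", 28, 56),
    ("Draft report", 0, 14),
    ("Final report + debrief", 0, 30),
]


def elapsed_days(milestones):
    """Return (best_case, worst_case) days from RFP issue to final report."""
    best = sum(shortest for _, shortest, _ in milestones)
    worst = sum(longest for _, _, longest in milestones)
    return best, worst


best, worst = elapsed_days(MILESTONES)
print(f"best case: {best} days, worst case: {worst} days")
```

Under these assumptions the stated target sits between the two bounds, so whether it holds depends on how much of each window the vendor and the governance owner actually use.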

3.6 Evaluation criteria

See decision matrix (section 5). Weighted: AI-domain depth 40%, pricing 20%, timeline 20%, references 10%, methodology fit 10%.

3.7 Required RFP response sections

Vendor responses must include, in this order:

  1. Firm overview + AI-security practice tenure
  2. Proposed team (names, credentials, AI/ML-specific experience per tester)
  3. Methodology (mapped to section 2.3 requirements)
  4. Proposed timeline with milestones
  5. Fixed-fee or T&M quote with breakdown
  6. Three references — AI/agent-platform engagements within the last 24 months
  7. Sample sanitized deliverable from a comparable engagement
  8. Attestation acknowledgments (section 3.4)

4. 30-day pre-engagement kickoff checklist

Owner is the governance owner unless otherwise noted. Days are counted from SOW signature (day 0).

Days 0-7: Legal + paper

  • MSA executed — Master Services Agreement, signed by both legal entities. (legal owner)
  • NDA executed — mutual, 5-year term. (legal owner)
  • SOW finalized — scope, fees, deliverables, timeline locked. (governance owner + vendor)
  • Insurance certificate received — $5M cyber + $2M E&O, named insured matches MSA. (legal owner)
  • Data-handling rider executed — destruction attestation 90 days post-deliverable. (legal owner)
  • Indemnification rider reviewed — counsel sign-off. (legal owner)

Days 7-14: Scope walkthrough + technical handoff

  • Scope walkthrough meeting — 90 min, vendor team + governance + hydra owner + AIMS owner. Walk section 2 surface by surface. Record. (governance owner)
  • Repo access provisioned — read-only GitHub access to monorepo + per-plugin repos. SSH keys collected from vendor's named testers only. (hydra owner)
  • Test environment provisioned — isolated VM or sandbox account, not production. Snapshot baseline taken. (operator)
  • Build instructions verified — vendor team can stand up the framework end-to-end in <1 day. (hydra owner)
  • Test harness shared — internal red-team harness (if any) provided as starting point. (hydra owner)

Days 14-21: Contacts + comms

  • Contact roster published — single-page doc: vendor lead, vendor backup, governance owner, hydra owner, AIMS owner, legal, operator. Phone + email + Signal handle each. (governance owner)
  • Escalation paths defined:
    • P0 — production-exploitable critical found mid-engagement → vendor lead calls hydra owner direct, within 1 hour. Backup: governance owner.
    • P1 — scope dispute → escalate to governance owner within 24h. Vendor pauses billable hours during dispute.
    • P2 — comms gap >48h → governance owner pings vendor lead. Two missed check-ins = SOW review trigger.
  • Evidence-sharing channel established — encrypted (Signal, Keybase, or 1Password shared vault). NOT email, NOT Slack. Vendor confirms received-and-tested. (operator)
  • Standing weekly check-in scheduled — 30 min, Tuesdays, governance + vendor lead. (governance owner)

Days 21-30: Final pre-kickoff

  • Internal red-team handoff doc finalized — what we've already tested ourselves; what we suspect; what we want them to challenge. Living doc shared in evidence channel. (hydra owner)
  • Debrief schedule pre-booked — 1-hour video call slot reserved 75 days out (estimated final-report date + 7d). (governance owner)
  • Remediation team standby: hydra owner + 1 SWE on-call for the full field-work window (4-8 weeks, per the SOW) for clarifying questions. (hydra owner)
  • Communication blackout established — no public mention of engagement (blog, social, conference talks) until 30 days post-final-report. (governance owner)
  • Kickoff call — day 30, 60 min, full vendor team + full our-side team. Go-signal. (governance owner)
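
Because every checklist window is counted from SOW signature, the phases can be turned into calendar dates once that date is known. A small sketch follows; the `PHASES` list mirrors the headings above, while the signature date used in the example is purely hypothetical.

```python
from datetime import date, timedelta

# Checklist phases as (label, first day, last day), counted from SOW signature (day 0).
PHASES = [
    ("Legal + paper", 0, 7),
    ("Scope walkthrough + technical handoff", 7, 14),
    ("Contacts + comms", 14, 21),
    ("Final pre-kickoff", 21, 30),
]


def phase_calendar(sow_signed: date):
    """Map each checklist phase to concrete start and end dates."""
    return [
        (label, sow_signed + timedelta(days=start), sow_signed + timedelta(days=end))
        for label, start, end in PHASES
    ]


# Hypothetical signature date, for illustration only.
calendar = phase_calendar(date(2026, 6, 1))
for label, start, end in calendar:
    print(f"{start.isoformat()} to {end.isoformat()}: {label}")
```

The end date of the last phase is the kickoff call (day 30).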

5. Decision matrix

Score each vendor 1-10 per axis. Multiply by weight. Highest weighted total wins. Tie-breaker: AI-domain depth.

| Criterion | Weight | What "10" looks like | What "1" looks like |
| --- | --- | --- | --- |
| AI-domain depth | 40% | Published research on LLM/agent exploits in last 18mo; named senior tester with ML background; sample deliverable on a comparable agent platform | Generic web pentest firm with "we can do AI too" pitch deck, no publications, no named ML expertise |
| Pricing | 20% | Fixed-fee, within budget, transparent breakdown | T&M with no cap, exceeds budget by >50%, opaque rate sheet |
| Timeline | 20% | Can start within 30 days of SOW; field work ≤ 6 weeks; final report within 30 days of draft | Cannot start for 90+ days; field work open-ended; report timeline vague |
| References | 10% | Three named, contactable, within-24-month AI-platform refs willing to vouch | One vague ref, NDA-blocked, or refs are non-AI engagements |
| Methodology fit | 10% | Section 2.3 requirements addressed point-by-point; threat-model artifact promised; CVSS v4.0 native | Boilerplate methodology pasted from web pentest template; no threat model; CVSS v3.1 only |

Weighted score example. Vendor A: AI 9, price 6, time 8, refs 7, methodology 9 → (9×0.4)+(6×0.2)+(8×0.2)+(7×0.1)+(9×0.1) = 3.6+1.2+1.6+0.7+0.9 = 8.0. Vendor B: 7, 9, 7, 8, 7 → 2.8+1.8+1.4+0.8+0.7 = 7.5. Vendor A wins on depth despite higher cost.

Hard floor. No vendor scoring < 7 on AI-domain depth proceeds, regardless of weighted total. The whole point of this engagement is AI-specific risk; a generic firm cannot deliver.
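
The scoring rule, tie-breaker, and hard floor can be expressed in a few lines. The sketch below reuses the Vendor A and Vendor B scores from the worked example; "Vendor C" and all identifier names are hypothetical, added only to show the floor excluding a vendor with a high weighted total.

```python
WEIGHTS = {
    "ai_depth": 0.40,
    "pricing": 0.20,
    "timeline": 0.20,
    "references": 0.10,
    "methodology": 0.10,
}
AI_DEPTH_FLOOR = 7  # vendors scoring below this on AI-domain depth do not proceed


def weighted_total(scores):
    """Sum of score x weight across the five criteria, rounded to 2 decimals."""
    return round(sum(scores[axis] * weight for axis, weight in WEIGHTS.items()), 2)


def rank_vendors(vendors):
    """Drop vendors under the hard floor, then sort by weighted total.

    AI-domain depth is the tie-breaker when totals are equal.
    """
    eligible = {n: s for n, s in vendors.items() if s["ai_depth"] >= AI_DEPTH_FLOOR}
    return sorted(
        eligible,
        key=lambda n: (weighted_total(eligible[n]), eligible[n]["ai_depth"]),
        reverse=True,
    )


vendors = {
    "Vendor A": {"ai_depth": 9, "pricing": 6, "timeline": 8, "references": 7, "methodology": 9},
    "Vendor B": {"ai_depth": 7, "pricing": 9, "timeline": 7, "references": 8, "methodology": 7},
    # Hypothetical: strong everywhere except AI depth, so the floor removes it.
    "Vendor C": {"ai_depth": 6, "pricing": 10, "timeline": 10, "references": 10, "methodology": 10},
}

for name in rank_vendors(vendors):
    print(name, weighted_total(vendors[name]))
```

Vendor C would total 8.4 on weights alone, which is why the floor is applied before ranking rather than after.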


6. Post-engagement remediation playbook

Each finding from the technical report flows through this playbook. Severity is the vendor's CVSS bracket; we may upgrade severity but not downgrade without written justification in packages/safety/security/pentest-2026-05/severity-overrides.md.

6.1 Severity → owner → SLA

| Severity (CVSS v4.0) | Owner | Triage SLA | Fix SLA | Verification SLA |
| --- | --- | --- | --- | --- |
| Critical (9.0-10.0) | hydra owner (primary), governance owner (escalation) | 24h | 7 days | Independent re-test by another team member within 14 days |
| High (7.0-8.9) | hydra owner | 72h | 30 days | Self-verified with reproducible test added to regression suite; spot-check by reviewer |
| Medium (4.0-6.9) | plugin owner (per-finding) | 7 days | 90 days | Self-verified, regression test added |
| Low (0.1-3.9) | plugin owner | 14 days | Best-effort, next quarterly cycle | Regression test optional; document risk acceptance if not fixed |
| Info / Hardening | plugin owner | 30 days | Track in learnings.md; no SLA | No verification required; review at next management review |
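
The severity table can be applied mechanically to a vendor-supplied CVSS score. A minimal sketch, assuming scores arrive as a single number between 0.0 and 10.0; the `classify` function and `BRACKETS` layout are illustrative names, not an existing tool in the repo.

```python
from datetime import timedelta

# (lower bound, severity, triage SLA, fix SLA); a fix SLA of None means best-effort.
BRACKETS = [
    (9.0, "Critical", timedelta(hours=24), timedelta(days=7)),
    (7.0, "High", timedelta(hours=72), timedelta(days=30)),
    (4.0, "Medium", timedelta(days=7), timedelta(days=90)),
    (0.1, "Low", timedelta(days=14), None),
]


def classify(cvss_score: float):
    """Return (severity, triage SLA, fix SLA) for a CVSS score."""
    if not 0.0 <= cvss_score <= 10.0:
        raise ValueError(f"CVSS score out of range: {cvss_score}")
    for lower, severity, triage_sla, fix_sla in BRACKETS:
        if cvss_score >= lower:
            return severity, triage_sla, fix_sla
    # A score of 0.0 carries no severity rating; route it as Info / Hardening.
    return "Info / Hardening", timedelta(days=30), None


print(classify(9.3))
print(classify(5.5))
```

Severity upgrades decided internally would override the returned bracket; this sketch covers only the vendor's score.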

6.2 Evidence-of-closure rules

A finding is "closed" when all four hold:

  1. Code change committed and merged (or risk-accepted with written justification + management-review minute).
  2. Regression test added to the appropriate suite (capability-shield test suite, hydra test suite, or tests.json per CLAUDE.md § Artifacts).
  3. Re-test executed by the verification owner (per the table above) and passed.
  4. Closure recorded in packages/safety/security/pentest-2026-05/closure-log.md with: finding ID, severity, owner, fix commit hash, regression-test commit hash, re-test result, date closed.

Risk-accepted findings (not fixed) require:

  • Written justification (business or technical rationale)
  • Compensating control documented
  • Management-review minute recording the acceptance
  • Re-evaluation date set (typically 12 months)
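
The closure rules lend themselves to a checklist function that reports what is still missing. The sketch below is one possible reading, with a labeled assumption: a risk-accepted finding is checked against the four risk-acceptance requirements instead of the fix, regression test, and re-test. The `Finding` fields are hypothetical names, not the closure-log schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    finding_id: str
    fix_commit: Optional[str] = None
    regression_test_commit: Optional[str] = None
    retest_passed: bool = False
    logged_in_closure_log: bool = False
    # Risk-acceptance fields, used only when the finding is not fixed.
    risk_accepted: bool = False
    justification: Optional[str] = None
    compensating_control: Optional[str] = None
    review_minute: Optional[str] = None
    reevaluation_date: Optional[str] = None


def closure_gaps(f: Finding) -> list:
    """Return the names of closure requirements that are not yet satisfied."""
    gaps = []
    if f.risk_accepted:
        # Assumption: risk acceptance replaces the fix, regression test, and re-test.
        for name in ("justification", "compensating_control",
                     "review_minute", "reevaluation_date"):
            if not getattr(f, name):
                gaps.append(name)
    else:
        if not f.fix_commit:
            gaps.append("fix_commit")
        if not f.regression_test_commit:
            gaps.append("regression_test_commit")
        if not f.retest_passed:
            gaps.append("retest_passed")
    if not f.logged_in_closure_log:
        gaps.append("closure_log_entry")
    return gaps


open_finding = Finding("F-002", fix_commit="abc123")
print(closure_gaps(open_finding))
```

An empty list means the finding can be marked closed; anything else names the evidence still owed.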

6.3 Aggregate reporting

Weekly during remediation window:

  • Open / closed / risk-accepted counts per severity, posted to governance Slack channel
  • SLA breach watch: any unfixed finding with 25% or less of its fix-SLA window remaining is escalated to the governance owner
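
The breach-watch rule can be checked with simple date arithmetic. A sketch, assuming the fix-SLA clock starts at the triage date (consistent with the re-test protocol in 6.4) and that "within 25% of SLA expiry" means a quarter or less of the window remains; function and constant names are illustrative.

```python
from datetime import datetime, timedelta

WATCH_FRACTION = 0.25  # escalate when this share (or less) of the window remains


def needs_escalation(triaged_at: datetime, fix_sla: timedelta,
                     now: datetime, fixed: bool = False) -> bool:
    """True if an unfixed finding is inside the watch window or already past its SLA."""
    if fixed:
        return False
    remaining = (triaged_at + fix_sla) - now
    return remaining <= fix_sla * WATCH_FRACTION


# Hypothetical Critical finding: 7-day fix SLA, triaged on 2026-07-01.
triaged = datetime(2026, 7, 1)
sla = timedelta(days=7)
print(needs_escalation(triaged, sla, now=datetime(2026, 7, 6)))       # 2 days left
print(needs_escalation(triaged, sla, now=datetime(2026, 7, 6, 12)))   # 1.5 days left
```

For a 7-day SLA the watch window opens with 1.75 days remaining, so a Critical finding is flagged a little over five days after triage.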

Monthly:

  • Closure-log diff reviewed at management review (per ISO 42001 §9.3 template)
  • Trend snapshot — finding categories repeating across the report flow into wixie/state/precedent.jsonl (see shared/conduct/precedent.md) and via /inference-emit into the substrate

6.4 Re-test protocol

For Critical and High findings, an independent re-test confirms closure:

  1. Re-test executor: a team member NOT on the original fix. For Critical, this is the hydra owner if the fix was someone else's, or a delegated SWE if hydra owner did the fix.
  2. Re-test input: the vendor-supplied POC, run against the post-fix build.
  3. Pass criterion: POC no longer triggers the finding behavior, AND the regression test in the suite passes.
  4. Fail criterion: POC still works OR regression test fails OR a closely-related variant of the POC works. Failed re-test re-opens the finding; SLA clock continues from original triage date, not re-test date.
  5. Re-test artifact: attach the re-test output to the closure-log entry.
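
The pass and fail criteria reduce to a small decision function. This is a sketch only; the `RetestResult` fields are hypothetical names for the observations a re-test executor would record.

```python
from dataclasses import dataclass


@dataclass
class RetestResult:
    executor: str
    fix_author: str
    poc_still_works: bool
    regression_test_passed: bool
    variant_poc_works: bool


def retest_verdict(result: RetestResult) -> str:
    """Return "pass" or "reopen" according to the re-test criteria."""
    if result.executor == result.fix_author:
        raise ValueError("re-test executor must not be the author of the fix")
    failed = (
        result.poc_still_works
        or not result.regression_test_passed
        or result.variant_poc_works
    )
    return "reopen" if failed else "pass"


clean = RetestResult("hydra owner", "delegated SWE", False, True, False)
variant = RetestResult("hydra owner", "delegated SWE", False, True, True)
print(retest_verdict(clean), retest_verdict(variant))
```

A "reopen" verdict does not reset the SLA clock, so the original triage date should stay on the finding record.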

6.5 Cross-finding patterns

If three or more findings share a root cause (e.g., "frontmatter parser trusts unsigned input across three skill loaders"), the playbook escalates:

  • A root-cause fix epic is opened, not three independent fixes.
  • The root-cause fix becomes the primary closure path; per-finding closures point to the epic.
  • The pattern is emitted to the inference substrate as a substrate-relevant artifact (per shared/conduct/inference-substrate.md) so it surfaces in future briefings.
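
Detecting the three-or-more threshold is a grouping problem. A minimal sketch, assuming each finding has been tagged with a root-cause label during triage; the labels and finding IDs below are made up for illustration.

```python
from collections import defaultdict

EPIC_THRESHOLD = 3  # findings sharing a root cause before an epic is opened


def root_cause_epics(findings):
    """Group (finding_id, root_cause) pairs and keep causes that reach the threshold."""
    by_cause = defaultdict(list)
    for finding_id, root_cause in findings:
        by_cause[root_cause].append(finding_id)
    return {
        cause: ids for cause, ids in by_cause.items() if len(ids) >= EPIC_THRESHOLD
    }


# Hypothetical triage output.
findings = [
    ("F-001", "frontmatter parser trusts unsigned input"),
    ("F-004", "frontmatter parser trusts unsigned input"),
    ("F-007", "frontmatter parser trusts unsigned input"),
    ("F-002", "hook chain ordering"),
    ("F-005", "hook chain ordering"),
]
print(root_cause_epics(findings))
```

Each returned cause maps to the findings whose closures should point at the epic.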

6.6 Out-of-engagement findings

If the vendor finds something out of scope but security-relevant (e.g., a host-OS issue, an Anthropic API misuse, a third-party MCP server flaw), it's logged with severity Info / Hardening and routed to the appropriate external party. We don't drop these; we relay.


7. Anti-patterns (read before sending the RFP)

  • Selecting on price alone. A $30k AppSec Engineer engagement that misses a Critical capability-shield bypass costs the company more than the difference vs. a $120k Trail of Bits engagement that catches it.
  • Skipping the NDA + destruction attestation. Standard pentest contracts often allow vendors to retain findings for "case study" purposes. Negotiate this out.
  • Letting the vendor self-define scope. They will pad. Scope is ours; methodology is theirs.
  • Accepting CVSS v3.1 only. v4.0 was released in 2023; any AI-security firm worth hiring uses v4.0 by 2026. v3.1-only is a signal of stale tooling.
  • No standing weekly check-in. Four to eight weeks of field work without check-ins means unpleasant surprises at draft-report time.
  • Closing findings without regression tests. The bug comes back. Always test.
  • Letting Critical findings sit past 7-day SLA. If the SLA breaks, escalate within hours, not days. Critical findings are P0.
  • Treating the debrief as optional. The recorded debrief is the most efficient transfer of vendor's mental model. Attend, record, transcribe, index.

8. Open items requiring decisions before RFP issue

These are explicit blockers. Resolve before sending.

  1. Legal entity name — which entity executes the MSA? (enchanter-ai LLC vs. parent? — governance owner to confirm)
  2. Budget ceiling — is the $25k-$100k bracket from MACRO_ROADMAP.md § Phase 1 firm, or stretchable to $150k for a top-tier firm? (CFO / governance)
  3. Production access policy — vendor gets prod read-only? Sandbox-only? Decide before scope walkthrough. (operator + governance)
  4. Disclosure policy — public sanitized writeup ever permissible, or perpetual confidentiality? (governance + marketing)
  5. Insurance carrier confirmation — does our existing E&O cover testing-induced outages on test infra? (legal + insurance broker)

These five questions go to governance owner on day -7 (one week before RFP send). RFP does not ship with unresolved items 1, 2, or 3.


Appendix A — File references

  • roadmap-2026/MACRO_ROADMAP.md: parent roadmap; Phase 1 is what this package executes
  • packages/safety/security/pentest-2026-05/scope-and-methodology.md: planned (Phase 0); canonical scope doc once it lands
  • packages/safety/compliance/TIER_C_HANDOFF.md: planned (Phase 0); abstract handoff context this package replaces with concrete execution
  • packages/safety/security/pentest-2026-05/closure-log.md: created at engagement kickoff (day 30); single source of finding lifecycle
  • packages/safety/security/pentest-2026-05/severity-overrides.md: created if needed during engagement; documents any severity downgrades
  • wixie/state/precedent.jsonl: cross-session failure ancestry (shared/conduct/precedent.md)
  • wixie/plugins/inference-engine/state/artifacts.jsonl: substrate stream (shared/conduct/inference-substrate.md)

Appendix B — Pricing-bracket honesty disclaimer

All pricing figures in section 1 are estimated based on industry-typical brackets for AI red-team engagements in 2024-2025 (most recent public data points). Actual quotes will deviate by ±50% based on:

  • Vendor's current backlog (busy firm = premium pricing)
  • Engagement timing (Q4 typically cheapest; Q1/Q2 typically priciest)
  • Negotiated scope (fixed-fee vs. T&M)
  • Reference-customer status (some firms discount for case-study rights)

Do not commit to a budget on these numbers alone. The RFP responses are the only authoritative pricing input.
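
To see how wide the uncertainty is, the ±50% deviation can be applied to the section 1 brackets. The sketch below assumes the deviation applies independently to each end of a bracket (low end down, high end up), which is one reading of the disclaimer; figures are in thousands of USD.

```python
# Section 1 estimated brackets, in thousands of USD.
BRACKETS_K = {
    "Bishop Fox": (60, 150),
    "NCC Group": (80, 200),
    "Trail of Bits": (100, 250),
    "Snyk Security Labs": (40, 100),
    "AppSec Engineer": (30, 80),
    "HiddenLayer": (50, 120),
    "Robust Intelligence": (60, 150),
}
DEVIATION = 0.5  # the +/-50% stated in this appendix


def plausible_range(low_k, high_k, deviation=DEVIATION):
    """Widen an estimated bracket by the deviation on both ends."""
    return low_k * (1 - deviation), high_k * (1 + deviation)


for vendor, (low_k, high_k) in BRACKETS_K.items():
    lo, hi = plausible_range(low_k, high_k)
    print(f"{vendor}: ${lo:.0f}k to ${hi:.0f}k")
```

The widened ranges overlap heavily across vendors, which is the practical reason the RFP responses, not these brackets, should drive the budget decision.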

Appendix C — Where this package ends and the engagement begins

This document covers RFP issue → SOW signing → kickoff → remediation → closure. It does NOT cover:

  • The pentest report itself (vendor deliverable)
  • Specific remediations for specific findings (per-finding owners' problem)
  • Surveillance audits (Phase 2-3 deliverable)
  • Re-engagement decisions (annual review, not in scope here)

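When the final report lands and remediation begins, this package retires as the working document. A new doc, packages/safety/security/pentest-2026-05/findings-and-remediation.md, takes over per-finding tracking for the active-remediation phase; the section 6 playbook remains the process reference it follows.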