Plan: The Dark Factory

Vision

TKO should reach a state where the repository is essentially maintained by AI agents. Not as a novelty, but because the infrastructure (tests, CI, docs, verified behaviors) is robust enough that agents can work autonomously and humans can trust the output.

The term "dark factory" comes from manufacturing: a facility that runs without lights because the robots don't need to see. In software, it means agents handle the implementation (writing code, fixing bugs, updating dependencies, writing docs) while humans focus on direction, design, and validation.

Simon Willison describes five levels of AI-assisted programming, from "spicy autocomplete" (Level 0) to the fully autonomous "dark software factory" (Level 5). StrongDM's AI team operates at Level 5: "Code must not be written by humans. Code must not be reviewed by humans." Engineers design specs, curate test scenarios, and watch scores. Agents do everything else.

Willison's key observation: engineers shift from building code to building the systems that build the code. The critical unsolved question is how agents prove their code works without human review. The answer, for StrongDM and for TKO, is tests: not as a checkbox, but as the primary artifact that defines correctness.

TKO is currently between Level 3 and Level 4. Agents (Claude Code, Copilot) do most of the implementation. Humans review PRs, set priorities, and make architectural decisions. The goal is to push toward Level 5 where practical, while being honest about where human judgment is still required.

What makes this possible

The tooling modernization (Phases 1–6) was the foundation:

  • Verified behaviors: verified-behaviors.json files are test-backed contracts that AI agents can check their work against
  • SOUL.md: the philosophical foundation of Knockout, so agents understand why the framework works the way it does
  • AGENTS.md: instructions that any AI coding tool can follow
  • llms.txt: concise project context for LLM consumption
  • bun run verify: single command to confirm nothing is broken (biome + tsc + build + verify:esm + vitest)
  • bun run knip: detect dead code and unused deps
  • Changesets: structured release management
  • CI safety net: lint, typecheck, test, and ESM verification on every PR
  • Branch protection: all changes go through PRs
  • plans/: documented intent so agents understand context
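
To make the "single command" idea concrete, here is a minimal sketch of a fail-fast step runner in TypeScript. It is not TKO's actual verify script: the step commands, the `runVerify` function, and the injected `Exec` callback are all assumptions made for illustration. Only the five step names come from the list above.

```typescript
// Hypothetical sketch: the commands below are assumptions, not the
// contents of TKO's actual package.json scripts.
type Step = { name: string; command: string };

// Runs a command and returns its exit code (0 means success).
type Exec = (command: string) => number;

const verifySteps: Step[] = [
  { name: "biome", command: "biome check ." },
  { name: "tsc", command: "tsc --noEmit" },
  { name: "build", command: "bun run build" },
  { name: "verify:esm", command: "bun run verify:esm" },
  { name: "vitest", command: "vitest run" },
];

type VerifyResult =
  | { ok: true; ran: string[] }
  | { ok: false; ran: string[]; failedStep: string; exitCode: number };

// Run each step in order and stop at the first non-zero exit code,
// the same fail-fast behavior as chaining commands with `&&`.
function runVerify(steps: Step[], exec: Exec): VerifyResult {
  const ran: string[] = [];
  for (const step of steps) {
    ran.push(step.name);
    const exitCode = exec(step.command);
    if (exitCode !== 0) {
      return { ok: false, ran, failedStep: step.name, exitCode };
    }
  }
  return { ok: true, ran };
}
```

In a real script, `exec` would wrap a process spawner; injecting it keeps the ordering logic testable without running any tools. The point for agents is that one exit code answers "is anything broken?".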

What's not there yet

| Gap | What's needed |
| --- | --- |
| Dependency updates | Renovate/Dependabot with 48h minimumReleaseAge |
| Bundle size tracking | CI check comparing browser.min.js against main |
| Benchmarks | vitest bench for observable/computed hot paths |
| Coverage confidence | Know which code paths are covered, which aren't |
| Autonomous PR review | AI reviewer that checks against verified behaviors |
| Release automation | Tag + release triggered by changeset merge, not manual |
| Copilot/Cursor support | .github/copilot-instructions.md extending AGENTS.md |
| Issue triage | AI can read an issue, reproduce it, propose a fix |
| Scenario testing | StrongDM-style holdout scenarios for end-to-end validation |
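
As one illustration of the bundle size gap, the comparison at the heart of such a CI check could look roughly like the sketch below. The function names, the report shape, and the 2% growth budget are assumptions, not an existing TKO script. The byte counts would come from the built browser.min.js on the PR branch and on main.

```typescript
// Hypothetical sketch: the 2% default budget and all names here are
// assumptions for illustration only.
type SizeReport = {
  baseBytes: number;
  headBytes: number;
  deltaBytes: number;
  deltaPercent: number;
  withinBudget: boolean;
};

// Compare the bundle size on the PR branch (head) against main (base).
function compareBundleSize(
  baseBytes: number,
  headBytes: number,
  maxGrowthPercent = 2,
): SizeReport {
  if (baseBytes <= 0) throw new RangeError("baseBytes must be positive");
  const deltaBytes = headBytes - baseBytes;
  const deltaPercent = (deltaBytes * 100) / baseBytes;
  return {
    baseBytes,
    headBytes,
    deltaBytes,
    deltaPercent,
    withinBudget: deltaPercent <= maxGrowthPercent,
  };
}

// One-line summary suitable for a CI log or PR comment.
function formatReport(r: SizeReport): string {
  const sign = r.deltaBytes >= 0 ? "+" : "";
  return (
    `browser.min.js: ${r.baseBytes} B -> ${r.headBytes} B ` +
    `(${sign}${r.deltaBytes} B, ${sign}${r.deltaPercent.toFixed(2)}%)`
  );
}
```

A CI job would fail when `withinBudget` is false, so a size regression is caught by the pipeline rather than relying on a reviewer to notice it.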

Principles

  1. Tests are the source of truth. If it's not tested, it doesn't exist. AI agents should never make changes they can't verify. Tests are not a checkbox; they are the primary artifact that defines correctness.

  2. Safety by default. The CI pipeline should catch any regression an AI introduces. bun run verify must pass before any commit.

  3. Intent over implementation. Plans and SOUL.md describe why things work the way they do. Code describes what. AI can change the what if it understands the why.

  4. Small, reviewable changes. One concern per PR. A human should be able to review any AI-generated PR in under 5 minutes.

  5. No magic. Every tool, script, and CI step should be understandable by reading the code. No hidden state, no implicit dependencies.

  6. Earn trust incrementally. Start with low-risk automation (deps, docs, formatting). Expand to bug fixes and features as confidence grows. The level of autonomy an agent gets should match the level of safety net around it.
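
Principles 2 and 6 together suggest a simple gating rule, sketched below under stated assumptions. The change categories, the oversight levels, and the `requiredOversight` function are hypothetical. Only the low-risk examples (deps, docs, formatting) and the requirement that verify passes come from the principles above.

```typescript
// Hypothetical policy sketch: the categories and oversight levels are
// assumptions for illustration, not an existing TKO mechanism.
type ChangeKind = "deps" | "docs" | "formatting" | "bugfix" | "feature";
type Oversight = "blocked" | "human-review" | "spot-check";

// Start with the low-risk categories named in principle 6.
const lowRisk: ReadonlySet<ChangeKind> = new Set<ChangeKind>([
  "deps",
  "docs",
  "formatting",
]);

// Decide how much human oversight a change needs. A failing verify run
// always blocks; otherwise autonomy depends on whether the category
// has earned trust yet.
function requiredOversight(
  kind: ChangeKind,
  verifyPassed: boolean,
  trusted: ReadonlySet<ChangeKind> = lowRisk,
): Oversight {
  if (!verifyPassed) return "blocked";
  return trusted.has(kind) ? "spot-check" : "human-review";
}
```

Expanding autonomy then means adding a category to the trusted set once the safety net around it is strong enough, rather than changing the rule itself.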
