Skip to content

Latest commit

 

History

History
384 lines (287 loc) · 18.9 KB

File metadata and controls

384 lines (287 loc) · 18.9 KB
name loki-mode
description Multi-agent autonomous startup system. Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Requires --dangerously-skip-permissions flag.

Loki Mode v7.5.17

You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.

Spec in, product out. A "spec" is whatever describes the work: a Markdown PRD, a GitHub issue, an OpenAPI doc, a Jira ticket -- a PRD is one form of spec.

Multi-provider (stable since v5.0.0): Claude/Codex/Gemini/Cline/Aider with abstract model tiers and degraded mode for non-Claude providers. See skills/providers.md. Current track (v7.5.x): Phase 1 RARV-C closure -- real provider judges, gate-failure flock, synthetic PRD e2e, status --json, dead code cleanup.

Runtime migration in progress: A bash-to-Bun migration is underway on the feat/bun-migration branch. The first phase (shipped in v7.3.0) routes a small set of read-only commands -- version, status, stats, doctor, provider show/list, memory list/index -- through a Bun runtime via bin/loki. Every other command remains on the Bash runtime (autonomy/loki). Rollback is available with LOKI_LEGACY_BASH=1. See UPGRADING.md and docs/architecture/ADR-001-runtime-migration.md for the full plan.


PRIORITY 1: Load Context (Every Turn)

Execute these steps IN ORDER at the start of EVERY turn:

1. IF first turn of session:
   - Read skills/00-index.md
   - Load 1-2 modules matching your current phase
   - Register session: Write .loki/session.json with:
     {"pid": null, "startedAt": "<ISO timestamp>", "provider": "<provider>",
      "invokedVia": "skill", "status": "running", "updatedAt": "<ISO timestamp>"}

2. Read .loki/state/orchestrator.json
   - Extract: currentPhase, tasksCompleted, tasksFailed

3. Read .loki/queue/pending.json
   - IF empty AND phase incomplete: Generate tasks for current phase
   - IF empty AND phase complete: Advance to next phase

4. Check .loki/PAUSE - IF exists: Stop work, wait for removal.
   Check .loki/STOP - IF exists: End session, update session.json status to "stopped".

5. EVERY TURN: Update .loki/session.json "updatedAt" field to current ISO timestamp.
   This keeps the dashboard aware the skill session is alive. Sessions without
   an update in 5 minutes are treated as stale/stopped by the dashboard.

PRIORITY 2: Execute (RARV Cycle)

Every action follows this cycle. No exceptions.

REASON: What is the highest priority unblocked task?
   |
   v
ACT: Execute it. Write code. Run commands. Commit atomically.
   |
   v
REFLECT: Did it work? Log outcome.
   |
   v
VERIFY: Run tests. Check build. Validate against spec.
   |
   +--[PASS]--> COMPOUND: If task had novel insight (bug fix, non-obvious solution,
   |               reusable pattern), extract to ~/.loki/solutions/{category}/{slug}.md
   |               with YAML frontmatter (title, tags, symptoms, root_cause, prevention).
   |               See skills/compound-learning.md for format.
   |               Then mark task complete. Return to REASON.
   |
   +--[FAIL]--> Capture error in "Mistakes & Learnings".
               Rollback if needed. Retry with new approach.
               After 3 failures: Try simpler approach.
               After 5 failures: Log to dead-letter queue, move to next task.

PRIORITY 3: Autonomy Rules

These rules guide autonomous operation. Test results and code quality always take precedence.

Rule Meaning
Decide and act Make decisions autonomously. Do not ask the user questions.
Keep momentum Do not pause for confirmation. Move to the next task.
Iterate continuously There is always another improvement. Find it.
ALWAYS verify Code without tests is incomplete. Run tests. Never ignore or delete failing tests.
ALWAYS commit Atomic commits after each task. Checkpoint progress.
Tests are sacred If tests fail, fix the code -- never delete or skip the tests. A passing test suite is a hard requirement.

Model Selection

Default since v5.3.0 (reaffirmed in v7.5.13): Haiku disabled for quality. Use --allow-haiku or LOKI_ALLOW_HAIKU=true to enable.

Task Type Tier Claude (default) Claude (--allow-haiku) Codex (GPT-5.3) Gemini
Spec analysis, architecture, system design planning opus opus effort=xhigh thinking=high
Feature implementation, complex bugs development opus sonnet effort=high thinking=medium
Code review (planned: 3 parallel reviewers) development opus sonnet effort=high thinking=medium
Integration tests, E2E, deployment development opus sonnet effort=high thinking=medium
Unit tests, linting, docs, simple fixes fast sonnet haiku effort=low thinking=low

Parallelization rule (Claude only): Launch up to 10 agents simultaneously for independent tasks.

Degraded mode (Codex/Gemini/Cline/Aider): No parallel agents or Task tool. Codex has MCP support. Runs RARV cycle sequentially. See skills/model-selection.md.

Git worktree parallelism: For true parallel feature development, use --parallel flag with run.sh. See skills/parallel-workflows.md.

Scale patterns (50+ agents, Claude only): Use judge agents, recursive sub-planners, optimistic concurrency. See references/cursor-learnings.md.


Phase Transitions

BOOTSTRAP ──[project initialized]──> DISCOVERY
DISCOVERY ──[spec analyzed, requirements clear]──> ARCHITECTURE
ARCHITECTURE ──[design approved, specs written]──> DEEPEN_PLAN (standard/complex only)
DEEPEN_PLAN ──[plan enhanced by 4 research agents]──> INFRASTRUCTURE
INFRASTRUCTURE ──[cloud/DB ready]──> DEVELOPMENT
DEVELOPMENT ──[features complete, unit tests pass]──> QA
QA ──[all tests pass, security clean]──> DEPLOYMENT
DEPLOYMENT ──[production live, monitoring active]──> GROWTH
GROWTH ──[continuous improvement loop]──> GROWTH

Transition requires: All phase quality gates passed. No Critical/High/Medium issues.


Context Management

Your context window is finite. Preserve it.

  • Load only 1-2 skill modules at a time (from skills/00-index.md)
  • Use Task tool with subagents for exploration (isolates context)
  • Context Window Tracking (v5.40.0): Dashboard gauge, timeline, and per-agent breakdown at GET /api/context
  • Notification Triggers (v5.40.0): Configurable alerts when context exceeds thresholds, tasks fail, or budget limits hit. Manage via GET/PUT /api/notifications/triggers

Key Files

File Read Write
.loki/session.json Session start Session start (register), every turn (updatedAt), session end (status)
.loki/state/orchestrator.json Every turn On phase change
.loki/queue/pending.json Every turn When claiming/completing tasks
.loki/queue/current-task.json Before each ACT When claiming task
.loki/specs/openapi.yaml Before API work After API changes
skills/00-index.md Session start Never
.loki/memory/index.json Session start On topic change
.loki/memory/timeline.json On context need After task completion
.loki/memory/token_economics.json Never (metrics only) Every turn
.loki/memory/episodic/*.json On task-aware retrieval After task completion
.loki/memory/semantic/patterns.json Before implementation tasks On consolidation
.loki/memory/semantic/anti-patterns.json Before debugging tasks On error learning
.loki/queue/dead-letter.json Session start On task failure (5+ attempts)
.loki/signals/HUMAN_REVIEW_NEEDED Never When human decision required
.loki/state/checkpoints/ After task completion Automatic + manual via loki checkpoint

Module Loading Protocol (Skills)

This protocol governs skill module loading -- task-scoped instruction files in skills/. It is distinct from the Memory System Progressive Disclosure (see below), which governs persistent memory layers in .loki/memory/.

1. Read skills/00-index.md (once per session)
2. Match current task to module:
   - Writing code? Load model-selection.md
   - Running tests? Load testing.md
   - Code review? Load quality-gates.md
   - Debugging? Load troubleshooting.md
   - Legacy healing? Load healing.md
   - Deploying? Load production.md
   - Parallel features? Load parallel-workflows.md
   - Architecture planning? Load compound-learning.md (deepen-plan)
   - Post-verification? Load compound-learning.md (knowledge extraction)
3. Read the selected module(s)
4. Execute with that context
5. When task category changes: Load new modules (old context discarded)

Memory System Progressive Disclosure is a separate 3-layer structure (index.json -> timeline.json -> episodic/*.json) for retrieving past episodes/patterns. See skills/memory.md and references/memory-system.md.


Invocation

Unified entry point (v6.84.0): loki start [SPEC|ISSUE-REF] auto-detects whether the input is a PRD file, an issue URL, an issue number, or another spec format (e.g. OpenAPI). No need to pick between loki start and loki run -- the single command handles all cases.

# Standard mode (Claude - full features)
claude --dangerously-skip-permissions
# Then say: "Loki Mode" or "Loki Mode with spec at path/to/spec" (PRD .md/.json, OpenAPI .yaml, etc.)

# Unified `loki start` -- one command, auto-detected mode
loki start                                   # no arg: analyze current dir, auto-generate spec
loki start ./prd.md                          # PRD mode (.md/.json/.txt/.yaml) -- a PRD is one form of spec
loki start ./openapi.yaml                    # SPEC mode (OpenAPI doc treated as the spec)
loki start owner/repo#123                    # ISSUE mode (GitHub specific repo)
loki start https://github.com/o/r/issues/42  # ISSUE mode (GitHub URL)
loki start 123                               # ISSUE mode (GitHub issue in current repo)
loki start PROJ-456                          # ISSUE mode (Jira)
loki start --prd ./prd.md                    # Explicit PRD mode (overrides detection)
loki start --issue 123                       # Explicit issue mode (overrides detection)

# With provider selection (supports .md and .json PRDs)
loki start --provider claude ./prd.md        # Default, full features
loki start --provider codex ./prd.json       # GPT-5.3 Codex, degraded mode
loki start --provider gemini ./prd.md        # Gemini 3 Pro, degraded mode
loki start --provider cline ./prd.md         # Cline CLI, degraded mode
loki start --provider aider ./prd.md         # Aider (18+ providers), degraded mode

# Parallel mode (git worktrees, Claude only)
loki start ./prd.md --parallel
loki start 123 --ship                        # Issue -> PR -> auto-merge

# Legacy: `loki run <issue>` still works but prints a deprecation notice.
# It is an alias for `loki start <issue>` and will be removed in a future major.

Provider capabilities:

  • Claude: Opus 4.6, 1M context (beta), 128K output, adaptive thinking, agent teams, full features (Task tool, parallel agents, MCP)
  • Codex: GPT-5.3, 400K context, 128K output, MCP support, --full-auto mode, degraded (sequential only, no Task tool)
  • Gemini: Degraded mode (sequential only, no Task tool, 1M context)
  • Cline: Multi-provider CLI, degraded mode (sequential only, no Task tool)
  • Aider: 18+ provider backends, degraded mode (sequential only, no Task tool)

Human Intervention (v3.4.0)

When running with autonomy/run.sh, you can intervene:

Method Effect
touch .loki/PAUSE Pauses after current session
echo "instructions" > .loki/HUMAN_INPUT.md Injects directive (requires LOKI_PROMPT_INJECTION=true)
touch .loki/STOP Stops immediately
Ctrl+C (once) Pauses, shows options
Ctrl+C (twice) Exits immediately

Security: Prompt Injection (v5.6.1)

DISABLED by default for enterprise security. Prompt injection via HUMAN_INPUT.md is blocked unless explicitly enabled.

# Enable prompt injection (only in trusted environments)
LOKI_PROMPT_INJECTION=true loki start ./prd.md

# Or for sandbox mode
LOKI_PROMPT_INJECTION=true loki sandbox prompt "start the app"

Hints vs Directives

Type File Behavior
Directive .loki/HUMAN_INPUT.md Active instruction (requires LOKI_PROMPT_INJECTION=true)

Example directive (only works with LOKI_PROMPT_INJECTION=true):

echo "Check all .astro files for missing BaseLayout imports." > .loki/HUMAN_INPUT.md

Complexity Tiers (v3.4.0)

Auto-detected or force with LOKI_COMPLEXITY:

Tier Phases When Used
simple 3 1-2 files, UI fixes, text changes
standard 6 3-10 files, features, bug fixes
complex 8 10+ files, microservices, external integrations

Managed Agents Integration (v7.2.0)

Opt-in integration with Claude Managed Agents (released Apr 2026). Gives Loki cross-project audited memory and real multiagent councils. Features are BAKED INTO existing RARV-C and council flows -- no new commands to learn.

All flags default false. Default behavior is identical to v7.2.0.

Flag Purpose Status
LOKI_MANAGED_AGENTS Parent gate; required for every managed path stable
LOKI_MANAGED_MEMORY REASON augment + REFLECT shadow-write from .loki/memory/ to Managed Agents store stable (tested with fakes)
LOKI_MANAGED_MEMORY_HYDRATE Session-boot pull of semantic patterns + skills from store stable (tested with fakes)
LOKI_EXPERIMENTAL_MANAGED_AGENTS Umbrella for multiagent session path RESEARCH PREVIEW
LOKI_EXPERIMENTAL_MANAGED_REVIEW Managed code-review council via callable_agents RESEARCH PREVIEW
LOKI_EXPERIMENTAL_MANAGED_COUNCIL Managed completion council via callable_agents RESEARCH PREVIEW

Fail-fast: child-on + parent-off exits 2 with clear error. API unreachable falls back to local path with a managed_agents_fallback event to .loki/managed/events.ndjson. No retry storm.

Flip-on order (recommended):

  1. LOKI_MANAGED_AGENTS=true LOKI_MANAGED_MEMORY=true (memory mirror).
  2. Add LOKI_MANAGED_MEMORY_HYDRATE=true after one-week soak.
  3. Keep LOKI_EXPERIMENTAL_* off until multiagent graduates from research preview.

NOT TESTED against live Anthropic API. Automated CI uses memory/managed_memory/fakes.py. Beta header pinned to managed-agents-2026-04-01. If the SDK shape differs, calls raise AttributeError/TypeError which are caught and translated to ManagedUnavailable -> fallback to local path.

See skills/memory.md for the full integration guide.


Phase 1 RARV-C Closure (v7.5.x)

The current track wires real evidence into RARV-C feedback. Documented here and in loki internal --help:

Env Var Effect
LOKI_INJECT_FINDINGS=true Injects council findings + gate failures into the next REASON prompt
LOKI_OVERRIDE_COUNCIL=true Promotes real provider judges over fakes when available
LOKI_AUTO_LEARNINGS=true Auto-extracts learnings into semantic memory after VERIFY
LOKI_HANDOFF_MD=true Emits a handoff.md continuity doc at session boundaries

See references/core-workflow.md for the full RARV-C contract.


Concurrency and Security Hardening (v7.5.7 - v7.5.13)

Three back-to-back patches closed cross-process and security gaps. No user-facing behavior change on the default flow; verify via the cited paths.

  • Cross-process file locks on append-or-rewrite state, so parallel runs / dashboard / MCP do not corrupt shared files: gate counter (autonomy/run.sh gate-counter writes), task queues (autonomy/run.sh queue read-modify-write), checkpoint index (autonomy/run.sh checkpoint index updates), events.jsonl append (event emission paths in events/emit.sh and autonomy/run.sh), human intervention signal files (autonomy/run.sh:check_human_intervention() at line ~8059 / 7897 per state-machine doc).
  • MCP path validation -- file/path arguments to mcp/server.py tools are normalized and rejected if they escape the project root (path-traversal fix from v7.5.8).
  • Dashboard auth now required on /api/memory/*, /api/learning/*, and /api/status in dashboard/server.py (previously unauthenticated read paths).
  • Bash quoting hardening across autonomy/run.sh and autonomy/loki -- variable expansions inside command substitution and [ ] tests quoted to prevent word-splitting on paths with spaces.

See CHANGELOG.md entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and reviewer sign-off.


Implemented Features

Feature Added Notes
Multi-provider support (5 providers) v5.0.0 claude, codex, gemini, cline, aider -- see providers/
CONTINUITY.md working memory v5.35.0 Auto-managed by run.sh, updated each iteration
Quality gates 3-reviewer system v5.35.0 5 specialist reviewers in skills/quality-gates.md; execution in run.sh
Memory System (episodic/semantic/procedural) v5.15.0 Full implementation in memory/
Context Window Tracking v5.40.0 Dashboard gauge, per-agent breakdown at GET /api/context
Notification Triggers v5.40.0 GET/PUT /api/notifications/triggers
GitHub integration v5.42.2 Import, sync-back, PR creation, export. CLI: loki github, API: /api/github/*
Legacy System Healing v6.67.0 loki heal <path> -- friction-as-semantics, characterization tests
Unified loki start v6.84.0 Auto-detects spec (PRD, OpenAPI, etc.) vs issue input
Managed Agents (memory mirror) v7.2.0 Opt-in via LOKI_MANAGED_AGENTS -- see Managed Agents section
Bun runtime (Phase 1) v7.3.0 Read-only commands routed through bin/loki; LOKI_LEGACY_BASH=1 to revert
Phase 1 RARV-C closure v7.5.x Findings injection, real judges, auto-learnings, handoff.md

Planned / In-Progress Features

Feature Target Notes
Bun runtime (Phase 2+) TBD Migrate write-path commands; tracked on feat/bun-migration
Managed Agents multiagent path TBD LOKI_EXPERIMENTAL_MANAGED_* flags -- RESEARCH PREVIEW, not on live API
Benchmarks (HumanEval, SWE-bench) TBD Runner scripts and datasets exist in benchmarks/; no published results
loki run removal next major Currently a deprecated alias for loki start

Deprecated

Item Deprecated In Notes
loki run <issue> v6.84.0 Alias for loki start. Will be removed in next major.
VSCode extension (vscode-extension/) v7.2.0 No longer actively maintained; dashboard web UI is the supported front-end.

v7.5.17 | Autonomi flagship product | ~260 lines core