feat: pluggable runner interface — support Pi, Hermes, and other coding agents

## Motivation

I tested Stokowski's fully automated pipeline (investigate → implement → code-review(fresh) → done) in a real project and compared it against Overstory's multi-worker tournament model. The conclusion: **Stokowski's adversarial code-review mechanism (a `session: fresh` independent agent reviews the previous stage's code) produced production-quality code in my test**, at roughly half the token cost of Overstory.

Stokowski currently supports `claude` and `codex` as runners, and the author suggests in `workflow.example.yaml`: *"set runner: codex here to get a second-opinion from a different provider."* This design philosophy is excellent: **different LLMs write and review the code, cancelling out each other's biases.**

However, the open-source community has other capable coding agent CLIs:

| Agent | Non-interactive mode | JSON output | Session resume | Worktree |
|---|---|---|---|---|
| **Pi** | `pi -p "..."` | ✅ `--mode json` (JSONL) | ❌ | ❌ |
| **Hermes** | `hermes chat -q "..."` | ❌ (quiet text) | ✅ `--resume` | ✅ `--worktree` |
| **OpenCode** | via plugin | ❌ | ❌ | ❌ |

All of these have the core capability a Stokowski runner needs: accept a prompt → execute in a workspace → return results.

## Proposal

1. **Abstract the runner interface**: unify `build_claude_args` / `build_codex_args` into a runner protocol, so new runners only need to implement `build_args()` + `parse_output()`
2. **Add Pi runner**: Pi's `--mode json` output format is similar to Claude Code's `stream-json` (JSONL event stream) — lowest adaptation cost
3. **Add Hermes runner**: Hermes has `--resume` and `--worktree`, closest in capability to Claude Code
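
The proposal above could be sketched roughly as follows. This is a minimal illustration, not Stokowski's actual code: `Runner`, `RunResult`, `PiRunner`, `HermesRunner`, and `RUNNERS` are hypothetical names, the CLI flags are the ones listed in the table in this issue, the argument order is an assumption, and the `"text"` field in Pi's JSONL events is a placeholder that would need to be checked against Pi's real event schema.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass
class RunResult:
    """Normalized result the orchestrator would consume (hypothetical shape)."""
    text: str
    events: int


class Runner(Protocol):
    """What a new runner would need to implement under this proposal."""
    name: str

    def build_args(self, prompt: str) -> list[str]:
        """Return the argv used to launch the agent CLI non-interactively."""
        ...

    def parse_output(self, stdout: str) -> RunResult:
        """Turn raw CLI output into a RunResult."""
        ...


class PiRunner:
    name = "pi"

    def build_args(self, prompt: str) -> list[str]:
        # Flags from the table: `pi -p "..."` and `--mode json`.
        # ASSUMPTION: this argument order is accepted by the CLI.
        return ["pi", "-p", prompt, "--mode", "json"]

    def parse_output(self, stdout: str) -> RunResult:
        # ASSUMPTION: events may carry a "text" field; the real schema
        # must be verified against Pi's documentation.
        chunks, count = [], 0
        for line in stdout.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate non-JSON noise on stdout
            count += 1
            if isinstance(event, dict) and isinstance(event.get("text"), str):
                chunks.append(event["text"])
        return RunResult(text="".join(chunks), events=count)


class HermesRunner:
    name = "hermes"

    def build_args(self, prompt: str) -> list[str]:
        # `hermes chat -q "..."` is the non-interactive form from the table.
        return ["hermes", "chat", "-q", prompt]

    def parse_output(self, stdout: str) -> RunResult:
        # Hermes emits quiet text rather than JSON, so pass it through.
        return RunResult(text=stdout.strip(), events=0)


# Registry keyed by the value a workflow would put in its `runner:` field.
RUNNERS: dict[str, Runner] = {r.name: r for r in (PiRunner(), HermesRunner())}
```

With a registry like this, the orchestrator would only need to look up `RUNNERS[stage_runner_name]`, call `build_args()` to spawn the process, and hand the captured stdout to `parse_output()`.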

## Test Data

I ran a real-world task (Python URL shortener with FastAPI + SQLite + pytest) comparing two approaches:

**Stokowski fully automated pipeline (claude implement + code-review fresh):**
- Total: 11,024 tokens, ~4 minutes
- Output: 6 files, 11 tests, including URL validation, collision retry, parameterized db_path
- The code-review agent automatically fixed security and robustness issues

**Overstory multi-worker tournament (lead + 2 builders):**
- Total: more than 24,000 tokens, ~6 minutes
- Output: 5 files per builder, 5 tests each
- Lead's merged version was decent, but at roughly 2x the token cost

**Conclusion: of the approaches I tested, Stokowski's investigate → implement → adversarial review pipeline was the most cost-effective fully automated one.** The current claude + codex dual-runner design already demonstrates the value of adversarial review.

If every stage could use a different vendor's agent, for example Pi (Gemini) for investigate, Hermes (DeepSeek) for implement, and Codex (GPT) for code-review, the adversarial effect should be even stronger, because each model has different strengths: some excel at analysis, some at generation, some at finding bugs. All of this works via plain CLI invocation without Claude Code hooks, which makes deployment simpler.
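
To make the per-stage idea concrete, here is a small sketch of a stage-to-runner mapping with a check that the reviewer really is independent of the implementer. It is illustrative only: `StageConfig`, `PIPELINE`, `review_is_adversarial`, the `vendor` field, and the `"inherit"` session value are hypothetical and not part of Stokowski's workflow schema; only `runner` and `session: fresh` come from the existing configuration mentioned in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    runner: str                # which agent CLI runs this stage
    vendor: str                # model provider behind that runner (illustrative)
    session: str = "inherit"   # hypothetical default; "fresh" drops prior context


# The example assignment from the paragraph above.
PIPELINE = {
    "investigate": StageConfig(runner="pi", vendor="gemini"),
    "implement": StageConfig(runner="hermes", vendor="deepseek"),
    "code-review": StageConfig(runner="codex", vendor="gpt", session="fresh"),
}


def review_is_adversarial(pipeline, author="implement", reviewer="code-review"):
    """True when the reviewer uses a fresh session and a different vendor."""
    a, r = pipeline[author], pipeline[reviewer]
    return r.session == "fresh" and r.vendor != a.vendor
```

A check like this could run at workflow load time to warn when a review stage would share context or a model vendor with the stage it reviews.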

## Additional: Local tracker experiment

I also adapted a `local_tracker.py` that replaces Linear as the task source. In simple testing, Stokowski's state machine engine ran independently of Linear, with behavior consistent with the Linear mode. I plan to eventually modify it to use my DataScript database over HTTP for state management, but I would love to see the project ship a "planning with file" mode that lets users who don't use Linear drive Stokowski's fully automated pipeline through local files.
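
As a rough idea of what a file-driven mode could look like, the sketch below keeps tasks in a single JSON file and exposes the two operations a state machine needs: list tasks in a state, and move a task to a new state. This is not the `local_tracker.py` mentioned above; the class name `LocalFileTracker`, the file layout, and the `id` / `state` field names are all assumptions made for illustration.

```python
import json
from pathlib import Path


class LocalFileTracker:
    """Hypothetical file-backed task source standing in for Linear."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        # A missing file simply means there are no tasks yet.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, tasks):
        # Write to a temp file and rename so readers never see a partial file.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(tasks, indent=2), encoding="utf-8")
        tmp.replace(self.path)

    def tasks_in_state(self, state):
        """Return every task whose current state matches `state`."""
        return [t for t in self._load() if t["state"] == state]

    def transition(self, task_id, new_state):
        """Move one task to `new_state` and persist the change."""
        tasks = self._load()
        for task in tasks:
            if task["id"] == task_id:
                task["state"] = new_state
                self._save(tasks)
                return task
        raise KeyError(task_id)
```

A user would then edit the JSON file (or a generated view of it) to queue work, and the orchestrator would poll `tasks_in_state()` in place of querying Linear.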

---

I'm happy to contribute a PR for the runner interface abstraction and/or the Pi runner if there's interest.
