## Observed

Implementing #104 via subagent-driven-development, each subagent ran `.venv/bin/pytest` directly to verify its task. The full feature ran cleanly (1235 tests passing), but `mship finish` warned:

```
Test-evidence warnings: Tests not run in: mothership
```

Because no subagent invoked `mship test`, the `.mothership/test-runs/<task>/` directory stayed empty, the test-iteration counter never incremented, the per-repo evidence chain didn't fill in, and `finish` had no signal that the tests had actually passed.
## Why this matters

`mship test` is the substrate's evidence-recording entry point. When an agent workflow bypasses it (running raw `pytest` instead), the substrate's value disappears: no iteration tagging, no "new failure / fix / regression" diff between runs, no audit trail for `finish`. The framing "mship is the brain, go-task is the muscle" depends on the muscle being mship's muscle, but the actual implementation tasks were calling `pytest`, not `mship test`.

This is a workflow-adoption gap. If `mship test` isn't the path of least resistance for agents, they'll skip it, which is exactly what happened here.
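To make the lost value concrete, the "new failure / fix / regression" diff between recorded iterations can be sketched as set operations over failing test IDs. This is an illustrative sketch only; `diff_runs` and its return shape are hypothetical, not mship's actual API.

```python
# Hypothetical sketch of the iteration diff that recorded test runs enable.
# Names are illustrative, not mship's real implementation.

def diff_runs(prev_failures: set[str], curr_failures: set[str]) -> dict[str, set[str]]:
    """Classify test IDs by comparing two recorded iterations."""
    return {
        "new_failures": curr_failures - prev_failures,   # broke this iteration
        "fixed": prev_failures - curr_failures,          # passing again
        "still_failing": prev_failures & curr_failures,  # persistent failures
    }

prev = {"tests/test_graph.py::test_cycle", "tests/test_cli.py::test_help"}
curr = {"tests/test_graph.py::test_cycle", "tests/test_graph.py::test_topo"}

diff = diff_runs(prev, curr)
print(sorted(diff["new_failures"]))  # ['tests/test_graph.py::test_topo']
print(sorted(diff["fixed"]))         # ['tests/test_cli.py::test_help']
```

With raw `pytest` runs, none of this history exists, so no such diff can be computed at `finish` time.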
## Why it happened on #104

- The plan (docs/superpowers/plans/2026-05-14-task-dependency-graph.md) wrote each TDD verification step as `pytest tests/...` rather than `mship test`. Agents followed the plan literally.
- The implementer-prompt template in subagent-driven-development doesn't mention `mship test`.
- The vendored TDD skill doesn't reference `mship test`.
## Suggested direction

A few overlapping options:

1. Plans generated in a mship workspace default to `mship test` in TDD step commands. Today writing-plans emits `pytest tests/...`; in a mship workspace, emit `mship test` (or `mship test -- <pytest-args>`). Most impactful change.
2. The implementer-prompt template (subagent-driven-development/implementer-prompt.md) instructs the subagent to prefer `mship test` when a mothership.yaml is present.
3. `mship test` accepts a pytest passthrough cleanly, so the substitution is trivial in plans (`mship test -- -k myfilter -v`). If this works today, document it.
4. `mship finish` could detect commit-journal entries with test-pass signals ("all passing", "N passed") and treat them as a softer form of evidence, with a `--source=journal` annotation. Narrower and opportunistic, but doesn't require workflow changes.

The bigger lever is (1)+(2). (4) is the auto-detect fallback for sessions that still drop through.
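Option (4) could start as simple pattern matching over journal text. The pass-signal phrases come from this issue; the function itself is a hypothetical sketch, not mship's code.

```python
# Illustrative sketch of option (4): scan a commit-journal entry for
# test-pass signals. The detector below is hypothetical, not mship's code.
import re

PASS_SIGNALS = re.compile(
    r"\ball passing\b"        # e.g. "all passing"
    r"|\b\d+\s+passed\b",     # e.g. "1235 passed"
    re.IGNORECASE,
)

def journal_has_test_pass_signal(entry: str) -> bool:
    """Return True if a journal entry looks like it reports a green test run."""
    return PASS_SIGNALS.search(entry) is not None

print(journal_has_test_pass_signal("ran pytest: 1235 passed in 42.1s"))  # True
print(journal_has_test_pass_signal("refactored graph module"))           # False
```

A hit would only ever be softer evidence than a recorded `mship test` run, hence the proposed `--source=journal` annotation.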
## Acceptance

- writing-plans skill emits `mship test` in TDD steps when generating plans in a mship workspace.
- subagent-driven-development's implementer-prompt mentions the `mship test` preference.
- `mship test` argument-passthrough syntax is documented (or fixed if it's awkward today).
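For the passthrough acceptance item, the conventional `--` split (as in `mship test -- -k myfilter -v`) amounts to partitioning argv at the first `--`. A minimal sketch, assuming a standalone helper rather than mship's real CLI parsing:

```python
# Hypothetical sketch of `--` passthrough parsing; mship's actual CLI
# may implement this differently (e.g. via argparse.REMAINDER).
def split_passthrough(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv into (own args, args forwarded to pytest) at the first `--`."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

own, pytest_args = split_passthrough(["test", "--", "-k", "myfilter", "-v"])
print(own)          # ['test']
print(pytest_args)  # ['-k', 'myfilter', '-v']
```

Documenting whichever behavior `mship test` actually has today is the acceptance bar; this sketch just shows the common convention.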
## Related
- `src/mship/core/test_evidence.py` (where the warning fires).
- `src/mship/skills/writing-plans/` and `src/mship/skills/subagent-driven-development/`.