
subagent-driven workflow bypasses mship test; finish loses test-evidence trail #142

@atomikpanda

Observed

While implementing #104 via subagent-driven-development, each subagent ran .venv/bin/pytest directly to verify its task. The full feature came out clean (1235 tests passing), but mship finish warned:

Test-evidence warnings: Tests not run in: mothership

Because no subagent invoked mship test, the .mothership/test-runs/<task>/ directory stayed empty, the test-iteration counter never incremented, the per-repo evidence chain didn't fill in, and finish had no signal that the tests had actually passed.
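For context, the evidence chain the warning refers to can be pictured as a per-task directory of numbered run records. The sketch below is hypothetical: the record_run helper, the run-NNNN.json naming, and the JSON layout are all assumptions for illustration, not mship's actual on-disk format.

```python
import json
from pathlib import Path

def record_run(task: str, results: dict, root: str = ".mothership/test-runs") -> Path:
    """Append one test-run record under <root>/<task>/.

    The iteration counter is simply the count of existing records plus one;
    the record layout here is an illustrative assumption, not mship's format.
    """
    task_dir = Path(root) / task
    task_dir.mkdir(parents=True, exist_ok=True)
    iteration = len(list(task_dir.glob("run-*.json"))) + 1
    record = {"iteration": iteration, "results": results}
    path = task_dir / f"run-{iteration:04d}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

The point of the sketch is the failure mode: when subagents call pytest directly, nothing ever calls record_run's real-world equivalent, so the directory stays empty and the counter stays at zero.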

Why this matters

mship test is the substrate's evidence-recording entry point. When an agent workflow bypasses it (running raw pytest instead), the substrate's value disappears: no iteration tagging, no "new failure / fix / regression" diff between runs, no audit trail for finish. The framing "mship is the brain, go-task is the muscle" depends on the muscle being mship's muscle — but the actual implementation tasks were calling pytest, not mship test.
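The "new failure / fix / regression" diff mentioned above is essentially a set comparison between consecutive runs. A minimal sketch, assuming each run is summarized as a set of failing test IDs (the diff_runs name and the new/fixed/still keys are illustrative, not mship's API):

```python
def diff_runs(prev_failures: set[str], curr_failures: set[str]) -> dict[str, set[str]]:
    """Classify changes between two consecutive test runs.

    - new:   tests failing now that passed (or did not exist) before
    - fixed: tests that failed before and pass now
    - still: tests failing in both runs
    """
    return {
        "new": curr_failures - prev_failures,
        "fixed": prev_failures - curr_failures,
        "still": prev_failures & curr_failures,
    }
```

Raw pytest output provides no cross-run memory; the substrate only has two failure sets to compare if each run goes through mship test.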

This is a workflow-adoption gap. If mship test isn't the path of least resistance for agents, they'll skip it — exactly what happened here.

Why it happened on #104

  • The plan (docs/superpowers/plans/2026-05-14-task-dependency-graph.md) wrote each TDD verification step as pytest tests/... rather than mship test. Agents followed the plan literally.
  • The implementer-prompt template in subagent-driven-development doesn't mention mship test.
  • The vendored TDD skill doesn't reference mship test.

Suggested direction

A few overlapping options:

  1. Plans generated in a mship workspace default to mship test in TDD step commands. Today writing-plans emits pytest tests/...; in a mship workspace it should emit mship test (or mship test -- <pytest-args>). This is the most impactful change.
  2. Implementer-prompt template (subagent-driven-development/implementer-prompt.md) instructs the subagent to prefer mship test when a mothership.yaml is present.
  3. Ensure mship test accepts a pytest passthrough cleanly, so the substitution in plans is trivial (mship test -- -k myfilter -v). If this already works today, document it.
  4. mship finish could detect commit-journal entries with test-pass signals ("all passing", "N passed") and treat them as a softer form of evidence, with a --source=journal annotation. This is narrower and more opportunistic, but requires no workflow changes.

The bigger lever is (1)+(2). (4) is the auto-detect fallback for sessions that still drop through.
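Option (4) amounts to scanning journal text for pytest-style summary lines. A rough sketch of that detection — the patterns and the looks_like_test_pass name are assumptions about what such a heuristic could match, not existing mship behavior:

```python
import re

# Patterns for common pytest summary lines and informal "all passing" notes.
# Illustrative only: a real heuristic would also need to handle phrasing
# these regexes miss, and must reject mixed results like "3 passed, 2 failed".
_PASS_RE = re.compile(r"\b(\d+) passed\b")
_FAIL_RE = re.compile(r"\b(\d+) (?:failed|errors?)\b")
_ALL_RE = re.compile(r"\ball (?:tests? )?passing\b", re.IGNORECASE)

def looks_like_test_pass(journal_entry: str) -> bool:
    """Return True if a commit-journal entry claims a clean test run."""
    if _FAIL_RE.search(journal_entry):
        return False
    return bool(_PASS_RE.search(journal_entry) or _ALL_RE.search(journal_entry))
```

Even with a heuristic like this, journal-derived evidence stays weaker than a recorded run: the entry claims a pass but carries no failure sets to diff, which is why it would warrant a --source=journal annotation rather than full credit.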

Acceptance

  • writing-plans skill emits mship test in TDD steps when generating plans in a mship workspace.
  • subagent-driven-development's implementer-prompt mentions mship test preference.
  • mship test argument-passthrough syntax is documented (or fixed if it's awkward today).

Labels: enhancement