## Observed

Implementing #104 via subagent-driven-development, each subagent ran `.venv/bin/pytest` directly to verify its task. The full feature ran cleanly (1235 tests passing), but `mship finish` warned:

```
Test-evidence warnings: Tests not run in: mothership
```

Because no subagent invoked `mship test`, the `.mothership/test-runs/<task>/` directory stayed empty, the test-iteration counter never incremented, the per-repo evidence chain didn't fill in, and `finish` had no signal that the tests had actually passed.
## Why this matters

`mship test` is the substrate's evidence-recording entry point. When an agent workflow bypasses it (running raw `pytest` instead), the substrate's value disappears: no iteration tagging, no "new failure / fix / regression" diff between runs, no audit trail for `finish`. The framing "mship is the brain, go-task is the muscle" depends on the muscle being mship's muscle, but the actual implementation tasks were calling `pytest`, not `mship test`.

This is a workflow-adoption gap. If `mship test` isn't the path of least resistance for agents, they'll skip it, which is exactly what happened here.
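To make the lost value concrete, the "new failure / fix / regression" diff between recorded iterations can be sketched as set operations over failing test IDs. This is an illustrative sketch only; `diff_runs` and its return shape are hypothetical, not mship's actual API.

```python
# Hypothetical sketch of the iteration diff that recorded test runs enable.
# Names are illustrative, not mship's real implementation.

def diff_runs(prev_failures: set[str], curr_failures: set[str]) -> dict[str, set[str]]:
    """Classify test IDs by comparing two recorded iterations."""
    return {
        "new_failures": curr_failures - prev_failures,   # broke this iteration
        "fixed": prev_failures - curr_failures,          # passing again
        "still_failing": prev_failures & curr_failures,  # persistent failures
    }

prev = {"tests/test_graph.py::test_cycle", "tests/test_cli.py::test_help"}
curr = {"tests/test_graph.py::test_cycle", "tests/test_graph.py::test_topo"}

diff = diff_runs(prev, curr)
print(sorted(diff["new_failures"]))  # ['tests/test_graph.py::test_topo']
print(sorted(diff["fixed"]))         # ['tests/test_cli.py::test_help']
```

With raw `pytest` runs, none of this history exists, so no such diff can be computed at `finish` time.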
## Why it happened on #104

- The plan (docs/superpowers/plans/2026-05-14-task-dependency-graph.md) wrote each TDD verification step as `pytest tests/...` rather than `mship test`. Agents followed the plan literally.
- The implementer-prompt template in subagent-driven-development doesn't mention `mship test`.
- The vendored TDD skill doesn't reference `mship test`.
## Suggested direction

A few overlapping options:

1. Plans generated in a mship workspace default to `mship test` in TDD step commands. Today writing-plans emits `pytest tests/...`; in a mship workspace, emit `mship test` (or `mship test -- <pytest-args>`). Most impactful change.
2. The implementer-prompt template (subagent-driven-development/implementer-prompt.md) instructs the subagent to prefer `mship test` when a mothership.yaml is present.
3. `mship test` accepts a pytest passthrough cleanly, so the substitution is trivial in plans (`mship test -- -k myfilter -v`). If this works today, document it.
4. `mship finish` could detect commit-journal entries with test-pass signals ("all passing", "N passed") and treat them as a softer form of evidence, with a `--source=journal` annotation. Narrower and opportunistic, but doesn't require workflow changes.

The bigger lever is (1)+(2). (4) is the auto-detect fallback for sessions that still drop through.
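Option (4) could start as simple pattern matching over journal text. The pass-signal phrases come from this issue; the function itself is a hypothetical sketch, not mship's code.

```python
# Illustrative sketch of option (4): scan a commit-journal entry for
# test-pass signals. The detector below is hypothetical, not mship's code.
import re

PASS_SIGNALS = re.compile(
    r"\ball passing\b"        # e.g. "all passing"
    r"|\b\d+\s+passed\b",     # e.g. "1235 passed"
    re.IGNORECASE,
)

def journal_has_test_pass_signal(entry: str) -> bool:
    """Return True if a journal entry looks like it reports a green test run."""
    return PASS_SIGNALS.search(entry) is not None

print(journal_has_test_pass_signal("ran pytest: 1235 passed in 42.1s"))  # True
print(journal_has_test_pass_signal("refactored graph module"))           # False
```

A hit would only ever be softer evidence than a recorded `mship test` run, hence the proposed `--source=journal` annotation.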
## Acceptance

- writing-plans skill emits `mship test` in TDD steps when generating plans in a mship workspace.
- subagent-driven-development's implementer-prompt mentions the `mship test` preference.
- `mship test` argument-passthrough syntax is documented (or fixed if it's awkward today).
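For the passthrough acceptance item, the conventional `--` split (as in `mship test -- -k myfilter -v`) amounts to partitioning argv at the first `--`. A minimal sketch, assuming a standalone helper rather than mship's real CLI parsing:

```python
# Hypothetical sketch of `--` passthrough parsing; mship's actual CLI
# may implement this differently (e.g. via argparse.REMAINDER).
def split_passthrough(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv into (own args, args forwarded to pytest) at the first `--`."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

own, pytest_args = split_passthrough(["test", "--", "-k", "myfilter", "-v"])
print(own)          # ['test']
print(pytest_args)  # ['-k', 'myfilter', '-v']
```

Documenting whichever behavior `mship test` actually has today is the acceptance bar; this sketch just shows the common convention.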
## Related
- `src/mship/core/test_evidence.py` (where the warning fires).
- `src/mship/skills/writing-plans/` and `src/mship/skills/subagent-driven-development/`.