mship export --redacted: minimal responsible-defaults redaction on export #102

@atomikpanda

Problem

mship export <task> produces a bundle of task artifacts (journal, plan, spec, state, diffs) that users often want to share externally — with coworkers, in bug reports, with AI services, in hiring portfolios, in talks. Today there is no pre-flight redaction, which makes "share this task" an implicit audit of the user's secret hygiene. Concretely, an exported bundle can carry:

  • API keys / bearer tokens pasted into journal messages or hypothesis claims
  • .env-style key=value lines copied into evidence blocks
  • SSH / PGP private keys that were accidentally staged and committed before being redacted
  • GitHub PATs, AWS access keys, Stripe-style sk_live_ keys
  • Plain passwords in quoted error output
  • Internal client / customer names the user doesn't want in a public portfolio link

The current workaround ("grep the bundle before sharing") is manual and error-prone — exactly the kind of thing a substrate should handle at the boundary.

Proposal

Add a minimal --redacted flag to mship export:

mship export <task> --redacted [--format zip|dir] [--include diagnostics]

Redact using documented deterministic regex patterns only (v1 scope — no heuristics, no ML classifier, no entropy-based detection):

  • sk_live_[a-zA-Z0-9]+ / sk_test_[a-zA-Z0-9]+ (Stripe-style)
  • ghp_[A-Za-z0-9]{36} / gho_ / ghu_ / ghs_ (GitHub tokens)
  • AKIA[0-9A-Z]{16} (AWS access key IDs) + nearby aws_secret_access_key values
  • -----BEGIN [A-Z ]+PRIVATE KEY----- … -----END …----- blocks
  • Bearer [A-Za-z0-9._\-]+ in journal lines / evidence
  • .env-style lines matching (?i)(API_KEY|SECRET|PASSWORD|TOKEN|CREDENTIAL)=\S+
  • Optional user-configured list of client/customer name strings (one pattern per line in ~/.config/mship/redact.patterns or mothership.yaml#redact.patterns)

Redaction replaces the match with <REDACTED:kind> (e.g. <REDACTED:github-token>) so the shape of the artifact remains legible.
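
As a rough illustration of the pattern list and the <REDACTED:kind> replacement, here is a minimal Python sketch. It is not the mship implementation. Only the github-token label comes from this issue; the other kind labels, the redact function, and its extra_terms parameter are hypothetical. It also assumes that the gho_/ghu_/ghs_ prefixes take the same 36-character suffix as ghp_, that the elided END marker mirrors the BEGIN marker, and that user-configured entries are literal strings rather than regexes. The "nearby aws_secret_access_key" rule is left out because the issue does not define "nearby".

```python
import re

# (kind, pattern) pairs transcribed from the list above.
# Assumptions: kind labels other than "github-token" are made up for this
# sketch; the END marker of a key block is assumed to mirror the BEGIN marker;
# gho_/ghu_/ghs_ are assumed to use the same {36} suffix as ghp_.
RULES = [
    ("stripe-key", r"sk_(?:live|test)_[a-zA-Z0-9]+"),
    ("github-token", r"gh[pous]_[A-Za-z0-9]{36}"),
    ("aws-access-key-id", r"AKIA[0-9A-Z]{16}"),
    ("private-key",
     r"(?s:-----BEGIN [A-Z ]+PRIVATE KEY-----.*?-----END [A-Z ]+PRIVATE KEY-----)"),
    ("bearer-token", r"Bearer [A-Za-z0-9._\-]+"),
    ("env-secret", r"(?i:API_KEY|SECRET|PASSWORD|TOKEN|CREDENTIAL)=\S+"),
]


def compile_rules(rules):
    """Combine all rules into one alternation so each span is redacted once."""
    parts = [f"(?P<k{i}>{pattern})" for i, (_, pattern) in enumerate(rules)]
    kinds = {f"k{i}": kind for i, (kind, _) in enumerate(rules)}
    return re.compile("|".join(parts)), kinds


def redact(text, extra_terms=()):
    """Replace every match with <REDACTED:kind>.

    extra_terms is a hypothetical stand-in for the user-configured
    client/customer names; each entry is matched literally.
    """
    rules = RULES + [("custom", re.escape(term)) for term in extra_terms if term]
    regex, kinds = compile_rules(rules)
    return regex.sub(lambda m: f"<REDACTED:{kinds[m.lastgroup]}>", text)
```

A single combined pass avoids redacting an already-inserted <REDACTED:…> marker a second time, which sequential per-pattern passes can do when TOKEN= precedes a GitHub token. One detail worth deciding explicitly: as written, [A-Z ]+ requires at least one character between BEGIN and PRIVATE KEY, so an unlabelled -----BEGIN PRIVATE KEY----- block (PKCS#8) would not match.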

Scope cuts (explicit anti-goals for v1)

  • No heavyweight classifier. No ML, no trufflehog-style entropy scoring, no AST-aware parsing. Just documented regex patterns.
  • No interactive review. Redaction is deterministic and non-interactive — if you want to review, diff the redacted bundle against the unredacted one yourself.
  • No partial redaction modes. Either --redacted (all documented patterns apply) or not. No --redact github,aws pick-list in v1.
  • No automatic redaction of unflagged exports. Explicit opt-in; we do not want to silently mutate artifacts a user expected to be faithful.
  • No cross-task redaction history / audit log. One export, one pass, done.
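
Since review is left to the user, the "diff the redacted bundle against the unredacted one" step could look roughly like the Python sketch below. It assumes both exports were written with --format dir and contain text files; diff_bundles is a hypothetical helper, not part of mship.

```python
import difflib
from pathlib import Path


def diff_bundles(original_dir, redacted_dir):
    """Yield unified-diff lines for each file that differs between two bundles.

    Walks the unredacted bundle and compares each file with the file at the
    same relative path in the redacted bundle (treated as empty if missing).
    """
    original_dir, redacted_dir = Path(original_dir), Path(redacted_dir)
    for original in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        rel = original.relative_to(original_dir)
        redacted = redacted_dir / rel
        before = original.read_text(errors="replace").splitlines(keepends=True)
        after = (
            redacted.read_text(errors="replace").splitlines(keepends=True)
            if redacted.is_file()
            else []
        )
        # unified_diff yields nothing when the two files are identical.
        yield from difflib.unified_diff(
            before, after, fromfile=f"original/{rel}", tofile=f"redacted/{rel}"
        )
```

Printing "".join(diff_bundles(a, b)) shows exactly which lines the redaction pass changed, which also makes it easy to spot anything the documented patterns missed.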

Why "minimal responsible-defaults" and not a big redaction engine

The substrate thesis says: mship owns the hand-off boundary; heavyweight semantic analysis belongs in downstream tools. This issue is the smallest useful thing that prevents the common "oops I shared my key" failure, without pretending to be a DLP product. Documented patterns mean users know exactly what is and isn't caught — no false security.
