Skip to content

Commit 2740836

Browse files
committed
docs: plan docs, watchman FD notes, and MISTAKES lessons
Backfill of planning + lessons-learned docs accumulated over recent sessions: - superpowers/plans: external-validator --cmd, --damage-original/undo, OLE2 deeper parsing, PDF per-stream decrypt (Bug #64), RAW preview-JPEG wiring, ZIP residual-byte coverage. - docs/watchman-fd-exhaustion-notes.md: the watchman over-watch / FD-exhaustion failure mode and remediation. - docs/universal string representation idea.md: design note. - MISTAKES.md: jj+watchman fsmonitor snapshot-drop lesson + commit-gate rules. Docs only; no code change.
1 parent 51c7edf commit 2740836

9 files changed

Lines changed: 2082 additions & 0 deletions

MISTAKES.md

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,36 @@
11
# MISTAKES.md
22

3+
## 2026-06-02 — jj + stale Watchman fsmonitor silently drops files from commits
4+
5+
**What happened:** While committing the animated-WebP fix, the
6+
`deps/libwebp/build.zig` change (adding demux.c to the build) was on disk
7+
(4876 bytes, demux present) but jj's working-copy `@` kept the OLD content
8+
(4291 bytes). `jj commit <paths>` and `jj squash` both said "Nothing
9+
changed"; `jj file show -r <commit>` confirmed the committed build.zig was
10+
the original. The validator commit thus referenced demux.h that the lib
11+
never built — broken on fresh checkout — and I pushed it before noticing.
12+
13+
**Root cause:** this repo had `fsmonitor.backend = "watchman"` in jj config.
14+
Watchman's view was stale (same Watchman gremlin from the May-30 crisis), so
15+
it never reported `deps/libwebp/build.zig` as changed, and jj trusted
16+
Watchman and skipped snapshotting it — even after `touch` and appending real
17+
bytes. The `.git-old` tracked-then-gitignored flood made `jj status` noisy,
18+
which masked the problem.
19+
20+
**How to apply (the rule):**
21+
1. Proved the file content actually landed in the COMMIT, not just on disk:
22+
`jj file show -r <change> <path> | grep <marker>` (or compare byte sizes
23+
of `jj file show -r @ <path>` vs the on-disk file). A green `nix build`
24+
does NOT prove this — nix reads the working tree (disk), so it builds the
25+
correct bytes even when jj/the commit has the stale ones.
26+
2. If jj refuses to snapshot a known-changed file, run with the fsmonitor
27+
disabled: `jj --config fsmonitor.backend=none status` (forces a direct
28+
filesystem scan). That immediately surfaced the real diff.
29+
3. Fixed permanently for this repo: `jj config set --repo fsmonitor.backend none`.
30+
4. Don't leave large dirs (.git-old) tracked-but-gitignored; untrack them
31+
(`jj file untrack .git-old`) so `jj status` stays readable.
32+
33+
334
A running log of mistakes made while working on `validate`, so future sessions
435
(and future me) don't repeat them. Newest first.
536

Lines changed: 147 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,147 @@
1+
# `--cmd "<external> {}"` — benchmark validate against external tools
2+
3+
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
4+
5+
**Goal:** Let the corruption-test-coverage harness invoke an *arbitrary external validator* per corruption trial, so we can benchmark `validate`'s detection rate against third-party tools (ImageMagick `identify`, `jpeginfo`, `ffmpeg -v error`, `qpdf --check`, `pdfinfo`, etc.) using the *exact same* corruption harness and statistics. This produces apples-to-apples comparison tables for marketing ("validate catches X% where jpeginfo catches Y%") and finds gaps where an external tool catches something we miss.
6+
7+
**Why it belongs in the test-coverage subsystem:** the sniper/bolter/shotgun loop already (1) corrupts a copy of a file, (2) runs the in-process validator, (3) records caught/missed, (4) aggregates a rate. `--cmd` swaps step 2 for "spawn external command on the corrupted bytes; exit 0 = missed (validator said OK), non-zero = caught." Everything else — trial count, seeding, Wilson CIs, TSV output — is reused unchanged.
8+
9+
**Semantics:**
10+
- `--cmd "jpeginfo -c {}"``{}` is replaced with the path to the corrupted temp file for each trial.
11+
- **Exit code 0** = external tool reported the (corrupted) file as *valid***MISS** (it failed to catch the corruption).
12+
- **Non-zero exit** = external tool reported an error → **CAUGHT**.
13+
- If `{}` is absent from the command string, the corrupted bytes are piped to the tool's **stdin** (for tools that read `-`).
14+
- A per-trial **timeout** (default 10s, override `--cmd-timeout <sec>`) guards against hangs; a timeout counts as CAUGHT (the tool choked) but is tallied separately and reported, per the "no silent cap" rule.
15+
16+
**Architecture (per Peter's standing rule — logic in Zig, through the FFI):**
17+
```
18+
C CLI (--cmd parse) ──► FFI entry ──► Zig core test_coverage
19+
└─ corrupt copy → spawn child → map exit code → tally
20+
```
21+
The Zig core owns child-process spawning (`std.process.Child`), the `{}` substitution, stdin piping, and the timeout. The C CLI only parses the flag and passes the command string + timeout through the FFI.
22+
23+
**Tech Stack:** Zig 0.15.2 (`std.process.Child` — note PascalCase `.Pipe`/`.Inherit`/`.Ignore` per ZIG_RECENT_API_CHANGES.md). Existing `src/core/test_coverage.zig` corruption loop. FFI in `ffi/c_api.zig`, header `ffi/validate_core.h`, CLI `cli/main.c`.
24+
25+
---
26+
27+
## File Structure
28+
29+
**Will be modified:**
30+
- `src/core/test_coverage.zig` — add an "external command" validation backend alongside the in-process one; substitution + spawn + timeout + exit-code mapping.
31+
- `ffi/c_api.zig` — extend the test-coverage FFI entry with `cmd` (nullable C string) + `cmd_timeout_sec` (u32) params, OR add a sibling `validate_test_coverage_cmd` entry.
32+
- `ffi/validate_core.h` — declare the new param(s)/entry.
33+
- `cli/main.c` — parse `--cmd "<str>"` and `--cmd-timeout <sec>`; route to the FFI.
34+
- `docs/` — a short "benchmarking against external tools" doc + example invocations.
35+
36+
**Will be created:**
37+
- `tests/cli/cmd_external_validator` — Bash CLI test (part of `./test`).
38+
39+
---
40+
41+
## Phase 1 — Zig core: external-command backend
42+
43+
### Task 1.1 — Failing test: substitution + exit-code mapping (pure, no spawn)
44+
45+
The `{}` substitution and exit→verdict mapping are pure functions — test them without spawning anything.
46+
47+
- [ ] **Step 1 (failing test):** In `test_coverage.zig` test block, assert `substituteCmd("jpeginfo -c {}", "/tmp/x")``{"jpeginfo", "-c", "/tmp/x"}` (argv split). Assert `substituteCmd("foo bar", path)` (no `{}`) signals "pipe to stdin". Assert `verdictFromExit(0) == .missed`, `verdictFromExit(1) == .caught`.
48+
- [ ] **Step 2:** Implement `substituteCmd` (split on spaces — quoting handled in Task 1.4) and `verdictFromExit`. Tests pass; commit.
49+
50+
### Task 1.2 — Failing test: spawn a real, deterministic child
51+
52+
Use `/usr/bin/false` (always exit 1 = CAUGHT) and `/usr/bin/true` (always exit 0 = MISS) as deterministic stand-ins — no external tool dependency, fully reproducible.
53+
54+
- [ ] **Step 1 (failing test):** A coverage run with `cmd = "true {}"` against a corrupted file must record **0% detection** (true always says "valid" → every trial missed). A run with `cmd = "false {}"` must record **100% detection**.
55+
- [ ] **Step 2:** Implement the spawn path: write the corrupted bytes to a temp file (RAM-backed `$TMPDIR`), substitute `{}`, spawn via `std.process.Child`, capture exit code, map to verdict. Reuse the existing corruption + tally loop. Tests pass; commit.
56+
57+
### Task 1.3 — Timeout handling
58+
59+
- [ ] **Step 1 (failing test):** `cmd = "sleep 30"` with `cmd_timeout_sec = 1` must terminate the child and count it as CAUGHT-via-timeout, tallied separately. (Test must NOT actually wait 30s — the timeout fires at ~1s. This is condition-based, not a sleep hack: we're asserting the timeout *mechanism* kills the child, using a child that would otherwise outlive it.)
60+
- [ ] **Step 2:** Implement timeout: spawn, poll for completion with a deadline, kill on expiry (`child.kill()`), record timeout tally. Tests pass; commit.
61+
62+
### Task 1.4 — stdin piping + quoted-arg handling
63+
64+
- [ ] **Step 1 (failing test):** `cmd = "cat"` (no `{}`) must pipe corrupted bytes to stdin; with a trivial wrapper that exits non-zero on certain content, assert the verdict maps correctly. Also assert a quoted arg with spaces (`--label "a b"`) survives argv splitting.
65+
- [ ] **Step 2:** Implement stdin piping (`.stdin_behavior = .Pipe`, write bytes, close) when `{}` absent; implement minimal shell-like quote handling in `substituteCmd` (respect `"..."` so paths/labels with spaces work — Peter's CLI convention). Tests pass; commit.
66+
67+
---
68+
69+
## Phase 2 — FFI surface
70+
71+
### Task 2.1 — Extend the FFI entry
72+
73+
- [ ] **Step 1:** Decide: add params to the existing `validate_test_coverage` entry vs. a sibling `validate_test_coverage_cmd`. Prefer extending (nullable `cmd` = NULL means "use in-process validator"; non-NULL = external).
74+
- [ ] **Step 2 (failing test):** A C-level FFI test (or Zig test calling through the FFI boundary) passing `cmd = "false {}"` returns a 100% detection result struct.
75+
- [ ] **Step 3:** Implement; declare in `ffi/validate_core.h` (`const char *cmd`, `uint32_t cmd_timeout_sec`). Build clean. Commit.
76+
77+
---
78+
79+
## Phase 3 — C CLI parsing
80+
81+
### Task 3.1 — Parse `--cmd` / `--cmd-timeout`
82+
83+
- [ ] **Step 1 (failing CLI test):** `tests/cli/cmd_external_validator` asserts:
84+
- `validate --test-coverage sniper --cmd "false {}" --count 20 <file>` reports ~100% detection.
85+
- `validate --test-coverage sniper --cmd "true {}" --count 20 <file>` reports ~0% detection.
86+
- `--cmd-timeout 1 --cmd "sleep 30"` returns promptly (well under 30s) with timeout-tallied results.
87+
- `--cmd "tool with \"spaced arg\" {}"` parses without error.
88+
- [ ] **Step 2:** Parse the flags in `cli/main.c` (non-positional, any-order, later-overrides-earlier per CLI conventions); route through the FFI. Emit the external-tool name in the stderr summary header.
89+
- [ ] **Step 3:** Add the test to the `./test` runner. Tests pass; commit.
90+
91+
### Task 3.2 — Help + JSON output
92+
93+
- [ ] **Step 1:** Add `--cmd` and `--cmd-timeout` to `-h`/`--help`.
94+
- [ ] **Step 2:** Ensure the comparison result (detection rate, timeout count, trial count, tool name) is available in the existing `--json` output mode so it pipes into tooling.
95+
- [ ] **Step 3:** Commit.
96+
97+
---
98+
99+
## Phase 4 — Benchmark doc + example matrix
100+
101+
- [ ] **Step 1:** Write `docs/benchmarking-against-external-tools.md` with copy-paste invocations comparing validate vs. jpeginfo (JPEG), qpdf --check (PDF), ffmpeg -v error (video). Use `--seed` for reproducibility.
102+
- [ ] **Step 2:** (Optional, not in `./test`) a `bm`-style script that runs validate-vs-external across a few formats and emits a comparison table. Document that these need the external tools installed (note them in `flake.nix` if we want CI to have them).
103+
- [ ] **Step 3:** Commit.
104+
105+
---
106+
107+
## Out of scope (parked)
108+
109+
- **Parallel trials across cores** for `--cmd` runs (external spawn is slower than in-process; could parallelize later). Keep sequential first for deterministic tallying.
110+
- **Capturing/diffing external tool stderr** for richer reports — exit code is the contract for now.
111+
- **GUI exposure** — Peter: CLI-only feature.
112+
113+
---
114+
115+
## Risk + rollback
116+
117+
| Risk | Mitigation |
118+
|---|---|
119+
| External tool hangs | Per-trial timeout (default 10s) kills the child; tallied separately and reported (no silent cap). |
120+
| Shell-injection via `--cmd` | We do NOT invoke a shell — argv is split in Zig and passed directly to `execve`-equivalent. `{}` is replaced with a path we control. Document that `--cmd` runs the user's own command with their privileges (same trust as any CLI they type). |
121+
| Temp-file churn | Corrupted bytes written to RAM-backed `$TMPDIR`; cleaned each trial. |
122+
| Quoting edge cases | Task 1.4 + 3.1 test spaced/quoted args explicitly. |
123+
124+
**Rollback path:** the external backend is additive (NULL `cmd` = old in-process behavior). Reverting the C-CLI parse alone disables the feature without touching the core.
125+
126+
---
127+
128+
## Self-review checklist
129+
130+
- [x] Logic in Zig core, routed through FFI, C CLI only parses — matches Peter's architecture rule.
131+
- [x] Exit-code contract explicit (0 = miss, non-zero = caught; timeout = caught-tallied-separately).
132+
- [x] Deterministic tests use `/usr/bin/true` / `false` / `sleep` — no dependency on external validators being installed, no sleep-as-timing-hack (timeout test asserts the mechanism).
133+
- [x] No shell invocation — argv split in Zig (injection-safe).
134+
- [x] Quoted/spaced args tested per CLI conventions.
135+
- [x] JSON output + help updated.
136+
- [x] CLI test added to `./test`.
137+
138+
---
139+
140+
**Reproduce + verify:**
141+
```bash
142+
./build
143+
nix develop -c zig build test -- --test-filter "cmd"
144+
tests/cli/cmd_external_validator
145+
zig-out/bin/validate --test-coverage sniper --cmd "false {}" --count 20 ground_truth_examples/jpeg/*.jpg
146+
# Expected: ~100% (false always exits non-zero = always "caught")
147+
```

0 commit comments

Comments
 (0)