What happened: A dep (sqlite3) drifted (pinned to a moving heads/main
branch; upstream 3.51.0->3.53.2), breaking Garnix on every platform. I pinned
it to a commit SHA + updated build.zig.zon .hash, ran
nix build .#checks.aarch64-darwin.test -> GREEN, pushed. Garnix STILL failed
on every platform with the same dep-tree FOD hash mismatch. Root cause: the
flake.nix zigDepsHash (whole-dep-tree fixed-output-derivation hash) ALSO
changes when a dep moves, but my machine reused the CACHED FOD output, so the
local build never re-derived it and never saw the mismatch. Garnix, building
clean, did. Cost a whole extra fix+push+CI cycle.
How to apply (the rule): When ANY Zig dependency changes (build.zig.zon
url/hash), the flake.nix zigDepsHash is almost certainly stale too. Do NOT
trust a local nix build green — it can be cache-masked. Force the FOD to
re-derive: set zigDepsHash to sha256-AAAA...A= (fakeHash), nix build,
copy the printed got: hash back in, rebuild. Commit build.zig.zon AND
flake.nix together. (Same class of trap as the jj/watchman "local snapshot
masks reality" bug: a cached local layer hid the true state.)
Bonus rule: never pin a dep to refs/heads/main|master (a moving branch)
— it silently drifts and breaks CI later. Pin to an immutable commit SHA or
tag. (zlib + openmpt in this repo still violate this — flagged for follow-up.)
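
A minimal sketch of how the moving-branch pins could be spotted mechanically, assuming the dependency URLs in build.zig.zon spell the branch as refs/heads/<name>. The helper name find_moving_refs and the example.com URLs are hypothetical and only illustrate the check; nothing like this exists in the repo yet.

```python
import re
from typing import List

# Assumption: a moving pin shows up literally as "refs/heads/<branch>" in the URL.
MOVING_REF = re.compile(r"refs/heads/[\w./-]+")


def find_moving_refs(zon_text: str) -> List[str]:
    """Return every refs/heads/... reference found in the given text."""
    return MOVING_REF.findall(zon_text)


# Hypothetical usage: feed it the contents of build.zig.zon and fail if non-empty.
# refs = find_moving_refs(open("build.zig.zon", encoding="utf-8").read())
# if refs:
#     raise SystemExit(f"moving branch pins found: {refs}")
```

An empty result only says no URL names a branch this way; it does not prove every pin is an immutable SHA or tag.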
What happened: While committing the animated-WebP fix, the
deps/libwebp/build.zig change (adding demux.c to the build) was on disk
(4876 bytes, demux present) but jj's working-copy @ kept the OLD content
(4291 bytes). jj commit <paths> and jj squash both said "Nothing
changed"; jj file show -r <commit> confirmed the committed build.zig was
the original. The validator commit thus referenced demux.h that the lib
never built — broken on fresh checkout — and I pushed it before noticing.
Root cause: this repo had fsmonitor.backend = "watchman" in jj config.
Watchman's view was stale (same Watchman gremlin from the May-30 crisis), so
it never reported deps/libwebp/build.zig as changed, and jj trusted
Watchman and skipped snapshotting it — even after touch and appending real
bytes. The .git-old tracked-then-gitignored flood made jj status noisy,
which masked the problem.
How to apply (the rule):
- Prove the file content actually landed in the COMMIT, not just on disk:
  jj file show -r <change> <path> | grep <marker> (or compare byte sizes of
  jj file show -r @ <path> vs the on-disk file). A green nix build does NOT
  prove this: nix reads the working tree (disk), so it builds the correct
  bytes even when jj/the commit has the stale ones.
- If jj refuses to snapshot a known-changed file, run with the fsmonitor
  disabled: jj --config fsmonitor.backend=none status (forces a direct
  filesystem scan). That immediately surfaced the real diff.
- Fixed permanently for this repo:
  jj config set --repo fsmonitor.backend none.
- Don't leave large dirs (.git-old) tracked-but-gitignored; untrack them
  (jj file untrack .git-old) so jj status stays readable.
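
The commit-vs-disk check can be sketched in Python. This assumes jj is on PATH and uses only the jj file show -r <rev> <path> command quoted in this entry; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Optional


def committed_bytes(path: str, rev: str = "@") -> bytes:
    """Return the content jj has recorded for `path` at revision `rev`."""
    result = subprocess.run(
        ["jj", "file", "show", "-r", rev, path],
        check=True,
        capture_output=True,
    )
    return result.stdout


def describe_mismatch(committed: bytes, on_disk: bytes) -> Optional[str]:
    """Return None when the contents agree, else a short byte-size report."""
    if committed == on_disk:
        return None
    return f"commit has {len(committed)} bytes, disk has {len(on_disk)} bytes"


# Hypothetical usage, run from the repo root:
# path = "deps/libwebp/build.zig"
# print(describe_mismatch(committed_bytes(path), Path(path).read_bytes()))
```

Comparing full bytes rather than only sizes also catches a same-length edit that never got snapshotted.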
A running log of mistakes made while working on validate, so future sessions
(and future me) don't repeat them. Newest first.
What happened: While wiring V=5 AES-256 decryption into the PDF font and
image deep validators, I edited the files with fragile multi-substitution perl
(a heredoc with an unbalanced } terminator for the font file; a perl with 9
chained s/// for the image file where only 3 matched). Both silently
corrupted the source — the font file was emptied to 0 lines, the image file
truncated to 13 — yet git commit returned exit 0 for both, because git
will happily commit a broken/empty file. I saw commit=0, assumed success,
and moved on. The breakage only surfaced on the next nix build:
pdf_font_validator has no member validatePdfFonts,
pdf_image_validator has no member ImageValidationResult.
Why it matters: a green commit=0 means git recorded the change, NOT that
the change is correct or even compiles. Trusting it shipped two
non-compiling commits to yolo.
How to apply (the rule):
- Gate EVERY commit on a green build first. Run
  nix build .#checks.<system>.test (e.g. aarch64-darwin) and confirm exit 0
  before git add / git commit. If the build fails, do not commit.
- For TDD steps, also red-proof: break the new assertion, confirm the build
  FAILS, restore, confirm it passes, then commit.
- Prefer python3 exact-string replace with a count == 1 assertion over
  multi-substitution perl. The assertion aborts the write if the anchor isn't
  uniquely present, so a bad anchor leaves the file untouched instead of
  silently mangling it. One edit → one build → one commit.
- Avoid bash heredocs embedded in perl -e; quoting/terminator errors there
  fail in ways that still produce output and a zero-ish exit.
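
A minimal sketch of the assert-gated exact-match replace, assuming Python 3 and UTF-8 source files. replace_exactly_once is a hypothetical helper name. It raises ValueError instead of using a bare assert so the check still runs under python -O.

```python
from pathlib import Path


def replace_exactly_once(path: Path, anchor: str, replacement: str) -> None:
    """Replace `anchor` with `replacement` only if it occurs exactly once."""
    text = path.read_text(encoding="utf-8")
    count = text.count(anchor)
    # Check before any write: a missing or ambiguous anchor leaves the file untouched.
    if count != 1:
        raise ValueError(f"expected exactly 1 match in {path}, found {count}")
    path.write_text(text.replace(anchor, replacement, 1), encoding="utf-8")
```

Because the count is checked before write_text is called, a failed edit cannot truncate or empty the file the way the chained s/// runs did. The green build is still the gate before committing.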
Recovery that worked: git reset --hard <last-known-good-commit> (the two
broken commits were the two HEAD commits, nothing good above them), verify the
build is green at that commit, then redo each edit with the assert-gated
exact-match approach above, building green before each commit.
- The codescan Read hook intercepts Read on .zig files (and even some /tmp
  paths); copy to a .txt/.view name or use cat/sed via Bash to read source
  when the MCP read is inconvenient.
- codescan replace_lines requires a FRESH read_file immediately before the
  write; any intervening edit invalidates the version hash and the write is
  (correctly) rejected as stale. Read → write back-to-back, no batching.
- Bash tool output intermittently drops when several calls are batched in one
  message; use sequential single calls and write results to /tmp files when
  it matters.
- Wrote a PCRE non-capturing group (?:...) inside a Lua pattern
  (zig_catalog.lua). Lua patterns are not regex; escaped-quote string
  scanning needs a manual walker. Caught by the first red test run.
- Byte-truncated a localized string for a meta description (s:sub(1,120)),
  which would have emitted invalid UTF-8 on ja/zh pages. Caught in review
  before ship. Rule: never byte-slice translated text; pass full strings and
  let consumers truncate at display time.
- Misused capture (dotfiles capture.bash): it populates out/err/rc and
  requires them declared in caller scope, not STDOUT/RETURN_CODE. Read the
  helper's header before first use.
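
To illustrate the byte-slice trap: a minimal Python sketch (not the Lua used in the catalog code) of why cutting at a byte offset breaks multi-byte text, plus one way a consumer could truncate on a character boundary. truncate_on_char_boundary is a hypothetical name.

```python
def truncate_on_char_boundary(text: str, max_bytes: int) -> str:
    """Truncate so the UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The slice may end mid-character; errors="ignore" drops that partial tail.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


sample = "日本語"  # 3 characters, 9 bytes in UTF-8
raw_slice = sample.encode("utf-8")[:4]  # what a byte-based sub(1, 4) would keep
try:
    raw_slice.decode("utf-8")
except UnicodeDecodeError:
    print("raw byte slice is not valid UTF-8")
print(truncate_on_char_boundary(sample, 4))
```

This only respects code point boundaries. It can still split a grapheme made of several code points, which is another reason to leave truncation to the display layer.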
- Empty-FOD-from-broken-sandbox masquerades as a "platform-divergent hash."
  framework-nixos's nix sandbox couldn't fetch, so the zigDeps FOD produced
  an EMPTY p/o/tmp tree, whose sha256 is stable and real-looking, so
  nix build kept reporting it as "linux's hash" ≠ darwin's. No real
  divergence existed. Tell: the suspicious got: hash equals
  mktemp -d; mkdir p o tmp; nix hash path --sri. Trust darwin/Garnix
  (working sandboxes) for FOD hashes; distrust framework-nixos-sourced ones.
- One "test SEGV" was FIVE bugs, each masking the next. Once the compiler
  stopped crashing (use_llvm), real errors surfaced one at a time. Re-run
  after each fix; read the NEW top error, don't assume one symptom = one
  cause.
- Zig 0.16 self-hosted x86_64 Debug backend SEGVs on large test binaries →
  compile.use_llvm = true on the test step, gated by a comptime zig_version
  tripwire that @compileErrors on >0.16 so the workaround self-expires.
- pthread stack minimum is TLS-inflated. ~827KB static TLS
  (libjxl/libvpx/openmpt) lives inside each thread stack →
  Thread.spawn(.stack_size=256KB) EINVALs → Zig unreachable → abort.
  Measure: readelf -lW <bin> | grep TLS.
- Duplicate module from transitive+direct shared dep (validate + tiffz both
  b.dependency("jpegz")) → Zig 0.16 file exists in modules
  'jpegz'/'jpegz0', sandbox SEGVs. Fix: one owner re-exports
  (tiffz pub const jpegz), consumers reach it transitively. Single instance,
  no dual-pin drift.
- jpegz linkSystemLibrary("jpeg"/"openjp2") is unconditional → blocks
  mingw-static cross. Real fix = Zig-vendor the C libs (option A), not nix
  static-mingw overrides.
Typed rg -rn "pat" file out of grep muscle memory. In ripgrep -r is
--replace REPLACEMENT (NOT recursive — rg recurses by default), so -rn
parsed as --replace=n: every match was substituted with the literal "n".
Spent real cycles misattributing this to codescan, then to Claude Code's
Bash parser, before a hexdump of redirected output proved rg itself wrote
the "n" — and rg --help showed -r REPLACEMENT. Lesson: for ripgrep use
rg -n (or --no-filename/-l etc.); NEVER -r unless you actually want
match replacement. When output looks corrupted, hexdump the bytes before
blaming a tool — and check your own flags first.