Document a cleanroom-derivable PAR2 file format specification and algorithm (implementation-ready, no code).
- PAR2_SPECIFICATION.md describes the full PAR2 packet format (core + optional), data conventions, and recovery algorithm at implementation detail.
- PAR2_SPECIFICATION.md includes explicit recovery math and slice ordering rules from primary sources.
- Cleanroom approach and sources are recorded here.
- flake.nix provides Zig toolchain, fuzzing tools, and par2cmdline for compatibility tests.
- Licensing notes recorded for MD5 and par2cmdline dev tooling vs distribution.
- TOOLCHAIN.md created with build presets and safe C++ subset rules.
- PROJECT_PLAN.md created with TDD implementation steps.
- Use only published specifications and public documentation; do not read or rely on any implementation source code.
- Record all sources and dates accessed.
- Keep derived text as a paraphrase (no verbatim copy beyond short, necessary excerpts).
- Parity Volume Set Specification 2.0 (Parchive, 2003-05-11) official spec (SourceForge) — accessed 2025-12-24.
- Bilingual mirror of spec (for optional packet sections) — accessed 2025-12-24.
- Parchive project site (context, reference implementation note) — accessed 2025-12-24.
- Library of Congress format description (format context) — accessed 2025-12-24.
- MD5 licensing note: implementation now uses Zig stdlib (`std.crypto.hash.Md5`), so no RFC 1321 code is shipped; keep attribution note only if a standalone RFC 1321 implementation is added later.
- par2cmdline is GPL; confirmed test-only usage (not shipped), not linkable for Mac App Store distribution.
- 2025-12-24: Collected official spec metadata and a mirror for detailed packet/algorithm content.
- 2025-12-24: Draft PAR2_SPECIFICATION.md.
- 2025-12-24: Add flake.nix safety toolchain and par2cmdline for compatibility testing.
- 2025-12-24: Draft TOOLCHAIN.md with safety policy and build presets.
- 2025-12-24: Draft PROJECT_PLAN.md with TDD implementation steps.
- 2026-01-01: Added README archival-use guidance section (non-normative).
- CLI recover command uses core recovery API and writes recovered output to disk or stdout.
- CLI tar streaming (`--tar`) for create/recover with tests.
- File-backed store adapter for streaming disk access.
- Full-file recovery integration test with larger fixture vs par2.
- CLI tests using Zig 0.15 process API or bash harness.
- LuaJIT CLI wrapper (FFI against C ABI) with bash integration test.
- Optional packet support: parse FileSlic/RFSC/PkdMain/PkdRecvS; emit FileSlic (flag) and PkdMain/PkdRecvS (flag).
- Expanded integration interoperability tests (multi-file, volume-only, no-RFSC, seeded data).
- Streaming encode for file-backed store (avoid loading all slices in memory).
- Corrupt PAR2 recovery data + data slices and verify recovery with par2cmdline and par2z.
- Add LuaJIT to dev shell for LuaJIT CLI integration tests.
- Entropy Shield: add C API parity blob inputs for verify/recover (multiple in-memory par2 files).
- Entropy Shield: expand error codes (parity missing file, parity corrupt) and map source-missing vs parity-missing.
- Entropy Shield: update verify/recover stream APIs + tests for multi-blob parity inputs.
- Entropy Shield: document any global/shared state for concurrency expectations.
- Entropy Shield: add ESMd metadata packet (create + parse + C API + tests + spec).
- Entropy Shield: investigate C API create memory growth and report (looped create/destroy + output callbacks).
- Entropy Shield: add no-leak regression test for C API create (paths/memory + output callback) if needed.
- Entropy Shield: report peak RSS per run for synthetic loop (if no leak found).
- Empirically verify par2cmdline-turbo flag behavior (no source code).
- Implement behavior for `-B` (basepath), `-R` (recurse), `-m` (memory), `-v`/`-q` (verbosity).
- Implement recovery file splitting flags: `-u`, `-l`, `-n`, and `-f` (first recovery block).
- `-B` stores relative paths and is required for verify/repair to search basepath; files outside basepath are ignored with a warning (error if none remain).
- `-R` is create-only; verify/repair reject it.
- `-u` (uniform) evens recovery blocks across files; can combine with `-n`.
- `-n` splits evenly across `n` volumes; `-l` is incompatible with `-n`; `-u` is incompatible with `-l`.
- `-f` offsets recovery block indices and volume names (e.g. `-f5` starts at `vol05+...`).
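
The `-n` and `-f` notes above can be illustrated with a small planning sketch. This is not the project's code: `plan_volumes` is a hypothetical name, and the remainder rule (earlier volumes take one extra block) is an assumption made only so the example is complete. Volume naming is deliberately left out.

```python
def plan_volumes(total_blocks: int, n_volumes: int, first_block: int = 0):
    """Split recovery blocks evenly over n_volumes, starting at first_block.

    Returns a list of (first_index, count) pairs, one per volume.
    Assumption: any remainder goes to the earliest volumes, one block each.
    """
    if total_blocks <= 0 or n_volumes <= 0 or first_block < 0:
        raise ValueError("counts must be positive and first_block non-negative")
    if n_volumes > total_blocks:
        raise ValueError("more volumes than recovery blocks")

    base, extra = divmod(total_blocks, n_volumes)
    plan = []
    start = first_block  # -f shifts every block index by this offset
    for i in range(n_volumes):
        count = base + (1 if i < extra else 0)
        plan.append((start, count))
        start += count
    return plan
```

For example, ten blocks over three volumes with a first-block offset of 5 gives volumes starting at indices 5, 9, and 12.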
- Verify packet hash before parsing packets (skip invalid hash).
- Enforce single recovery_set_id when loading main + volume files.
- Guard against GF16 exponent exhaustion (TooManySlices).
- Add overflow checks for recovery block planning and slice count.
- Free temporary buffers in core APIs for long-lived clients.
- Remove 1 GiB cap in par2 file load (read exact file size).
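
A minimal sketch of the "verify packet hash before parsing" step, written in Python for illustration rather than in the project's Zig. It assumes the PAR2 2.0 packet header layout (8-byte magic, 8-byte little-endian length, 16-byte MD5, 16-byte recovery set ID, 16-byte type), with the MD5 covering everything from the recovery set ID to the end of the packet. The names `packet_hash_ok` and `make_packet` are hypothetical.

```python
import hashlib
import struct

MAGIC = b"PAR2\x00PKT"
HEADER_LEN = 64  # magic 8 + length 8 + MD5 16 + set id 16 + type 16


def packet_hash_ok(buf: bytes, offset: int = 0) -> bool:
    """Return True only if the packet at `offset` is well-formed and its MD5 matches."""
    avail = len(buf) - offset
    if offset < 0 or avail < HEADER_LEN:
        return False
    if buf[offset:offset + 8] != MAGIC:
        return False
    (length,) = struct.unpack_from("<Q", buf, offset + 8)
    # The length covers the whole packet and must be a multiple of 4.
    if length < HEADER_LEN or length % 4 != 0 or length > avail:
        return False
    stored = buf[offset + 16:offset + 32]
    # Hash input starts right after the hash field: set id, type, body.
    computed = hashlib.md5(buf[offset + 32:offset + length]).digest()
    return stored == computed


def make_packet(set_id: bytes, ptype: bytes, body: bytes) -> bytes:
    """Build a packet for testing; body length must be a multiple of 4."""
    tail = set_id + ptype + body
    length = 32 + len(tail)
    return MAGIC + struct.pack("<Q", length) + hashlib.md5(tail).digest() + tail
```

A caller would skip any packet for which `packet_hash_ok` returns False and resume scanning for the next magic sequence.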
- `par2z-cli verify` maps inputs by FileDesc name (order-independent) with CLI tests for reversed input order. (`src/cli.zig`, `tests/tests.zig`)
- Buffer FileDesc/IFSC packets received before Main; attach after Main/PkdMain. (`src/core/api.zig`, `tests/tests.zig`)
- Accept space-separated short flags (`-s 4096`, `-r 10`, etc.) in create parsing. (`src/cli.zig`, `tests/tests.zig`)
- Sanitize absolute paths in FileDesc by storing basename; verify on-disk packets. (`src/cli.zig`, `tests/tests.zig`)
- Detect basename ambiguity; require exact path matches to disambiguate. (`src/cli.zig`, `tests/tests.zig`)
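
The basename handling described above can be sketched as follows. This is an illustration under stated assumptions, not the logic in `src/cli.zig`: paths are treated as POSIX-style, and `stored_name` and `ambiguous_basenames` are hypothetical helper names.

```python
import posixpath
from collections import defaultdict


def stored_name(path: str) -> str:
    """Name to record for an input: absolute paths collapse to their basename."""
    return posixpath.basename(path) if posixpath.isabs(path) else path


def ambiguous_basenames(paths):
    """Map each basename shared by more than one distinct path to those paths."""
    groups = defaultdict(set)
    for p in paths:
        groups[posixpath.basename(p)].add(p)
    return {name: sorted(ps) for name, ps in groups.items() if len(ps) > 1}
```

When `ambiguous_basenames` returns a non-empty mapping, a basename alone cannot identify the file, which is the case where an exact path match would be required.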
- Ignore duplicate Main packets to avoid resetting attached FileDesc/IFSC when volume files also contain Main. (`src/core/api.zig`, `tests/tests.zig`)
- memtest output label matches units (bytes). (`memtest`)
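
The packet-ordering fixes (buffer FileDesc/IFSC until Main arrives, ignore a repeated Main) amount to a small state machine. The sketch below shows one way to express it. The class name, the string packet kinds, and the dictionary layout are all hypothetical and do not mirror `src/core/api.zig`.

```python
class RecoverySetLoader:
    """Order-tolerant loader: per-file packets may arrive before Main."""

    def __init__(self):
        self.have_main = False
        self.pending = []  # (file_id, packet) pairs seen before Main
        self.files = {}    # file_id -> list of attached packets

    def on_packet(self, kind, file_id=None, payload=None):
        if kind == "Main":
            if self.have_main:
                return  # duplicate Main: keep existing attachments untouched
            self.have_main = True
            for fid, pkt in self.pending:
                self._attach(fid, pkt)
            self.pending.clear()
        elif kind in ("FileDesc", "IFSC"):
            pkt = (kind, payload)
            if self.have_main:
                self._attach(file_id, pkt)
            else:
                self.pending.append((file_id, pkt))

    def _attach(self, file_id, pkt):
        self.files.setdefault(file_id, []).append(pkt)
```

Feeding a FileDesc, then Main, then an IFSC, then a second Main leaves both per-file packets attached, which is the behavior the two fixes aim for.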
- Optional platform-specific SIMD intrinsics (x86_64 SSE2/AVX2, ARM NEON) behind target checks; keep portable SIMD + scalar fallback as default.
- Replace platform-specific MD5 bindings with `std.crypto.hash.Md5` (pure Zig, portable); delete `src/core/md5_macos.zig` and `src/core/md5_linux.zig` after migration.
- Optimize GF16 mul/pow to avoid `% 65535` (conditional subtract or doubled LUT).
- CRC32: replace bit-loop with 256-entry lookup table.
- Make `isMissingIndex` O(1) (hash set or bitmap) in recovery hot path.
- Remove per-slice `page_allocator` in RS hot loops; accept scratch allocator/buffer or use arena reset per batch.
- Use a persistent `std.Thread.Pool` instead of per-chunk thread spawn/join.
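
Two of the items above, the modulo-free GF16 multiply and the table-driven CRC32, are sketched here in Python for illustration only. The sketch assumes the PAR2 field polynomial 0x1100B with 2 as the generator and the common reflected CRC-32 polynomial 0xEDB88320. All function and table names are hypothetical.

```python
GF16_POLY = 0x1100B  # x^16 + x^12 + x^3 + x + 1
ORDER = 65535        # size of the multiplicative group


def build_gf16_tables():
    exp = [0] * (2 * ORDER)  # doubled so log sums never need "% ORDER"
    log = [0] * (ORDER + 1)
    x = 1
    for i in range(ORDER):
        exp[i] = x
        exp[i + ORDER] = x
        log[x] = i
        x <<= 1
        if x & 0x10000:
            x ^= GF16_POLY
    return exp, log


EXP, LOG = build_gf16_tables()


def gf16_mul(a: int, b: int) -> int:
    """Multiply using the doubled antilog table (index is at most 131068)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf16_mul_condsub(a: int, b: int) -> int:
    """Same product, using a conditional subtract instead of a doubled table."""
    if a == 0 or b == 0:
        return 0
    s = LOG[a] + LOG[b]
    if s >= ORDER:
        s -= ORDER
    return EXP[s]


def gf16_mul_slow(a: int, b: int) -> int:
    """Reference shift-and-xor multiply, used only to cross-check the tables."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10000:
            a ^= GF16_POLY
        b >>= 1
    return r


def build_crc32_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = build_crc32_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """Byte-at-a-time CRC-32 using the 256-entry table."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

The doubled table trades 128 KiB of extra lookup space for a branch-free multiply; the conditional-subtract variant keeps the table small at the cost of one compare per product.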
- Split `src/ops.zig` into `create.zig`, `verify.zig`, `recover.zig`, `common.zig`.
- Consolidate duplicated `verify*Store` and `computeRecoverySlices*` functions (generic/store interface).
- Normalize error naming across modules for validation failures.
- Either remove `checked.zig` or standardize on checked wrappers across codebase.
- Reduce temp allocations in `findMismatchedSlices` (two-pass or exact-size allocation).
- Remove empty `src/ffi/` dir or implement it (decide).
- Add tests for `LimitedAllocator` edge cases (cap exhaustion, resize).
- Add direct tests for `transliterateAscii`/`mapLatin1`.
- Add edge-case tests for `volumePath` and `volumeIndexWidth`.
- Add tests for error paths in streaming ops (`recoverStreams`, `verifyStreams`).
- Add tests for C API error messages (`par2_*_last_error`).
- Add thread-safety tests for concurrent volume building.
- Remove or relocate `data.bin` if it’s a stray artifact (confirm intended use).
- Simplify repeated path-building helpers into shared util.
- Reduce verbose `while` loops / redundant casts where safe.
Support true streaming inputs/outputs (no temp file spooling), suitable for SQLite-backed storage or in-memory pipelines.
- Forward-only output is supported; no requirement for random access.
- RFSC emission in streaming mode:
- Buffer the first 16 KiB of each output stream.
- Emit RFSC after 16 KiB is available (or skip if total output < 16 KiB).
- If output supports random access, optional in-place patching is allowed but not required.
- Streaming inputs are modeled as logical files: name + length + read-at callback.
- Streaming outputs are modeled as per-file outputs: open(path) → writer/close.
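
The logical-file input model (name + length + read-at callback) can be sketched as below, with the bounds checks made explicit. This is an illustration, not the project's `InputFileStream`: `LogicalFile` and `read_slice` are hypothetical names, and zero-padding the final partial slice is shown because PAR2 treats short slices as zero-padded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LogicalFile:
    name: str
    length: int
    read_at: Callable[[int, int], bytes]  # (offset, size) -> bytes


def read_slice(f: LogicalFile, index: int, slice_size: int) -> bytes:
    """Read slice `index`, zero-padded to exactly slice_size bytes."""
    if slice_size <= 0 or index < 0:
        raise ValueError("slice_size must be positive and index non-negative")
    start = index * slice_size
    if start >= f.length:
        raise IndexError("slice index past end of file")
    want = min(slice_size, f.length - start)
    data = f.read_at(start, want)
    if len(data) != want:
        raise IOError("short read from read-at callback")
    return data + bytes(slice_size - want)
```

Because the callback receives an absolute offset, the same model fits a file on disk, a database blob, or an in-memory buffer without the core needing a path.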
- Define stream interfaces in core/ops (InputFileStream, OutputStreamOpener) with strict bounds/overflow checks.
- Implement streaming create for main file (emit packets directly to OutputStream without temp files).
- Implement streaming volume emit with buffered RFSC (16 KiB) and late emission.
- Implement streaming recover output (write recovered slices to OutputStream).
- Implement streaming verify path (read-at without file paths).
- Add tests for streaming create/recover/verify with in-memory sinks (small fixtures).
- Add SQLite adapter example (in docs/tests) showing zero-disk usage.
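
The 16 KiB buffering rule for forward-only outputs can be sketched as a pass-through writer that keeps a copy of the first 16 KiB. Several things here are assumptions for illustration: the class name `HeadCapturingWriter`, the choice to pass bytes through immediately while retaining a copy, and the use of an MD5 over the captured head as a stand-in for whatever the RFSC packet actually needs.

```python
import hashlib
import io

HEAD_LEN = 16 * 1024


class HeadCapturingWriter:
    """Forward-only writer wrapper that retains the first 16 KiB written."""

    def __init__(self, sink):
        self.sink = sink  # any object with a write(bytes) method
        self.head = bytearray()
        self.total = 0

    def write(self, chunk: bytes) -> None:
        need = HEAD_LEN - len(self.head)
        if need > 0:
            self.head += chunk[:need]
        self.total += len(chunk)
        self.sink.write(chunk)

    def head_md5(self):
        """MD5 of the first 16 KiB, or None if fewer bytes were written."""
        if len(self.head) < HEAD_LEN:
            return None
        return hashlib.md5(bytes(self.head)).digest()


# Usage: two 10,000-byte writes cross the 16 KiB threshold on the second call.
sink = io.BytesIO()
writer = HeadCapturingWriter(sink)
writer.write(b"a" * 10_000)
writer.write(b"b" * 10_000)
```

A `None` result corresponds to the "skip if total output < 16 KiB" case; no seek on the sink is ever required.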
Expose a stable C API with separate handles for create/verify/recover, supporting file paths and in-memory/streaming inputs, optional memory caps, configurable threading, and last-error strings.
- `include/par2.h` documents the C ABI: handles, options, callbacks, error codes.
- `src/lib.zig` implements C ABI with separate handles (create/verify/recover).
- Supports file-path inputs and memory/streaming inputs (read-at callback).
- Recover can write to file path (default: directory of par2 file) or write callback.
- Optional memory cap and optional custom allocator callbacks.
- Threading configurable (0 = all cores).
- TDD: add unit tests for C API behaviors (memory input + verify + recover happy path).
Six findings dropped in inbox/. All verified against source. Fixed:
- Error handling: `writeTarHeader` silent tar corruption for >8 GiB files (`bufPrint ... catch {}` → undefined memory). Extracted testable `core.tar` (`writeOctalField`/`buildHeader`) returning `error.FileTooLargeForTar`. (commit)
- Duplicated code: two hand-rolled `writeU64Le` → `std.mem.writeInt`. (commit)
- Algorithmic complexity: `recovery_set` attach was O(files²) on the verify/recover path → `AutoHashMap(file_id → *FileEntry)`, O(1) attach. (commit)
- Inadequate tests: added path-traversal security test for `hasTraversalSegment` (made it `pub`). (commit)
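
The tar fix above hinges on the 12-byte size field holding only 11 octal digits plus a terminator, so the largest encodable size is 8^11 - 1 bytes (one byte short of 8 GiB). A Python sketch of the checked encoder follows. It mirrors the idea behind `writeOctalField` but is not that function, and the exception name is a stand-in for `error.FileTooLargeForTar`.

```python
class FileTooLargeForTar(ValueError):
    """Raised when a value does not fit the fixed-width octal field."""


def write_octal_field(value: int, width: int) -> bytes:
    """Encode value as zero-padded octal followed by NUL, exactly `width` bytes."""
    if width < 2:
        raise ValueError("width must leave room for one digit and the terminator")
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = width - 1
    if value >= 8 ** digits:
        # Fail loudly instead of emitting a truncated or undefined field.
        raise FileTooLargeForTar(f"{value} needs more than {digits} octal digits")
    return format(value, "o").zfill(digits).encode("ascii") + b"\x00"
```

Returning an error at the encoding boundary keeps the failure testable in isolation, which is the point of extracting the helper.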
Done (2026-06-03):
- Futile test coverage: moved gf16/crc32 benchmark loops out of `test` blocks into `pub runBenchmarks`/`runBenchmark` fns + `src/tools/microbench.zig`, runnable via `zig build bench-micro` or `./bm`; fixed the Zig 0.16 `std.time.nanoTimestamp` removal (`std.Io.Timestamp.now(io, .awake)`). Also discovered the core inline tests were dormant (`tests/tests.zig` is a separate module, so `_ = core.x` can't pull them in); added a `test {}` block in `core/mod.zig` + a `test-core` build target gated into `./test`. 14 kernel correctness/parity tests (gf16 SIMD, crc32, packet_types) now run in CI. Follow-up (done 2026-06-03): swept the remaining dormant inline tests too; added `test-ops` (16: outputPath path-safety, transliterateAscii, volumePath) and `test-cli` (21: arg parsing) build targets via `test {}` blocks in `ops.zig`, all gated into `./test`. All 51 previously-dormant tests are 0.16-clean and pass; none needed commenting out. `./test` now runs 210 tests across 4 binaries (was 159). No core modules besides gf16/crc32/packet_types carry inline tests.
- Suboptimal/disorganized: merged `buildVolume`/`buildVolumeStream` twins into one `store: anytype` fn (comptime store dispatch); extracted `deriveCreatePlan`/`printCreateDefaults`/`appendMainPackets` shared by `create` and `createStreams` (removed ~100 lines of verbatim duplication). `create` 288->197, `createStreams` 307->216; `create.zig` 1396->1168. Dropped a dead `max_file_len` accumulator. Behavior unchanged (suite + CLI roundtrips green).