
feat: migrate AOT-side Pauli mask storage to runtime-width arenas#52

Open
bachase wants to merge 7 commits into main from feat/issue-45-pr2-runtime-qubits

Conversation


bachase (Contributor) commented May 2, 2026

Summary

PR2 of the issue #45 staged migration. Replaces fixed-width BitMask<N> Pauli storage in HIR, NoiseChannel, and ConstantPool with runtime-width PauliMaskArenas, and shrinks HeisenbergOp to a fixed 16 bytes. The frontend writes stim PauliString rows directly into arena slots via stim_to_mask_view, so qubits beyond kMaxInlineQubits round-trip without truncation.

This is a clean rewrite of the work in #50, organized so each commit is correct on landing — no patches-on-patches. #50 will be closed in favor of this.

Commit summary

  1. chore: document arena resize invalidation — comment in pauli_arena.h.
  2. feat(util): tighten copy_from semantics, add MaskBuf test helper — copy_from throws (not asserts) on truncation; the production same-size case has zero overhead. Adds a 1-word MaskBuf test helper.
  3. feat: migrate Pauli mask storage to runtime-width arenas — the big one. HirModule + ConstantPool arenas, NoiseChannel becomes {handle, prob}, frontend writes stim → arena directly with no PauliBitMask intermediate, optimizer/backend/SVM read masks via arena, HeisenbergOp shrinks to 16 B. Includes high-qubit regression tests at n = 70, 150, 200, 513.
  4. feat(frontend): reject circuits above the VM axis ceiling early — trace() rejects n > 65536 so we don't allocate a giant Stim TableauSimulator before lower() would reject anyway.
  5. fix(python): keep HirModule alive through HeisenbergOp wrappers — PyHeisenbergOp holds an nb::object module_owner; both __getitem__ and __iter__ capture nb::borrow(self). Includes Python regression tests for both.
  6. chore(optimizer): clear noise_channel_masks in RemoveNoisePass — replace the arena with an empty one after stripping noise.
  7. test: add surface d=11 r=11 benchmark above the old 128-qubit cap — n ≈ 274, exercises the regime the migration unlocks.

Test plan

  • uv tool run pre-commit run --all-files
  • ctest --test-dir build/cmake -E Bench --timeout 60 — 691 cases pass, full suite in ~3 s
  • uv run pytest tests/python/ — 597 cases pass
  • Local Release benches (n=10 samples)

Bench (Release, this dev machine)

| Benchmark | PR0 baseline | This branch | Δ |
| --- | --- | --- | --- |
| QV-10 x100 shots | 24.74 ms | 22.9 ms | -7.4% |
| cultivation-d5 x1000 shots | 38.95 ms | 39.7 ms | +1.8% |
| surface-d7-r7 p=1e-3 x10000 shots | 71.74 ms | 72.3 ms | +0.8% |
| surface-d5-r5 p=0.05 x10000 shots | 81.68 ms | 82.4 ms | +0.9% |
| exp-val 20q 200 probes x100000 shots | 138.59 ms | 190.4 ms | +37% |
| surface-d11-r11 p=1e-3 x1000 shots | n/a (> old cap) | 18.8 ms | n/a |

EXP_VAL is the only meaningful regression. As flagged in the migration plan: the previously fully-unrolled BitMask<128> popcount/XOR is now a runtime-bounded loop, and the (X, Z, sign) tuple is split across three storage regions. PR3 will add template specialization on common num_words values for the hot APPLY_PAULI / EXP_VAL paths to recover this.
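
To make the PR3 plan concrete, here is a minimal sketch of specializing a mask kernel on `num_words`, assuming the hot loop reduces to a masked popcount; `parity_fixed`, `parity`, and the chosen widths are hypothetical stand-ins, not the real SVM kernels:

```cpp
#include <bit>
#include <cstddef>
#include <cstdint>
#include <span>

// Fixed-width kernel: the trip count is a template parameter, so the
// compiler can fully unroll the popcount/AND loop the way BitMask<128> did.
template <std::size_t NumWords>
int parity_fixed(std::span<const std::uint64_t> mask,
                 std::span<const std::uint64_t> frame) {
  int bits = 0;
  for (std::size_t w = 0; w < NumWords; ++w)
    bits += std::popcount(mask[w] & frame[w]);
  return bits & 1;
}

// Runtime dispatch: specialize the common widths, keep a runtime-bounded
// fallback loop for everything else.
int parity(std::span<const std::uint64_t> mask,
           std::span<const std::uint64_t> frame) {
  switch (mask.size()) {
    case 1: return parity_fixed<1>(mask, frame);
    case 2: return parity_fixed<2>(mask, frame);
    case 4: return parity_fixed<4>(mask, frame);
    default: {
      int bits = 0;
      for (std::size_t w = 0; w < mask.size(); ++w)
        bits += std::popcount(mask[w] & frame[w]);
      return bits & 1;
    }
  }
}
```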

What's still pending (PR3)

  • SVM frame p_x / p_z runtime sizing.
  • Removal of CLIFFT_MAX_QUBITS and BitMask<N>.
  • Template specialization on num_words for hot mask widths.

🤖 Generated with Claude Code

bachase added 2 commits May 2, 2026 21:19
chore: document arena resize invalidation

Note that resizing PauliMaskArena's storage vectors after construction
would invalidate outstanding views, so the capacity is fixed by design.
Any future need to grow post-construction must move to a stable-handle
representation first.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
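
For context, a self-contained sketch of the invariant this comment documents; `MaskArenaSketch` and `MaskHandle` are illustrative, not the real PauliMaskArena API:

```cpp
#include <cstddef>
#include <cstdint>
#include <span>
#include <vector>

struct MaskHandle { std::uint32_t index; };

class MaskArenaSketch {
 public:
  // Capacity and mask width are fixed at construction; the backing vector
  // is never resized afterwards, so outstanding spans stay valid.
  MaskArenaSketch(std::size_t capacity, std::size_t num_words)
      : num_words_(num_words), storage_(capacity * num_words, 0) {}

  // Claims the next slot; no bounds check in this sketch.
  MaskHandle claim() { return MaskHandle{next_++}; }

  std::span<std::uint64_t> mut_at(MaskHandle h) {
    return {storage_.data() + h.index * num_words_, num_words_};
  }
  std::span<const std::uint64_t> at(MaskHandle h) const {
    return {storage_.data() + h.index * num_words_, num_words_};
  }

 private:
  std::size_t num_words_;
  std::vector<std::uint64_t> storage_;
  std::uint32_t next_ = 0;
};
```
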
feat(util): tighten copy_from semantics, add MaskBuf test helper

Make MutableMaskView::copy_from refuse to silently lose source bits:
the source must fit within the destination width. Narrower sources
are zero-padded; wider sources throw unless every excess word is
already zero. Callers that genuinely want to truncate must do so
explicitly via std::span (e.g.
dst.copy_from({other.words.first(dst.num_words())})).

This eliminates a Release-only footgun: the previous version asserted
in Debug but silently dropped high bits in Release. Production hot
paths always pass same-width spans, so the new check has zero cost
on the success path.
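
A minimal sketch of these semantics, assuming a span-backed view; `MutableMaskViewSketch` is illustrative rather than the real MutableMaskView:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <span>
#include <stdexcept>

struct MutableMaskViewSketch {
  std::span<std::uint64_t> words;

  void copy_from(std::span<const std::uint64_t> src) {
    if (src.size() > words.size()) {
      // Wider source is only legal when every excess word is already zero.
      for (std::size_t w = words.size(); w < src.size(); ++w)
        if (src[w] != 0)
          throw std::invalid_argument("copy_from would drop set high bits");
      src = src.first(words.size());
    }
    // Same-size production hot path: a straight copy, nothing else.
    std::copy(src.begin(), src.end(), words.begin());
    // Narrower source: zero-pad the remaining destination words.
    std::fill(words.begin() + src.size(), words.end(), std::uint64_t{0});
  }
};
```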

Add a 1-word MaskBuf helper in test_helpers.h with implicit
conversion to MaskView. Tests use it as a temporary mask source for
HirModule builders. Single-word coverage handles every existing
test pattern; tests with wider patterns (rare) can construct a
std::array<uint64_t, N> directly.
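
An assumed shape for such a helper, for reference; the real MaskBuf in test_helpers.h converts to the project's MaskView type rather than a raw span:

```cpp
#include <cstdint>
#include <span>

// Hypothetical 1-word mask buffer: owns a single word and converts
// implicitly to a read-only 1-word view of it.
struct MaskBufSketch {
  std::uint64_t word = 0;
  operator std::span<const std::uint64_t>() const { return {&word, 1}; }
};
```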

Add regression tests for the new copy_from semantics: zero-pad
narrower source, accept wider source with zero excess, throw on
wider source with set bits above destination width.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>

github-actions Bot commented May 2, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://unitaryfoundation.github.io/clifft/pr-preview/pr-52/

Built to branch gh-pages at 2026-05-03 12:06 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

bachase force-pushed the feat/issue-45-pr2-runtime-qubits branch from e354e0b to 139abbe on May 2, 2026 21:48
bachase added 4 commits May 3, 2026 11:39
feat: migrate Pauli mask storage to runtime-width arenas

Replace fixed-width BitMask<N> Pauli storage in HIR, NoiseChannel, and
ConstantPool with PauliMaskArena, indexed by opaque PauliMaskHandle.
HeisenbergOp shrinks from a width-dependent struct (32 B at N=64) to
a fixed 16 B layout (4-byte handle + 8-byte payload + 4-byte header),
pinned by static_assert. The frontend writes stim PauliString rows
directly into the arena slot via stim_to_mask_view, so qubits beyond
kMaxInlineQubits round-trip without truncation.

- HirModule(num_qubits, num_pauli_masks, num_noise_channels) pre-sizes
  both arenas at construction. The default constructor yields empty
  arenas. A counting pre-pass over the parsed Circuit produces the
  conservative upper bounds.
- Op factories (make_*) become private; HirModule exposes append_*
  builders that take MaskView pairs (for tests / callers with mask
  data in hand) and append_*_empty builders that claim a slot and
  return the op for the caller to fill via mask_at(op). The frontend
  uses the latter to avoid any fixed-width intermediate.
- NoiseChannel becomes {handle, prob}. HirModule and ConstantPool each
  own a noise_channel_masks arena. claim_noise_channel_mask /
  claim_empty_noise_channel_mask mirror the pauli_masks claim path.
- Module-bound accessors hir.destab_mask(op) / hir.stab_mask(op) /
  hir.sign(op) replace the old per-op methods. mask_at(op) returns a
  MutablePauliMaskView for in-place mutation. Optimizer passes that
  repurpose existing ops use new demote_to_tgate /
  demote_to_phase_rotation helpers that preserve the mask handle while
  updating type and payload.
- ConstantPool::pauli_masks and exp_val_masks become PauliMaskArena.
  lower() pre-counts CONDITIONAL_PAULI / EXP_VAL emissions and noise
  channel totals to size the arenas. SVM exec_apply_pauli, exec_noise,
  and exec_exp_val read masks via arena.at(handle); the per-frame
  bridge in apply_pauli_to_frame iterates min(num_words, kMaxInlineWords)
  to keep the still-inline SVM frame compatible (frame migrates in a
  later PR).
- VirtualFrame::map_pauli applies pending gates directly on the stim
  PauliString's u64 buffer as a runtime-width MaskView, with no
  PauliBitMask intermediate that would silently truncate at
  kMaxInlineQubits. map_noise_channel uses runtime-width scratch
  vectors for the same reason. apply_gate_to_pauli takes
  MutableMaskView so single-qubit operations read/write the correct
  word at any axis index.
- Optimizer commutation/peephole walk arena views via hir.* accessors;
  apply_virtual_s_downstream conjugates noise channel masks through
  hir.noise_channel_masks.mut_at.
- Introspection (format_pauli_mask, format_hir_op) takes an HirModule
  reference so it can resolve the mask handle.

Tests: rewrite test_hir.cc, test_optimizer.cc, test_backend.cc,
test_frontend.cc, test_svm.cc to use HirModule builders + module-bound
accessors. test_helpers.h gains a 1-word MaskBuf and width-tolerant
operator==(MaskView, BitMask<N>) / operator==(MaskView, uint64_t) to
keep test idioms ergonomic.

Add high-qubit regression tests in test_frontend.cc:
  - n=200, T after H on q150 -- destab bit 150 must be set.
  - parametrized walk at n=70 (word-boundary), n=150 (above old fixed
    width), n=200, n=513 (multi-word case).
These would have caught silent frontend truncation before this commit
and lock down the round-trip going forward.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
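
As a concrete illustration of the pinned layout, a sketch with assumed field names and ordering; only the 4 + 4 + 8 byte budget and the static_assert come from the commit:

```cpp
#include <cstdint>

// Hypothetical layout sketch, not the real HeisenbergOp definition.
struct HeisenbergOpSketch {
  std::uint32_t header;       // 4-byte type/flags header
  std::uint32_t mask_handle;  // 4-byte handle into the module's Pauli arena
  std::uint64_t payload;      // 8-byte op-specific payload
};
static_assert(sizeof(HeisenbergOpSketch) == 16,
              "op must stay pinned at a fixed 16 bytes");
```
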
feat(frontend): reject circuits above the VM axis ceiling early

trace() now refuses num_qubits > 65536 instead of letting Stim's
TableauSimulator allocate O(n^2) bits of tableau before lower() rejects.
The check matches lower()'s ceiling exactly so both ends agree.

Add a test asserting trace() throws at n = 65537. The matching lower()
test already feeds an empty HIR with the high qubit count to keep
itself fast; that's still a valid contract test for lower() in
isolation.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
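
The shape of that early gate, sketched under assumed names; only the 65536 ceiling comes from the commit, and `kVmAxisCeiling` plus the exception type are illustrative:

```cpp
#include <cstdint>
#include <stdexcept>

constexpr std::uint64_t kVmAxisCeiling = 65536;

// Reject before any Stim allocation happens; the constant must match
// lower()'s ceiling exactly so both ends agree.
void check_trace_qubits(std::uint64_t num_qubits) {
  if (num_qubits > kVmAxisCeiling)
    throw std::invalid_argument(
        "num_qubits exceeds the VM axis ceiling; refusing to allocate "
        "an O(n^2)-bit tableau that lower() would reject anyway");
}
```
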
fix(python): keep HirModule alive through HeisenbergOp wrappers

PyHeisenbergOp now holds an nb::object module_owner that pins the
owning HirModule's Python refcount. Each wrapper handed out by
HirModule.__getitem__ or __iter__ captures nb::borrow(self) so the
wrapper-via-__iter__ pattern (where the iterator and the list of
items go out of scope while the user still holds an op) doesn't
dereference freed arena memory.

The wasm bindings switch to format_hir_op(op, hir) to match the
HirModule-bound formatter signature.

Add Python regression tests for both __iter__ and __getitem__ that
let the trace() result go out of scope, force a GC pass, then access
the wrapper. Without the lifetime fix these would crash on use of
freed memory.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
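
A sketch of the ownership pattern, assuming nanobind (matching the nb:: prefix above); `PyOpSketch` and the commented binding are illustrative, not the real PyHeisenbergOp:

```cpp
#include <cstddef>
#include <nanobind/nanobind.h>

namespace nb = nanobind;

// Hypothetical wrapper: pins the owning module's Python object so the arena
// memory the wrapper points into cannot be freed while the wrapper is alive.
struct PyOpSketch {
  nb::object module_owner;  // holds a reference to the HirModule Python object
  std::size_t index;        // which op in the module this wrapper refers to
};

// In the HirModule binding, each access path captures nb::borrow(self), e.g.:
//   .def("__getitem__", [](nb::handle self, std::size_t i) {
//     return PyOpSketch{nb::borrow(self), i};
//   });
```
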
chore(optimizer): clear noise_channel_masks in RemoveNoisePass

After the pass removes noise ops and clears noise_sites, the slots in
hir.noise_channel_masks are no longer reachable but remain allocated.
Replace the arena with an empty one so the pass leaves no dead arena
weight behind.

Add a CHECK in the existing strip test that noise_channel_masks.size()
is zero after the pass.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
bachase force-pushed the feat/issue-45-pr2-runtime-qubits branch from 139abbe to e4cf947 on May 3, 2026 11:40
For each mask-carrying op type (MEASURE, MPP, CONDITIONAL_PAULI, NOISE,
EXP_VAL), trace a minimal circuit that produces one such op touching
qubits at kMaxInlineQubits, assert trace() preserves the high-qubit
support in the HIR mask, and assert lower() rejects with the SVM-
frame-width error.

These both lock down the gate's per-op-type semantics today and
double as PR3 task stubs: when the SVM frame migrates to runtime-width
storage, each REQUIRE_THROWS_AS becomes a Stim-oracle equivalence
check.

Includes a cross-word case (MPP X63 * X128) that exercises the
multi-target build_pauli_string path; a fixed-width intermediate
would clip the high target.

Drops the surface d=11 r=11 benchmark added in the previous commit
of this branch -- with the gate at kMaxInlineQubits, n=274 can no
longer compile. The bench will be re-added in the migration PR that
lifts the gate.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
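
For illustration, the rough shape of one such gate test; written doctest-style to match REQUIRE_THROWS_AS, with `parse_circuit`, `ops()`, `bit()`, and the exception type all assumed:

```cpp
// Hypothetical test shape; helper names and the thrown type are assumed,
// not the project's real test API.
TEST_CASE("MPP across the inline-word boundary: trace keeps it, lower rejects") {
  auto hir = trace(parse_circuit("MPP X63*X128"));
  // trace() must preserve the high-qubit support in the HIR mask...
  CHECK(hir.destab_mask(hir.ops().back()).bit(128));
  // ...while lower() still rejects with the SVM-frame-width error (until PR3
  // migrates the frame, when this becomes a Stim-oracle equivalence check).
  REQUIRE_THROWS_AS(lower(hir), std::invalid_argument);
}
```
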
bachase force-pushed the feat/issue-45-pr2-runtime-qubits branch from e4cf947 to eeb5c81 on May 3, 2026 12:06