
feat: migrate AOT-side Pauli mask storage to runtime-width arenas#52

Open
bachase wants to merge 7 commits into main from feat/issue-45-pr2-runtime-qubits

Conversation


bachase (Contributor) commented May 2, 2026

Summary

PR2 of the issue #45 staged migration. Replaces fixed-width BitMask<N> Pauli storage in HIR, NoiseChannel, and ConstantPool with runtime-width PauliMaskArenas, and shrinks HeisenbergOp to a fixed 16 bytes. The frontend writes stim PauliString rows directly into arena slots via stim_to_mask_view, so qubits beyond kMaxInlineQubits round-trip without truncation.

This is a clean rewrite of the work in #50, organized so each commit is correct on landing — no patches-on-patches. #50 will be closed in favor of this.

Commit summary

  1. chore: document arena resize invalidation — comment in pauli_arena.h.
  2. feat(util): tighten copy_from semantics, add MaskBuf test helper — copy_from throws (not asserts) on truncation; the production same-size case has zero overhead. Adds a 1-word MaskBuf test helper.
  3. feat: migrate Pauli mask storage to runtime-width arenas — the big one. HirModule + ConstantPool arenas, NoiseChannel becomes {handle, prob}, frontend writes stim → arena directly with no PauliBitMask intermediate, optimizer/backend/SVM read masks via arena, HeisenbergOp shrinks to 16 B. Includes high-qubit regression tests at n = 70, 150, 200, 513.
  4. feat(frontend): reject circuits above the VM axis ceiling early — trace() rejects n > 65536 so we don't allocate a giant Stim TableauSimulator before lower() would reject anyway.
  5. fix(python): keep HirModule alive through HeisenbergOp wrappers — PyHeisenbergOp holds an nb::object module_owner; both __getitem__ and __iter__ capture nb::borrow(self). Includes Python regression tests for both.
  6. chore(optimizer): clear noise_channel_masks in RemoveNoisePass — replace the arena with an empty one after stripping noise.
  7. test: add surface d=11 r=11 benchmark above the old 128-qubit cap — n ≈ 274, exercises the regime the migration unlocks.

Test plan

  • uv tool run pre-commit run --all-files
  • ctest --test-dir build/cmake -E Bench --timeout 60 — 691 cases pass, full suite in ~3 s
  • uv run pytest tests/python/ — 597 cases pass
  • Local Release benches (n=10 samples)

Bench (Release, this dev machine)

| Benchmark | PR0 baseline | This branch | Δ |
| --- | --- | --- | --- |
| QV-10 x100 shots | 24.74 ms | 22.9 ms | -7.4% |
| cultivation-d5 x1000 shots | 38.95 ms | 39.7 ms | +1.8% |
| surface-d7-r7 p=1e-3 x10000 shots | 71.74 ms | 72.3 ms | +0.8% |
| surface-d5-r5 p=0.05 x10000 shots | 81.68 ms | 82.4 ms | +0.9% |
| exp-val 20q 200 probes x100000 shots | 138.59 ms | 190.4 ms | +37% |
| surface-d11-r11 p=1e-3 x1000 shots | n/a (> old cap) | 18.8 ms | n/a |

EXP_VAL is the only meaningful regression. As flagged in the migration plan: the previously fully-unrolled BitMask<128> popcount/XOR is now a runtime-bounded loop, and the (X, Z, sign) tuple is split across three storage regions. PR3 will add template specialization on common num_words values for the hot APPLY_PAULI / EXP_VAL paths to recover this.
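
To make the PR3 plan concrete, here is a minimal sketch of specializing a mask kernel on `num_words`, assuming the hot loop reduces to a masked popcount; `parity_fixed`, `parity`, and the chosen widths are hypothetical stand-ins, not the real SVM kernels:

```cpp
#include <bit>
#include <cstddef>
#include <cstdint>
#include <span>

// Fixed-width kernel: the trip count is a template parameter, so the
// compiler can fully unroll the popcount/AND loop the way BitMask<128> did.
template <std::size_t NumWords>
int parity_fixed(std::span<const std::uint64_t> mask,
                 std::span<const std::uint64_t> frame) {
  int bits = 0;
  for (std::size_t w = 0; w < NumWords; ++w)
    bits += std::popcount(mask[w] & frame[w]);
  return bits & 1;
}

// Runtime dispatch: specialize the common widths, keep a runtime-bounded
// fallback loop for everything else.
int parity(std::span<const std::uint64_t> mask,
           std::span<const std::uint64_t> frame) {
  switch (mask.size()) {
    case 1: return parity_fixed<1>(mask, frame);
    case 2: return parity_fixed<2>(mask, frame);
    case 4: return parity_fixed<4>(mask, frame);
    default: {
      int bits = 0;
      for (std::size_t w = 0; w < mask.size(); ++w)
        bits += std::popcount(mask[w] & frame[w]);
      return bits & 1;
    }
  }
}
```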

What's still pending (PR3)

  • SVM frame p_x / p_z runtime sizing.
  • Removal of CLIFFT_MAX_QUBITS and BitMask<N>.
  • Template specialization on num_words for hot mask widths.

🤖 Generated with Claude Code

bachase added 2 commits May 2, 2026 21:19
chore: document arena resize invalidation

Note that resizing PauliMaskArena's storage vectors after construction
would invalidate outstanding views, so the capacity is fixed by design.
Any future need to grow post-construction must move to a stable-handle
representation first.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
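
For context, a self-contained sketch of the invariant this comment documents; `MaskArenaSketch` and `MaskHandle` are illustrative, not the real PauliMaskArena API:

```cpp
#include <cstddef>
#include <cstdint>
#include <span>
#include <vector>

struct MaskHandle { std::uint32_t index; };

class MaskArenaSketch {
 public:
  // Capacity and mask width are fixed at construction; the backing vector
  // is never resized afterwards, so outstanding spans stay valid.
  MaskArenaSketch(std::size_t capacity, std::size_t num_words)
      : num_words_(num_words), storage_(capacity * num_words, 0) {}

  // Claims the next slot; no bounds check in this sketch.
  MaskHandle claim() { return MaskHandle{next_++}; }

  std::span<std::uint64_t> mut_at(MaskHandle h) {
    return {storage_.data() + h.index * num_words_, num_words_};
  }
  std::span<const std::uint64_t> at(MaskHandle h) const {
    return {storage_.data() + h.index * num_words_, num_words_};
  }

 private:
  std::size_t num_words_;
  std::vector<std::uint64_t> storage_;
  std::uint32_t next_ = 0;
};
```
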
feat(util): tighten copy_from semantics, add MaskBuf test helper

Make MutableMaskView::copy_from refuse to silently lose source bits:
the source must fit within the destination width. Narrower sources
are zero-padded; wider sources throw unless every excess word is
already zero. Callers that genuinely want to truncate must do so
explicitly via std::span (e.g.
dst.copy_from({other.words.first(dst.num_words())})).

This eliminates a Release-only footgun: the previous version asserted
in Debug but silently dropped high bits in Release. Production hot
paths always pass same-width spans, so the new check has zero cost
on the success path.
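
A minimal sketch of these semantics, assuming a span-backed view; `MutableMaskViewSketch` is illustrative rather than the real MutableMaskView:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <span>
#include <stdexcept>

struct MutableMaskViewSketch {
  std::span<std::uint64_t> words;

  void copy_from(std::span<const std::uint64_t> src) {
    if (src.size() > words.size()) {
      // Wider source is only legal when every excess word is already zero.
      for (std::size_t w = words.size(); w < src.size(); ++w)
        if (src[w] != 0)
          throw std::invalid_argument("copy_from would drop set high bits");
      src = src.first(words.size());
    }
    // Same-size production hot path: a straight copy, nothing else.
    std::copy(src.begin(), src.end(), words.begin());
    // Narrower source: zero-pad the remaining destination words.
    std::fill(words.begin() + src.size(), words.end(), std::uint64_t{0});
  }
};
```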

Add a 1-word MaskBuf helper in test_helpers.h with implicit
conversion to MaskView. Tests use it as a temporary mask source for
HirModule builders. Single-word coverage handles every existing
test pattern; tests with wider patterns (rare) can construct a
std::array<uint64_t, N> directly.
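
An assumed shape for such a helper, for reference; the real MaskBuf in test_helpers.h converts to the project's MaskView type rather than a raw span:

```cpp
#include <cstdint>
#include <span>

// Hypothetical 1-word mask buffer: owns a single word and converts
// implicitly to a read-only 1-word view of it.
struct MaskBufSketch {
  std::uint64_t word = 0;
  operator std::span<const std::uint64_t>() const { return {&word, 1}; }
};
```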

Add regression tests for the new copy_from semantics: zero-pad
narrower source, accept wider source with zero excess, throw on
wider source with set bits above destination width.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>

github-actions Bot commented May 2, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://unitaryfoundation.github.io/clifft/pr-preview/pr-52/

Built to branch gh-pages at 2026-05-03 12:06 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

bachase force-pushed the feat/issue-45-pr2-runtime-qubits branch from e354e0b to 139abbe on May 2, 2026 21:48
bachase added 4 commits May 3, 2026 11:39
feat: migrate Pauli mask storage to runtime-width arenas

Replace fixed-width BitMask<N> Pauli storage in HIR, NoiseChannel, and
ConstantPool with PauliMaskArena, indexed by opaque PauliMaskHandle.
HeisenbergOp shrinks from a width-dependent struct (32 B at N=64) to
a fixed 16 B layout (4-byte handle + 8-byte payload + 4-byte header),
pinned by static_assert. The frontend writes stim PauliString rows
directly into the arena slot via stim_to_mask_view, so qubits beyond
kMaxInlineQubits round-trip without truncation.

- HirModule(num_qubits, num_pauli_masks, num_noise_channels) pre-sizes
  both arenas at construction. The default constructor yields empty
  arenas. A counting pre-pass over the parsed Circuit produces the
  conservative upper bounds.
- Op factories (make_*) become private; HirModule exposes append_*
  builders that take MaskView pairs (for tests / callers with mask
  data in hand) and append_*_empty builders that claim a slot and
  return the op for the caller to fill via mask_at(op). The frontend
  uses the latter to avoid any fixed-width intermediate.
- NoiseChannel becomes {handle, prob}. HirModule and ConstantPool each
  own a noise_channel_masks arena. claim_noise_channel_mask /
  claim_empty_noise_channel_mask mirror the pauli_masks claim path.
- Module-bound accessors hir.destab_mask(op) / hir.stab_mask(op) /
  hir.sign(op) replace the old per-op methods. mask_at(op) returns a
  MutablePauliMaskView for in-place mutation. Optimizer passes that
  repurpose existing ops use new demote_to_tgate /
  demote_to_phase_rotation helpers that preserve the mask handle while
  updating type and payload.
- ConstantPool::pauli_masks and exp_val_masks become PauliMaskArena.
  lower() pre-counts CONDITIONAL_PAULI / EXP_VAL emissions and noise
  channel totals to size the arenas. SVM exec_apply_pauli, exec_noise,
  and exec_exp_val read masks via arena.at(handle); the per-frame
  bridge in apply_pauli_to_frame iterates min(num_words, kMaxInlineWords)
  to keep the still-inline SVM frame compatible (frame migrates in a
  later PR).
- VirtualFrame::map_pauli applies pending gates directly on the stim
  PauliString's u64 buffer as a runtime-width MaskView, with no
  PauliBitMask intermediate that would silently truncate at
  kMaxInlineQubits. map_noise_channel uses runtime-width scratch
  vectors for the same reason. apply_gate_to_pauli takes
  MutableMaskView so single-qubit operations read/write the correct
  word at any axis index.
- Optimizer commutation/peephole walk arena views via hir.* accessors;
  apply_virtual_s_downstream conjugates noise channel masks through
  hir.noise_channel_masks.mut_at.
- Introspection (format_pauli_mask, format_hir_op) takes an HirModule
  reference so it can resolve the mask handle.

Tests: rewrite test_hir.cc, test_optimizer.cc, test_backend.cc,
test_frontend.cc, test_svm.cc to use HirModule builders + module-bound
accessors. test_helpers.h gains a 1-word MaskBuf and width-tolerant
operator==(MaskView, BitMask<N>) / operator==(MaskView, uint64_t) to
keep test idioms ergonomic.

Add high-qubit regression tests in test_frontend.cc:
  - n=200, T after H on q150 -- destab bit 150 must be set.
  - parametrized walk at n=70 (word-boundary), n=150 (above old fixed
    width), n=200, n=513 (multi-word case).
These would have caught silent frontend truncation before this commit
and lock down the round-trip going forward.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
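
As a concrete illustration of the pinned layout, a sketch with assumed field names and ordering; only the 4 + 4 + 8 byte budget and the static_assert come from the commit:

```cpp
#include <cstdint>

// Hypothetical layout sketch, not the real HeisenbergOp definition.
struct HeisenbergOpSketch {
  std::uint32_t header;       // 4-byte type/flags header
  std::uint32_t mask_handle;  // 4-byte handle into the module's Pauli arena
  std::uint64_t payload;      // 8-byte op-specific payload
};
static_assert(sizeof(HeisenbergOpSketch) == 16,
              "op must stay pinned at a fixed 16 bytes");
```
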
feat(frontend): reject circuits above the VM axis ceiling early

trace() now refuses num_qubits > 65536 instead of letting Stim's
TableauSimulator allocate O(n^2) bits of tableau before lower() rejects.
The check matches lower()'s ceiling exactly so both ends agree.

Add a test asserting trace() throws at n = 65537. The matching lower()
test already feeds an empty HIR with the high qubit count to keep
itself fast; that's still a valid contract test for lower() in
isolation.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
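
The shape of that early gate, sketched under assumed names; only the 65536 ceiling comes from the commit, and `kVmAxisCeiling` plus the exception type are illustrative:

```cpp
#include <cstdint>
#include <stdexcept>

constexpr std::uint64_t kVmAxisCeiling = 65536;

// Reject before any Stim allocation happens; the constant must match
// lower()'s ceiling exactly so both ends agree.
void check_trace_qubits(std::uint64_t num_qubits) {
  if (num_qubits > kVmAxisCeiling)
    throw std::invalid_argument(
        "num_qubits exceeds the VM axis ceiling; refusing to allocate "
        "an O(n^2)-bit tableau that lower() would reject anyway");
}
```
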
fix(python): keep HirModule alive through HeisenbergOp wrappers

PyHeisenbergOp now holds an nb::object module_owner that pins the
owning HirModule's Python refcount. Each wrapper handed out by
HirModule.__getitem__ or __iter__ captures nb::borrow(self) so the
wrapper-via-__iter__ pattern (where the iterator and the list of
items go out of scope while the user still holds an op) doesn't
dereference freed arena memory.

The wasm bindings switch to format_hir_op(op, hir) to match the
HirModule-bound formatter signature.

Add Python regression tests for both __iter__ and __getitem__ that
let the trace() result go out of scope, force a GC pass, then access
the wrapper. Without the lifetime fix these would crash on use of
freed memory.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
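
A sketch of the ownership pattern, assuming nanobind (matching the nb:: prefix above); `PyOpSketch` and the commented binding are illustrative, not the real PyHeisenbergOp:

```cpp
#include <cstddef>
#include <nanobind/nanobind.h>

namespace nb = nanobind;

// Hypothetical wrapper: pins the owning module's Python object so the arena
// memory the wrapper points into cannot be freed while the wrapper is alive.
struct PyOpSketch {
  nb::object module_owner;  // holds a reference to the HirModule Python object
  std::size_t index;        // which op in the module this wrapper refers to
};

// In the HirModule binding, each access path captures nb::borrow(self), e.g.:
//   .def("__getitem__", [](nb::handle self, std::size_t i) {
//     return PyOpSketch{nb::borrow(self), i};
//   });
```
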
chore(optimizer): clear noise_channel_masks in RemoveNoisePass

After the pass removes noise ops and clears noise_sites, the slots in
hir.noise_channel_masks are no longer reachable but remain allocated.
Replace the arena with an empty one so the pass leaves no dead arena
weight behind.

Add a CHECK in the existing strip test that noise_channel_masks.size()
is zero after the pass.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
bachase force-pushed the feat/issue-45-pr2-runtime-qubits branch from 139abbe to e4cf947 on May 3, 2026 11:40
For each mask-carrying op type (MEASURE, MPP, CONDITIONAL_PAULI, NOISE,
EXP_VAL), trace a minimal circuit that produces one such op touching
qubits at kMaxInlineQubits, assert trace() preserves the high-qubit
support in the HIR mask, and assert lower() rejects with the SVM-
frame-width error.

These both lock down the gate's per-op-type semantics today and
double as PR3 task stubs: when the SVM frame migrates to runtime-width
storage, each REQUIRE_THROWS_AS becomes a Stim-oracle equivalence
check.

Includes a cross-word case (MPP X63 * X128) that exercises the
multi-target build_pauli_string path; a fixed-width intermediate
would clip the high target.

Drops the surface d=11 r=11 benchmark added in the previous commit
of this branch -- with the gate at kMaxInlineQubits, n=274 can no
longer compile. The bench will be re-added in the migration PR that
lifts the gate.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
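
For illustration, the rough shape of one such gate test; written doctest-style to match REQUIRE_THROWS_AS, with `parse_circuit`, `ops()`, `bit()`, and the exception type all assumed:

```cpp
// Hypothetical test shape; helper names and the thrown type are assumed,
// not the project's real test API.
TEST_CASE("MPP across the inline-word boundary: trace keeps it, lower rejects") {
  auto hir = trace(parse_circuit("MPP X63*X128"));
  // trace() must preserve the high-qubit support in the HIR mask...
  CHECK(hir.destab_mask(hir.ops().back()).bit(128));
  // ...while lower() still rejects with the SVM-frame-width error (until PR3
  // migrates the frame, when this becomes a Stim-oracle equivalence check).
  REQUIRE_THROWS_AS(lower(hir), std::invalid_argument);
}
```
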
bachase force-pushed the feat/issue-45-pr2-runtime-qubits branch from e4cf947 to eeb5c81 on May 3, 2026 12:06