feat: migrate AOT-side Pauli mask storage to runtime-width arenas (#52)
Note that resizing PauliMaskArena's storage vectors after construction would invalidate outstanding views, so the capacity is fixed by design. Any future need to grow post-construction must move to a stable-handle representation first. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
Make MutableMaskView::copy_from refuse to silently lose source bits:
the source must fit within the destination width. Narrower sources
are zero-padded; wider sources throw unless every excess word is
already zero. Callers that genuinely want to truncate must do so
explicitly via std::span (e.g.
dst.copy_from({other.words.first(dst.num_words())})).
This eliminates a Release-only footgun: the previous version asserted
in Debug but silently dropped high bits in Release. Production hot
paths always pass same-width spans, so the new check has zero cost
on the success path.
Add a 1-word MaskBuf helper in test_helpers.h with implicit
conversion to MaskView. Tests use it as a temporary mask source for
HirModule builders. Single-word coverage handles every existing
test pattern; tests with wider patterns (rare) can construct a
std::array<uint64_t, N> directly.
Add regression tests for the new copy_from semantics: zero-pad
narrower source, accept wider source with zero excess, throw on
wider source with set bits above destination width.
Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
Replace fixed-width BitMask<N> Pauli storage in HIR, NoiseChannel, and
ConstantPool with PauliMaskArena, indexed by opaque PauliMaskHandle.
HeisenbergOp shrinks from a width-dependent struct (32 B at N=64) to
a fixed 16 B layout (4-byte handle + 8-byte payload + 4-byte header),
pinned by static_assert. The frontend writes stim PauliString rows
directly into the arena slot via stim_to_mask_view, so qubits beyond
kMaxInlineQubits round-trip without truncation.
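The pinned 16 B layout can be sketched like this; field order and exact types beyond the stated 4-byte header + 4-byte handle + 8-byte payload split are assumptions:

```cpp
#include <cassert>
#include <cstdint>

using PauliMaskHandle = uint32_t;

// Hypothetical fixed layout: the op no longer inlines mask bits, so its
// size is independent of the qubit count.
struct HeisenbergOp {
    uint32_t header;        // 4-byte op type / flags
    PauliMaskHandle mask;   // 4-byte index into PauliMaskArena
    uint64_t payload;       // 8-byte op-specific immediate
};

// Pin the size so any accidental growth breaks the build.
static_assert(sizeof(HeisenbergOp) == 16, "HeisenbergOp must stay 16 bytes");
```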
- HirModule(num_qubits, num_pauli_masks, num_noise_channels) pre-sizes
both arenas at construction. The default constructor yields empty
arenas. A counting pre-pass over the parsed Circuit produces the
conservative upper bounds.
- Op factories (make_*) become private; HirModule exposes append_*
builders that take MaskView pairs (for tests / callers with mask
data in hand) and append_*_empty builders that claim a slot and
return the op for the caller to fill via mask_at(op). The frontend
uses the latter to avoid any fixed-width intermediate.
- NoiseChannel becomes {handle, prob}. HirModule and ConstantPool each
own a noise_channel_masks arena. claim_noise_channel_mask /
claim_empty_noise_channel_mask mirror the pauli_masks claim path.
- Module-bound accessors hir.destab_mask(op) / hir.stab_mask(op) /
hir.sign(op) replace the old per-op methods. mask_at(op) returns a
MutablePauliMaskView for in-place mutation. Optimizer passes that
repurpose existing ops use new demote_to_tgate /
demote_to_phase_rotation helpers that preserve the mask handle while
updating type and payload.
- ConstantPool::pauli_masks and exp_val_masks become PauliMaskArena.
lower() pre-counts CONDITIONAL_PAULI / EXP_VAL emissions and noise
channel totals to size the arenas. SVM exec_apply_pauli, exec_noise,
and exec_exp_val read masks via arena.at(handle); the per-frame
bridge in apply_pauli_to_frame iterates min(num_words, kMaxInlineWords)
to keep the still-inline SVM frame compatible (frame migrates in a
later PR).
- VirtualFrame::map_pauli applies pending gates directly on the stim
PauliString's u64 buffer as a runtime-width MaskView, with no
PauliBitMask intermediate that would silently truncate at
kMaxInlineQubits. map_noise_channel uses runtime-width scratch
vectors for the same reason. apply_gate_to_pauli takes
MutableMaskView so single-qubit operations read/write the correct
word at any axis index.
- Optimizer commutation/peephole walk arena views via hir.* accessors;
apply_virtual_s_downstream conjugates noise channel masks through
hir.noise_channel_masks.mut_at.
- Introspection (format_pauli_mask, format_hir_op) takes an HirModule
reference so it can resolve the mask handle.
Tests: rewrite test_hir.cc, test_optimizer.cc, test_backend.cc,
test_frontend.cc, test_svm.cc to use HirModule builders + module-bound
accessors. test_helpers.h gains a 1-word MaskBuf and width-tolerant
operator==(MaskView, BitMask<N>) / operator==(MaskView, uint64_t) to
keep test idioms ergonomic.
Add high-qubit regression tests in test_frontend.cc:
- n=200, T after H on q150 -- destab bit 150 must be set.
- parametrized walk at n=70 (word-boundary), n=150 (above old fixed
width), n=200, n=513 (multi-word case).
These would have caught silent frontend truncation before this commit
and lock down the round-trip going forward.
Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
trace() now refuses num_qubits > 65536 instead of letting Stim's TableauSimulator allocate O(n^2) bits of tableau before lower() rejects. The check matches lower()'s ceiling exactly so both ends agree. Add a test asserting trace() throws at n = 65537. The matching lower() test already feeds an empty HIR with the high qubit count to keep itself fast; that's still a valid contract test for lower() in isolation. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
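A sketch of the early ceiling check; the function name and exception type are assumed, while the 65536 ceiling is the value stated above:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// The VM axis ceiling shared by trace() and lower(), so both ends agree.
constexpr size_t kMaxQubits = 65536;

// Reject before any O(n^2) tableau is allocated, rather than letting
// Stim's TableauSimulator build the full tableau and lower() reject later.
void check_qubit_ceiling(size_t num_qubits) {
    if (num_qubits > kMaxQubits)
        throw std::invalid_argument("num_qubits exceeds the VM axis ceiling");
}
```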
PyHeisenbergOp now holds an nb::object module_owner that pins the owning HirModule's Python refcount. Each wrapper handed out by HirModule.__getitem__ or __iter__ captures nb::borrow(self) so the wrapper-via-__iter__ pattern (where the iterator and the list of items go out of scope while the user still holds an op) doesn't dereference freed arena memory. The wasm bindings switch to format_hir_op(op, hir) to match the HirModule-bound formatter signature. Add Python regression tests for both __iter__ and __getitem__ that let the trace() result go out of scope, force a GC pass, then access the wrapper. Without the lifetime fix these would crash on use of freed memory. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
After the pass removes noise ops and clears noise_sites, the slots in hir.noise_channel_masks are no longer reachable but remain allocated. Replace the arena with an empty one so the pass leaves no dead arena weight behind. Add a CHECK in the existing strip test that noise_channel_masks.size() is zero after the pass. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
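The replace-with-empty idiom works because assigning a default-constructed arena actually releases the old allocation, unlike clear(), which keeps capacity. A minimal sketch with an assumed vector-backed arena:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed stand-in for the arena's storage.
struct Arena { std::vector<uint64_t> storage; };

// After the pass removes all noise ops, no handle can reach these slots,
// so swap in a fresh empty arena to drop the dead weight entirely.
void strip_noise_masks(Arena& noise_channel_masks) {
    noise_channel_masks = Arena{};
}
```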
For each mask-carrying op type (MEASURE, MPP, CONDITIONAL_PAULI, NOISE, EXP_VAL), trace a minimal circuit that produces one such op touching qubits at kMaxInlineQubits, assert trace() preserves the high-qubit support in the HIR mask, and assert lower() rejects with the SVM-frame-width error. These both lock down the gate's per-op-type semantics today and double as PR3 task stubs: when the SVM frame migrates to runtime-width storage, each REQUIRE_THROWS_AS becomes a Stim-oracle equivalence check. Includes a cross-word case (MPP X63 * X128) that exercises the multi-target build_pauli_string path; a fixed-width intermediate would clip the high target. Drops the surface d=11 r=11 benchmark added in the previous commit of this branch -- with the gate at kMaxInlineQubits, n=274 can no longer compile. The bench will be re-added in the migration PR that lifts the gate. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
Summary
PR2 of the issue #45 staged migration. Replaces fixed-width BitMask<N> Pauli storage in HIR, NoiseChannel, and ConstantPool with runtime-width PauliMaskArenas, and shrinks HeisenbergOp to a fixed 16 bytes. The frontend writes stim PauliString rows directly into arena slots via stim_to_mask_view, so qubits beyond kMaxInlineQubits round-trip without truncation.

This is a clean rewrite of the work in #50, organized so each commit is correct on landing — no patches-on-patches. #50 will be closed in favor of this.
Commit summary
- chore: document arena resize invalidation — comment in pauli_arena.h.
- feat(util): tighten copy_from semantics, add MaskBuf test helper — copy_from throws (not asserts) on truncation; production same-size case has zero overhead. 1-word MaskBuf test helper.
- feat: migrate Pauli mask storage to runtime-width arenas — the big one. HirModule + ConstantPool arenas, NoiseChannel becomes {handle, prob}, frontend writes stim → arena directly with no PauliBitMask intermediate, optimizer/backend/SVM read masks via arena, HeisenbergOp shrinks to 16 B. Includes high-qubit regression tests at n = 70, 150, 200, 513.
- feat(frontend): reject circuits above the VM axis ceiling early — trace() rejects n > 65536 so we don't allocate a giant Stim TableauSimulator before lower() would reject anyway.
- fix(python): keep HirModule alive through HeisenbergOp wrappers — PyHeisenbergOp holds nb::object module_owner; both __getitem__ and __iter__ capture nb::borrow(self). Includes Python regression tests for both.
- chore(optimizer): clear noise_channel_masks in RemoveNoisePass — replace the arena with an empty one after stripping noise.
- test: add surface d=11 r=11 benchmark above the old 128-qubit cap — n ≈ 274, exercises the regime the migration unlocks.

Test plan
- uv tool run pre-commit run --all-files
- ctest --test-dir build/cmake -E Bench --timeout 60 — 691 cases pass, full suite in ~3 s
- uv run pytest tests/python/ — 597 cases pass

Bench (Release, this dev machine)
- QV-10 x100 shots
- cultivation-d5 x1000 shots
- surface-d7-r7 p=1e-3 x10000 shots
- surface-d5-r5 p=0.05 x10000 shots
- exp-val 20q 200 probes x100000 shots
- surface-d11-r11 p=1e-3 x1000 shots

EXP_VAL is the only meaningful regression. As flagged in the migration plan: the previously fully-unrolled BitMask<128> popcount/XOR is now a runtime-bounded loop, and the (X, Z, sign) tuple is split across three storage regions. PR3 will add template specialization on common num_words values for the hot APPLY_PAULI/EXP_VAL paths to recover this.

What's still pending (PR3)
- p_x/p_z runtime sizing.
- CLIFFT_MAX_QUBITS and BitMask<N>.
- num_words for hot mask widths.

🤖 Generated with Claude Code