feat(backend, frontend, noise): migrate AOT-side Pauli mask storage to arenas #50
Closed
Note that resizing PauliMaskArena's storage vectors after construction would invalidate outstanding views, so the capacity is fixed by design. Any future need to grow post-construction must move to a stable-handle representation first. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
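A minimal sketch of why that invariant matters: a view borrows a raw pointer into the arena's backing vector, so any post-construction reallocation of that vector would dangle every outstanding view. The names below are illustrative, not the project's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A view is a borrowed (pointer, length) pair into arena storage.
struct MaskView {
    const uint64_t* words;  // points into PauliMaskArena::storage_
    size_t num_words;
};

class PauliMaskArena {
public:
    // Capacity is fixed at construction: one allocation, never grown.
    PauliMaskArena(size_t num_words, size_t capacity) : num_words_(num_words) {
        storage_.resize(num_words * capacity);
    }
    MaskView at(uint32_t handle) const {
        // Stable as long as storage_ is never resized; a push_back that
        // reallocated storage_ would invalidate every view handed out here.
        return {storage_.data() + handle * num_words_, num_words_};
    }

private:
    size_t num_words_;
    std::vector<uint64_t> storage_;
};
```

A stable-handle alternative (views that re-resolve through the arena on each access) would tolerate growth at the cost of an extra indirection per read, which is the upgrade path the comment flags.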
Move HIR Pauli mask storage off compile-time-fixed BitMask<N> inline
fields and onto a runtime-width PauliMaskArena owned by HirModule.
- HirModule(num_qubits, num_pauli_masks) pre-allocates the arena from a
  counting pre-pass over the parsed Circuit. The default constructor
  yields an empty (zero-width, zero-capacity) arena.
- HeisenbergOp shrinks from a width-dependent struct (32 B at N=64) to a
  fixed 16-byte layout: 4-byte PauliMaskHandle + 8-byte payload union +
  4-byte (type/flags/pad) header. New static_assert pins the size.
- Op factories (make_tgate, make_measure, ...) become private; HirModule
  exposes append_* builders that allocate an arena slot, copy the mask
  data, append the op, and return a reference. Optimizer passes that
  repurpose existing ops (PHASE_ROTATION -> T_GATE etc.) use new module
  helpers (demote_to_tgate, demote_to_phase_rotation) that preserve the
  mask handle while updating type/payload.
- Module-bound accessors hir.destab_mask(op) / hir.stab_mask(op) /
  hir.sign(op) replace the old op.destab_mask() / op.stab_mask() /
  op.sign() methods. mask_at(op) returns a MutablePauliMaskView for
  in-place mutation. NoiseChannel still holds inline PauliBitMask; it
  migrates with ConstantPool in a follow-up.
- Pivot helpers and conjugation primitives now operate on MaskView,
  using the existing anti_commute / lowest_bit_at_or_above primitives.
- Test helpers gain a MaskBuf rvalue type that wraps a uint64_t in a
  PauliBitMask and converts to MaskView, plus a uint64_t comparison
  operator on MaskView so existing assertion idioms keep working.
The frontend still rejects num_qubits > kMaxInlineQubits; that ceiling
moves to the backend in a follow-up. ConstantPool and SVM read paths
are unchanged in this commit.
Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
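An illustrative reconstruction of the fixed 16-byte layout described above (handle + payload union + type/flags/pad header, pinned by a static_assert). Field names are guesses for the sketch, not the project's actual definitions.

```cpp
#include <cstdint>

using PauliMaskHandle = uint32_t;  // slot index into HirModule's arena

struct HeisenbergOp {
    PauliMaskHandle mask;  // 4 bytes: replaces the inline width-dependent BitMask<N>
    uint8_t type;          // 1 byte: op kind (T_GATE, MEASURE, ...)
    uint8_t flags;         // 1 byte
    uint16_t pad;          // 2 bytes: explicit padding, keeps the layout stable
    union {                // 8 bytes: op-specific payload
        double angle;      // e.g. PHASE_ROTATION
        uint64_t bits;     // e.g. classical-record operand
    } payload;
};

// The whole point of the migration: op size no longer scales with mask width.
static_assert(sizeof(HeisenbergOp) == 16, "HeisenbergOp layout must stay 16 bytes");
```

At the old N=64 inline width the op was 32 bytes; with inline masks the struct would keep growing with N, while this layout stays constant and moves all width-dependent storage into the arena.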
Replace ConstantPool::pauli_masks and exp_val_masks (vector<PauliMask>
of inline BitMask<N>) with PauliMaskArena, indexed by handles cast from
the existing bytecode cp_mask_idx / cp_exp_val_idx fields.
- Drop the PauliMask struct from backend.h. Bytecode cp_mask_idx is
  reinterpreted as a PauliMaskHandle into the relevant arena.
- lower() pre-counts the number of CONDITIONAL_PAULI and EXP_VAL HIR
  ops and constructs the arenas with that capacity. Each emission
  claims the next slot in order.
- exec_apply_pauli and exec_exp_val read masks via arena.at(handle)
  and operate on MaskView. apply_pauli_to_frame is reshaped to take
  MaskView and bridges into the still-inline SchrodingerState::p_x /
  p_z via per-word XOR/popcount loops (the SVM frame migrates in a
  later PR; the bridge is sized by the arena's num_words).
- NoiseChannel still uses inline PauliBitMask in this commit; the
  exec_noise call site adapts via view(ch.destab_mask). The full
  NoiseChannel migration follows in a separate commit.
EXP_VAL throughput regresses ~20% on the bench at this commit because
the previously fully-unrolled BitMask<128> popcount is now a runtime-
bounded loop and the (X, Z, sign) tuple is split across three storage
regions. The migration plan addresses this in a follow-up PR via
template specialization on common num_words values.
Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
Replace NoiseChannel's inline destab_mask / stab_mask fields with a
PauliMaskHandle into an arena. Both HirModule and ConstantPool now
own a noise_channel_masks PauliMaskArena that backs all noise channel
masks they reference; the NoiseChannel struct itself shrinks to
{handle, prob}.
- HirModule constructor takes an optional num_noise_channels capacity
and constructs a sized noise_channel_masks arena. A counting pass
in the front-end produces a conservative upper bound.
- ConstantPool gains a noise_channel_masks arena, sized in lower()
from sum(hir.noise_sites[i].channels.size()).
- claim_noise_channel_mask() returns a handle and writes (X, Z) into
the arena slot. Sign is unused for noise channels.
- VirtualFrame::map_noise_channel takes input/output MaskView pairs
and writes the virtual-frame-mapped Pauli into a destination view.
- exec_noise reads the channel mask through pool.noise_channel_masks
and feeds the views to apply_pauli_to_frame.
- Optimizer passes (commutation, peephole) walk channels via
hir.noise_channel_masks.{at,mut_at}.
- Tests construct test-local NoiseChannelMasks reference values and
compare arena views against PauliBitMask via a width-tolerant
operator== that pads the shorter side with zeros.
Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
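The width-tolerant comparison the tests rely on can be sketched as follows: two word spans of different lengths compare equal iff they agree on the overlap and every extra word on the longer side is zero. This is a hypothetical standalone shape, not the actual operator== on MaskView/PauliBitMask.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Equality with implicit zero-padding of the shorter operand.
inline bool masks_equal(const std::vector<uint64_t>& a,
                        const std::vector<uint64_t>& b) {
    const size_t common = std::min(a.size(), b.size());
    for (size_t i = 0; i < common; ++i)
        if (a[i] != b[i]) return false;
    const std::vector<uint64_t>& longer = a.size() > b.size() ? a : b;
    for (size_t i = common; i < longer.size(); ++i)
        if (longer[i] != 0) return false;  // zero-pad semantics: tail must be empty
    return true;
}
```

This lets tests keep comparing a fixed-width PauliBitMask reference value against an arena view sized for the circuit's actual qubit count.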
trace() no longer rejects circuits whose num_qubits exceeds the old fixed mask width; HIR construction now supports any width. The 16-bit VM axis operands still cap bytecode emission, so lower() throws for num_qubits > 65536. Update test_frontend.cc to assert both behaviours. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
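The relocated ceiling check amounts to something like the sketch below: bytecode qubit-axis operands are 16-bit, so lowering rejects anything wider. The constant name and function shape are illustrative; the real check lives inside the project's lower().

```cpp
#include <cstdint>
#include <stdexcept>

// 2^16 addressable axes: the uint16_t bytecode operand ceiling.
constexpr uint64_t kVmAxisCeiling = 65536;

// HIR construction (trace) accepts any width; only lowering enforces this.
inline void check_lowering_width(uint64_t num_qubits) {
    if (num_qubits > kVmAxisCeiling)
        throw std::invalid_argument(
            "lower(): num_qubits exceeds the 16-bit VM axis operand ceiling");
}
```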
… mismatch in copy_from
Two CI failures from the previous commit:
1. Python/Wasm bindings used the removed no-arg accessors (op.sign(),
   format_pauli_mask(op), format_hir_op(op)), which now require a
   HirModule reference. Wrap HeisenbergOp in a small PyHeisenbergOp
   that pairs the op with its owning module; HirModule's __getitem__ /
   __iter__ / as_dict produce wrappers. The Python surface (op_type,
   sign, pauli_string, as_dict) is unchanged.
2. Debug builds aborted because BasicMaskView::copy_from asserted that
   source and destination have identical widths. The HIR builders write
   a kMaxInlineWords-wide PauliBitMask into an arena slot sized for the
   circuit's actual num_qubits, so the two widths legitimately differ
   for narrow circuits. Loosen copy_from to iterate the minimum size,
   zero-extend trailing destination words, and assert that any source
   words beyond the destination width are zero (catches data loss).
Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
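The loosened copy_from can be sketched as below: copy the overlap, zero-extend the destination's tail, and assert (debug-only) that narrowing never silently drops set bits. Names and the free-function shape are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Width-tolerant mask copy: source and destination word counts may differ.
inline void copy_from(uint64_t* dst, size_t dst_words,
                      const uint64_t* src, size_t src_words) {
    const size_t common = dst_words < src_words ? dst_words : src_words;
    std::memcpy(dst, src, common * sizeof(uint64_t));
    for (size_t i = common; i < dst_words; ++i)
        dst[i] = 0;  // destination wider: zero-extend
    for (size_t i = common; i < src_words; ++i)
        assert(src[i] == 0 && "copy_from would lose set bits");  // catches data loss
}
```

This matches the fix's intent: a kMaxInlineWords-wide builder mask legitimately lands in a narrower arena slot as long as its high words are all zero.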
The previous test called trace() with num_qubits=65537, which allocates a Stim TableauSimulator of that size before reaching lower()'s ceiling check. On the CI runner this OOMed/swapped past the 300s ctest timeout. Local runs were slower than I noticed because --reporter compact hid the per-test time. The ceiling lives in lower(), so the test should feed it an empty HIR with the high qubit count directly and skip trace() entirely. Now returns instantly. Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
Superseded by #52 — clean rewrite organized so each commit is correct on landing (no patches-on-patches), includes regression tests for the issues found in this PR's review, and adds a large-n benchmark fixture. Closing.
Summary
PR2 of the issue #45 staged migration. Moves all AOT-side Pauli mask storage off compile-time-fixed `BitMask<N>` and onto runtime-width `PauliMaskArena`, shrinks `HeisenbergOp` to a fixed 16 bytes, and relocates the qubit-count ceiling from the frontend to the backend. SVM frame storage (`p_x`/`p_z` in `SchrodingerState`) and `CLIFFT_MAX_QUBITS` itself are unchanged here; PR3 finishes that.

Five commits, each independently reviewable:

- `chore: document arena resize invalidation` — comment in `pauli_arena.h` flagging the fixed-capacity invariant and the upgrade path if a future pass needs to grow the arena post-construction.
- `feat(hir): migrate HIR Pauli mask storage to arena` — `HirModule` owns a sized `pauli_masks` arena. `HeisenbergOp` shrinks from a width-dependent struct (32 B at N=64) to a fixed 16 B (4-byte `PauliMaskHandle` + 8-byte payload + 4-byte type/flags/pad), with a new `static_assert`. Op factories become private; `HirModule` exposes `append_*` builders that allocate the slot, copy mask data, append the op, and return a reference. Module-bound accessors `hir.destab_mask(op)` / `hir.stab_mask(op)` / `hir.sign(op)` replace the old per-op methods. Optimizer passes that repurpose existing ops (PHASE_ROTATION → T_GATE) use new `demote_to_*` helpers that preserve the mask handle. `localize_pauli`'s helpers and conjugation primitives now operate on `MaskView`.
- `feat(backend): migrate ConstantPool Pauli mask storage to arena` — `ConstantPool::pauli_masks` and `exp_val_masks` become arenas, indexed by handles cast from the existing bytecode `cp_mask_idx` / `cp_exp_val_idx` fields. `lower()` pre-counts and allocates. SVM `apply_pauli_to_frame` and `exec_exp_val` read masks via `arena.at(handle)` and operate on `MaskView`. The SVM frame `p_x`/`p_z` is still inline `PauliBitMask` (PR3), so a per-word XOR/popcount bridge sits in `apply_pauli_to_frame`.
- `feat(noise): migrate NoiseChannel mask storage to arena` — `NoiseChannel` becomes `{handle, prob}`. `HirModule` and `ConstantPool` each own a `noise_channel_masks` arena. `VirtualFrame::map_noise_channel` takes input/output mask views.
- `feat(backend): move qubit ceiling check from frontend to lower()` — `trace()` accepts circuits at any width; `lower()` throws if `num_qubits > 65536` (the `uint16_t` bytecode-axis ceiling).

What this enables
After this PR, the AOT pipeline (parse → trace → lower) is correct and well-defined for any `num_qubits ≤ 65536`. Run-time execution still funnels through the inline-width SVM frame, so the user-visible feature (running circuits with `num_qubits > kMaxInlineQubits`) lands fully with PR3.

Test plan
- `uv tool run pre-commit run --all-files`
- `ctest --test-dir build/cmake` — 685 cases pass, all `[bench]` cases excluded.

Bench
vs PR0 baseline on this dev machine, 5–10 sample medians:
- QV-10 x100 shots
- cultivation-d5 x1000 shots
- surface-d7-r7 p=1e-3 x10000 shots
- surface-d5-r5 p=0.05 x10000 shots
- exp-val 20q 200 probes x100000 shots

Four of five benchmarks improved; the EXP_VAL bench regressed. The migration plan flagged this as expected: replacing the fully-unrolled `BitMask<128>` popcount/XOR with a runtime-bounded loop loses some of the compiler's auto-vectorization, and splitting the (X, Z, sign) tuple across three storage regions costs an extra cache line per probe. PR3 will add template specialization on common `num_words` values (1, 2, 4, 8) for the hot APPLY_PAULI / EXP_VAL paths, which should recover this.
Out of scope (PR3)

- `p_x`/`p_z` runtime sizing.
- `CLIFFT_MAX_QUBITS` and `BitMask<N>`.

🤖 Generated with Claude Code