
feat(backend, frontend, noise): migrate AOT-side Pauli mask storage to arenas #50

Closed
bachase wants to merge 7 commits into main from feat/issue-45-pr2-hir-arena-storage

Conversation

@bachase
Contributor

@bachase bachase commented May 1, 2026

Summary

PR2 of the issue #45 staged migration. Moves all AOT-side Pauli mask storage off compile-time-fixed BitMask<N> and onto runtime-width PauliMaskArena, shrinks HeisenbergOp to a fixed 16 bytes, and relocates the qubit-count ceiling from the frontend to the backend. SVM frame storage (p_x/p_z in SchrodingerState) and CLIFFT_MAX_QUBITS itself are unchanged here; PR3 finishes that.

Five commits, each independently reviewable:

  1. chore: document arena resize invalidation — comment in pauli_arena.h flagging the fixed-capacity invariant and the upgrade path if a future pass needs to grow the arena post-construction.
  2. feat(hir): migrate HIR Pauli mask storage to arena — HirModule owns a sized pauli_masks arena. HeisenbergOp shrinks from a width-dependent struct (32 B at N=64) to a fixed 16 B (4-byte PauliMaskHandle + 8-byte payload + 4-byte type/flags/pad), with a new static_assert; a layout sketch follows this list. Op factories become private; HirModule exposes append_* builders that allocate the slot, copy mask data, append the op, and return a reference. Module-bound accessors hir.destab_mask(op) / hir.stab_mask(op) / hir.sign(op) replace the old per-op methods. Optimizer passes that repurpose existing ops (PHASE_ROTATION → T_GATE) use new demote_to_* helpers that preserve the mask handle. localize_pauli's helpers and conjugation primitives now operate on MaskView.
  3. feat(backend): migrate ConstantPool Pauli mask storage to arena — ConstantPool::pauli_masks and exp_val_masks become arenas, indexed by handles cast from the existing bytecode cp_mask_idx / cp_exp_val_idx fields. lower() pre-counts and allocates. SVM apply_pauli_to_frame and exec_exp_val read masks via arena.at(handle) and operate on MaskView. The SVM frame p_x/p_z is still inline PauliBitMask (PR3), so a per-word XOR/popcount bridge sits in apply_pauli_to_frame.
  4. feat(noise): migrate NoiseChannel mask storage to arena — NoiseChannel becomes {handle, prob}. HirModule and ConstantPool each own a noise_channel_masks arena. VirtualFrame::map_noise_channel takes input/output mask views.
  5. feat(backend): move qubit ceiling check from frontend to lower() — trace() accepts circuits at any width; lower() throws if num_qubits > 65536 (the uint16_t bytecode-axis ceiling).
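
As a concrete reference for commit 2's layout claim, here is a minimal sketch. PauliMaskHandle as a plain 32-bit index and every field name beyond it are illustrative assumptions, not the project's actual definitions; the payload union leads the struct so its 8-byte alignment adds no padding:

```cpp
// Illustrative 16-byte op layout per commit 2. Field names and payload
// contents are assumptions; only the sizes come from the PR description.
#include <cstdint>

using PauliMaskHandle = std::uint32_t;  // 4-byte index into a PauliMaskArena

struct HeisenbergOp {
    union {                    // 8 B op-specific payload (placed first so the
        double        angle;   // 8-byte-aligned union adds no padding)
        std::uint64_t bits;
    } payload;
    PauliMaskHandle mask;      // 4 B handle into HirModule's pauli_masks
    std::uint8_t    type;      // 1 B op kind (T_GATE, PHASE_ROTATION, ...)
    std::uint8_t    flags;     // 1 B
    std::uint8_t    pad[2];    // 2 B explicit padding
};

static_assert(sizeof(HeisenbergOp) == 16, "HeisenbergOp must stay 16 bytes");
```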

What this enables

After this PR, the AOT pipeline (parse → trace → lower) is correct and well-defined for any num_qubits ≤ 65536. Run-time execution still funnels through the inline-width SVM frame, so the user-visible feature (running circuits with num_qubits > kMaxInlineQubits) lands fully with PR3.

Test plan

  • uv tool run pre-commit run --all-files
  • ctest --test-dir build/cmake — 685 cases pass, all [bench] cases excluded.
  • Local Release benches: see "Bench" below.

Bench

vs PR0 baseline on this dev machine, 5–10 sample medians:

| Benchmark | PR0 baseline | PR2 (this branch) | Δ |
| --- | --- | --- | --- |
| QV-10 x100 shots | 24.74 ms | 22.8 ms | -7.9% |
| cultivation-d5 x1000 shots | 38.95 ms | 39.5 ms | +1.4% |
| surface-d7-r7 p=1e-3 x10000 shots | 71.74 ms | 69.3 ms | -3.4% |
| surface-d5-r5 p=0.05 x10000 shots | 81.68 ms | 80.1 ms | -1.9% |
| exp-val 20q 200 probes x100000 shots | 138.59 ms | ~177 ms | +28% |

Three of the five benchmarks improved, cultivation-d5 was roughly flat (+1.4%), and the EXP_VAL bench regressed. The migration plan flagged this as expected: replacing the fully-unrolled BitMask<128> popcount/XOR with a runtime-bounded loop loses some of the compiler's auto-vectorization, and splitting the (X, Z, sign) tuple across three storage regions costs an extra cache line per probe. PR3 will add template specialization on common num_words values (1, 2, 4, 8) for the hot APPLY_PAULI / EXP_VAL paths, which should recover this.
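
For illustration, a sketch of what that specialization could look like, assuming the mask data is exposed as raw 64-bit words (all names here are hypothetical):

```cpp
// Hypothetical PR3 shape: dispatch once on num_words so the fixed-width
// bodies have a compile-time trip count the compiler can unroll/vectorize.
#include <bit>
#include <cstddef>
#include <cstdint>

template <std::size_t NumWords>
int parity_fixed(const std::uint64_t* x, const std::uint64_t* z) {
    int pops = 0;
    for (std::size_t w = 0; w < NumWords; ++w)  // trip count known: unrollable
        pops += std::popcount(x[w] & z[w]);
    return pops & 1;
}

int parity_dyn(const std::uint64_t* x, const std::uint64_t* z, std::size_t n) {
    switch (n) {                       // dispatch once per probe, not per word
        case 1: return parity_fixed<1>(x, z);
        case 2: return parity_fixed<2>(x, z);
        case 4: return parity_fixed<4>(x, z);
        case 8: return parity_fixed<8>(x, z);
        default: {                     // runtime-bounded fallback path
            int pops = 0;
            for (std::size_t w = 0; w < n; ++w)
                pops += std::popcount(x[w] & z[w]);
            return pops & 1;
        }
    }
}
```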

Out of scope (PR3)

  • SVM frame p_x / p_z runtime sizing.
  • Removal of CLIFFT_MAX_QUBITS and BitMask<N>.
  • Template specialization for hot mask widths.

🤖 Generated with Claude Code

bachase added 6 commits May 1, 2026 19:35
Note that resizing PauliMaskArena's storage vectors after construction
would invalidate outstanding views, so the capacity is fixed by design.
Any future need to grow post-construction must move to a stable-handle
representation first.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
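
For reference, a minimal sketch of an arena honoring that invariant, with assumed names: storage is sized once at construction and claim() only advances a cursor, so views returned by at() remain valid for the arena's lifetime.

```cpp
// Illustrative fixed-capacity arena; all names are assumptions. The vectors
// are allocated once in the constructor and never resized, so no view handed
// out by at() is ever invalidated.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using PauliMaskHandle = std::uint32_t;

struct MaskView {
    const std::uint64_t* x;
    const std::uint64_t* z;
    std::size_t num_words;
};

class PauliMaskArena {
public:
    PauliMaskArena(std::size_t num_qubits, std::size_t capacity)
        : num_words_((num_qubits + 63) / 64),
          x_(num_words_ * capacity, 0), z_(num_words_ * capacity, 0),
          capacity_(capacity) {}

    PauliMaskHandle claim() {           // bumps a cursor; never reallocates
        assert(size_ < capacity_ && "arena capacity is fixed by design");
        return static_cast<PauliMaskHandle>(size_++);
    }

    MaskView at(PauliMaskHandle h) const {
        return {x_.data() + h * num_words_, z_.data() + h * num_words_,
                num_words_};
    }

private:
    std::size_t num_words_;
    std::vector<std::uint64_t> x_, z_;  // sized once in the constructor
    std::size_t capacity_, size_ = 0;
};
```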
Move HIR Pauli mask storage off compile-time-fixed BitMask<N> inline
fields and onto a runtime-width PauliMaskArena owned by HirModule.

- HirModule(num_qubits, num_pauli_masks) pre-allocates the arena from
  a counting pre-pass over the parsed Circuit. The default constructor
  yields an empty (zero-width, zero-capacity) arena.
- HeisenbergOp shrinks from a width-dependent struct (32B at N=64) to
  a fixed 16-byte layout: 4-byte PauliMaskHandle + 8-byte payload union
  + 4-byte (type/flags/pad) header. New static_assert pins the size.
- Op factories (make_tgate, make_measure, ...) become private; HirModule
  exposes append_* builders that allocate an arena slot, copy the mask
  data, append the op, and return a reference. Optimizer passes that
  repurpose existing ops (PHASE_ROTATION -> T_GATE etc.) use new
  module helpers (demote_to_tgate, demote_to_phase_rotation) that
  preserve the mask handle while updating type/payload.
- Module-bound accessors hir.destab_mask(op) / hir.stab_mask(op) /
  hir.sign(op) replace the old op.destab_mask() / op.stab_mask() /
  op.sign() methods. mask_at(op) returns a MutablePauliMaskView for
  in-place mutation. NoiseChannel still holds inline PauliBitMask;
  it migrates with ConstantPool in a follow-up.
- Pivot helpers and conjugation primitives now operate on MaskView,
  using the existing anti_commute / lowest_bit_at_or_above primitives.
- Test helpers gain a MaskBuf rvalue type that wraps a uint64_t in a
  PauliBitMask and converts to MaskView, plus a uint64_t comparison
  operator on MaskView so existing assertion idioms keep working.

The frontend still rejects num_qubits > kMaxInlineQubits; that ceiling
moves to the backend in a follow-up. ConstantPool and SVM read paths
are unchanged in this commit.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
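
A sketch of the builder/accessor shape described above, reusing the hypothetical arena types from the previous sketch; these signatures are assumptions, not the actual headers:

```cpp
// Hypothetical shape of an append_* builder and a module-bound accessor.
// HirModule, MaskView, mut_at, and make_tgate are assumed definitions.
HeisenbergOp& HirModule::append_tgate(MaskView destab, MaskView stab) {
    PauliMaskHandle h = pauli_masks.claim();       // allocate the arena slot
    pauli_masks.mut_at(h).copy_from(destab, stab); // copy caller's mask data
    ops.push_back(make_tgate(h));                  // private op factory
    return ops.back();                             // reference into the module
}

// Module-bound accessor replacing the old op.destab_mask():
MaskView HirModule::destab_mask(const HeisenbergOp& op) const {
    return pauli_masks.at(op.mask);
}
```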
Replace ConstantPool::pauli_masks and exp_val_masks (vector<PauliMask>
of inline BitMask<N>) with PauliMaskArena, indexed by handles cast
from the existing bytecode cp_mask_idx / cp_exp_val_idx fields.

- Drop the PauliMask struct from backend.h. Bytecode cp_mask_idx is
  reinterpreted as a PauliMaskHandle into the relevant arena.
- lower() pre-counts the number of CONDITIONAL_PAULI and EXP_VAL HIR
  ops and constructs the arenas with that capacity. Each emission
  claims the next slot in order.
- exec_apply_pauli and exec_exp_val read masks via arena.at(handle)
  and operate on MaskView. apply_pauli_to_frame is reshaped to take
  MaskView and bridges into the still-inline SchrodingerState::p_x /
  p_z via per-word XOR/popcount loops (the SVM frame migrates in a
  later PR; the bridge is sized by the arena's num_words).
- NoiseChannel still uses inline PauliBitMask in this commit; the
  exec_noise call site adapts via view(ch.destab_mask). The full
  NoiseChannel migration follows in a separate commit.

EXP_VAL throughput regresses ~20% on the bench at this commit because
the previously fully-unrolled BitMask<128> popcount is now a runtime-
bounded loop and the (X, Z, sign) tuple is split across three storage
regions. The migration plan addresses this in a follow-up PR via
template specialization on common num_words values.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
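
A hypothetical rendering of the bridge described above; the frame field names and the exact sign bookkeeping are assumptions:

```cpp
// Hypothetical per-word bridge: a runtime-width MaskView XORed into the
// still-inline p_x/p_z words, with a popcount feeding the sign. The sign
// rule shown here is an illustrative assumption, not the project's formula.
void apply_pauli_to_frame(SchrodingerState& state, MaskView m, unsigned sign) {
    unsigned anti = 0;
    for (std::size_t w = 0; w < m.num_words; ++w) {         // arena-sized loop
        anti += std::popcount(state.p_x.words[w] & m.z[w]); // anticommutation
        state.p_x.words[w] ^= m.x[w];                       // XOR bridge into
        state.p_z.words[w] ^= m.z[w];                       // inline storage
    }
    state.sign ^= (sign ^ anti) & 1u;                       // parity flip
}
```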
Replace NoiseChannel's inline destab_mask / stab_mask fields with a
PauliMaskHandle into an arena. Both HirModule and ConstantPool now
own a noise_channel_masks PauliMaskArena that backs all noise channel
masks they reference; the NoiseChannel struct itself shrinks to
{handle, prob}.

- HirModule constructor takes an optional num_noise_channels capacity
  and constructs a sized noise_channel_masks arena. A counting pass
  in the front-end produces a conservative upper bound.
- ConstantPool gains a noise_channel_masks arena, sized in lower()
  from sum(hir.noise_sites[i].channels.size()).
- claim_noise_channel_mask() returns a handle and writes (X, Z) into
  the arena slot. Sign is unused for noise channels.
- VirtualFrame::map_noise_channel takes input/output MaskView pairs
  and writes the virtual-frame-mapped Pauli into a destination view.
- exec_noise reads the channel mask through pool.noise_channel_masks
  and feeds the views to apply_pauli_to_frame.
- Optimizer passes (commutation, peephole) walk channels via
  hir.noise_channel_masks.{at,mut_at}.
- Tests construct test-local NoiseChannelMasks reference values and
  compare arena views against PauliBitMask via a width-tolerant
  operator== that pads the shorter side with zeros.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
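
The shrunken record and claim helper, sketched under assumed names (only {handle, prob} and the unused sign are stated above):

```cpp
// Hypothetical shape of the shrunken channel record and the claim helper.
// write_xz and the parameter split are illustrative assumptions.
struct NoiseChannel {
    PauliMaskHandle handle;  // into the owning noise_channel_masks arena
    double          prob;
};

PauliMaskHandle HirModule::claim_noise_channel_mask(MaskView x, MaskView z) {
    PauliMaskHandle h = noise_channel_masks.claim();
    noise_channel_masks.mut_at(h).write_xz(x, z);  // writes (X, Z); sign unused
    return h;
}
```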
trace() no longer rejects circuits whose num_qubits exceeds the old
fixed mask width; HIR construction now supports any width. The 16-bit
VM axis operands still cap bytecode emission, so lower() throws for
num_qubits > 65536. Update test_frontend.cc to assert both behaviours.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
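
The relocated check, sketched; the helper name and error text are assumptions, while the 65536 ceiling and the uint16_t rationale come from the commit:

```cpp
// Illustrative ceiling check as lower() would apply it: 16-bit axis operands
// address 65536 qubits (0..65535), so anything above that cannot be emitted.
#include <cstdint>
#include <limits>
#include <stdexcept>

void check_qubit_ceiling(std::size_t num_qubits) {
    constexpr std::size_t kAxisCeiling =
        std::size_t{std::numeric_limits<std::uint16_t>::max()} + 1;  // 65536
    if (num_qubits > kAxisCeiling)
        throw std::runtime_error(
            "lower(): num_qubits exceeds the uint16_t bytecode-axis ceiling");
}
```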
… mismatch in copy_from

Two CI failures from the previous commit:

1. Python/Wasm bindings used the removed no-arg accessors
   (op.sign(), format_pauli_mask(op), format_hir_op(op)) which now
   require a HirModule reference. Wrap HeisenbergOp in a small
   PyHeisenbergOp that pairs the op with its owning module; HirModule's
   __getitem__ / __iter__ / as_dict produce wrappers. The Python
   surface (op_type, sign, pauli_string, as_dict) is unchanged.
2. Debug builds aborted because BasicMaskView::copy_from asserted that
   source and destination have identical widths. The HIR builders
   write a kMaxInlineWords-wide PauliBitMask into an arena slot sized
   for the circuit's actual num_qubits, so the two widths legitimately
   differ for narrow circuits. Loosen copy_from to iterate min size,
   zero-extend trailing destination words, and assert that any source
   words beyond the destination width are zero (catches data loss).

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
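
A sketch of the loosened copy_from described in point 2, assuming a flat words array on both view types:

```cpp
// Hypothetical shape of the width-tolerant copy_from: iterate the common
// width, zero-extend a wider destination, and assert that any dropped
// source words are zero. Member layout is an assumption for illustration.
#include <algorithm>
#include <cassert>
#include <cstddef>

void BasicMaskView::copy_from(MaskView src) {
    const std::size_t common = std::min(num_words, src.num_words);
    for (std::size_t w = 0; w < common; ++w)
        words[w] = src.words[w];            // copy the overlapping words
    for (std::size_t w = common; w < num_words; ++w)
        words[w] = 0;                       // zero-extend trailing dest words
    for (std::size_t w = common; w < src.num_words; ++w)
        assert(src.words[w] == 0 &&
               "source words beyond destination width must be zero");
}
```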

The previous test called trace() with num_qubits=65537, which allocates
a Stim TableauSimulator of that size before reaching lower()'s
ceiling check. On the CI runner this OOMed/swapped past the 300s
ctest timeout. Local runs were slow too, but --reporter compact hid
the per-test time, so the problem went unnoticed.

The ceiling lives in lower(), so the test should feed it an empty HIR
with the high qubit count directly and skip trace() entirely. Now
returns instantly.

Assisted-by: Claude (Opus 4.7) <noreply@anthropic.com>
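
The fixed test shape, sketched assuming a Catch2-style macro (the [bench] tags and --reporter compact above suggest Catch2) and the sized HirModule constructor from this PR:

```cpp
// Illustrative test shape: build an empty HIR at the offending width and hit
// lower()'s ceiling directly. No TableauSimulator is ever allocated, so the
// test returns instantly. Macro and constructor signatures are assumptions.
TEST_CASE("lower() rejects num_qubits above the bytecode-axis ceiling") {
    HirModule hir(/*num_qubits=*/65537, /*num_pauli_masks=*/0);
    REQUIRE_THROWS(lower(hir));  // throws without ever calling trace()
}
```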
@bachase
Contributor Author

bachase commented May 2, 2026

Superseded by #52 — clean rewrite organized so each commit is correct on landing (no patches-on-patches), includes regression tests for the issues found in this PR's review, and adds a large-n benchmark fixture. Closing.

@bachase bachase closed this May 2, 2026
