Improve commutation checking of Pauli product rotations and measurements #15815

Merged
Cryoris merged 28 commits into Qiskit:main from alexanderivrii:opt_ppr_ppm_in_cc on Apr 23, 2026

@alexanderivrii
Member

@alexanderivrii alexanderivrii commented Mar 16, 2026

Summary

This PR improves commutation checking for pairs of Pauli-based objects, that is, PauliProductRotationGate and PauliProductMeasurement. Without this PR, we construct the generators of PPRs and PPMs as SparseObservables and then check whether the two SparseObservables commute. With this PR, we instead construct the generators as Paulis (represented using Z and X components) and check whether the two Paulis commute. The latter check is quite a bit faster than the former for large gates.
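
For illustration, the Pauli-level check can be sketched in a few lines of Python. This is not the Rust code from this PR; it only shows the standard rule that two Paulis commute exactly when the number of qubits on which their single-qubit factors anticommute is even. The names `paulis_commute` and `to_zx` are made up for this sketch.

```python
def to_zx(label):
    """Convert a Pauli label such as "XYZI" into (z, x) bit lists.

    Hypothetical helper: Z sets the z bit, X sets the x bit, Y sets both.
    """
    z = [int(ch in "ZY") for ch in label]
    x = [int(ch in "XY") for ch in label]
    return z, x


def paulis_commute(z1, x1, z2, x2):
    """Return True if two Paulis given by their Z/X bit lists commute.

    On each qubit the factors anticommute iff z1*x2 + x1*z2 is odd.
    The full Paulis commute iff that happens on an even number of qubits.
    """
    parity = 0
    for a, b, c, d in zip(z1, x1, z2, x2):
        parity ^= (a & d) ^ (b & c)
    return parity == 0


print(paulis_commute(*to_zx("XX"), *to_zx("ZZ")))  # two anticommuting positions
print(paulis_commute(*to_zx("XI"), *to_zx("ZI")))  # one anticommuting position
```

Both labels are read in the same order, so the qubit-ordering convention of the label does not affect the result.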

Based on top of #15810.

Details and comments

I ran this on the 100 representative HamLib benchmarks from benchpress using the following script:

import json
import time

with open("./100_representative.json", "r") as f:
    ham_records = json.load(f)

for i, h in enumerate(ham_records):
    nq = h.pop("ham_qubits")
    terms = h.pop("ham_hamlib_hamiltonian_terms")
    coefficients = h.pop("ham_hamlib_hamiltonian_coefficients")

    # Construct circuit from PPRs
    qc = QuantumCircuit(nq)
    for t, c in zip(terms, coefficients):
        ppr = PauliProductRotationGate(Pauli(t), c)
        qc.append(ppr, range(nq))

    # Convert to DAG
    dag = circuit_to_dag(qc)

    # Run commutative optimization on DAG, measuring the time
    time_start = time.perf_counter()
    dagt = CommutativeOptimization().run(dag)
    time_end = time.perf_counter()
    print(f"Test {i}: {time_end-time_start}")

Here is the plot showing the improvement in runtime (where the commutation checker is used within CommutativeOptimization).

[Plot: runtime improvement]

Note that CommutativeOptimization does not actually improve the quality of any of these HamLib benchmarks; however, it tends to make a huge number of commutativity checks.

Here is an additional experiment in the spirit of our FT compiler pipeline (in which case CommutativeOptimization can remove/merge many gates).

qc = qft_circuit(1000)
qc = transpile(qc, basis_gates=get_clifford_gate_names()+["rz"])
qc = LitinskiTransformation(fix_clifford=False, use_ppr=True)(qc)
time_start = time.perf_counter()
qc = CommutativeOptimization()(qc)
time_end = time.perf_counter()
print(f"Time: {time_end-time_start:.4f}")

Without this PR, the average time for CommutativeOptimization is 3.46 seconds, with this PR it's 0.29.

LLM tools used

Used copilot to suggest various micro-optimizations and alternative implementations, but in the end the implementation and all possible bugs are purely mine.

Additional optimization (as part of CommutativeOptimization)

In the CommutativeOptimization pass we now sort qargs for PauliProductRotationGate and PauliProductMeasurement by qubit index. This allows an even more efficient implementation using the standard method to compute the intersection of two sorted vectors.
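
As a rough illustration of the sorted-vector intersection, here is a two-pointer sketch in Python. It assumes each operation is given as an ascending list of qubit indices plus a string with one Pauli letter per listed qubit; this representation and the name `commute_sorted` are assumptions made for the example, not the data structures used in the Rust code.

```python
def commute_sorted(qubits1, paulis1, qubits2, paulis2):
    """Check commutation of two sparse Paulis whose qubit lists are sorted ascending."""
    i = j = 0
    anticommuting = 0
    while i < len(qubits1) and j < len(qubits2):
        if qubits1[i] < qubits2[j]:
            i += 1
        elif qubits1[i] > qubits2[j]:
            j += 1
        else:
            # Shared qubit: two different non-identity Paulis anticommute.
            p, q = paulis1[i], paulis2[j]
            if p != "I" and q != "I" and p != q:
                anticommuting += 1
            i += 1
            j += 1
    # The operations commute iff they anticommute on an even number of qubits.
    return anticommuting % 2 == 0


print(commute_sorted([0, 2, 5], "XYZ", [2, 5, 7], "ZXX"))
print(commute_sorted([0, 1], "XX", [1, 2], "ZZ"))
```

Each index is visited at most once, so a single check costs time linear in the combined number of qargs and needs no auxiliary lookup structure.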

For pairs of PPRs/PPMs we can construct generators as Paulis rather
than SparseObservables, and we can check commutativity by checking
the commutativity of Paulis.
@alexanderivrii alexanderivrii requested a review from a team as a code owner March 16, 2026 09:44
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Cryoris
  • @Qiskit/terra-core
  • @ajavadia

@alexanderivrii alexanderivrii added the performance and Changelog: Added labels Mar 16, 2026
@alexanderivrii alexanderivrii added this to the 2.5.0 milestone Mar 16, 2026
@jan-an jan-an self-requested a review March 23, 2026 09:39
@ShellyGarion ShellyGarion added the fault tolerance label Mar 26, 2026
Comment on lines +537 to +541
let max_q1 = qargs1.iter().map(|q| q.index()).max().unwrap_or(0);
let mut in_q1 = vec![usize::MAX; max_q1 + 1];
for (i, &q) in qargs1.iter().enumerate() {
    in_q1[q.index()] = i;
}
Collaborator

I think we discussed this offline but I don't remember the answer: would it be faster to sort both qargs (log timing) and then iterate, rather than finding the max (linear)?

Member Author

I tried this now and it became 30% slower, see #15815 for how the experiments were run.

Member Author

@alexanderivrii alexanderivrii Apr 4, 2026

I agree that for robustness it's best to avoid linear scaling with the total number of qubits in the circuit, so I changed the implementation to use a HashMap instead of a vector. For large Pauli strings over all circuit qubits (as in the HamLib experiment mentioned in the summary) this makes the implementation about 5% slower; however, for short Pauli strings with large qubit indices it improves the implementation. I also moved the optional reversing of the operations earlier, so that we fill the hashmap with the smaller number of qubits. See 06cbb55 and f4a9df6.
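
A small Python sketch of the hash-map variant for unsorted qargs, under the same assumed representation (a list of qubit indices plus one Pauli letter per index). The name `commute_unsorted` is hypothetical, and a `dict` stands in for the Rust HashMap. The lookup is built from the operation with fewer qubits, so memory and setup cost depend on the Pauli weight rather than on the largest qubit index.

```python
def commute_unsorted(qubits1, paulis1, qubits2, paulis2):
    """Check commutation of two sparse Paulis with qubits in arbitrary order."""
    # Swap so that the lookup table is built from the smaller operation.
    if len(qubits1) > len(qubits2):
        qubits1, paulis1, qubits2, paulis2 = qubits2, paulis2, qubits1, paulis1

    # Map qubit index -> position in the first operation.
    position = {q: i for i, q in enumerate(qubits1)}

    anticommuting = 0
    for j, q in enumerate(qubits2):
        i = position.get(q)
        if i is None:
            continue  # qubit not shared, no contribution
        p1, p2 = paulis1[i], paulis2[j]
        if p1 != "I" and p2 != "I" and p1 != p2:
            anticommuting += 1
    return anticommuting % 2 == 0


# Large qubit indices do not force a large allocation.
print(commute_unsorted([1_000_000, 3], "XZ", [3, 7, 1_000_000], "XYZ"))
```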

@alexanderivrii
Member Author

alexanderivrii commented Mar 31, 2026

An update: 087362f implements the "sorted vectors" intersection in addition to "unsorted vectors" intersection. Since CommutativeOptimization now canonicalizes PPRs and PPMs by default (canonicalization for PPMs is added in de13108 in this PR), it uses "sorted vectors" intersection.

Running the experiment from the code snippet in this PR's summary (that is, running CommutativeOptimization on our 100 representative HamLib examples) further improves the time by about 25% (in addition to the improvement mentioned in the summary).

On the other hand, skipping canonicalization for PPRs/PPMs in CommutativeOptimization and sorting them inside the commutation checker (as suggested in #15815) makes the total runtime slower by about 30%.

Note that CommutativeOptimization may check commutation of the same PPR with many other PPRs, so what these experiments show is that it's better to sort each PPR once and then use "sorted vectors" intersection, rather than sort each PPR every time it's used.
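
To make the "sort once" idea concrete, here is a minimal Python sketch of canonicalizing a single Pauli-based operation. The representation (a list of qubit indices plus one Pauli letter per index) and the name `canonicalize` are assumptions for this example only.

```python
def canonicalize(qubits, paulis):
    """Sort a sparse Pauli by qubit index, keeping each letter with its qubit."""
    order = sorted(range(len(qubits)), key=lambda k: qubits[k])
    sorted_qubits = [qubits[k] for k in order]
    sorted_paulis = "".join(paulis[k] for k in order)
    return sorted_qubits, sorted_paulis


print(canonicalize([5, 0, 2], "XYZ"))  # ([0, 2, 5], 'YZX')
```

If an operation with k qargs is compared against m other operations, sorting it up front costs one O(k log k) step, whereas sorting inside every check repeats that cost m times.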

@alexanderivrii alexanderivrii requested review from a team and Cryoris March 31, 2026 16:08
Comment thread crates/transpiler/src/commutation_checker.rs Outdated
Collaborator

@Cryoris Cryoris left a comment

The changes look good to me, but the test coverage is a bit thin. Could we (a) add tests checking PPM commutation with Gucci and (b) scramble the indices of the existing PPR tests in Gucci a bit more (right now there's only 1 index swap)?

* Improved tests for commutations between different types of pauli-based gates
* Added tests with varying qubit indices
@alexanderivrii alexanderivrii requested a review from Cryoris April 3, 2026 09:22
@alexanderivrii
Member Author

Marking this "on hold" because we need to fix commutation of pauli product measurement instructions with the same clbit (see #16023).

@alexanderivrii alexanderivrii removed the on hold label Apr 16, 2026
@alexanderivrii
Member Author

Following the bugfix in #16023, I have updated the commutation checker to efficiently check the commutation of two PPMs writing to the same clbit (and removed the temporary function).

I have also updated the commutative optimization pass to only canonicalize but not try to merge PPM gates (in particular it does not need to worry about commutativity of two PPM gates).

@Cryoris, this is ready for review now.

@coveralls

Coverage Report for CI Build 24510449984

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage increased (+0.01%) to 87.49%

Details

  • Coverage increased (+0.01%) from the base build.
  • Patch coverage: 4 uncovered changes across 1 file (115 of 119 lines covered, 96.64%).
  • 18 coverage regressions across 5 files.

Uncovered Changes

| File | Changed | Covered | % |
| --- | --- | --- | --- |
| crates/transpiler/src/passes/commutative_optimization.rs | 63 | 59 | 93.65% |

Coverage Regressions

18 previously-covered lines in 5 files lost coverage.

| File | Lines Losing Coverage | Coverage |
| --- | --- | --- |
| crates/circuit/src/parameter/symbol_expr.rs | 6 | 73.93% |
| crates/qasm2/src/parse.rs | 6 | 97.63% |
| crates/qasm2/src/lex.rs | 4 | 91.77% |
| crates/circuit/src/parameter/parameter_expression.rs | 1 | 90.53% |
| crates/transpiler/src/commutation_checker.rs | 1 | 88.27% |

Coverage Stats

Coverage Status
Relevant Lines: 119596
Covered Lines: 104634
Line Coverage: 87.49%
Coverage Strength: 979624.38 hits per line

💛 - Coveralls

Comment thread crates/transpiler/src/passes/commutative_optimization.rs Outdated
Comment on lines +306 to +309
// To check commutation of two Pauli-based gates, we extract their Pauli generators
// and check whether they commute.
// Note that we have previously removed all PPRs equivalent to identity up to a global
// phase, so this is both a necessary and a sufficient condition.
Collaborator

Suggested change
- // To check commutation of two Pauli-based gates, we extract their Pauli generators
- // and check whether they commute.
- // Note that we have previously removed all PPRs equivalent to identity up to a global
- // phase, so this is both a necessary and a sufficient condition.
+ // To check commutation of two Pauli-based gates, we extract their Pauli generators
+ // and check whether they commute. This is not done through the commutation checker,
+ // since we here know that the Pauli strings are sorted by qubit index already, which
+ // allows for a more efficient check.
+ //
+ // Note that we have previously removed all PPRs equivalent to identity up to a global
+ // phase, so this is both a necessary and a sufficient condition.

Member Author

For additional clarity, I have moved the Pauli-based commutation code to a separate function, and improved comments/docstrings, see fb03d53.

Comment thread crates/transpiler/src/passes/commutative_optimization.rs
Comment thread crates/transpiler/src/passes/commutative_optimization.rs Outdated
Comment thread crates/transpiler/src/commutation_checker.rs
Comment thread crates/transpiler/src/commutation_checker.rs
Comment thread test/python/circuit/test_commutation_checker.py Outdated
* moving commutation of pauli-based gates to a separate function
* clarifying comments
@alexanderivrii alexanderivrii requested a review from Cryoris April 17, 2026 19:25
// As a result, commutation of Pauli generators is both a necessary and sufficient condition.
let (z1, x1, z2, x2) = match (op1, op2) {
(OperationRef::PauliProductMeasurement(_), _) => {
unreachable!(
Collaborator

unreachable is for unreachable statements in the compiled path -- but this is well reachable if someone calls this function with PPMs. It would be nice to make this safer.

We could for example change the type of op1 to be a &PauliProductRotation which avoids the fallible case and gives the caller the responsibility of giving the right object

Member Author

@alexanderivrii alexanderivrii Apr 21, 2026

This line is truly unreachable, after the change in a685517. Since we can't merge PPMs with anything, we no longer try to commute it with other gates, meaning that the first gate passed to commute is not a PPM. I have also added explicit tests for circuits with multiple PPMs.

Actually, are there any other gates that can't be merged/canceled with anything?

Member Author

@alexanderivrii alexanderivrii Apr 23, 2026

Further improved the unreachability message in a67a060.

@alexanderivrii alexanderivrii requested a review from Cryoris April 23, 2026 06:10
Comment on lines +572 to +573
let tol = 1e-12_f64.max(1. - approximation_degree);
let error_cutoff_fn = |_inst: &PackedInstruction| -> f64 { tol };
Collaborator

I'm still not thrilled about removing identities here, especially if we might be using a different default tolerance than the RemoveIdentityEquiv pass. The other inconsistent case is if we have a Target and RemoveIdentityEquiv takes it into account but CommutativeOptimization doesn't.

In #16070 we should add this Target support. Until then, can we at least use the same constant from the remove_identity_equiv.rs file as tolerance, to ensure we query it from the same place?

Member Author

Done in d13a51a. I now think that we should have a single source of truth that defines MINIMUM_TOL for all passes (currently every pass redefines the same constant).
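
As a loose Python analogue of the tolerance computation in the reviewed snippet, the sketch below keeps the floor in one module-level constant that every caller reads. The value 1e-12 and the formula come from the snippet above; the function name `identity_tolerance` is hypothetical, and where such a constant should live in the Rust crates is not decided here.

```python
# Single shared floor for the tolerance (value taken from the snippet under review).
MINIMUM_TOL = 1e-12


def identity_tolerance(approximation_degree: float) -> float:
    """Tolerance for treating a gate as identity, never below MINIMUM_TOL."""
    return max(MINIMUM_TOL, 1.0 - approximation_degree)


print(identity_tolerance(1.0))  # 1e-12
```

With `approximation_degree = 1.0` the floor applies; for smaller values the tolerance grows as `1 - approximation_degree`.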

circuits containing :class:`.PauliProductRotationGate` and :class:`.PauliProductMeasurement`
objects.
- |
The :class:`.CommutativeOptimization` transpiler pass now removes gates that are quivalent to
Collaborator

Suggested change
- The :class:`.CommutativeOptimization` transpiler pass now removes gates that are quivalent to
+ The :class:`.CommutativeOptimization` transpiler pass now removes gates that are equivalent to

Member Author

Oops, I can't type a sentence without a typo -- fixed in d13a51a. (And apparently, I can't also type a commit message with typos.)

Collaborator

@Cryoris Cryoris left a comment

Thanks for the improvements, Sasha!

@Cryoris Cryoris enabled auto-merge April 23, 2026 12:02
@Cryoris Cryoris added this pull request to the merge queue Apr 23, 2026
Merged via the queue into Qiskit:main with commit e8f058e Apr 23, 2026
26 checks passed
@github-project-automation github-project-automation Bot moved this from In review to Done in Qiskit 2.5 Apr 23, 2026
Labels

Changelog: Added, fault tolerance, performance

Projects

Status: Done

6 participants