Improve commutation checking of Pauli product rotations and measurements#15815
Improve commutation checking of Pauli product rotations and measurements#15815Cryoris merged 28 commits intoQiskit:mainfrom
Conversation
For pairs of PPRs/PPMs we can construct generators as Paulis rather than SparseObservables, and we can check commutativity by checking the commutativity of Paulis.
|
One or more of the following people are relevant to this code:
|
| let max_q1 = qargs1.iter().map(|q| q.index()).max().unwrap_or(0); | ||
| let mut in_q1 = vec![usize::MAX; max_q1 + 1]; | ||
| for (i, &q) in qargs1.iter().enumerate() { | ||
| in_q1[q.index()] = i; | ||
| } |
There was a problem hiding this comment.
I think we discussed this offline but I don't remember the answer: would it be faster to sort both qargs (log timing) and then iterate, rather than finding the max (linear)?
There was a problem hiding this comment.
I tried this now and it became 30% slower, see #15815 for how the experiments were run.
There was a problem hiding this comment.
I agree that for robustness it's best to avoid linear scaling with the total number of qubits in the circuit, so I changed the implementation to use a HashMap instead of a vector. For large Pauli strings over all of circuit qubits (as in the HamLib experiment mentioned in the summary) this makes the implementation about 5% slower, however for short Pauli strings with large qubit indices this improves the implementation. I also moved the optional reversing of the operations to be done earlier, so that we are filling the hashmap with the smaller number of qubits. See 06cbb55 and f4a9df6.
|
An update: 087362f implements the "sorted vectors" intersection in addition to "unsorted vectors" intersection. Since Running the experiment from the code snippet in this PRs summary (that is, running On the other hand, skipping canonicalization for PPRs/PPMs in Note that |
Cryoris
left a comment
There was a problem hiding this comment.
The changes look good to me, but the test coverage is a bit thin. Could we (a) add tests checking PPM commutation with Gucci and (b) scramble the indices of the existing PPR tests in Gucci a bit more (right now there's only 1 index swap)?
* Improved tests for commutations between different types of pauli-based gates * Added tests with varying qubit indices
|
Marking this "on hold" because we need to fix commutation of pauli product measurement instructions with the same clbit (see #16023). |
|
Following the bugfix in #16023, I have updated the commutation checker to efficiently check the commutation of two PPMs writing to the same clbit (and removed the temporary function). I have also updated the commutative optimization pass to only canonicalize but not try to merge PPM gates (in particular it does not need to worry about commutativity of two PPM gates). @Cryoris, this is ready for review now. |
Coverage Report for CI Build 24510449984Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+0.01%) to 87.49%Details
Uncovered Changes
Coverage Regressions18 previously-covered lines in 5 files lost coverage.
Coverage Stats
💛 - Coveralls |
| // To check commutation of two Pauli-based gates, we extract their Pauli generators | ||
| // and check whether they commute. | ||
| // Note that we have previously removed all PPRs equivalent to identity up to a global | ||
| // phase, so this is both a necessary and a sufficient condition. |
There was a problem hiding this comment.
| // To check commutation of two Pauli-based gates, we extract their Pauli generators | |
| // and check whether they commute. | |
| // Note that we have previously removed all PPRs equivalent to identity up to a global | |
| // phase, so this is both a necessary and a sufficient condition. | |
| // To check commutation of two Pauli-based gates, we extract their Pauli generators | |
| // and check whether they commute. This is not done through the commutation checker, | |
| // since we here know that the Pauli strings are sorted by qubit index already, which | |
| // allows for a more efficient check. | |
| // | |
| // Note that we have previously removed all PPRs equivalent to identity up to a global | |
| // phase, so this is both a necessary and a sufficient condition. |
There was a problem hiding this comment.
For additional clarity, I have moved the Pauli-based commutation code to a separate function, and improved comments/docstrings, see fb03d53.
* moving commutation of pauli-based gates to a separate function * clarifying comments
| // As a result, commutation of Pauli generators is both a necessary and sufficient condition. | ||
| let (z1, x1, z2, x2) = match (op1, op2) { | ||
| (OperationRef::PauliProductMeasurement(_), _) => { | ||
| unreachable!( |
There was a problem hiding this comment.
unreachable is for unreachable statements in the compiled path -- but this is well reachable if someone calls this function with PPMs. It would be nice to make this safer.
We could for example change the type of op1 to be a &PauliProductRotation which avoids the fallible case and gives the caller the responsibility of giving the right object
There was a problem hiding this comment.
This line is truly unreachable, after the change in a685517. Since we can't merge PPMs with anything, we no longer try to commute it with other gates, meaning that the first gate passed to commute is not a PPM. I have also added explicit tests for circuits with multiple PPMs.
Actually, are there any other gates that can't be merged/canceled with anything?
There was a problem hiding this comment.
Further improved the unreachability message in a67a060.
| let tol = 1e-12_f64.max(1. - approximation_degree); | ||
| let error_cutoff_fn = |_inst: &PackedInstruction| -> f64 { tol }; |
There was a problem hiding this comment.
I'm still not thrilled about removing identities here, especially if we might be using a different default tolerance than the RemoveIdentityEquiv pass. The other inconsistent case is if we have a Target and RemoveIdentityEquiv takes it into account but CommutativeOptimization doesn't.
In #16070 we should add this Target support. Until then, can we at least use the same constant from the remove_identity_equiv.rs file as tolerance, to ensure we query it from the same place?
There was a problem hiding this comment.
Done in d13a51a. I now think that we should have a single place of truth to define MINIMUM_TOL for all passes (currently every pass redefines the same constant).
| circuits containing :class:`.PauliProductRotationGate` and :class:`.PauliProductMeasurement` | ||
| objects. | ||
| - | | ||
| The :class:`.CommutativeOptimization` transpiler pass now removes gates that are quivalent to |
There was a problem hiding this comment.
| The :class:`.CommutativeOptimization` transpiler pass now removes gates that are quivalent to | |
| The :class:`.CommutativeOptimization` transpiler pass now removes gates that are equivalent to |
There was a problem hiding this comment.
Oops, I can't type a sentence without a typo -- fixed in d13a51a. (And apparently, I can't also type a commit message with typos.)
Cryoris
left a comment
There was a problem hiding this comment.
Thanks for the improvements, Sasha!
Summary
This PR improves commutation checking of pairs of Pauli-based objects, that is of
PauliProductRotationGateandPauliProductMeasurement. Without this PR, we construct the generators for operations for PPRs and PPMs asSparseObservables and then check if the twoSparseObservables commute. With this PR, we first instead construct the generatoors as Paulis (represented using Z and X components) and check if two Paulis commute. The latter check is quite a bit faster than the former for large gates.Based on top of #15810.Details and comments
I run this on the 100 representative HamLib benchmarks from
benchpressusing the following scriptHere is the plot showing improvement in runtime (where commutation checker is used within
CommutativeOptimization) .Note that
CommutativeOptimizationdoes not actually improve the quality of any of these HamLib benchmarks, however it tends to make a huge number of commutativity checks.Here is an additional experiment is in the spirit of our FT compiler pipeline (in which case
CommutativeOptimizationcan remove/merge many gates).Without this PR, the average time for
CommutativeOptimizationis3.46seconds, with this PR it's0.29.LLM tools used
Used copilot to suggest various micro-optimization and alternative implementations, but in the end the implementation and all possible bugs are purely mine.
Additional optimization (as part of
CommutativeOptimization)In
CommutativeOptimizationpass we now sortqargsforPauliProductRotationGateandPauliProductMeasurementby qubit index. This allows an even more efficient implementation using the standard method to compute intersection of two sorted vectors.