Make Optimize1qGatesDecomposition multithreaded by mtreinish · Pull Request #15567 · Qiskit/qiskit

mtreinish · 2026-01-14T01:20:55Z

Summary

This commit makes Optimize1qGatesDecomposition a parallel transpiler pass. After we collect the 1q runs in the DAG, computing the unitary matrix for each run and synthesizing it has no data dependency between runs, so that step can run in parallel without issue. Updating the DAG with the synthesis results is still serial, though, so there is a limit to how much can be parallelized here. Additionally, the previous serial version computed the target Euler bases to try on demand, the first time a qubit with a run was encountered. That forces a data dependency between the threads, which requires either locking or precomputing the target bases; this commit precomputes them. As a result, instantiating the pass object is slower and we potentially do more work up front than is strictly necessary. The upside is that this cost can now be amortized over multiple executions of the pass, which was not possible before.
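The two-phase shape described above (parallel per-run computation, then a serial update) can be sketched roughly as follows. This is only an illustration using the standard library: `Mat2`, `run_matrix`, `compute_parallel`, and `apply_serial` are hypothetical names, real 2x2 matrices stand in for complex unitaries, and a plain slice of slots stands in for the DAG. It is not the code in this PR.

```rust
use std::thread;

/// Stand-in for a single-qubit unitary (assumption: real entries, for brevity).
type Mat2 = [[f64; 2]; 2];

fn matmul(a: &Mat2, b: &Mat2) -> Mat2 {
    let mut out = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

/// Multiply the gates of one run in circuit order (later gates act on the left).
fn run_matrix(run: &[Mat2]) -> Mat2 {
    let identity = [[1.0, 0.0], [0.0, 1.0]];
    run.iter().fold(identity, |acc, gate| matmul(gate, &acc))
}

/// Phase 1: runs are independent, so each worker handles its own chunk.
fn compute_parallel(runs: &[Vec<Mat2>], num_threads: usize) -> Vec<Mat2> {
    let workers = num_threads.max(1);
    let chunk_size = ((runs.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = runs
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|r| run_matrix(r)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps results aligned with the input runs.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    })
}

/// Phase 2: the shared structure is updated on a single thread.
fn apply_serial(slots: &mut [Option<Mat2>], results: Vec<Mat2>) {
    for (slot, result) in slots.iter_mut().zip(results) {
        *slot = Some(result);
    }
}
```

Only the first phase scales with the number of threads; the serial second phase is what bounds the overall speedup.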

A quick experiment showed that the crossover point between the parallel and serial versions is roughly 100,000 runs to process (all runs of 20 gates). That figure was used to set the run count that selects between the serial and parallel versions of the algorithm.
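A threshold-based switch of this kind might look like the sketch below. The constant reuses the 100,000 figure quoted above purely as a placeholder, and `process_run`, `process_serial`, `process_parallel`, `process_runs`, and `optimize` are hypothetical names with a dummy workload; none of this is taken from the PR's code.

```rust
use std::thread;

/// Placeholder crossover value (assumption: tunable, taken from the text above).
const PARALLEL_RUN_THRESHOLD: usize = 100_000;

/// Dummy per-run work standing in for matrix computation plus synthesis.
fn process_run(run: &[u64]) -> u64 {
    run.iter()
        .fold(0u64, |acc, g| acc.wrapping_mul(31).wrapping_add(*g))
}

fn process_serial(runs: &[Vec<u64>]) -> Vec<u64> {
    runs.iter().map(|r| process_run(r)).collect()
}

fn process_parallel(runs: &[Vec<u64>]) -> Vec<u64> {
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let chunk_size = ((runs.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = runs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || process_serial(chunk)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    })
}

/// Pick the path from the run count; both paths return identical results.
fn process_runs(runs: &[Vec<u64>], threshold: usize) -> Vec<u64> {
    if runs.len() >= threshold {
        process_parallel(runs)
    } else {
        process_serial(runs)
    }
}

fn optimize(runs: &[Vec<u64>]) -> Vec<u64> {
    process_runs(runs, PARALLEL_RUN_THRESHOLD)
}
```

Passing the threshold as a parameter makes it easy to force either path in a benchmark when re-measuring the crossover.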

Details and comments

TODO:

Fix error cases to address test failures
Document that the pass is multithreaded in both Python and C
Adjust run count number used to switch to parallel, 100,000 number may not be great (I feel it's too high)

This commit reworks the logic from the previous commit so that it no longer precomputes the Euler basis set for every qubit regardless of whether it is used. The state object that stores the basis gates and Euler basis sets is kept, because it enables more efficient patterns across multiple runs of the pass. The state now uses OnceLock so that each thread can lazily populate the entry for a qubit the first time a run on that qubit is processed. This avoids the construction-time overhead for qubits that never have runs while keeping the advantages of reused state.
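As a rough illustration of per-qubit lazy state, the sketch below keeps one `std::sync::OnceLock` per qubit and fills it on first use. `PassState`, `compute_euler_bases`, and the three-entry basis table are all hypothetical simplifications chosen for the example; the actual state object and basis selection in the PR are not shown in this description.

```rust
use std::sync::OnceLock;

/// Simplified stand-in for choosing Euler bases from a qubit's basis gates.
/// The table is illustrative only, not the real selection logic.
fn compute_euler_bases(basis_gates: &[&str]) -> Vec<&'static str> {
    const TABLE: [(&str, &[&str]); 3] = [
        ("ZSX", &["rz", "sx"]),
        ("ZYZ", &["rz", "ry"]),
        ("U3", &["u3"]),
    ];
    let mut out = Vec::new();
    for (name, required) in TABLE {
        if required.iter().all(|g| basis_gates.contains(g)) {
            out.push(name);
        }
    }
    out
}

/// Hypothetical reusable state: one lazily filled slot per qubit.
struct PassState {
    basis_gates: Vec<Vec<&'static str>>,
    euler_bases: Vec<OnceLock<Vec<&'static str>>>,
}

impl PassState {
    fn new(basis_gates: Vec<Vec<&'static str>>) -> Self {
        // Creating an empty OnceLock is cheap; no basis set is computed here.
        let euler_bases = basis_gates.iter().map(|_| OnceLock::new()).collect();
        Self { basis_gates, euler_bases }
    }

    /// Computes the entry on first access and returns the cached value afterwards.
    fn euler_bases(&self, qubit: usize) -> &[&'static str] {
        self.euler_bases[qubit].get_or_init(|| compute_euler_bases(&self.basis_gates[qubit]))
    }

    fn is_initialized(&self, qubit: usize) -> bool {
        self.euler_bases[qubit].get().is_some()
    }
}
```

Because `euler_bases` takes `&self`, the same state can be shared across worker threads and reused across executions, and qubits that never appear in a run never pay for the computation.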

Earlier commits used a crossover value of 100,000 runs to switch between the serial and parallel paths. This was based on a scaling experiment indicating that this was about where parallel became faster, but further testing shows that it is not so clear cut. Until we settle that and finalize the implementation, this commit leaves the value in place as a TODO, and the pass is always multithreaded unless it is running in a multiprocessing context.

The earlier commit that moved to lazy initialization was not tested in a parallel context, and its method of initialization was not atomic. This led to a race condition between threads trying to populate the state for runs on the same qubit concurrently. This commit fixes the race by adjusting the OnceLock usage to go through its proper initialization API.
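The description does not say what the non-atomic pattern looked like, so the sketch below only shows the general behaviour of `std::sync::OnceLock`: `get_or_init` runs a single initializer even when many threads reach it at once, while a separate `set` call fails if another caller got there first. The function names `init_concurrently` and `set_twice` are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Returns the value each thread observed and how often the initializer ran.
fn init_concurrently(num_threads: usize) -> (Vec<u64>, usize) {
    let cell: OnceLock<u64> = OnceLock::new();
    let init_calls = AtomicUsize::new(0);
    let seen = thread::scope(|scope| {
        let mut handles = Vec::new();
        for _ in 0..num_threads {
            handles.push(scope.spawn(|| {
                // Only one thread executes the closure; the rest wait for its value.
                *cell.get_or_init(|| {
                    init_calls.fetch_add(1, Ordering::SeqCst);
                    42
                })
            }));
        }
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    });
    (seen, init_calls.load(Ordering::SeqCst))
}

/// A second `set` is rejected and hands the value back, so callers that
/// check-then-set must handle losing the race.
fn set_twice() -> (Result<(), u64>, Result<(), u64>) {
    let cell = OnceLock::new();
    (cell.set(1), cell.set(2))
}
```

Routing every access through `get_or_init` means no thread needs to check whether the slot is already filled before writing to it.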

coveralls · 2026-01-14T21:14:42Z

Pull Request Test Coverage Report for Build 21009131987

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

241 of 302 (79.8%) changed or added relevant lines in 5 files are covered.
589 unchanged lines in 8 files lost coverage.
Overall coverage decreased (-0.04%) to 88.279%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
crates/transpiler/src/passes/optimize_1q_gates_decomposition.rs	219	280	78.21%

Files with Coverage Reduction	New Missed Lines	%
crates/transpiler/src/passes/optimize_1q_gates_decomposition.rs	1	81.79%
crates/synthesis/src/euler_one_qubit_decomposer.rs	3	90.79%
crates/cext/src/dag.rs	4	86.79%
crates/qasm2/src/lex.rs	6	91.0%
crates/circuit/src/converters.rs	9	92.37%
crates/qasm2/src/parse.rs	12	96.15%
crates/transpiler/src/passes/disjoint_layout.rs	33	90.09%
crates/circuit/src/dag_circuit.rs	521	85.16%

Totals
Change from base Build 20969635435:	-0.04%
Covered Lines:	97086
Relevant Lines:	109976

💛 - Coveralls

mtreinish added this to the 2.4.0 milestone Jan 14, 2026

mtreinish added the "on hold" label Jan 14, 2026

github-project-automation Bot added this to Qiskit 2.4 Jan 14, 2026

mtreinish added the "performance", "Rust", and "mod: transpiler" labels Jan 14, 2026

github-project-automation Bot moved this to Ready in Qiskit 2.4 Jan 14, 2026

mtreinish added 3 commits January 14, 2026 14:47

Merge remote-tracking branch 'origin/main' into parallel-o1qgd

417a094

mtreinish modified the milestones: 2.4.0, 2.5.0 Mar 11, 2026

ShellyGarion removed this from Qiskit 2.4 Mar 12, 2026

ShellyGarion added this to Qiskit 2.5 Mar 12, 2026

github-project-automation Bot moved this to Ready in Qiskit 2.5 Mar 12, 2026

mtreinish added 2 commits March 27, 2026 08:26

Merge remote-tracking branch 'origin/main' into parallel-o1qgd

2d52562

Fix rustfmt

386024a

mtreinish wants to merge 7 commits into Qiskit:main from mtreinish:parallel-o1qgd

3 participants
