
Use nalgebra::Matrix4 as output for instructions_to_matrix #15871

Merged
alexanderivrii merged 2 commits into Qiskit:main from mtreinish:use-nalgebra-arrays-for-consolidate
Mar 25, 2026

Conversation

@mtreinish
Member

@mtreinish mtreinish commented Mar 24, 2026

Summary

This commit switches the return type of
convert_2q_block_matrix::instructions_to_matrix() from a dynamically
allocated ndarray Array2 to an nalgebra Matrix4. The function always
returns a 4x4 Complex64 matrix, and an nalgebra fixed-size matrix is
stack allocated, so we avoid a dynamic allocation. This should also
speed up matrix multiplication since nalgebra can leverage SIMD better
(either directly or implicitly via the compiler) because the fixed
operations needed are known at compile time. We were already set up to
do this since #13649, which moved to using nalgebra internally for the
1q component, but that PR didn't update the whole path to use an
nalgebra array for everything.
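To illustrate the allocation and loop-bound argument above, here is a minimal std-only sketch. It does not use nalgebra or ndarray: `C64`, `Mat4`, `matmul4`, and `matmul_dyn` are hypothetical stand-ins chosen for this example, not names from the Qiskit code base. The fixed-size version keeps the dimensions in the type, so the result lives on the stack and the compiler sees constant loop bounds; the dynamic version allocates its output on the heap on every call.

```rust
use std::ops::{Add, Mul};

// Hypothetical stand-in for Complex64 so the sketch needs no external crates.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C64 {
    re: f64,
    im: f64,
}

impl C64 {
    const ZERO: C64 = C64 { re: 0.0, im: 0.0 };
    const ONE: C64 = C64 { re: 1.0, im: 0.0 };
}

impl Add for C64 {
    type Output = C64;
    fn add(self, rhs: C64) -> C64 {
        C64 { re: self.re + rhs.re, im: self.im + rhs.im }
    }
}

impl Mul for C64 {
    type Output = C64;
    fn mul(self, rhs: C64) -> C64 {
        C64 {
            re: self.re * rhs.re - self.im * rhs.im,
            im: self.re * rhs.im + self.im * rhs.re,
        }
    }
}

// Fixed-size 4x4 matrix: the shape is part of the type, so values of this
// type are plain stack data and need no allocator.
type Mat4 = [[C64; 4]; 4];

// All loop bounds are compile-time constants, which gives the compiler the
// chance to unroll and vectorize.
fn matmul4(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[C64::ZERO; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                out[i][j] = out[i][j] + a[i][k] * b[k][j];
            }
        }
    }
    out
}

// Dynamically sized equivalent over row-major slices: `n` is only known at
// runtime and the output Vec is heap allocated on every call.
fn matmul_dyn(a: &[C64], b: &[C64], n: usize) -> Vec<C64> {
    let mut out = vec![C64::ZERO; n * n];
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                out[i * n + j] = out[i * n + j] + a[i * n + k] * b[k * n + j];
            }
        }
    }
    out
}
```

Both functions compute the same product; the difference is only in where the data lives and what the compiler knows about the sizes. Whether this translates into a measurable speedup depends on the workload, as the benchmark discussion below shows.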

Ideally we'd be using nalgebra data types throughout the two-qubit
decomposers too. We should still use faer for the more involved linear
algebra operations in that module, but where the size of a matrix is
known we should use the fixed-size Matrix4 and Matrix2 and generate
faer MatRefs with qiskit_synthesis::linalg::nalgebra_to_faer() when we
need to do linear algebra with the matrix. This PR is an incremental
step towards doing that.
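The idea of handing a fixed-size matrix to another library as a borrowed view, without copying, can be sketched as follows. This is not the implementation of nalgebra_to_faer() and not faer's API: `MatView`, `Fixed4`, `as_view`, and `trace` are hypothetical names, and `f64` entries are used instead of Complex64 to keep the example std-only. The one fact it relies on is that nalgebra stores its statically sized matrices in column-major order.

```rust
// Hypothetical borrowed view over column-major data. It plays the role a
// faer MatRef would play, but it is NOT faer's type or API.
struct MatView<'a> {
    data: &'a [f64],
    nrows: usize,
    ncols: usize,
    col_stride: usize,
}

impl<'a> MatView<'a> {
    // Element (row, col) sits `col` whole columns into the buffer, then
    // `row` entries down that column.
    fn get(&self, row: usize, col: usize) -> f64 {
        assert!(row < self.nrows && col < self.ncols);
        self.data[col * self.col_stride + row]
    }
}

// Hypothetical fixed-size 4x4 matrix stored column by column in one
// contiguous stack array.
struct Fixed4 {
    cols: [f64; 16],
}

impl Fixed4 {
    // Borrow the storage as a view; no data is copied or allocated.
    fn as_view(&self) -> MatView<'_> {
        MatView { data: &self.cols, nrows: 4, ncols: 4, col_stride: 4 }
    }
}

// Example of a size-agnostic routine that only needs the view.
fn trace(view: &MatView<'_>) -> f64 {
    (0..view.nrows.min(view.ncols)).map(|i| view.get(i, i)).sum()
}
```

The design point is that the fixed-size type stays the owner of the data, while routines written against the view do not need to know the size at compile time.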

Details and comments

Rebased now. Originally this PR was based on top of #15858 and needed to be rebased after that merged; until then, the contents of just this PR could be viewed by looking at the HEAD commit: 29a718d

@mtreinish mtreinish added this to the 2.5.0 milestone Mar 24, 2026
@mtreinish mtreinish requested a review from a team as a code owner March 24, 2026 20:50
@mtreinish mtreinish added the on hold, performance, Changelog: None, Rust, and mod: transpiler labels Mar 24, 2026
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@mtreinish
Member Author

I ran a pair of quick asv benchmarks, although I don't trust these numbers since it seems like I had a fair amount of system noise and the variability on these numbers seems high. I'll do more thorough benchmarking after the parent PR merges and this is unblocked.

Benchmarks that have improved:

| Change   | Before [572045f5] <fixed-size-2q-matrices~1>   | After [29a718d8] <use-nalgebra-arrays-for-consolidate>   |   Ratio | Benchmark (Parameter)                                             |
|----------|------------------------------------------------|----------------------------------------------------------|---------|-------------------------------------------------------------------|
| -        | 8.58±0.08ms                                    | 7.77±0.07ms                                              |    0.91 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')   |
| -        | 78.4±1ms                                       | 68.9±2ms                                                 |    0.88 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz') |
| -        | 69.8±2ms                                       | 59.9±1ms                                                 |    0.86 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')            |
| -        | 435±20ms                                       | 367±10ms                                                 |    0.84 | utility_scale.UtilityScaleBenchmarks.time_qft('cz')               |
| -        | 250±9ms                                        | 204±5ms                                                  |    0.81 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')              |

Benchmarks that have stayed the same:

| Change   | Before [572045f5] <fixed-size-2q-matrices~1>   | After [29a718d8] <use-nalgebra-arrays-for-consolidate>   | Ratio   | Benchmark (Parameter)                                                         |
|----------|------------------------------------------------|----------------------------------------------------------|---------|-------------------------------------------------------------------------------|
|          | 36.5±0.8s                                      | 27.4±0.4s                                                | ~0.75   | utility_scale.UtilityScaleBenchmarks.time_hwb12('cz')                         |
|          | 61.5±1ms                                       | 66.8±2ms                                                 | 1.09    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')            |
|          | 6.12±0.03s                                     | 6.46±0.04s                                               | 1.06    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cx')                    |
|          | 6.08±0.02s                                     | 6.46±0.02s                                               | 1.06    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cz')                    |
|          | 6.05±0.04s                                     | 6.41±0.1s                                                | 1.06    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('ecr')                   |
|          | 5.85±0.1ms                                     | 6.11±0.2ms                                               | 1.05    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                       |
|          | 63.8±5ms                                       | 66.4±3ms                                                 | 1.04    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                        |
|          | 247±2ms                                        | 256±2ms                                                  | 1.04    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                           |
|          | 383±20ms                                       | 394±10ms                                                 | 1.03    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                          |
|          | 16.7±0.2s                                      | 16.9±0.1s                                                | 1.02    | utility_scale.UtilityScaleBenchmarks.time_hwb12('cx')                         |
|          | 83.7±0.4ms                                     | 85.4±1ms                                                 | 1.02    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
|          | 27.7±0.6ms                                     | 28.2±0.9ms                                               | 1.02    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
|          | 456±20ms                                       | 459±4ms                                                  | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cz')                   |
|          | 28.5±1ms                                       | 28.9±0.9ms                                               | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
|          | 141±2ms                                        | 142±3ms                                                  | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                          |
|          | 26.0±0.6ms                                     | 25.9±0.4ms                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                      |
|          | 84.7±2ms                                       | 84.8±0.3ms                                               | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
|          | 5.91±0.3ms                                     | 5.83±0.1ms                                               | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                        |
|          | 26.2±1ms                                       | 26.0±0.3ms                                               | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                       |
|          | 85.5±2ms                                       | 84.6±0.7ms                                               | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
|          | 62.1±2ms                                       | 60.7±0.9ms                                               | 0.98    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                       |
|          | 22.4±0.9ms                                     | 22.0±0.8ms                                               | 0.98    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                       |
|          | 25.4±0.4s                                      | 24.9±0.3s                                                | 0.98    | utility_scale.UtilityScaleBenchmarks.time_hwb12('ecr')                        |
|          | 461±20ms                                       | 453±9ms                                                  | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('ecr')                  |
|          | 60.6±2ms                                       | 59.6±0.9ms                                               | 0.98    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')             |
|          | 8.05±0.2ms                                     | 7.85±0.1ms                                               | 0.97    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
|          | 462±8ms                                        | 449±30ms                                                 | 0.97    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                            |
|          | 189±8ms                                        | 182±5ms                                                  | 0.96    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                         |
|          | 6.13±0.2ms                                     | 5.83±0.2ms                                               | 0.95    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                        |
|          | 478±20ms                                       | 447±3ms                                                  | 0.94    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cx')                   |
|          | 8.41±0.3ms                                     | 7.91±0.1ms                                               | 0.94    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
|          | 29.3±0.4ms                                     | 27.6±0.2ms                                               | 0.94    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
|          | 404±20ms                                       | 378±8ms                                                  | 0.94    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                            |
|          | 473±10ms                                       | 441±5ms                                                  | 0.93    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                           |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Member

@alexanderivrii alexanderivrii left a comment


Thanks Matthew, overall this looks good to me, just two minor questions.

Comment on lines +94 to +98
let OperationRef::Gate(gate) = inst.op.view() else {
    return Err(QiskitError::new_err(
        "Can't compute matrix of non-unitary op",
    ));
};
Member


Should we also consider OperationRef::Operation variant here (and if so, then also in get_matrix_from_inst)?

We should also extend this to PauliProductRotation, but this can be done for all relevant functions in a separate PR.

Member Author


I just mirrored what was already there. I think we do want to expand this to any unitary operation, so this will be PPR and Operations that have a unitary matrix. Ideally we'd probably cover PPR and unitary OperationRef::Operations in PackedInstruction::try_matrix_as_nalgebra_2q; this check is just the fallback that decides whether we call out to Python and use quantum_info.Operator to compute the unitary from a gate's definition.

Member Author


That being said, I think we should fix this in a separate PR that fixes all the matrix methods.
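For readers less familiar with the let-else pattern in the snippet under review, here is a hedged, std-only sketch of the control flow this thread describes: try a native matrix first, fall back to the definition for a custom gate, and otherwise return an error. `OpRef` and `matrix_source` are hypothetical stand-ins and do not reflect the real OperationRef variants or the Qiskit function signatures; the strings only label which path was taken.

```rust
// Hypothetical stand-in for the kinds of operation discussed in the thread.
enum OpRef<'a> {
    StandardGate(&'a str),
    Gate(&'a str),
    Operation(&'a str),
}

// Reports which path would produce the matrix, or an error if none applies.
fn matrix_source(op: &OpRef<'_>) -> Result<String, String> {
    // Fast path (assumed): a matrix is available natively.
    if let OpRef::StandardGate(name) = op {
        return Ok(format!("native matrix for {name}"));
    }
    // let-else: bind `name` if the pattern matches, otherwise the else
    // branch must diverge (here by returning an error).
    let OpRef::Gate(name) = op else {
        return Err("Can't compute matrix of non-unitary op".to_string());
    };
    // Fallback path (assumed): compute the unitary from the gate definition.
    Ok(format!("fallback via definition of {name}"))
}
```

In this sketch the `Operation` variant always hits the error branch, which is the behavior the review comment asks about extending in a follow-up.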

Comment thread crates/synthesis/src/linalg/mod.rs Outdated
Comment on lines +47 to +68
static IDENTITY_2Q: Matrix4<Complex64> = Matrix4::new(
    // Row 1
    Complex64::ONE,
    Complex64::ZERO,
    Complex64::ZERO,
    Complex64::ZERO,
    // Row 2
    Complex64::ZERO,
    Complex64::ONE,
    Complex64::ZERO,
    Complex64::ZERO,
    // Row 3
    Complex64::ZERO,
    Complex64::ZERO,
    Complex64::ONE,
    Complex64::ZERO,
    // Row 4
    Complex64::ZERO,
    Complex64::ZERO,
    Complex64::ZERO,
    Complex64::ONE,
);
Member


I like this matrix: no need to think whether the elements are passed in row-major or column-major order 😄

Member Author


Ideally I would have been able to use Matrix4::identity() here, but it's not a const function so I couldn't use it in a static context. This was the best way I could come up with, but I also do a double take every time I work with nalgebra about row major vs col major because I've gotten it backwards too many times.
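The two points in this thread, the const restriction on static initializers and the row-major versus column-major confusion, can be illustrated with a small std-only sketch. It uses plain `f64` arrays rather than nalgebra types, and `identity4`, `IDENTITY`, and `to_col_major` are hypothetical names for this example. For context, nalgebra's `Matrix4::new` takes its arguments row by row while the matrix is stored column by column, which is the usual source of the double take.

```rust
// A static initializer must be evaluable at compile time, so only const fns
// may be called in it. Inside a const fn, loops are written with `while`
// because `for` goes through iterator traits that are not const.
const fn identity4() -> [[f64; 4]; 4] {
    let mut m = [[0.0; 4]; 4];
    let mut i = 0;
    while i < 4 {
        m[i][i] = 1.0;
        i += 1;
    }
    m
}

// Works because identity4 is a const fn; a non-const constructor here would
// be rejected by the compiler.
static IDENTITY: [[f64; 4]; 4] = identity4();

// Converts row-by-row input into a column-major flat buffer, mirroring the
// mismatch between how a matrix is written down and how it is stored.
const fn to_col_major(rows: [[f64; 4]; 4]) -> [f64; 16] {
    let mut out = [0.0; 16];
    let mut r = 0;
    while r < 4 {
        let mut c = 0;
        while c < 4 {
            // Column c starts at offset c * 4; row r is the entry within it.
            out[c * 4 + r] = rows[r][c];
            c += 1;
        }
        r += 1;
    }
    out
}
```

For the identity matrix the two orderings coincide, which is why the explicit constructor in the PR is safe to read either way.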

@mtreinish mtreinish force-pushed the use-nalgebra-arrays-for-consolidate branch from 29a718d to 35f8d27 Compare March 25, 2026 13:30
@mtreinish mtreinish removed the on hold Can not fix yet label Mar 25, 2026
Member

@alexanderivrii alexanderivrii left a comment


Thanks, LGTM!

@alexanderivrii alexanderivrii added this pull request to the merge queue Mar 25, 2026
Merged via the queue into Qiskit:main with commit 3e80326 Mar 25, 2026
25 checks passed
@mtreinish mtreinish deleted the use-nalgebra-arrays-for-consolidate branch March 25, 2026 15:10
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Mar 26, 2026
In the recently merged Qiskit#15871 we updated the consolidate blocks pass to
use Matrix4 in the common path of a 2q block being consolidated. This is
like > 90% of what the pass does when run in the preset pass manager.
However, there were uncommon cases in the pass around the handling of
blocks of a single gate that are outside of the target which were not
updated to use nalgebra arrays if it's a fixed size 1q or 2q gate. This
commit updates these uncommon paths so that we're always returning an
nalgebra matrix in the output UnitaryGate if the block being consolidated
is a single qubit or two qubits.
github-merge-queue Bot pushed a commit that referenced this pull request Mar 27, 2026
gadial pushed a commit to gadial/qiskit that referenced this pull request Mar 29, 2026
…t#15881)

@github-project-automation github-project-automation Bot moved this from Ready to Done in Qiskit 2.5 Apr 15, 2026