Use nalgebra Matrix2 internally in the TwoQubitBasisDecomposer #15928

Merged: raynelfss merged 4 commits into Qiskit:main from mtreinish:matrix2-basis-decomp on Apr 10, 2026

@mtreinish
Member

Summary

This commit switches the TwoQubitBasisDecomposer to use Matrix2 as its internal array type. Matrix2 is a fixed-size, stack-allocated matrix type with several performance advantages, especially for matmul, because the compiler can reason about a fixed number of operations and better optimize the implementation. It also avoids a lot of heap allocations. This should improve the runtime performance of the two-qubit basis decomposer.
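
To make the stack-versus-heap point concrete, here is a minimal sketch that uses only the standard library. `Mat2` and `matmul_dyn` are hypothetical stand-ins, not nalgebra's or ndarray's API, for a fixed-size matrix type and a dynamically sized one. The decomposer works with complex entries; `f64` is used here only to keep the sketch short.

```rust
use std::ops::Mul;

/// Hypothetical stand-in for a fixed-size 2x2 matrix type.
/// The four entries are stored inline, so a value lives on the stack and is `Copy`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat2 {
    // Row-major storage: [m00, m01, m10, m11].
    data: [f64; 4],
}

impl Mat2 {
    pub const fn new(m00: f64, m01: f64, m10: f64, m11: f64) -> Self {
        Mat2 { data: [m00, m01, m10, m11] }
    }
}

impl Mul for Mat2 {
    type Output = Mat2;

    // The shape is known at compile time: 8 multiplies and 4 adds,
    // written out without loops and without any allocation.
    fn mul(self, rhs: Mat2) -> Mat2 {
        let [a, b, c, d] = self.data;
        let [e, f, g, h] = rhs.data;
        Mat2::new(a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)
    }
}

/// Hypothetical dynamically sized counterpart: the dimension `n` is only known
/// at run time, and every product allocates a fresh `Vec` on the heap.
pub fn matmul_dyn(lhs: &[f64], rhs: &[f64], n: usize) -> Vec<f64> {
    let mut out = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                out[i * n + j] += lhs[i * n + k] * rhs[k * n + j];
            }
        }
    }
    out
}
```

Both paths compute the same product; the difference is that `Mat2 * Mat2` never touches the allocator and its operation count is fixed in the source, which is the property the summary attributes to Matrix2.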

This is part of the ongoing effort to use nalgebra's fixed-size matrix types Matrix4 and Matrix2 in all of the two-qubit decomposer paths. We will still use faer for the more involved linear algebra, such as eigenvalue decomposition, where it is faster and more numerically stable. This PR doesn't get us all the way to that goal; it's just another step on the journey.

There are still places in the module that use ndarray as the array type, mostly because they're used with either the Weyl decomposition or the one-qubit Euler decomposition. In particular, there are a couple of duplicate methods, prefixed or suffixed with nalgebra, that return or convert to/from an nalgebra object; these are temporary while we're in the middle of the transition. The goal is to remove them as we migrate the rest of the two-qubit decomposers to nalgebra for the storage type.

Details and comments

@mtreinish mtreinish added this to the 2.5.0 milestone Mar 31, 2026
@mtreinish mtreinish requested a review from a team as a code owner March 31, 2026 23:37
@mtreinish mtreinish added the performance, Changelog: None, and Rust labels Mar 31, 2026
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @levbishop

@mtreinish
Member Author

mtreinish commented Mar 31, 2026

I ran some asv benchmarks and there wasn't much of an improvement; asv didn't have any confidence that the numbers it reported changed significantly. But, as I said in the commit message, this is just one step towards enabling Matrix2 and Matrix4 as the array types everywhere in the decomposer.

Benchmarks that have stayed the same:

| Change   | Before [29fdcad7]    | After [8dfbc15d]    |   Ratio | Benchmark (Parameter)                                                                           |
|----------|----------------------|---------------------|---------|-------------------------------------------------------------------------------------------------|
|          | 3.07±0.01ms          | 3.09±0.02ms         |    1.01 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                                          |
|          | 3.06±0.01ms          | 3.09±0.01ms         |    1.01 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                                          |
|          | 150±0.4ms            | 151±0.4ms           |    1.01 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                            |
|          | 33.4±0.2ms           | 33.2±0.1ms          |    1    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                                          |
|          | 33.8±0.2ms           | 33.8±0.2ms          |    1    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                                         |
|          | 3.07±0.01ms          | 3.08±0.02ms         |    1    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                                         |
|          | 20.6±0.02s           | 20.6±0.02s          |    1    | utility_scale.UtilityScaleBenchmarks.time_hwb12('cz')                                           |
|          | 19.6±0.05s           | 19.5±0.01s          |    1    | utility_scale.UtilityScaleBenchmarks.time_hwb12('ecr')                                          |
|          | 222±3ms              | 222±0.8ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('ecr')                                    |
|          | 179±0.7ms            | 179±0.5ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                            |
|          | 171±0.5ms            | 171±0.5ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                           |
|          | 295±0.6ms            | 294±0.8ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                             |
|          | 397±1ms              | 396±0.6ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                                              |
|          | 398±0.3ms            | 397±0.7ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                                             |
|          | 47.8±0.1ms           | 47.8±0.5ms          |    1    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                              |
|          | 3.63±0.01ms          | 3.60±0.01ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                   |
|          | 5.67±0.03ms          | 5.62±0.03ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                   |
|          | 5.27±0.01ms          | 5.22±0.01ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                   |
|          | 5.47±0.03ms          | 5.43±0.01ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                   |
|          | 14.4±0.1ms           | 14.3±0.05ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0) |
|          | 19.1±0.06ms          | 18.9±0.1ms          |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1) |
|          | 4.05±0.01ms          | 3.99±0.01ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2) |
|          | 6.25±0.1ms           | 6.16±0.1ms          |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                        |
|          | 36.5±0.3ms           | 36.3±0.3ms          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                                          |
|          | 11.5±0.2ms           | 11.4±0.1ms          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                                         |
|          | 13.2±0.1ms           | 13.1±0.2ms          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                                         |
|          | 15.3±0.01s           | 15.2±0.02s          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_hwb12('cx')                                           |
|          | 223±2ms              | 221±0.9ms           |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cz')                                     |
|          | 3.79±0.02ms          | 3.73±0.01ms         |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                |
|          | 41.2±0.1ms           | 40.8±0.2ms          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                 |
|          | 13.2±0.05ms          | 13.1±0.01ms         |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                    |
|          | 13.2±0.04ms          | 13.1±0.03ms         |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                    |
|          | 13.3±0.07ms          | 13.1±0.08ms         |    0.99 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                   |
|          | 249±0.6ms            | 247±0.7ms           |    0.99 | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                             |
|          | 301±1ms              | 299±1ms             |    0.99 | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                            |
|          | 44.3±0.1ms           | 43.8±0.2ms          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                               |
|          | 50.1±0.5ms           | 49.8±0.2ms          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                               |
|          | 4.24±0.01ms          | 4.16±0ms            |    0.98 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3) |
|          | 9.18±0.1ms           | 8.95±0.07ms         |    0.98 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                        |
|          | 13.5±0.09ms          | 13.2±0.2ms          |    0.98 | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                                        |
|          | 224±3ms              | 221±1ms             |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cx')                                     |
|          | 3.80±0.03ms          | 3.74±0.01ms         |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                 |
|          | 3.81±0.01ms          | 3.73±0.01ms         |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                 |
|          | 41.4±0.3ms           | 40.6±0.2ms          |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                  |
|          | 41.3±0.2ms           | 40.7±0.1ms          |    0.98 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                  |
|          | 384±2ms              | 377±0.6ms           |    0.98 | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                                              |
|          | 3.04±0.1s            | 2.95±0.07s          |    0.97 | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cz')                                      |
|          | 21.2±0.2ms           | 20.4±0.08ms         |    0.96 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                        |
|          | 19.0±0.2ms           | 18.0±0.08ms         |    0.95 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                        |
|          | 3.12±0.02s           | 2.93±0.07s          |    0.94 | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cx')                                      |
|          | 3.11±0.06s           | 2.88±0.03s          |    0.93 | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('ecr')                                     |

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Contributor

@raynelfss raynelfss left a comment

I couldn't find anything particularly concerning in the code here. I just had a very minor comment.

Comment thread crates/synthesis/src/two_qubit_decompose/basis_decomposer.rs Outdated
Member

@ShellyGarion ShellyGarion left a comment

I have some minor comments.
In addition, is there a plan to convert the other two-qubit decomposers (TwoQubitWeylDecomposition and TwoQubitControlledUDecomposer) to nalgebra as well?

-use super::common::{
-    DEFAULT_FIDELITY, IPZ, TraceToFidelity, rx_matrix, rz_matrix, transpose_conjugate,
-};
+use super::common::{DEFAULT_FIDELITY, TraceToFidelity, rx_matrix_nalgebra, rz_matrix_nalgebra};
Member

note that IPZ has been moved to common.rs since it's also used in weyl_decompositions.rs

Member Author

This is temporary, just for this PR, as the version in common.rs is replaced with a Matrix2 version in the next PR: #15960. That PR moves all the static IPZ, IPY, and IPX definitions to use Matrix2 instead of [[Complex64; 2]; 2], as all the usage has been moved to nalgebra in that PR.

Member

I think that this was just a bad merge with PR #15880, but if you prefer to merge this PR now and fix it in #15960, then it's OK with me.

 use qiskit_circuit::{NoBlocks, Qubit};
 use qiskit_util::alias::GateArray1Q;
-use qiskit_util::complex::{C_M_ONE, C_ONE, IM, M_IM, c64};
+use qiskit_util::complex::{C_M_ONE, C_ONE, C_ZERO, IM, M_IM, c64};
Member

if we don't define IPZ in this file, then C_ZERO is not needed here.

Member Author

This is temporary, just for this PR, as the version in common.rs is replaced with a Matrix2 version in the next PR: #15960. That PR moves all the static IPZ, IPY, and IPX definitions to use Matrix2 instead of [[Complex64; 2]; 2], as all the usage has been moved to nalgebra in that PR.

[c64(0., FRAC_1_SQRT_2), c64(FRAC_1_SQRT_2, 0.)],
[c64(-FRAC_1_SQRT_2, 0.), c64(0., -FRAC_1_SQRT_2)],
];
static IPZ: Matrix2<Complex64> = Matrix2::new(IM, C_ZERO, C_ZERO, M_IM);
Member

note that IPZ has been moved to common.rs since it's also used in weyl_decompositions.rs

Member Author

This is temporary, just for this PR, as the version in common.rs is replaced with a Matrix2 version in the next PR: #15960. That PR moves all the static IPZ, IPY, and IPX definitions to use Matrix2 instead of [[Complex64; 2]; 2], as all the usage has been moved to nalgebra in that PR.

static IPZ: Matrix2<Complex64> = Matrix2::new(IM, C_ZERO, C_ZERO, M_IM);

static HGATE: Matrix2<Complex64> =
Matrix2::new(H_GATE[0][0], H_GATE[0][1], H_GATE[1][0], H_GATE[1][1]);
Member

The name HGATE may be a bit confusing (next to H_GATE). Maybe call it HGATE_matrix or H_matrix?
Also, why not use the matrix method of the standard gates here?

Member Author

I can rename the variable; I just picked something that wouldn't conflict with the static being imported here. But as a static it needs to be all capital letters; rustfmt will complain otherwise.

The reason I didn't use the StandardGate::matrix() method here is that it is not defined as a const method, so I can't call it from a static context. The matrix method in particular is written to return an owned Array2<Complex64>, and we wouldn't be able to use that in a const function anyway because it relies on dynamic memory allocation. We might be able to make the matrix_as_static_1q and matrix_as_static_2q methods const functions, but that would require everything in them to be defined as const functions.

I wanted this to be a static so we only have a single Matrix2 that is used by reference for the Hadamard matrix when we need it. This avoids allocating a temporary 2x2 matrix in any form when we need it. The problem with the existing static is that it's a [[Complex64; 2]; 2], which isn't compatible with nalgebra types for matrix multiplication or other operations, so I needed a static that was a Matrix2 type for this use case.
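
The constraint described above can be sketched with the standard library alone. Everything named here is hypothetical: `C64` and `Mat2` stand in for `Complex64` and `Matrix2<Complex64>`, `H_GATE_MATRIX` is just one of the names suggested in this thread, and `hadamard` is an illustrative accessor. The point is only that a `static` initializer may call `const fn` constructors and read entries of another immutable static, whereas a function that builds a heap-allocated array could not be used there.

```rust
use std::f64::consts::FRAC_1_SQRT_2;

/// Hypothetical minimal complex number (stand-in for `Complex64`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct C64 {
    pub re: f64,
    pub im: f64,
}

/// `const fn` constructor, so it is usable inside `static` initializers.
pub const fn c64(re: f64, im: f64) -> C64 {
    C64 { re, im }
}

/// Hypothetical fixed-size 2x2 matrix (stand-in for `Matrix2<Complex64>`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat2 {
    pub rows: [[C64; 2]; 2],
}

impl Mat2 {
    /// Takes entries in row-major order; `const fn` and allocation-free.
    pub const fn new(m00: C64, m01: C64, m10: C64, m11: C64) -> Self {
        Mat2 { rows: [[m00, m01], [m10, m11]] }
    }
}

/// Existing-style static: a plain nested array with no matrix operations.
pub static H_GATE: [[C64; 2]; 2] = [
    [c64(FRAC_1_SQRT_2, 0.), c64(FRAC_1_SQRT_2, 0.)],
    [c64(FRAC_1_SQRT_2, 0.), c64(-FRAC_1_SQRT_2, 0.)],
];

/// Matrix-typed static built once from the nested array at compile time.
pub static H_GATE_MATRIX: Mat2 =
    Mat2::new(H_GATE[0][0], H_GATE[0][1], H_GATE[1][0], H_GATE[1][1]);

/// Callers borrow the single shared value; no temporary matrix is built per call.
pub fn hadamard() -> &'static Mat2 {
    &H_GATE_MATRIX
}
```

Every call to `hadamard()` returns a reference to the same static, which matches the stated goal of having one Hadamard matrix used by reference.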

Member

@ShellyGarion ShellyGarion Apr 9, 2026


Maybe HGATE_MATRIX or H_GATE_MATRIX?

#[inline]
pub fn ndarray_to_matrix2<T: Copy>(view: ArrayView2<T>) -> Matrix2<T> {
    Matrix2::new(view[[0, 0]], view[(0, 1)], view[(1, 0)], view[(1, 1)])
}
Member

In https://github.com/Qiskit/qiskit/blob/main/crates/synthesis/src/linalg/mod.rs there are several methods to convert ndarrays to and from nalgebra and faer.
Perhaps it's worth moving this function there too?
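
For readers unfamiliar with this kind of helper, the following is a rough sketch, using only the standard library, of what a conversion between a dynamically shaped view and a fixed-size 2x2 type involves. `DynView`, `Mat2`, `view_to_mat2`, and `mat2_to_vec` are hypothetical names, not the ndarray, nalgebra, or Qiskit API. Returning `Option` on a shape mismatch is an assumption of this sketch; the `ndarray_to_matrix2` function in the diff indexes the view directly.

```rust
/// Hypothetical fixed-size 2x2 matrix (stand-in for `Matrix2<T>`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat2<T> {
    pub rows: [[T; 2]; 2],
}

/// Hypothetical dynamically shaped, row-major view (stand-in for `ArrayView2<T>`).
pub struct DynView<'a, T> {
    pub data: &'a [T],
    pub nrows: usize,
    pub ncols: usize,
}

impl<'a, T: Copy> DynView<'a, T> {
    /// Read the element at row `r`, column `c`; panics if out of bounds.
    pub fn get(&self, r: usize, c: usize) -> T {
        self.data[r * self.ncols + c]
    }
}

/// Copy a 2x2 view into the fixed-size type, or return `None` if the shape is wrong.
#[inline]
pub fn view_to_mat2<T: Copy>(view: &DynView<'_, T>) -> Option<Mat2<T>> {
    if view.nrows != 2 || view.ncols != 2 || view.data.len() != 4 {
        return None;
    }
    Some(Mat2 {
        rows: [
            [view.get(0, 0), view.get(0, 1)],
            [view.get(1, 0), view.get(1, 1)],
        ],
    })
}

/// Convert back to a heap-allocated, row-major buffer for code that still
/// expects a dynamically sized array.
pub fn mat2_to_vec<T: Copy>(m: &Mat2<T>) -> Vec<T> {
    vec![m.rows[0][0], m.rows[0][1], m.rows[1][0], m.rows[1][1]]
}
```

Keeping such helpers in one shared module, as suggested above, would let each decomposer convert at its boundary while the internals stay on the fixed-size type.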

Comment thread crates/synthesis/src/two_qubit_decompose/basis_decomposer.rs Outdated
@mtreinish
Member Author

In addition, is there a plan to convert the other two-qubit decomposers (TwoQubitWeylDecomposition and TwoQubitControlledUDecomposer) to nalgebra as well?

Yes, as I mentioned in the commit message/PR summary, this is just an incremental step towards migrating to nalgebra for fixed-size small matrices in the two-qubit decomposer code. I already have the second PR open for the TwoQubitWeylDecomposition: #15960. Several of your comments here are already fixed in that PR, as some of the statics move to the common module there. As I mentioned in the PR summary, there are several temporary things I did in this PR to make it work as a standalone PR. This is just an initial incremental step that is self-contained, and things get a bit cleaner when all the pieces are finalized. I wanted to do this as smaller PRs per decomposer because the logic is particularly dense in all of these modules, and keeping the changes as minimal as possible makes them much easier to follow.

Member

@ShellyGarion ShellyGarion left a comment

I had some very minor comments on the names and on moving some code (I think this was just a bad merge after PR #15880), but if you prefer to merge this PR now and fix it in #15960, then it's OK with me.

@raynelfss raynelfss added this pull request to the merge queue Apr 10, 2026
Merged via the queue into Qiskit:main with commit a2c1136 Apr 10, 2026
26 checks passed
@mtreinish mtreinish deleted the matrix2-basis-decomp branch April 13, 2026 11:26
@ShellyGarion ShellyGarion self-assigned this Apr 14, 2026
@github-project-automation github-project-automation Bot moved this from Ready to Done in Qiskit 2.5 Apr 19, 2026
