Rust implementation of the QPY module#14166

Merged
alexanderivrii merged 143 commits into Qiskit:main from gadial:rust_qpy
Jan 18, 2026

Conversation

@gadial
Contributor

@gadial gadial commented Apr 3, 2025

Summary

Adds a Rust-based implementation of QPY serialization.

Addresses #13131

Details and comments

We use Rust structs along with the binrw library to specify and write the byte representation of data (this turns out to be somewhat simpler than how it's currently done in Python). There are still many Python-dependent elements we need to take into consideration:

  1. Backwards compatibility is currently not supported; whenever an older version is passed, the Python implementation is used instead. The goal of this PR is to establish a baseline from which it would be relatively easy to add support for previous versions.
  2. The file header is still created in Python space; the entry points for the Rust code are write_circuit and read_circuit (this is being kept until a later PR adds backward compatibility).
  3. numpy object serialization is done via a Python call to numpy's save.
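
A rough Python sketch of the fallback described in items 1 and 2, under stated assumptions: the constant RUST_SUPPORTED_VERSION, the simplified two-field header, the use of a plain string as the "circuit", and the two stand-in writer functions are all hypothetical. They only illustrate the dispatch and are not the actual code in interface.py.

```python
import io
import struct

# Hypothetical: the single format version the Rust writer handles.
RUST_SUPPORTED_VERSION = 17


def write_circuit_rust(stream, circuit, version):
    """Stand-in for the Rust entry point; tags its output so the path is visible."""
    stream.write(b"RS" + circuit.encode("utf8"))


def write_circuit_python(stream, circuit, version):
    """Stand-in for the existing Python writer."""
    stream.write(b"PY" + circuit.encode("utf8"))


def dump(circuit, stream, version=RUST_SUPPORTED_VERSION):
    # The file header is still produced on the Python side (simplified here).
    stream.write(struct.pack("!6sB", b"QISKIT", version))
    # Older versions are not supported in Rust yet, so fall back to Python.
    if version == RUST_SUPPORTED_VERSION:
        write_circuit_rust(stream, circuit, version)
    else:
        write_circuit_python(stream, circuit, version)


buf = io.BytesIO()
dump("bell", buf, version=13)  # an older version takes the Python path
print(buf.getvalue())
```

Keeping the version check in one place means that adding Rust support for an older version would only change the condition, not the callers.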

In this PR we make the distinction between packing an object and serializing it:

  • serializing an object means generating a bytes representation for it (a Vec<u8> object, aliased to Bytes in the module).
  • packing an object means extracting its data fields and generating a corresponding Rust struct, taken from the formats.rs file. This might involve serializing some of the object's fields, and packing other fields.

Our goal is to minimize serialization as much as possible, allowing us to generate a large packed struct reflecting the structure of the QPY file as much as possible before serializing; the code can be further improved to reflect this, although I think this can wait for a future PR.
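
To make the packing/serializing distinction concrete, here is a minimal Python analogy using dataclasses and the standard struct module. It is only a sketch: PackedParam, PackedInstruction, the field layout, and the type key b"f" are invented for illustration and do not correspond to the real structs in formats.rs or to the QPY byte layout.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class PackedParam:
    type_key: bytes  # one-byte tag describing the payload (hypothetical)
    data: bytes      # payload that has already been serialized


@dataclass
class PackedInstruction:
    name: bytes
    num_qargs: int
    params: List[PackedParam]


def pack_param(value: float) -> PackedParam:
    # Packing a parameter involves serializing its value into bytes.
    return PackedParam(type_key=b"f", data=struct.pack("!d", value))


def pack_instruction(name: str, qargs: list, params: list) -> PackedInstruction:
    # Packing: extract the fields into a struct; no byte stream is produced yet.
    return PackedInstruction(
        name=name.encode("utf8"),
        num_qargs=len(qargs),
        params=[pack_param(p) for p in params],
    )


def serialize_instruction(packed: PackedInstruction) -> bytes:
    # Serializing: turn the fully packed struct into bytes in a single pass.
    out = bytearray()
    out += struct.pack("!HII", len(packed.name), packed.num_qargs, len(packed.params))
    out += packed.name
    for p in packed.params:
        out += struct.pack("!cQ", p.type_key, len(p.data))
        out += p.data
    return bytes(out)


packed = pack_instruction("rx", [0], [0.5])
raw = serialize_instruction(packed)
print(packed)
print(len(raw))
```

The packed value stays inspectable as ordinary data until the final serialize call, which is the property the paragraph above is aiming for.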

The main obstacle to minimizing serialization is the relatively ad-hoc manner in which the Python QPY code is written; data is treated as a sequence of bytes until the very last minute, when it is deserialized based on the logic of the current situation. In most cases this translates smoothly to our Rust framework, but not always.

Future Work

  1. Backwards compatibility at least up to QPY 13.
  2. QPY 18 version, which will enable many small improvements listed in QPY 18 Possible changes #15524
  3. Additional tests, in particular comparing the Python and Rust outputs while we still have Python QPY in place.

Benchmarking results

Taking the ASV benchmarks for qasm3 and using them with qpy (both for the dumping flow and for the dumping + loading flow) yields the following results on my local machine:


All benchmarks:

| Change   | Before [7b6c2206] <main>   | After [542982f7] <rust_qpy>   |   Ratio | Benchmark (Parameter)                                                                   |
|----------|----------------------------|-------------------------------|---------|-----------------------------------------------------------------------------------------|
| -        | 658±10ms                   | 307±6ms                       |    0.47 | qpy.CustomGateBenchmarks.time_dump(200, 100) [IBM-PW0KY600/virtualenv-py3.12]           |
| -        | 751±20ms                   | 292±6ms                       |    0.39 | qpy.CustomGateBenchmarks.time_dump(200, 100) [IBM-PW0KY600/virtualenv-py3.13]           |
| -        | 1.52±0.1s                  | 510±10ms                      |    0.34 | qpy.CustomGateBenchmarks.time_dump_and_load(200, 100) [IBM-PW0KY600/virtualenv-py3.12]  |
| -        | 1.68±0.02s                 | 494±20ms                      |    0.29 | qpy.CustomGateBenchmarks.time_dump_and_load(200, 100) [IBM-PW0KY600/virtualenv-py3.13]  |
| -        | 167±2μs                    | 38.9±0.9μs                    |    0.23 | qpy.ParameterizedBenchmarks.time_dump(20, 1) [IBM-PW0KY600/virtualenv-py3.12]           |
| -        | 184±5μs                    | 33.7±1μs                      |    0.18 | qpy.ParameterizedBenchmarks.time_dump(20, 1) [IBM-PW0KY600/virtualenv-py3.13]           |
| -        | 1.32±0.03ms                | 236±8μs                       |    0.18 | qpy.ParameterizedBenchmarks.time_dump(20, 10) [IBM-PW0KY600/virtualenv-py3.12]          |
| -        | 1.48±0.05ms                | 218±10μs                      |    0.15 | qpy.ParameterizedBenchmarks.time_dump(20, 10) [IBM-PW0KY600/virtualenv-py3.13]          |
| -        | 670±8μs                    | 129±10μs                      |    0.19 | qpy.ParameterizedBenchmarks.time_dump(20, 5) [IBM-PW0KY600/virtualenv-py3.12]           |
| -        | 778±40μs                   | 113±4μs                       |    0.14 | qpy.ParameterizedBenchmarks.time_dump(20, 5) [IBM-PW0KY600/virtualenv-py3.13]           |
| -        | 394±10μs                   | 82.1±2μs                      |    0.21 | qpy.ParameterizedBenchmarks.time_dump(50, 1) [IBM-PW0KY600/virtualenv-py3.12]           |
| -        | 432±20μs                   | 72.1±3μs                      |    0.17 | qpy.ParameterizedBenchmarks.time_dump(50, 1) [IBM-PW0KY600/virtualenv-py3.13]           |
| -        | 3.52±0.08ms                | 653±80μs                      |    0.19 | qpy.ParameterizedBenchmarks.time_dump(50, 10) [IBM-PW0KY600/virtualenv-py3.12]          |
| -        | 3.75±0.2ms                 | 513±10μs                      |    0.14 | qpy.ParameterizedBenchmarks.time_dump(50, 10) [IBM-PW0KY600/virtualenv-py3.13]          |
| -        | 1.67±0.03ms                | 307±8μs                       |    0.18 | qpy.ParameterizedBenchmarks.time_dump(50, 5) [IBM-PW0KY600/virtualenv-py3.12]           |
| -        | 1.93±0.08ms                | 273±10μs                      |    0.14 | qpy.ParameterizedBenchmarks.time_dump(50, 5) [IBM-PW0KY600/virtualenv-py3.13]           |
| -        | 450±7μs                    | 88.2±3μs                      |    0.2  | qpy.ParameterizedBenchmarks.time_dump_and_load(20, 1) [IBM-PW0KY600/virtualenv-py3.12]  |
| -        | 495±30μs                   | 81.7±3μs                      |    0.16 | qpy.ParameterizedBenchmarks.time_dump_and_load(20, 1) [IBM-PW0KY600/virtualenv-py3.13]  |
| -        | 3.17±0.06ms                | 428±20μs                      |    0.14 | qpy.ParameterizedBenchmarks.time_dump_and_load(20, 10) [IBM-PW0KY600/virtualenv-py3.12] |
| -        | 3.58±0.2ms                 | 422±40μs                      |    0.12 | qpy.ParameterizedBenchmarks.time_dump_and_load(20, 10) [IBM-PW0KY600/virtualenv-py3.13] |
| -        | 1.83±0.1ms                 | 223±6μs                       |    0.12 | qpy.ParameterizedBenchmarks.time_dump_and_load(20, 5) [IBM-PW0KY600/virtualenv-py3.12]  |
| -        | 1.86±0.03ms                | 239±10μs                      |    0.13 | qpy.ParameterizedBenchmarks.time_dump_and_load(20, 5) [IBM-PW0KY600/virtualenv-py3.13]  |
| -        | 1000±30μs                  | 162±9μs                       |    0.16 | qpy.ParameterizedBenchmarks.time_dump_and_load(50, 1) [IBM-PW0KY600/virtualenv-py3.12]  |
| -        | 1.09±0.03ms                | 152±10μs                      |    0.14 | qpy.ParameterizedBenchmarks.time_dump_and_load(50, 1) [IBM-PW0KY600/virtualenv-py3.13]  |
| -        | 8.09±0.1ms                 | 1.09±0.04ms                   |    0.13 | qpy.ParameterizedBenchmarks.time_dump_and_load(50, 10) [IBM-PW0KY600/virtualenv-py3.12] |
| -        | 8.88±0.5ms                 | 897±50μs                      |    0.1  | qpy.ParameterizedBenchmarks.time_dump_and_load(50, 10) [IBM-PW0KY600/virtualenv-py3.13] |
| -        | 4.23±0.1ms                 | 502±40μs                      |    0.12 | qpy.ParameterizedBenchmarks.time_dump_and_load(50, 5) [IBM-PW0KY600/virtualenv-py3.12]  |
| -        | 4.45±0.1ms                 | 567±30μs                      |    0.13 | qpy.ParameterizedBenchmarks.time_dump_and_load(50, 5) [IBM-PW0KY600/virtualenv-py3.13]  |
| -        | 41.5±0.8ms                 | 11.1±0.7ms                    |    0.27 | qpy.RandomBenchmarks.time_dump(20, 1024, 0) [IBM-PW0KY600/virtualenv-py3.12]            |
| -        | 42.1±0.7ms                 | 9.28±0.3ms                    |    0.22 | qpy.RandomBenchmarks.time_dump(20, 1024, 0) [IBM-PW0KY600/virtualenv-py3.13]            |
| -        | 41.8±1ms                   | 11.3±0.4ms                    |    0.27 | qpy.RandomBenchmarks.time_dump(20, 1024, 42) [IBM-PW0KY600/virtualenv-py3.12]           |
| -        | 43.5±0.7ms                 | 9.48±0.2ms                    |    0.22 | qpy.RandomBenchmarks.time_dump(20, 1024, 42) [IBM-PW0KY600/virtualenv-py3.13]           |
| -        | 10.8±0.3ms                 | 2.77±0.09ms                   |    0.26 | qpy.RandomBenchmarks.time_dump(20, 256, 0) [IBM-PW0KY600/virtualenv-py3.12]             |
| -        | 11.0±0.5ms                 | 2.36±0.1ms                    |    0.22 | qpy.RandomBenchmarks.time_dump(20, 256, 0) [IBM-PW0KY600/virtualenv-py3.13]             |
| -        | 10.5±0.5ms                 | 2.77±0.06ms                   |    0.26 | qpy.RandomBenchmarks.time_dump(20, 256, 42) [IBM-PW0KY600/virtualenv-py3.12]            |
| -        | 10.8±0.2ms                 | 2.58±0.1ms                    |    0.24 | qpy.RandomBenchmarks.time_dump(20, 256, 42) [IBM-PW0KY600/virtualenv-py3.13]            |
| -        | 114±2ms                    | 16.5±0.5ms                    |    0.14 | qpy.RandomBenchmarks.time_dump_and_load(20, 1024, 0) [IBM-PW0KY600/virtualenv-py3.12]   |
| -        | 120±2ms                    | 15.8±0.7ms                    |    0.13 | qpy.RandomBenchmarks.time_dump_and_load(20, 1024, 0) [IBM-PW0KY600/virtualenv-py3.13]   |
| -        | 114±3ms                    | 16.6±0.2ms                    |    0.15 | qpy.RandomBenchmarks.time_dump_and_load(20, 1024, 42) [IBM-PW0KY600/virtualenv-py3.12]  |
| -        | 143±30ms                   | 15.1±0.7ms                    |    0.11 | qpy.RandomBenchmarks.time_dump_and_load(20, 1024, 42) [IBM-PW0KY600/virtualenv-py3.13]  |
| -        | 28.7±0.3ms                 | 4.55±0.1ms                    |    0.16 | qpy.RandomBenchmarks.time_dump_and_load(20, 256, 0) [IBM-PW0KY600/virtualenv-py3.12]    |
| -        | 30.9±2ms                   | 4.04±0.3ms                    |    0.13 | qpy.RandomBenchmarks.time_dump_and_load(20, 256, 0) [IBM-PW0KY600/virtualenv-py3.13]    |
| -        | 28.8±0.5ms                 | 4.37±0.2ms                    |    0.15 | qpy.RandomBenchmarks.time_dump_and_load(20, 256, 42) [IBM-PW0KY600/virtualenv-py3.12]   |
| -        | 30.5±1ms                   | 4.06±0.2ms                    |    0.13 | qpy.RandomBenchmarks.time_dump_and_load(20, 256, 42) [IBM-PW0KY600/virtualenv-py3.13]   |
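
As a reading aid, the Ratio column appears to be the "after" time divided by the "before" time, so smaller is better. The short sketch below recomputes it for three rows copied from the table (central values only, converted to milliseconds); the helper name ratio is just for illustration.

```python
def ratio(before, after):
    """Ratio as shown in the table: time after divided by time before."""
    return after / before


# (before, after) central values from three rows of the table, in milliseconds.
rows = {
    "CustomGateBenchmarks.time_dump(200, 100) py3.12": (658.0, 307.0),
    "ParameterizedBenchmarks.time_dump(20, 1) py3.13": (0.184, 0.0337),
    "ParameterizedBenchmarks.time_dump_and_load(50, 10) py3.13": (8.88, 0.897),
}

for name, (before, after) in rows.items():
    r = ratio(before, after)
    # The reciprocal of the ratio is the speedup factor.
    print(f"{name}: ratio {r:.2f}, speedup {1 / r:.1f}x")
```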

@gadial gadial added the mod: qpy (Related to QPY serialization) and Rust (This PR or issue is related to Rust code in the repository) labels Apr 3, 2025
@gadial gadial added this to the 2.1.0 milestone Apr 3, 2025
@gadial gadial self-assigned this Apr 3, 2025
@gadial gadial requested a review from a team as a code owner April 3, 2025 09:47
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @mtreinish
  • @nkanazawa1989

@gadial gadial marked this pull request as draft April 3, 2025 09:47
@eliarbel eliarbel linked an issue Apr 24, 2025 that may be closed by this pull request
@eliarbel
Member

eliarbel commented May 4, 2025

Hi @gadial, this is still in draft mode so I just gave it a brief look. I have a few general questions and comments at this stage:

  1. With the way the code is written now it's geared towards being called from Python, e.g. with the file header written in interface.py, the Python calls in py_write_circuit, pack_circuit_header and more. Maybe it's just an artifact of this PR still being WIP, but with the C API for circuit construction (see Add initial C API for circuit construction #14006 and follow-ons) we should consider a C API equivalent for the dump workflow or some aspects of it.
  2. For the circuit header handling in pack_circuit_header maybe QuantumCircuitData in converters or a similar approach might be useful here instead of all the getattr calls.
  3. Generally speaking about 1 & 2: assuming we'll still need some Python calls, can we make the Rust-logic as Python-free as possible, say by having a layer to extract as much info as possible from the Python objects before passing it to the actual pack functions? I realize this might not be that straightforward, but still worth trying. Eventually we should have enough of the data model Python-free so it makes sense to design for that already.
  4. Is use_rust in dump just a WIP thing for debug or do you plan to keep Python's write_circuit as an option? If so why? Otherwise it would be good to have the formats defined only in one place. Currently formats.rs mirrors formats.py which makes sense for the WIP status but the duplication is not good for the longer term. Just mentioning.

@gadial
Contributor Author

gadial commented May 4, 2025

Thanks @eliarbel

  1. That's mostly an artifact of originally planning a smaller scope for this PR (in order to proceed incrementally instead of one huge PR) - I'm still trying to aim for the MVP, although this can be changed if needed since the header is a relatively easy part.
  2. Thanks, I'll try looking into it once the PR is passing.
  3. It's a little tricky since we deal with many different data types, each with its own Pythonic quirks, and some (e.g. parameters) are already transitioning to Rust. My priority in this PR is to flesh out the QPY structure as much as possible (i.e. by using detailed structs) and to keep the actual data extraction in specific serialize_X functions which can be amended on demand as we move more logic from Python to Rust. But I think adding new extractors is out of scope for this PR, which aims to establish the baseline.
  4. That's most definitely a WIP thing - I use it to test that both Rust and Python generate the same output. In the end I believe we should remove all the Python code related to QPY dumping, but we'll have to keep the formats until we handle reading as well.
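
The byte-for-byte check mentioned in item 4 could look roughly like the sketch below. It assumes two writer callables taking (obj, stream); both function names here are hypothetical helpers and not part of the PR or of qiskit.qpy.

```python
import io


def first_difference(a: bytes, b: bytes):
    """Return the index of the first differing byte, or None if the inputs are equal."""
    if a == b:
        return None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a strict prefix of the other.
    return min(len(a), len(b))


def check_same_output(writer_a, writer_b, obj):
    """Run both writers on the same object and fail if their bytes differ."""
    buf_a, buf_b = io.BytesIO(), io.BytesIO()
    writer_a(obj, buf_a)
    writer_b(obj, buf_b)
    diff = first_difference(buf_a.getvalue(), buf_b.getvalue())
    if diff is not None:
        raise AssertionError(f"outputs diverge at byte offset {diff}")
```

Reporting the first diverging offset, rather than only a pass/fail result, makes it easier to locate which part of the file layout differs.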

@coveralls

coveralls commented May 11, 2025

Pull Request Test Coverage Report for Build 21109706843

Details

  • 3635 of 4171 (87.15%) changed or added relevant lines in 27 files are covered.
  • 485 unchanged lines in 10 files lost coverage.
  • Overall coverage decreased (-0.4%) to 87.935%

| Changes Missing Coverage                              | Covered Lines | Changed/Added Lines | %      |
|-------------------------------------------------------|---------------|---------------------|--------|
| crates/qpy/src/consts.rs                              | 63            | 64                  | 98.44% |
| qiskit/qpy/binary_io/circuits.py                      | 12            | 13                  | 92.31% |
| qiskit/qpy/binary_io/value.py                         | 0             | 1                   | 0.0%   |
| crates/circuit/src/parameter/parameter_expression.rs  | 21            | 27                  | 77.78% |
| crates/qpy/src/expr.rs                                | 203           | 210                 | 96.67% |
| crates/qpy/src/formats.rs                             | 87            | 108                 | 80.56% |
| crates/qpy/src/annotations.rs                         | 75            | 106                 | 70.75% |
| crates/qpy/src/circuit_writer.rs                      | 892           | 949                 | 93.99% |
| crates/qpy/src/bytes.rs                               | 125           | 183                 | 68.31% |
| crates/qpy/src/py_methods.rs                          | 353           | 420                 | 84.05% |

| Files with Coverage Reduction                         | New Missed Lines | %      |
|-------------------------------------------------------|------------------|--------|
| crates/circuit/src/parameter/parameter_expression.rs  | 1                | 87.09% |
| qiskit/circuit/library/pauli_product_measurement.py   | 1                | 93.02% |
| crates/qasm2/src/lex.rs                               | 2                | 92.8%  |
| crates/circuit/src/parameter/symbol_expr.rs           | 3                | 72.92% |
| qiskit/circuit/annotation.py                          | 4                | 86.67% |
| crates/qasm2/src/parse.rs                             | 6                | 97.56% |
| qiskit/qpy/common.py                                  | 9                | 46.67% |
| qiskit/qpy/type_keys.py                               | 12               | 71.88% |
| qiskit/qpy/binary_io/value.py                         | 173              | 44.79% |
| qiskit/qpy/binary_io/circuits.py                      | 274              | 57.29% |

| Totals                                | Coverage Status |
|---------------------------------------|-----------------|
| Change from base Build 21064565528:   | -0.4%           |
| Covered Lines:                        | 100180          |
| Relevant Lines:                       | 113925          |

💛 - Coveralls

@gadial gadial mentioned this pull request Jan 8, 2026
16 tasks
Member

@alexanderivrii alexanderivrii left a comment

Thanks Gadi! I am in favor of merging this PR as is, but I'll defer to Eli to take another look to make sure that all of his comments are addressed.

Comment thread crates/qpy/src/circuit_reader.rs
Comment thread crates/qpy/src/circuit_reader.rs Outdated
Comment thread crates/qpy/src/circuit_reader.rs
Comment thread crates/qpy/src/py_methods.rs Outdated
.getattr(intern!(py, "__class__"))?
.getattr(intern!(py, "__name__"))?
.extract::<String>(),
OperationRef::PauliProductMeasurement(_) => imports::PAULI_PRODUCT_MEASUREMENT
Member

I thought you wanted to change this to avoid going through python.

Comment thread crates/qpy/src/consts.rs Outdated
Comment thread releasenotes/notes/add_qpy_rust_implementation-2f6ff2e9f52ed2e6.yaml Outdated
Member

@eliarbel eliarbel left a comment

Thanks for all the hard and amazing work Gadi!
I'm happy to get this merged into main so I will go ahead and approve.

@alexanderivrii any more comments?

Member

@alexanderivrii alexanderivrii left a comment

No more immediate comments on my part.

@alexanderivrii alexanderivrii added this pull request to the merge queue Jan 18, 2026
Merged via the queue into Qiskit:main with commit 654cb87 Jan 18, 2026
23 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in Qiskit 2.4 Jan 18, 2026

Labels

Changelog: None (Do not include in the GitHub Release changelog.)
mod: qpy (Related to QPY serialization)
Rust (This PR or issue is related to Rust code in the repository)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Port QPY serialization to Rust

10 participants