GPU acceleration #15930
Closed
dmeliksetian wants to merge 2 commits into Qiskit:main
Offloads the intermediate boolean tensor operations in SparsePauliOp.compose() to the GPU when CuPy is available and the tensor size exceeds _GPU_COMPOSE_THRESHOLD (5M elements), falling back silently to NumPy otherwise.

- Add HAS_CUPY lazy optional to qiskit.utils.optionals
- Add GPU/CPU correctness tests in TestSparsePauliOpGPU
- Add CPU-vs-GPU ASV benchmarks in SparsePauliOpGPUComposeBench

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Defer cp.asnumpy() in the qargs branch until after cp.repeat and scatter assignment, so the full embedding stays on GPU. Previously x3/z3/phase were transferred back to CPU before np.repeat was called.

Also expand ASV benchmarks with SparsePauliOpGPUComposeQargsBench to measure CPU vs GPU speedup for the qargs path across varying total/sub qubit counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
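The ordering this commit describes (repeat, then scatter, then a single transfer to the host) can be sketched roughly as below. This is an illustration under assumptions, not the code from the PR: `embed_qargs`, its parameters, and the `use_gpu` flag are hypothetical names, and the snippet falls back to NumPy when CuPy cannot be imported.

```python
import numpy as np

try:  # CuPy is optional; without it the sketch runs on NumPy only
    import cupy as cp
except ImportError:
    cp = None


def embed_qargs(x_self, x3, qargs, use_gpu=False):
    """Hypothetical helper: embed the composed sub-system columns ``x3``
    into the repeated rows of ``x_self`` and move the result to the host
    exactly once, at the very end."""
    xp = cp if (use_gpu and cp is not None) else np
    x_self = xp.asarray(x_self, dtype=bool)
    x3 = xp.asarray(x3, dtype=bool)

    # Each row of x_self is repeated once per term of the other operand.
    repeats = x3.shape[0] // x_self.shape[0]
    x4 = xp.repeat(x_self, repeats, axis=0)

    # Scatter assignment happens on whichever backend holds the data.
    x4[:, qargs] = x3

    # Single device-to-host transfer (a no-op on the NumPy path).
    return x4 if xp is np else cp.asnumpy(x4)


x_self = [[1, 1, 0], [0, 0, 1]]            # 2 terms on 3 qubits
x3 = [[0, 0], [1, 0], [0, 1], [1, 1]]      # 4 composed rows on qubits 0 and 2
result = embed_qargs(x_self, x3, qargs=[0, 2])
```

Only the columns listed in `qargs` are overwritten; the remaining columns keep the repeated values from `x_self`.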
Member
See justification in #15929 (comment).
Summary
Offloads the boolean tensor operations in SparsePauliOp.compose() to the GPU via CuPy when available and the intermediate tensor size exceeds 5,000,000 elements (self.size × other.size × num_qubits). Falls back silently to NumPy when CuPy is not installed or the tensor is below the threshold; there is no behaviour change for existing users.

Closes #15929
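The selection rule in the summary can be sketched as follows. This is a minimal illustration using only the standard library, not the PR's implementation: `pick_backend`, `cupy_installed`, and the `has_cupy` parameter are hypothetical names, and "exceeds" is read as a strict comparison.

```python
import importlib.util

# Threshold quoted in the PR description (5M elements).
_GPU_COMPOSE_THRESHOLD = 5_000_000


def cupy_installed():
    """Return True if a module named ``cupy`` can be located (no import)."""
    return importlib.util.find_spec("cupy") is not None


def pick_backend(self_size, other_size, num_qubits, has_cupy=None):
    """Hypothetical helper: name the array backend for one compose() call."""
    if has_cupy is None:
        has_cupy = cupy_installed()
    # Size of the intermediate boolean tensor: self.size x other.size x num_qubits.
    tensor_elements = self_size * other_size * num_qubits
    if has_cupy and tensor_elements > _GPU_COMPOSE_THRESHOLD:
        return "cupy"
    return "numpy"  # silent fallback


large = pick_backend(1000, 1000, 10, has_cupy=True)   # 10M elements
small = pick_backend(100, 100, 10, has_cupy=True)     # 100k elements
```

Passing `has_cupy` explicitly keeps the rule testable on machines without a GPU. Finding the module only shows that CuPy is installed, not that a usable device is present.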
Details and comments
Changes
- qiskit/utils/optionals.py: add HAS_CUPY lazy optional tester
- SparsePauliOp.compose(): select xp = cupy / numpy based on tensor size; keep the entire qargs branch (including repeat + scatter assignment) on GPU, transferring back to CPU only once before constructing BasePauli
- test/python/.../test_sparse_pauli_op.py: add TestSparsePauliOpGPU correctness tests (skip when CuPy not installed) covering qargs=None, front=True, qargs set, and CPU/GPU parity
- test/benchmarks/quantum_info.py: add SparsePauliOpGPUComposeBench and SparsePauliOpGPUComposeQargsBench with paired time_compose_cpu / time_compose_gpu methods for direct ASV comparison

Benchmarks
Measured on AMD Ryzen 9 7950X + NVIDIA RTX PRO 4500 Blackwell, CUDA 13.2, cupy-cuda12x. Results were reported separately for qargs=None and for qargs set (e.g. apply_layout).

Test plan
- stestr run quantum_info.operators.symplectic.test_sparse_pauli_op.TestSparsePauliOpGPU: all 4 pass with CuPy installed
- stestr run quantum_info.operators.symplectic.test_sparse_pauli_op.TestSparsePauliOpMethods: 272 pass, 2 skip (pre-existing), 0 fail
- asv run --python=same --bench SparsePauliOpGPUCompose: both benchmark classes report GPU speedup
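The "skip when CuPy not installed" behaviour of the GPU tests can be approximated with plain unittest, as sketched below. This is a stand-in under assumptions, not the PR's test class: `TestBooleanTensorParity` and `run_suite` are hypothetical names, and the test compares a broadcast boolean XOR rather than calling SparsePauliOp.compose().

```python
import importlib.util
import unittest

# True only when a module named ``cupy`` can be located.
HAS_CUPY = importlib.util.find_spec("cupy") is not None


@unittest.skipUnless(HAS_CUPY, "CuPy is not installed")
class TestBooleanTensorParity(unittest.TestCase):
    """Hypothetical stand-in for a CPU/GPU parity test."""

    def test_xor_matches_numpy(self):
        # Imported lazily so the module still loads without CuPy or NumPy.
        import cupy as cp
        import numpy as np

        rng = np.random.default_rng(0)
        a = rng.integers(0, 2, size=(8, 1, 5)).astype(bool)
        b = rng.integers(0, 2, size=(1, 6, 5)).astype(bool)

        expected = np.logical_xor(a, b)
        actual = cp.asnumpy(cp.logical_xor(cp.asarray(a), cp.asarray(b)))
        np.testing.assert_array_equal(actual, expected)


def run_suite():
    """Run the parity test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestBooleanTensorParity)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

On a machine without CuPy the single test is reported as skipped rather than failed, so a CPU-only run still succeeds.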