Fix QPY loading delay integers durations incorrectly#16076
Fix QPY loading delay integers durations incorrectly#16076mtreinish merged 8 commits intoQiskit:mainfrom
Conversation
This commit fixes an issue with parsing QPY payloads which have circuits
that contain delays with duration of units dt. Durations of dt are
integers and that is preserved in the QPY data. However our rust data
model doesn't have support for an integer in a Rust PackedInstruction's
parameters (this is an inconsistency which we should arguably fix but
that is separate from this bugfix). To workaround this the QPY reader in
rust was sticking the integer into a ParameterExpression as a constant
without any symbols. This resulted in storing the integer in Rust but
treating the object as a ParameterExpression and not an int, which in
Qiskit's rust data model is mapped to a Param::Obj (indicating a Python
object parameter). This mismatch in types was not really noticeable to
Python because the ParameterExpression with the constant integer was
coerced to an integer when it's passed to Python. However, this would
break underlying assumptions for Rust code that is interacting with the
delay. For example, the experimental rust qasm3 exporter would encounter
the ParameterExpression on the delay and error because it can't handle
parameter expressions yet. However fundamentally it could because this
is just an integer. The reproducer for this failure is:
```python
import io
from qiskit import qpy, qasm3, QuantumCircuit
qc = QuantumCircuit(1)
qc.delay(1, 0)
qasm3.dumps_experimental(qc)
with io.BytesIO() as fptr:
qpy.dump(qc, fptr)
fptr.seek(0)
qc2 = qpy.load(fptr)[0]
qasm3.dumps_experimental(qc2)
```
The OQ3 experimental exporter is wrong not to handle
`ParameterExpression::try_as_value` returning an `int`, but it's _more_
wrong that QPY is producing a `ParameterExpression` on deserialisation
in the first place.
To fix this issue this commit removes the conversion of the `int` in the
qpy payload into a ParameterExpression and just retains an integer until
we write out the Delay's PackedInstruction where we convert that rust
int into a python int for the parameter. Doing this unraveled a deeper
issue in how endianess is handled in QPY. In general everything in QPY
is supposed to be encoded using network byte order (i.e. big endian).
However, in the case of instructions' parameters there was a mistake
made in QPY where the integer and float values for an instruction's
parameters were encoded in little endian. All other uses of floats or
ints are correctly big endian. When the raw int was returned to Python
it was incorrectly assuming all integers were a big endian bytes value.
To fix this an endian arg is added to the function which is converting
the bytes arrays into a `GenericValue` enum for floats and ints. Then
the callers of this function in circuit_reader are updated to explicitly
assert what endianess the data in the circuit payload is if there are
any floats or ints. This is `Endian::Little` for any instruction params
that are in the parameters list explicitly and `Endian::Big` everywhere
else. At the same time the handling of flipping the endianess of value
in several places was fixed because this was no longer necessary as the
data was read now using the correct byte order.
This fundamentally stems from all the base values being stored in a
single binrw generic value pack which is trying to encode all the primitive
types in a single place. Ideally we should be handling the primitive types
explicitly for each data pack field. But this was not
changed to keep the diff minimal for backport
Co-authored-by: Jake Lishman <jake.lishman@ibm.com>
|
One or more of the following people are relevant to this code:
|
All the entries in this list either take zero parameters, or would have caused program control flow to enter `unpack_control_flow` and _not_ `unpack_py_instruction`, so this switching logic is dead.
| qc = QuantumCircuit(1, name="delay dt") | ||
| qc.delay(1, 0) | ||
| qc.delay(9223372036854775806, 0) | ||
| return qc |
There was a problem hiding this comment.
Worth mentioning that with the current ParameterExpression casting behaviour on exposure to Python space, this test doesn't actually test the typing fix of int/expr in this PR, but it does test the endianness changes.
There was a problem hiding this comment.
It is why I used both 1 and i64::MAX - 1 very specifically here. I wrote this to find the endianess issue originally and then debug it.
| @all_qpy_combinations(QPY_RUST_READ_MIN_VERSION) | ||
| def test_delay_expr_roundtrip(self, version, write_with, read_with): | ||
| stretch_expr = QuantumCircuit(1, name="stretch_expr_delay_circuit") | ||
| s = expr.Stretch(uuid.uuid4(), "a") | ||
| stretch = stretch_expr.add_stretch(s) | ||
| stretch_expr.delay(stretch, 0) | ||
| stretch_expr.delay(expr.add(Duration.dt(200), stretch), 0) | ||
| stretch_expr.delay(expr.sub(Duration.ns(3.14159), stretch), 0) |
There was a problem hiding this comment.
This test doesn't have an assert in it. The fact that it passes lint is a little worrying to me - there should be an "unused argument" complaint, right?
There was a problem hiding this comment.
Fixed in 4e5e175, though the comment about lint remains.
There was a problem hiding this comment.
Hmm, yeah I'm not sure why the ruff didn't catch this, that is odd.
| Fixed an issue in the :func:`.qpy.load` function when reading QPY payloads using | ||
| QPY format versions >=13 that contained a circuit which contained :class:`.Delay` | ||
| instructions that had an integer duration value with a duration unit of ``"dt"`` | ||
| were incorrectly loaded through a :class:`.ParameterExpression` as part of the | ||
| deserialization process. This typically wasn't visible through Python as the | ||
| :class:`.ParameterExpression` was coerced into an integer when it was accessed | ||
| from Python. But for code acting on the instruction internally via Rust this could | ||
| cause issues as it was diverging from underlying expectations around | ||
| :class:`.Delay` instructions. One concrete example of this is if you passed a | ||
| circuit loaded via :func:`.qpy.load` to the :func:`.qasm3.dumps_experimental` | ||
| function this would cause an internal error. This has been fixed so now an integer | ||
| object is correctly created directly for the duration. |
There was a problem hiding this comment.
imo we should reduce the detail a little - we don't have to justify to users why some things didn't demonstrate the bug in the release note itself (the ParameterExpression stuff).
Fixed
:func:`.qpy.load`for QPY versions >=13, where:class:`.Delay`instructions with integer durations could deserialize to incorrect types. This could cause later code to raise errors, such as:func:`.qasm3.dumps_experimental`returning an error saying "Failed to parse parameter value". See`#16076 <https://github.com/Qiskit/qiskit/pull/16076>`__for more detail.
If not as reduced as (^) that, then at least the first sentence could do with a rework - there's enough restrictive descriptive clauses in it that, by the time we reach the verb, I think we've lost the verb agreement with the start of the sentence.
| } else { | ||
| Endian::Little | ||
| }; | ||
| let mut instruction_values = get_instruction_values(instruction, qpy_data, endian)?; |
There was a problem hiding this comment.
The set of values on lines 711-716 is mostly equivalent to the set of values that cause this function (unpack_py_instruction) to not be entered in the first place: the calling function instead dispatches to unpack_control_flow if one of those names match (via recognize_instruction_type). The subsequent handling of IfElseOp and friends in this function is then also dead.
However, recognize_instruction_type doesn't know about BoxOp, so that does come down this path (and the handling of BoxOp in unpack_control_flow is then also presumably dead...).
I think my suggestion for the purposes of this bugfix is just to delete this bit of the dead code, and pass Endian::Little hard-coded here - I tried it locally to success. We can deal with the dead code on the 2.5 branch.
| @@ -1171,8 +1184,12 @@ fn deserialize_pauli_evolution_gate( | |||
| sparse_pauli_op_pack, | |||
| )) => { | |||
| // formats::PauliDataPack::SparsePauliOp(sparse_pauli_op_pack) => { | |||
| pub(crate) fn unpack_biguint(big_int_pack: BigIntPack, endian: Endian) -> BigUint { | ||
| match endian { | ||
| Endian::Little => BigUint::from_bytes_le(&big_int_pack.bytes), | ||
| Endian::Big => BigUint::from_bytes_be(&big_int_pack.bytes), | ||
| } |
There was a problem hiding this comment.
The only place that biguint should turn up is in the expr system, which is always network order.
| ValueType::Integer => { | ||
| // a little tricky since this can be either i64 or biguint | ||
| let result = bytes.try_into(); | ||
| if let Ok(value) = result { | ||
| Ok(GenericValue::Int64(value)) | ||
| if bytes.len() <= 8 { | ||
| let mut bytes_array: [u8; 8] = [0; 8]; | ||
| for (idx, byte) in bytes.iter().enumerate() { | ||
| bytes_array[idx] = *byte; | ||
| } | ||
| match endian { | ||
| Endian::Little => Ok(GenericValue::Int64(i64::from_le_bytes(bytes_array))), | ||
| Endian::Big => Ok(GenericValue::Int64(i64::from_be_bytes(bytes_array))), | ||
| } | ||
| } else { | ||
| load_biguint_value(bytes) | ||
| load_biguint_value(bytes, endian) | ||
| } |
There was a problem hiding this comment.
The comment about "this can be either i64 or biguint" is betraying that this function is being used generically over more than one format key, which isn't safe parsing - the two contexts are different, and "i means i64" is INSTRUCTION_PARAM while "i means biguint" is the expression system.
Those should be parsed with unambiguous context, so we should do that in a rework.
(The commit message already mentions this, just pointing it out for the future.)
The `BigUint` keys are only valid in contexts where they are guaranteed to be in network order. The `pack_biguint` function knew this, but `unpack_biguint` mistakenly gained an `endian` argument, which was required to always be `Big`.
|
I've pushed up three commits that do most of my suggestions. The only other thing for me ahead of merge is the release note, but I didn't want to unilaterally change that. |
jakelishman
left a comment
There was a problem hiding this comment.
Matt signed off (offline) on the release note change, so done in c2de7e9, and I think this is reasonable to merge now.
Approving but not enqueuing for Matt/somebody to sign off on my last code changes too.
…skit#16084) This commit fixes a mismatch in the Param::eq() method where a Python object param was potentially viewed as equal to a ParameterExpression param. When comparing a python object parameter with a ParameterExpression it would convert the ParameterExpression into a pyobject and then uses Python's == to evaluate the equality. Historically this was something that was done because the ParameterExpression object was also a Python object. However, since the function was first written the source of truth for the inner ParameterExpression has moved to Rust. We're never in a situation anymore when a ParameterExpression parameter on an instruction should be treated as a Python object. While this legacy equality may seem harmless because the equality should typically just evaluate to false (albeit slowly through Python's equality check), there are edge cases where this is doing the incorrect comparison. Primarily the bug that Qiskit#16076 was fixing was not caught initially because in tests this legacy logic was converting a ParameterExpression that is a constant int expression with no symbols to a Python int during the conversion of the expression to Python. That was then being compared against a Param::Obj() of a true Python int. The equality check would incorrectly assert the ParameterExpression was the same as the python int. This is what masked the bug in Qiskit#16076 because this differs from how the rust data model treats these parameters.
This commit fixes an issue with parsing QPY payloads which have circuits that contain delays with duration of units dt. Durations of dt are integers and that is preserved in the QPY data. However our rust data model doesn't have support for an integer in a Rust PackedInstruction's parameters (this is an inconsistency which we should arguably fix but that is separate from this bugfix). To workaround this the QPY reader in rust was sticking the integer into a ParameterExpression as a constant without any symbols. This resulted in storing the integer in Rust but treating the object as a ParameterExpression and not an int, which in Qiskit's rust data model is mapped to a Param::Obj (indicating a Python object parameter). This mismatch in types was not really noticeable to Python because the ParameterExpression with the constant integer was coerced to an integer when it's passed to Python. However, this would break underlying assumptions for Rust code that is interacting with the delay. For example, the experimental rust qasm3 exporter would encounter the ParameterExpression on the delay and error because it can't handle parameter expressions yet. However fundamentally it could because this is just an integer. The reproducer for this failure is:
The OQ3 experimental exporter is wrong not to handle
ParameterExpression::try_as_valuereturning anint, but it's more wrong that QPY is producing aParameterExpressionon deserialisation in the first place.To fix this issue this commit removes the conversion of the
intin the qpy payload into a ParameterExpression and just retains an integer until we write out the Delay's PackedInstruction where we convert that rust int into a python int for the parameter. Doing this unraveled a deeper issue in how endianess is handled in QPY. In general everything in QPY is supposed to be encoded using network byte order (i.e. big endian). However, in the case of instructions' parameters there was a mistake made in QPY where the integer and float values for an instruction's parameters were encoded in little endian. All other uses of floats or ints are correctly big endian. When the raw int was returned to Python it was incorrectly assuming all integers were a big endian bytes value. To fix this an endian arg is added to the function which is converting the bytes arrays into aGenericValueenum for floats and ints. Then the callers of this function in circuit_reader are updated to explicitly assert what endianess the data in the circuit payload is if there are any floats or ints. This isEndian::Littlefor any instruction params that are in the parameters list explicitly andEndian::Bigeverywhere else. At the same time the handling of flipping the endianess of value in several places was removed because this was no longer necessary as the data was read now using the correct byte order.This fundamentally stems from all the base values being stored in a single binrw generic value pack which is trying to encode all the primitive types in a single place. Ideally we should be handling the primitive types explicitly for each data pack field. But this was not changed to keep the diff minimal for backport.
Co-authored-by: Jake Lishman jake.lishman@ibm.com
AI/LLM disclosure