Skip to content

Commit 8064d27

Browse files
jakelishmangadialCryoris
authored
Rewrite QPY ParameterExpression handling in pure Polish form (#15934)
* Rewrite QPY `ParameterExpression` handling in pure Polish form This patch is a fairly invasive change, but one that fixes the known edge cases of replay generation from `ParameterExpression` itself. This fixes several `pickle`, `copy`/`deepcopy` and QPY bugs around `ParameterExpression`. The Python-space QPY loader is modified in a small way to allow it to handle QPY-format permitted "replay" elements that act between two bare values; this was always allowed by the QPY spec, but couldn't be generated by Python-space QPY/`ParameterExpression`. Doing so actually significantly simplifies the loading logic, since it no longer needs to reflect certain arithmetic operations. The new Rust-space `ParameterExpression` uses binary operations with two numeric operands (e.g. `Add(2, 0)`) to safely propagate bare values through the QPY replay. This was not previously necessary in Python space because there was no public interface to create a bare-numeric `ParameterExpression` without replayable expressions having been tracked. Similarly, cancelled-out symbols are propagated in the replay with `Sub(x, x)` operations, which already must work across all compliant QPY loaders, and both Rust- and Python-space `ParameterExpression. History ------- The QPY replay was _intended_ to be a pure Polish-notation representation of the operations to take to rebuild the expression. This mostly worked fine at the time, but the patch was written under a lot of stress and without full CI tooling, due to it being in response to a security bug. This led to some less-than-ideal parts of the format specification, such as the start/end recursion opcodes in the QPY format not actually being necessary; they are actually zero-operand no-ops to the stack, but this wasn't noticed during the patch due to time pressure. Further, as `ParameterExpression` moved to Rust, the tracked internal "replay" list disappeared, since we could now reliably walk the expression tree and issue rebuild commands directly. This generated replay, however, _only_ examined the expression and not potentially cancelled-out parameters. All together, there were several problems affecting QPY (both Rust and Python), and `ParameterExpression`'s interaction with `pickle` and the copy module, mostly relating to situations where the resulting expression had cancelled-out parameters or had degraded to be a bare value. For example, expressions like `x - x` (or `0*x + 3`, or arbitrarily complex extensions of these) are supposed to retain a reference to `x` for binding purposes, but failed to do so after pickle or QPY roundtrips. Several of these expressions failed during QPY deserialisation, as the replay stack handling failed to cope with bare values. * Credit Gadi Gadi did much of the exploratory work of the previous commit, and wrote several of the test cases I pulled in - I forgot to include the line before. Co-authored-by: Gadi Aleksandrowicz <gadial@gmail.com> * Correct handling of vector-symbol names in map I was misled by `Symbol::name` not returning the complete name. We should probably rename that method in a follow-up. * Avoid symbol cloning in QPY replay output Using `Mul(sym, 0)` also cancels out the symbol, and doesn't require an internal clone. Co-authored-by: Julien Gacon <jul@zurich.ibm.com> * Make inner-worker recursive call clearer * Add hasher to `IndexSet` * Update comments * Expand `ParameterExpression` backwards-compatibility testing * Skip backwards-compatibility tests for broken Qiskit versions All versions of Qiskit that have Rust-space `ParameterExpression` (introduced in v2.2) up to (excluding) the roll-up commit of this PR (intended release in v2.4.0rc3) produce invalid QPY for expressions with cancellations. Here, we skip the known-bad versions, since there is no way to completely recover from such files; the cancelled-out expression _might_ be zero, or might truly have been an arbitrary numeric value (e.g. in the case `0*x + 1.5`, the `1.5` is completely lost in the broken versions of Qiskit). The previous version-handling mechanisms in the QPY backwards compatibility tests couldn't cope with an `rc` being an upper bound in a version range (since it only dealt with the "release" component), so here we also update the checking system to use the complete version of `packing.version` comparisons. * Add known-issue release note * Add missing dependency installation * Separate conditional test into separate file --------- Co-authored-by: Gadi Aleksandrowicz <gadial@gmail.com> Co-authored-by: Julien Gacon <jul@zurich.ibm.com>
1 parent 6be537f commit 8064d27

13 files changed

Lines changed: 532 additions & 439 deletions

File tree

crates/circuit/src/parameter/parameter_expression.rs

Lines changed: 87 additions & 85 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212

1313
use hashbrown::hash_map::Entry;
1414
use hashbrown::{HashMap, HashSet};
15+
use indexmap::IndexSet;
1516
use num_complex::Complex64;
1617
use pyo3::exceptions::{PyRuntimeError, PyTypeError, PyValueError, PyZeroDivisionError};
1718
use pyo3::types::{IntoPyDict, PyComplex, PyFloat, PyInt, PyNotImplemented, PySet, PyString};
@@ -134,7 +135,62 @@ impl fmt::Display for ParameterExpression {
134135
impl ParameterExpression {
135136
pub fn qpy_replay(&self) -> Vec<OPReplay> {
136137
let mut replay = Vec::new();
137-
qpy_replay(self, &self.name_map, &mut replay);
138+
let mut unused: IndexSet<_, ahash::RandomState> = self.name_map.values().cloned().collect();
139+
// The recursive inner `qpy_replay_inner` assumes it starts from a containing operation, so
140+
// fails to build a complete replay in the case it starts from a single symbol or value.
141+
match &self.expr {
142+
SymbolExpr::Value(v) => {
143+
let item = match *v {
144+
Value::Int(v) => OPReplay {
145+
op: OpCode::ADD,
146+
lhs: Some(ParameterValueType::Int(v)),
147+
rhs: Some(ParameterValueType::Int(0)),
148+
},
149+
Value::Real(v) => OPReplay {
150+
op: OpCode::ADD,
151+
lhs: Some(ParameterValueType::Float(v)),
152+
// `-0.0` is technically the identity element of floating-point addition;
153+
// `0.0 + x` is not bit-for-bit equal to `x` solely if `x` is `-0.0`.
154+
rhs: Some(ParameterValueType::Float(-0.0)),
155+
},
156+
Value::Complex(v) => OPReplay {
157+
op: OpCode::ADD,
158+
lhs: Some(ParameterValueType::Complex(v)),
159+
rhs: Some(ParameterValueType::Complex(Complex64 {
160+
re: -0.0,
161+
im: -0.0,
162+
})),
163+
},
164+
};
165+
replay.push(item);
166+
}
167+
SymbolExpr::Symbol(sym) => {
168+
unused.swap_remove(sym.as_ref());
169+
replay.push(OPReplay {
170+
op: OpCode::ADD,
171+
lhs: ParameterValueType::extract_from_expr(&self.expr),
172+
rhs: Some(ParameterValueType::Int(0)),
173+
});
174+
}
175+
SymbolExpr::Unary { .. } | SymbolExpr::Binary { .. } => {
176+
qpy_replay_inner(self, &self.name_map, &mut replay, &mut unused);
177+
}
178+
}
179+
// For any unused symbols, we'll add something like `(x * 0) + expr`. This sort of
180+
// cancellation is how unused symbols appear; it doesn't matter if the _actual_ cause was
181+
// `x - x` or whatever, because the end observable effect is the same.
182+
for symbol in unused {
183+
replay.push(OPReplay {
184+
op: OpCode::MUL,
185+
lhs: Some(ParameterValueType::from_symbol(symbol)),
186+
rhs: Some(ParameterValueType::Int(0)),
187+
});
188+
replay.push(OPReplay {
189+
op: OpCode::ADD,
190+
lhs: None,
191+
rhs: None,
192+
});
193+
}
138194
replay
139195
}
140196
}
@@ -188,21 +244,16 @@ impl ParameterExpression {
188244
///
189245
/// This only succeeds if the underlying expression is, in fact, only a symbol.
190246
pub fn try_to_symbol(&self) -> Result<Symbol, ParameterError> {
191-
if let SymbolExpr::Symbol(symbol) = &self.expr {
192-
Ok(symbol.as_ref().clone())
193-
} else {
194-
Err(ParameterError::NotASymbol)
195-
}
247+
self.try_to_symbol_ref().cloned()
196248
}
197249

198250
/// Try casting to a [Symbol], returning a reference.
199251
///
200252
/// This only succeeds if the underlying expression is, in fact, only a symbol.
201253
pub fn try_to_symbol_ref(&self) -> Result<&Symbol, ParameterError> {
202-
if let SymbolExpr::Symbol(symbol) = &self.expr {
203-
Ok(symbol.as_ref())
204-
} else {
205-
Err(ParameterError::NotASymbol)
254+
match &self.expr {
255+
SymbolExpr::Symbol(sym) if self.name_map.len() == 1 => Ok(sym.as_ref()),
256+
_ => Err(ParameterError::NotASymbol),
206257
}
207258
}
208259

@@ -257,17 +308,10 @@ impl ParameterExpression {
257308
}
258309

259310
/// Load from a sequence of [OPReplay]s. Used in serialization.
260-
pub fn from_qpy(
261-
replay: &[OPReplay],
262-
subs_operations: Option<Vec<(usize, HashMap<Symbol, ParameterExpression>)>>,
263-
) -> Result<Self, ParameterError> {
311+
pub fn from_qpy(replay: &[OPReplay]) -> Result<Self, ParameterError> {
264312
// the stack contains the latest lhs and rhs values
265313
let mut stack: Vec<ParameterExpression> = Vec::new();
266-
let subs_operations = subs_operations.unwrap_or_default();
267-
let mut current_sub_operation = subs_operations.len(); // we avoid using a queue since we only make one pass anyway
268-
for (i, inst) in replay.iter().enumerate() {
269-
let OPReplay { op, lhs, rhs } = inst;
270-
314+
for OPReplay { op, lhs, rhs } in replay {
271315
// put the values on the stack, if they exist
272316
if let Some(value) = lhs {
273317
stack.push(value.clone().into());
@@ -312,15 +356,6 @@ impl ParameterExpression {
312356
}
313357
};
314358
stack.push(result);
315-
//now check whether any substitutions need to be applied at this stage
316-
while current_sub_operation > 0 && subs_operations[current_sub_operation - 1].0 == i + 1
317-
{
318-
if let Some(exp) = stack.pop() {
319-
let sub_exp = exp.subs(&subs_operations[current_sub_operation - 1].1, false)?;
320-
stack.push(sub_exp);
321-
}
322-
current_sub_operation -= 1;
323-
}
324359
}
325360

326361
// once we're done, just return the last element in the stack
@@ -1370,49 +1405,18 @@ impl PyParameterExpression {
13701405
}
13711406
}
13721407

1373-
fn __getstate__(&self) -> PyResult<(Vec<OPReplay>, Option<ParameterValueType>)> {
1374-
// We distinguish in two cases:
1375-
// (a) This is indeed an expression which can be rebuild from the QPY replay. This means
1376-
// the replay is *not empty* and it contains all symbols.
1377-
// (b) This expression is in fact only a Value or a Symbol. In this case, the QPY replay
1378-
// will be empty and we instead pass a `ParameterValueType` for reconstruction.
1379-
let qpy = self._qpy_replay()?;
1380-
if !qpy.is_empty() {
1381-
Ok((qpy, None))
1382-
} else {
1383-
let value = ParameterValueType::extract_from_expr(&self.inner.expr);
1384-
if value.is_none() {
1385-
Err(PyValueError::new_err(format!(
1386-
"Failed to serialize the parameter expression: {self:?}"
1387-
)))
1388-
} else {
1389-
Ok((qpy, value))
1390-
}
1391-
}
1408+
fn __getstate__(&self) -> Vec<OPReplay> {
1409+
self._qpy_replay()
13921410
}
13931411

1394-
fn __setstate__(&mut self, state: (Vec<OPReplay>, Option<ParameterValueType>)) -> PyResult<()> {
1395-
// if there a replay, load from the replay
1396-
if !state.0.is_empty() {
1397-
let from_qpy = ParameterExpression::from_qpy(&state.0, None)?;
1398-
self.inner = from_qpy;
1399-
// otherwise, load from the ParameterValueType
1400-
} else if let Some(value) = state.1 {
1401-
let expr = ParameterExpression::from(value);
1402-
self.inner = expr;
1403-
} else {
1404-
return Err(PyValueError::new_err(
1405-
"Failed to read QPY replay or extract value.",
1406-
));
1407-
}
1412+
fn __setstate__(&mut self, state: Vec<OPReplay>) -> PyResult<()> {
1413+
self.inner = ParameterExpression::from_qpy(&state)?;
14081414
Ok(())
14091415
}
14101416

14111417
#[getter]
1412-
fn _qpy_replay(&self) -> PyResult<Vec<OPReplay>> {
1413-
let mut replay = Vec::new();
1414-
qpy_replay(&self.inner, &self.inner.name_map, &mut replay);
1415-
Ok(replay)
1418+
fn _qpy_replay(&self) -> Vec<OPReplay> {
1419+
self.inner.qpy_replay()
14161420
}
14171421
}
14181422

@@ -1872,6 +1876,13 @@ pub enum ParameterValueType {
18721876
}
18731877

18741878
impl ParameterValueType {
1879+
fn from_symbol(symbol: Symbol) -> Self {
1880+
if symbol.index.is_some() {
1881+
Self::VectorElement(PyParameterVectorElement { symbol })
1882+
} else {
1883+
Self::Parameter(PyParameter { symbol })
1884+
}
1885+
}
18751886
fn extract_from_expr(expr: &SymbolExpr) -> Option<ParameterValueType> {
18761887
if let Some(value) = expr.eval(true) {
18771888
match value {
@@ -1880,20 +1891,7 @@ impl ParameterValueType {
18801891
Value::Complex(c) => Some(ParameterValueType::Complex(c)),
18811892
}
18821893
} else if let SymbolExpr::Symbol(symbol) = expr {
1883-
match symbol.index {
1884-
None => {
1885-
let param = PyParameter {
1886-
symbol: symbol.as_ref().clone(),
1887-
};
1888-
Some(ParameterValueType::Parameter(param))
1889-
}
1890-
Some(_) => {
1891-
let param = PyParameterVectorElement {
1892-
symbol: symbol.as_ref().clone(),
1893-
};
1894-
Some(ParameterValueType::VectorElement(param))
1895-
}
1896-
}
1894+
Some(Self::from_symbol(symbol.as_ref().clone()))
18971895
} else {
18981896
// ParameterExpressions have the value None, as they must be constructed
18991897
None
@@ -2075,14 +2073,18 @@ fn filter_name_map(
20752073
}
20762074
}
20772075

2078-
pub fn qpy_replay(
2076+
fn qpy_replay_inner(
20792077
expr: &ParameterExpression,
20802078
name_map: &HashMap<String, Symbol>,
20812079
replay: &mut Vec<OPReplay>,
2080+
unused: &mut IndexSet<Symbol, ahash::RandomState>,
20822081
) {
20832082
match &expr.expr {
2084-
SymbolExpr::Value(_) | SymbolExpr::Symbol(_) => {
2085-
// nothing to do here, we only need to traverse instructions
2083+
// This function is written under the assumption that the top-level expression involves an
2084+
// operation, since `OPReplay` items correspond to operations that own their operands.
2085+
SymbolExpr::Value(_) => (),
2086+
SymbolExpr::Symbol(sym) => {
2087+
unused.swap_remove(sym.as_ref());
20862088
}
20872089
SymbolExpr::Unary { op, expr } => {
20882090
let op = match op {
@@ -2103,7 +2105,7 @@ pub fn qpy_replay(
21032105
let lhs = filter_name_map(expr, name_map);
21042106

21052107
// recurse on the instruction
2106-
qpy_replay(&lhs, name_map, replay);
2108+
qpy_replay_inner(&lhs, name_map, replay, unused);
21072109

21082110
let lhs_value = ParameterValueType::extract_from_expr(expr);
21092111

@@ -2129,8 +2131,8 @@ pub fn qpy_replay(
21292131
// recurse on the parameter expressions
21302132
let lhs = filter_name_map(lhs, name_map);
21312133
let rhs = filter_name_map(rhs, name_map);
2132-
qpy_replay(&lhs, name_map, replay);
2133-
qpy_replay(&rhs, name_map, replay);
2134+
qpy_replay_inner(&lhs, name_map, replay, unused);
2135+
qpy_replay_inner(&rhs, name_map, replay, unused);
21342136

21352137
// add the expression to the replay
21362138
match lhs_value {

crates/circuit/src/parameter/symbol_expr.rs

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
use hashbrown::HashMap;
1414
use pyo3::IntoPyObjectExt;
1515
use pyo3::exceptions::PyValueError;
16+
use std::borrow::Cow;
1617
use std::cmp::Ordering;
1718
use std::cmp::PartialOrd;
1819
use std::convert::From;
@@ -123,12 +124,17 @@ impl Symbol {
123124
&self.name
124125
}
125126

127+
pub fn fullname(&self) -> Cow<'_, str> {
128+
self.index
129+
.map(|i| Cow::Owned(format!("{}[{}]", &self.name, i)))
130+
.unwrap_or(Cow::Borrowed(&self.name))
131+
}
132+
126133
pub fn repr(&self, with_uuid: bool) -> String {
127-
match (self.index, with_uuid) {
128-
(Some(i), true) => format!("{}[{}]_{}", self.name, i, self.uuid.as_u128()),
129-
(Some(i), false) => format!("{}[{}]", self.name, i),
130-
(None, true) => format!("{}_{}", self.name, self.uuid.as_u128()),
131-
(None, false) => self.name.clone(),
134+
if with_uuid {
135+
format!("{}_{}", self.fullname(), self.uuid.as_u128())
136+
} else {
137+
self.fullname().into_owned()
132138
}
133139
}
134140
pub fn is_vector_element(&self) -> bool {

crates/qpy/src/formats.rs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -661,6 +661,11 @@ pub enum ParameterExpressionSymbolPack {
661661
Parameter(ParameterExpressionParameterSymbolPack),
662662
#[brw(magic = b'v')]
663663
ParameterVector(ParameterExpressionParameterVectorSymbolPack),
664+
/// This variant _should not_ exist; it is counter to the QPY spec, and has no semantic meaning.
665+
/// However, Qiskit 2.0 (with QPY 13 non-symengine serialisation but before Rust-space
666+
/// `ParameterExpression` or QPY) would populate the "symbol map" with the raw dictionaries
667+
/// given to `ParameterExpression.subs` calls, which include expressions. The equivalent "read"
668+
/// code would load up the entries, then immediately filter them out to make the symbol map.
664669
#[brw(magic = b'e')]
665670
ParameterExpression(ParameterExpressionParameterExpressionSymbolPack),
666671
}

0 commit comments

Comments
 (0)