Convert Sabre's lookahead to layer-based tracking#14911
alexanderivrii merged 4 commits into Qiskit:main
alexanderivrii left a comment
Thanks Jake. The new heuristic looks very promising and I fully understand the motivation behind it from the PR summary.
A few general questions and comments:

- Since this changes the behavior of the lookahead heuristic, would you like to add a release note?
- I would really love to see some experimental data. Even though I am on board with the intuition behind the new heuristic, we don't want to risk accidentally making Sabre worse.
- Did I understand correctly that in practice we only use 1 extended layer? (Unless I missed something, in every place where `LookaheadHeuristic` was initialized, a single-element vector was passed to it.) Did you experiment with other values?
- Since I have reviewed the previous PR in the chain, I think I was able to identify the changes from this PR, but I would like to give another careful look once you rebase it on top of `main`.
- Nitpick: now `Layer` represents both the "front layer" and an "extended layer", but comments/docstrings still mention "front layer" where just "layer" should be used. I have commented on a few of these, but not all.
The previous Sabre extended set was just the "next N" 2q gates topologically on from the front layer, where Qiskit reliably used `N = 20` ever since its introduction. For small-width circuits (as were common when the original Sabre paper was written, and when it was first implemented in Qiskit), this could mean the extended set was reliably several layers deep. This could also be the case for star-like circuits. For the wider circuits in use now, at the 100q order of magnitude, the 20-gate limit reliably means that denser circuits cannot have their entire next layer considered by the lookahead set.

This commit modifies the lookahead heuristic to be based specifically on layers. This regularises much of the structure of the heuristic with respect to circuit and target topology; we reliably "look ahead" by the same "distance" as far as routing is concerned. It comes with additional benefits:

- we can use the same `Layer` structure for both the front layer and the lookahead layers, which reduces the amount of scoring code
- the lookahead score of a swap can now affect at most two gates per layer, just like the front-layer scoring, and we can do this statically without loops
- we no longer risk "biasing" the lookahead heuristic in either case of long chains of dependent gates (e.g. a gate that has 10 predecessors weights the score the same as a gate with only 1) or wide circuits (some qubits have their next layer counted in the score, but others don't because the extended set reached capacity)
- applying a swap to the lookahead now has a time complexity that is constant per layer, regardless of the number of gates stored in it, whereas previously it was proportional to the number of gates stored (and the implementation in the parent of this commit is proportional to the number of qubits in the circuit)

This change alone is mostly a set-up, which enables further computational-complexity improvements by modifying the lookahead layers in place after a gate routes, rather than rebuilding them from scratch, and subsequently only updating _swap scores_ based on routing changes, rather than recalculating everything from scratch.
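To make the description above concrete, here is a minimal Python sketch of layer-based lookahead scoring. It is purely illustrative: the names (`layer_distance`, `lookahead_score`), the geometric decay `weight`, and the averaging are assumptions for the sketch, not Qiskit's internal Rust implementation.

```python
# Hypothetical sketch of a layer-based lookahead score. Each lookahead layer
# contributes in full (no 20-gate cap), with a geometrically decaying weight.

def layer_distance(layer, layout, dist):
    """Average coupling-map distance of the 2q gates in one layer.

    ``layer`` is a list of (virtual, virtual) qubit pairs, ``layout`` maps
    virtual -> physical qubits, and ``dist`` is the all-pairs distance matrix.
    """
    if not layer:
        return 0.0
    return sum(dist[layout[a]][layout[b]] for a, b in layer) / len(layer)

def lookahead_score(front_layer, lookahead_layers, layout, dist, weight=0.5):
    """Front-layer distance plus a decaying sum over the lookahead layers."""
    score = layer_distance(front_layer, layout, dist)
    w = weight
    for layer in lookahead_layers:
        score += w * layer_distance(layer, layout, dist)
        w *= weight
    return score

# Toy example: 4 physical qubits on a line 0-1-2-3, trivial layout.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
layout = {i: i for i in range(4)}
front = [(0, 3)]        # one far-apart gate in the front layer
lookahead = [[(1, 2)]]  # the next layer's gate is already adjacent
print(lookahead_score(front, lookahead, layout, dist))  # 3 + 0.5 * 1 = 3.5
```

Because each lookahead layer holds at most one gate per qubit, a candidate swap can change at most two gates' distances per layer, which is what makes the per-layer update cost constant.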
Done in 7cba5a5
Judging the quality of the output is quite hard to do, but I'll try to unearth the old scripts I was using to demonstrate the cases of quality improvement; it's been quite a while, and it's unfortunate I didn't include them in the PR text when I pushed this.

This PR actually can have a negative impact on runtime, because for wide circuits it can significantly increase the number of gates we consider; before we were hard-limited to 20 gates, but now it's possible for us to consider […]

I actually have a commit locally on my laptop that shows you can go further than this PR chain, and for the […]
Thanks Jake for answering my questions.
That would be great. Intuitively, on average we do want to see improvements in output quality, that is, fewer swaps introduced.
I am willing to accept this (temporary) slowdown if the output quality improves.
Ok, so I ran this (pretty messy) script (it takes like 5min or so):

```python
import numpy as np
from dataclasses import dataclass
from pathlib import Path

from qiskit import QuantumCircuit
from qiskit.circuit.library import quantum_volume
from qiskit.synthesis import qft
from qiskit.transpiler import passes, PassManager
from qiskit_ibm_runtime.fake_provider import FakeTorino

backend = FakeTorino()
num_seeds = 10
bench_dir = (
    Path("~").expanduser()
    / "code"
    / "qiskit"
    / "terra"
    / "test"
    / "benchmarks"
    / "qasm"
)
rng = np.random.default_rng(2026_04_08)
seeds = lambda rng: rng.integers((1 << 63) - 1, size=(num_seeds,))

def initials(num_qubits, rng):
    qubits = np.arange(backend.num_qubits, dtype=np.uint32)
    return [
        rng.choice(qubits, size=(num_qubits,), replace=False).tolist()
        for _ in range(num_seeds)
    ]

unroll = passes.Unroll3qOrMore()
layouts = [
    passes.SabreLayout(
        backend.target, seed=seed, max_iterations=2, swap_trials=20, layout_trials=20
    )
    for seed in seeds(rng)
]
route = passes.SabreSwap(
    backend.target, heuristic="lookahead", seed=seeds(rng)[0], trials=20
)

def route_pm(initial):
    return PassManager(
        [
            passes.SetLayout(initial),
            passes.FullAncillaAllocation(backend.target),
            passes.EnlargeWithAncilla(),
            passes.ApplyLayout(),
            route,
        ]
    )

benches = [
    "square_heisenberg_N100.qasm",
]
circuits = {
    bench: unroll(QuantumCircuit.from_qasm_file(bench_dir / bench)) for bench in benches
}
circuits["qv-16"] = quantum_volume(16, 400, seed=seeds(rng)[0])
circuits["qv-133"] = quantum_volume(backend.num_qubits, 400, seed=seeds(rng)[0])
circuits["qft-full-16"] = qft.synth_qft_full(16)
circuits["qft-full-133"] = qft.synth_qft_full(133)
circuits["qft-line-16"] = qft.synth_qft_line(16)
circuits["qft-line-133"] = qft.synth_qft_line(133)

@dataclass
class Stats:
    layout: list[tuple[int, int]]
    route: list[tuple[int, int]]

out = {}
for name, circuit in circuits.items():
    orig_size = circuit.size()
    orig_depth = circuit.depth()
    stat = lambda qc: (qc.size() - orig_size, qc.depth() - orig_depth)
    out[name] = Stats(
        [stat(layout(circuit)) for layout in layouts],
        [
            stat(route_pm(initial).run(circuit))
            for initial in initials(circuit.num_qubits, rng)
        ],
    )
out
```

Just in summary, the script measures excess (as in: added by routing) size and depth for a small number of benchmarks, both in the case of running the complete `SabreLayout` pipeline and of running routing alone from random initial layouts:

```python
headings = ["Layout (size)", "Layout (depth)", "Routing (size)", "Routing (depth)"]
rows = {"square_heisenberg_N100.qasm": "heisenberg-100"}

def summarize(stats):
    return np.hstack([np.mean(stats.layout, axis=0), np.mean(stats.route, axis=0)]).tolist()

print("||" + "|".join(headings) + "|")
print("|:-|" + "-:|" * len(headings))
for key, values in out.items():
    print(f"|{rows.get(key, key)}|" + "|".join(f"{int(x)}" for x in summarize(values)) + "|")
```

With Qiskit 2.3.1, I got:
With this PR, I got:
I think what this is indicating is that really dense layers (like QV) benefit from this, but deeply structured circuits (particularly linearised ones with isolated high-degree connectivity, like QFT) can suffer by looking only at a single layer, as the heuristic currently in this PR does. Just as a test, I also ran it using this PR, but modifying the […],

which is curious to me: it seems like QV's depth got a little worse (not sure of the statistics here, but I'm not too surprised; too much lookahead here isn't a good thing because it's so unstructured), QFT-full got better (what I expected), and QFT-linear recovered a lot less than I expected. I wonder if this is suggesting that the lookahead heuristic should be scaled such that gates in the lookahead that are already close together get more weighting than lookahead gates that are miles apart.
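One way that scaling idea could be experimented with, as a purely hypothetical Python sketch (not part of this PR, and not Qiskit's Rust internals), is a distance-weighted layer score, where each lookahead gate's distance is weighted by `1/d` so nearly-adjacent pairs dominate:

```python
# Hypothetical variant: weight each lookahead gate's distance by 1/d, so a
# swap that finishes off a nearly-routable pair moves the score more than one
# that slightly shrinks a far-apart pair.

def proximity_weighted_layer(layer, layout, dist):
    """Weighted mean distance over a layer's 2q gates, with weight 1/d."""
    pairs = [dist[layout[a]][layout[b]] for a, b in layer]
    if not pairs:
        return 0.0
    weights = [1.0 / max(d, 1) for d in pairs]
    return sum(w * d for w, d in zip(weights, pairs)) / sum(weights)

# Toy check on a 4-qubit line 0-1-2-3: one adjacent pair, one distance-3 pair.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
layout = {i: i for i in range(4)}
print(proximity_weighted_layer([(0, 1), (0, 3)], layout, dist))  # 1.5
```

For that toy layer the unweighted mean distance would be 2.0, so the `1/d` weighting pulls the score toward the already-close pair, which is the behaviour being speculated about above.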
Thanks @jakelishman for the script. I have rerun it locally and got exactly the same results with […]
I have also run the ASV utility benchmarks: I would say that overall we see an improvement in runtime, while the quality becomes worse on some families of benchmarks but better on others.
alexanderivrii left a comment
Intuitively I expected a larger improvement in circuit quality, but the experimental results are mixed. Nevertheless I believe that the new heuristic is better and can be tuned more easily (which might already be done in further PRs in the chain), so I am in favor of merging.
Just one minor comment about the release note.
And, just to verify: we didn't document exactly how Sabre works (limiting the number of gates or the number of layers), so we can treat this as a "new feature" and not an "upgrade", do you agree?
Co-authored-by: Alexander Ivrii <alexi@il.ibm.com>
Yeah, the mechanism of how the extended set is constructed is more an implementation detail. I expected this to be more universally good at first too, but I think there are complicating factors for particular circuit structures. All the ideas I have for improving the heuristic want some more formal layer structure, though.
Also, with ASV it's tricky because there's so much variance in layout/routing, but ASV uses only one seed. This PR attempts to make the most like-for-like replacement to the heuristic used in layout and routing, but I think we can investigate playing with the parameters more in follow-ups.