
Convert Sabre's lookahead to layer-based tracking #14911

Merged
alexanderivrii merged 4 commits into Qiskit:main from jakelishman:sabre/layers/3
Apr 12, 2026
Conversation

@jakelishman Member

The previous Sabre extended set was just the "next N" 2q gates topologically on from the front layer, where Qiskit reliably used N = 20 ever since its introduction. For small-width circuits (as were common when the original Sabre paper was written, and when it was first implemented in Qiskit), this could mean the extended set was reliably several layers deep. This could also be the case for star-like circuits. For the wider circuits in use now, at the 100q order of magnitude, the 20-gate limit reliably means that denser circuits cannot have their entire next layer considered by the lookahead set.

This commit modifies the lookahead heuristic to be based specifically on layers. This regularises much of the structure of the heuristic with respect to circuit and target topology; we reliably "look ahead" by the same "distance" as far as routing is concerned. It comes with additional benefits:

  • we can use the same Layer structure for both the front layer and the lookahead layers, which reduces the amount of scoring code

  • the lookahead score of a swap can now affect at most two gates per layer, just like the front-layer scoring, and we can do this statically without loops

  • we no longer risk "biasing" the lookahead heuristic either with long chains of dependent gates (e.g. a gate that has 10 predecessors weighs the score the same as a gate with only 1) or with wide circuits (some qubits have their next layer counted in the score, but others don't because the extended set reached capacity).

  • applying a swap to the lookahead now has a time complexity that is constant per layer, regardless of the number of gates stored in it, whereas previously it was proportional to the number of gates stored (and the implementation in the parent of this commit is proportional to the number of qubits in the circuit).

This change alone is mostly set-up: it enables further computational-complexity improvements by modifying the lookahead layers in place after a gate routes, rather than rebuilding them from scratch, and subsequently by only updating swap scores based on routing changes, rather than recalculating them all from scratch.
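The layer-based bookkeeping described above can be sketched roughly as follows. This is an illustrative Python model only, not the PR's Rust code; the names (`Layer`, `swap_delta`, `lookahead_score`) and the dict-based layout are hypothetical stand-ins. The property it demonstrates is the constant-time swap application: a swap of two qubits can affect at most the two gates touching those qubits in each layer, no matter how many gates the layer holds.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    # partner[q] is the other qubit of the 2q gate acting on q in this layer.
    partner: dict[int, int] = field(default_factory=dict)

    def add_gate(self, a: int, b: int) -> None:
        self.partner[a] = b
        self.partner[b] = a

    def score(self, dist, layout) -> float:
        # Sum of coupling-map distances, counting each gate once (q < p
        # picks a single endpoint per pair).
        return sum(
            dist[layout[q]][layout[p]] for q, p in self.partner.items() if q < p
        )

    def swap_delta(self, dist, layout, a: int, b: int) -> float:
        # At most two gates (the ones touching `a` and `b`) can change
        # score, so this is constant time per layer, independent of how
        # many gates the layer stores.
        affected = {
            frozenset((q, self.partner[q])) for q in (a, b) if q in self.partner
        }
        new_layout = dict(layout)
        new_layout[a], new_layout[b] = layout[b], layout[a]
        delta = 0.0
        for gate in affected:
            q, p = tuple(gate)
            delta += dist[new_layout[q]][new_layout[p]] - dist[layout[q]][layout[p]]
        return delta


def lookahead_score(layers, weights, dist, layout) -> float:
    # The front layer and the lookahead layers share the same scoring
    # code; each lookahead layer is down-weighted by an assumed weight.
    return sum(w * layer.score(dist, layout) for w, layer in zip(weights, layers))
```

For example, on a 4-qubit line with a single gate between qubits 0 and 3, `swap_delta` for swapping qubits 2 and 3 reports a score improvement of 1 without iterating over the layer's gates.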

@jakelishman jakelishman requested a review from a team as a code owner August 14, 2025 15:10
@jakelishman jakelishman added this to the 2.2.0 milestone Aug 14, 2025
@jakelishman jakelishman added the `mod: transpiler` label (Issues and PRs related to Transpiler) Aug 14, 2025
@qiskit-bot Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@coveralls

Pull Request Test Coverage Report for Build 16969144205

Details

  • 220 of 236 (93.22%) changed or added relevant lines in 7 files are covered.
  • 20 unchanged lines in 7 files lost coverage.
  • Overall coverage increased (+0.007%) to 88.265%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|:-|-:|-:|-:|
| crates/circuit/src/nlayout.rs | 3 | 6 | 50.0% |
| crates/transpiler/src/passes/sabre/route.rs | 153 | 156 | 98.08% |
| crates/transpiler/src/passes/sabre/heuristic.rs | 15 | 25 | 60.0% |
| Files with Coverage Reduction | New Missed Lines | % |
|:-|-:|-:|
| crates/circuit/src/parameter/parameter_expression.rs | 1 | 83.39% |
| crates/qasm2/src/expr.rs | 1 | 93.63% |
| crates/transpiler/src/passes/sabre/dag.rs | 1 | 96.97% |
| crates/transpiler/src/passes/sabre/heuristic.rs | 2 | 51.37% |
| crates/transpiler/src/passes/sabre/route.rs | 3 | 94.3% |
| crates/qasm2/src/lex.rs | 6 | 91.75% |
| crates/qasm2/src/parse.rs | 6 | 97.56% |
Totals (Coverage Status)

  • Change from base Build 16967470304: 0.007%
  • Covered Lines: 87840
  • Relevant Lines: 99519

💛 - Coveralls

Member

@alexanderivrii alexanderivrii left a comment


Thanks Jake. The new heuristic looks very promising and I fully understand the motivation behind it from the PR summary.

A few general questions and comments:

  • Since this changes the behavior of the lookahead heuristic, would you like to add a release note?

  • I would really love to see some experimental data. Even though I am on board with the intuition behind the new heuristic, we don't want to risk making sabre accidentally worse.

  • Did I understand correctly that in practice we only use 1 extended layer? (Unless I missed something, in every place where LookaheadHeuristic was initialized, a single-element vector was passed to it.) Did you experiment with other values?

  • Since I have reviewed the previous PR in the chain, I think I was able to identify the changes from this PR, but I would like to give another careful look once you rebase it on top of main.

  • Nitpick: now Layer represents both the "front layer" and an "extended layer", but comments/docstring still mention "front layer" where just "layer" should be used. I have commented on a few of these, but not all.

@jakelishman Member Author

> Since this changes the behavior of the lookahead heuristic, would you like to add a release note?

Done in 7cba5a5

> I would really love to see some experimental data. Even though I am on board with the intuition behind the new heuristic, we don't want to risk making sabre accidentally worse.

Judging quality of the output is quite hard to do, but I'll try and unearth the old scripts I was using to demonstrate the cases of quality improvement - it's been quite a while, and it's unfortunate I didn't include them in the PR text when I pushed this.

This PR actually can have a negative impact on runtime because for wide circuits, it can significantly increase the number of gates we consider; before we were hard-limited to 20 gates, but now it's possible for us to consider num_qubits // 2 gates per layer (though only two contribute to any given score). The cost comes because in this PR, we still have to rebuild the entire lookahead-layer structure after a gate is routed. The next PR fixes that, so we only search topologically forwards from the routed gate, which significantly reduces the rebuild time.

I actually have a commit locally on my laptop that shows you can go further than this PR chain, and for the basic/lookahead heuristics, you can actually reduce the algorithmic complexity of choosing the best swap at any program point to constant time (the factors depend only on the hardware topology, not the circuit). In practice, though, I found the extra tracking you have to do to enable that comes with enough cost it's worse overall.

@alexanderivrii Member

Thanks Jake for answering my questions.

> Judging quality of the output is quite hard to do, but I'll try and unearth the old scripts I was using to demonstrate the cases of quality improvement.

That would be great. Intuitively, on average we do want to see improvements in output quality, that is, fewer swaps introduced.

> This PR actually can have a negative impact on runtime because for wide circuits, it can significantly increase the number of gates we consider.

I am willing to accept this (temporary) slowdown, if the output quality improves.

@jakelishman Member Author

Ok, so I ran this (pretty messy) script (it takes like 5min or so):

```python
import numpy as np
from dataclasses import dataclass
from pathlib import Path
from qiskit import QuantumCircuit
from qiskit.circuit.library import quantum_volume
from qiskit.synthesis import qft
from qiskit.transpiler import passes, PassManager
from qiskit_ibm_runtime.fake_provider import FakeTorino

backend = FakeTorino()
num_seeds = 10
bench_dir = (
    Path("~").expanduser()
    / "code"
    / "qiskit"
    / "terra"
    / "test"
    / "benchmarks"
    / "qasm"
)

rng = np.random.default_rng(2026_04_08)
seeds = lambda rng: rng.integers((1 << 63) - 1, size=(num_seeds,))


def initials(num_qubits, rng):
    qubits = np.arange(backend.num_qubits, dtype=np.uint32)
    return [
        rng.choice(qubits, size=(num_qubits,), replace=False).tolist()
        for _ in range(num_seeds)
    ]


unroll = passes.Unroll3qOrMore()
layouts = [
    passes.SabreLayout(
        backend.target, seed=seed, max_iterations=2, swap_trials=20, layout_trials=20
    )
    for seed in seeds(rng)
]
route = passes.SabreSwap(
    backend.target, heuristic="lookahead", seed=seeds(rng)[0], trials=20
)


def route_pm(initial):
    return PassManager(
        [
            passes.SetLayout(initial),
            passes.FullAncillaAllocation(backend.target),
            passes.EnlargeWithAncilla(),
            passes.ApplyLayout(),
            route,
        ]
    )


benches = [
    "square_heisenberg_N100.qasm",
]
circuits = {
    bench: unroll(QuantumCircuit.from_qasm_file(bench_dir / bench)) for bench in benches
}
circuits["qv-16"] = quantum_volume(16, 400, seed=seeds(rng)[0])
circuits["qv-133"] = quantum_volume(backend.num_qubits, 400, seed=seeds(rng)[0])
circuits["qft-full-16"] = qft.synth_qft_full(16)
circuits["qft-full-133"] = qft.synth_qft_full(133)
circuits["qft-line-16"] = qft.synth_qft_line(16)
circuits["qft-line-133"] = qft.synth_qft_line(133)


@dataclass
class Stats:
    layout: list[tuple[int, int]]
    route: list[tuple[int, int]]


out = {}
for name, circuit in circuits.items():
    orig_size = circuit.size()
    orig_depth = circuit.depth()
    stat = lambda qc: (qc.size() - orig_size, qc.depth() - orig_depth)

    out[name] = Stats(
        [stat(layout(circuit)) for layout in layouts],
        [
            stat(route_pm(initial).run(circuit))
            for initial in initials(circuit.num_qubits, rng)
        ],
    )
out
```

Just in summary, the script measures excess (as in: added by routing) size and depth for a small number of benchmarks, both in the case of running complete SabreLayout with various seeds and running only SabreSwap from random initial layouts. Then I just took the mean over each entry:

```python
headings = ["Layout (size)", "Layout (depth)", "Routing (size)", "Routing (depth)"]
rows = {"square_heisenberg_N100.qasm": "heisenberg-100"}

def summarize(stats):
    return np.hstack([np.mean(stats.layout, axis=0), np.mean(stats.route, axis=0)]).tolist()

print("||" + "|".join(headings) + "|")
print("|:-|" + "-:|"*len(headings))
for key, values in out.items():
    print(f"|{rows.get(key, key)}|" + "|".join(f"{int(x)}" for x in summarize(values)) + "|")
```

With Qiskit 2.3.1, I got:

| | Layout (size) | Layout (depth) | Routing (size) | Routing (depth) |
|:-|-:|-:|-:|-:|
| heisenberg-100 | 324 | 701 | 1094 | 1369 |
| qv-16 | 4847 | 2171 | 5291 | 2321 |
| qv-133 | 192954 | 19288 | 194820 | 21486 |
| qft-full-16 | 143 | 83 | 282 | 115 |
| qft-full-133 | 13494 | 2861 | 17507 | 4085 |
| qft-line-16 | 0 | 0 | 189 | 114 |
| qft-line-133 | 3254 | 2961 | 10981 | 6297 |

With this PR, I got:

| | Layout (size) | Layout (depth) | Routing (size) | Routing (depth) |
|:-|-:|-:|-:|-:|
| heisenberg-100 | 316 | 848 | 1139 | 1740 |
| qv-16 | 4721 | 2135 | 5290 | 2327 |
| qv-133 | 176946 | 18768 | 186506 | 21111 |
| qft-full-16 | 123 | 66 | 295 | 116 |
| qft-full-133 | 13891 | 3148 | 17843 | 3821 |
| qft-line-16 | 0 | 0 | 219 | 137 |
| qft-line-133 | 4171 | 5185 | 13087 | 9731 |

I think what this is indicating is that really dense layers (like QV) benefit from this, but deeply structured circuits (particularly linearised ones with isolated high-degree connectivity like QFT) can suffer by looking only at a single layer, like the heuristic currently in this PR does.

Just as a test, I also ran it using this PR, but modifying the SabreLayout heuristic (not the routing heuristic, which is why those numbers are identical) to be three layers of decreasing weight and the same decay component, and got:

| | Layout (size) | Layout (depth) | Routing (size) | Routing (depth) |
|:-|-:|-:|-:|-:|
| heisenberg-100 | 360 | 712 | 1139 | 1740 |
| qv-16 | 4814 | 2081 | 5290 | 2327 |
| qv-133 | 178430 | 19236 | 186506 | 21111 |
| qft-full-16 | 123 | 62 | 295 | 116 |
| qft-full-133 | 11561 | 1935 | 17843 | 3821 |
| qft-line-16 | 0 | 0 | 219 | 137 |
| qft-line-133 | 3820 | 4212 | 13087 | 9731 |

which is curious to me: seems like QV's depth got a little worse (not sure of the statistics here, but I'm not too surprised - too much lookahead here isn't a good thing because it's so unstructured), QFT-full got better (what I expected) and QFT-linear recovered a lot less than I expected.

I wonder if this is suggesting that the lookahead heuristic should be scaled such that gates in the lookahead that are already close together get more weighting than lookahead gates that are miles apart.
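As a purely hypothetical illustration of that scaling idea (not something implemented in this PR), one could pass each lookahead gate's distance through a saturating function, so that far-apart pairs contribute a bounded amount and a one-step improvement to an already-close pair moves the score more than the same improvement to a distant pair. The function name and the `d / (1 + d)` shape below are illustrative assumptions only:

```python
def weighted_lookahead(gates, dist, layout):
    # `gates` is an iterable of (q, p) virtual-qubit pairs in the lookahead;
    # `dist` is the coupling-map distance matrix, `layout` maps virtual to
    # physical qubits.  d / (1 + d) saturates towards 1, so each gate
    # contributes less than 1 and the marginal effect of moving a pair one
    # step closer shrinks as the pair gets further apart.
    total = 0.0
    for q, p in gates:
        d = dist[layout[q]][layout[p]]
        total += d / (1.0 + d)
    return total
```

Under this shaping, adjacent pairs sit at 0.5 and very distant pairs approach 1, so swaps that help nearly-routable lookahead gates dominate the lookahead term.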

@jakelishman jakelishman removed the `on hold` label (Can not fix yet) Apr 9, 2026
@alexanderivrii Member

alexanderivrii commented Apr 12, 2026

Thanks @jakelishman for the script. I have rerun it locally and got exactly the same results with main but slightly different results with your PR:

| | Layout (size) | Layout (depth) | Routing (size) | Routing (depth) |
|:-|-:|-:|-:|-:|
| heisenberg-100 | 382 | 713 | 1139 | 1740 |
| qv-16 | 4962 | 2118 | 5290 | 2327 |
| qv-133 | 185401 | 18963 | 186506 | 21111 |
| qft-full-16 | 126 | 68 | 295 | 116 |
| qft-full-133 | 12324 | 1871 | 17843 | 3821 |
| qft-line-16 | 0 | 0 | 219 | 137 |
| qft-line-133 | 4170 | 3434 | 13087 | 9731 |

I have also run the ASV utility benchmarks:

Benchmarks that have improved:
| Change   | Before [f8e543ff] <main>   | After [7cba5a5e] <jake-layers>   |   Ratio | Benchmark (Parameter)                                                     |
|----------|----------------------------|----------------------------------|---------|---------------------------------------------------------------------------|
| -        | 711±7ms                    | 645±5ms                          |    0.91 | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                        |
| -        | 86.4±3ms                   | 78.0±0.9ms                       |    0.9  | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')         |
| -        | 686±8ms                    | 613±4ms                          |    0.89 | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                        |
| -        | 91.5±1ms                   | 81.0±1ms                         |    0.89 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')        |
| -        | 93.9±3ms                   | 82.4±1ms                         |    0.88 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')         |
| -        | 743±30ms                   | 645±6ms                          |    0.87 | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                       |
| -        | 67.1±2ms                   | 56.8±1ms                         |    0.85 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                    |
| -        | 63.2±1ms                   | 53.4±0.6ms                       |    0.84 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                    |
| -        | 320±3ms                    | 268±10ms                         |    0.84 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                      |
| -        | 306±1ms                    | 247±2ms                          |    0.81 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                     |
| -        | 65.4±2ms                   | 52.4±0.5ms                       |    0.8  | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                   |
| -        | 489±9ms                    | 377±3ms                          |    0.77 | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                       |
| -        | 480                        | 366                              |    0.76 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')  |
| -        | 480                        | 366                              |    0.76 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')  |
| -        | 480                        | 366                              |    0.76 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr') |
| -        | 275±2ms                    | 207±2ms                          |    0.75 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                      |
| -        | 575±10ms                   | 427±4ms                          |    0.74 | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                       |
| -        | 591±9ms                    | 438±4ms                          |    0.74 | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                      |
| -        | 2692                       | 1801                             |    0.67 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                |
| -        | 2744                       | 1815                             |    0.66 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                |
| -        | 2744                       | 1815                             |    0.66 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')               |

Benchmarks that have stayed the same:

| Change   | Before [f8e543ff] <main>   | After [7cba5a5e] <jake-layers>   | Ratio   | Benchmark (Parameter)                                                         |
|----------|----------------------------|----------------------------------|---------|-------------------------------------------------------------------------------|
|          | 36.8±0.1s                  | 25.3±0.1s                        | ~0.69   | utility_scale.UtilityScaleBenchmarks.time_hwb12('ecr')                        |
|          | 39.2±0.3s                  | 24.7±0.1s                        | ~0.63   | utility_scale.UtilityScaleBenchmarks.time_hwb12('cz')                         |
|          | 32.2±0.08s                 | 19.6±0.06s                       | ~0.61   | utility_scale.UtilityScaleBenchmarks.time_hwb12('cx')                         |
|          | 0                          | 0                                | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                 |
|          | 0                          | 0                                | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                 |
|          | 0                          | 0                                | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                |
|          | 1312                       | 1423                             | 1.08    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cx')             |
|          | 1313                       | 1423                             | 1.08    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('cz')             |
|          | 1313                       | 1423                             | 1.08    | utility_scale.UtilityScaleBenchmarks.track_circSU2_89_depth('ecr')            |
|          | 4.98±0.08s                 | 5.30±0.03s                       | 1.06    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('ecr')                   |
|          | 6.50±0.1ms                 | 6.80±0.07ms                      | 1.05    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                        |
|          | 23.3±0.4ms                 | 24.1±0.3ms                       | 1.04    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                       |
|          | 23.3±0.2ms                 | 24.1±0.2ms                       | 1.04    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                      |
|          | 367549                     | 382321                           | 1.04    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cx')                  |
|          | 20.6±0.4ms                 | 21.2±0.3ms                       | 1.03    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                       |
|          | 56.3±0.6ms                 | 58.1±1ms                         | 1.03    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
|          | 6.55±0.1ms                 | 6.66±0.07ms                      | 1.02    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                        |
|          | 4.99±0.01s                 | 5.09±0s                          | 1.02    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cx')                    |
|          | 1590                       | 1617                             | 1.02    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                   |
|          | 2571                       | 2628                             | 1.02    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                     |
|          | 2571                       | 2628                             | 1.02    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                     |
|          | 2571                       | 2628                             | 1.02    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                    |
|          | 5.20±0s                    | 5.24±0.05s                       | 1.01    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cz')                    |
|          | 308±2ms                    | 311±1ms                          | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cz')                   |
|          | 1603                       | 1622                             | 1.01    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                   |
|          | 1603                       | 1622                             | 1.01    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                  |
|          | 310±2ms                    | 308±1ms                          | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('ecr')                  |
|          | 5.39±0.02ms                | 5.36±0.05ms                      | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
|          | 300                        | 300                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                |
|          | 300                        | 300                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                |
|          | 300                        | 300                              | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')               |
|          | 6.72±0.1ms                 | 6.65±0.1ms                       | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                       |
|          | 311±2ms                    | 306±1ms                          | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cx')                   |
|          | 5.44±0.04ms                | 5.40±0.03ms                      | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
|          | 57.2±0.3ms                 | 56.4±0.4ms                       | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
|          | 57.9±0.2ms                 | 57.0±0.7ms                       | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
|          | 18.6±0.05ms                | 18.4±0.2ms                       | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
|          | 18.5±0.2ms                 | 18.3±0.1ms                       | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
|          | 387068                     | 383978                           | 0.99    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('cz')                  |
|          | 5.41±0.07ms                | 5.32±0.02ms                      | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
|          | 391069                     | 383192                           | 0.98    | utility_scale.UtilityScaleBenchmarks.track_hwb12_depth('ecr')                 |
|          | 19.3±0.4ms                 | 18.4±0.2ms                       | 0.96    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |

Benchmarks that have got worse:

| Change   |   Before [f8e543ff] <main> |   After [7cba5a5e] <jake-layers> |   Ratio | Benchmark (Parameter)                                          |
|----------|----------------------------|----------------------------------|---------|----------------------------------------------------------------|
| +        |                        400 |                              665 |    1.66 | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')  |
| +        |                        400 |                              665 |    1.66 | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')  |
| +        |                        400 |                              665 |    1.66 | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr') |

I would say that overall we see an improvement in runtime while the quality becomes worse on some families of benchmarks but better on others.

Member

@alexanderivrii alexanderivrii left a comment


Intuitively I expected a larger improvement in circuit quality, but the experimental results are mixed. Nevertheless I believe that the new heuristic is better and can be tuned more easily (which might already be done in further PRs in the chain), so I am in favor of merging.

Just one minor comment about the release note.

And, just to verify: we didn't document how exactly Sabre works (limiting the number of gates or the number of layers), so we can treat this as a "new feature" and not an "upgrade", do you agree?

Co-authored-by: Alexander Ivrii <alexi@il.ibm.com>
@jakelishman Member Author

Yeah, the mechanism of how the extended set is constructed is more an implementation detail.

I expected this to be more universally good at first too, but I think there are complicating factors for particular circuit structures. All the ideas I have for improving the heuristic want some more formal layer structure, though.

@jakelishman Member Author

Also, with asv it's tricky because there's so much variance in layout/routing, but asv uses only one seed.

This PR attempts to make the most like-for-like replacement to the heuristic used in layout and routing, but I think we can investigate playing with the parameters more in follow-ups.

Member

@alexanderivrii alexanderivrii left a comment


Thanks Jake!

@alexanderivrii alexanderivrii added this pull request to the merge queue Apr 12, 2026
Merged via the queue into Qiskit:main with commit b739944 Apr 12, 2026
26 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in Qiskit 2.4 Apr 12, 2026
@jakelishman jakelishman deleted the sabre/layers/3 branch April 12, 2026 19:59
@github-project-automation github-project-automation Bot moved this from Ready to Done in Qiskit 2.5 Apr 15, 2026
@ShellyGarion ShellyGarion added the `Changelog: Added` label (Add an "Added" entry in the GitHub Release changelog) Apr 17, 2026

Labels

  • `Changelog: Added`: Add an "Added" entry in the GitHub Release changelog.
  • `mod: transpiler`: Issues and PRs related to Transpiler

Projects

Status: Done


8 participants