Conversation

Note — Reviews paused: this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository settings, or use the review-management commands and checkboxes to resume.
📝 Walkthrough

Adds packet-based full resolve-train plumbing: FTQ identity/generation, IEW→Fetch resolve-train entries, Fetch ingestion/validation/packetization and stats, and predictor-side resolve-train changes.
Sequence Diagram

```mermaid
sequenceDiagram
    participant IEW as IEW (CPU)
    participant Fetch as Fetch
    participant BPU as DecoupledBPU
    participant FTQ as FTQ
    participant COMP as PredictorComponent
    Note over IEW,Fetch: IEW emits per-thread ResolveTrainEntry list
    IEW->>Fetch: push resolveTrainEntries
    Fetch->>Fetch: handleIEWSignals() (per-tick gate, validate FTQ id/generation, merge/filter)
    Fetch->>BPU: resolveTrain(ResolvedTrainPacket)
    BPU->>FTQ: ftqMatchTargetIdentity / ftqTargetGeneration
    loop per-component
        BPU->>COMP: canResolveTrain(packet, target)?
        alt defer (false)
            COMP-->>BPU: false
        else accept (true)
            BPU->>COMP: resolveTrain(packet, target)
            COMP->>COMP: reconstruct preds, update, allocate
            COMP-->>BPU: done
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed!
Actionable comments posted: 7
🧹 Nitpick comments (2)
src/cpu/o3/dyn_inst.hh (1)
421-424: Default-initialize the new FTQ metadata fields.
`IEW::SquashCheckAfterExe()` now reads these on every control instruction. If any `DynInst` constructor or clone path misses the new setters, resolve training will carry garbage FTQ identity. Please verify the out-of-line constructors, or make the class robust by initializing them here.
🛡️ Proposed change
```diff
- uint64_t ftqGeneration;
+ uint64_t ftqGeneration = 0;
  /** Halfword offset of the instruction within its FTQ target block. */
- uint8_t ftqOffset;
+ uint8_t ftqOffset = 0;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/dyn_inst.hh` around lines 421 - 424, DynInst's new members ftqGeneration and ftqOffset aren't default-initialized, causing IEW::SquashCheckAfterExe() to read garbage when constructors or clone paths omit setting them; fix by default-initializing these members in the DynInst declaration (e.g., ftqGeneration = 0, ftqOffset = 0) and audit all out-of-line constructors and clone/cloneInto methods (any copy/assignment paths that create new DynInst instances) to ensure they either preserve or explicitly set ftqGeneration and ftqOffset so resolve training won't pick up uninitialized FTQ identity.
src/cpu/pred/btb/btb_ittage.cc (1)
399-559: Extract the common ITTAGE training core.
`resolveTrain()` is now effectively a second copy of `update()`. Alt-provider handling, useful-bit policy, and allocation rules will drift quickly unless both paths share the same helper.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/btb_ittage.cc` around lines 399 - 559, resolveTrain() duplicates the core ITTAGE training logic found in update(), risking drift in alt-provider handling, useful-bit policy, and allocation rules; extract the shared logic into a single helper (e.g., trainCore or updateTrainingCore) that encapsulates alt-info handling, useful-bit/reset policy, allocation decision & entry creation, and counter/target updates, and have both resolveTrain() and update() call that helper passing the minimal required data (predMeta/TageMeta fields like preds, tagFoldedHist, altTagFoldedHist, indexFoldedHist, usefulMask, and the resolved branch info such as exe_target, mispred, mainInfo/altInfo, packet.startPC, etc.); update references to symbols like BTBITTAGE::resolveTrain, BTBITTAGE::update, TageMeta, pred.mainInfo/altInfo, usefulResetCnt, and tageTable when moving code so behavior and stats (ittageStats.*) remain identical.
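The extraction this nitpick asks for can be sketched in miniature. Everything below (`TrainInput`, `trainCore`, and the counters) is hypothetical scaffolding, not the real BTBITTAGE API — it only shows the shape of routing both entry points through one shared helper so policies cannot drift:

```cpp
#include <cassert>

// Hypothetical stand-in for the data both paths need (the real helper
// would take predMeta/TageMeta fields and the resolved branch info).
struct TrainInput {
    bool mispredict;
    bool allocate;
};

struct Predictor {
    int updates = 0;
    int allocations = 0;

    // Single home for alt-info handling, useful-bit policy, and
    // allocation decisions; both entry points delegate here.
    void trainCore(const TrainInput &in)
    {
        ++updates;
        if (in.mispredict && in.allocate)
            ++allocations;
    }

    void update(const TrainInput &in)       { trainCore(in); }
    void resolveTrain(const TrainInput &in) { trainCore(in); }
};
```

With this shape, a future change to the allocation rule lands in one place and both `update()` and `resolveTrain()` pick it up automatically.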
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1903-1907: The instruction's FTQ id is currently pinned to thread
0 while generation/offset are taken from fetch_target =
dbpbtb->ftqFetchingTarget(tid), causing mixed (ftqId, generation, ftqOffset)
tuples for non-zero tids; update the code to set the instruction's FTQ id from
the same fetch_target (e.g., use fetch_target.ftqId or
dbpbtb->ftqFetchingTarget(tid).ftqId) before calling
instruction->setFtqGeneration(...) and instruction->setFtqOffset(...), ensuring
the FTQ id, generation, and offset all come from the same thread's fetch_target.
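The tuple-coherence fix described above can be sketched with stand-in types; `FetchTarget`, `Instruction`, and `tagInstruction` below are hypothetical (only the setter names mirror the review comment), showing that all three fields must come from the same per-thread fetch target:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical slice of a per-thread fetch target.
struct FetchTarget {
    uint64_t ftqId;
    uint64_t generation;
    uint8_t offset;
};

struct Instruction {
    uint64_t ftqId = 0;
    uint64_t ftqGeneration = 0;
    uint8_t ftqOffset = 0;

    void setFtqId(uint64_t id) { ftqId = id; }
    void setFtqGeneration(uint64_t g) { ftqGeneration = g; }
    void setFtqOffset(uint8_t o) { ftqOffset = o; }
};

void tagInstruction(Instruction &inst, const FetchTarget &fetch_target)
{
    // All three fields come from the same thread's fetch target,
    // never a hard-coded thread 0, so the (id, generation, offset)
    // tuple stays internally consistent.
    inst.setFtqId(fetch_target.ftqId);
    inst.setFtqGeneration(fetch_target.generation);
    inst.setFtqOffset(fetch_target.offset);
}
```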
In `@src/cpu/o3/iew.cc`:
- Around line 1559-1562: The code currently clears resolvedCFIs and
resolveTrainEntries only for the first active thread by using ThreadID tid =
*activeThreads->begin(); instead iterate over all ThreadID entries in
activeThreads and clear toFetch->iewInfo[tid].resolvedCFIs and
toFetch->iewInfo[tid].resolveTrainEntries for each thread; update the block that
references activeThreads, toFetch->iewInfo, resolvedCFIs and resolveTrainEntries
so every active SMT thread is cleared rather than just the first.
In `@src/cpu/pred/BranchPredictor.py`:
- Around line 1209-1212: The new Python config flag enableLegacyResolveUpdate is
declared in BranchPredictor.py but not wired into the C++ predictor backend;
update the DecoupledBPUWithBTB implementation to read and store this parameter
just like enableFullResolveTrain is currently handled: add a corresponding
member (e.g., bool enableLegacyResolveUpdate_) to the DecoupledBPUWithBTB class,
initialize it from the Python-config value in the constructor (mirror how
enableFullResolveTrain is read/initialized), and use that member in any legacy
resolve update code paths so the config flag actually controls behavior at
runtime. Ensure constructors, member declarations, and any initialization
lists/reference sites (where enableFullResolveTrain is used) are updated
consistently.
In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 796-809: The code only regenerates a TAGE prediction when
updateOnRead is true, which skips recomputing when has_original_pred is false
and can prevent correct updates/allocations; change the recomputed assignment to
call generateSinglePrediction whenever has_original_pred is false (i.e.,
recomputed = !has_original_pred ? generateSinglePrediction(btb_entry,
packet.startPC, predMeta) : (updateOnRead ? generateSinglePrediction(...) :
original_pred)), and ensure the hasRecomputedVsOriginalDiff comparison is
guarded by has_original_pred so you don't compare recomputed vs original_pred
when original is absent; keep the call to
updatePredictorStateAndCheckAllocation(btb_entry, actual_taken, recomputed,
resolved.mispredict) as-is so allocations happen with the regenerated
prediction.
In `@src/cpu/pred/btb/common.hh`:
- Around line 381-384: The default constructor FetchTarget() leaves the member
tid uninitialized; update the FetchTarget() initializer list to explicitly
initialize tid (e.g., tid(0)) alongside generation, startPC, and predTaken so
default-constructed FetchTarget instances have a defined thread id; modify the
FetchTarget() initializer list in class/struct FetchTarget to include tid
initialization.
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 661-665: The resolve-update loop currently skips training for
mbtb, tage, and ittage when enableFullResolveTrain is set, but misses microtage,
causing BTBTAGE-derived instances (microtage) to be trained twice; update the
conditional in the block that checks components[i]->getResolvedUpdate() to also
include components[i] == microtage (and do the same for the similar block around
the other occurrence at the 676-679 range) so that microtage is skipped from the
legacy resolve-update path when enableFullResolveTrain is enabled, ensuring
BTBTAGE::resolveTrain only performs the packet-trained updates once per
instance.
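The skip condition can be sketched as follows; `Component` and `shouldSkipLegacyUpdate` are hypothetical stand-ins for the real component pointers, illustrating only that `microtage` must join the comparison so it is not trained twice:

```cpp
#include <cassert>

struct Component {};  // stand-in for a predictor component

bool shouldSkipLegacyUpdate(const Component *c,
                            const Component *mbtb,
                            const Component *tage,
                            const Component *ittage,
                            const Component *microtage,
                            bool enableFullResolveTrain)
{
    if (!enableFullResolveTrain)
        return false;
    // microtage must be listed here too; otherwise the BTBTAGE-derived
    // instance falls through to the legacy resolve-update path and
    // receives a second update on top of resolveTrain().
    return c == mbtb || c == tage || c == ittage || c == microtage;
}
```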
In `@src/cpu/pred/btb/mbtb.cc`:
- Around line 509-520: The resolve-train hit check incorrectly uses full struct
equality between taken_branch->branch and entries in meta->hit_entries, which
fails when the packet-side sets ResolvedBranch::branch.resolved; update the
check in mbtb.cc (the loop over meta->hit_entries that sets pred_branch_hit) to
ignore the resolved flag — e.g., compare only the relevant prediction fields
(such as branch.pc and other prediction-identifying fields) or mask/clear the
resolved bit on either side before equality — so that differing resolved state
does not produce spurious updateMiss increments.
---
Nitpick comments:
In `@src/cpu/o3/dyn_inst.hh`:
- Around line 421-424: DynInst's new members ftqGeneration and ftqOffset aren't
default-initialized, causing IEW::SquashCheckAfterExe() to read garbage when
constructors or clone paths omit setting them; fix by default-initializing these
members in the DynInst declaration (e.g., ftqGeneration = 0, ftqOffset = 0) and
audit all out-of-line constructors and clone/cloneInto methods (any
copy/assignment paths that create new DynInst instances) to ensure they either
preserve or explicitly set ftqGeneration and ftqOffset so resolve training won't
pick up uninitialized FTQ identity.
In `@src/cpu/pred/btb/btb_ittage.cc`:
- Around line 399-559: resolveTrain() duplicates the core ITTAGE training logic
found in update(), risking drift in alt-provider handling, useful-bit policy,
and allocation rules; extract the shared logic into a single helper (e.g.,
trainCore or updateTrainingCore) that encapsulates alt-info handling,
useful-bit/reset policy, allocation decision & entry creation, and
counter/target updates, and have both resolveTrain() and update() call that
helper passing the minimal required data (predMeta/TageMeta fields like preds,
tagFoldedHist, altTagFoldedHist, indexFoldedHist, usefulMask, and the resolved
branch info such as exe_target, mispred, mainInfo/altInfo, packet.startPC,
etc.); update references to symbols like BTBITTAGE::resolveTrain,
BTBITTAGE::update, TageMeta, pred.mainInfo/altInfo, usefulResetCnt, and
tageTable when moving code so behavior and stats (ittageStats.*) remain
identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 606287e0-bdad-4b56-bb5c-89e172bf98ac
📒 Files selected for processing (21)
src/cpu/o3/BaseO3CPU.py, src/cpu/o3/comm.hh, src/cpu/o3/dyn_inst.cc, src/cpu/o3/dyn_inst.hh, src/cpu/o3/fetch.cc, src/cpu/o3/fetch.hh, src/cpu/o3/iew.cc, src/cpu/pred/BranchPredictor.py, src/cpu/pred/btb/btb_ittage.cc, src/cpu/pred/btb/btb_ittage.hh, src/cpu/pred/btb/btb_tage.cc, src/cpu/pred/btb/btb_tage.hh, src/cpu/pred/btb/common.hh, src/cpu/pred/btb/decoupled_bpred.cc, src/cpu/pred/btb/decoupled_bpred.hh, src/cpu/pred/btb/ftq.cc, src/cpu/pred/btb/ftq.hh, src/cpu/pred/btb/mbtb.cc, src/cpu/pred/btb/mbtb.hh, src/cpu/pred/btb/test/btb_tage.test.cc, src/cpu/pred/btb/timed_base_pred.hh
src/cpu/o3/iew.cc
Outdated
```cpp
// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
ThreadID tid = *activeThreads->begin();
toFetch->iewInfo[tid].resolvedCFIs.clear();
toFetch->iewInfo[tid].resolveTrainEntries.clear();
```
Clear the resolve queues for every thread, not just the first active one.
Line 1560 still picks only *activeThreads->begin(). On SMT runs, stale resolvedCFIs / resolveTrainEntries from the other threads can leak into the next cycle and train against the wrong FTQ state.
🔧 Proposed fix
```diff
-    ThreadID tid = *activeThreads->begin();
-    toFetch->iewInfo[tid].resolvedCFIs.clear();
-    toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        toFetch->iewInfo[tid].resolvedCFIs.clear();
+        toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
for (ThreadID tid = 0; tid < numThreads; ++tid) {
    toFetch->iewInfo[tid].resolvedCFIs.clear();
    toFetch->iewInfo[tid].resolveTrainEntries.clear();
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/o3/iew.cc` around lines 1559 - 1562, The code currently clears
resolvedCFIs and resolveTrainEntries only for the first active thread by using
ThreadID tid = *activeThreads->begin(); instead iterate over all ThreadID
entries in activeThreads and clear toFetch->iewInfo[tid].resolvedCFIs and
toFetch->iewInfo[tid].resolveTrainEntries for each thread; update the block that
references activeThreads, toFetch->iewInfo, resolvedCFIs and resolveTrainEntries
so every active SMT thread is cleared rather than just the first.
src/cpu/pred/BranchPredictor.py
Outdated
```python
enableFullResolveTrain = Param.Bool(Parent.enableFullResolveTrain,
    "Enable packet-based resolve training rollout plumbing")
enableLegacyResolveUpdate = Param.Bool(Parent.enableLegacyResolveUpdate,
    "Enable legacy PC-only resolve update")
```
enableLegacyResolveUpdate is exposed here but not wired in the C++ predictor.
Right now this new parameter appears to be a no-op on the DecoupledBPUWithBTB backend (only enableFullResolveTrain is declared/initialized there), which can silently invalidate config-driven experiments.
Suggested backend wiring update
```diff
--- a/src/cpu/pred/btb/decoupled_bpred.hh
+++ b/src/cpu/pred/btb/decoupled_bpred.hh
@@
     const unsigned resolveBlockThreshold;
     const bool enableFullResolveTrain;
+    const bool enableLegacyResolveUpdate;
--- a/src/cpu/pred/btb/decoupled_bpred.cc
+++ b/src/cpu/pred/btb/decoupled_bpred.cc
@@
     resolveBlockThreshold(p.resolveBlockThreshold),
     enableFullResolveTrain(p.enableFullResolveTrain),
+    enableLegacyResolveUpdate(p.enableLegacyResolveUpdate),
     dbpBtbStats(this, p.numStages, p.fsq_size, maxInstsNum)
```
🧰 Tools
🪛 Ruff (0.15.7)
[error] 1209-1209: Param may be undefined, or defined from star imports
(F405)
[error] 1209-1209: Parent may be undefined, or defined from star imports
(F405)
[error] 1211-1211: Param may be undefined, or defined from star imports
(F405)
[error] 1211-1211: Parent may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/pred/BranchPredictor.py` around lines 1209 - 1212, The new Python
config flag enableLegacyResolveUpdate is declared in BranchPredictor.py but not
wired into the C++ predictor backend; update the DecoupledBPUWithBTB
implementation to read and store this parameter just like enableFullResolveTrain
is currently handled: add a corresponding member (e.g., bool
enableLegacyResolveUpdate_) to the DecoupledBPUWithBTB class, initialize it from
the Python-config value in the constructor (mirror how enableFullResolveTrain is
read/initialized), and use that member in any legacy resolve update code paths
so the config flag actually controls behavior at runtime. Ensure constructors,
member declarations, and any initialization lists/reference sites (where
enableFullResolveTrain is used) are updated consistently.
```diff
 FetchTarget()
-    : startPC(0),
+    : generation(0),
+      startPC(0),
       predTaken(false),
```
Initialize FetchTarget::tid in the default constructor.
tid is left uninitialized while adjacent fields are explicitly initialized. This can surface as undefined behavior if a default-constructed FetchTarget is observed before assignment.
Suggested fix
```diff
-    FetchTarget()
-        : generation(0),
+    FetchTarget()
+        : tid(0),
+          generation(0),
           startPC(0),
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
FetchTarget()
    : tid(0),
      generation(0),
      startPC(0),
      predTaken(false),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/pred/btb/common.hh` around lines 381 - 384, The default constructor
FetchTarget() leaves the member tid uninitialized; update the FetchTarget()
initializer list to explicitly initialize tid (e.g., tid(0)) alongside
generation, startPC, and predTaken so default-constructed FetchTarget instances
have a defined thread id; modify the FetchTarget() initializer list in
class/struct FetchTarget to include tid initialization.
```cpp
for (const auto &e : meta->hit_entries) {
    if (taken_branch->branch == e) {
        pred_branch_hit = true;
        break;
    }
}

if (!pred_branch_hit) {
    DPRINTF(BTB, "resolve-train miss detected, pc %#lx\n",
            taken_branch->branch.pc);
    btbStats.updateMiss++;
} else {
```
Ignore the resolved bit in the resolve-train hit check.
The packet side marks ResolvedBranch::branch.resolved = true, while meta->hit_entries are prediction-time BTB entries. A full-struct equality here turns taken hits into updateMiss as soon as that flag differs.
💡 Suggested fix
```diff
-            if (taken_branch->branch == e) {
+            if (taken_branch->branch.pc == e.pc &&
+                taken_branch->branch.target == e.target &&
+                taken_branch->branch.isCond == e.isCond &&
+                taken_branch->branch.isIndirect == e.isIndirect &&
+                taken_branch->branch.isDirect == e.isDirect &&
+                taken_branch->branch.isCall == e.isCall &&
+                taken_branch->branch.isReturn == e.isReturn &&
+                taken_branch->branch.size == e.size) {
                 pred_branch_hit = true;
                 break;
             }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/pred/btb/mbtb.cc` around lines 509 - 520, The resolve-train hit check
incorrectly uses full struct equality between taken_branch->branch and entries
in meta->hit_entries, which fails when the packet-side sets
ResolvedBranch::branch.resolved; update the check in mbtb.cc (the loop over
meta->hit_entries that sets pred_branch_hit) to ignore the resolved flag — e.g.,
compare only the relevant prediction fields (such as branch.pc and other
prediction-identifying fields) or mask/clear the resolved bit on either side
before equality — so that differing resolved state does not produce spurious
updateMiss increments.
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/cpu/o3/fetch.hh (1)
473-505: Keep the resolve-train queue plumbing private.
`ResolveTrainInstData`, `ResolveTrainQueueEntry`, and these four helper methods are internal to `Fetch`. Leaving them in the public section makes the queue format part of the class API for no external caller.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/fetch.hh` around lines 473 - 505, The ResolveTrainInstData and ResolveTrainQueueEntry structs and the helper methods makeResolveTrainInstData, buildResolvedTrainPacket, appendResolveTrainInst, and filterResolveTrainQueue are internal implementation details and should not be in Fetch's public API; move these declarations into the private section of the Fetch class (keeping their names intact) so they are only accessible to Fetch internals, and if any external code currently relies on them, provide a public accessor or refactor that usage to avoid exposing the internal queue format.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1526-1534: The current check uses resolveQueue.size() >
resolveQueueSize - 4 which underflows when resolveQueueSize is 1..3; update the
logic in the enableLegacyResolveUpdate block (where resolveQueueSize,
resolveQueue, enqueueSize, and fetchStats are used) to clamp or bound the
subtraction—e.g. compute a safe threshold like const auto threshold =
(resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) and then compare
resolveQueue.size() > threshold (or explicitly handle small resolveQueueSize
values) so the queue-full branch fires correctly and legacy fallback cannot
exceed the configured bound.
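The clamp described above can be sketched directly; `safeThreshold` and `queueNearlyFull` are hypothetical helper names, and the point is only that the subtraction must be bounded before comparing against the queue length:

```cpp
#include <cassert>
#include <cstddef>

// Bound the subtraction at zero so small configured sizes (1..3)
// cannot wrap around in unsigned arithmetic.
size_t safeThreshold(size_t resolveQueueSize)
{
    return resolveQueueSize > 4 ? resolveQueueSize - 4 : 0;
}

bool queueNearlyFull(size_t queueLen, size_t resolveQueueSize)
{
    // With resolveQueueSize == 2, the naive `resolveQueueSize - 4`
    // wraps to a huge unsigned value and this branch never fires.
    return queueLen > safeThreshold(resolveQueueSize);
}
```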
In `@src/cpu/o3/iew.cc`:
- Around line 1559-1562: The code only clears resolvedCFIs and
resolveTrainEntries for the first active thread (using ThreadID tid =
*activeThreads->begin()), which lets other SMT thread slots retain stale
entries; update the logic to iterate over all thread IDs (or all entries in
toFetch->iewInfo corresponding to the thread set) and clear both
toFetch->iewInfo[thread].resolvedCFIs and
toFetch->iewInfo[thread].resolveTrainEntries for every thread before execute
starts so no per-thread resolve/train buffers survive into the next tick; use
the existing activeThreads or the keys/indexes of toFetch->iewInfo to locate
each thread slot.
In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 758-809: When predMeta->preds does not contain btb_entry.pc and
updateOnRead is false, recomputed is left as a default TagePrediction which
misrepresents the snapshot; mirror update()'s fallback by regenerating the
prediction from the snapshot in that case: if (!has_original_pred &&
!updateOnRead) set recomputed = generateSinglePrediction(btb_entry,
packet.startPC, predMeta) before comparing taken/results and calling
updatePredictorStateAndCheckAllocation so allocation/counter updates use the
correct reconstructed prediction (use symbols recomputed, original_pred,
updateOnRead, generateSinglePrediction, predMeta->preds, and
updatePredictorStateAndCheckAllocation to locate the change).
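The fallback selection described above can be sketched with simplified types; `TagePrediction`, `regenerateFromSnapshot`, and `pickRecomputed` below are hypothetical stand-ins (the real call is `generateSinglePrediction(btb_entry, packet.startPC, predMeta)`), showing only the decision order:

```cpp
#include <cassert>

struct TagePrediction {
    bool taken = false;
    bool valid = false;
};

// Stand-in for regenerating a prediction from the folded-history snapshot.
TagePrediction regenerateFromSnapshot()
{
    return TagePrediction{true, true};
}

TagePrediction pickRecomputed(bool has_original_pred, bool updateOnRead,
                              const TagePrediction &original_pred)
{
    // Regenerate whenever no original prediction was recorded, and also
    // when update-on-read asks for a fresh computation; only fall back
    // to the stored prediction when one actually exists.
    if (!has_original_pred || updateOnRead)
        return regenerateFromSnapshot();
    return original_pred;
}
```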
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 698-726: The validation checks in resolveUpdate handling (the tid
mismatch, ftq.matchTargetIdentity, startPC mismatch, and
validateResolvedTrainPacket failures) are permanent validation failures and
should be treated as dropped packets rather than retryable bank conflicts;
update the early-exit paths in the code around validateResolvedTrainPacket,
ftq.matchTargetIdentity, and the tid/startPC comparisons so they return true (as
resolveUpdate uses true to indicate stale/missing/consumed packets) instead of
false, ensuring resolveUpdate() signals a drop and does not cause the
dequeue/retry loop (which expects false only for transient canResolveUpdate()
blockers) to spin forever.
- Around line 47-77: The loop currently only treats a taken branch as terminal
(seenTaken) and allows later branches after a mispredicted branch, which can
train wrong-path state; update the validator to treat any resolved.mispredict as
terminal as well. Concretely, change the logic that uses seenTaken to a terminal
flag (e.g., seenTerminal) and set it when resolved.taken || resolved.mispredict
is true; keep the early-return at loop top (if seenTerminal) so any branch after
a mispredict or taken causes the packet to be rejected. Ensure you reference
packet.realBranches and the ResolvedBranch fields (taken and mispredict) and
preserve the existing ordering/ftqOffset and pc checks (firstBranch, lastOffset,
lastPc).
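The drop-vs-retry contract from the first comment above can be sketched as a small decision table; `Verdict` and `resolveUpdateVerdict` are hypothetical names, illustrating only which failures map to "consumed" versus "retry":

```cpp
#include <cassert>

enum class Verdict { Dropped, Retry, Trained };

// Permanent validation failures (tid mismatch, FTQ identity mismatch,
// startPC mismatch, malformed packet) can never succeed on a later
// attempt, so they map to Dropped — the caller reports "consumed"
// (true) and the dequeue loop moves on. Only transient blockers such
// as a bank conflict map to Retry (false), to be attempted next tick.
Verdict resolveUpdateVerdict(bool tidMismatch, bool identityMismatch,
                             bool startPCMismatch, bool packetInvalid,
                             bool bankConflict)
{
    if (tidMismatch || identityMismatch || startPCMismatch || packetInvalid)
        return Verdict::Dropped;
    if (bankConflict)
        return Verdict::Retry;
    return Verdict::Trained;
}
```

Returning Dropped for permanent failures is what keeps the retry loop from spinning forever on a packet that can never validate.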
---
Nitpick comments:
In `@src/cpu/o3/fetch.hh`:
- Around line 473-505: The ResolveTrainInstData and ResolveTrainQueueEntry
structs and the helper methods makeResolveTrainInstData,
buildResolvedTrainPacket, appendResolveTrainInst, and filterResolveTrainQueue
are internal implementation details and should not be in Fetch's public API;
move these declarations into the private section of the Fetch class (keeping
their names intact) so they are only accessible to Fetch internals, and if any
external code currently relies on them, provide a public accessor or refactor
that usage to avoid exposing the internal queue format.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5e908f2d-a886-48d9-8994-bfbc24ec6476
📒 Files selected for processing (21)
src/cpu/o3/BaseO3CPU.py, src/cpu/o3/comm.hh, src/cpu/o3/dyn_inst.cc, src/cpu/o3/dyn_inst.hh, src/cpu/o3/fetch.cc, src/cpu/o3/fetch.hh, src/cpu/o3/iew.cc, src/cpu/pred/BranchPredictor.py, src/cpu/pred/btb/btb_ittage.cc, src/cpu/pred/btb/btb_ittage.hh, src/cpu/pred/btb/btb_tage.cc, src/cpu/pred/btb/btb_tage.hh, src/cpu/pred/btb/common.hh, src/cpu/pred/btb/decoupled_bpred.cc, src/cpu/pred/btb/decoupled_bpred.hh, src/cpu/pred/btb/ftq.cc, src/cpu/pred/btb/ftq.hh, src/cpu/pred/btb/mbtb.cc, src/cpu/pred/btb/mbtb.hh, src/cpu/pred/btb/test/btb_tage.test.cc, src/cpu/pred/btb/timed_base_pred.hh
src/cpu/o3/iew.cc
Outdated
```cpp
// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
ThreadID tid = *activeThreads->begin();
toFetch->iewInfo[tid].resolvedCFIs.clear();
toFetch->iewInfo[tid].resolveTrainEntries.clear();
```
Clear the IEW→Fetch resolve buffers for every thread.
Line 1560 only clears *activeThreads->begin(). On SMT, resolvedCFIs and resolveTrainEntries for the other thread slots survive into the next tick and can be emitted again as stale resolve/train traffic. Clear these vectors for all thread slots before execute starts.
Proposed fix
```diff
-    ThreadID tid = *activeThreads->begin();
-    toFetch->iewInfo[tid].resolvedCFIs.clear();
-    toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        toFetch->iewInfo[tid].resolvedCFIs.clear();
+        toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
for (ThreadID tid = 0; tid < numThreads; ++tid) {
    toFetch->iewInfo[tid].resolvedCFIs.clear();
    toFetch->iewInfo[tid].resolveTrainEntries.clear();
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/o3/iew.cc` around lines 1559 - 1562, The code only clears
resolvedCFIs and resolveTrainEntries for the first active thread (using ThreadID
tid = *activeThreads->begin()), which lets other SMT thread slots retain stale
entries; update the logic to iterate over all thread IDs (or all entries in
toFetch->iewInfo corresponding to the thread set) and clear both
toFetch->iewInfo[thread].resolvedCFIs and
toFetch->iewInfo[thread].resolveTrainEntries for every thread before execute
starts so no per-thread resolve/train buffers survive into the next tick; use
the existing activeThreads or the keys/indexes of toFetch->iewInfo to locate
each thread slot.
```cpp
uint8_t lastOffset = 0;
Addr lastPc = 0;
bool firstBranch = true;
bool seenTaken = false;
for (const auto &resolved : packet.realBranches) {
    if (resolved.branch.pc < packet.startPC) {
        return false;
    }

    if (resolved.branch.size == 0) {
        return false;
    }

    if (seenTaken) {
        return false;
    }

    if (!firstBranch) {
        if (resolved.ftqOffset < lastOffset) {
            return false;
        }
        if (resolved.ftqOffset == lastOffset && resolved.branch.pc <= lastPc) {
            return false;
        }
    }

    lastOffset = resolved.ftqOffset;
    lastPc = resolved.branch.pc;
    seenTaken = resolved.taken;
    firstBranch = false;
}
```
Reject packets that continue after a mispredicted branch.
A valid packet should be terminal once any earlier ResolvedBranch has mispredict == true: IEW::SquashCheckAfterExe() stops recording younger control instructions after an older mispredict has armed squash state. The current validator only fences on taken, so {not-taken, mispredict}, later branch... still passes and can train wrong-path state if packet assembly goes wrong.
Proposed fix
```diff
-    bool seenTaken = false;
+    bool seenTaken = false;
+    bool seenMispredict = false;
     for (const auto &resolved : packet.realBranches) {
+        if (seenTaken || seenMispredict) {
+            return false;
+        }
+
         if (resolved.branch.pc < packet.startPC) {
             return false;
         }
         if (resolved.branch.size == 0) {
             return false;
         }
-
-        if (seenTaken) {
-            return false;
-        }
         if (!firstBranch) {
             if (resolved.ftqOffset < lastOffset) {
                 return false;
             }
@@
         lastOffset = resolved.ftqOffset;
         lastPc = resolved.branch.pc;
         seenTaken = resolved.taken;
+        seenMispredict = resolved.mispredict;
         firstBranch = false;
     }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
uint8_t lastOffset = 0;
Addr lastPc = 0;
bool firstBranch = true;
bool seenTaken = false;
bool seenMispredict = false;
for (const auto &resolved : packet.realBranches) {
    if (seenTaken || seenMispredict) {
        return false;
    }
    if (resolved.branch.pc < packet.startPC) {
        return false;
    }
    if (resolved.branch.size == 0) {
        return false;
    }
    if (!firstBranch) {
        if (resolved.ftqOffset < lastOffset) {
            return false;
        }
        if (resolved.ftqOffset == lastOffset && resolved.branch.pc <= lastPc) {
            return false;
        }
    }
    lastOffset = resolved.ftqOffset;
    lastPc = resolved.branch.pc;
    seenTaken = resolved.taken;
    seenMispredict = resolved.mispredict;
    firstBranch = false;
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 47 - 77, The loop currently
only treats a taken branch as terminal (seenTaken) and allows later branches
after a mispredicted branch, which can train wrong-path state; update the
validator to treat any resolved.mispredict as terminal as well. Concretely,
change the logic that uses seenTaken to a terminal flag (e.g., seenTerminal) and
set it when resolved.taken || resolved.mispredict is true; keep the early-return
at loop top (if seenTerminal) so any branch after a mispredict or taken causes
the packet to be rejected. Ensure you reference packet.realBranches and the
ResolvedBranch fields (taken and mispredict) and preserve the existing
ordering/ftqOffset and pc checks (firstBranch, lastOffset, lastPc).
🧹 Nitpick comments (1)
docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md (1)
56-61: Clarify the dual-mode behavior for migrated components. The statement "It is not a double-update mode for migrated components" assumes the reader knows what "double-update mode" means. Consider being more explicit about what actually happens.
📝 Suggested clarification
```diff
 This means:
 - migrated components use the full packet path
 - non-migrated `resolvedUpdate` components still keep legacy fallback
-It is not a double-update mode for migrated components.
+Migrated components (MBTB, BTBTAGE, BTBITTAGE) use ONLY the new packet path and
+skip legacy resolved-update processing entirely when enableFullResolveTrain is True.
```
Verify each finding against the current code and only fix it if needed. In `@docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md` around lines 56 - 61, Clarify the sentence about dual-mode by explicitly describing what happens for migrated components: state that migrated components use the full packet path and therefore only receive a single consolidated update (no legacy fallback), whereas non-migrated resolvedUpdate components continue to use the legacy fallback path; replace "It is not a double-update mode for migrated components" with a concrete sentence such as "Migrated components receive a single update via the full packet path (they do not also receive the legacy fallback), while non-migrated resolvedUpdate components retain the legacy fallback behavior."
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md`:
- Around line 56-61: Clarify the sentence about dual-mode by explicitly
describing what happens for migrated components: state that migrated components
use the full packet path and therefore only receive a single consolidated update
(no legacy fallback), whereas non-migrated resolvedUpdate components continue to
use the legacy fallback path; replace "It is not a double-update mode for
migrated components" with a concrete sentence such as "Migrated components
receive a single update via the full packet path (they do not also receive the
legacy fallback), while non-migrated resolvedUpdate components retain the legacy
fallback behavior."
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7bda24cd-e7ba-4bb1-8294-90ce2e32f308
📒 Files selected for processing (1)
docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
♻️ Duplicate comments (3)
src/cpu/o3/fetch.cc (2)
1518-1521: ⚠️ Potential issue | 🟡 Minor: Underflow risk in legacy queue headroom calculation.
When `resolveQueueSize` is 1..3, the subtraction `resolveQueueSize - 4` wraps to a large value (unsigned arithmetic), so the queue-full condition never triggers and the legacy fallback can grow unbounded.
resolveQueueSizeis 1..3, the subtractionresolveQueueSize - 4wraps to a large value (unsigned arithmetic), so the queue-full condition never triggers and the legacy fallback can grow unbounded.🛠️ Suggested fix
```diff
-    if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4) {
+    const size_t reserve = std::min<size_t>(resolveQueueSize, 4);
+    if (resolveQueueSize &&
+        resolveQueue.size() + reserve > resolveQueueSize) {
         fetchStats.resolveQueueFullEvents++;
```
Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/fetch.cc` around lines 1518 - 1521, The check uses unsigned arithmetic `resolveQueueSize - 4` which underflows when resolveQueueSize is 1..3; update the condition in fetch.cc so it computes a safe threshold before comparing (e.g., compute threshold = (resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) or use std::max with 4) and then test `resolveQueue.size() > threshold`; keep the existing updates to fetchStats.resolveQueueFullEvents and fetchStats.resolveEnqueueFailEvent (using enqueueSize) unchanged.
1887-1892: ⚠️ Potential issue | 🟠 Major: FTQ id pinned to thread 0 while generation/offset use tid.
Line 1887 uses `ftqHeadId(0)` while lines 1888-1892 derive generation and offset from `ftqFetchingTarget(tid)`. For non-zero tids, this creates inconsistent `(ftqId, generation, ftqOffset)` tuples that will fail identity matching in the resolve-train path.
🛠️ Suggested fix
```diff
-    instruction->setFtqId(dbpbtb->ftqHeadId(0));
+    instruction->setFtqId(dbpbtb->ftqHeadId(tid));
     const auto &fetch_target = dbpbtb->ftqFetchingTarget(tid);
```
Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/fetch.cc` around lines 1887 - 1892, The FTQ id is incorrectly hardcoded to thread 0 which mismatches the generation/offset derived from dbpbtb->ftqFetchingTarget(tid); update the code that calls instruction->setFtqId(dbpbtb->ftqHeadId(0)) to use the current thread id (dbpbtb->ftqHeadId(tid)) so the tuple (ftqId, generation, ftqOffset) produced by instruction->setFtqId, instruction->setFtqGeneration(fetch_target.generation) and instruction->setFtqOffset(...) is consistent with fetch_target and tid (see fetch_target, this_pc.instAddr(), instShiftAmt).

src/cpu/pred/btb/mbtb.cc (1)
508-514: ⚠️ Potential issue | 🟡 Minor: Struct equality includes the `resolved` flag, causing false misses.
The comparison `taken_branch->branch == e` uses full struct equality. Since the packet sets `resolved = true` while `meta->hit_entries` retain prediction-time values, this will incorrectly classify hits as misses.
🛠️ Suggested fix: Compare by PC instead
```diff
 bool pred_branch_hit = false;
 for (const auto &e : meta->hit_entries) {
-    if (taken_branch->branch == e) {
+    if (taken_branch->branch.pc == e.pc) {
         pred_branch_hit = true;
         break;
     }
 }
```
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/mbtb.cc` around lines 508 - 514, The loop computing pred_branch_hit wrongly uses full struct equality between taken_branch->branch and entries in meta->hit_entries (which includes the resolved flag), causing false misses; update the comparison in the loop to compare only the branch PC (e.g., compare taken_branch->branch.pc to e.pc) (or another stable identifier field used for matching) instead of using operator== so that resolved differences do not affect hit detection for pred_branch_hit.
🧹 Nitpick comments (1)
src/cpu/pred/btb/mbtb.cc (1)
871-874: Unused `predicted_hits` set.
The `unordered_set<Addr>` is populated but never queried. Consider removing it or using it for the lookup instead of `std::find_if`.
♻️ Option 1: Remove unused variable
```diff
 std::vector<MBTB::ResolveTrainUpdate>
 MBTB::prepareResolveTrainEntries(const ResolvedTrainPacket &packet, const BTBMeta *meta)
 {
     std::vector<ResolveTrainUpdate> updates;
-    std::unordered_set<Addr> predicted_hits;
-    for (const auto &entry : meta->hit_entries) {
-        predicted_hits.insert(entry.pc);
-    }
     for (const auto &resolved : packet.realBranches) {
```
♻️ Option 2: Use set for O(1) lookup
```diff
+    std::unordered_map<Addr, const BTBEntry*> pc_to_entry;
+    for (const auto &entry : meta->hit_entries) {
+        pc_to_entry[entry.pc] = &entry;
+    }
-    std::unordered_set<Addr> predicted_hits;
-    for (const auto &entry : meta->hit_entries) {
-        predicted_hits.insert(entry.pc);
-    }
     for (const auto &resolved : packet.realBranches) {
-        auto hit_it = std::find_if(meta->hit_entries.begin(), meta->hit_entries.end(),
-            [&resolved](const BTBEntry &entry) {
-                return entry.pc == resolved.branch.pc;
-            });
-        if (hit_it != meta->hit_entries.end()) {
-            updates.push_back({*hit_it, resolved});
+        auto it = pc_to_entry.find(resolved.branch.pc);
+        if (it != pc_to_entry.end()) {
+            updates.push_back({*(it->second), resolved});
             continue;
         }
```
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/mbtb.cc` around lines 871 - 874, The populated but unused unordered_set predicted_hits should be used for O(1) lookups instead of the current std::find_if; replace the linear search over meta->hit_entries with a membership test against predicted_hits (e.g., predicted_hits.count(some_pc) or predicted_hits.contains(some_pc)) wherever the code uses find_if to check if an entry's pc is present, and ensure predicted_hits is built from meta->hit_entries (in the same scope where std::find_if is currently used) and remove the old find_if-based branch if present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1518-1521: The check uses unsigned arithmetic `resolveQueueSize -
4` which underflows when resolveQueueSize is 1..3; update the condition in
fetch.cc so it computes a safe threshold before comparing (e.g., compute
threshold = (resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) or use std::max
with 4) and then test `resolveQueue.size() > threshold`; keep the existing
updates to fetchStats.resolveQueueFullEvents and
fetchStats.resolveEnqueueFailEvent (using enqueueSize) unchanged.
- Around line 1887-1892: The FTQ id is incorrectly hardcoded to thread 0 which
mismatches the generation/offset derived from dbpbtb->ftqFetchingTarget(tid);
update the code that calls instruction->setFtqId(dbpbtb->ftqHeadId(0)) to use
the current thread id (dbpbtb->ftqHeadId(tid)) so the tuple (ftqId, generation,
ftqOffset) produced by instruction->setFtqId,
instruction->setFtqGeneration(fetch_target.generation) and
instruction->setFtqOffset(...) is consistent with fetch_target and tid (see
fetch_target, this_pc.instAddr(), instShiftAmt).
In `@src/cpu/pred/btb/mbtb.cc`:
- Around line 508-514: The loop computing pred_branch_hit wrongly uses full
struct equality between taken_branch->branch and entries in meta->hit_entries
(which includes the resolved flag), causing false misses; update the comparison
in the loop to compare only the branch PC (e.g., compare taken_branch->branch.pc
to e.pc) (or another stable identifier field used for matching) instead of using
operator== so that resolved differences do not affect hit detection for
pred_branch_hit.
---
Nitpick comments:
In `@src/cpu/pred/btb/mbtb.cc`:
- Around line 871-874: The populated but unused unordered_set predicted_hits
should be used for O(1) lookups instead of the current std::find_if; replace the
linear search over meta->hit_entries with a membership test against
predicted_hits (e.g., predicted_hits.count(some_pc) or
predicted_hits.contains(some_pc)) wherever the code uses find_if to check if an
entry's pc is present, and ensure predicted_hits is built from meta->hit_entries
(in the same scope where std::find_if is currently used) and remove the old
find_if-based branch if present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9200612-317b-45b7-957b-a08aa0171b88
📒 Files selected for processing (11)
- src/cpu/o3/fetch.cc
- src/cpu/o3/fetch.hh
- src/cpu/pred/btb/btb_ittage.cc
- src/cpu/pred/btb/btb_ittage.hh
- src/cpu/pred/btb/btb_tage.cc
- src/cpu/pred/btb/btb_tage.hh
- src/cpu/pred/btb/common.hh
- src/cpu/pred/btb/decoupled_bpred.cc
- src/cpu/pred/btb/mbtb.cc
- src/cpu/pred/btb/mbtb.hh
- src/cpu/pred/btb/timed_base_pred.hh
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cpu/pred/btb/timed_base_pred.hh
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpu/pred/btb/test/btb_tage.test.cc`:
- Around line 408-438: The lookup helpers (findTableWithEntry and
findTablesWithEntry) currently treat any valid entry with matching pc as a hit,
but because indices are history-folded you must also verify the tag for the
current startPC/history; compute the expected tag for table t (e.g. via
tage->getTageTag(startPC, t, meta->tagFoldedHist[t].get()) or the equivalent
tag-fold API on BTBTAGE) and require entry.tag == expectedTag in the
if-condition in both functions (i.e., change the check from entry.valid &&
entry.pc == branchPC to entry.valid && entry.pc == branchPC && entry.tag ==
expectedTag) so the helpers only count true context-matching entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 63ec0b7f-6913-4faf-9f22-97615eea7c19
📒 Files selected for processing (4)
- src/cpu/o3/fetch.cc
- src/cpu/pred/btb/btb_tage.cc
- src/cpu/pred/btb/btb_tage.hh
- src/cpu/pred/btb/test/btb_tage.test.cc
✅ Files skipped from review due to trivial changes (1)
- src/cpu/pred/btb/btb_tage.cc
```cpp
int findTableWithEntry(BTBTAGE* tage, Addr startPC, Addr branchPC,
                       const std::shared_ptr<BTBTAGE::TageMeta>& meta) {
    for (int t = 0; t < tage->numPredictors; t++) {
        Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
        for (unsigned way = 0; way < tage->numWays[t]; way++) {
            auto &entry = tage->tageTable[t][index][way];
            if (entry.valid && entry.pc == branchPC) {
                return t;
            }
        }
    }
    return -1;
}

std::vector<int> findTablesWithEntry(
    BTBTAGE* tage, Addr startPC, Addr branchPC,
    const std::shared_ptr<BTBTAGE::TageMeta>& meta)
{
    std::vector<int> tables;
    for (int t = 0; t < tage->numPredictors; t++) {
        Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
        for (unsigned way = 0; way < tage->numWays[t]; way++) {
            auto &entry = tage->tageTable[t][index][way];
            if (entry.valid && entry.pc == branchPC) {
                tables.push_back(t);
                break;
            }
        }
    }
    return tables;
}
```
Match the expected TAGE tag in these lookup helpers.
These helpers currently treat any valid && entry.pc == branchPC way in the indexed set as a hit. Because the set index is history-folded, an older context for the same branch can alias into the same set with a different tag, so the parity check can pass even when the current context would not actually hit that entry. Please include the expected tag for the current startPC/history/position before counting a table as present, and mirror that in both helpers.
🔎 Suggested tightening

```diff
 int findTableWithEntry(BTBTAGE* tage, Addr startPC, Addr branchPC,
                        const std::shared_ptr<BTBTAGE::TageMeta>& meta) {
+    unsigned position = tage->getBranchIndexInBlock(branchPC, startPC);
     for (int t = 0; t < tage->numPredictors; t++) {
         Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
+        Addr tag = tage->getTageTag(startPC, t, meta->tagFoldedHist[t].get(),
+                                    meta->altTagFoldedHist[t].get(), position);
         for (unsigned way = 0; way < tage->numWays[t]; way++) {
             auto &entry = tage->tageTable[t][index][way];
-            if (entry.valid && entry.pc == branchPC) {
+            if (entry.valid && entry.pc == branchPC && entry.tag == tag) {
                 return t;
             }
         }
     }
     return -1;
 }
```
📝 Committable suggestion
```cpp
int findTableWithEntry(BTBTAGE* tage, Addr startPC, Addr branchPC,
                       const std::shared_ptr<BTBTAGE::TageMeta>& meta) {
    unsigned position = tage->getBranchIndexInBlock(branchPC, startPC);
    for (int t = 0; t < tage->numPredictors; t++) {
        Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
        Addr tag = tage->getTageTag(startPC, t, meta->tagFoldedHist[t].get(),
                                    meta->altTagFoldedHist[t].get(), position);
        for (unsigned way = 0; way < tage->numWays[t]; way++) {
            auto &entry = tage->tageTable[t][index][way];
            if (entry.valid && entry.pc == branchPC && entry.tag == tag) {
                return t;
            }
        }
    }
    return -1;
}

std::vector<int> findTablesWithEntry(
    BTBTAGE* tage, Addr startPC, Addr branchPC,
    const std::shared_ptr<BTBTAGE::TageMeta>& meta)
{
    std::vector<int> tables;
    for (int t = 0; t < tage->numPredictors; t++) {
        Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
        for (unsigned way = 0; way < tage->numWays[t]; way++) {
            auto &entry = tage->tageTable[t][index][way];
            if (entry.valid && entry.pc == branchPC) {
                tables.push_back(t);
                break;
            }
        }
    }
    return tables;
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/pred/btb/test/btb_tage.test.cc` around lines 408 - 438, The lookup
helpers (findTableWithEntry and findTablesWithEntry) currently treat any valid
entry with matching pc as a hit, but because indices are history-folded you must
also verify the tag for the current startPC/history; compute the expected tag
for table t (e.g. via tage->getTageTag(startPC, t, meta->tagFoldedHist[t].get())
or the equivalent tag-fold API on BTBTAGE) and require entry.tag == expectedTag
in the if-condition in both functions (i.e., change the check from entry.valid
&& entry.pc == branchPC to entry.valid && entry.pc == branchPC && entry.tag ==
expectedTag) so the helpers only count true context-matching entries.
♻️ Duplicate comments (3)
src/cpu/o3/fetch.cc (1)
1528-1536: ⚠️ Potential issue | 🟡 Minor: Clamp the legacy queue headroom calculation to prevent underflow.
When `resolveQueueSize` is 1, 2, or 3, the expression `resolveQueueSize - 4` wraps to a very large unsigned value, causing the queue-full check to never trigger. This allows the legacy resolve queue to grow unbounded in small configurations.
🛠️ Suggested fix
```diff
-    if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4) {
+    const size_t headroom = std::min<size_t>(resolveQueueSize, 4);
+    if (resolveQueueSize && resolveQueue.size() + headroom > resolveQueueSize) {
         fetchStats.resolveQueueFullEvents++;
         fetchStats.resolveEnqueueFailEvent += enqueueSize;
     } else {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/o3/fetch.cc` around lines 1528 - 1536, The queue-full check in enableLegacyResolveUpdate can underflow because it computes resolveQueueSize - 4 as an unsigned value; change the logic in the block using resolveQueueSize, resolveQueue and enqueueSize (inside enableLegacyResolveUpdate) to first compute a safe threshold (e.g., threshold = resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) or explicitly handle resolveQueueSize <= 4, then compare resolveQueue.size() > threshold before incrementing fetchStats.resolveQueueFullEvents and fetchStats.resolveEnqueueFailEvent so small resolveQueueSize values do not wrap and allow unbounded growth.

src/cpu/pred/btb/decoupled_bpred.cc (2)
645-663: ⚠️ Potential issue | 🟠 Major: Skip `microtage` in the legacy resolve-update path when `enableFullResolveTrain` is active.
`microtage` is also a `BTBTAGE` instance. If `BTBTAGE::resolveTrain()` handles packet-trained conditionals, leaving `microtage` on the legacy resolve-update path means it gets trained twice when `enableFullResolveTrain` is enabled.
🛠️ Suggested fix
```diff
 if (enableFullResolveTrain &&
     (components[i] == mbtb || components[i] == tage ||
-     components[i] == ittage)) {
+     components[i] == ittage || components[i] == microtage)) {
     continue;
 }
```
Apply the same change to the second loop (around lines 659-662).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 645 - 663, The second-phase loop that applies resolved updates skips mbtb, tage, and ittage when enableFullResolveTrain is true but misses microtage, causing double-training; inside the for-loop that checks components[i]->getResolvedUpdate(), extend the existing conditional (which currently checks components[i] == mbtb || components[i] == tage || components[i] == ittage) to also include components[i] == microtage so microtage is skipped in the legacy resolve-update path when enableFullResolveTrain is active.
24-65: ⚠️ Potential issue | 🟠 Major: Add misprediction as a terminal condition in packet validation.
The validation loop tracks `seenTaken` to reject packets with branches after a taken branch, but doesn't track mispredictions. When a branch mispredicts (even if not taken), `IEW::SquashCheckAfterExe()` stops recording younger control instructions. Allowing branches after a mispredicted branch could lead to training on wrong-path state.
DecoupledBPUWithBTB::validateResolvedTrainPacket(const ResolvedTrainPacket &packet) { uint8_t lastOffset = 0; Addr lastPc = 0; bool firstBranch = true; bool seenTaken = false; + bool seenMispredict = false; for (const auto &resolved : packet.realBranches) { + if (seenTaken || seenMispredict) { + dbpBtbStats.fullResolveTrainValidationAfterTaken++; + return ResolveTrainValidationReason::AfterTaken; + } + if (resolved.branch.pc < packet.startPC) { dbpBtbStats.fullResolveTrainValidationPcBeforeStart++; return ResolveTrainValidationReason::PcBeforeStart; } if (resolved.branch.size == 0) { dbpBtbStats.fullResolveTrainValidationZeroSize++; return ResolveTrainValidationReason::ZeroSize; } - if (seenTaken) { - dbpBtbStats.fullResolveTrainValidationAfterTaken++; - return ResolveTrainValidationReason::AfterTaken; - } - if (!firstBranch) { // ... offset/pc ordering checks ... } lastOffset = resolved.ftqOffset; lastPc = resolved.branch.pc; seenTaken = resolved.taken; + seenMispredict = resolved.mispredict; firstBranch = false; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 24 - 65, The loop in validateResolvedTrainPacket (iterating packet.realBranches) treats any branch after a taken branch as terminal via seenTaken but ignores mispredicted branches; add a check similar to the seenTaken handling to treat a branch that is mispredicted as terminal: detect the per-branch misprediction flag on resolved (e.g., resolved.mispredicted or resolved.mispredict), increment a new statistic (e.g., dbpBtbStats.fullResolveTrainValidationAfterMispredict++), and return a new validation reason (e.g., ResolveTrainValidationReason::AfterMispredict); also add the new enum value to ResolveTrainValidationReason and the corresponding stat field to dbpBtbStats so the code compiles and the condition is handled before accepting the packet.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1528-1536: The queue-full check in enableLegacyResolveUpdate can
underflow because it computes resolveQueueSize - 4 as an unsigned value; change
the logic in the block using resolveQueueSize, resolveQueue and enqueueSize
(inside enableLegacyResolveUpdate) to first compute a safe threshold (e.g.,
threshold = resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) or explicitly
handle resolveQueueSize <= 4, then compare resolveQueue.size() > threshold
before incrementing fetchStats.resolveQueueFullEvents and
fetchStats.resolveEnqueueFailEvent so small resolveQueueSize values do not wrap
and allow unbounded growth.
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 645-663: The second-phase loop that applies resolved updates skips
mbtb, tage, and ittage when enableFullResolveTrain is true but misses microtage,
causing double-training; inside the for-loop that checks
components[i]->getResolvedUpdate(), extend the existing conditional (which
currently checks components[i] == mbtb || components[i] == tage || components[i]
== ittage) to also include components[i] == microtage so microtage is skipped in
the legacy resolve-update path when enableFullResolveTrain is active.
- Around line 24-65: The loop in validateResolvedTrainPacket (iterating
packet.realBranches) treats any branch after a taken branch as terminal via
seenTaken but ignores mispredicted branches; add a check similar to the
seenTaken handling to treat a branch that is mispredicted as terminal: detect
the per-branch misprediction flag on resolved (e.g., resolved.mispredicted or
resolved.mispredict), increment a new statistic (e.g.,
dbpBtbStats.fullResolveTrainValidationAfterMispredict++), and return a new
validation reason (e.g., ResolveTrainValidationReason::AfterMispredict); also
add the new enum value to ResolveTrainValidationReason and the corresponding
stat field to dbpBtbStats so the code compiles and the condition is handled
before accepting the packet.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5a04be19-a090-4a15-87a9-f2a20ad1c152
📒 Files selected for processing (4)
- src/cpu/o3/fetch.cc
- src/cpu/pred/btb/decoupled_bpred.cc
- src/cpu/pred/btb/decoupled_bpred.hh
- src/cpu/pred/btb/decoupled_bpred_stats.cc
🚀 Performance test triggered: gcc12-spec06-0.8c
Force-pushed: a23ce3f to dd99cb2 (Compare)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
```cpp
void
Fetch::handleIEWSignals()
{
    if (lastIewSignalHandleTick == curTick()) {
```
Is this necessary?
Force-pushed: ca14ae6 to 6388b8d (Compare)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
Force-pushed: 6388b8d to 08425fc (Compare)
🚀 Performance test triggered: gcc12-spec06-0.8c
Change-Id: Id3ddfead4fd87cf04c650f368f6cfb75ff19b81b
Move MBTB, BTBTAGE, and BTBITTAGE to packet-based resolve training so they consume execution truth directly while preserving legacy fallback for non-migrated components. Change-Id: I4682438d18796127542b9c9c6a23efcb1415a413
Enable packet-based resolve training by default while keeping legacy resolve updates enabled as fallback until the remaining components are migrated. Change-Id: I124f9657b46ad9e19bb0bf81f3d3ec85592d25ab
Add a review guide for the full resolve train branch covering the new dataflow, migrated components, default behavior, and the main files and invariants reviewers should check. Change-Id: Idfa11844ec1b46f59339e589bfd88c0d17d3e47f
Change-Id: I736783b6b3bb469c7a02d060e9317ee24879d31a
Change-Id: Icda8729682a3d1112280dbc1a3b120e922bc830b
Change-Id: I2ca72fa9dadeca3ed0b369ba2ce21c7664768753
Collapse resolved-stage BTB training onto the packet-based full resolve path, remove the old resolvedCFI/resolveQueue helper chain, and reinterpret config intent around full-resolve participation versus commit fallback. Change-Id: I777f46178bc7b1ce891bb2242cefff0b166e48a2
Change-Id: Icbf82e480f0fec8c1119c36851c9f6367835c745
Change-Id: I371e8e306f55cb7eca8991e77db53f8825d10874
Change-Id: If72220a598a23ca2aad8fe50a390b80fdce11a7b
Change-Id: I16d6b8d495ef16e9e20f4b5ca5a8262484674ae6
Change-Id: I6f1de90ea368558913dc913593ac0f193f54976b
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
[Generated by GEM5 Performance Robot] Ideal BTB Performance: Overall Score
It does not appear that disabling the resolve train causes any significant performance degradation. However, this PR still introduces a substantial increase in source code size, even after cleanup and simplification, mainly due to two features: the packet rebuild in fetch and the split resolve-train interface in the affected predictors, which no longer reuses

I now want to run the 0.8 coverage test with the resolve train enabled to determine whether there is any performance issue in the new resolve-train path. https://github.com/OpenXiangShan/GEM5/actions/runs/24176533553
🚀 Performance test triggered: gcc12-spec06-0.8c
[Generated by GEM5 Performance Robot] Ideal BTB Performance: Overall Score

[Generated by GEM5 Performance Robot] Ideal BTB Performance: Overall Score
We never evaluated the resolve train on the ideal config before. Although this resolve-train implementation shows about a 0.2 score decrease, it is still better than the legacy one (a 0.3 score decrease). Ref: #823 (comment)

Besides, there is also about a 0.2 score increase on the kmhv3 config (16.9 → 17.1).
Draft.
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
Documentation