Full resolve train#810

Open
Yakkhini wants to merge 14 commits into xs-dev from
full-resolve-train

Conversation

@Yakkhini
Collaborator

@Yakkhini Yakkhini commented Apr 2, 2026

Draft.

Summary by CodeRabbit

  • New Features

    • Full packet-based resolve-train feedback with runtime toggles, new predictor hooks, per-instruction metadata, and additional statistics.
    • Per-fetch-target generation/identity tracking to validate training against FTQ entries.
  • Refactor

    • Dual-path resolve handling supporting legacy resolved-update and full resolve-train workflows; predictor interfaces extended to accept resolved-train packets.
  • Bug Fixes

    • Prevent IEW signal double-processing per tick and ensure unmatched packets are cleaned up.
  • Tests

    • New unit tests covering resolve-train behavior and bank-conflict scenarios.
  • Documentation

    • Added a frontend full resolve-train review guide.

@coderabbitai

coderabbitai bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds packet-based full resolve-train plumbing: FTQ identity/generation, IEW→Fetch resolve-train entries, Fetch ingestion/validation/packetization and stats, predictor-side canResolveTrain/resolveTrain hooks, and runtime flags to roll out or retain the legacy PC-only resolve-update path.

Changes

  • CPU Config & Predictor Params (src/cpu/o3/BaseO3CPU.py, src/cpu/pred/BranchPredictor.py)
    Added enableFullResolveTrain and enableLegacyResolveUpdate boolean params to control rollout of full vs legacy resolve paths.
  • IEW→Fetch Comm & DynInst (src/cpu/o3/comm.hh, src/cpu/o3/dyn_inst.hh, src/cpu/o3/dyn_inst.cc)
    Added ResolveTrainEntry to IewComm and vector resolveTrainEntries; added ftqGeneration/ftqOffset fields and accessors in DynInst; constructor init updated.
  • IEW Logic (src/cpu/o3/iew.cc)
    Conditionally emit legacy resolvedCFIs and/or new ResolveTrainEntry records based on flags; clear resolveTrainEntries per tick; gate redirect/squash on control_mispredict.
  • Fetch: queueing, packetization, stats (src/cpu/o3/fetch.hh, src/cpu/o3/fetch.cc)
    New resolve-train types, resolveTrainQueue FIFO, per-tick IEW gating, ingestion/merge/filter/packet-builder helpers, stats, and per-inst FTQ generation/offset setting.
  • FTQ identity & APIs (src/cpu/pred/btb/common.hh, src/cpu/pred/btb/ftq.hh, src/cpu/pred/btb/ftq.cc)
    Added FetchTargetIdentity, ResolvedBranch, ResolvedTrainPacket; FetchTarget.generation; FTQ per-thread nextGeneration, getters and identity-match helpers.
  • Decoupled BPU integration (src/cpu/pred/btb/decoupled_bpred.hh, src/cpu/pred/btb/decoupled_bpred.cc)
    Added enableFullResolveTrain flag, FTQ accessors, resolveTrain(packet, tid) entry with validation/fan-out to components; gated legacy resolve-update when full-train enabled.
  • Predictor base hooks (src/cpu/pred/btb/timed_base_pred.hh)
    Added virtual canResolveTrain() (default true) and resolveTrain() (no-op) hooks.
  • TAGE predictor (src/cpu/pred/btb/btb_tage.hh, src/cpu/pred/btb/btb_tage.cc)
    Stored BTBEntry snapshots in TageMeta; added canResolveTrain bank-conflict gating and resolveTrain logic (recompute preds, updates, allocation); changed update API to accept explicit mispredict flag.
  • ITTAGE predictor (src/cpu/pred/btb/btb_ittage.hh, src/cpu/pred/btb/btb_ittage.cc)
    Added canResolveTrain/resolveTrain overrides and resolved-train update iteration; tightened alt-table update condition.
  • MBTB predictor (src/cpu/pred/btb/mbtb.hh, src/cpu/pred/btb/mbtb.cc)
    Added resolved-train prediction-hit checker, update overloads, prepareResolveTrainEntries, and canResolveTrain/resolveTrain plumbing.
  • Decoupled BPU Stats (src/cpu/pred/btb/decoupled_bpred_stats.cc)
    Added full-resolve-train acceptance/rejection and validation-subreason counters.
  • Tests & Docs (src/cpu/pred/btb/test/btb_tage.test.cc, docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md)
    Added unit tests covering resolve-train bank conflicts and selection; added review guide documenting design, rollout flags, and review map.

Sequence Diagram

sequenceDiagram
    participant IEW as IEW (CPU)
    participant Fetch as Fetch
    participant BPU as DecoupledBPU
    participant FTQ as FTQ
    participant COMP as PredictorComponent

    Note over IEW,Fetch: IEW emits per-thread ResolveTrainEntry list
    IEW->>Fetch: push resolveTrainEntries
    Fetch->>Fetch: handleIEWSignals() (per-tick gate, validate FTQ id/generation, merge/filter)
    Fetch->>BPU: resolveTrain(ResolvedTrainPacket)
    BPU->>FTQ: ftqMatchTargetIdentity / ftqTargetGeneration
    loop per-component
        BPU->>COMP: canResolveTrain(packet, target)?
        alt defer (false)
            COMP-->>BPU: false
        else accept (true)
            BPU->>COMP: resolveTrain(packet, target)
            COMP->>COMP: reconstruct preds, update, allocate
            COMP-->>BPU: done
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

perf, align-kmhv3

Suggested reviewers

  • jensen-yan
  • CJ362ff

Poem

🐇 I hop with packets, FTQ tags held tight,
From IEW to Fetch I carry truth and light.
I stitch the branches, merge the train so neat,
Predictors wake and counters meet,
Hooray — new training hops complete! 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 18.42%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title 'Full resolve train' clearly and specifically describes the main architectural feature being added throughout the changeset: a complete resolve-train packet-based training path for branch predictors.


@github-actions

github-actions bot commented Apr 2, 2026

🚀 Coremark Smoke Test Results

Branch          IPC      Change
Base (xs-dev)   2.2665   -
This PR         2.1686   📉 -0.0979 (-4.32%)

✅ Difftest smoke test passed!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (2)
src/cpu/o3/dyn_inst.hh (1)

421-424: Default-initialize the new FTQ metadata fields.

IEW::SquashCheckAfterExe() now reads these on every control instruction. If any DynInst constructor or clone path misses the new setters, resolve training will carry garbage FTQ identity. Please verify the out-of-line constructors, or make the class robust by initializing them here.

🛡️ Proposed change
-    uint64_t ftqGeneration;
+    uint64_t ftqGeneration = 0;
     /** Halfword offset of the instruction within its FTQ target block. */
-    uint8_t ftqOffset;
+    uint8_t ftqOffset = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/dyn_inst.hh` around lines 421 - 424, DynInst's new members
ftqGeneration and ftqOffset aren't default-initialized, causing
IEW::SquashCheckAfterExe() to read garbage when constructors or clone paths omit
setting them; fix by default-initializing these members in the DynInst
declaration (e.g., ftqGeneration = 0, ftqOffset = 0) and audit all out-of-line
constructors and clone/cloneInto methods (any copy/assignment paths that create
new DynInst instances) to ensure they either preserve or explicitly set
ftqGeneration and ftqOffset so resolve training won't pick up uninitialized FTQ
identity.
src/cpu/pred/btb/btb_ittage.cc (1)

399-559: Extract the common ITTAGE training core.

resolveTrain() is now effectively a second copy of update(). Alt-provider handling, useful-bit policy, and allocation rules will drift quickly unless both paths share the same helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/btb_ittage.cc` around lines 399 - 559, resolveTrain()
duplicates the core ITTAGE training logic found in update(), risking drift in
alt-provider handling, useful-bit policy, and allocation rules; extract the
shared logic into a single helper (e.g., trainCore or updateTrainingCore) that
encapsulates alt-info handling, useful-bit/reset policy, allocation decision &
entry creation, and counter/target updates, and have both resolveTrain() and
update() call that helper passing the minimal required data (predMeta/TageMeta
fields like preds, tagFoldedHist, altTagFoldedHist, indexFoldedHist, usefulMask,
and the resolved branch info such as exe_target, mispred, mainInfo/altInfo,
packet.startPC, etc.); update references to symbols like
BTBITTAGE::resolveTrain, BTBITTAGE::update, TageMeta, pred.mainInfo/altInfo,
usefulResetCnt, and tageTable when moving code so behavior and stats
(ittageStats.*) remain identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1903-1907: The instruction's FTQ id is currently pinned to thread
0 while generation/offset are taken from fetch_target =
dbpbtb->ftqFetchingTarget(tid), causing mixed (ftqId, generation, ftqOffset)
tuples for non-zero tids; update the code to set the instruction's FTQ id from
the same fetch_target (e.g., use fetch_target.ftqId or
dbpbtb->ftqFetchingTarget(tid).ftqId) before calling
instruction->setFtqGeneration(...) and instruction->setFtqOffset(...), ensuring
the FTQ id, generation, and offset all come from the same thread's fetch_target.

In `@src/cpu/o3/iew.cc`:
- Around line 1559-1562: The code currently clears resolvedCFIs and
resolveTrainEntries only for the first active thread by using ThreadID tid =
*activeThreads->begin(); instead iterate over all ThreadID entries in
activeThreads and clear toFetch->iewInfo[tid].resolvedCFIs and
toFetch->iewInfo[tid].resolveTrainEntries for each thread; update the block that
references activeThreads, toFetch->iewInfo, resolvedCFIs and resolveTrainEntries
so every active SMT thread is cleared rather than just the first.

In `@src/cpu/pred/BranchPredictor.py`:
- Around line 1209-1212: The new Python config flag enableLegacyResolveUpdate is
declared in BranchPredictor.py but not wired into the C++ predictor backend;
update the DecoupledBPUWithBTB implementation to read and store this parameter
just like enableFullResolveTrain is currently handled: add a corresponding
member (e.g., bool enableLegacyResolveUpdate_) to the DecoupledBPUWithBTB class,
initialize it from the Python-config value in the constructor (mirror how
enableFullResolveTrain is read/initialized), and use that member in any legacy
resolve update code paths so the config flag actually controls behavior at
runtime. Ensure constructors, member declarations, and any initialization
lists/reference sites (where enableFullResolveTrain is used) are updated
consistently.

In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 796-809: The code only regenerates a TAGE prediction when
updateOnRead is true, which skips recomputing when has_original_pred is false
and can prevent correct updates/allocations; change the recomputed assignment to
call generateSinglePrediction whenever has_original_pred is false (i.e.,
recomputed = !has_original_pred ? generateSinglePrediction(btb_entry,
packet.startPC, predMeta) : (updateOnRead ? generateSinglePrediction(...) :
original_pred)), and ensure the hasRecomputedVsOriginalDiff comparison is
guarded by has_original_pred so you don't compare recomputed vs original_pred
when original is absent; keep the call to
updatePredictorStateAndCheckAllocation(btb_entry, actual_taken, recomputed,
resolved.mispredict) as-is so allocations happen with the regenerated
prediction.

In `@src/cpu/pred/btb/common.hh`:
- Around line 381-384: The default constructor FetchTarget() leaves the member
tid uninitialized; update the FetchTarget() initializer list to explicitly
initialize tid (e.g., tid(0)) alongside generation, startPC, and predTaken so
default-constructed FetchTarget instances have a defined thread id; modify the
FetchTarget() initializer list in class/struct FetchTarget to include tid
initialization.

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 661-665: The resolve-update loop currently skips training for
mbtb, tage, and ittage when enableFullResolveTrain is set, but misses microtage,
causing BTBTAGE-derived instances (microtage) to be trained twice; update the
conditional in the block that checks components[i]->getResolvedUpdate() to also
include components[i] == microtage (and do the same for the similar block around
the other occurrence at the 676-679 range) so that microtage is skipped from the
legacy resolve-update path when enableFullResolveTrain is enabled, ensuring
BTBTAGE::resolveTrain only performs the packet-trained updates once per
instance.

In `@src/cpu/pred/btb/mbtb.cc`:
- Around line 509-520: The resolve-train hit check incorrectly uses full struct
equality between taken_branch->branch and entries in meta->hit_entries, which
fails when the packet-side sets ResolvedBranch::branch.resolved; update the
check in mbtb.cc (the loop over meta->hit_entries that sets pred_branch_hit) to
ignore the resolved flag — e.g., compare only the relevant prediction fields
(such as branch.pc and other prediction-identifying fields) or mask/clear the
resolved bit on either side before equality — so that differing resolved state
does not produce spurious updateMiss increments.

---

Nitpick comments:
In `@src/cpu/o3/dyn_inst.hh`:
- Around line 421-424: DynInst's new members ftqGeneration and ftqOffset aren't
default-initialized, causing IEW::SquashCheckAfterExe() to read garbage when
constructors or clone paths omit setting them; fix by default-initializing these
members in the DynInst declaration (e.g., ftqGeneration = 0, ftqOffset = 0) and
audit all out-of-line constructors and clone/cloneInto methods (any
copy/assignment paths that create new DynInst instances) to ensure they either
preserve or explicitly set ftqGeneration and ftqOffset so resolve training won't
pick up uninitialized FTQ identity.

In `@src/cpu/pred/btb/btb_ittage.cc`:
- Around line 399-559: resolveTrain() duplicates the core ITTAGE training logic
found in update(), risking drift in alt-provider handling, useful-bit policy,
and allocation rules; extract the shared logic into a single helper (e.g.,
trainCore or updateTrainingCore) that encapsulates alt-info handling,
useful-bit/reset policy, allocation decision & entry creation, and
counter/target updates, and have both resolveTrain() and update() call that
helper passing the minimal required data (predMeta/TageMeta fields like preds,
tagFoldedHist, altTagFoldedHist, indexFoldedHist, usefulMask, and the resolved
branch info such as exe_target, mispred, mainInfo/altInfo, packet.startPC,
etc.); update references to symbols like BTBITTAGE::resolveTrain,
BTBITTAGE::update, TageMeta, pred.mainInfo/altInfo, usefulResetCnt, and
tageTable when moving code so behavior and stats (ittageStats.*) remain
identical.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 606287e0-bdad-4b56-bb5c-89e172bf98ac

📥 Commits

Reviewing files that changed from the base of the PR and between 4b6b1dc and 1922023.

📒 Files selected for processing (21)
  • src/cpu/o3/BaseO3CPU.py
  • src/cpu/o3/comm.hh
  • src/cpu/o3/dyn_inst.cc
  • src/cpu/o3/dyn_inst.hh
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
  • src/cpu/o3/iew.cc
  • src/cpu/pred/BranchPredictor.py
  • src/cpu/pred/btb/btb_ittage.cc
  • src/cpu/pred/btb/btb_ittage.hh
  • src/cpu/pred/btb/btb_tage.cc
  • src/cpu/pred/btb/btb_tage.hh
  • src/cpu/pred/btb/common.hh
  • src/cpu/pred/btb/decoupled_bpred.cc
  • src/cpu/pred/btb/decoupled_bpred.hh
  • src/cpu/pred/btb/ftq.cc
  • src/cpu/pred/btb/ftq.hh
  • src/cpu/pred/btb/mbtb.cc
  • src/cpu/pred/btb/mbtb.hh
  • src/cpu/pred/btb/test/btb_tage.test.cc
  • src/cpu/pred/btb/timed_base_pred.hh

Comment on lines +1559 to +1562
// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
ThreadID tid = *activeThreads->begin();
toFetch->iewInfo[tid].resolvedCFIs.clear();
toFetch->iewInfo[tid].resolveTrainEntries.clear();
⚠️ Potential issue | 🟠 Major

Clear the resolve queues for every thread, not just the first active one.

Line 1560 still picks only *activeThreads->begin(). On SMT runs, stale resolvedCFIs / resolveTrainEntries from the other threads can leak into the next cycle and train against the wrong FTQ state.

🔧 Proposed fix
-    ThreadID tid = *activeThreads->begin();
-    toFetch->iewInfo[tid].resolvedCFIs.clear();
-    toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        toFetch->iewInfo[tid].resolvedCFIs.clear();
+        toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
-ThreadID tid = *activeThreads->begin();
-toFetch->iewInfo[tid].resolvedCFIs.clear();
-toFetch->iewInfo[tid].resolveTrainEntries.clear();
+// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
+for (ThreadID tid = 0; tid < numThreads; ++tid) {
+    toFetch->iewInfo[tid].resolvedCFIs.clear();
+    toFetch->iewInfo[tid].resolveTrainEntries.clear();
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 1559 - 1562, The code currently clears
resolvedCFIs and resolveTrainEntries only for the first active thread by using
ThreadID tid = *activeThreads->begin(); instead iterate over all ThreadID
entries in activeThreads and clear toFetch->iewInfo[tid].resolvedCFIs and
toFetch->iewInfo[tid].resolveTrainEntries for each thread; update the block that
references activeThreads, toFetch->iewInfo, resolvedCFIs and resolveTrainEntries
so every active SMT thread is cleared rather than just the first.

Comment on lines +1209 to +1212
enableFullResolveTrain = Param.Bool(Parent.enableFullResolveTrain,
    "Enable packet-based resolve training rollout plumbing")
enableLegacyResolveUpdate = Param.Bool(Parent.enableLegacyResolveUpdate,
    "Enable legacy PC-only resolve update")

⚠️ Potential issue | 🟠 Major

enableLegacyResolveUpdate is exposed here but not wired in the C++ predictor.

Right now this new parameter appears to be a no-op on the DecoupledBPUWithBTB backend (only enableFullResolveTrain is declared/initialized there), which can silently invalidate config-driven experiments.

Suggested backend wiring update
--- a/src/cpu/pred/btb/decoupled_bpred.hh
+++ b/src/cpu/pred/btb/decoupled_bpred.hh
@@
     const unsigned resolveBlockThreshold;
     const bool enableFullResolveTrain;
+    const bool enableLegacyResolveUpdate;
--- a/src/cpu/pred/btb/decoupled_bpred.cc
+++ b/src/cpu/pred/btb/decoupled_bpred.cc
@@
     resolveBlockThreshold(p.resolveBlockThreshold),
     enableFullResolveTrain(p.enableFullResolveTrain),
+    enableLegacyResolveUpdate(p.enableLegacyResolveUpdate),
     dbpBtbStats(this, p.numStages, p.fsq_size, maxInstsNum)
🧰 Tools
🪛 Ruff (0.15.7)

[error] 1209-1209: Param may be undefined, or defined from star imports

(F405)


[error] 1209-1209: Parent may be undefined, or defined from star imports

(F405)


[error] 1211-1211: Param may be undefined, or defined from star imports

(F405)


[error] 1211-1211: Parent may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/BranchPredictor.py` around lines 1209 - 1212, The new Python
config flag enableLegacyResolveUpdate is declared in BranchPredictor.py but not
wired into the C++ predictor backend; update the DecoupledBPUWithBTB
implementation to read and store this parameter just like enableFullResolveTrain
is currently handled: add a corresponding member (e.g., bool
enableLegacyResolveUpdate_) to the DecoupledBPUWithBTB class, initialize it from
the Python-config value in the constructor (mirror how enableFullResolveTrain is
read/initialized), and use that member in any legacy resolve update code paths
so the config flag actually controls behavior at runtime. Ensure constructors,
member declarations, and any initialization lists/reference sites (where
enableFullResolveTrain is used) are updated consistently.

Comment on lines 381 to 384
     FetchTarget()
-        : startPC(0),
+        : generation(0),
+          startPC(0),
           predTaken(false),

⚠️ Potential issue | 🟡 Minor

Initialize FetchTarget::tid in the default constructor.

tid is left uninitialized while adjacent fields are explicitly initialized. This can surface as undefined behavior if a default-constructed FetchTarget is observed before assignment.

Suggested fix
-   FetchTarget()
-       : generation(0),
+   FetchTarget()
+       : tid(0),
+         generation(0),
          startPC(0),

Suggested change
-FetchTarget()
-    : generation(0),
-      startPC(0),
-      predTaken(false),
+FetchTarget()
+    : tid(0),
+      generation(0),
+      startPC(0),
+      predTaken(false),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/common.hh` around lines 381 - 384, The default constructor
FetchTarget() leaves the member tid uninitialized; update the FetchTarget()
initializer list to explicitly initialize tid (e.g., tid(0)) alongside
generation, startPC, and predTaken so default-constructed FetchTarget instances
have a defined thread id; modify the FetchTarget() initializer list in
class/struct FetchTarget to include tid initialization.

Comment on lines +509 to +520
for (const auto &e : meta->hit_entries) {
    if (taken_branch->branch == e) {
        pred_branch_hit = true;
        break;
    }
}

if (!pred_branch_hit) {
    DPRINTF(BTB, "resolve-train miss detected, pc %#lx\n",
            taken_branch->branch.pc);
    btbStats.updateMiss++;
} else {

⚠️ Potential issue | 🟡 Minor

Ignore the resolved bit in the resolve-train hit check.

The packet side marks ResolvedBranch::branch.resolved = true, while meta->hit_entries are prediction-time BTB entries. A full-struct equality here turns taken hits into updateMiss as soon as that flag differs.

💡 Suggested fix
-        if (taken_branch->branch == e) {
+        if (taken_branch->branch.pc == e.pc &&
+            taken_branch->branch.target == e.target &&
+            taken_branch->branch.isCond == e.isCond &&
+            taken_branch->branch.isIndirect == e.isIndirect &&
+            taken_branch->branch.isDirect == e.isDirect &&
+            taken_branch->branch.isCall == e.isCall &&
+            taken_branch->branch.isReturn == e.isReturn &&
+            taken_branch->branch.size == e.size) {
             pred_branch_hit = true;
             break;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/mbtb.cc` around lines 509 - 520, The resolve-train hit check
incorrectly uses full struct equality between taken_branch->branch and entries
in meta->hit_entries, which fails when the packet-side sets
ResolvedBranch::branch.resolved; update the check in mbtb.cc (the loop over
meta->hit_entries that sets pred_branch_hit) to ignore the resolved flag — e.g.,
compare only the relevant prediction fields (such as branch.pc and other
prediction-identifying fields) or mask/clear the resolved bit on either side
before equality — so that differing resolved state does not produce spurious
updateMiss increments.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
src/cpu/o3/fetch.hh (1)

473-505: Keep the resolve-train queue plumbing private.

ResolveTrainInstData, ResolveTrainQueueEntry, and these four helper methods are internal to Fetch. Leaving them in the public section makes the queue format part of the class API for no external caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.hh` around lines 473 - 505, The ResolveTrainInstData and
ResolveTrainQueueEntry structs and the helper methods makeResolveTrainInstData,
buildResolvedTrainPacket, appendResolveTrainInst, and filterResolveTrainQueue
are internal implementation details and should not be in Fetch's public API;
move these declarations into the private section of the Fetch class (keeping
their names intact) so they are only accessible to Fetch internals, and if any
external code currently relies on them, provide a public accessor or refactor
that usage to avoid exposing the internal queue format.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1526-1534: The current check uses resolveQueue.size() >
resolveQueueSize - 4 which underflows when resolveQueueSize is 1..3; update the
logic in the enableLegacyResolveUpdate block (where resolveQueueSize,
resolveQueue, enqueueSize, and fetchStats are used) to clamp or bound the
subtraction—e.g. compute a safe threshold like const auto threshold =
(resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) and then compare
resolveQueue.size() > threshold (or explicitly handle small resolveQueueSize
values) so the queue-full branch fires correctly and legacy fallback cannot
exceed the configured bound.

In `@src/cpu/o3/iew.cc`:
- Around line 1559-1562: The code only clears resolvedCFIs and
resolveTrainEntries for the first active thread (using ThreadID tid =
*activeThreads->begin()), which lets other SMT thread slots retain stale
entries; update the logic to iterate over all thread IDs (or all entries in
toFetch->iewInfo corresponding to the thread set) and clear both
toFetch->iewInfo[thread].resolvedCFIs and
toFetch->iewInfo[thread].resolveTrainEntries for every thread before execute
starts so no per-thread resolve/train buffers survive into the next tick; use
the existing activeThreads or the keys/indexes of toFetch->iewInfo to locate
each thread slot.

In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 758-809: When predMeta->preds does not contain btb_entry.pc and
updateOnRead is false, recomputed is left as a default TagePrediction which
misrepresents the snapshot; mirror update()'s fallback by regenerating the
prediction from the snapshot in that case: if (!has_original_pred &&
!updateOnRead) set recomputed = generateSinglePrediction(btb_entry,
packet.startPC, predMeta) before comparing taken/results and calling
updatePredictorStateAndCheckAllocation so allocation/counter updates use the
correct reconstructed prediction (use symbols recomputed, original_pred,
updateOnRead, generateSinglePrediction, predMeta->preds, and
updatePredictorStateAndCheckAllocation to locate the change).

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 698-726: The validation checks in resolveUpdate handling (the tid
mismatch, ftq.matchTargetIdentity, startPC mismatch, and
validateResolvedTrainPacket failures) are permanent validation failures and
should be treated as dropped packets rather than retryable bank conflicts;
update the early-exit paths in the code around validateResolvedTrainPacket,
ftq.matchTargetIdentity, and the tid/startPC comparisons so they return true (as
resolveUpdate uses true to indicate stale/missing/consumed packets) instead of
false, ensuring resolveUpdate() signals a drop and does not cause the
dequeue/retry loop (which expects false only for transient canResolveUpdate()
blockers) to spin forever.
- Around line 47-77: The loop currently only treats a taken branch as terminal
(seenTaken) and allows later branches after a mispredicted branch, which can
train wrong-path state; update the validator to treat any resolved.mispredict as
terminal as well. Concretely, change the logic that uses seenTaken to a terminal
flag (e.g., seenTerminal) and set it when resolved.taken || resolved.mispredict
is true; keep the early-return at loop top (if seenTerminal) so any branch after
a mispredict or taken causes the packet to be rejected. Ensure you reference
packet.realBranches and the ResolvedBranch fields (taken and mispredict) and
preserve the existing ordering/ftqOffset and pc checks (firstBranch, lastOffset,
lastPc).

---

Nitpick comments:
In `@src/cpu/o3/fetch.hh`:
- Around line 473-505: The ResolveTrainInstData and ResolveTrainQueueEntry
structs and the helper methods makeResolveTrainInstData,
buildResolvedTrainPacket, appendResolveTrainInst, and filterResolveTrainQueue
are internal implementation details and should not be in Fetch's public API;
move these declarations into the private section of the Fetch class (keeping
their names intact) so they are only accessible to Fetch internals, and if any
external code currently relies on them, provide a public accessor or refactor
that usage to avoid exposing the internal queue format.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e908f2d-a886-48d9-8994-bfbc24ec6476


Comment on lines +1559 to +1562
// Clear resolvedFSQId and resolvedInstPC since they are already handled in frontend
ThreadID tid = *activeThreads->begin();
toFetch->iewInfo[tid].resolvedCFIs.clear();
toFetch->iewInfo[tid].resolveTrainEntries.clear();

⚠️ Potential issue | 🟠 Major

Clear the IEW→Fetch resolve buffers for every thread.

Line 1560 only clears *activeThreads->begin(). On SMT, resolvedCFIs and resolveTrainEntries for the other thread slots survive into the next tick and can be emitted again as stale resolve/train traffic. Clear these vectors for all thread slots before execute starts.

Proposed fix
-    ThreadID tid = *activeThreads->begin();
-    toFetch->iewInfo[tid].resolvedCFIs.clear();
-    toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        toFetch->iewInfo[tid].resolvedCFIs.clear();
+        toFetch->iewInfo[tid].resolveTrainEntries.clear();
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 1559 - 1562, The code only clears
resolvedCFIs and resolveTrainEntries for the first active thread (using ThreadID
tid = *activeThreads->begin()), which lets other SMT thread slots retain stale
entries; update the logic to iterate over all thread IDs (or all entries in
toFetch->iewInfo corresponding to the thread set) and clear both
toFetch->iewInfo[thread].resolvedCFIs and
toFetch->iewInfo[thread].resolveTrainEntries for every thread before execute
starts so no per-thread resolve/train buffers survive into the next tick; use
the existing activeThreads or the keys/indexes of toFetch->iewInfo to locate
each thread slot.

Comment on lines +47 to +77
uint8_t lastOffset = 0;
Addr lastPc = 0;
bool firstBranch = true;
bool seenTaken = false;
for (const auto &resolved : packet.realBranches) {
if (resolved.branch.pc < packet.startPC) {
return false;
}

if (resolved.branch.size == 0) {
return false;
}

if (seenTaken) {
return false;
}

if (!firstBranch) {
if (resolved.ftqOffset < lastOffset) {
return false;
}
if (resolved.ftqOffset == lastOffset && resolved.branch.pc <= lastPc) {
return false;
}
}

lastOffset = resolved.ftqOffset;
lastPc = resolved.branch.pc;
seenTaken = resolved.taken;
firstBranch = false;
}

⚠️ Potential issue | 🟠 Major

Reject packets that continue after a mispredicted branch.

A valid packet should be terminal once any earlier ResolvedBranch has mispredict == true: IEW::SquashCheckAfterExe() stops recording younger control instructions after an older mispredict has armed squash state. The current validator only fences on taken, so a sequence such as {not-taken, mispredict} followed by a later branch still passes validation and can train wrong-path state if packet assembly goes wrong.

Proposed fix
-    bool seenTaken = false;
+    bool seenTaken = false;
+    bool seenMispredict = false;
     for (const auto &resolved : packet.realBranches) {
+        if (seenTaken || seenMispredict) {
+            return false;
+        }
+
         if (resolved.branch.pc < packet.startPC) {
             return false;
         }
 
         if (resolved.branch.size == 0) {
             return false;
         }
-
-        if (seenTaken) {
-            return false;
-        }
 
         if (!firstBranch) {
             if (resolved.ftqOffset < lastOffset) {
                 return false;
             }
@@
         lastOffset = resolved.ftqOffset;
         lastPc = resolved.branch.pc;
         seenTaken = resolved.taken;
+        seenMispredict = resolved.mispredict;
         firstBranch = false;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 47 - 77, The loop currently
only treats a taken branch as terminal (seenTaken) and allows later branches
after a mispredicted branch, which can train wrong-path state; update the
validator to treat any resolved.mispredict as terminal as well. Concretely,
change the logic that uses seenTaken to a terminal flag (e.g., seenTerminal) and
set it when resolved.taken || resolved.mispredict is true; keep the early-return
at loop top (if seenTerminal) so any branch after a mispredict or taken causes
the packet to be rejected. Ensure you reference packet.realBranches and the
ResolvedBranch fields (taken and mispredict) and preserve the existing
ordering/ftqOffset and pc checks (firstBranch, lastOffset, lastPc).


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md (1)

56-61: Clarify the dual-mode behavior for migrated components.

The statement "It is not a double-update mode for migrated components" assumes the reader knows what "double-update mode" means. Consider being more explicit about what actually happens.

📝 Suggested clarification
 This means:
 
 - migrated components use the full packet path
 - non-migrated `resolvedUpdate` components still keep legacy fallback
 
-It is not a double-update mode for migrated components.
+Migrated components (MBTB, BTBTAGE, BTBITTAGE) use ONLY the new packet path and 
+skip legacy resolved-update processing entirely when enableFullResolveTrain is True.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md` around lines 56 -
61, Clarify the sentence about dual-mode by explicitly describing what happens
for migrated components: state that migrated components use the full packet path
and therefore only receive a single consolidated update (no legacy fallback),
whereas non-migrated resolvedUpdate components continue to use the legacy
fallback path; replace "It is not a double-update mode for migrated components"
with a concrete sentence such as "Migrated components receive a single update
via the full packet path (they do not also receive the legacy fallback), while
non-migrated resolvedUpdate components retain the legacy fallback behavior."
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md`:
- Around line 56-61: Clarify the sentence about dual-mode by explicitly
describing what happens for migrated components: state that migrated components
use the full packet path and therefore only receive a single consolidated update
(no legacy fallback), whereas non-migrated resolvedUpdate components continue to
use the legacy fallback path; replace "It is not a double-update mode for
migrated components" with a concrete sentence such as "Migrated components
receive a single update via the full packet path (they do not also receive the
legacy fallback), while non-migrated resolvedUpdate components retain the legacy
fallback behavior."

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7bda24cd-e7ba-4bb1-8294-90ce2e32f308

📥 Commits

Reviewing files that changed from the base of the PR and between 1922023 and a835592.

📒 Files selected for processing (1)
  • docs/Gem5_Docs/frontend/full_resolve_train_review_guide.md

@github-actions

github-actions bot commented Apr 7, 2026

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.1686 📉 -0.0979 (-4.32%)

✅ Difftest smoke test passed!


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
src/cpu/o3/fetch.cc (2)

1518-1521: ⚠️ Potential issue | 🟡 Minor

Underflow risk in legacy queue headroom calculation.

When resolveQueueSize is 1..3, the subtraction resolveQueueSize - 4 wraps to a large value (unsigned arithmetic), so the queue-full condition never triggers and the legacy fallback can grow unbounded.

🛠️ Suggested fix
-        if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4) {
+        const size_t reserve = std::min<size_t>(resolveQueueSize, 4);
+        if (resolveQueueSize &&
+            resolveQueue.size() + reserve > resolveQueueSize) {
             fetchStats.resolveQueueFullEvents++;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1518 - 1521, The check uses unsigned
arithmetic `resolveQueueSize - 4` which underflows when resolveQueueSize is
1..3; update the condition in fetch.cc so it computes a safe threshold before
comparing (e.g., compute threshold = (resolveQueueSize > 4 ? resolveQueueSize -
4 : 0) or use std::max with 4) and then test `resolveQueue.size() > threshold`;
keep the existing updates to fetchStats.resolveQueueFullEvents and
fetchStats.resolveEnqueueFailEvent (using enqueueSize) unchanged.

1887-1892: ⚠️ Potential issue | 🟠 Major

FTQ id pinned to thread 0 while generation/offset use tid.

Line 1887 uses ftqHeadId(0) while lines 1888-1892 derive generation and offset from ftqFetchingTarget(tid). For non-zero tids, this creates inconsistent (ftqId, generation, ftqOffset) tuples that will fail identity matching in the resolve-train path.

🛠️ Suggested fix
-    instruction->setFtqId(dbpbtb->ftqHeadId(0));
+    instruction->setFtqId(dbpbtb->ftqHeadId(tid));
     const auto &fetch_target = dbpbtb->ftqFetchingTarget(tid);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1887 - 1892, The FTQ id is incorrectly
hardcoded to thread 0 which mismatches the generation/offset derived from
dbpbtb->ftqFetchingTarget(tid); update the code that calls
instruction->setFtqId(dbpbtb->ftqHeadId(0)) to use the current thread id
(dbpbtb->ftqHeadId(tid)) so the tuple (ftqId, generation, ftqOffset) produced by
instruction->setFtqId, instruction->setFtqGeneration(fetch_target.generation)
and instruction->setFtqOffset(...) is consistent with fetch_target and tid (see
fetch_target, this_pc.instAddr(), instShiftAmt).
src/cpu/pred/btb/mbtb.cc (1)

508-514: ⚠️ Potential issue | 🟡 Minor

Struct equality includes resolved flag, causing false misses.

The comparison taken_branch->branch == e uses full struct equality. Since the packet sets resolved = true while meta->hit_entries retain prediction-time values, this will incorrectly classify hits as misses.

🛠️ Suggested fix: Compare by PC instead
     bool pred_branch_hit = false;
     for (const auto &e : meta->hit_entries) {
-        if (taken_branch->branch == e) {
+        if (taken_branch->branch.pc == e.pc) {
             pred_branch_hit = true;
             break;
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/mbtb.cc` around lines 508 - 514, The loop computing
pred_branch_hit wrongly uses full struct equality between taken_branch->branch
and entries in meta->hit_entries (which includes the resolved flag), causing
false misses; update the comparison in the loop to compare only the branch PC
(e.g., compare taken_branch->branch.pc to e.pc) (or another stable identifier
field used for matching) instead of using operator== so that resolved
differences do not affect hit detection for pred_branch_hit.
🧹 Nitpick comments (1)
src/cpu/pred/btb/mbtb.cc (1)

871-874: Unused predicted_hits set.

The unordered_set<Addr> is populated but never queried. Consider removing it or using it for the lookup instead of std::find_if.

♻️ Option 1: Remove unused variable
 std::vector<MBTB::ResolveTrainUpdate>
 MBTB::prepareResolveTrainEntries(const ResolvedTrainPacket &packet,
                                  const BTBMeta *meta)
 {
     std::vector<ResolveTrainUpdate> updates;
-    std::unordered_set<Addr> predicted_hits;
-    for (const auto &entry : meta->hit_entries) {
-        predicted_hits.insert(entry.pc);
-    }

     for (const auto &resolved : packet.realBranches) {
♻️ Option 2: Use set for O(1) lookup
+    std::unordered_map<Addr, const BTBEntry*> pc_to_entry;
+    for (const auto &entry : meta->hit_entries) {
+        pc_to_entry[entry.pc] = &entry;
+    }
-    std::unordered_set<Addr> predicted_hits;
-    for (const auto &entry : meta->hit_entries) {
-        predicted_hits.insert(entry.pc);
-    }

     for (const auto &resolved : packet.realBranches) {
-        auto hit_it = std::find_if(meta->hit_entries.begin(), meta->hit_entries.end(),
-            [&resolved](const BTBEntry &entry) {
-                return entry.pc == resolved.branch.pc;
-            });
-        if (hit_it != meta->hit_entries.end()) {
-            updates.push_back({*hit_it, resolved});
+        auto it = pc_to_entry.find(resolved.branch.pc);
+        if (it != pc_to_entry.end()) {
+            updates.push_back({*(it->second), resolved});
             continue;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/mbtb.cc` around lines 871 - 874, The populated but unused
unordered_set predicted_hits should be used for O(1) lookups instead of the
current std::find_if; replace the linear search over meta->hit_entries with a
membership test against predicted_hits (e.g., predicted_hits.count(some_pc) or
predicted_hits.contains(some_pc)) wherever the code uses find_if to check if an
entry's pc is present, and ensure predicted_hits is built from meta->hit_entries
(in the same scope where std::find_if is currently used) and remove the old
find_if-based branch if present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1518-1521: The check uses unsigned arithmetic `resolveQueueSize -
4` which underflows when resolveQueueSize is 1..3; update the condition in
fetch.cc so it computes a safe threshold before comparing (e.g., compute
threshold = (resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) or use std::max
with 4) and then test `resolveQueue.size() > threshold`; keep the existing
updates to fetchStats.resolveQueueFullEvents and
fetchStats.resolveEnqueueFailEvent (using enqueueSize) unchanged.
- Around line 1887-1892: The FTQ id is incorrectly hardcoded to thread 0 which
mismatches the generation/offset derived from dbpbtb->ftqFetchingTarget(tid);
update the code that calls instruction->setFtqId(dbpbtb->ftqHeadId(0)) to use
the current thread id (dbpbtb->ftqHeadId(tid)) so the tuple (ftqId, generation,
ftqOffset) produced by instruction->setFtqId,
instruction->setFtqGeneration(fetch_target.generation) and
instruction->setFtqOffset(...) is consistent with fetch_target and tid (see
fetch_target, this_pc.instAddr(), instShiftAmt).

In `@src/cpu/pred/btb/mbtb.cc`:
- Around line 508-514: The loop computing pred_branch_hit wrongly uses full
struct equality between taken_branch->branch and entries in meta->hit_entries
(which includes the resolved flag), causing false misses; update the comparison
in the loop to compare only the branch PC (e.g., compare taken_branch->branch.pc
to e.pc) (or another stable identifier field used for matching) instead of using
operator== so that resolved differences do not affect hit detection for
pred_branch_hit.

---

Nitpick comments:
In `@src/cpu/pred/btb/mbtb.cc`:
- Around line 871-874: The populated but unused unordered_set predicted_hits
should be used for O(1) lookups instead of the current std::find_if; replace the
linear search over meta->hit_entries with a membership test against
predicted_hits (e.g., predicted_hits.count(some_pc) or
predicted_hits.contains(some_pc)) wherever the code uses find_if to check if an
entry's pc is present, and ensure predicted_hits is built from meta->hit_entries
(in the same scope where std::find_if is currently used) and remove the old
find_if-based branch if present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9200612-317b-45b7-957b-a08aa0171b88

📥 Commits

Reviewing files that changed from the base of the PR and between a835592 and 66148cb.

📒 Files selected for processing (11)
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
  • src/cpu/pred/btb/btb_ittage.cc
  • src/cpu/pred/btb/btb_ittage.hh
  • src/cpu/pred/btb/btb_tage.cc
  • src/cpu/pred/btb/btb_tage.hh
  • src/cpu/pred/btb/common.hh
  • src/cpu/pred/btb/decoupled_bpred.cc
  • src/cpu/pred/btb/mbtb.cc
  • src/cpu/pred/btb/mbtb.hh
  • src/cpu/pred/btb/timed_base_pred.hh
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cpu/pred/btb/timed_base_pred.hh

@github-actions

github-actions bot commented Apr 7, 2026

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.1980 📉 -0.0685 (-3.02%)

✅ Difftest smoke test passed!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/pred/btb/test/btb_tage.test.cc`:
- Around line 408-438: The lookup helpers (findTableWithEntry and
findTablesWithEntry) currently treat any valid entry with matching pc as a hit,
but because indices are history-folded you must also verify the tag for the
current startPC/history; compute the expected tag for table t (e.g. via
tage->getTageTag(startPC, t, meta->tagFoldedHist[t].get()) or the equivalent
tag-fold API on BTBTAGE) and require entry.tag == expectedTag in the
if-condition in both functions (i.e., change the check from entry.valid &&
entry.pc == branchPC to entry.valid && entry.pc == branchPC && entry.tag ==
expectedTag) so the helpers only count true context-matching entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63ec0b7f-6913-4faf-9f22-97615eea7c19

📥 Commits

Reviewing files that changed from the base of the PR and between 66148cb and ee45622.

📒 Files selected for processing (4)
  • src/cpu/o3/fetch.cc
  • src/cpu/pred/btb/btb_tage.cc
  • src/cpu/pred/btb/btb_tage.hh
  • src/cpu/pred/btb/test/btb_tage.test.cc
✅ Files skipped from review due to trivial changes (1)
  • src/cpu/pred/btb/btb_tage.cc

Comment on lines +408 to +438
int findTableWithEntry(BTBTAGE* tage, Addr startPC, Addr branchPC,
const std::shared_ptr<BTBTAGE::TageMeta>& meta) {
for (int t = 0; t < tage->numPredictors; t++) {
Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
for (unsigned way = 0; way < tage->numWays[t]; way++) {
auto &entry = tage->tageTable[t][index][way];
if (entry.valid && entry.pc == branchPC) {
return t;
}
}
}
return -1;
}

std::vector<int> findTablesWithEntry(
BTBTAGE* tage, Addr startPC, Addr branchPC,
const std::shared_ptr<BTBTAGE::TageMeta>& meta)
{
std::vector<int> tables;
for (int t = 0; t < tage->numPredictors; t++) {
Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
for (unsigned way = 0; way < tage->numWays[t]; way++) {
auto &entry = tage->tageTable[t][index][way];
if (entry.valid && entry.pc == branchPC) {
tables.push_back(t);
break;
}
}
}
return tables;
}

⚠️ Potential issue | 🟡 Minor

Match the expected TAGE tag in these lookup helpers.

These helpers currently treat any valid && entry.pc == branchPC way in the indexed set as a hit. Because the set index is history-folded, an older context for the same branch can alias into the same set with a different tag, so the parity check can pass even when the current context would not actually hit that entry. Please include the expected tag for the current startPC/history/position before counting a table as present, and mirror that in both helpers.

🔎 Suggested tightening
 int findTableWithEntry(BTBTAGE* tage, Addr startPC, Addr branchPC,
                        const std::shared_ptr<BTBTAGE::TageMeta>& meta) {
+    unsigned position = tage->getBranchIndexInBlock(branchPC, startPC);
     for (int t = 0; t < tage->numPredictors; t++) {
         Addr index = tage->getTageIndex(startPC, t, meta->indexFoldedHist[t].get());
+        Addr tag = tage->getTageTag(startPC, t, meta->tagFoldedHist[t].get(),
+                                    meta->altTagFoldedHist[t].get(), position);
         for (unsigned way = 0; way < tage->numWays[t]; way++) {
             auto &entry = tage->tageTable[t][index][way];
-            if (entry.valid && entry.pc == branchPC) {
+            if (entry.valid && entry.pc == branchPC && entry.tag == tag) {
                 return t;
             }
         }
     }
     return -1;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/test/btb_tage.test.cc` around lines 408 - 438, The lookup
helpers (findTableWithEntry and findTablesWithEntry) currently treat any valid
entry with matching pc as a hit, but because indices are history-folded you must
also verify the tag for the current startPC/history; compute the expected tag
for table t (e.g. via tage->getTageTag(startPC, t, meta->tagFoldedHist[t].get())
or the equivalent tag-fold API on BTBTAGE) and require entry.tag == expectedTag
in the if-condition in both functions (i.e., change the check from entry.valid
&& entry.pc == branchPC to entry.valid && entry.pc == branchPC && entry.tag ==
expectedTag) so the helpers only count true context-matching entries.


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
src/cpu/o3/fetch.cc (1)

1528-1536: ⚠️ Potential issue | 🟡 Minor

Clamp the legacy queue headroom calculation to prevent underflow.

When resolveQueueSize is 1, 2, or 3, the expression resolveQueueSize - 4 wraps to a very large unsigned value, causing the queue-full check to never trigger. This allows the legacy resolve queue to grow unbounded in small configurations.

🛠️ Suggested fix
-        if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4) {
+        const size_t headroom = std::min<size_t>(resolveQueueSize, 4);
+        if (resolveQueueSize && resolveQueue.size() + headroom > resolveQueueSize) {
             fetchStats.resolveQueueFullEvents++;
             fetchStats.resolveEnqueueFailEvent += enqueueSize;
         } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1528 - 1536, The queue-full check in
enableLegacyResolveUpdate can underflow because it computes resolveQueueSize - 4
as an unsigned value; change the logic in the block using resolveQueueSize,
resolveQueue and enqueueSize (inside enableLegacyResolveUpdate) to first compute
a safe threshold (e.g., threshold = resolveQueueSize > 4 ? resolveQueueSize - 4
: 0) or explicitly handle resolveQueueSize <= 4, then compare
resolveQueue.size() > threshold before incrementing
fetchStats.resolveQueueFullEvents and fetchStats.resolveEnqueueFailEvent so
small resolveQueueSize values do not wrap and allow unbounded growth.
src/cpu/pred/btb/decoupled_bpred.cc (2)

645-663: ⚠️ Potential issue | 🟠 Major

Skip microtage in the legacy resolve-update path when enableFullResolveTrain is active.

microtage is also a BTBTAGE instance. If BTBTAGE::resolveTrain() handles packet-trained conditionals, leaving microtage on the legacy resolve-update path means it gets trained twice when enableFullResolveTrain is enabled.

🛠️ Suggested fix
             if (enableFullResolveTrain &&
                 (components[i] == mbtb || components[i] == tage ||
-                 components[i] == ittage)) {
+                 components[i] == ittage || components[i] == microtage)) {
                 continue;
             }

Apply the same change to the second loop (around lines 659-662).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 645 - 663, The second-phase
loop that applies resolved updates skips mbtb, tage, and ittage when
enableFullResolveTrain is true but misses microtage, causing double-training;
inside the for-loop that checks components[i]->getResolvedUpdate(), extend the
existing conditional (which currently checks components[i] == mbtb ||
components[i] == tage || components[i] == ittage) to also include components[i]
== microtage so microtage is skipped in the legacy resolve-update path when
enableFullResolveTrain is active.

24-65: ⚠️ Potential issue | 🟠 Major

Add misprediction as a terminal condition in packet validation.

The validation loop tracks seenTaken to reject packets with branches after a taken branch, but doesn't track mispredictions. When a branch mispredicts (even if not taken), IEW::SquashCheckAfterExe() stops recording younger control instructions. Allowing branches after a mispredicted branch could lead to training on wrong-path state.

🐛 Proposed fix

```diff
 DecoupledBPUWithBTB::validateResolvedTrainPacket(const ResolvedTrainPacket &packet)
 {
     uint8_t lastOffset = 0;
     Addr lastPc = 0;
     bool firstBranch = true;
     bool seenTaken = false;
+    bool seenMispredict = false;
     for (const auto &resolved : packet.realBranches) {
+        if (seenTaken || seenMispredict) {
+            dbpBtbStats.fullResolveTrainValidationAfterTaken++;
+            return ResolveTrainValidationReason::AfterTaken;
+        }
+
         if (resolved.branch.pc < packet.startPC) {
             dbpBtbStats.fullResolveTrainValidationPcBeforeStart++;
             return ResolveTrainValidationReason::PcBeforeStart;
         }

         if (resolved.branch.size == 0) {
             dbpBtbStats.fullResolveTrainValidationZeroSize++;
             return ResolveTrainValidationReason::ZeroSize;
         }

-        if (seenTaken) {
-            dbpBtbStats.fullResolveTrainValidationAfterTaken++;
-            return ResolveTrainValidationReason::AfterTaken;
-        }
-
         if (!firstBranch) {
             // ... offset/pc ordering checks ...
         }

         lastOffset = resolved.ftqOffset;
         lastPc = resolved.branch.pc;
         seenTaken = resolved.taken;
+        seenMispredict = resolved.mispredict;
         firstBranch = false;
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 24 - 65, The loop in
validateResolvedTrainPacket (iterating packet.realBranches) treats any branch
after a taken branch as terminal via seenTaken but ignores mispredicted
branches; add a check similar to the seenTaken handling to treat a branch that
is mispredicted as terminal: detect the per-branch misprediction flag on
resolved (e.g., resolved.mispredicted or resolved.mispredict), increment a new
statistic (e.g., dbpBtbStats.fullResolveTrainValidationAfterMispredict++), and
return a new validation reason (e.g.,
ResolveTrainValidationReason::AfterMispredict); also add the new enum value to
ResolveTrainValidationReason and the corresponding stat field to dbpBtbStats so
the code compiles and the condition is handled before accepting the packet.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1528-1536: The queue-full check in enableLegacyResolveUpdate can
underflow because it computes resolveQueueSize - 4 as an unsigned value; change
the logic in the block using resolveQueueSize, resolveQueue and enqueueSize
(inside enableLegacyResolveUpdate) to first compute a safe threshold (e.g.,
threshold = resolveQueueSize > 4 ? resolveQueueSize - 4 : 0) or explicitly
handle resolveQueueSize <= 4, then compare resolveQueue.size() > threshold
before incrementing fetchStats.resolveQueueFullEvents and
fetchStats.resolveEnqueueFailEvent so small resolveQueueSize values do not wrap
and allow unbounded growth.
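The unsigned-underflow fix for the queue-full check can be illustrated in isolation. This is a sketch using hypothetical names mirroring the review comment; `resolveQueueNearlyFull` is not a real gem5 function, and the real members live on `Fetch`.

```cpp
#include <cassert>
#include <cstddef>

// Unsigned-safe "queue nearly full" check: computing resolveQueueSize - 4
// directly wraps to a huge value when resolveQueueSize <= 4, making the
// comparison always false and letting the queue grow unbounded. Clamping
// the threshold at zero first avoids the wraparound.
bool resolveQueueNearlyFull(size_t queueLen, size_t resolveQueueSize)
{
    size_t threshold = resolveQueueSize > 4 ? resolveQueueSize - 4 : 0;
    return queueLen > threshold;
}
```

With `resolveQueueSize = 2`, the naive `queueLen > resolveQueueSize - 4` compares against `SIZE_MAX - 1` and never fires; the clamped version correctly fires as soon as any entry is queued.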

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 645-663: The second-phase loop that applies resolved updates skips
mbtb, tage, and ittage when enableFullResolveTrain is true but misses microtage,
causing double-training; inside the for-loop that checks
components[i]->getResolvedUpdate(), extend the existing conditional (which
currently checks components[i] == mbtb || components[i] == tage || components[i]
== ittage) to also include components[i] == microtage so microtage is skipped in
the legacy resolve-update path when enableFullResolveTrain is active.
- Around line 24-65: The loop in validateResolvedTrainPacket (iterating
packet.realBranches) treats any branch after a taken branch as terminal via
seenTaken but ignores mispredicted branches; add a check similar to the
seenTaken handling to treat a branch that is mispredicted as terminal: detect
the per-branch misprediction flag on resolved (e.g., resolved.mispredicted or
resolved.mispredict), increment a new statistic (e.g.,
dbpBtbStats.fullResolveTrainValidationAfterMispredict++), and return a new
validation reason (e.g., ResolveTrainValidationReason::AfterMispredict); also
add the new enum value to ResolveTrainValidationReason and the corresponding
stat field to dbpBtbStats so the code compiles and the condition is handled
before accepting the packet.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a04be19-a090-4a15-87a9-f2a20ad1c152

📥 Commits

Reviewing files that changed from the base of the PR and between ee45622 and 7a06f85.

📒 Files selected for processing (4)
  • src/cpu/o3/fetch.cc
  • src/cpu/pred/btb/decoupled_bpred.cc
  • src/cpu/pred/btb/decoupled_bpred.hh
  • src/cpu/pred/btb/decoupled_bpred_stats.cc

@Yakkhini Yakkhini added the perf label Apr 8, 2026
@github-actions

github-actions bot commented Apr 8, 2026

🚀 Performance test triggered: gcc12-spec06-0.8c

@Yakkhini Yakkhini force-pushed the full-resolve-train branch from a23ce3f to dd99cb2 Compare April 8, 2026 08:41
@github-actions

github-actions bot commented Apr 8, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.2665 | - |
| This PR | 2.3658 | 📈 +0.0993 (+4.38%) |

✅ Difftest smoke test passed!


```cpp
void
Fetch::handleIEWSignals()
{
    if (lastIewSignalHandleTick == curTick()) {
```
Collaborator Author

Is this necessary?

@Yakkhini Yakkhini force-pushed the full-resolve-train branch from ca14ae6 to 6388b8d Compare April 9, 2026 03:30
@github-actions

github-actions bot commented Apr 9, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.2665 | - |
| This PR | 2.3658 | 📈 +0.0993 (+4.38%) |

✅ Difftest smoke test passed!

@Yakkhini Yakkhini force-pushed the full-resolve-train branch from 6388b8d to 08425fc Compare April 9, 2026 03:47
@Yakkhini Yakkhini added perf and removed perf labels Apr 9, 2026
@github-actions

github-actions bot commented Apr 9, 2026

🚀 Performance test triggered: gcc12-spec06-0.8c

Yakkhini added 7 commits April 9, 2026 11:48
Change-Id: Id3ddfead4fd87cf04c650f368f6cfb75ff19b81b
Move MBTB, BTBTAGE, and BTBITTAGE to packet-based resolve training so they consume execution truth directly while preserving legacy fallback for non-migrated components.

Change-Id: I4682438d18796127542b9c9c6a23efcb1415a413
Enable packet-based resolve training by default while keeping legacy resolve updates enabled as fallback until the remaining components are migrated.

Change-Id: I124f9657b46ad9e19bb0bf81f3d3ec85592d25ab
Add a review guide for the full resolve train branch covering the new dataflow, migrated components, default behavior, and the main files and invariants reviewers should check.

Change-Id: Idfa11844ec1b46f59339e589bfd88c0d17d3e47f
Change-Id: I736783b6b3bb469c7a02d060e9317ee24879d31a
Change-Id: Icda8729682a3d1112280dbc1a3b120e922bc830b
Change-Id: I2ca72fa9dadeca3ed0b369ba2ce21c7664768753
Yakkhini added 6 commits April 9, 2026 11:48
Collapse resolved-stage BTB training onto the packet-based full resolve path, remove the old resolvedCFI/resolveQueue helper chain, and reinterpret config intent around full-resolve participation versus commit fallback.

Change-Id: I777f46178bc7b1ce891bb2242cefff0b166e48a2
Change-Id: Icbf82e480f0fec8c1119c36851c9f6367835c745
Change-Id: I371e8e306f55cb7eca8991e77db53f8825d10874
Change-Id: If72220a598a23ca2aad8fe50a390b80fdce11a7b
Change-Id: I16d6b8d495ef16e9e20f4b5ca5a8262484674ae6
Change-Id: I6f1de90ea368558913dc913593ac0f193f54976b
@github-actions

github-actions bot commented Apr 9, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.2665 | - |
| This PR | 2.3658 | 📈 +0.0993 (+4.38%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 08425fc
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 20.69 | 20.69 | -0.02 🔴 |

@Yakkhini
Collaborator Author

Yakkhini commented Apr 9, 2026

It does not appear that disabling the resolve train causes any significant performance degradation. However, this PR still introduces a substantial increase in source code size, even after cleanup and simplification, mainly due to two features: packet rebuild in fetch and the split resolve train interface in the affected predictors, which no longer reuses update().

I now want to run the 0.8 coverage test with the resolve train enabled to determine whether there is any performance issue in the new resolve train path.

https://github.com/OpenXiangShan/GEM5/actions/runs/24176533553

@Yakkhini Yakkhini added perf and removed perf labels Apr 9, 2026
@github-actions

github-actions bot commented Apr 9, 2026

🚀 Performance test triggered: gcc12-spec06-0.8c

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 21e95cb
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 20.48 | 21.69 | -5.56 🔴 |

[Generated by GEM5 Performance Robot]
commit: 21e95cb
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 20.48 | 20.69 | -0.99 🔴 |

@Yakkhini
Collaborator Author

We never evaluated the resolve train on the ideal config before. Although this resolve-train implementation shows about a 0.2 score decrease, it is still better than the legacy one (about a 0.3 score decrease).

Ref: #823 (comment)

Besides, there is also about a 0.2 score increase on the kmhv3 config (16.9 → 17.1).
