Train ITTAGE at resolve stage #630

Merged
Yakkhini merged 4 commits into xs-dev from ittage-align
Dec 5, 2025

Conversation

@Yakkhini
Collaborator

@Yakkhini Yakkhini commented Dec 3, 2025

Summary by CodeRabbit

  • Chores
    • Updated branch prediction system configuration and component enablement handling.
    • Enhanced prediction update filtering with improved safety checks to ensure reliable operation under various conditions.

@coderabbitai

coderabbitai bot commented Dec 3, 2025

Walkthrough

The changes enable the ittage branch predictor in the KMHV3 configuration and modify BTB update logic in btb_ittage.cc and decoupled_bpred.cc to conditionally filter entries based on resolution status and component enablement flags.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Configuration update: configs/example/kmhv3.py | Enabled ittage branch predictor (changed from False to True) and added cpu.branchPred.ittage.resolvedUpdate = True to enable resolution tracking for ittage, alongside existing mbtb and tage resolvedUpdate assignments. |
| BTB update filtering: src/cpu/pred/btb/btb_ittage.cc | Modified BTBITTAGE::update to conditionally filter BTB entries: when getResolvedUpdate() is true, retains only entries that are indirect, non-return, and resolved; otherwise, preserves prior behavior of updating indirect non-return entries. |
| Update guard conditions: src/cpu/pred/btb/decoupled_bpred.cc | Added isEnabled checks to gate UBTB and ABTB updates; ABTB update now handles missing FTQ entries by defaulting previous_startpc to 0 when enabled, or skipping update when disabled. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • btb_ittage.cc: Requires careful review of the conditional filtering logic and verification that both paths (resolvedUpdate true/false) maintain intended behavior.
  • decoupled_bpred.cc: Multiple guard condition additions across UBTB and ABTB updates; verify FTQ entry fallback logic is correct.
  • kmhv3.py: Cross-reference config changes with corresponding code paths to ensure consistency.

Possibly related PRs

  • PR#612: Directly enables ittage predictor in KMHV3 config with similar parameter assignments.
  • PR#629: Modifies ABTB update behavior and gating logic in decoupled_bpred path.
  • PR#628: Adjusts ittage history and update handling in both btb_ittage and decoupled_bpred components.

Suggested labels

perf

Suggested reviewers

  • jensen-yan

Poem

🐰 With ittage now awake and enabled bright,
BTB filters dance left and right—
Guards and flags keep updates in sight,
Branch prediction tuned for speed and might,
Resolution tracked in morning light! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: enabling ITTAGE training at the resolve stage, which is reflected in the configuration updates and conditional resolution tracking logic across multiple files. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 427f3c0 and ea1a5b2.

📒 Files selected for processing (1)
  • src/cpu/pred/btb/decoupled_bpred.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cpu/pred/btb/decoupled_bpred.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0a221f6 and 493cb0d.

📒 Files selected for processing (1)
  • configs/example/kmhv3.py (1 hunks)

@github-actions

github-actions bot commented Dec 3, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7846 | 📈 +0.0017 (+0.09%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 493cb0d
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.24 | 19.92 | -23.47 🔴 |

@github-actions

github-actions bot commented Dec 3, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7921 | 📈 +0.0092 (+0.51%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 4a41c44
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.19 | 14.64 | +3.81 🟢 |

[Generated by GEM5 Performance Robot]
commit: 4a41c44
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 15.19 | 15.24 | -0.32 🔴 |

jensen-yan
jensen-yan previously approved these changes Dec 3, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)

222-232: Resolved-update filtering looks sound; consider deduping the remove_if and double-check resolved plumbing.

The new gating on getResolvedUpdate() and e.resolved preserves the old behavior when resolvedUpdate is false and correctly restricts ITTAGE training to resolved, indirect, non-return entries when it is true. The main thing to watch is that all BTB entries you expect to train on—especially stream.updateNewBTBEntry in the miss case—have resolved set by the time update() runs, otherwise they’ll be silently dropped from all_entries_to_update.

You could also slightly simplify this block by using a single predicate that incorporates getResolvedUpdate() instead of two near-identical remove_if calls; that would reduce duplication and keep the filter logic in one place. The leading // // on the comment is likely a leftover and can be cleaned up as well.
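
As a rough illustration of the single-predicate idea, the sketch below folds the `getResolvedUpdate()` decision into one lambda. The `BTBEntry` struct here is a hypothetical three-field stand-in for the real one, and `filterIttageEntries` is an illustrative name, not a function from the PR.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, trimmed-down stand-in for the real BTBEntry; only the
// fields the filter reads are modelled here.
struct BTBEntry
{
    bool isIndirect = false;
    bool isReturn = false;
    bool resolved = false;
};

// Keep only entries ITTAGE should train on. When resolvedUpdate is false
// the resolved bit is ignored, which mirrors the behaviour before the PR.
inline void
filterIttageEntries(std::vector<BTBEntry> &entries, bool resolvedUpdate)
{
    auto drop = [resolvedUpdate](const BTBEntry &e) {
        const bool trainable = e.isIndirect && !e.isReturn;
        return !(trainable && (!resolvedUpdate || e.resolved));
    };
    // Classic erase-remove idiom, applied once for both modes.
    entries.erase(std::remove_if(entries.begin(), entries.end(), drop),
                  entries.end());
}
```

With `resolvedUpdate` set, an indirect non-return entry whose `resolved` bit is still false is dropped, which is exactly the silent-filtering case described above.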

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 4a41c44 and 66aa284.

📒 Files selected for processing (1)
  • src/cpu/pred/btb/btb_ittage.cc (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/btb_ittage.cc (1)
src/cpu/pred/btb/decoupled_bpred.hh (10)
  • e (202-214)
  • e (202-202)
  • e (216-228)
  • e (216-216)
  • e (230-235)
  • e (230-230)
  • e (237-241)
  • e (237-237)
  • e (268-271)
  • e (268-268)

@github-actions

github-actions bot commented Dec 3, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7747 | 📉 -0.0082 (-0.46%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 66aa284
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.36 | 19.89 | -22.77 🔴 |

[Generated by GEM5 Performance Robot]
commit: 66aa284
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 15.36 | 15.19 | +1.10 🟢 |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cpu/pred/btb/btb_tage.cc (1)

396-404: Add a clarifying comment explaining why new entries are marked as resolved.

The change to set resolved = true on new BTB entries is correct and intentional. New entries reach this code path only when execution information is available (stream.exeTaken), and marking them as resolved enables them to participate in resolve-stage training for ITTAGE components when getResolvedUpdate() is true. This aligns with the PR objective to train ITTAGE at the resolve stage.

Add a comment to clarify this intent:

+        // Mark as resolved to enable resolve-stage training for ITTAGE
         potential_new_entry.resolved = true;
🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_tage.cc (1)

407-415: Consider reducing code duplication in the filtering logic.

The filtering logic has duplicated lambda predicates that differ only in the additional e.resolved check. This could be simplified for better maintainability.

Apply this diff to reduce duplication:

     // Filter: only keep conditional branches that are not always taken
+    auto base_filter = [](const BTBEntry &e) { return e.isCond && !e.alwaysTaken; };
     if (getResolvedUpdate()) {
         auto remove_it = std::remove_if(all_entries.begin(), all_entries.end(),
-            [](const BTBEntry &e) { return !(e.isCond && !e.alwaysTaken && e.resolved); });
+            [&base_filter](const BTBEntry &e) { return !(base_filter(e) && e.resolved); });
         all_entries.erase(remove_it, all_entries.end());
     } else {
         auto remove_it = std::remove_if(all_entries.begin(), all_entries.end(),
-            [](const BTBEntry &e) { return !(e.isCond && !e.alwaysTaken); });
+            [&base_filter](const BTBEntry &e) { return !base_filter(e); });
         all_entries.erase(remove_it, all_entries.end());
     }
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 66aa284 and 6d0f724.

📒 Files selected for processing (1)
  • src/cpu/pred/btb/btb_tage.cc (1 hunks)

@github-actions

github-actions bot commented Dec 3, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7747 | 📉 -0.0082 (-0.46%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 6d0f724
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.23 | 19.89 | -23.42 🔴 |

[Generated by GEM5 Performance Robot]
commit: 6d0f724
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 15.23 | 15.24 | -0.08 🔴 |

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7830 | 📉 -0.0000 (-0.00%) |

✅ Difftest smoke test passed!

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6d0f724 and e1c5b82.

📒 Files selected for processing (4)
  • src/cpu/pred/btb/abtb.cc (1 hunks)
  • src/cpu/pred/btb/decoupled_bpred.cc (2 hunks)
  • src/cpu/pred/btb/mbtb.cc (3 hunks)
  • src/cpu/pred/btb/stream_struct.hh (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/pred/btb/mbtb.cc (2)
src/cpu/pred/btb/mbtb.hh (4)
  • stream (127-127)
  • stream (163-163)
  • stream (171-171)
  • stream (281-282)
src/cpu/pred/btb/btb_ubtb.hh (2)
  • stream (133-133)
  • stream (139-139)
src/cpu/pred/btb/stream_struct.hh (1)
src/cpu/pred/loop_predictor.hh (1)
  • bi (238-238)
🔇 Additional comments (5)
src/cpu/pred/btb/stream_struct.hh (2)

81-86: BranchInfo::resolved wiring looks consistent with resolve-stage training

The new BranchInfo::resolved field and its initialization in both constructors default to false, with explicit true assignment only at control squash sites. Operators still compare only pc, so existing equality semantics (per-static-branch) remain unchanged while the resolved bit is purely auxiliary for training logic. This aligns well with the resolve-update design.

Also applies to: 93-96, 99-109
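
To make the "auxiliary bit" point concrete, here is a minimal sketch assuming a `BranchInfo` reduced to just `pc` and `resolved`. The `Addr` alias and the constructor shape are assumptions for illustration; the real struct carries many more fields.

```cpp
#include <cstdint>

using Addr = std::uint64_t;  // assumption: stand-in for gem5's Addr

// Hypothetical reduction of BranchInfo to the two fields discussed above.
struct BranchInfo
{
    Addr pc = 0;
    bool resolved = false;  // defaults to false; set only at control squash

    BranchInfo() = default;
    explicit BranchInfo(Addr branch_pc) : pc(branch_pc) {}

    // Identity is per static branch: the resolved bit is deliberately ignored.
    bool operator==(const BranchInfo &other) const { return pc == other.pc; }
    bool operator<(const BranchInfo &other) const { return pc < other.pc; }
};
```

Two `BranchInfo` objects for the same `pc` still compare equal when only one of them has been marked resolved, so existing lookups keyed on equality should be unaffected by the new field.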


447-453: BTB entry resolve marking matches new BranchInfo::resolved semantics

FetchStream::markBTBEntryResolved selectively sets entry.resolved = true for matching pc in updateBTBEntries. Since these are per-stream copies, this cleanly annotates only the entries used during resolve-stage update without polluting the underlying BTB arrays.
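
A small sketch of that marking step, under the assumption that `updateBTBEntries` is a plain vector of per-stream copies. Both structs below are hypothetical reductions, and the member signature is a guess at the shape rather than the PR's actual declaration.

```cpp
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;  // assumption: stand-in for gem5's Addr

// Hypothetical reduction of BTBEntry to the fields used here.
struct BTBEntry
{
    Addr pc = 0;
    bool resolved = false;
};

struct FetchStream
{
    // Per-stream copies; the backing BTB arrays are not touched.
    std::vector<BTBEntry> updateBTBEntries;

    // Tag every copied entry whose pc matches the resolved instruction.
    void
    markBTBEntryResolved(Addr resolved_pc)
    {
        for (auto &entry : updateBTBEntries) {
            if (entry.pc == resolved_pc) {
                entry.resolved = true;
            }
        }
    }
};
```

Entries with a different `pc` keep `resolved == false`, so a later resolved-only filter would skip them.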

src/cpu/pred/btb/abtb.cc (1)

523-528: Clearing resolved on AheadBTB updates is consistent with transient resolve-state usage

Resetting entry_to_write.resolved to false before committing to the BTB keeps the ahead BTB arrays free of stale resolve annotations, while resolve information is carried only in per-stream copies (updateBTBEntries) for ITTAGE-style training. This matches the intended “ephemeral resolve bit” design.

src/cpu/pred/btb/decoupled_bpred.cc (1)

463-467: Resolve-stage flow for exeBranchInfo.resolved and per-entry resolved looks coherent, but verify call ordering

The new bits in this file establish a clear resolve-update contract:

  • On control squashes with a valid static_inst, handleSquash() now builds stream.exeBranchInfo with full type/target info and sets stream.exeBranchInfo.resolved = true.
  • markCFIResolved() tags matching entries in stream.updateBTBEntries by setting entry.resolved = true, so resolve-aware predictors can see which BTB entries in the stream have actually completed.
  • resolveUpdate() then calls update(stream) only on components with getResolvedUpdate()==true (e.g., ITTAGE) when stream.isHit || stream.exeTaken, and finally clears stream.exeBranchInfo.resolved to avoid double-training on later calls.

This wiring is logically sound, but the correctness hinges on external sequencing:

  • prepareResolveUpdateEntries(stream_id) must be called before markCFIResolved(stream_id, resolvedInstPC) so that updateBTBEntries is populated; otherwise markCFIResolved won’t actually mark anything.
  • Only components that are truly resolve-stage–trained should return getResolvedUpdate()==true; others should continue to rely on the existing commit-time updatePredictorComponents() path.

Given the performance regressions reported by the robot, please double-check those two assumptions in the callers (resolve path in the core) and that ITTAGE is the only predictor currently switched to the resolve-update path.

Also applies to: 615-636, 661-672
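
The ordering concern can be shown with a toy model. The function names below are borrowed from the comment above, but their signatures, the `Stream` struct, and the `predictedBranchPCs` field are simplified assumptions made only for this sketch.

```cpp
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;  // assumption: stand-in for gem5's Addr

struct BTBEntry
{
    Addr pc = 0;
    bool resolved = false;
};

// Hypothetical stream: predictedBranchPCs stands in for whatever the real
// code copies entries from.
struct Stream
{
    std::vector<Addr> predictedBranchPCs;
    std::vector<BTBEntry> updateBTBEntries;
};

// Step 1: populate the per-stream update list.
inline void
prepareResolveUpdateEntries(Stream &s)
{
    s.updateBTBEntries.clear();
    for (Addr pc : s.predictedBranchPCs) {
        s.updateBTBEntries.push_back(BTBEntry{pc, false});
    }
}

// Step 2: tag matching entries; returns how many were tagged.
inline int
markCFIResolved(Stream &s, Addr resolved_pc)
{
    int marked = 0;
    for (auto &e : s.updateBTBEntries) {
        if (e.pc == resolved_pc) {
            e.resolved = true;
            ++marked;
        }
    }
    return marked;
}

// Step 3: count entries a resolve-trained component would train on.
inline int
resolveUpdate(const Stream &s)
{
    int trained = 0;
    for (const auto &e : s.updateBTBEntries) {
        trained += e.resolved ? 1 : 0;
    }
    return trained;
}
```

Calling `markCFIResolved` before `prepareResolveUpdateEntries` tags nothing in this model, because the list is still empty, and the subsequent `resolveUpdate` then trains on zero entries.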

src/cpu/pred/btb/mbtb.cc (1)

432-467: MBTB resolved-bit handling for new vs. existing entries is internally consistent

In MBTB::getAndSetNewBTBEntry():

  • On a BTB miss with stream.exeTaken, you now initialize the new BTBEntry from stream.exeBranchInfo and explicitly copy stream.exeBranchInfo.resolved into entry_to_write.resolved before storing it in stream.updateNewBTBEntry.

In MBTB::buildUpdatedEntry():

  • Regardless of whether you start from an existing entry or the request entry, you now reset entry_to_write.resolved = false before updating counters/targets and writing back.

This achieves a clean separation:

  • Transient resolve information is carried on per-stream state (updateNewBTBEntry and updateBTBEntries) for resolve-stage predictors.
  • The physical MBTB arrays and victim cache never retain stale resolved state across updates.

The only thing to watch is timing: for new branches, getAndSetNewBTBEntry() must be called in the resolve-stage preparation path before resolveUpdate() clears stream.exeBranchInfo.resolved, otherwise updateNewBTBEntry.resolved will always be false and ITTAGE won’t see resolved new-branch allocations as intended. Please confirm that the core’s resolve callback orders prepareResolveUpdateEntries(), markCFIResolved(), and resolveUpdate() accordingly.

Also applies to: 584-595
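
The timing dependency for new entries can be sketched as follows. All three helper names (`setNewEntryOnMiss`, `buildEntryForWriteback`, `finishResolveUpdate`) are hypothetical and only loosely model `getAndSetNewBTBEntry()`, `buildUpdatedEntry()`, and the tail of `resolveUpdate()` as described above.

```cpp
#include <cstdint>

using Addr = std::uint64_t;  // assumption: stand-in for gem5's Addr

// Hypothetical reductions of the real structs to the fields discussed above.
struct BranchInfo { Addr pc = 0; Addr target = 0; bool resolved = false; };
struct BTBEntry   { Addr pc = 0; Addr target = 0; bool resolved = false; };

struct FetchStream
{
    bool exeTaken = false;
    BranchInfo exeBranchInfo;
    BTBEntry updateNewBTBEntry;
};

// Miss path: build the new entry and carry the transient resolved bit along.
inline void
setNewEntryOnMiss(FetchStream &stream)
{
    if (!stream.exeTaken) {
        return;
    }
    BTBEntry entry_to_write;
    entry_to_write.pc = stream.exeBranchInfo.pc;
    entry_to_write.target = stream.exeBranchInfo.target;
    entry_to_write.resolved = stream.exeBranchInfo.resolved;
    stream.updateNewBTBEntry = entry_to_write;
}

// Write-back path: the copy stored in the BTB array never keeps the bit.
inline BTBEntry
buildEntryForWriteback(const BTBEntry &from_stream)
{
    BTBEntry entry_to_write = from_stream;
    entry_to_write.resolved = false;
    return entry_to_write;
}

// Tail of the resolve update: clear the flag to avoid double training.
inline void
finishResolveUpdate(FetchStream &stream)
{
    stream.exeBranchInfo.resolved = false;
}
```

If `finishResolveUpdate` runs before `setNewEntryOnMiss`, the new entry is created with `resolved == false` in this model, which is the failure mode the comment asks the author to rule out.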

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: e1c5b82
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.24 | 19.89 | -23.35 🔴 |

[Generated by GEM5 Performance Robot]
commit: e1c5b82
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 15.24 | 15.23 | +0.09 🟢 |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)

222-232: Resolved vs non‑resolved BTB update filter looks correct; consider tightening comment and duplication

The conditional filtering cleanly preserves old behavior when getResolvedUpdate() is false and adds the expected e.resolved requirement when it is true, so functionally this aligns with “train only on resolved indirect, non‑return branches”.

Two minor follow‑ups:

  • The comment // // only update indirect branches that are not returns is now slightly inaccurate for the resolved‑update path and is double‑commented. Consider clarifying it to mention the resolved requirement, or just drop it.
  • The two remove_if + erase blocks differ only by the extra && e.resolved. You could reduce duplication by capturing getResolvedUpdate() in a predicate and calling remove_if once, but that’s just a readability nit.

Given the performance regressions in the robot reports, it’s worth double‑checking that all intended training entries (especially stream.updateNewBTBEntry) have BTBEntry::resolved set when getResolvedUpdate() is true; otherwise they’ll be silently filtered out here.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between e1c5b82 and 2ac6ecb.

📒 Files selected for processing (2)
  • configs/example/kmhv3.py (1 hunks)
  • src/cpu/pred/btb/btb_ittage.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • configs/example/kmhv3.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/btb_ittage.cc (1)
src/cpu/pred/btb/test/decoupled_bpred.hh (6)
  • e (153-165)
  • e (153-153)
  • e (167-179)
  • e (167-167)
  • e (181-186)
  • e (181-181)

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7747 | 📉 -0.0082 (-0.46%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 2ac6ecb
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.24 | 19.89 | -23.39 🔴 |

jensen-yan
jensen-yan previously approved these changes Dec 4, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)

222-232: Update the comment to reflect conditional filtering behavior.

The implementation correctly introduces two-path conditional filtering based on getResolvedUpdate(). However, the comment on line 222 states "only update indirect branches that are not returns" which no longer fully describes the behavior when getResolvedUpdate() is true (where the resolved flag is also required).

Apply this diff to clarify the comment:

-    // // only update indirect branches that are not returns
     if (getResolvedUpdate()) {
+        // Only update indirect branches that are not returns AND are resolved
         auto remove_it =
             std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(),
                            [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn && e.resolved); });
         all_entries_to_update.erase(remove_it, all_entries_to_update.end());
     } else {
+        // Only update indirect branches that are not returns (original behavior)
         auto remove_it = std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(),
                                         [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn); });
         all_entries_to_update.erase(remove_it, all_entries_to_update.end());
     }

Based on PR comments, the automated performance tests show significant regressions (-22% to -23% vs master across multiple commits). While this change correctly implements the resolve-stage training feature, please verify that:

  1. The performance delta is expected for this feature
  2. Unit tests cover both code paths (resolvedUpdate true/false)
  3. The resolved flag is being set correctly on BTBEntry instances
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2ac6ecb and a4c5c74.

📒 Files selected for processing (2)
  • configs/example/kmhv3.py (1 hunks)
  • src/cpu/pred/btb/btb_ittage.cc (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/pred/btb/btb_ittage.cc (1)
src/cpu/pred/btb/test/decoupled_bpred.hh (6)
  • e (153-165)
  • e (153-153)
  • e (167-179)
  • e (167-167)
  • e (181-186)
  • e (181-181)
configs/example/kmhv3.py (3)
src/cpu/pred/btb/timed_base_pred.hh (1)
  • resolvedUpdate (77-77)
src/cpu/pred/btb/test/abtb.test.cc (1)
  • abtb (59-67)
src/cpu/pred/btb/test/btb_tage.test.cc (1)
  • tage (276-282)
🔇 Additional comments (1)
configs/example/kmhv3.py (1)

100-100: LGTM! Configuration correctly enables resolve-stage training for ITTAGE.

The configuration changes properly enable ITTAGE with resolve-stage training, aligning with the PR objective and matching the pattern used for mbtb and tage predictors. The past review comment requesting this change has been appropriately addressed after establishing a baseline.

However, the automated performance reports in the PR comments show consistent regressions of -22% to -23% versus master across multiple commits. Please verify whether:

  1. This performance impact is expected for resolve-stage training
  2. The regression is specific to the test workload or affects general performance
  3. Any tuning or optimization opportunities exist to mitigate the impact

Also applies to: 107-107

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7747 | 📉 -0.0082 (-0.46%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: a4c5c74
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.37 | 19.83 | -22.49 🔴 |

Change-Id: I6a01e4ecb12119e1d2040eff44f28ceeefb3886d
Change-Id: Ie7812db8bc256535f019bd59e36d5688312122ef
Change-Id: I13a7033fce6096dd2ad7a5e3c58f935486b28a63

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)

222-232: Consider consolidating the duplicate filter logic.

The two branches share nearly identical code with only the lambda condition differing. You could refactor to reduce duplication:

-    // // only update indirect branches that are not returns
-    if (getResolvedUpdate()) {
-        auto remove_it =
-            std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(),
-                           [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn && e.resolved); });
-        all_entries_to_update.erase(remove_it, all_entries_to_update.end());
-    } else {
-        auto remove_it = std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(),
-                                        [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn); });
-        all_entries_to_update.erase(remove_it, all_entries_to_update.end());
-    }
+    // only update indirect branches that are not returns
+    bool resolvedUpdate = getResolvedUpdate();
+    auto remove_it = std::remove_if(
+        all_entries_to_update.begin(), all_entries_to_update.end(),
+        [resolvedUpdate](const BTBEntry &e) {
+            return !(e.isIndirect && !e.isReturn && (!resolvedUpdate || e.resolved));
+        });
+    all_entries_to_update.erase(remove_it, all_entries_to_update.end());

Also, line 222 has a double comment prefix (// //) that should be cleaned up.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between a4c5c74 and d97449c.

📒 Files selected for processing (3)
  • configs/example/kmhv3.py (1 hunks)
  • src/cpu/pred/BranchPredictor.py (1 hunks)
  • src/cpu/pred/btb/btb_ittage.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • configs/example/kmhv3.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/btb_ittage.cc (2)
src/cpu/pred/btb/decoupled_bpred.hh (10)
  • e (202-214)
  • e (202-202)
  • e (216-228)
  • e (216-216)
  • e (230-235)
  • e (230-230)
  • e (237-241)
  • e (237-237)
  • e (268-271)
  • e (268-268)
src/cpu/pred/btb/test/decoupled_bpred.hh (6)
  • e (153-165)
  • e (153-153)
  • e (167-179)
  • e (167-167)
  • e (181-186)
  • e (181-181)
🪛 Ruff (0.14.7)
src/cpu/pred/BranchPredictor.py

1003-1003: Param may be undefined, or defined from star imports

(F405)

🔇 Additional comments (1)
src/cpu/pred/BranchPredictor.py (1)

1003-1003: LGTM!

The default value change for usingS3Pred from True to False aligns with the PR's objective of training ITTAGE at the resolve stage. The static analysis warning about Param is a false positive — it's correctly imported via the standard gem5 pattern from m5.params import * on line 29.

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7747 | 📉 -0.0082 (-0.46%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: d97449c
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.31 | 19.83 | -22.76 🔴 |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/cpu/pred/btb/decoupled_bpred.cc (1)

252-261: Tighten ABTB S3 update gating and simplify previous-PC handling

The new it != fetchStreamQueue.end() check is good; it prevents dereferencing an invalid iterator when there is no previous stream entry. However, the current structure still calls abtb->updateUsingS3Pred(...) from the else branch even when abtb->isEnabled() is false. That’s a bit inconsistent with the constructor pattern where disabled components are generally not used, and it duplicates the updateUsingS3Pred call.

You can both (a) avoid touching ABTB when it’s disabled and (b) centralize previous-PC selection with something like:

-        auto it = fetchStreamQueue.find(fsqId-1);
-        if (it != fetchStreamQueue.end() && abtb->isEnabled()) {
-            auto previous_block_startpc = it->second.startPC;
-            abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc);
-        } else {
-            abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], 0);
-        }
+        if (abtb->isEnabled()) {
+            auto it = fetchStreamQueue.find(fsqId - 1);
+            Addr previous_block_startpc = 0;
+            if (it != fetchStreamQueue.end()) {
+                previous_block_startpc = it->second.startPC;
+            }
+            abtb->updateUsingS3Pred(
+                predsOfEachStage[numStages - 1], previous_block_startpc);
+        }

This keeps the “use 0 when no previous stream” behavior, but clearly skips ABTB training when the component is disabled and removes duplicated calls. If updateUsingS3Pred is intentionally callable when disabled, you can still use this pattern for clarity and to document that intent.
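
For readers who want to try the gating logic in isolation, here is a self-contained sketch. It assumes `fetchStreamQueue` behaves like a `std::map` keyed by stream id; `previousStartPCForAbtb`, `FsqID`, and the one-field `FetchStream` are hypothetical names introduced only for this example.

```cpp
#include <cstdint>
#include <map>
#include <optional>

using Addr = std::uint64_t;   // assumption: stand-in for gem5's Addr
using FsqID = std::uint64_t;  // assumption: stream ids are unsigned

// Hypothetical reduction of the stream to the one field used here.
struct FetchStream
{
    Addr startPC = 0;
};

// Returns the start PC to hand to the ABTB update, or std::nullopt when the
// component is disabled and the update should be skipped entirely.
inline std::optional<Addr>
previousStartPCForAbtb(const std::map<FsqID, FetchStream> &fetchStreamQueue,
                       FsqID fsqId, bool abtbEnabled)
{
    if (!abtbEnabled) {
        return std::nullopt;
    }
    // For fsqId == 0 the unsigned subtraction wraps, find() misses, and the
    // fallback of 0 is used, same as any other missing previous stream.
    auto it = fetchStreamQueue.find(fsqId - 1);
    return it != fetchStreamQueue.end() ? it->second.startPC : Addr{0};
}
```

A caller would invoke the ABTB update only when the returned optional holds a value, which keeps the disabled case and the missing-entry case visibly separate.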

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between d97449c and 427f3c0.

📒 Files selected for processing (1)
  • src/cpu/pred/btb/decoupled_bpred.cc (1 hunks)

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7747 | 📉 -0.0082 (-0.46%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 427f3c0
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.32 | 19.83 | -22.74 🔴 |

[Generated by GEM5 Performance Robot]
commit: 427f3c0
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 15.32 | 15.31 | +0.02 🟢 |

Change-Id: If8ad467ec69fc727ce177a942ae3ebc10eceafbe
@github-actions

github-actions bot commented Dec 5, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7747 | 📉 -0.0082 (-0.46%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: ea1a5b2
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.31 | 19.83 | -22.78 🔴 |

@Yakkhini Yakkhini merged commit 4129e13 into xs-dev Dec 5, 2025
3 checks passed
@Yakkhini Yakkhini deleted the ittage-align branch December 5, 2025 03:46
@coderabbitai coderabbitai bot mentioned this pull request Apr 2, 2026