Conversation
WalkthroughThe changes enable the ittage branch predictor in the KMHV3 configuration and modify BTB update logic in btb_ittage.cc and decoupled_bpred.cc to conditionally filter entries based on resolution status and component enablement flags. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
configs/example/kmhv3.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)
222-232: Resolved-update filtering looks sound; consider deduping theremove_ifand double-checkresolvedplumbing.The new gating on
getResolvedUpdate()ande.resolvedpreserves the old behavior whenresolvedUpdateis false and correctly restricts ITTAGE training to resolved, indirect, non-return entries when it is true. The main thing to watch is that all BTB entries you expect to train on—especiallystream.updateNewBTBEntryin the miss case—haveresolvedset by the timeupdate()runs, otherwise they’ll be silently dropped fromall_entries_to_update.You could also slightly simplify this block by using a single predicate that incorporates
getResolvedUpdate()instead of two near-identicalremove_ifcalls; that would reduce duplication and keep the filter logic in one place. The leading// //on the comment is likely a leftover and can be cleaned up as well.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cpu/pred/btb/btb_ittage.cc(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/btb_ittage.cc (1)
src/cpu/pred/btb/decoupled_bpred.hh (10)
e(202-214)e(202-202)e(216-228)e(216-216)e(230-235)e(230-230)e(237-241)e(237-237)e(268-271)e(268-268)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cpu/pred/btb/btb_tage.cc (1)
396-404: Add a clarifying comment explaining why new entries are marked as resolved.The change to set
resolved = trueon new BTB entries is correct and intentional. New entries reach this code path only when execution information is available (stream.exeTaken), and marking them as resolved enables them to participate in resolve-stage training for ITTAGE components whengetResolvedUpdate()is true. This aligns with the PR objective to train ITTAGE at the resolve stage.Add a comment to clarify this intent:
+ // Mark as resolved to enable resolve-stage training for ITTAGE potential_new_entry.resolved = true;
🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_tage.cc (1)
407-415: Consider reducing code duplication in the filtering logic.The filtering logic has duplicated lambda predicates that differ only in the additional
e.resolvedcheck. This could be simplified for better maintainability.Apply this diff to reduce duplication:
// Filter: only keep conditional branches that are not always taken + auto base_filter = [](const BTBEntry &e) { return e.isCond && !e.alwaysTaken; }; if (getResolvedUpdate()) { auto remove_it = std::remove_if(all_entries.begin(), all_entries.end(), - [](const BTBEntry &e) { return !(e.isCond && !e.alwaysTaken && e.resolved); }); + [&base_filter](const BTBEntry &e) { return !(base_filter(e) && e.resolved); }); all_entries.erase(remove_it, all_entries.end()); } else { auto remove_it = std::remove_if(all_entries.begin(), all_entries.end(), - [](const BTBEntry &e) { return !(e.isCond && !e.alwaysTaken); }); + [&base_filter](const BTBEntry &e) { return !base_filter(e); }); all_entries.erase(remove_it, all_entries.end()); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cpu/pred/btb/btb_tage.cc(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/cpu/pred/btb/abtb.cc(1 hunks)src/cpu/pred/btb/decoupled_bpred.cc(2 hunks)src/cpu/pred/btb/mbtb.cc(3 hunks)src/cpu/pred/btb/stream_struct.hh(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/pred/btb/mbtb.cc (2)
src/cpu/pred/btb/mbtb.hh (4)
stream(127-127)stream(163-163)stream(171-171)stream(281-282)src/cpu/pred/btb/btb_ubtb.hh (2)
stream(133-133)stream(139-139)
src/cpu/pred/btb/stream_struct.hh (1)
src/cpu/pred/loop_predictor.hh (1)
bi(238-238)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (5)
src/cpu/pred/btb/stream_struct.hh (2)
81-86: BranchInfo::resolved wiring looks consistent with resolve-stage trainingThe new
BranchInfo::resolvedfield and its initialization in both constructors default tofalse, with explicittrueassignment only at control squash sites. Operators still compare onlypc, so existing equality semantics (per-static-branch) remain unchanged while the resolved bit is purely auxiliary for training logic. This aligns well with the resolve-update design.Also applies to: 93-96, 99-109
447-453: BTB entry resolve marking matches new BranchInfo::resolved semantics
FetchStream::markBTBEntryResolvedselectively setsentry.resolved = truefor matchingpcinupdateBTBEntries. Since these are per-stream copies, this cleanly annotates only the entries used during resolve-stage update without polluting the underlying BTB arrays.src/cpu/pred/btb/abtb.cc (1)
523-528: Clearingresolvedon AheadBTB updates is consistent with transient resolve-state usageResetting
entry_to_write.resolvedtofalsebefore committing to the BTB keeps the ahead BTB arrays free of stale resolve annotations, while resolve information is carried only in per-stream copies (updateBTBEntries) for ITTAGE-style training. This matches the intended “ephemeral resolve bit” design.src/cpu/pred/btb/decoupled_bpred.cc (1)
463-467: Resolve-stage flow forexeBranchInfo.resolvedand per-entryresolvedlooks coherent, but verify call orderingThe new bits in this file establish a clear resolve-update contract:
- On control squashes with a valid
static_inst,handleSquash()now buildsstream.exeBranchInfowith full type/target info and setsstream.exeBranchInfo.resolved = true.markCFIResolved()tags matching entries instream.updateBTBEntriesby settingentry.resolved = true, so resolve-aware predictors can see which BTB entries in the stream have actually completed.resolveUpdate()then callsupdate(stream)only on components withgetResolvedUpdate()==true(e.g., ITTAGE) whenstream.isHit || stream.exeTaken, and finally clearsstream.exeBranchInfo.resolvedto avoid double-training on later calls.This wiring is logically sound, but the correctness hinges on external sequencing:
prepareResolveUpdateEntries(stream_id)must be called beforemarkCFIResolved(stream_id, resolvedInstPC)so thatupdateBTBEntriesis populated; otherwisemarkCFIResolvedwon’t actually mark anything.- Only components that are truly resolve-stage–trained should return
getResolvedUpdate()==true; others should continue to rely on the existing commit-timeupdatePredictorComponents()path.Given the performance regressions reported by the robot, please double-check those two assumptions in the callers (resolve path in the core) and that ITTAGE is the only predictor currently switched to the resolve-update path.
Also applies to: 615-636, 661-672
src/cpu/pred/btb/mbtb.cc (1)
432-467: MBTB resolved-bit handling for new vs. existing entries is internally consistentIn
MBTB::getAndSetNewBTBEntry():
- On a BTB miss with
stream.exeTaken, you now initialize the newBTBEntryfromstream.exeBranchInfoand explicitly copystream.exeBranchInfo.resolvedintoentry_to_write.resolvedbefore storing it instream.updateNewBTBEntry.In
MBTB::buildUpdatedEntry():
- Regardless of whether you start from an existing entry or the request entry, you now reset
entry_to_write.resolved = falsebefore updating counters/targets and writing back.This achieves a clean separation:
- Transient resolve information is carried on per-stream state (
updateNewBTBEntryandupdateBTBEntries) for resolve-stage predictors.- The physical MBTB arrays and victim cache never retain stale
resolvedstate across updates.The only thing to watch is timing: for new branches,
getAndSetNewBTBEntry()must be called in the resolve-stage preparation path beforeresolveUpdate()clearsstream.exeBranchInfo.resolved, otherwiseupdateNewBTBEntry.resolvedwill always be false and ITTAGE won’t see resolved new-branch allocations as intended. Please confirm that the core’s resolve callback ordersprepareResolveUpdateEntries(),markCFIResolved(), andresolveUpdate()accordingly.Also applies to: 584-595
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
e1c5b82 to
2ac6ecb
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)
222-232: Resolved vs non‑resolved BTB update filter looks correct; consider tightening comment and duplicationThe conditional filtering cleanly preserves old behavior when
getResolvedUpdate()is false and adds the expectede.resolvedrequirement when it is true, so functionally this aligns with “train only on resolved indirect, non‑return branches”.Two minor follow‑ups:
- The comment
// // only update indirect branches that are not returnsis now slightly inaccurate for the resolved‑update path and is double‑commented. Consider clarifying it to mention theresolvedrequirement, or just drop it.- The two
remove_if+eraseblocks differ only by the extra&& e.resolved. You could reduce duplication by capturinggetResolvedUpdate()in a predicate and callingremove_ifonce, but that’s just a readability nit.Given the performance regressions in the robot reports, it’s worth double‑checking that all intended training entries (especially
stream.updateNewBTBEntry) haveBTBEntry::resolvedset whengetResolvedUpdate()is true; otherwise they’ll be silently filtered out here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
configs/example/kmhv3.py(1 hunks)src/cpu/pred/btb/btb_ittage.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- configs/example/kmhv3.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/btb_ittage.cc (1)
src/cpu/pred/btb/test/decoupled_bpred.hh (6)
e(153-165)e(153-153)e(167-179)e(167-167)e(181-186)e(181-181)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
2ac6ecb to
a4c5c74
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)
222-232: Update the comment to reflect conditional filtering behavior.The implementation correctly introduces two-path conditional filtering based on
getResolvedUpdate(). However, the comment on line 222 states "only update indirect branches that are not returns" which no longer fully describes the behavior whengetResolvedUpdate()is true (where theresolvedflag is also required).Apply this diff to clarify the comment:
- // // only update indirect branches that are not returns if (getResolvedUpdate()) { + // Only update indirect branches that are not returns AND are resolved auto remove_it = std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(), [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn && e.resolved); }); all_entries_to_update.erase(remove_it, all_entries_to_update.end()); } else { + // Only update indirect branches that are not returns (original behavior) auto remove_it = std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(), [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn); }); all_entries_to_update.erase(remove_it, all_entries_to_update.end()); }Based on PR comments, the automated performance tests show significant regressions (-22% to -23% vs master across multiple commits). While this change correctly implements the resolve-stage training feature, please verify that:
- The performance delta is expected for this feature
- Unit tests cover both code paths (resolvedUpdate true/false)
- The
resolvedflag is being set correctly on BTBEntry instances
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
configs/example/kmhv3.py(1 hunks)src/cpu/pred/btb/btb_ittage.cc(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/pred/btb/btb_ittage.cc (1)
src/cpu/pred/btb/test/decoupled_bpred.hh (6)
e(153-165)e(153-153)e(167-179)e(167-167)e(181-186)e(181-181)
configs/example/kmhv3.py (3)
src/cpu/pred/btb/timed_base_pred.hh (1)
resolvedUpdate(77-77)src/cpu/pred/btb/test/abtb.test.cc (1)
abtb(59-67)src/cpu/pred/btb/test/btb_tage.test.cc (1)
tage(276-282)
🔇 Additional comments (1)
configs/example/kmhv3.py (1)
100-100: LGTM! Configuration correctly enables resolve-stage training for ITTAGE.The configuration changes properly enable ITTAGE with resolve-stage training, aligning with the PR objective and matching the pattern used for mbtb and tage predictors. The past review comment requesting this change has been appropriately addressed after establishing a baseline.
However, the automated performance reports in the PR comments show consistent regressions of -22% to -23% versus master across multiple commits. Please verify whether:
- This performance impact is expected for resolve-stage training
- The regression is specific to the test workload or affects general performance
- Any tuning or optimization opportunities exist to mitigate the impact
Also applies to: 107-107
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
Change-Id: I6a01e4ecb12119e1d2040eff44f28ceeefb3886d
Change-Id: Ie7812db8bc256535f019bd59e36d5688312122ef
Change-Id: I13a7033fce6096dd2ad7a5e3c58f935486b28a63
a4c5c74 to
d97449c
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_ittage.cc (1)
222-232: Consider consolidating the duplicate filter logic.The two branches share nearly identical code with only the lambda condition differing. You could refactor to reduce duplication:
- // // only update indirect branches that are not returns - if (getResolvedUpdate()) { - auto remove_it = - std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(), - [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn && e.resolved); }); - all_entries_to_update.erase(remove_it, all_entries_to_update.end()); - } else { - auto remove_it = std::remove_if(all_entries_to_update.begin(), all_entries_to_update.end(), - [](const BTBEntry &e) { return !(e.isIndirect && !e.isReturn); }); - all_entries_to_update.erase(remove_it, all_entries_to_update.end()); - } + // only update indirect branches that are not returns + bool resolvedUpdate = getResolvedUpdate(); + auto remove_it = std::remove_if( + all_entries_to_update.begin(), all_entries_to_update.end(), + [resolvedUpdate](const BTBEntry &e) { + return !(e.isIndirect && !e.isReturn && (!resolvedUpdate || e.resolved)); + }); + all_entries_to_update.erase(remove_it, all_entries_to_update.end());Also, line 222 has a double comment prefix (
// //) that should be cleaned up.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
configs/example/kmhv3.py(1 hunks)src/cpu/pred/BranchPredictor.py(1 hunks)src/cpu/pred/btb/btb_ittage.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- configs/example/kmhv3.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/btb_ittage.cc (2)
src/cpu/pred/btb/decoupled_bpred.hh (10)
e(202-214)e(202-202)e(216-228)e(216-216)e(230-235)e(230-230)e(237-241)e(237-237)e(268-271)e(268-268)src/cpu/pred/btb/test/decoupled_bpred.hh (6)
e(153-165)e(153-153)e(167-179)e(167-167)e(181-186)e(181-181)
🪛 Ruff (0.14.7)
src/cpu/pred/BranchPredictor.py
1003-1003: Param may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (1)
src/cpu/pred/BranchPredictor.py (1)
1003-1003: LGTM!The default value change for
usingS3PredfromTruetoFalsealigns with the PR's objective of training ITTAGE at the resolve stage. The static analysis warning aboutParamis a false positive — it's correctly imported via the standard gem5 patternfrom m5.params import *on line 29.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/cpu/pred/btb/decoupled_bpred.cc (1)
252-261: Tighten ABTB S3 update gating and simplify previous-PC handlingThe new
it != fetchStreamQueue.end()check is good; it prevents dereferencing an invalid iterator when there is no previous stream entry. However, the current structure still callsabtb->updateUsingS3Pred(...)from theelsebranch even whenabtb->isEnabled()is false. That’s a bit inconsistent with the constructor pattern where disabled components are generally not used, and it duplicates theupdateUsingS3Predcall.You can both (a) avoid touching ABTB when it’s disabled and (b) centralize previous-PC selection with something like:
- auto it = fetchStreamQueue.find(fsqId-1); - if (it != fetchStreamQueue.end() && abtb->isEnabled()) { - auto previous_block_startpc = it->second.startPC; - abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc); - } else { - abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], 0); - } + if (abtb->isEnabled()) { + auto it = fetchStreamQueue.find(fsqId - 1); + Addr previous_block_startpc = 0; + if (it != fetchStreamQueue.end()) { + previous_block_startpc = it->second.startPC; + } + abtb->updateUsingS3Pred( + predsOfEachStage[numStages - 1], previous_block_startpc); + }This keeps the “use 0 when no previous stream” behavior, but clearly skips ABTB training when the component is disabled and removes duplicated calls. If
updateUsingS3Predis intentionally callable when disabled, you can still use this pattern for clarity and to document that intent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/cpu/pred/btb/decoupled_bpred.cc(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
427f3c0 to
ea1a5b2
Compare
Change-Id: If8ad467ec69fc727ce177a942ae3ebc10eceafbe
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.