use startPC instead of alignedPC, to calculate index, tag, bank easier Change-Id: Id76bf1658e4286aad2cd4580943ff724bfb7e1e7
Walkthrough
Refactors BTBTAGE to use the fetch-block start PC (startPC) instead of an explicitly aligned PC when calculating index, tag, and bank.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
[Generated by GEM5 Performance Robot] Ideal BTB Performance: Overall Score
Actionable comments posted: 0
🧹 Nitpick comments (6)
src/cpu/pred/btb/btb_tage.hh (1)
167-195: StartPC-based helper signatures are consistent; consider documenting the implied contract.

The switch to `const Addr &startPC` for `lookupHelper` and the updated `getBranchIndexInBlock(Addr pc, Addr startPC)` signatures match the implementations and simplify call sites. Semantics remain clear as long as callers pass a `startPC` whose block contains all BTB entries being processed. A brief comment on that precondition next to `getBranchIndexInBlock` would help future maintainers avoid misuse.

src/cpu/pred/btb/btb_tage.cc (5)
195-294: Use of startPC in generateSinglePrediction is coherent with the index/tag math.

Switching to `startPC` in `generateSinglePrediction` for `getBranchIndexInBlock(btb_entry.pc, startPC)`, `getTageIndex(startPC, …)`, and `getTageTag(startPC, …, position)` is semantically safe because both `getTageIndex` and `getTageTag` shift by at least `floorLog2(blockSize)`, so the low block-offset bits (where `startPC` and an explicitly aligned PC can differ) are discarded before masking/xor. Using `getBaseTableIndex(startPC)` likewise preserves the original indexing for the base table.

Minor nit: you recompute the branch position later as `branch_idx = getBranchIndexInBlock(btb_entry.pc, startPC);` – you could reuse the earlier `position` to avoid a second subtract/shift, but it's purely cosmetic.
340-379: putPCHistory cleanly separates startPC for indexing from alignedPC for banking.

`putPCHistory` now:

- Accepts `Addr startPC` (consistent with the base `TimedBaseBTBPredictor` interface).
- Derives `alignedPC = startPC & ~(blockSize - 1)` only for bank ID computation and logging.
- Passes `startPC` to `lookupHelper`, which is then used for TAGE index/tag computation.

Given that `getTageIndex`/`getTageTag` shift away the block-offset bits, using `startPC` instead of `alignedPC` there preserves behavior while simplifying call signatures. The comment "use 32byte(blockSize) aligned PC for prediction" is now conceptually true via the shifts rather than via the parameter itself; if you touch this again, you might slightly reword it to reflect that.
560-641: handleNewEntryAllocation's startPC usage is consistent with prediction.

`handleNewEntryAllocation` now:

- Computes `position` via `getBranchIndexInBlock(entry.pc, startPC)`, and
- Uses `getTageIndex(startPC, ti, …)` and `getTageTag(startPC, ti, …, position)`

so that new allocations are keyed by the same index and tag basis as the lookup path. This keeps provider selection and later lookups aligned.

If you want a tiny performance/clarity tweak, you could take `const std::shared_ptr<TageMeta> &meta` since the function doesn't modify or retain the pointer, but the current signature is perfectly acceptable.
805-821: Tag computation from PC and history remains compatible with startPC.

`getTageTag` now forms `pcBits` as `Addr pcBits = (pc >> floorLog2(blockSize)) & mask;`.

When callers pass `startPC`, this effectively uses the bits above the block offset, which carry the same information content as an explicitly aligned PC. Combined with `foldedHist`, `altFoldedHist`, and `position`, this keeps tags stable under the new startPC-based API.

If in future you decide to use per-table PC shifts (e.g., from `tablePcShifts`), this would be the natural place to integrate them, but nothing here blocks that.
891-899: Deriving alignedPC inside getBranchIndexInBlock simplifies callers; assumptions unchanged.

`getBranchIndexInBlock` now takes `(Addr pc, Addr startPC)` and derives:

```cpp
Addr alignedPC = startPC & ~(blockSize - 1);
Addr offset = (pc - alignedPC) >> 1;
assert(offset < maxBranchPositions);
```

This removes the need to compute an aligned PC at each call site while preserving the original assumption that all relevant branch PCs reside within the same block window anchored by `startPC`. Given that this was already required by the previous API, behavior is unchanged.

If you ever worry about this being called with a mismatched `startPC`, you could strengthen the debug check (e.g., additionally asserting `pc >= alignedPC`) or guard against out-of-range offsets in non-assert builds, but that's not strictly necessary for this refactor.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/cpu/pred/btb/btb_tage.cc (14 hunks)
- src/cpu/pred/btb/btb_tage.hh (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/pred/btb/btb_tage.cc (5)
src/cpu/pred/btb/btb_tage.hh (19)
btb_entry(403-405), startPC(168-169), startPC(417-424), pred(370-370), history(130-130), history(134-135), history(139-142), history(139-139), history(144-148), history(144-145), history(161-161), history(202-202), history(382-382), stream(155-155), stream(408-408), entry(88-90), entry(152-152), entry(411-414), startAddr(122-124)
src/cpu/pred/btb/btb_mgsc.hh (6)
btb_entry(254-255), stream(162-162), stream(258-258), entry(160-160), entry(260-261), startAddr(134-135)
src/cpu/pred/btb/stream_struct.hh (4)
startPC(380-382), startPC(704-734), startPC(704-709), entry(489-511)
src/cpu/pred/btb/timed_base_pred.hh (6)
entry(75-75), entry(75-75), entry(80-80), entry(80-80), startAddr(59-61), startAddr(59-61)
src/cpu/pred/btb/uras.hh (1)
startAddr(43-44)
src/cpu/pred/btb/btb_tage.hh (2)
src/cpu/pred/btb/stream_struct.hh (9)
startPC(380-382), startPC(704-734), startPC(704-709), pc(738-758), pc(738-741), pc(770-780), pc(770-771), pc(785-793), pc(785-785)
src/cpu/pred/btb/jump_ahead_predictor.hh (2)
startPC(155-164), startPC(155-155)
🔇 Additional comments (4)
src/cpu/pred/btb/btb_tage.hh (1)
400-425: Unified startPC API for prediction/allocation helpers looks good.

Updating `generateSinglePrediction` and `handleNewEntryAllocation` to take `startPC` keeps the prediction and allocation paths aligned with each other and with the rest of the BTB predictor interfaces. The declarations match the definitions in the `.cc` file; no issues from an interface or correctness perspective.

src/cpu/pred/btb/btb_tage.cc (3)
296-330: lookupHelper's startPC threading aligns with the new API.

`lookupHelper` now takes `startPC` and forwards it into `generateSinglePrediction`, which is exactly what you want after the signature change. The metadata population (`meta->preds[...]`) and stats updates are unchanged and remain consistent with the prediction/update paths.
429-555: Update path now uses the same startPC-derived indices as prediction.

In `updatePredictorStateAndCheckAllocation`, pulling `Addr startPC = stream.getRealStartPC();` and then using `getBaseTableIndex(startPC)` and `getBranchIndexInBlock(entry.pc, startPC)` ensures the base-table index and branch position used on update match those used during prediction. That keeps the fallback counter training consistent with how predictions were computed. No correctness issues spotted here.
648-789: update() correctly reuses the real start PC for recompute and allocation.

The update routine now consistently uses `startAddr = stream.getRealStartPC()` when:

- Recomputing providers via `generateSinglePrediction(btb_entry, startAddr, predMeta)` when `updateOnRead` is true, and
- Calling `handleNewEntryAllocation(startAddr, …)`.

This keeps recomputation and allocation fully in sync with the prediction-time use of `startPC` without affecting the existing bank-conflict logic, which still relies on `alignedPC` for `getBankId`. Looks solid.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed!
Change-Id: I608a50da30297fbc90de925374de5b34c2bdbf9d
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/cpu/pred/btb/btb_tage.cc (3)
37-51: Document/guard geometry assumptions for `blockWidth`, `bankBaseShift`, and `indexShift`.

The new members:

- `blockWidth` (`floorLog2(blockSize)`)
- `bankBaseShift` (`instShiftAmt`)
- `indexShift` (`bankBaseShift + ceilLog2(numBanks)`)

implicitly assume:

- `blockSize` is a power of two (used via `floorLog2` and `startPC & ~(blockSize - 1)`),
- `baseTableSize` and `useAltOnNaSize` are powers of two (used with `- 1` masks),
- `numBanks > 0`, and in practice often a power of two.

These match current defaults, but custom configs that violate them will get non-obvious behavior (e.g., unreachable bank IDs or uneven base-table coverage).

Consider adding debug-time asserts (or at least comments) in the ctor to state these invariants explicitly, e.g. `assert(isPowerOf2(blockSize))`, `assert(isPowerOf2(baseTableSize))`, `assert(isPowerOf2(useAltOnNaSize))`, and `assert(numBanks > 0)`.

Also applies to: 70-91
196-228: Ensure `startPC`/`startAddr` is exactly the same value on the prediction and update paths.

The predictor is now keyed off `startPC`/`startAddr` rather than an explicit `alignedPC`:

- Prediction: `putPCHistory(Addr startPC, ...)` records `lastPredBankId = getBankId(startPC)` and calls `lookupHelper(startPC, ...)`; `lookupHelper` calls `generateSinglePrediction(btb_entry, startPC, ...)`.
- Update: `update(const FetchStream &stream)` derives `Addr startAddr = stream.getRealStartPC();`, uses `getBankId(startAddr)` for conflict detection, and, when `updateOnRead`, recomputes with `generateSinglePrediction(btb_entry, startAddr, predMeta)`. State updates use `getBaseTableIndex(startPC)`/`getBranchIndexInBlock(entry.pc, startPC)` with `startPC` taken from the stream.

This is logically consistent only if the `startPC` passed into `putPCHistory` is guaranteed to equal `stream.getRealStartPC()` later for the same stream entry. If they can ever diverge (e.g., due to alignment adjustments, replay, or front-end steering), you'll get:

- bank-ID mismatches (`lastPredBankId` vs `updateBank`),
- base-table and TAGE table updates landing in different sets than the original lookup, degrading accuracy and possibly explaining some of the observed regression.

Suggestion: either

- enforce `startPC == stream.getRealStartPC()` via an assert in debug builds, or
- store the exact `startPC` used for prediction in `TageMeta` (or the stream) and consume that in `update()` instead of recomputing from `getRealStartPC()`.

At minimum, it's worth double-checking the call sites in the front-end pipeline to confirm this invariant holds after the `alignedPC` → `startPC` refactor.

Also applies to: 271-276, 305-321, 347-376, 428-444, 566-587, 648-734
800-809: Double-check PC shifting and bank/index geometry against the intended RTL mapping.

The helper set:

- `getTageTag`: `pcBits = (pc >> bankBaseShift) & mask;`
- `getTageIndex`: `pcShift = enableBankConflict ? indexShift : bankBaseShift;`
- `getUseAltIdx`: `shiftedPc = pc >> instShiftAmt;`
- `getBaseTableIndex`: `(pc >> blockWidth) & (baseTableSize - 1);`
- `getBranchIndexInBlock`: `alignedPC = startPC & ~(blockSize - 1); offset = (branchPC - alignedPC) >> instShiftAmt;`
- `getBankId`: `(pc >> bankBaseShift) & ((1 << bankIdWidth) - 1);`

introduces three different shift domains:

- instruction granularity via `instShiftAmt`/`bankBaseShift`,
- block granularity via `blockWidth = floorLog2(blockSize)`,
- bank interleaving via `indexShift = bankBaseShift + ceilLog2(numBanks)`.

This is a reasonable factoring, but it subtly changes which PC bits feed tags/index vs. banks when compared to an `alignedPC`-based scheme. Given the sizeable Ideal BTB regression reported on this PR, this is one of the likely hot spots:

- With `enableBankConflict == true`, `getTageIndex` starts from `indexShift`, so TAGE indices ignore the bank-select bits used by `getBankId`.
- With `enableBankConflict == false`, indices start from `bankBaseShift`, sharing low bits with bank IDs.
- `getBranchIndexInBlock` now derives alignment from `blockSize`, while `maxBranchPositions` still assumes a 64-byte logical block in the comment, so the effective range depends on the relation between `blockSize`, `instShiftAmt`, and your actual fetch width.

I strongly recommend:

- Cross-checking these shifts and masks against the RTL/TAGE spec (especially the intended `cfiPosition`/block layout).
- Verifying that for the default config (`blockSize=32`, `instShiftAmt` for your ISA, `numBanks=4`) the new derived indices and tags for a few sample PCs match what the hardware model expects.
- Optionally adding comments near these helpers summarizing which PC bits they consume (e.g., "bankId uses bits [bankBaseShift .. bankBaseShift+bankIdWidth-1]").

This will help ensure the refactor is behaviorally equivalent to the intended design and not inadvertently degrading prediction quality.

Also applies to: 826-837, 869-879, 881-895
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/cpu/pred/BranchPredictor.py (1 hunks)
- src/cpu/pred/btb/btb_tage.cc (17 hunks)
- src/cpu/pred/btb/btb_tage.hh (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cpu/pred/btb/btb_tage.hh
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/btb_tage.cc (2)
src/cpu/pred/btb/btb_tage.hh (28)
btb_entry(403-405), startPC(168-169), startPC(417-424), pred(370-370), history(130-130), history(134-135), history(139-142), history(139-139), history(144-148), history(144-145), history(161-161), history(201-201), history(382-382), stream(155-155), stream(408-408), entry(88-90), entry(152-152), entry(411-414), pc(172-172), pc(175-175), pc(179-179), pc(183-183), pc(186-188), pc(186-186), pc(191-191), pc(198-198), pc(286-286), branchPC(194-194)
src/cpu/pred/btb/stream_struct.hh (10)
startPC(380-382), startPC(704-734), startPC(704-709), entry(489-511), pc(738-758), pc(738-741), pc(770-780), pc(770-771), pc(785-793), pc(785-785)
🪛 Ruff (0.14.6)
src/cpu/pred/BranchPredictor.py
1060-1060: Param may be undefined, or defined from star imports
(F405)
🔇 Additional comments (1)
src/cpu/pred/BranchPredictor.py (1)
1037-1062: Reconsider defaulting `BTBTAGE.enableBankConflict` to `True` (likely cause of performance regression).

Changing `enableBankConflict` from the previous default `False` to `True` is a semantic change affecting every config instantiating `BTBTAGE`, including `MicroTAGE` and `DecoupledBPUWithBTB.tage`.

With `enableBankConflict` now on by default:

- `getTageIndex` uses `pcShift = indexShift` instead of `bankBaseShift`, altering which PC bits are used for indexing.
- The update path drops updates when `updateBank == lastPredBankId`, reducing effective training frequency.
- This behavior change would show up as a performance regression in benchmarks expecting conflict-free indexing.

Given that this alters indexing semantics for all BTBTAGE-based configs:

- Verify the regression impact by re-running the workflow with `enableBankConflict` forced to `False`, to isolate how much of any performance change comes from the default flip vs. the other index-logic changes, and
- Consider keeping the default at `False` and requiring configs that want realistic bank-conflict simulation to opt in explicitly.

If the goal is to validate bank conflicts in specific studies, making this opt-in (or at least clearly documented) will avoid surprising users who expect the previous ideal-case behavior from the default BTBTAGE configuration.
[Generated by GEM5 Performance Robot] Ideal BTB Performance: Overall Score
Change-Id: I41edcd55edfffd5225319f8d49dc6b985fbbdec9
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/Gem5_Docs/frontend/TAGE_BANK_IMPLEMENTATION_PLAN.md (1)
59-59: Update stale API reference in code example.

The code snippet at line 59 shows `getBankId(alignedPC)`, but the PR objective indicates the change replaces `alignedPC` with `startPC` throughout. This should be updated for consistency with the implemented changes.

```diff
- lastPredBankId = getBankId(alignedPC);
+ lastPredBankId = getBankId(startPC);
```
🧹 Nitpick comments (2)
src/cpu/pred/btb/test/btb_tage.test.cc (2)
251-271: Clarify and guard `findTableWithEntry`'s dependence on prediction meta

The new helper correctly uses `(startPC, branchPC)` and `indexFoldedHist` to mirror how the predictor indexed the last fetch block. However, it silently assumes:

- `tage->getPredictionMeta()` is non-null and still corresponds to a prior `putPCHistory` for this `startPC`, and
- `meta->indexFoldedHist[t]` is valid for all `t < numPredictors`.

Consider adding a brief comment or a GTest `ASSERT_NE(meta, nullptr)` (and, if practical, asserts on `indexFoldedHist[t]`) to make these preconditions explicit and fail fast if the helper is ever called out of sequence in future tests.
887-919: BankConflict test: good bank mapping; consider centralizing bank computation

The added comment documenting bank IDs from `(pc >> 1) & 0x3` with `instShiftAmt == 1`, and the use of `0x100` vs `0x104` in Test 2, correctly exercise a "different bank" case under the new startPC-based banking scheme.

To avoid future drift between comments, magic constants, and BTBTAGE's actual bank function, consider introducing a small `getBank(Addr)` helper in the test (or reusing an existing API) and deriving the test PCs from that, instead of hard-coding specific values in comments and calls.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docs/Gem5_Docs/frontend/TAGE_BANK_IMPLEMENTATION_PLAN.md (2 hunks)
- src/cpu/pred/btb/test/btb_tage.test.cc (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/pred/btb/test/btb_tage.test.cc (2)
src/cpu/pred/btb/btb_ubtb.hh (1)
meta(144-147)
src/cpu/pred/btb/btb_ittage.hh (2)
entry(66-66), entry(107-107)
🪛 LanguageTool
docs/Gem5_Docs/frontend/TAGE_BANK_IMPLEMENTATION_PLAN.md
[uncategorized] ~25-~25: Did you mean 位"于"? (Chinese typo suggestion)
Context: ... the low `ceilLog2(numBanks)` bits serve as the bank id, and the remaining higher bits are combined with the folded history to obtain the index / tag. This allows directly ...
(YU7_YU8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (2)
src/cpu/pred/btb/test/btb_tage.test.cc (2)
300-306: BasicPrediction's `findTableWithEntry` call matches new API semantics

Passing `0x1000` as both `startPC` and `branchPC` is consistent with the single-branch-per-block scenario under test and with the updated `(startPC, branchPC)` helper. No issues from a semantics perspective.
532-535: MultipleBranchSequence correctly distinguishes per-branch allocation

Using a shared `startPC` (`0x1000`) with different `branchPC` values (`0x1000` vs `0x1004`) when calling `findTableWithEntry` is a good fit for the new indexing model: you probe the same fetch-block index while discriminating entries by `entry.pc`. This aligns with the test's intention that only the mispredicted second branch should allocate.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed!