Conversation
Change-Id: I75c3f5f704e39ac4dfd16909ce5da365a2de6a91
for tage aligning Change-Id: I41078918deeb1b6a514bcfb0bb5df2df578910d0
Change-Id: I732d89766545b668bfce47ffe6be9f9b7161d569
WalkthroughThis PR modifies the instruction execution window (IEW) module to introduce local resolve queue management with sorting, and adjusts the branch predictor configuration by enabling resolved update flags while disabling specific predictor components. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Comment |
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cpu/o3/iew.cc (1)
1569-1581: Based on my complete verification of the codebase, the original review comment is accurate and well-founded. All three concerns have been confirmed:
Multi-threading correctness issue is real:
resolveQueueis a shared vector with no per-thread tracking, yet entries are forwarded only to the first active thread'siewInfoslot.Lifetime management issue is real:
resolveQueueis never cleared in state transition methods, risking stale entry contamination.Ordering convention is subtle but intentional: Descending sort with
back()pop is correct but easy to misread.
Shared
resolveQueuewithout per-thread tracking creates correctness issues for multi-threaded configurationsThe implementation that accumulates into
IEW::resolveQueueand forwards entries to the front-end raises critical concerns:
Multi-threading semantics are broken
resolveQueueis a single sharedstd::vector<ResolveQueueEntry>, butResolveQueueEntrycontains noThreadIDfield to identify which thread generated each entry.- In
SquashCheckAfterExe()(lines 1569-1581), resolve entries are added without any thread context:for (auto &entry : resolveQueue) { if (entry.resolvedFSQId == fsqId) { entry.resolvedInstPC.push_back(pc); found = true; } }- At the end of
executeInsts()(lines 1796-1801), all entries are forwarded into a single thread's slot:ThreadID tid = *activeThreads->begin(); // Always first thread sortResolveQueue(); if (!resolveQueue.empty()) { ResolveQueueEntry entry = resolveQueue.back(); resolveQueue.pop_back(); toFetch->iewInfo[tid].resolveQueue.push_back(entry); }- If O3 is configured for multi-threading (SMT), this means all resolved updates from any thread get funneled into the first active thread's
iewInfoslot. Threads 1, 2, etc. will never see their own resolve updates or will see updates destined for other threads.- Fix: Either add a
ThreadIDfield toResolveQueueEntryand partition/forward by thread inexecuteInsts(), or makeresolveQueueper-thread indexed (e.g.,std::vector<ResolveQueueEntry> resolveQueue[MaxThreads];).
resolveQueueis never explicitly cleared across pipeline state transitions
resolveQueueis never cleared inclearStates(),startupStage(), or equivalent handlers.- This allows stale entries to persist across drains or CPU handovers, contaminating subsequent contexts.
- Add explicit cleanup:
resolveQueue.clear();inclearStates()and other appropriate state-reset methods.Sort +
back()convention is subtle
- The comparator orders entries in descending FSQ id, so
back()returns the smallest FSQ id (oldest). This is intentional but easy to misread.- Add a clarifying comment or consider using
front()with a reversed comparator for clarity.Given that this path trains the MBTB/TAGE predictor with resolved updates, and multi-threaded O3 may be used in future configs, these issues should be addressed before merging.
Also applies to: 1796-1801
🧹 Nitpick comments (1)
src/cpu/o3/iew.hh (1)
497-503: Ensure header declares its dependency onstd::sort
sortResolveQueue()usesstd::sort, but this header doesn’t include<algorithm>. Relying on transitive includes can be fragile; it’s safer foriew.hhto include<algorithm>itself so any TU including this header compiles independently.You may also want to add a brief comment describing that
resolveQueueEntryCompareorders entries by descendingresolvedFSQId, with callers expected to useback()to get the oldest (smallest) FSQ id, just to make the intent obvious to future readers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
configs/example/kmhv3.py(1 hunks)src/cpu/o3/iew.cc(3 hunks)src/cpu/o3/iew.hh(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
configs/example/kmhv3.py (3)
src/cpu/pred/btb/timed_base_pred.hh (1)
resolvedUpdate(77-77)src/cpu/pred/btb/test/btb_tage.test.cc (1)
tage(275-281)src/cpu/pred/btb/test/abtb.test.cc (1)
abtb(59-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (1)
configs/example/kmhv3.py (1)
93-107: DecoupledBPUWithBTB predictor config looks coherent with resolved-update flowEnabling
mbtb.tage.resolvedUpdateand disablingabtb,ittage,mgsc, andrasfor this config is internally consistent and matches the PR’s goal of using resolved updates with a trimmed predictor set. No functional issues spotted here.
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
Summary by CodeRabbit
Chores
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.