cpu-o3: move resolve queue merging from iew to fetch stage#635
cpu-o3: move resolve queue merging from iew to fetch stage#635jensen-yan merged 4 commits intoxs-devfrom
Conversation
Change-Id: Iaba29431fecd59d250c48cc566cb9b18140c5098
WalkthroughTransfers resolve-queue ownership and runtime buffering from IEW to Fetch, replaces the per-stage Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/cpu/o3/fetch.cc (2)
1535-1535: Use const reference to avoid copying.The loop variable
resolvedInstPCshould be a const reference to avoid unnecessary copying ofuint64_tvalues. While the performance impact is minimal foruint64_t, using references is more consistent with the pattern used on line 1506 (const auto &resolved).- for (const auto resolvedInstPC : entry.resolvedInstPC) { + for (const auto& resolvedInstPC : entry.resolvedInstPC) {
253-259: Consider adding.prereq()for new statistics.The new statistics (
resolveQueueFullCycles,resolveQueueFullEvents,resolveEnqueueFailEvent) are added but don't have.prereq()calls like other statistics in this file (e.g., lines 261-296). Adding.prereq()ensures they only appear in output when non-zero, which is consistent with the existing pattern.Add after line 328:
resolveQueueFullCycles .prereq(resolveQueueFullCycles); resolveQueueFullEvents .prereq(resolveQueueFullEvents); resolveEnqueueFailEvent .prereq(resolveEnqueueFailEvent);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/cpu/o3/comm.hh(1 hunks)src/cpu/o3/fetch.cc(3 hunks)src/cpu/o3/fetch.hh(3 hunks)src/cpu/o3/iew.cc(2 hunks)src/cpu/o3/iew.hh(0 hunks)
💤 Files with no reviewable changes (1)
- src/cpu/o3/iew.hh
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/o3/iew.cc (1)
src/cpu/o3/iew.hh (16)
tid(173-173)tid(200-200)tid(226-226)tid(258-258)tid(258-258)tid(265-265)tid(265-265)tid(287-290)tid(287-287)tid(315-315)tid(320-320)tid(323-323)tid(326-326)tid(328-328)tid(334-334)tid(335-335)
src/cpu/o3/fetch.cc (1)
src/cpu/pred/btb/decoupled_bpred.hh (3)
stream_id(818-818)stream_id(819-819)stream_id(820-820)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (5)
src/cpu/o3/fetch.hh (1)
616-620: LGTM! Resolve queue member declarations are appropriate.The new
resolveQueueSizeconstant andresolveQueuedeque are properly declared. The use ofstd::dequeis a good choice as it supports efficient front removal and back insertion needed for FIFO queue semantics.src/cpu/o3/comm.hh (1)
251-257: LGTM! New ResolvedCFIEntry structure is well-designed.The structure is clean and provides the necessary fields (
fsqIdandpc) for communicating resolved control-flow information from IEW to Fetch. Using astd::vectoris appropriate for the per-cycle batch of resolved CFIs.src/cpu/o3/fetch.cc (1)
1504-1528: Potential issue: ZeroresolveQueueSizedisables new entry creation entirely.When
resolveQueueSizeis 0, the condition on line 1520 (resolveQueueSize && resolveQueue.size() >= resolveQueueSize) will short-circuit and never be true. However, this also means entries will still be merged if they match an existingfsqIdon lines 1508-1513, but since no new entries can be created (the else branch on line 1520 is always taken when size is 0), the queue will always be empty after startup.If
resolveQueueSize == 0is intended to mean "unlimited", the condition should be:- if (resolveQueueSize && resolveQueue.size() >= resolveQueueSize) { + if (resolveQueueSize > 0 && resolveQueue.size() >= resolveQueueSize) {Or if
resolveQueueSize == 0means "disabled", this should be documented. Please clarify the intended behavior.src/cpu/o3/iew.cc (2)
1567-1573: Clarify: All executed instructions now produce CFI entries, not just control-flow.The
if (inst->isControl())condition is commented out, meaning every executed instruction will produce aResolvedCFIEntry. This changes the semantics from "resolved control-flow instructions" to "all resolved instructions."If this is intentional (e.g., to track all instructions for some purpose), the comment
// if (inst->isControl())should be removed and the member nameresolvedCFIsmight be misleading. If only control-flow instructions should produce entries, the condition should be uncommented.Is this change intentional? If so, consider:
- Removing the commented-out condition
- Potentially renaming
resolvedCFIstoresolvedInstsif it now tracks all instructions
1670-1670: LGTM! Proper clearing of resolvedCFIs at cycle start.Clearing
resolvedCFIsat the beginning ofexecuteInsts()ensures each cycle starts fresh, and the vector is populated with only the current cycle's resolved instructions.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
Change-Id: Ib5159561facea8cf1e8fc4d185a8ba8772a06045
Change-Id: I12bf9bddd5ced6bb4574347ef7e7392e23c0b558
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
configs/example/kmhv3.py (1)
93-100: Clarify why TAGE bank conflicts are disabled for this configThe new
cpu.branchPred.tage.enableBankConflict = Falseis straightforward, but the reason for force‑disabling bank‑conflict handling isn’t obvious from context (especially with other bank‑related knobs in this script). Consider adding a short comment explaining whether this is for performance modeling, to match RTL, or to avoid double‑counting conflicts, so future changes to the BPU don’t “fix” this by accident.src/cpu/o3/iew.cc (1)
1567-1572: ResolvedCFIEntry population: confirmpcsemantics and consideremplace_back.The control‑flow logging looks reasonable, but two points to double‑check:
- If
ResolvedCFIEntry::pcis meant to hold the resolved next/fall‑through PC rather than the branch’s own PC, please confirm thatinst->getPC()is the correct value to store here (vs e.g. a post‑execute PC state or target field frompcState()). This depends on how Fetch consumesResolvedCFIEntry::pcand isn’t obvious from this file alone.- Minor: you can avoid the temporary
entryby usingemplace_back, which is a bit cleaner and may be marginally cheaper:- auto &resolved_cfis = toFetch->iewInfo[tid].resolvedCFIs; - TimeStruct::IewComm::ResolvedCFIEntry entry; - entry.fsqId = inst->getFsqId(); - entry.pc = inst->getPC(); - resolved_cfis.push_back(entry); + auto &resolved_cfis = toFetch->iewInfo[tid].resolvedCFIs; + resolved_cfis.emplace_back(TimeStruct::IewComm::ResolvedCFIEntry{ + .fsqId = inst->getFsqId(), + .pc = inst->getPC(), + });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
configs/example/kmhv3.py(1 hunks)src/cpu/o3/iew.cc(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
configs/example/kmhv3.py (1)
src/cpu/pred/btb/test/btb_tage.test.cc (1)
tage(276-282)
src/cpu/o3/iew.cc (1)
src/cpu/o3/iew.hh (16)
inst(203-203)inst(208-208)inst(211-211)inst(214-214)inst(217-217)inst(223-223)inst(268-268)inst(271-271)inst(271-271)inst(280-280)inst(293-293)inst(295-295)inst(307-307)inst(312-312)inst(370-370)inst(596-596)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
This reverts commit 93c3177.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
Change-Id: Iaba29431fecd59d250c48cc566cb9b18140c5098
Summary by CodeRabbit
Refactor
Monitoring
Chores
✏️ Tip: You can customize this high-level summary in your review settings.