cpu-o3: remove simple functions in fetch#725
Conversation
📝 WalkthroughWalkthroughFetch and BTB code were refactored to consume FSQ head entries instead of FTQ, removing the FetchTargetQueue and FTQ helpers, updating DecoupledBPUWithBTB to deque-backed FSQ, adjusting fetch/I-cache logic, tests, docs, and a small fetch-to-decode delay parameter change. Changes
Sequence Diagram(s)sequenceDiagram
participant Fetch as O3::Fetch
participant BPU as DecoupledBPUWithBTB
participant FSQ as FetchStreamQueue (FSQ)
participant ICache as ICache
Fetch->>BPU: request fsqHasHead / fsqHeadId
BPU-->>FSQ: frontStreamId / fsqHead()
FSQ-->>BPU: FetchStream (startPC, predEndPC, predTaken, predBranchInfo)
BPU-->>Fetch: supplying startPC/target/predTaken
Fetch->>ICache: issue I$ request aligned to FSQ startPC
ICache-->>Fetch: return cacheline(s)
Fetch->>Fetch: fill fetchBuffer, compute nextPC from FSQ stream
Estimated Code Review Effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Change-Id: Ifbe3eabd992b9682c7a0ed8c52f61c014ad219a7
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 2058-2075: The early-return when fetchBuffer[tid].valid is true
must also verify the current PC is still inside the cached range; update
sendNextCacheRequest() to replace the unconditional "if (fetchBuffer[tid].valid)
return;" with a guarded check that fetchBuffer[tid].valid is true AND
pc_state.instAddr() is >= fetchBuffer[tid].startPC and <
(fetchBuffer[tid].startPC + fetchBufferSize) before returning. Use the existing
symbols fetchBuffer[tid].valid, fetchBuffer[tid].startPC, fetchBufferSize and
pc_state.instAddr() so that if the PC has advanced outside the range the
function falls through to fetchCacheLine(ftq_start_pc, tid, pc_state.instAddr())
(and the FTQ/ftq_entry logic) to reissue an I-cache request.
🧹 Nitpick comments (2)
src/cpu/o3/fetch.cc (2)
844-848: Remove duplicate branch-taken log.Line 844 already logs the taken-branch target; the existing message at Lines 846-848 repeats the same content and doubles log volume. Consider keeping one.
🧹 Suggested cleanup
- DPRINTF(Fetch, "[tid:%i] [sn:%llu] Branch at PC %#x " - "predicted to go to %s\n", - tid, inst->seqNum, inst->pcState().instAddr(), next_pc);
1799-1808: Consider checking interrupts before the early return.Line 1801 returns before the interrupt check, so interrupt pending no longer short-circuits fetch prep here (relying on
fetchCacheLine()later). If you intended to preserve the previous “interrupt blocks fetch” behavior/metrics, move the interrupt guard above the early return.🛠️ Optional reorder to keep interrupt gating early
- if (!macroop[tid] && !fetchBuffer[tid].valid) { - return true; - } else if (checkInterrupt(this_pc.instAddr()) && !delayedCommit[tid]) { + if (checkInterrupt(this_pc.instAddr()) && !delayedCommit[tid]) { // Stall CPU if an interrupt is posted ++fetchStats.miscStallCycles; DPRINTF(Fetch, "[tid:%i] Fetch is stalled!\n", tid); return false; + } + if (!macroop[tid] && !fetchBuffer[tid].valid) { + return true; }
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
This commit introduces a significant refactor of the fetch mechanism by removing the Fetch Target Queue (FTQ) and transitioning to a Fetch Stream Queue (FSQ) model. The changes include: - Deletion of the FTQ class and its associated dependencies. - Direct consumption of FSQ entries by the fetch process, simplifying the fetch logic. - Adjustments to maintain legacy ID compatibility while ensuring a single queue model. - Code simplifications to reduce complexity and improve maintainability. This refactor aligns the fetch implementation more closely with RTL design principles, enhancing performance and clarity in the fetch stage. Change-Id: I7a8b312785a7cce526192de7125cea458b979c0e
Change-Id: Iba0f86103b735ce4394bb975c319b4360e1b3a9b
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 836-840: There are duplicate DPRINTF debug statements in the fetch
stage that both log branch prediction info using the same vars (tid,
inst->seqNum, inst->pcState().instAddr(), next_pc); remove one of the redundant
DPRINTF calls (either the first or second block) in src/cpu/o3/fetch.cc so only
a single branch prediction log remains (keep the preferred message format and
delete the other), ensuring references to inst, tid, and next_pc remain correct.
♻️ Duplicate comments (1)
src/cpu/o3/fetch.cc (1)
2050-2067: FSQ-based cache request initiation looks correct.The implementation correctly derives
start_pcfromfsqHead().startPCand initializesfetchBuffer[tid].startPCbefore issuing the cache request. ThefsqEmpty()guard prevents access when no FSQ entry is available.However, the concern about missing PC range validation at line 2050 (checking only
fetchBuffer[tid].validwithout verifying the current PC is within the buffer range) remains from a previous review.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
Change-Id: Ifbe3eabd992b9682c7a0ed8c52f61c014ad219a7
Summary by CodeRabbit
Refactor
Behavior Change
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.