
cpu-o3: remove simple functions in fetch #725

Merged
jensen-yan merged 3 commits into xs-dev from simplify-fetch2-align on Jan 23, 2026

@jensen-yan
Collaborator

@jensen-yan jensen-yan commented Jan 22, 2026

Change-Id: Ifbe3eabd992b9682c7a0ed8c52f61c014ad219a7

Summary by CodeRabbit

  • Refactor

    • Streamlined instruction fetch and branch-prediction flow to a single stream-based path, simplifying fetch initiation and buffer alignment.
    • Removed legacy helper utilities and consolidated fetch logic for improved maintainability.
  • Behavior Change

    • Increased fetch-to-decode delay from 2 to 3 cycles.
  • Documentation

    • Added a migration plan describing a stream-only fetch/branch-predict approach.

@coderabbitai

coderabbitai bot commented Jan 22, 2026

Walkthrough

Fetch and BTB code were refactored to consume FSQ head entries instead of FTQ entries. The change removes the FetchTargetQueue class and its FTQ helpers, moves DecoupledBPUWithBTB to a deque-backed FSQ, and adjusts the fetch/I-cache logic, tests, and docs. It also raises the fetch-to-decode delay parameter from 2 to 3 cycles.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Fetch: FTQ → FSQ rewrite | src/cpu/o3/fetch.cc, src/cpu/o3/fetch.hh, src/cpu/o3/trace/TraceFetch.cc | Replaced FTQ-oriented logic with FSQ-based accessors (fsqHasHead/fsqHead/fsqHeadId/fsqHeadFtqId), updated PC end/taken logic to use stream.predEndPC and predBranchInfo, removed Fetch::needNewFTQEntry and Fetch::getNextFTQStartPC, and unified logging/messages. |
| Decoupled BPU / BTB refactor | src/cpu/pred/btb/decoupled_bpred.cc, src/cpu/pred/btb/decoupled_bpred.hh, src/cpu/pred/btb/decoupled_bpred_stats.cc | Replaced map-based FTQ handling with deque-backed FetchStream queue, added fsq head accessors, removed fetchTargetQueue interactions and FTQ helpers, adjusted squash/commit/update flows, and updated stats/tracing to use fsqId. |
| Removed FetchTargetQueue implementation & API | src/cpu/pred/btb/fetch_target_queue.cc, src/cpu/pred/btb/fetch_target_queue.hh | Deleted the FetchTargetQueue class and related types/APIs and all enqueue/supply/finish/squash/reset methods. |
| Tests removed / test SConscript updates | src/cpu/pred/btb/test/* (e.g., decoupled_bpred.cc, decoupled_bpred.hh, decoupled_bpred.test.cc, fetch_target_queue.test.cc, SConscript) | Removed BTB/FTQ unit tests and test scaffolding for decoupled_bpred and FetchTargetQueue; removed test entries from build scripts. |
| Documentation / design doc | docs/Gem5_Docs/frontend/fsq-only-bpu-plan.md | Added FSQ-only BTB/BPU migration plan describing phases, invariants, and rollout; planning-level content only. |
| Config parameter change | src/cpu/o3/BaseO3CPU.py | Increased fetchToDecodeDelay from 2 to 3 cycles. |

Sequence Diagram(s)

sequenceDiagram
  participant Fetch as O3::Fetch
  participant BPU as DecoupledBPUWithBTB
  participant FSQ as FetchStreamQueue (FSQ)
  participant ICache as ICache

  Fetch->>BPU: request fsqHasHead / fsqHeadId
  BPU-->>FSQ: frontStreamId / fsqHead()
  FSQ-->>BPU: FetchStream (startPC, predEndPC, predTaken, predBranchInfo)
  BPU-->>Fetch: supplying startPC/target/predTaken
  Fetch->>ICache: issue I$ request aligned to FSQ startPC
  ICache-->>Fetch: return cacheline(s)
  Fetch->>Fetch: fill fetchBuffer, compute nextPC from FSQ stream
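
The head-consumption flow in the diagram above can be sketched as a small deque-backed queue. This is a minimal illustration under stated assumptions, not the gem5 code: the accessor names fsqHasHead, fsqHead, and fsqHeadId and the fields startPC, predEndPC, and predTaken come from the walkthrough, while FsqSketch, enqueue, popHead, predTarget, and nextPcAfterHead are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using Addr = std::uint64_t;

// Hypothetical, trimmed-down stream entry. Only startPC, predEndPC and
// predTaken are named in the walkthrough; predTarget is an assumption that
// stands in for the target carried by predBranchInfo.
struct FetchStream {
    std::uint64_t id;
    Addr startPC;
    Addr predEndPC;
    bool predTaken;
    Addr predTarget;
};

// Hypothetical deque-backed queue exposing head accessors to fetch.
class FsqSketch {
  public:
    bool fsqHasHead() const { return !fsq.empty(); }

    const FetchStream &fsqHead() const {
        assert(fsqHasHead());
        return fsq.front();
    }

    std::uint64_t fsqHeadId() const { return fsqHead().id; }

    // Append a predicted stream; ids increase monotonically.
    void enqueue(Addr start, Addr end, bool taken, Addr target) {
        fsq.push_back({nextId++, start, end, taken, target});
    }

    // Drop the head once fetch has consumed it.
    void popHead() {
        assert(fsqHasHead());
        fsq.pop_front();
    }

  private:
    std::deque<FetchStream> fsq;
    std::uint64_t nextId = 0;
};

// Assumed rule: after the head stream, fetch continues at the predicted
// target if the stream ends in a taken branch, else at predEndPC.
inline Addr nextPcAfterHead(const FetchStream &s) {
    return s.predTaken ? s.predTarget : s.predEndPC;
}
```

In this sketch fetch would check fsqHasHead() before touching fsqHead(), which mirrors the guard the diagram places in front of the I-cache request.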

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

perf

Suggested reviewers

  • tastynoob
  • Yakkhini

Poem

🐰
I nibble codes and hop on streams,
FTQ fades in my springtime dreams,
FSQ leads fetch with nimble feet,
Cleaner paths and fewer treats,
A carrot patch of code complete! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title 'cpu-o3: remove simple functions in fetch' is misleading. The PR primarily refactors fetch/BPU to remove the FTQ and adopt a single FSQ model with substantial changes across multiple files. Function removal is only one aspect of a much larger architectural change. | Revise the title to reflect the main objective, such as 'cpu-o3: refactor fetch to remove FTQ and adopt FSQ-only model' or 'cpu-o3: migrate fetch from FTQ to FSQ-based head consumption'. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


Change-Id: Ifbe3eabd992b9682c7a0ed8c52f61c014ad219a7

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 2058-2075: The early-return when fetchBuffer[tid].valid is true
must also verify the current PC is still inside the cached range; update
sendNextCacheRequest() to replace the unconditional "if (fetchBuffer[tid].valid)
return;" with a guarded check that fetchBuffer[tid].valid is true AND
pc_state.instAddr() is >= fetchBuffer[tid].startPC and <
(fetchBuffer[tid].startPC + fetchBufferSize) before returning. Use the existing
symbols fetchBuffer[tid].valid, fetchBuffer[tid].startPC, fetchBufferSize and
pc_state.instAddr() so that if the PC has advanced outside the range the
function falls through to fetchCacheLine(ftq_start_pc, tid, pc_state.instAddr())
(and the FTQ/ftq_entry logic) to reissue an I-cache request.
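
The guarded check described above can be sketched as a standalone predicate. This is a hedged illustration only: FetchBufferState and pcInFetchBuffer are hypothetical names standing in for fetchBuffer[tid] and the inline condition in sendNextCacheRequest(), and the real types in fetch.cc may differ.

```cpp
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical stand-in for the per-thread fetchBuffer[tid] state; only the
// two fields the review comment refers to are modeled.
struct FetchBufferState {
    bool valid;
    Addr startPC;
};

// Returns true only when the buffer is valid AND pc lies inside
// [startPC, startPC + fetchBufferSize). The subtraction form avoids
// overflow when startPC + fetchBufferSize would wrap.
inline bool pcInFetchBuffer(const FetchBufferState &buf, Addr pc,
                            Addr fetchBufferSize)
{
    return buf.valid && pc >= buf.startPC &&
           pc - buf.startPC < fetchBufferSize;
}
```

Under this sketch the early return would become `if (pcInFetchBuffer(...)) return;`, so a PC that has advanced past the buffered range falls through to a new I-cache request.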
🧹 Nitpick comments (2)
src/cpu/o3/fetch.cc (2)

844-848: Remove duplicate branch-taken log.

Line 844 already logs the taken-branch target; the existing message at Lines 846-848 repeats the same content and doubles log volume. Consider keeping one.

🧹 Suggested cleanup
-    DPRINTF(Fetch, "[tid:%i] [sn:%llu] Branch at PC %#x "
-            "predicted to go to %s\n",
-            tid, inst->seqNum, inst->pcState().instAddr(), next_pc);

1799-1808: Consider checking interrupts before the early return.

Line 1801 returns before the interrupt check, so a pending interrupt no longer short-circuits fetch prep here (it relies on fetchCacheLine() later). If you intended to preserve the previous “interrupt blocks fetch” behavior/metrics, move the interrupt guard above the early return.

🛠️ Optional reorder to keep interrupt gating early
-        if (!macroop[tid] && !fetchBuffer[tid].valid) {
-            return true;
-        } else if (checkInterrupt(this_pc.instAddr()) && !delayedCommit[tid]) {
+        if (checkInterrupt(this_pc.instAddr()) && !delayedCommit[tid]) {
             // Stall CPU if an interrupt is posted
             ++fetchStats.miscStallCycles;
             DPRINTF(Fetch, "[tid:%i] Fetch is stalled!\n", tid);
             return false;
+        }
+        if (!macroop[tid] && !fetchBuffer[tid].valid) {
+            return true;
         }

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.1981 | - |
| This PR | 2.1981 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: c3832c2
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 17.65 | 17.64 | +0.02 🟢 |

This commit introduces a significant refactor of the fetch mechanism by removing the Fetch Target Queue (FTQ) and transitioning to a Fetch Stream Queue (FSQ) model. The changes include:

- Deletion of the FTQ class and its associated dependencies.
- Direct consumption of FSQ entries by the fetch process, simplifying the fetch logic.
- Adjustments to maintain legacy ID compatibility while ensuring a single queue model.
- Code simplifications to reduce complexity and improve maintainability.

This refactor aligns the fetch implementation more closely with RTL design principles, enhancing performance and clarity in the fetch stage.

Change-Id: I7a8b312785a7cce526192de7125cea458b979c0e
Change-Id: Iba0f86103b735ce4394bb975c319b4360e1b3a9b

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 836-840: There are duplicate DPRINTF debug statements in the fetch
stage that both log branch prediction info using the same vars (tid,
inst->seqNum, inst->pcState().instAddr(), next_pc); remove one of the redundant
DPRINTF calls (either the first or second block) in src/cpu/o3/fetch.cc so only
a single branch prediction log remains (keep the preferred message format and
delete the other), ensuring references to inst, tid, and next_pc remain correct.
♻️ Duplicate comments (1)
src/cpu/o3/fetch.cc (1)

2050-2067: FSQ-based cache request initiation looks correct.

The implementation correctly derives start_pc from fsqHead().startPC and initializes fetchBuffer[tid].startPC before issuing the cache request. The fsqEmpty() guard prevents access when no FSQ entry is available.

However, the concern about missing PC range validation at line 2050 (checking only fetchBuffer[tid].valid without verifying the current PC is within the buffer range) remains from a previous review.

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.1981 | - |
| This PR | 2.1727 | 📉 -0.0254 (-1.16%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 7a7f9e2
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 17.56 | 17.64 | -0.46 🔴 |

[Generated by GEM5 Performance Robot]
commit: 7a7f9e2
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 17.56 | 17.65 | -0.48 🔴 |
