cpu-o3: remove simple functions in fetch by jensen-yan · Pull Request #725 · OpenXiangShan/GEM5

jensen-yan · 2026-01-22T07:31:15Z

Change-Id: Ifbe3eabd992b9682c7a0ed8c52f61c014ad219a7

Summary by CodeRabbit

Refactor
- Streamlined instruction fetch and branch-prediction flow to a single stream-based path, simplifying fetch initiation and buffer alignment.
- Removed legacy helper utilities and consolidated fetch logic for improved maintainability.
Behavior Change
- Increased fetch-to-decode delay from 2 to 3 cycles.
Documentation
- Added a migration plan describing a stream-only fetch/branch-predict approach.

coderabbitai · 2026-01-22T07:31:43Z

📝 Walkthrough

Fetch and BTB code were refactored to consume FSQ head entries instead of FTQ entries. The PR removes the FetchTargetQueue class and its FTQ helpers, moves DecoupledBPUWithBTB to a deque-backed FSQ, adjusts the fetch/I-cache logic, tests, and docs, and raises the fetch-to-decode delay from 2 to 3 cycles.

Changes

Cohort / File(s)	Summary
Fetch: FTQ → FSQ rewrite `src/cpu/o3/fetch.cc`, `src/cpu/o3/fetch.hh`, `src/cpu/o3/trace/TraceFetch.cc`	Replaced FTQ-oriented logic with FSQ-based accessors (fsqHasHead/fsqHead/fsqHeadId/fsqHeadFtqId), updated PC end/taken logic to use stream.predEndPC and predBranchInfo, removed Fetch::needNewFTQEntry and Fetch::getNextFTQStartPC, and unified logging/messages.
Decoupled BPU / BTB refactor `src/cpu/pred/btb/decoupled_bpred.cc`, `src/cpu/pred/btb/decoupled_bpred.hh`, `src/cpu/pred/btb/decoupled_bpred_stats.cc`	Replaced map-based FTQ handling with deque-backed FetchStream queue, added fsq head accessors, removed fetchTargetQueue interactions and FTQ helpers, adjusted squash/commit/update flows, and updated stats/tracing to use fsqId.
Removed FetchTargetQueue implementation & API `src/cpu/pred/btb/fetch_target_queue.cc`, `src/cpu/pred/btb/fetch_target_queue.hh`	Deleted the FetchTargetQueue class and related types/APIs and all enqueue/supply/finish/squash/reset methods.
Tests removed / test SConscript updates `src/cpu/pred/btb/test/*` (e.g., `decoupled_bpred.cc`, `decoupled_bpred.hh`, `decoupled_bpred.test.cc`, `fetch_target_queue.test.cc`, `SConscript`)	Removed BTB/FTQ unit tests and test scaffolding for decoupled_bpred and FetchTargetQueue; removed test entries from build scripts.
Documentation / design doc `docs/Gem5_Docs/frontend/fsq-only-bpu-plan.md`	Added FSQ-only BTB/BPU migration plan describing phases, invariants, and rollout; planning-level content only.
Config parameter change `src/cpu/o3/BaseO3CPU.py`	Increased `fetchToDecodeDelay` from 2 to 3 cycles.

Sequence Diagram(s)

sequenceDiagram
  participant Fetch as O3::Fetch
  participant BPU as DecoupledBPUWithBTB
  participant FSQ as FetchStreamQueue (FSQ)
  participant ICache as ICache

  Fetch->>BPU: request fsqHasHead / fsqHeadId
  BPU-->>FSQ: frontStreamId / fsqHead()
  FSQ-->>BPU: FetchStream (startPC, predEndPC, predTaken, predBranchInfo)
  BPU-->>Fetch: supplying startPC/target/predTaken
  Fetch->>ICache: issue I$ request aligned to FSQ startPC
  ICache-->>Fetch: return cacheline(s)
  Fetch->>Fetch: fill fetchBuffer, compute nextPC from FSQ stream
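
The head-consumption flow in the diagram can be illustrated with a small standalone sketch. This is not the PR's code: `FetchStreamSketch`, `FsqSketch`, `popHead`, `nextFetchPC`, and the `predTarget` field are hypothetical names introduced here for illustration (the PR itself derives the target from `predBranchInfo`), and the next-PC rule is an assumption about how a taken or not-taken stream ends. Only `fsqHasHead`, `fsqHead`, `fsqHeadId`, `startPC`, `predEndPC`, and `predTaken` are names taken from the walkthrough.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, simplified stand-in for one fetch stream entry.
struct FetchStreamSketch {
    std::uint64_t startPC;     // first PC covered by the stream
    std::uint64_t predEndPC;   // predicted fall-through end of the stream
    bool predTaken;            // whether the stream ends in a taken branch
    std::uint64_t predTarget;  // assumed field: target of that taken branch
};

// Deque-backed queue exposing head accessors, loosely mirroring the
// accessor names mentioned in the walkthrough.
class FsqSketch {
  public:
    void push(const FetchStreamSketch &s) { streams.push_back(s); }

    bool fsqHasHead() const { return !streams.empty(); }

    const FetchStreamSketch &fsqHead() const {
        assert(fsqHasHead());
        return streams.front();
    }

    // IDs increase monotonically as head entries are consumed.
    std::uint64_t fsqHeadId() const {
        assert(fsqHasHead());
        return headId;
    }

    // Hypothetical helper: fetch is done with the head entry.
    void popHead() {
        assert(fsqHasHead());
        streams.pop_front();
        ++headId;
    }

  private:
    std::deque<FetchStreamSketch> streams;
    std::uint64_t headId = 0;
};

// Assumed rule: a taken stream redirects to its target, otherwise
// fetch continues at the predicted end PC.
inline std::uint64_t nextFetchPC(const FetchStreamSketch &s) {
    return s.predTaken ? s.predTarget : s.predEndPC;
}
```

With a single queue, fetch only ever inspects the front entry, which is the simplification over a separate FTQ that the walkthrough describes.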

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

cpu-o3: simplify fetch， only support decoupled BTB mode #721 — Similar FSQ/FTQ refactor migrating fetch and predictor flow to FSQ head semantics.
cpu-o3: update fetch to decode delay parameters 4 -> 2 #676 — Also modifies the fetch-to-decode delay parameter in BaseO3CPU.
Sc ut #710 — Overlapping changes in BTB/prediction interfaces and history handling.

Suggested labels

perf

Suggested reviewers

tastynoob
Yakkhini

Poem

🐰
I nibble codes and hop on streams,
FTQ fades in my springtime dreams,
FSQ leads fetch with nimble feet,
Cleaner paths and fewer treats,
A carrot patch of code complete! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title 'cpu-o3: remove simple functions in fetch' is misleading. The PR primarily refactors fetch/BPU to remove the FTQ and adopt a single FSQ model with substantial changes across multiple files. Function removal is only one aspect of a much larger architectural change.	Revise the title to reflect the main objective, such as 'cpu-o3: refactor fetch to remove FTQ and adopt FSQ-only model' or 'cpu-o3: migrate fetch from FTQ to FSQ-based head consumption'.
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

Change-Id: Ifbe3eabd992b9682c7a0ed8c52f61c014ad219a7

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/cpu/o3/fetch.cc`:
- Around line 2058-2075: The early-return when fetchBuffer[tid].valid is true
must also verify the current PC is still inside the cached range; update
sendNextCacheRequest() to replace the unconditional "if (fetchBuffer[tid].valid)
return;" with a guarded check that fetchBuffer[tid].valid is true AND
pc_state.instAddr() is >= fetchBuffer[tid].startPC and <
(fetchBuffer[tid].startPC + fetchBufferSize) before returning. Use the existing
symbols fetchBuffer[tid].valid, fetchBuffer[tid].startPC, fetchBufferSize and
pc_state.instAddr() so that if the PC has advanced outside the range the
function falls through to fetchCacheLine(ftq_start_pc, tid, pc_state.instAddr())
(and the FTQ/ftq_entry logic) to reissue an I-cache request.
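
The range check suggested above might look like the following standalone sketch. `FetchBufferSketch` and `bufferCoversPC` are hypothetical names used only for illustration; the real `fetchBuffer[tid]` entry in `fetch.cc` has more fields, and the actual early return lives inside `sendNextCacheRequest()`. The subtraction form is an equivalent way of writing `pc < startPC + fetchBufferSize` that avoids overflow near the top of the address space.

```cpp
#include <cstdint>

// Hypothetical, reduced view of one per-thread fetch buffer entry.
struct FetchBufferSketch {
    bool valid;             // buffer holds fetched bytes
    std::uint64_t startPC;  // address of the first buffered byte
};

// Returns true only when the buffer is valid AND pc still falls inside
// [startPC, startPC + fetchBufferSize). A caller would skip issuing a new
// I-cache request only in that case.
inline bool bufferCoversPC(const FetchBufferSketch &buf,
                           std::uint64_t pc,
                           std::uint64_t fetchBufferSize) {
    return buf.valid && pc >= buf.startPC &&
           (pc - buf.startPC) < fetchBufferSize;
}
```

Under this sketch, the early return would become `if (bufferCoversPC(...)) return;`, so a PC that has advanced past the buffered range falls through to a fresh cache request.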

🧹 Nitpick comments (2)

src/cpu/o3/fetch.cc (2)
844-848: Remove duplicate branch-taken log.

Line 844 already logs the taken-branch target; the existing message at Lines 846-848 repeats the same content and doubles log volume. Consider keeping one.
🧹 Suggested cleanup
-    DPRINTF(Fetch, "[tid:%i] [sn:%llu] Branch at PC %#x "
-            "predicted to go to %s\n",
-            tid, inst->seqNum, inst->pcState().instAddr(), next_pc);
1799-1808: Consider checking interrupts before the early return.

Line 1801 returns before the interrupt check, so a pending interrupt no longer short-circuits fetch preparation here (the check now relies on fetchCacheLine() later). If you intended to preserve the previous “interrupt blocks fetch” behavior/metrics, move the interrupt guard above the early return.
🛠️ Optional reorder to keep interrupt gating early
-        if (!macroop[tid] && !fetchBuffer[tid].valid) {
-            return true;
-        } else if (checkInterrupt(this_pc.instAddr()) && !delayedCommit[tid]) {
+        if (checkInterrupt(this_pc.instAddr()) && !delayedCommit[tid]) {
             // Stall CPU if an interrupt is posted
             ++fetchStats.miscStallCycles;
             DPRINTF(Fetch, "[tid:%i] Fetch is stalled!\n", tid);
             return false;
+        }
+        if (!macroop[tid] && !fetchBuffer[tid].valid) {
+            return true;
         }

src/cpu/o3/fetch.cc

github-actions · 2026-01-22T07:52:05Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.1981`	-
This PR	`2.1981`	➡️ `0.0000` (`0.00%`)

✅ Difftest smoke test passed!

XiangShanRobot · 2026-01-22T08:26:14Z

[Generated by GEM5 Performance Robot]
commit: c3832c2
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

	PR	Master	Diff(%)
Score	17.65	17.64	+0.02 🟢

This commit introduces a significant refactor of the fetch mechanism by removing the Fetch Target Queue (FTQ) and transitioning to a Fetch Stream Queue (FSQ) model. The changes include:
- Deletion of the FTQ class and its associated dependencies.
- Direct consumption of FSQ entries by the fetch process, simplifying the fetch logic.
- Adjustments to maintain legacy ID compatibility while ensuring a single queue model.
- Code simplifications to reduce complexity and improve maintainability.

This refactor aligns the fetch implementation more closely with RTL design principles, enhancing performance and clarity in the fetch stage.

Change-Id: I7a8b312785a7cce526192de7125cea458b979c0e

Change-Id: Iba0f86103b735ce4394bb975c319b4360e1b3a9b

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/cpu/o3/fetch.cc`:
- Around line 836-840: There are duplicate DPRINTF debug statements in the fetch
stage that both log branch prediction info using the same vars (tid,
inst->seqNum, inst->pcState().instAddr(), next_pc); remove one of the redundant
DPRINTF calls (either the first or second block) in src/cpu/o3/fetch.cc so only
a single branch prediction log remains (keep the preferred message format and
delete the other), ensuring references to inst, tid, and next_pc remain correct.

♻️ Duplicate comments (1)

src/cpu/o3/fetch.cc (1)

2050-2067: FSQ-based cache request initiation looks correct.

The implementation correctly derives start_pc from fsqHead().startPC and initializes fetchBuffer[tid].startPC before issuing the cache request. The fsqEmpty() guard prevents access when no FSQ entry is available.

However, the concern about missing PC range validation at line 2050 (checking only fetchBuffer[tid].valid without verifying the current PC is within the buffer range) remains from a previous review.

src/cpu/o3/fetch.cc

github-actions · 2026-01-22T10:54:30Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.1981`	-
This PR	`2.1727`	📉 `-0.0254` (`-1.16%`)

✅ Difftest smoke test passed!

XiangShanRobot · 2026-01-22T11:37:49Z

[Generated by GEM5 Performance Robot]
commit: 7a7f9e2
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

	PR	Master	Diff(%)
Score	17.56	17.64	-0.46 🔴

[Generated by GEM5 Performance Robot]
commit: 7a7f9e2
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

	PR	Previous Commit	Diff(%)
Score	17.56	17.65	-0.48 🔴

cpu-o3: remove simple functions in fetch

c3832c2

Change-Id: Ifbe3eabd992b9682c7a0ed8c52f61c014ad219a7

coderabbitai bot reviewed Jan 22, 2026

src/cpu/o3/fetch.cc (outdated, resolved)

jensen-yan added 2 commits January 22, 2026 18:45

cpu-o3: since remove ftq, we need to add 1 cycle flush penalty

7a7f9e2

Change-Id: Iba0f86103b735ce4394bb975c319b4360e1b3a9b

coderabbitai bot reviewed Jan 22, 2026

src/cpu/o3/fetch.cc (resolved)

Yakkhini approved these changes Jan 23, 2026

jensen-yan merged commit 9dd5634 into xs-dev Jan 23, 2026
3 checks passed

jensen-yan deleted the simplify-fetch2-align branch January 23, 2026 06:50

coderabbitai bot mentioned this pull request Feb 4, 2026

Ideal 2-Taken & 2-Fetch #736

Closed

coderabbitai bot mentioned this pull request Mar 22, 2026

cpu,arch-riscv,cpu-o3,bpu: align control-PC semantics, fetch coverage, and owner migration #805

Open

jensen-yan merged 3 commits into xs-dev from simplify-fetch2-align
