cpu-o3: move resolve queue merging from iew to fetch stage by jensen-yan · Pull Request #635 · OpenXiangShan/GEM5

jensen-yan · 2025-12-04T10:45:55Z

Change-Id: Iaba29431fecd59d250c48cc566cb9b18140c5098

Summary by CodeRabbit

Refactor
- Restructured how resolved control‑flow information is communicated between execution and fetch stages; the fetch stage now manages a buffered resolve queue with merging and capacity handling.
Monitoring
- Added frontend metrics to track resolve‑queue fullness and enqueue events for performance analysis.
Chores
- Default config: disabled bank‑conflict handling for a specific branch predictor setup.

coderabbitai · 2025-12-04T10:46:12Z

Walkthrough

Transfers resolve-queue ownership and runtime buffering from IEW to Fetch, replaces the per-stage resolveQueue with a cycle-produced resolvedCFIs list (fsqId, pc) in the communication struct, and updates Fetch to buffer, merge, and capacity-check incoming resolved control-flow entries.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Communication structure `src/cpu/o3/comm.hh` | Adds nested type `TimeStruct::IewComm::ResolvedCFIEntry { uint64_t fsqId; uint64_t pc; }` and member `resolvedCFIs` (vector). Removes `resolveQueue`. |
| Fetch stage (interface + impl) `src/cpu/o3/fetch.hh`, `src/cpu/o3/fetch.cc` | Adds `resolveQueueSize` const, `std::deque<ResolveQueueEntry> resolveQueue`, and three stats (`resolveQueueFullCycles`, `resolveQueueFullEvents`, `resolveEnqueueFailEvent`). Integrates incoming `resolvedCFIs` into `resolveQueue` with merge-on-fsqId, capacity checks, enqueue-fail accounting, and deque pop when front entry is completed. |
| IEW stage (interface + impl) `src/cpu/o3/iew.hh`, `src/cpu/o3/iew.cc` | Removes `resolveQueueSize`, `resolveQueue`, and related IEW stats. Replaces resolve-queue population/propagation with direct appends to `toFetch->iewInfo[tid].resolvedCFIs` and clears `resolvedCFIs` where appropriate. |
| Config `configs/example/kmhv3.py` | Sets `cpu.branchPred.tage.enableBankConflict = False` for `DecoupledBPUWithBTB` in KMHv3 config. |
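
The Fetch-side behavior summarized in the table (merge on fsqId, capacity check, enqueue-fail accounting) can be sketched in isolation. This is a minimal illustration, not the code in `fetch.cc`: the layout of `ResolveQueueEntry`, the helper name `mergeResolvedCFIs`, the single `enqueueFails` counter, and the reading of a capacity of 0 as "no limit" are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Per-cycle message from IEW to Fetch; field names follow the comm.hh summary.
struct ResolvedCFIEntry
{
    std::uint64_t fsqId;
    std::uint64_t pc;
};

// Hypothetical layout of one buffered entry on the Fetch side.
struct ResolveQueueEntry
{
    std::uint64_t fsqId;
    std::vector<std::uint64_t> resolvedInstPC;
};

// Fold one cycle's resolvedCFIs into the queue.
// Assumption: capacity == 0 means "no limit".
// enqueueFails is a hypothetical stand-in for the PR's stats counters.
inline void
mergeResolvedCFIs(std::deque<ResolveQueueEntry> &queue,
                  const std::vector<ResolvedCFIEntry> &resolvedCFIs,
                  std::size_t capacity, std::uint64_t &enqueueFails)
{
    for (const auto &resolved : resolvedCFIs) {
        auto it = std::find_if(queue.begin(), queue.end(),
            [&resolved](const ResolveQueueEntry &e) {
                return e.fsqId == resolved.fsqId;
            });
        if (it != queue.end()) {
            // Same fsqId already buffered: merge instead of allocating.
            it->resolvedInstPC.push_back(resolved.pc);
        } else if (capacity != 0 && queue.size() >= capacity) {
            // No merge target and no room: count the failed enqueue.
            ++enqueueFails;
        } else {
            queue.push_back(ResolveQueueEntry{resolved.fsqId, {resolved.pc}});
        }
    }
    // Holds as long as the queue was within capacity on entry.
    assert(capacity == 0 || queue.size() <= capacity);
}
```

With a capacity of 2, the input (1, 0x100), (1, 0x104), (2, 0x200), (3, 0x300) leaves two queue entries, the first holding two PCs, and records one failed enqueue.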

Sequence Diagram(s)

mermaid
sequenceDiagram
participant IEW as IEW
participant Comm as TimeStruct::IewComm
participant Fetch as Fetch (resolveQueue)
participant BPU as Branch Predictor (BPU)
Note over IEW,Comm: Per-cycle resolution production
IEW->>Comm: append ResolvedCFIEntry(fsqId, pc) to resolvedCFIs
Note over Comm,Fetch: At the same cycle boundary
Fetch->>Comm: read resolvedCFIs
alt existing resolveQueue entry with same fsqId
Fetch->>Fetch: merge PC into existing ResolveQueueEntry.resolvedInstPC
else no merge and space available
Fetch->>Fetch: enqueue new ResolveQueueEntry (bounded by resolveQueueSize)
else no merge and full
Fetch->>Fetch: increment resolveQueueFullEvents / resolveEnqueueFailEvent
end
Note over Fetch,BPU: When front entry completes
Fetch->>BPU: train using ResolveQueueEntry.resolvedInstPC
Fetch->>Fetch: pop front ResolveQueueEntry if processed
Note over IEW,Fetch: resolvedCFIs cleared after consumption
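
The last two steps of the diagram (train from the front entry, then pop it once processed) can be sketched as follows. The `TrainFn` callback and the rule "keep the entry when training does not succeed" are assumptions made for illustration; the thread does not show how Fetch calls into the predictor.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical layout of one buffered entry on the Fetch side.
struct ResolveQueueEntry
{
    std::uint64_t fsqId;
    std::vector<std::uint64_t> resolvedInstPC;
};

// Hypothetical callback: returns true once the predictor accepted the update.
using TrainFn = std::function<bool(const ResolveQueueEntry &)>;

// Try to train from the front entry; pop it only when training succeeded.
// Returns true if an entry was consumed this cycle.
inline bool
drainFront(std::deque<ResolveQueueEntry> &queue, const TrainFn &train)
{
    if (queue.empty())
        return false;
    // Entries are expected to carry at least one resolved PC.
    assert(!queue.front().resolvedInstPC.empty());
    if (!train(queue.front()))
        return false;  // keep the entry and retry on a later cycle
    queue.pop_front();
    return true;
}
```

Popping only after a successful update keeps the queue in FIFO order, at the cost of stalling younger entries behind a front entry that cannot be processed yet.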

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Pay attention to fetch.cc: merging logic for same fsqId, capacity checks, and correct lifecycle (enqueue/pop).
- Verify comm.hh: ResolvedCFIEntry layout and thread-safety/usage across cycles.
- Check IEW changes: all previous resolveQueue references replaced and no dangling uses remain.

Possibly related PRs

Resolve Queue Alignment in Kunminghu-v3 #626 — Modifies IEW resolve-queue and statistics; directly related to resolve-queue ownership decisions.
Resolve align #621 — Also touches resolve-queue handling and inter-stage resolve propagation; connected to alternative implementations.

Suggested labels

perf, align-kmhv3

Suggested reviewers

Yakkhini
tastynoob

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and clearly describes the main change: moving resolve queue merging logic from the IEW stage to the Fetch stage, which is reflected across all modified files. |

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/cpu/o3/fetch.cc (2)
1535-1535: Use const reference to avoid copying.

The loop variable resolvedInstPC should be a const reference to avoid unnecessary copying of uint64_t values. While the performance impact is minimal for uint64_t, using references is more consistent with the pattern used on line 1506 (const auto &resolved).
-        for (const auto resolvedInstPC : entry.resolvedInstPC) {
+        for (const auto& resolvedInstPC : entry.resolvedInstPC) {
253-259: Consider adding .prereq() for new statistics.

The new statistics (resolveQueueFullCycles, resolveQueueFullEvents, resolveEnqueueFailEvent) are added but don't have .prereq() calls like other statistics in this file (e.g., lines 261-296). Adding .prereq() ensures they only appear in output when non-zero, which is consistent with the existing pattern.

Add after line 328:
        resolveQueueFullCycles
            .prereq(resolveQueueFullCycles);
        resolveQueueFullEvents
            .prereq(resolveQueueFullEvents);
        resolveEnqueueFailEvent
            .prereq(resolveEnqueueFailEvent);

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 71f89c1 and b9bb7db.

📒 Files selected for processing (5)

src/cpu/o3/comm.hh (1 hunks)
src/cpu/o3/fetch.cc (3 hunks)
src/cpu/o3/fetch.hh (3 hunks)
src/cpu/o3/iew.cc (2 hunks)
src/cpu/o3/iew.hh (0 hunks)

💤 Files with no reviewable changes (1)

src/cpu/o3/iew.hh

🔇 Additional comments (5)

src/cpu/o3/fetch.hh (1)

616-620: LGTM! Resolve queue member declarations are appropriate.

The new resolveQueueSize constant and resolveQueue deque are properly declared. The use of std::deque is a good choice as it supports efficient front removal and back insertion needed for FIFO queue semantics.

src/cpu/o3/comm.hh (1)

251-257: LGTM! New ResolvedCFIEntry structure is well-designed.

The structure is clean and provides the necessary fields (fsqId and pc) for communicating resolved control-flow information from IEW to Fetch. Using a std::vector is appropriate for the per-cycle batch of resolved CFIs.
src/cpu/o3/fetch.cc (1)
1504-1528: Potential issue: the behavior for resolveQueueSize == 0 is undocumented.

When resolveQueueSize is 0, the condition on line 1520 (resolveQueueSize && resolveQueue.size() >= resolveQueueSize) short-circuits and is never true, so the queue-full branch is never taken. Entries that match an existing fsqId are still merged on lines 1508-1513, and every unmatched entry is enqueued, so a size of 0 behaves as an unbounded queue rather than a disabled one.

If resolveQueueSize == 0 is intended to mean "unlimited", the comparison can be made explicit (for an unsigned size this does not change behavior):
-        if (resolveQueueSize && resolveQueue.size() >= resolveQueueSize) {
+        if (resolveQueueSize > 0 && resolveQueue.size() >= resolveQueueSize) {
Or if resolveQueueSize == 0 means "disabled", the check has to reject new entries in that case, and this should be documented. Please clarify the intended behavior.
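
The two forms of the condition in the diff can be compared in isolation. This is a minimal sketch that assumes resolveQueueSize is an unsigned integer (its declared type is not shown in this thread); the helper names `isFullQuoted`, `isFullSuggested` and `checkFormsAgree` are hypothetical. Under that assumption, both forms report "not full" for a size of 0 at any occupancy, and they agree for every other size as well.

```cpp
#include <cassert>
#include <cstddef>

// The check as quoted from line 1520.
constexpr bool
isFullQuoted(std::size_t resolveQueueSize, std::size_t occupancy)
{
    return resolveQueueSize && occupancy >= resolveQueueSize;
}

// The check as written in the suggested diff.
constexpr bool
isFullSuggested(std::size_t resolveQueueSize, std::size_t occupancy)
{
    return resolveQueueSize > 0 && occupancy >= resolveQueueSize;
}

// With a size of 0 the queue is never reported as full.
static_assert(!isFullQuoted(0, 0), "size 0 is never full");
static_assert(!isFullQuoted(0, 1000), "size 0 is never full");
static_assert(isFullQuoted(4, 4), "a bounded queue fills up at its size");

// Compare both forms over a small range of sizes and occupancies.
inline void
checkFormsAgree(std::size_t limit)
{
    for (std::size_t size = 0; size <= limit; ++size) {
        for (std::size_t occ = 0; occ <= limit + 1; ++occ)
            assert(isFullQuoted(size, occ) == isFullSuggested(size, occ));
    }
}
```

If a size of 0 is meant to disable the queue, the check would need an explicit `resolveQueueSize == 0` case that rejects new entries; neither form above does that.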
src/cpu/o3/iew.cc (2)

1567-1573: Clarify: All executed instructions now produce CFI entries, not just control-flow.

The if (inst->isControl()) condition is commented out, meaning every executed instruction will produce a ResolvedCFIEntry. This changes the semantics from "resolved control-flow instructions" to "all resolved instructions."

If this is intentional (e.g., to track all instructions for some purpose), the comment // if (inst->isControl()) should be removed and the member name resolvedCFIs might be misleading. If only control-flow instructions should produce entries, the condition should be uncommented.

Is this change intentional? If so, consider:
- Removing the commented-out condition
- Potentially renaming resolvedCFIs to resolvedInsts if it now tracks all instructions

1670-1670: LGTM! Proper clearing of resolvedCFIs at cycle start.

Clearing resolvedCFIs at the beginning of executeInsts() ensures each cycle starts fresh, and the vector is populated with only the current cycle's resolved instructions.
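
The two IEW-side points raised in these comments (clearing the vector at the start of the cycle, and whether only control-flow instructions produce entries) can be illustrated with a small sketch. `ExecutedInst`, `produceResolvedCFIs` and the `controlOnly` switch are hypothetical names used only for this example; the real code works on gem5 dynamic instruction objects inside `executeInsts()`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-cycle message from IEW to Fetch; field names follow the comm.hh summary.
struct ResolvedCFIEntry
{
    std::uint64_t fsqId;
    std::uint64_t pc;
};

// Hypothetical, minimal view of an executed instruction.
struct ExecutedInst
{
    std::uint64_t fsqId;
    std::uint64_t pc;
    bool isControl;
};

// Build one cycle's resolvedCFIs from the instructions executed this cycle.
// controlOnly models the isControl() filter discussed in the review.
inline void
produceResolvedCFIs(std::vector<ResolvedCFIEntry> &resolvedCFIs,
                    const std::vector<ExecutedInst> &executed,
                    bool controlOnly)
{
    resolvedCFIs.clear();  // start of cycle: drop last cycle's entries
    for (const auto &inst : executed) {
        if (controlOnly && !inst.isControl)
            continue;
        resolvedCFIs.push_back(ResolvedCFIEntry{inst.fsqId, inst.pc});
    }
    // At most one entry per executed instruction.
    assert(resolvedCFIs.size() <= executed.size());
}
```

With the filter off, every executed instruction yields an entry, which is the behavior the comment on lines 1567-1573 asks about.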

github-actions · 2025-12-04T10:55:43Z

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (`xs-dev`) | `1.7830` | - |
| This PR | `1.7908` | 📈 `+0.0079` (`+0.44%`) |

✅ Difftest smoke test passed!

Change-Id: Ib5159561facea8cf1e8fc4d185a8ba8772a06045

Change-Id: I12bf9bddd5ced6bb4574347ef7e7392e23c0b558

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

configs/example/kmhv3.py (1)

93-100: Clarify why TAGE bank conflicts are disabled for this config

The new cpu.branchPred.tage.enableBankConflict = False is straightforward, but the reason for force‑disabling bank‑conflict handling isn’t obvious from context (especially with other bank‑related knobs in this script). Consider adding a short comment explaining whether this is for performance modeling, to match RTL, or to avoid double‑counting conflicts, so future changes to the BPU don’t “fix” this by accident.
src/cpu/o3/iew.cc (1)
1567-1572: ResolvedCFIEntry population: confirm pc semantics and consider emplace_back.

The control‑flow logging looks reasonable, but two points to double‑check:

If ResolvedCFIEntry::pc is meant to hold the resolved next/fall‑through PC rather than the branch’s own PC, please confirm that inst->getPC() is the correct value to store here (vs e.g. a post‑execute PC state or target field from pcState()). This depends on how Fetch consumes ResolvedCFIEntry::pc and isn’t obvious from this file alone.

Minor: you can avoid the temporary entry by using emplace_back, which is a bit cleaner and may be marginally cheaper:
-    auto &resolved_cfis = toFetch->iewInfo[tid].resolvedCFIs;
-    TimeStruct::IewComm::ResolvedCFIEntry entry;
-    entry.fsqId = inst->getFsqId();
-    entry.pc = inst->getPC();
-    resolved_cfis.push_back(entry);
+    auto &resolved_cfis = toFetch->iewInfo[tid].resolvedCFIs;
+    resolved_cfis.emplace_back(TimeStruct::IewComm::ResolvedCFIEntry{
+        .fsqId = inst->getFsqId(),
+        .pc = inst->getPC(),
+    });
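
Two details of the suggested diff are worth noting. Designated initializers (`.fsqId = ...`) are standard only from C++20 on, and the thread does not state which standard the codebase builds with. Passing an already constructed `ResolvedCFIEntry{...}` to `emplace_back` also still creates a temporary that is then moved into the vector, so for a two-field trivially copyable struct it is equivalent to `push_back`. A minimal sketch of two forms that avoid designated initializers (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-cycle message from IEW to Fetch; field names follow the comm.hh summary.
struct ResolvedCFIEntry
{
    std::uint64_t fsqId;
    std::uint64_t pc;
};

// push_back with a braced aggregate initializer.
inline void
appendWithPushBack(std::vector<ResolvedCFIEntry> &resolvedCFIs,
                   std::uint64_t fsqId, std::uint64_t pc)
{
    resolvedCFIs.push_back({fsqId, pc});
    assert(resolvedCFIs.back().pc == pc);
}

// emplace_back given a temporary: builds the entry, then moves it in.
inline void
appendWithEmplaceBack(std::vector<ResolvedCFIEntry> &resolvedCFIs,
                      std::uint64_t fsqId, std::uint64_t pc)
{
    resolvedCFIs.emplace_back(ResolvedCFIEntry{fsqId, pc});
    assert(resolvedCFIs.back().pc == pc);
}
```

Both helpers leave the same element in the vector, so the choice between them is stylistic for a struct of this size.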

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b9bb7db and 3d477e2.

📒 Files selected for processing (2)

configs/example/kmhv3.py (1 hunks)
src/cpu/o3/iew.cc (2 hunks)

github-actions · 2025-12-04T11:04:12Z

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (`xs-dev`) | `1.7830` | - |
| This PR | `1.8593` | 📈 `+0.0763` (`+4.28%`) |

✅ Difftest smoke test passed!

github-actions · 2025-12-04T11:06:14Z

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (`xs-dev`) | `1.7830` | - |
| This PR | `1.8650` | 📈 `+0.0821` (`+4.60%`) |

✅ Difftest smoke test passed!

XiangShanRobot · 2025-12-04T13:07:00Z

[Generated by GEM5 Performance Robot]
commit: 3d477e2
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
| --- | --- | --- | --- |
| Score | 15.00 | 19.83 | -24.34 🔴 |

This reverts commit 93c3177.

github-actions · 2025-12-05T02:31:51Z

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (`xs-dev`) | `1.7830` | - |
| This PR | `1.6594` | 📉 `-0.1236` (`-6.93%`) |

✅ Difftest smoke test passed!

XiangShanRobot · 2025-12-05T03:25:39Z

[Generated by GEM5 Performance Robot]
commit: fe334d6
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
| --- | --- | --- | --- |
| Score | 14.56 | 19.83 | -26.54 🔴 |

[Generated by GEM5 Performance Robot]
commit: fe334d6
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
| --- | --- | --- | --- |
| Score | 14.56 | 15.00 | -2.91 🔴 |

cpu-o3: move resolve queue merging from iew to fetch stage

b9bb7db

Change-Id: Iaba29431fecd59d250c48cc566cb9b18140c5098

coderabbitai bot reviewed Dec 4, 2025

jensen-yan added 2 commits December 4, 2025 18:55

cpu-o3: test no bankconflict

93c3177

Change-Id: Ib5159561facea8cf1e8fc4d185a8ba8772a06045

cpu-o3: test only send control branches

3d477e2

Change-Id: I12bf9bddd5ced6bb4574347ef7e7392e23c0b558

coderabbitai bot reviewed Dec 4, 2025

Revert "cpu-o3: test no bankconflict"

fe334d6

This reverts commit 93c3177.

Yakkhini self-assigned this Dec 5, 2025

CJ362ff self-requested a review December 5, 2025 03:45

Yakkhini approved these changes Dec 5, 2025

jensen-yan merged commit 840b7ce into xs-dev Dec 9, 2025
3 checks passed

jensen-yan deleted the resolveQueue-align branch December 9, 2025 08:46

coderabbitai bot mentioned this pull request Dec 10, 2025

align RTL: tage window method, block resolveQ if update failed due to bank conflict. #644

Merged

This was referenced Dec 19, 2025

cpu-o3: add more performance counter in resolve queue #666

Merged

cpu-o3: 2 entry size resolve queue for evaluation #670

Closed

coderabbitai bot mentioned this pull request Feb 28, 2026

cpu-o3: using reverse ordered tick & refactor the stalls logic #756

Merged

coderabbitai bot mentioned this pull request Apr 2, 2026

Full resolve train #810

Open

jensen-yan merged 4 commits into xs-dev from resolveQueue-align
