
cpu-o3: move resolve queue merging from iew to fetch stage#635

Merged
jensen-yan merged 4 commits into xs-dev from resolveQueue-align
Dec 9, 2025

Conversation

@jensen-yan
Collaborator

@jensen-yan jensen-yan commented Dec 4, 2025

Change-Id: Iaba29431fecd59d250c48cc566cb9b18140c5098

Summary by CodeRabbit

  • Refactor

    • Restructured how resolved control‑flow information is communicated between execution and fetch stages; the fetch stage now manages a buffered resolve queue with merging and capacity handling.
  • Monitoring

    • Added frontend metrics to track resolve‑queue fullness and enqueue events for performance analysis.
  • Chores

    • Default config: disabled bank‑conflict handling for a specific branch predictor setup.


Change-Id: Iaba29431fecd59d250c48cc566cb9b18140c5098
@coderabbitai

coderabbitai bot commented Dec 4, 2025

Walkthrough

Transfers resolve-queue ownership and runtime buffering from IEW to Fetch, replaces the per-stage resolveQueue with a cycle-produced resolvedCFIs list (fsqId, pc) in the communication struct, and updates Fetch to buffer, merge, and capacity-check incoming resolved control-flow entries.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Communication structure (src/cpu/o3/comm.hh) | Adds nested type TimeStruct::IewComm::ResolvedCFIEntry { uint64_t fsqId; uint64_t pc; } and member resolvedCFIs (vector). Removes resolveQueue. |
| Fetch stage, interface + impl (src/cpu/o3/fetch.hh, src/cpu/o3/fetch.cc) | Adds resolveQueueSize const, std::deque<ResolveQueueEntry> resolveQueue, and three stats (resolveQueueFullCycles, resolveQueueFullEvents, resolveEnqueueFailEvent). Integrates incoming resolvedCFIs into resolveQueue with merge-on-fsqId, capacity checks, enqueue-fail accounting, and deque pop when front entry is completed. |
| IEW stage, interface + impl (src/cpu/o3/iew.hh, src/cpu/o3/iew.cc) | Removes resolveQueueSize, resolveQueue, and related IEW stats. Replaces resolve-queue population/propagation with direct appends to toFetch->iewInfo[tid].resolvedCFIs and clears resolvedCFIs where appropriate. |
| Config (configs/example/kmhv3.py) | Sets cpu.branchPred.tage.enableBankConflict = False for DecoupledBPUWithBTB in KMHv3 config. |

Sequence Diagram(s)

mermaid
sequenceDiagram
participant IEW as IEW
participant Comm as TimeStruct::IewComm
participant Fetch as Fetch (resolveQueue)
participant BPU as Branch Predictor (BPU)
Note over IEW,Comm: Per-cycle resolution production
IEW->>Comm: append ResolvedCFIEntry(fsqId, pc) to resolvedCFIs
Note over Comm,Fetch: At the same cycle boundary
Fetch->>Comm: read resolvedCFIs
alt existing resolveQueue entry with same fsqId
Fetch->>Fetch: merge PC into existing ResolveQueueEntry.resolvedInstPC
else no merge and space available
Fetch->>Fetch: enqueue new ResolveQueueEntry (bounded by resolveQueueSize)
else no merge and full
Fetch->>Fetch: increment resolveQueueFullEvents / resolveEnqueueFailEvent
end
Note over Fetch,BPU: When front entry completes
Fetch->>BPU: train using ResolveQueueEntry.resolvedInstPC
Fetch->>Fetch: pop front ResolveQueueEntry if processed
Note over IEW,Fetch: resolvedCFIs cleared after consumption
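
The merge, enqueue, and fail branches in the diagram above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code from fetch.cc: ResolveQueueModel, its capacity field, and consume() are hypothetical names, and the two entry structs are simplified stand-ins for the types named in the walkthrough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified stand-in for TimeStruct::IewComm::ResolvedCFIEntry.
struct ResolvedCFIEntry
{
    uint64_t fsqId;
    uint64_t pc;
};

// Simplified stand-in for Fetch's ResolveQueueEntry.
struct ResolveQueueEntry
{
    uint64_t fsqId;
    std::vector<uint64_t> resolvedInstPC;
};

// Hypothetical model of the buffered resolve queue owned by Fetch.
struct ResolveQueueModel
{
    std::size_t capacity;                 // plays the role of resolveQueueSize
    std::deque<ResolveQueueEntry> queue;  // plays the role of resolveQueue
    uint64_t enqueueFailEvents;           // plays the role of resolveEnqueueFailEvent

    explicit ResolveQueueModel(std::size_t cap)
        : capacity(cap), enqueueFailEvents(0)
    {}

    // Consume one cycle's batch of resolved entries.
    void
    consume(const std::vector<ResolvedCFIEntry> &resolvedCFIs)
    {
        for (const auto &resolved : resolvedCFIs) {
            auto it = std::find_if(queue.begin(), queue.end(),
                [&](const ResolveQueueEntry &e) {
                    return e.fsqId == resolved.fsqId;
                });
            if (it != queue.end()) {
                // Same fsqId already buffered: merge the PC into it.
                it->resolvedInstPC.push_back(resolved.pc);
            } else if (queue.size() < capacity) {
                // No match and room left: append a new entry.
                queue.push_back(
                    ResolveQueueEntry{resolved.fsqId, {resolved.pc}});
            } else {
                // No match and queue full: drop and count the failure.
                ++enqueueFailEvents;
            }
        }
    }
};
```

With a capacity of 2, feeding entries for fsqId 1, 1, 2, 3 in one batch leaves two queue entries (the first holding two PCs) and one counted enqueue failure. In this sketch a capacity of 0 rejects every new entry; the PR's own check treats zero differently, so the sketch should not be read as describing that corner case.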

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to fetch.cc: merging logic for same fsqId, capacity checks, and correct lifecycle (enqueue/pop).
  • Verify comm.hh: ResolvedCFIEntry layout and thread-safety/usage across cycles.
  • Check IEW changes: all previous resolveQueue references replaced and no dangling uses remain.

Possibly related PRs

Suggested labels

perf, align-kmhv3

Suggested reviewers

  • Yakkhini
  • tastynoob

Poem

🐰
I hopped through code with tiny paws,
Swapped queues and tiny metal claws,
Resolved PCs now travel light,
From IEW to Fetch each cycle's flight,
A rabbit cheers for pipeline laws!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and clearly describes the main change: moving resolve queue merging logic from the IEW stage to the Fetch stage, which is reflected across all modified files. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/cpu/o3/fetch.cc (2)

1535-1535: Use const reference to avoid copying.

The loop variable resolvedInstPC should be a const reference to avoid unnecessary copying of uint64_t values. While the performance impact is minimal for uint64_t, using references is more consistent with the pattern used on line 1506 (const auto &resolved).

-        for (const auto resolvedInstPC : entry.resolvedInstPC) {
+        for (const auto& resolvedInstPC : entry.resolvedInstPC) {
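
For an 8-byte scalar such as uint64_t, the two loop forms in this suggestion produce the same result, so the choice is mainly about consistency with nearby code. A minimal sketch, using hypothetical helper names (sumByValue, sumByConstRef) that do not appear in the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Iterates by value: each element is copied into pc (one 64-bit copy).
uint64_t
sumByValue(const std::vector<uint64_t> &pcs)
{
    uint64_t sum = 0;
    for (const auto pc : pcs) {
        sum += pc;
    }
    return sum;
}

// Iterates by const reference: pc aliases the element in the vector.
uint64_t
sumByConstRef(const std::vector<uint64_t> &pcs)
{
    uint64_t sum = 0;
    for (const auto &pc : pcs) {
        sum += pc;
    }
    return sum;
}
```

The by-reference form matters more when the element type is expensive to copy, for example a struct that owns a vector.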

253-259: Consider adding .prereq() for new statistics.

The new statistics (resolveQueueFullCycles, resolveQueueFullEvents, resolveEnqueueFailEvent) are added but don't have .prereq() calls like other statistics in this file (e.g., lines 261-296). Adding .prereq() ensures they only appear in output when non-zero, which is consistent with the existing pattern.

Add after line 328:

        resolveQueueFullCycles
            .prereq(resolveQueueFullCycles);
        resolveQueueFullEvents
            .prereq(resolveQueueFullEvents);
        resolveEnqueueFailEvent
            .prereq(resolveEnqueueFailEvent);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71f89c1 and b9bb7db.

📒 Files selected for processing (5)
  • src/cpu/o3/comm.hh (1 hunks)
  • src/cpu/o3/fetch.cc (3 hunks)
  • src/cpu/o3/fetch.hh (3 hunks)
  • src/cpu/o3/iew.cc (2 hunks)
  • src/cpu/o3/iew.hh (0 hunks)
💤 Files with no reviewable changes (1)
  • src/cpu/o3/iew.hh
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/o3/iew.cc (1)
src/cpu/o3/iew.hh (16)
  • tid (173-173)
  • tid (200-200)
  • tid (226-226)
  • tid (258-258)
  • tid (258-258)
  • tid (265-265)
  • tid (265-265)
  • tid (287-290)
  • tid (287-287)
  • tid (315-315)
  • tid (320-320)
  • tid (323-323)
  • tid (326-326)
  • tid (328-328)
  • tid (334-334)
  • tid (335-335)
src/cpu/o3/fetch.cc (1)
src/cpu/pred/btb/decoupled_bpred.hh (3)
  • stream_id (818-818)
  • stream_id (819-819)
  • stream_id (820-820)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (5)
src/cpu/o3/fetch.hh (1)

616-620: LGTM! Resolve queue member declarations are appropriate.

The new resolveQueueSize constant and resolveQueue deque are properly declared. The use of std::deque is a good choice as it supports efficient front removal and back insertion needed for FIFO queue semantics.
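
The FIFO behaviour mentioned here can be illustrated with plain standard-library calls. This sketch only demonstrates std::deque itself; drainInOrder is a hypothetical helper and the uint64_t payload stands in for whatever ResolveQueueEntry actually holds.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Pops entries from the front until the queue is empty and returns them
// in the order they were removed, which is the order they were inserted.
std::vector<uint64_t>
drainInOrder(std::deque<uint64_t> q)
{
    std::vector<uint64_t> out;
    while (!q.empty()) {
        out.push_back(q.front());  // oldest entry first
        q.pop_front();             // constant-time removal at the front
    }
    return out;
}
```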

src/cpu/o3/comm.hh (1)

251-257: LGTM! New ResolvedCFIEntry structure is well-designed.

The structure is clean and provides the necessary fields (fsqId and pc) for communicating resolved control-flow information from IEW to Fetch. Using a std::vector is appropriate for the per-cycle batch of resolved CFIs.

src/cpu/o3/fetch.cc (1)

1504-1528: Potential issue: Zero resolveQueueSize disables new entry creation entirely.

When resolveQueueSize is 0, the condition on line 1520 (resolveQueueSize && resolveQueue.size() >= resolveQueueSize) will short-circuit and never be true. However, this also means entries will still be merged if they match an existing fsqId on lines 1508-1513, but since no new entries can be created (the else branch on line 1520 is always taken when size is 0), the queue will always be empty after startup.

If resolveQueueSize == 0 is intended to mean "unlimited", the condition should be:

-        if (resolveQueueSize && resolveQueue.size() >= resolveQueueSize) {
+        if (resolveQueueSize > 0 && resolveQueue.size() >= resolveQueueSize) {

Or if resolveQueueSize == 0 means "disabled", this should be documented. Please clarify the intended behavior.
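
One point worth checking when answering this: if resolveQueueSize is an unsigned integer (an assumption, since its declared type is not shown in this thread), then writing the guard as resolveQueueSize && ... or as resolveQueueSize > 0 && ... gives the same result for every input. The sketch below uses hypothetical helper names to compare the two spellings.

```cpp
#include <cassert>
#include <cstddef>

// Guard spelled with implicit conversion to bool.
bool
isFullTruthy(std::size_t limit, std::size_t size)
{
    return limit && size >= limit;
}

// Guard spelled with an explicit comparison against zero.
bool
isFullExplicit(std::size_t limit, std::size_t size)
{
    return limit > 0 && size >= limit;
}
```

With a limit of 0, both helpers return false for any size, so under this assumption the guard itself never reports "full". What that means for the queue as a whole depends on the surrounding branch structure in fetch.cc, which is not reproduced here.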

src/cpu/o3/iew.cc (2)

1567-1573: Clarify: All executed instructions now produce CFI entries, not just control-flow.

The if (inst->isControl()) condition is commented out, meaning every executed instruction will produce a ResolvedCFIEntry. This changes the semantics from "resolved control-flow instructions" to "all resolved instructions."

If this is intentional (e.g., to track all instructions for some purpose), the comment // if (inst->isControl()) should be removed and the member name resolvedCFIs might be misleading. If only control-flow instructions should produce entries, the condition should be uncommented.

Is this change intentional? If so, consider:

  1. Removing the commented-out condition
  2. Potentially renaming resolvedCFIs to resolvedInsts if it now tracks all instructions

1670-1670: LGTM! Proper clearing of resolvedCFIs at cycle start.

Clearing resolvedCFIs at the beginning of executeInsts() ensures each cycle starts fresh, and the vector is populated with only the current cycle's resolved instructions.
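
The clear-then-append pattern described above can be sketched in isolation. IewCommModel and executeCycle are hypothetical names used only for this illustration; the real code writes into toFetch->iewInfo[tid].resolvedCFIs inside executeInsts().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for TimeStruct::IewComm::ResolvedCFIEntry.
struct ResolvedCFIEntry
{
    uint64_t fsqId;
    uint64_t pc;
};

// Hypothetical stand-in for the per-thread IEW-to-Fetch communication slot.
struct IewCommModel
{
    std::vector<ResolvedCFIEntry> resolvedCFIs;
};

// Models one execute cycle: drop last cycle's batch, then publish this one.
void
executeCycle(IewCommModel &comm,
             const std::vector<ResolvedCFIEntry> &resolvedThisCycle)
{
    comm.resolvedCFIs.clear();
    for (const auto &entry : resolvedThisCycle) {
        comm.resolvedCFIs.push_back(entry);
    }
}
```

Running two cycles in a row leaves only the second cycle's entries in the vector, which is the property the comment above is approving.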

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.7908 | 📈 +0.0079 (+0.44%) |

✅ Difftest smoke test passed!

Change-Id: Ib5159561facea8cf1e8fc4d185a8ba8772a06045
Change-Id: I12bf9bddd5ced6bb4574347ef7e7392e23c0b558

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
configs/example/kmhv3.py (1)

93-100: Clarify why TAGE bank conflicts are disabled for this config

The new cpu.branchPred.tage.enableBankConflict = False is straightforward, but the reason for force‑disabling bank‑conflict handling isn’t obvious from context (especially with other bank‑related knobs in this script). Consider adding a short comment explaining whether this is for performance modeling, to match RTL, or to avoid double‑counting conflicts, so future changes to the BPU don’t “fix” this by accident.

src/cpu/o3/iew.cc (1)

1567-1572: ResolvedCFIEntry population: confirm pc semantics and consider emplace_back.

The control‑flow logging looks reasonable, but two points to double‑check:

  • If ResolvedCFIEntry::pc is meant to hold the resolved next/fall‑through PC rather than the branch’s own PC, please confirm that inst->getPC() is the correct value to store here (vs e.g. a post‑execute PC state or target field from pcState()). This depends on how Fetch consumes ResolvedCFIEntry::pc and isn’t obvious from this file alone.
  • Minor: you can avoid the temporary entry by using emplace_back, which is a bit cleaner and may be marginally cheaper:
-    auto &resolved_cfis = toFetch->iewInfo[tid].resolvedCFIs;
-    TimeStruct::IewComm::ResolvedCFIEntry entry;
-    entry.fsqId = inst->getFsqId();
-    entry.pc = inst->getPC();
-    resolved_cfis.push_back(entry);
+    auto &resolved_cfis = toFetch->iewInfo[tid].resolvedCFIs;
+    resolved_cfis.emplace_back(TimeStruct::IewComm::ResolvedCFIEntry{
+        .fsqId = inst->getFsqId(),
+        .pc = inst->getPC(),
+    });
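
The two append styles in this suggestion can be compared in a standalone sketch. Note that the .fsqId = / .pc = designated-initializer syntax in the suggestion is standard only from C++20 onward; the version below uses positional aggregate initialization so that it also compiles with older standards. The helper names are hypothetical and the struct is a simplified stand-in for the one in comm.hh.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for TimeStruct::IewComm::ResolvedCFIEntry.
struct ResolvedCFIEntry
{
    uint64_t fsqId;
    uint64_t pc;
};

// Original style: build a named temporary, then copy it into the vector.
void
appendWithPushBack(std::vector<ResolvedCFIEntry> &cfis,
                   uint64_t fsqId, uint64_t pc)
{
    ResolvedCFIEntry entry;
    entry.fsqId = fsqId;
    entry.pc = pc;
    cfis.push_back(entry);
}

// Suggested style: pass an unnamed temporary straight to emplace_back.
void
appendWithEmplaceBack(std::vector<ResolvedCFIEntry> &cfis,
                      uint64_t fsqId, uint64_t pc)
{
    cfis.emplace_back(ResolvedCFIEntry{fsqId, pc});
}
```

Because a temporary ResolvedCFIEntry is still constructed and then moved in, the emplace_back form here is mostly a readability change for a 16-byte trivially copyable struct.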
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9bb7db and 3d477e2.

📒 Files selected for processing (2)
  • configs/example/kmhv3.py (1 hunks)
  • src/cpu/o3/iew.cc (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
configs/example/kmhv3.py (1)
src/cpu/pred/btb/test/btb_tage.test.cc (1)
  • tage (276-282)
src/cpu/o3/iew.cc (1)
src/cpu/o3/iew.hh (16)
  • inst (203-203)
  • inst (208-208)
  • inst (211-211)
  • inst (214-214)
  • inst (217-217)
  • inst (223-223)
  • inst (268-268)
  • inst (271-271)
  • inst (271-271)
  • inst (280-280)
  • inst (293-293)
  • inst (295-295)
  • inst (307-307)
  • inst (312-312)
  • inst (370-370)
  • inst (596-596)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.8593 | 📈 +0.0763 (+4.28%) |

✅ Difftest smoke test passed!

@github-actions

github-actions bot commented Dec 4, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.8650 | 📈 +0.0821 (+4.60%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 3d477e2
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.00 | 19.83 | -24.34 🔴 |

@github-actions

github-actions bot commented Dec 5, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7830 | - |
| This PR | 1.6594 | 📉 -0.1236 (-6.93%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: fe334d6
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 14.56 | 19.83 | -26.54 🔴 |

[Generated by GEM5 Performance Robot]
commit: fe334d6
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 14.56 | 15.00 | -2.91 🔴 |

@Yakkhini Yakkhini self-assigned this Dec 5, 2025
@CJ362ff CJ362ff self-requested a review December 5, 2025 03:45
@jensen-yan jensen-yan merged commit 840b7ce into xs-dev Dec 9, 2025
3 checks passed
@jensen-yan jensen-yan deleted the resolveQueue-align branch December 9, 2025 08:46
@coderabbitai coderabbitai bot mentioned this pull request Apr 2, 2026