
Resolve Queue Alignment in Kunminghu-v3#626

Merged: Yakkhini merged 4 commits into xs-dev from resolve-queue-align on Dec 2, 2025

@Yakkhini
Collaborator

@Yakkhini Yakkhini commented Dec 1, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Configurable branch resolution queue size parameter for system performance tuning
    • Three new statistics for comprehensive queue performance monitoring (saturation cycles, full-queue events, operation tracking)
  • Improvements

    • Enhanced queue management to enforce size constraints, prevent overflow conditions, and provide detailed performance tracking


@Yakkhini Yakkhini added the perf label Dec 1, 2025
@github-actions

github-actions bot commented Dec 1, 2025

🚀 Performance test triggered: spec06-0.8c

@coderabbitai

coderabbitai bot commented Dec 1, 2025

Walkthrough

This PR modifies the resolve queue management in the O3 CPU's execution engine by removing sorting logic, implementing size-based queue capacity limits, changing dequeue semantics from LIFO to FIFO, and adding statistics to track queue fullness conditions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Parameter Addition (src/cpu/o3/BaseO3CPU.py) | Added resolveQueueSize parameter (Unsigned, default 16) to configure the number of entries in the branch resolution queue. |
| Header and Interface Changes (src/cpu/o3/iew.hh) | Removed sorting methods (resolveQueueEntryCompare, sortResolveQueue); added private resolveQueueSize field; replaced single resolveQueueFull stat with three new statistics: resolveQueueFullCycles, resolveQueueFullEvents, resolveEnqueueFailEvent. |
| Implementation and Queue Logic (src/cpu/o3/iew.cc) | Modified IEW::SquashCheckAfterExe to enforce size limits during enqueue and track queue fullness; removed sortResolveQueue() call in executeInsts; changed dequeue from back() to front() for FIFO semantics. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Queue enqueue logic: Verify size checks prevent overflow and conditional entry creation is correct
  • Dequeue direction change: Confirm LIFO-to-FIFO transition (back() → front()) is intentional and consistent
  • Statistics tracking: Ensure new statistics are properly initialized and incremented at correct points
  • Parameter propagation: Verify resolveQueueSize from configuration is correctly bound in constructor
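
The dequeue direction change listed above can be illustrated with a minimal standalone sketch. This is not the gem5 code: `Entry`, `drainLifo`, and `drainFifo` are hypothetical names, and the queue is assumed to be a plain std::vector filled with push_back().

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a resolve queue entry; only the FSQ id is modeled.
struct Entry
{
    std::uint64_t fsqId;
};

// Old behavior (sketch): take the newest entry from the back each time.
std::vector<std::uint64_t>
drainLifo(std::vector<Entry> q)
{
    std::vector<std::uint64_t> order;
    while (!q.empty()) {
        order.push_back(q.back().fsqId);
        q.pop_back();
    }
    return order;
}

// New behavior (sketch): take the oldest entry from the front each time.
std::vector<std::uint64_t>
drainFifo(std::vector<Entry> q)
{
    std::vector<std::uint64_t> order;
    while (!q.empty()) {
        order.push_back(q.front().fsqId);
        q.erase(q.begin());
    }
    return order;
}
```

With entries enqueued in the order 1, 2, 3, the back-based drain yields 3, 2, 1 and the front-based drain yields 1, 2, 3, so a reviewer checking consistency only needs to confirm that no remaining code path still reads from the back.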

Possibly related PRs

  • Resolve align #621: Directly modifies resolve queue handling in the same files, addressing sorting, queue ownership, and transfer semantics.

Suggested reviewers

  • tastynoob

Poem

🐰 A queue reborn, with boundaries set,
FIFO flows where LIFO bet,
No sorting tricks, just order true,
Full cycles tracked in metrics new!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title specifically refers to 'Resolve Queue Alignment in Kunminghu-v3', which directly matches the main changes: adding resolveQueueSize configuration, modifying resolve queue enqueue/dequeue logic, and managing queue alignment. |

@Yakkhini
Collaborator Author

Yakkhini commented Dec 1, 2025


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/cpu/o3/iew.cc (1)

1803-1807: Front-pop behavior is correct; consider moving the entry instead of copying

Switching to front() + erase(begin()) matches the new FIFO semantics of resolveQueue. Given the small fixed max (16), the erase(begin()) cost is negligible, but you can avoid an extra copy of ResolveQueueEntry (and its resolvedInstPC payload) by moving directly into the fetch queue:

-    if (!resolveQueue.empty()) {
-        ResolveQueueEntry entry = resolveQueue.front();
-        resolveQueue.erase(resolveQueue.begin());
-        toFetch->iewInfo[tid].resolveQueue.push_back(entry);
-    }
+    if (!resolveQueue.empty()) {
+        toFetch->iewInfo[tid].resolveQueue.push_back(
+            std::move(resolveQueue.front()));
+        resolveQueue.erase(resolveQueue.begin());
+    }

This is optional but keeps things cheap if the queue size ever grows in the future.
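
As a standalone illustration of the move-based transfer suggested above, here is a minimal sketch. The `ResolveQueueEntry` layout is an assumption modeled on the two fields named in this review, and `transferFront` is a hypothetical helper, not a function in iew.cc.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Assumed entry layout, modeled on the fields named in the review.
struct ResolveQueueEntry
{
    std::uint64_t resolvedFSQId = 0;
    std::vector<std::uint64_t> resolvedInstPC;
};

// Move the oldest entry from `src` to the back of `dst`.
// Returns false if `src` was empty and nothing was transferred.
bool
transferFront(std::vector<ResolveQueueEntry> &src,
              std::vector<ResolveQueueEntry> &dst)
{
    if (src.empty())
        return false;
    // Moving hands over the PC vector's buffer instead of copying it.
    dst.push_back(std::move(src.front()));
    // The moved-from element is still valid to destroy, so erase is safe.
    src.erase(src.begin());
    return true;
}
```

The move must happen before the erase; after erase() the reference returned by front() would refer to a different element.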

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67534be and 845e8a1.

📒 Files selected for processing (2)
  • src/cpu/o3/iew.cc (3 hunks)
  • src/cpu/o3/iew.hh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run SPEC Test / XS-GEM5 - Run performance test (spec06-0.8c)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (2)
src/cpu/o3/iew.hh (1)

531-533: New resolveQueueFull stat member is correctly wired and consistent

Declaration and comment are consistent with the other Scalar stats and its usage in IEWStats::IEWStats and SquashCheckAfterExe.

src/cpu/o3/iew.cc (1)

152-185: resolveQueueFull stat registration matches declaration and usage

The new ADD_STAT(resolveQueueFull, ...) entry uses the right units, integrates cleanly into the initializer list, and matches the header declaration and increment site in SquashCheckAfterExe.

@github-actions

github-actions bot commented Dec 1, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.8581 | - |
| This PR | 1.8577 | 📉 -0.0004 (-0.02%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 845e8a1
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 14.98 | 15.05 | -0.43 🔴 |

[Generated by GEM5 Performance Robot]
commit: 845e8a1
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 14.98 | 14.98 | +0.02 🟢 |

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 845e8a1
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 20.20 | 15.05 | +34.27 🟢 |

Change-Id: I71b39a45a1a664058edb94fdaae24960b1a362fd

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/cpu/o3/iew.cc (1)

1579-1587: Reconsider the resolveQueueFull stat semantics.

The current implementation increments resolveQueueFull whenever the queue is at capacity, regardless of whether a new FSQ entry was actually declined. This differs from the approach discussed in previous reviews, where the stat should only increment when !found && resolveQueue.size() >= resolveQueueSize (i.e., when a new entry cannot be enqueued due to capacity).

If the intent is to track overflow events (declined new entries), consider:

 if (!found && resolveQueue.size() < resolveQueueSize) {
     ResolveQueueEntry newEntry;
     newEntry.resolvedFSQId = fsqId;
     newEntry.resolvedInstPC.push_back(pc);
     resolveQueue.push_back(newEntry);
+} else if (!found && resolveQueue.size() >= resolveQueueSize) {
+    iewStats.resolveQueueFull++;
 }
-if (resolveQueue.size() >= resolveQueueSize) {
-    iewStats.resolveQueueFull++;
-}

Alternatively, implement two separate counters as suggested by Yakkhini in previous reviews.

Based on learnings, this continues the unresolved discussion from previous reviews about stat semantics.
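
To make the two candidate semantics concrete, here is a minimal sketch that counts both side by side. `FullStats` and `attemptEnqueue` are hypothetical names used only for illustration; the queue is reduced to its size, and `found` stands for "an entry for this FSQ id already exists".

```cpp
#include <cstddef>

// Hypothetical counters; the names are illustrative, not gem5 stats.
struct FullStats
{
    unsigned saturation = 0;  // queue observed at capacity after the attempt
    unsigned overflow = 0;    // a new entry was declined for lack of space
};

// Model one enqueue attempt against a queue holding `size` entries.
// Returns the queue size after the attempt.
std::size_t
attemptEnqueue(std::size_t size, std::size_t capacity, bool found,
               FullStats &stats)
{
    if (!found) {
        if (size < capacity)
            ++size;             // room for a new entry
        else
            ++stats.overflow;   // new entry declined
    }
    if (size >= capacity)
        ++stats.saturation;     // counted whether or not anything was declined
    return size;
}
```

In this model the saturation counter also increments when an attempt merely finds the queue full without needing a new slot, which is the divergence the comment above describes.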

🧹 Nitpick comments (1)
src/cpu/o3/iew.cc (1)

1803-1804: Consider using std::deque for efficient front-erase.

The dequeue logic now uses erase(begin()) on a std::vector, which is an O(n) operation that shifts all remaining elements. If the resolve queue is dequeued frequently, this could impact performance.

Consider changing resolveQueue from std::vector<ResolveQueueEntry> to std::deque<ResolveQueueEntry> in src/cpu/o3/iew.hh for O(1) front removal:

In src/cpu/o3/iew.hh:

-    std::vector<ResolveQueueEntry> resolveQueue;
+    std::deque<ResolveQueueEntry> resolveQueue;

This change maintains the same interface while improving performance for FIFO operations.
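
A minimal sketch of the std::deque variant follows, assuming the same two-field entry layout named elsewhere in this review; `popOldest` is a hypothetical helper, not part of the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Assumed entry layout, modeled on the fields named in the review.
struct ResolveQueueEntry
{
    std::uint64_t resolvedFSQId = 0;
    std::vector<std::uint64_t> resolvedInstPC;
};

// Remove and return the oldest entry. The caller must ensure the
// queue is not empty.
ResolveQueueEntry
popOldest(std::deque<ResolveQueueEntry> &q)
{
    assert(!q.empty());
    ResolveQueueEntry entry = std::move(q.front());
    q.pop_front();  // constant time; remaining elements are not shifted
    return entry;
}
```

Since an earlier review comment on this PR calls the erase(begin()) cost negligible at the default size of 16, the container swap is likely to matter only if the queue is configured much larger.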

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 845e8a1 and caa1f5d.

📒 Files selected for processing (3)
  • src/cpu/o3/BaseO3CPU.py (1 hunks)
  • src/cpu/o3/iew.cc (4 hunks)
  • src/cpu/o3/iew.hh (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/iew.cc (1)
src/cpu/o3/iew.hh (1)
  • iewStats (613-613)
🪛 Ruff (0.14.7)
src/cpu/o3/BaseO3CPU.py

242-242: Param may be undefined, or defined from star imports

(F405)

🔇 Additional comments (4)
src/cpu/o3/iew.hh (1)

498-498: LGTM!

The resolveQueueSize member is properly initialized from parameters and used to enforce capacity limits in the resolve queue.

src/cpu/o3/BaseO3CPU.py (1)

242-242: LGTM!

The new resolveQueueSize parameter is properly defined and aligns with the IEW implementation. The static analysis warning about Param being undefined is a false positive—this is a standard gem5 configuration pattern using star imports from m5.params.

src/cpu/o3/iew.cc (2)

95-95: LGTM!

The resolveQueueSize is properly initialized from the parameter system.


183-184: LGTM, but see related comment on stat semantics.

The statistic is properly defined. However, refer to the comment on lines 533-534 in src/cpu/o3/iew.hh regarding the unresolved discussion about stat semantics (saturation vs. overflow tracking).

@github-actions

github-actions bot commented Dec 2, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7949 | - |
| This PR | 1.7830 | 📉 -0.0119 (-0.66%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: caa1f5d
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 14.63 | 19.98 | -26.76 🔴 |

jensen-yan previously approved these changes Dec 2, 2025
@github-actions

github-actions bot commented Dec 2, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7949 | - |
| This PR | 1.7837 | 📉 -0.0112 (-0.62%) |

✅ Difftest smoke test passed!

Change-Id: I3085a57889b6e8955d835aaaa2d0b6f0059b4f42
Change-Id: Ib15c8a7e887fc29baf2a27d616438e8165213179
@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: d1cb617
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 14.75 | 19.98 | -26.16 🔴 |

@github-actions

github-actions bot commented Dec 2, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7949 | - |
| This PR | 1.8626 | 📈 +0.0678 (+3.78%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 9021a2a
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.01 | 19.98 | -24.86 🔴 |

[Generated by GEM5 Performance Robot]
commit: 9021a2a
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 15.01 | 14.75 | +1.76 🟢 |

@Yakkhini Yakkhini force-pushed the resolve-queue-align branch from 9021a2a to ed86727 Compare December 2, 2025 08:43
@github-actions

github-actions bot commented Dec 2, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7949 | - |
| This PR | 1.7830 | 📉 -0.0119 (-0.66%) |

✅ Difftest smoke test passed!

Change-Id: I2a67a614c08321a82c338d040f8180bcfb4d9753

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/cpu/o3/iew.cc (1)

183-189: Clarify statistics semantics and align with past discussion.

The three statistics partially address past reviewer feedback, but their semantics should be more precisely documented:

  • resolveQueueFullCycles: appears to count every cycle when size >= max (see line 1598)
  • resolveQueueFullEvents: incremented when queue becomes full after an enqueue (line 1590)
  • resolveEnqueueFailEvent: incremented when a new FSQ entry is declined (line 1596)

Per past discussion, consider refining the comments to explicitly state:

  • Whether resolveQueueFullCycles counts all cycles at capacity or only during enqueue attempts
  • Whether resolveQueueFullEvents tracks transitions to full state or enqueue-triggered fullness

Based on learnings, this stat semantics issue was flagged in previous reviews and warrants clarification.
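
One possible reading of the three statistics, written as a standalone model so the semantics can be compared against the actual code. Everything here is an assumption for illustration: `BoundedResolveQueue`, `QueueStats`, and `record` are hypothetical names, the entry layout follows the fields named in this review, and the full-cycles counter is bumped once per `record` call, which may or may not match one increment per simulated cycle in iew.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed entry layout, modeled on the fields named in the review.
struct ResolveQueueEntry
{
    std::uint64_t resolvedFSQId = 0;
    std::vector<std::uint64_t> resolvedInstPC;
};

// Hypothetical mirror of the three stats discussed above.
struct QueueStats
{
    unsigned fullCycles = 0;   // record() calls that end with the queue full
    unsigned fullEvents = 0;   // enqueues that fill the last free slot
    unsigned enqueueFail = 0;  // new FSQ entries declined because full
};

struct BoundedResolveQueue
{
    std::size_t capacity;
    std::vector<ResolveQueueEntry> entries;
    QueueStats stats;

    explicit BoundedResolveQueue(std::size_t cap) : capacity(cap) {}

    void
    record(std::uint64_t fsqId, std::uint64_t pc)
    {
        bool found = false;
        for (auto &e : entries) {
            if (e.resolvedFSQId == fsqId) {
                // A PC append to an existing entry needs no new slot.
                e.resolvedInstPC.push_back(pc);
                found = true;
                break;
            }
        }
        if (!found && entries.size() < capacity) {
            ResolveQueueEntry e;
            e.resolvedFSQId = fsqId;
            e.resolvedInstPC.push_back(pc);
            entries.push_back(e);
            if (entries.size() == capacity)
                ++stats.fullEvents;   // transition to full
        } else if (!found) {
            ++stats.enqueueFail;      // new FSQ entry declined
        }
        if (entries.size() >= capacity)
            ++stats.fullCycles;       // at capacity after this attempt
    }
};
```

Under this reading, a PC append to an existing entry while the queue is full raises the full-cycles counter but neither of the other two, which is the distinction the comments are asked to spell out.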

src/cpu/o3/iew.hh (1)

533-538: Enhance statistics comments for clarity.

The comments for the three new statistics could be more precise:

  • resolveQueueFullCycles: "cycles the resolve queue is full" - clarify if this counts all cycles at capacity or only during enqueue attempts
  • resolveQueueFullEvents: "events the resolve queue becomes full" - clarify if this is incremented on every transition to full or only after successful enqueues
  • resolveEnqueueFailEvent: "could not be enqueued" - consider specifying "new FSQ entry" to distinguish from PC appends to existing entries

Example improvement:

-        /** Stat for total cycles the resolve queue is full. */
+        /** Stat for total cycles when the resolve queue is at capacity. */
         statistics::Scalar resolveQueueFullCycles;
-        /** Stat for total events of the resolve queue becomes full. */
+        /** Stat for number of times the queue transitions to full after an enqueue. */
         statistics::Scalar resolveQueueFullEvents;
-        /** Stat for total number of enqueue fail events. */
+        /** Stat for number of times a new FSQ entry could not be enqueued due to full queue. */
         statistics::Scalar resolveEnqueueFailEvent;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9021a2a and 3969088.

📒 Files selected for processing (2)
  • src/cpu/o3/iew.cc (4 hunks)
  • src/cpu/o3/iew.hh (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/iew.cc (2)
src/cpu/o3/dyn_inst.hh (1)
  • fsqId (1493-1497)
src/cpu/o3/iew.hh (1)
  • iewStats (617-617)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (4)
src/cpu/o3/iew.cc (2)

95-95: LGTM: Parameter binding follows standard pattern.

The initialization of resolveQueueSize from params is consistent with other member initializations in this constructor.


1814-1818: FIFO dequeue is correctly implemented and aligns with downstream consumers.

The change from LIFO to FIFO ordering (front() + erase(begin()) instead of back() + erase(end()-1)) is intentional and well-integrated. The enqueue uses push_back() and dequeue uses front()—a standard FIFO pattern. The consumer in fetch.cc (line 1498) iterates through the queue sequentially without order-dependent logic, confirming compatibility with the FIFO semantics. The sortResolveQueue() function has been removed, and no code attempts LIFO access patterns.

src/cpu/o3/iew.hh (2)

498-498: LGTM: Queue size member declaration is appropriate.

The resolveQueueSize member is correctly declared as unsigned in the private section, matching typical capacity/size semantics.


44-44: Likely an incorrect or invalid review comment.

@github-actions

github-actions bot commented Dec 2, 2025

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 1.7949 | - |
| This PR | 1.7830 | 📉 -0.0119 (-0.66%) |

✅ Difftest smoke test passed!

@Yakkhini Yakkhini merged commit 12d1d24 into xs-dev Dec 2, 2025
2 of 3 checks passed
@Yakkhini Yakkhini deleted the resolve-queue-align branch December 2, 2025 10:57