
cpu-o3: add more performance counter in resolve queue #666

Merged
Yakkhini merged 1 commit into xs-dev from resolve-queue-align
Dec 22, 2025

Conversation

@Yakkhini
Collaborator

@Yakkhini Yakkhini commented Dec 19, 2025

Summary by CodeRabbit

  • New Features
    • Added new runtime metrics tracking resolve-queue enqueue events, dequeue events, and per-cycle queue occupancy to improve observability.
  • Chores
    • Removed the older "queue full" metric and replaced it with the more granular enqueue/dequeue and occupancy measurements.




@Yakkhini Yakkhini added the perf label Dec 19, 2025
@github-actions

🚀 Performance test triggered: spec06-0.8c

@coderabbitai

coderabbitai bot commented Dec 19, 2025

Walkthrough

Replaces the single resolveQueueFullCycles stat with three new stats (resolveDequeueCount, resolveEnqueueCount, resolveQueueOccupancy), updates FetchStatGroup initialization, and adjusts the enqueue/dequeue logic in handleIEWSignals to track enqueue failures and merges, sample per-cycle occupancy, and increment the dequeue count.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Resolve queue stats & enqueue/dequeue logic (src/cpu/o3/fetch.hh, src/cpu/o3/fetch.cc) | Removed resolveQueueFullCycles (Scalar). Added resolveDequeueCount (Scalar), resolveEnqueueCount (Distribution), and resolveQueueOccupancy (Distribution). Updated FetchStatGroup constructor initializers. Modified handleIEWSignals to: track enqueueSize/enqueueCount, record full-queue events and failed-enqueue accumulators, merge or create ResolveQueueEntry items, sample per-cycle occupancy, and increment the dequeue counter on successful dequeue. Minor structural edits to support sampling. |
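To make the walkthrough concrete, here is a minimal, self-contained sketch of the per-cycle bookkeeping it describes. It is not the code from fetch.cc: ResolvedCFI, ResolveQueueEntry, ResolveQueueModel and the std::map histograms are hypothetical stand-ins for the gem5 types and stats. It also assumes that the enqueue path runs only when IEW delivers resolved CFIs, that a merge matches any queued entry with the same fsqId, and that at most one entry is dequeued per cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-ins for the types in src/cpu/o3/fetch.{hh,cc}.
struct ResolvedCFI { std::uint64_t fsqId; };
struct ResolveQueueEntry { std::uint64_t fsqId; std::size_t numCFIs; };

// Plain counters/histograms standing in for the gem5 stats in this PR.
struct ResolveStats {
    std::uint64_t fullEvents = 0;                       // near-full cycles
    std::uint64_t dequeueCount = 0;                     // ~resolveDequeueCount
    std::map<std::size_t, std::uint64_t> enqueueCount;  // ~resolveEnqueueCount
    std::map<std::size_t, std::uint64_t> occupancy;     // ~resolveQueueOccupancy
};

struct ResolveQueueModel {
    std::size_t capacity = 16;  // assumed meaning of resolveQueueSize; 0 = unbounded
    std::size_t headroom = 4;   // mirrors the "- 4" in the PR's check
    std::deque<ResolveQueueEntry> queue;
    ResolveStats stats;

    // One cycle: enqueue (merge or create), sample occupancy, try one dequeue.
    void tick(const std::vector<ResolvedCFI> &incoming,
              const std::function<bool(const ResolveQueueEntry &)> &resolveUpdate)
    {
        if (!incoming.empty()) {
            // Same test as "size > capacity - 4", written without subtraction.
            if (capacity && queue.size() + headroom > capacity) {
                ++stats.fullEvents;  // incoming CFIs are dropped this cycle
            } else {
                std::size_t newEntries = 0;
                for (const auto &cfi : incoming) {
                    auto it = std::find_if(queue.begin(), queue.end(),
                        [&](const ResolveQueueEntry &e) { return e.fsqId == cfi.fsqId; });
                    if (it != queue.end()) {
                        ++it->numCFIs;  // merged: not counted as a new enqueue
                    } else {
                        queue.push_back({cfi.fsqId, 1});
                        ++newEntries;
                    }
                }
                ++stats.enqueueCount[newEntries];
            }
        }
        ++stats.occupancy[queue.size()];  // sampled every cycle, after enqueue
        if (!queue.empty() && resolveUpdate(queue.front())) {
            queue.pop_front();
            ++stats.dequeueCount;  // counted only when the update succeeds
        }
    }
};
```

Feeding CFIs with fsqIds 1, 1, 2 into an empty queue creates two entries (the second fsqId-1 CFI merges), so the enqueue histogram gains one sample of 2, and nothing is dequeued until resolveUpdate returns true.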

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10–15 minutes

  • Pay attention to:
    • Distribution initialization parameters (ranges/buckets)
    • Correctness of enqueue-size/count tracking and sampling frequency
    • Merge vs. new-entry logic and counter increments
    • Points where dequeue increments occur
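On the first point, it may help to recall how a min/max/bucket-size distribution classifies samples. The class below is a simplified, hypothetical model of a gem5-style Distribution::init(min, max, bucket_size), not gem5's implementation. It assumes, as we understand gem5 to behave, that samples below min or above max are counted as underflow or overflow rather than placed in a bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of a min/max/bucket-size distribution.
class SimpleDistribution {
  public:
    SimpleDistribution(std::int64_t min, std::int64_t max, std::int64_t bucketSize)
        : min_(min), max_(max), bucketSize_(bucketSize),
          buckets_(bucketCount(min, max, bucketSize), 0) {}

    void sample(std::int64_t v)
    {
        ++samples_;
        if (v < min_) { ++underflow_; return; }  // e.g. 0 with min = 1
        if (v > max_) { ++overflow_; return; }
        ++buckets_[static_cast<std::size_t>((v - min_) / bucketSize_)];
    }

    std::uint64_t bucket(std::size_t i) const { return buckets_.at(i); }
    std::size_t numBuckets() const { return buckets_.size(); }
    std::uint64_t underflow() const { return underflow_; }
    std::uint64_t overflow() const { return overflow_; }
    std::uint64_t samples() const { return samples_; }

  private:
    // Validate before dividing, then size the bucket array.
    static std::size_t bucketCount(std::int64_t min, std::int64_t max, std::int64_t bkt)
    {
        assert(bkt > 0 && max >= min);
        return static_cast<std::size_t>((max - min) / bkt + 1);
    }

    std::int64_t min_, max_, bucketSize_;
    std::vector<std::uint64_t> buckets_;
    std::uint64_t underflow_ = 0, overflow_ = 0, samples_ = 0;
};
```

With the parameters used for resolveEnqueueCount, init(1, 8, 1), this model has 8 buckets, and a zero-enqueue cycle would show up only as underflow.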

Possibly related PRs

Suggested reviewers

  • jensen-yan
  • tastynoob

Poem

🐰 I hopped through queues both short and deep,
Counting enqueues while others sleep.
Merges snug, failures flagged with care,
Occupancy tallied, dequeue counted fair.
A little hop for stats, a big carrot to share 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'cpu-o3: add more performance counter in resolve queue' accurately describes the main changes: adding new performance statistics (resolveDequeueCount, resolveEnqueueCount, resolveQueueOccupancy) to the O3 CPU fetch component's resolve queue. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b83159 and 6df9382.

📒 Files selected for processing (2)
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cpu/o3/fetch.hh
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/fetch.cc (1)
src/cpu/o3/fetch.hh (1)
  • fetchStats (1120-1120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (3)
src/cpu/o3/fetch.cc (3)

257-263: LGTM!

The new statistics declarations appropriately replace the removed resolveQueueFullCycles stat with more granular metrics tracking dequeue count, enqueue count distribution, and queue occupancy distribution.


1516-1544: Verify the enqueue count sampling logic.

The resolveEnqueueCount distribution is sampled only when the queue has available space (line 1541, inside the else block starting at line 1519). When the queue is full (lines 1516-1518), no sample is recorded even though enqueueCount remains 0.

Given that resolveEnqueueCount.init(1, 8, 1) starts at 1 (not 0), this appears intentional—tracking only successful enqueue batch sizes rather than per-cycle behavior. However, please confirm this aligns with your performance analysis goals. If you want to capture all cycles (including zero-enqueue cycles when the queue is full), adjust the distribution range to start at 0 and move the sampling outside the conditional block.
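If per-cycle behavior is wanted, the option described above (a range starting at 0, sampled on every cycle) could look roughly like this. PerCycleEnqueueHist is a hypothetical helper, not a gem5 type, and the clamp stands in for a real stat's overflow handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-cycle histogram: bin 0 records cycles with no new entry,
// including cycles where the near-full check blocked enqueue.
struct PerCycleEnqueueHist {
    explicit PerCycleEnqueueHist(std::size_t maxPerCycle)
        : bins(maxPerCycle + 1, 0) {}

    // Call once per cycle, outside the near-full/else branches.
    void sampleCycle(bool nearFull, std::size_t newEntries)
    {
        std::size_t v = nearFull ? 0 : newEntries;
        if (v >= bins.size())
            v = bins.size() - 1;  // clamp into the top bin for this sketch
        ++bins[v];
    }

    std::vector<std::uint64_t> bins;
};
```

One trade-off: blocked cycles and idle cycles both land in bin 0, so this would likely need to be read alongside resolveQueueFullEvents to tell them apart.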


1554-1557: LGTM!

The dequeue count increment correctly tracks successful dequeue events after resolveUpdate returns true and the entry is removed from the queue.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/cpu/o3/fetch.cc (2)

1519-1544: Consider tracking dropped entries when queue is near-full.

When the resolve queue is near-full (line 1519), the code increments resolveQueueFullEvents but doesn't track how many incoming CFI entries are being dropped. This means resolveEnqueueCount is only sampled when entries are successfully enqueued, potentially underreporting the actual enqueue attempts.

Consider adding a statistic to track dropped entries or sampling the attempted enqueue count even when near-full to get a complete picture of resolve queue pressure.

Example enhancement

```diff
     if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4) {
         fetchStats.resolveQueueFullEvents++;
+        fetchStats.resolveEnqueueCount.sample(0);  // Track that we couldn't enqueue
+        // or add a new stat: fetchStats.resolveEnqueueDropCount += incoming.size();
     } else {
```
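A minimal sketch of the second option, assuming a hypothetical drop counter: resolveEnqueueDropCount is only the reviewer's suggested name, and no such stat exists in this PR. Separately, with resolveEnqueueCount.init(1, 8, 1), the first option's sample(0) falls below the distribution's minimum and would presumably be recorded as underflow rather than in a bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counters for resolve-queue enqueue pressure.
struct EnqueuePressure {
    std::uint64_t attemptedCFIs = 0;  // every CFI IEW tried to hand over
    std::uint64_t droppedCFIs = 0;    // ~ the suggested resolveEnqueueDropCount
    std::uint64_t fullEvents = 0;     // ~resolveQueueFullEvents

    void onCycle(bool nearFull, std::size_t incoming)
    {
        attemptedCFIs += incoming;
        if (nearFull) {
            ++fullEvents;
            droppedCFIs += incoming;  // nothing is enqueued this cycle
        }
    }

    // Fraction of attempted CFIs that were dropped (0 if none attempted).
    double dropRate() const
    {
        return attemptedCFIs ? static_cast<double>(droppedCFIs) / attemptedCFIs : 0.0;
    }
};
```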

1541-1543: Clarify enqueue count behavior in documentation.

The enqueueCount metric (line 1541) only increments for newly added entries, not for entries that are merged into existing queue entries (lines 1526-1530). This means the statistic tracks "new queue entries created" rather than "total CFIs processed."

Consider adding a comment to clarify this distinction, or tracking both metrics separately if the merged count is also valuable for analysis.
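If both metrics are useful, the merge-or-create step can return both counts. The names below (Entry, EnqueueResult, enqueueCFIs) are hypothetical, and the sketch assumes a merge matches any queued entry with the same fsqId, which may differ from the real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Entry { std::uint64_t fsqId; std::size_t cfis; };  // hypothetical

struct EnqueueResult {
    std::size_t newEntries = 0;  // what enqueueCount measures today
    std::size_t mergedCFIs = 0;  // CFIs folded into an existing entry
};

// Merge each CFI into a queued entry with the same fsqId, else create one.
EnqueueResult enqueueCFIs(std::deque<Entry> &queue,
                          const std::vector<std::uint64_t> &fsqIds)
{
    EnqueueResult r;
    for (std::uint64_t id : fsqIds) {
        auto it = std::find_if(queue.begin(), queue.end(),
                               [id](const Entry &e) { return e.fsqId == id; });
        if (it != queue.end()) {
            ++it->cfis;
            ++r.mergedCFIs;
        } else {
            queue.push_back({id, 1});
            ++r.newEntries;
        }
    }
    return r;
}
```

Total CFIs processed in a cycle is then newEntries + mergedCFIs, so either metric can be derived from the other two.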

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0b82f9 and c17bdd4.

📒 Files selected for processing (2)
  • src/cpu/o3/fetch.cc (5 hunks)
  • src/cpu/o3/fetch.hh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/fetch.cc (1)
src/cpu/o3/fetch.hh (1)
  • fetchStats (1122-1122)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (6)
src/cpu/o3/fetch.cc (6)

45-45: LGTM!

The <cstdint> header is correctly included to support the uint8_t types used in handleIEWSignals() (lines 1516-1517).


260-266: LGTM!

The statistics initialization is correct with appropriate descriptions and units for tracking resolve queue metrics.


1555-1562: LGTM!

The resolveDequeueCount is correctly incremented only when the resolve update succeeds and an entry is actually dequeued from the resolve queue.


1546-1546: LGTM!

The resolveQueueOccupancy is correctly sampled every cycle (every time handleIEWSignals() is called) to track the distribution of queue occupancy over time.


1519-1546: Note: AI summary inconsistency.

The AI summary states "On near-full queue detection in handleIEWSignals, enqueues are tracked; otherwise, enqueue entries are added and enqueue count tracked." However, the actual code behavior is the opposite: when the queue is near-full, incoming entries are dropped and not tracked, while when not near-full, entries are enqueued and the count is tracked.


336-339: Verify resolveQueueOccupancy histogram upper bound against resolveQueueSize parameter.

The occupancy range of 0-100 is too high for the default resolveQueueSize of 16 entries. Use resolveQueueSize as the upper bound instead. The enqueue count range of 1-8 should also be verified to match the maximum number of resolved control-flow instructions IEW can produce in a single cycle.
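One way to follow this suggestion is to derive the histogram bounds from the configured size. The helper below is hypothetical. It assumes resolveQueueSize is reachable where FetchStatGroup initializes its stats, and that a value of 0 means an unbounded queue (the "resolveQueueSize &&" guard in handleIEWSignals suggests this), in which case it falls back to a fixed cap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical init parameters for a min/max/bucket-size distribution.
struct DistParams { std::size_t min, max, bucketSize; };

DistParams occupancyParams(std::size_t resolveQueueSize, std::size_t fallbackMax = 100)
{
    // Occupancy ranges over 0..resolveQueueSize: one bucket per value.
    return {0, resolveQueueSize ? resolveQueueSize : fallbackMax, 1};
}

std::size_t numBuckets(const DistParams &p)
{
    return (p.max - p.min) / p.bucketSize + 1;
}
```

The result would then feed something like resolveQueueOccupancy.init(p.min, p.max, p.bucketSize), giving 17 buckets for a 16-entry queue instead of 101.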

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.0662 | - |
| This PR | 2.0659 | 📉 -0.0004 (-0.02%) |

✅ Difftest smoke test passed!

@Yakkhini
Collaborator Author

Master:

  • 0.3c: 16.85
  • 0.8c: 19.91

12 + 4 entry size:

  • 0.3c: 16.88
  • 0.8c: 19.91

@Yakkhini Yakkhini added perf and removed perf labels Dec 19, 2025
@github-actions

🚀 Performance test triggered: spec06-0.8c

@Yakkhini Yakkhini force-pushed the resolve-queue-align branch from 6c65505 to 6b88c8c on December 19, 2025 06:24
@Yakkhini Yakkhini added perf and removed perf labels Dec 19, 2025
@github-actions

🚀 Performance test triggered: spec06-0.8c

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.0662 | - |
| This PR | 2.0583 | 📉 -0.0080 (-0.39%) |

✅ Difftest smoke test passed!

@Yakkhini
Collaborator Author

Master:

  • 0.3c: 16.85
  • 0.8c: 19.91

(4 + 4) entry size:

  • 0.3c: 16.85
  • 0.8c: 16.91

@Yakkhini Yakkhini added perf and removed perf labels Dec 19, 2025
@github-actions

🚀 Performance test triggered: spec06-0.8c

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.0662 | - |
| This PR | 2.0552 | 📉 -0.0110 (-0.53%) |

✅ Difftest smoke test passed!

@Yakkhini
Collaborator Author

Master:

  • 0.3c: 16.85
  • 0.8c: 19.91(3)

(28 + 4) entry size:

  • 0.3c: 16.87
  • 0.8c: 16.91(4)

@Yakkhini Yakkhini force-pushed the resolve-queue-align branch from 712f365 to 4b83159 on December 22, 2025 03:02
@Yakkhini Yakkhini changed the title from "Resolve Queue DSE" to "cpu-o3: add more performance counter in resolve queue" Dec 22, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/cpu/o3/fetch.hh (1)

1109-1114: Comment copy-paste error already flagged.

The incorrect comment on line 1109 has already been identified in previous reviews. As noted, it should read "Stat for total number of resolve dequeue events" instead of "enqueue events."

src/cpu/o3/fetch.cc (1)

1514-1515: Unused variable and overflow risk already flagged.

The issues with these variable declarations have already been identified in previous reviews:

  1. enqueueSize is only used on line 1519 and could be inlined
  2. uint8_t type for both variables risks overflow if resolvedCFIs.size() exceeds 255
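For the second point, a small demonstration of why std::uint8_t is fragile as a counter: it wraps modulo 256, whereas std::size_t matches what a standard container's size() returns. Whether resolvedCFIs can actually hold more than 255 entries depends on the IEW width, so this is a latent hazard rather than a confirmed bug. countNarrow and countWide are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counting into std::uint8_t: the increment is computed in int and then
// converted back, so the count silently wraps modulo 256.
std::uint8_t countNarrow(const std::vector<int> &items)
{
    std::uint8_t n = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        ++n;
    return n;
}

// Counting into std::size_t, the same type that size() returns.
std::size_t countWide(const std::vector<int> &items)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        ++n;
    return n;
}
```

For 300 items, countNarrow reports 44 (300 mod 256) while countWide reports 300.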
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 712f365 and 4b83159.

📒 Files selected for processing (2)
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/fetch.cc (1)
src/cpu/o3/fetch.hh (1)
  • fetchStats (1120-1120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (5)
src/cpu/o3/fetch.cc (5)

45-45: LGTM!

The <cstdint> header inclusion is appropriate to support the fixed-width integer types used later in the file.


258-264: LGTM!

The statistics initialization follows the established pattern and correctly describes each metric.


1517-1543: Verify queue fullness threshold is intentional.

The logic correctly handles queue management with merge support for duplicate fsqId entries. However, the threshold check on line 1517 uses a hardcoded value:

if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4)

Subtracting 4 means new CFIs are accepted only while at least 4 slots are free (for a 16-entry queue, only while occupancy is 12 or less). While this provides a safety margin, verify:

  1. Is this threshold intentional or should it be configurable?
  2. Does this align with the expected burst size of resolved CFIs from IEW?
  3. Should this be documented why 4 slots is the chosen threshold?

Also note that enqueueCount correctly tracks only newly created entries (not merged ones), which is semantically appropriate for an "enqueue" statistic.
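A sketch of a configurable, underflow-safe form of this check. resolveQueueNearFull and its headroom parameter are hypothetical. We have not checked the declared type of resolveQueueSize: if it is unsigned, or is converted to unsigned when compared with size(), then "resolveQueueSize - 4" wraps around for sizes below 4 and the queue would never report full. Writing the test as "size + headroom > capacity" avoids the subtraction.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper. `capacity` plays the role of resolveQueueSize
// (0 = unbounded) and `headroom` replaces the hardcoded 4.
bool resolveQueueNearFull(std::size_t size, std::size_t capacity,
                          std::size_t headroom = 4)
{
    if (capacity == 0)
        return false;  // unbounded queue never reports near-full
    // Equivalent to size > capacity - headroom when capacity >= headroom,
    // but cannot wrap around when capacity < headroom.
    return size + headroom > capacity;
}
```

With the default headroom and a 16-entry queue this accepts CFIs only at occupancy 12 or less. For a capacity smaller than the headroom it always reports near-full, which is a design choice worth stating explicitly if the threshold becomes a parameter.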


1545-1546: LGTM!

The queue occupancy sampling is correctly placed after enqueue processing to capture the current queue state per cycle.


1558-1558: LGTM!

The dequeue counter is correctly incremented only after successful resolve updates, accurately tracking queue removals.

Change-Id: Ic2a77e02704a21611e68598cd5b71cf9542c8462
@Yakkhini Yakkhini force-pushed the resolve-queue-align branch from 4b83159 to 6df9382 on December 22, 2025 03:09
@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.0662 | - |
| This PR | 2.0659 | 📉 -0.0004 (-0.02%) |

✅ Difftest smoke test passed!

@Yakkhini
Collaborator Author

Conclusion: resolve queue size has no significant effect on performance. An 8-entry queue, or an even smaller structure, may be worth considering to shrink microarchitectural area.

@Yakkhini Yakkhini merged commit 8ef695e into xs-dev Dec 22, 2025
3 checks passed
@Yakkhini Yakkhini deleted the resolve-queue-align branch December 22, 2025 07:28