cpu-o3: add more performance counter in resolve queue by Yakkhini · Pull Request #666 · OpenXiangShan/GEM5

Yakkhini · 2025-12-19T03:05:52Z

Summary by CodeRabbit

New Features
- Added new runtime metrics tracking resolve-queue enqueue events, dequeue events, and per-cycle queue occupancy to improve observability.
Chores
- Removed the older "queue full" metric and replaced it with the more granular enqueue/dequeue and occupancy measurements.

github-actions · 2025-12-19T03:06:01Z

🚀 Performance test triggered: spec06-0.8c

coderabbitai · 2025-12-19T03:06:09Z

Walkthrough

Replaces the single resolveQueueFullCycles stat with three new stats (resolveDequeueCount, resolveEnqueueCount, resolveQueueOccupancy), updates the FetchStatGroup initialization, and adjusts the enqueue/dequeue logic in handleIEWSignals to track enqueue failures and merges, sample queue occupancy every cycle, and increment the dequeue count.

Changes

Cohort / File(s)	Summary
Resolve queue stats & enqueue/dequeue logic `src/cpu/o3/fetch.hh`, `src/cpu/o3/fetch.cc`	Removed `resolveQueueFullCycles` (Scalar). Added `resolveDequeueCount` (Scalar), `resolveEnqueueCount` (Distribution), and `resolveQueueOccupancy` (Distribution). Updated FetchStatGroup constructor initializers. Modified `handleIEWSignals` to: track enqueueSize/enqueueCount, record full-queue events and failed-enqueue accumulators, merge or create ResolveQueueEntry items, sample per-cycle occupancy, and increment dequeue counter on successful dequeue. Minor structural edits to support sampling.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10–15 minutes

Pay attention to:
- Distribution initialization parameters (ranges/buckets)
- Correctness of enqueue-size/count tracking and sampling frequency
- Merge vs. new-entry logic and counter increments
- Points where dequeue increments occur

Possibly related PRs

cpu-o3: move resolve queue merging from iew to fetch stage #635: Related changes to fetch-stage resolve-queue handling and stats.
Resolve Queue Alignment in Kunminghu-v3 #626: Related modifications to branch resolve-queue enqueue/dequeue and occupancy tracking.

Suggested reviewers

jensen-yan
tastynoob

Poem

🐰 I hopped through queues both short and deep,
Counting enqueues while others sleep.
Merges snug, failures flagged with care,
Occupancy tallied, dequeue counted fair.
A little hop for stats, a big carrot to share 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'cpu-o3: add more performance counter in resolve queue' accurately describes the main changes: adding new performance statistics (resolveDequeueCount, resolveEnqueueCount, resolveQueueOccupancy) to the O3 CPU fetch component's resolve queue.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b83159 and 6df9382.

📒 Files selected for processing (2)

src/cpu/o3/fetch.cc
src/cpu/o3/fetch.hh

🚧 Files skipped from review as they are similar to previous changes (1)

src/cpu/o3/fetch.hh

🧰 Additional context used

🧬 Code graph analysis (1)

src/cpu/o3/fetch.cc (1)

src/cpu/o3/fetch.hh (1)

fetchStats (1120-1120)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Quick Build, Unit Tests & Smoke Test
GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)

🔇 Additional comments (3)

src/cpu/o3/fetch.cc (3)

257-263: LGTM!

The new statistics declarations appropriately replace the removed resolveQueueFullCycles stat with more granular metrics tracking dequeue count, enqueue count distribution, and queue occupancy distribution.

1516-1544: Verify the enqueue count sampling logic.

The resolveEnqueueCount distribution is sampled only when the queue has available space (line 1541, inside the else block starting at line 1519). When the queue is full (lines 1516-1518), no sample is recorded even though enqueueCount remains 0.

Given that resolveEnqueueCount.init(1, 8, 1) starts at 1 (not 0), this appears intentional—tracking only successful enqueue batch sizes rather than per-cycle behavior. However, please confirm this aligns with your performance analysis goals. If you want to capture all cycles (including zero-enqueue cycles when the queue is full), adjust the distribution range to start at 0 and move the sampling outside the conditional block.
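The effect of the lower bound can be illustrated with a small stand-alone sketch. `MiniDist` below is a hypothetical stand-in, not gem5's distribution class; it only assumes that a stat initialized with a `(min, max, bucket size)` range counts samples below `min` separately as underflow, which should be checked against the actual stats implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bucketed distribution stat. It is not gem5's
// class; it only mimics an init(min, max, bucketSize) style range.
struct MiniDist
{
    int64_t minVal;
    int64_t maxVal;
    int64_t bucketSize;
    std::vector<uint64_t> buckets;  // one counter per in-range bucket
    uint64_t underflow = 0;         // samples below minVal
    uint64_t overflow = 0;          // samples above maxVal

    // Validate the range before computing how many buckets it needs.
    static std::size_t
    bucketCount(int64_t min, int64_t max, int64_t bucket)
    {
        assert(bucket > 0 && max >= min);
        return static_cast<std::size_t>((max - min) / bucket + 1);
    }

    MiniDist(int64_t min, int64_t max, int64_t bucket)
        : minVal(min), maxVal(max), bucketSize(bucket),
          buckets(bucketCount(min, max, bucket), 0)
    {
    }

    // Record one sample in its bucket, or in underflow/overflow if it
    // falls outside [minVal, maxVal].
    void
    sample(int64_t value)
    {
        if (value < minVal) {
            ++underflow;
        } else if (value > maxVal) {
            ++overflow;
        } else {
            ++buckets[static_cast<std::size_t>((value - minVal) / bucketSize)];
        }
    }
};
```

Under these assumptions, a range starting at 1 would place a zero-enqueue cycle in the underflow counter, while a range starting at 0 would give it a regular bucket of its own.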

1554-1557: LGTM!

The dequeue count increment correctly tracks successful dequeue events after resolveUpdate returns true and the entry is removed from the queue.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/cpu/o3/fetch.cc (2)
1519-1544: Consider tracking dropped entries when queue is near-full.

When the resolve queue is near-full (line 1519), the code increments resolveQueueFullEvents but doesn't track how many incoming CFI entries are being dropped. This means resolveEnqueueCount is only sampled when entries are successfully enqueued, potentially underreporting the actual enqueue attempts.

Consider adding a statistic to track dropped entries or sampling the attempted enqueue count even when near-full to get a complete picture of resolve queue pressure.
Example enhancement
     if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4) {
         fetchStats.resolveQueueFullEvents++;
+        fetchStats.resolveEnqueueCount.sample(0);  // Track that we couldn't enqueue
+        // or add a new stat: fetchStats.resolveEnqueueDropCount += incoming.size();
     } else {
1541-1543: Clarify enqueue count behavior in documentation.

The enqueueCount metric (line 1541) only increments for newly added entries, not for entries that are merged into existing queue entries (lines 1526-1530). This means the statistic tracks "new queue entries created" rather than "total CFIs processed."

Consider adding a comment to clarify this distinction, or tracking both metrics separately if the merged count is also valuable for analysis.
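A minimal sketch of tracking both counts separately might look like the following. The types `ResolvedCFI`, `QueueEntry`, and `EnqueueResult` and the function `enqueueResolved` are hypothetical names for illustration only. The merge rule used here (merge into the tail entry when its `fsqId` matches) is an assumption; the actual merge condition in `handleIEWSignals` may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical types for illustration; the real ResolveQueueEntry differs.
struct ResolvedCFI
{
    uint64_t fsqId;
    uint64_t pc;
};

struct QueueEntry
{
    uint64_t fsqId;
    std::vector<uint64_t> pcs;
};

struct EnqueueResult
{
    unsigned created = 0;  // new queue entries pushed
    unsigned merged = 0;   // CFIs folded into an existing entry
};

// Assumed merge rule: a CFI merges into the tail entry when the fsqId
// matches, otherwise it creates a new entry.
EnqueueResult
enqueueResolved(std::deque<QueueEntry> &queue,
                const std::vector<ResolvedCFI> &incoming)
{
    EnqueueResult result;
    for (const ResolvedCFI &cfi : incoming) {
        if (!queue.empty() && queue.back().fsqId == cfi.fsqId) {
            queue.back().pcs.push_back(cfi.pc);
            ++result.merged;
        } else {
            queue.push_back(QueueEntry{cfi.fsqId, {cfi.pc}});
            ++result.created;
        }
    }
    return result;
}
```

With this split, `created` corresponds to "new queue entries created" and `created + merged` to "total CFIs processed", so either could be sampled depending on what the analysis needs.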

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between d0b82f9 and c17bdd4.

📒 Files selected for processing (2)

src/cpu/o3/fetch.cc (5 hunks)
src/cpu/o3/fetch.hh (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/cpu/o3/fetch.cc (1)

src/cpu/o3/fetch.hh (1)

fetchStats (1122-1122)

🔇 Additional comments (6)

src/cpu/o3/fetch.cc (6)

45-45: LGTM!

The <cstdint> header is correctly included to support the uint8_t types used in handleIEWSignals() (lines 1516-1517).

260-266: LGTM!

The statistics initialization is correct with appropriate descriptions and units for tracking resolve queue metrics.

1555-1562: LGTM!

The resolveDequeueCount is correctly incremented only when the resolve update succeeds and an entry is actually dequeued from the resolve queue.

1546-1546: LGTM!

The resolveQueueOccupancy is correctly sampled every cycle (every time handleIEWSignals() is called) to track the distribution of queue occupancy over time.

1519-1546: Note: AI summary inconsistency.

The AI summary states "On near-full queue detection in handleIEWSignals, enqueues are tracked; otherwise, enqueue entries are added and enqueue count tracked." However, the actual code behavior is the opposite: when the queue is near-full, incoming entries are dropped and not tracked, while when not near-full, entries are enqueued and the count is tracked.

336-339: Verify resolveQueueOccupancy histogram upper bound against resolveQueueSize parameter.

The occupancy range of 0-100 is too high for the default resolveQueueSize of 16 entries. Use resolveQueueSize as the upper bound instead. The enqueue count range of 1-8 should also be verified to match the maximum number of resolved control-flow instructions IEW can produce in a single cycle.
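One way to picture the suggestion is a histogram whose bucket count is derived from the queue capacity rather than a fixed 0-100 range. This is a stand-alone sketch, not gem5 code: `occupancyHistogram` and `queueCapacity` are hypothetical names, with `queueCapacity` standing in for the `resolveQueueSize` parameter and assumed to be a nonzero bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a per-cycle occupancy histogram with one bucket per possible
// occupancy value, 0 through queueCapacity inclusive.
// `queueCapacity` stands in for the resolveQueueSize parameter.
std::vector<uint64_t>
occupancyHistogram(const std::vector<std::size_t> &occupancyPerCycle,
                   std::size_t queueCapacity)
{
    std::vector<uint64_t> histogram(queueCapacity + 1, 0);
    for (std::size_t occupancy : occupancyPerCycle) {
        // A bounded queue should never hold more than its capacity.
        assert(occupancy <= queueCapacity);
        ++histogram[occupancy];
    }
    return histogram;
}
```

For a 16-entry queue this yields 17 buckets, all of which can actually be hit, instead of 101 buckets of which most would stay empty.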

github-actions · 2025-12-19T03:13:42Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.0662`	-
This PR	`2.0659`	📉 `-0.0004` (`-0.02%`)

✅ Difftest smoke test passed!

Yakkhini · 2025-12-19T06:15:50Z

Master:

0.3c: 16.85
0.8c: 19.91

12 + 4 entry size:

0.3c: 16.88
0.8c: 19.91

github-actions · 2025-12-19T06:17:45Z

🚀 Performance test triggered: spec06-0.8c

github-actions · 2025-12-19T06:25:13Z

🚀 Performance test triggered: spec06-0.8c

github-actions · 2025-12-19T07:23:37Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.0662`	-
This PR	`2.0583`	📉 `-0.0080` (`-0.39%`)

✅ Difftest smoke test passed!

Yakkhini · 2025-12-19T09:33:57Z

Master:

0.3c: 16.85
0.8c: 19.91

(4 + 4) entry size:

0.3c: 16.85
0.8c: 16.91

github-actions · 2025-12-19T09:49:22Z

🚀 Performance test triggered: spec06-0.8c

github-actions · 2025-12-19T09:57:18Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.0662`	-
This PR	`2.0552`	📉 `-0.0110` (`-0.53%`)

✅ Difftest smoke test passed!

Yakkhini · 2025-12-22T02:57:28Z

Master:

0.3c: 16.85
0.8c: 19.91(3)

(28 + 4) entry size:

0.3c: 16.87
0.8c: 16.91(4)

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

src/cpu/o3/fetch.hh (1)

1109-1114: Comment copy-paste error already flagged.

The incorrect comment on line 1109 has already been identified in previous reviews. As noted, it should read "Stat for total number of resolve dequeue events" instead of "enqueue events."

src/cpu/o3/fetch.cc (1)

1514-1515: Unused variable and overflow risk already flagged.

The issues with these variable declarations have already been identified in previous reviews:

enqueueSize is only used on line 1519 and could be inlined

uint8_t type for both variables risks overflow if resolvedCFIs.size() exceeds 255
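The overflow risk comes from the narrowing conversion itself: converting a `std::size_t` to `uint8_t` keeps only the low 8 bits. The helpers below are hypothetical and only illustrate the behavior; whether `resolvedCFIs.size()` can realistically exceed 255 depends on the IEW width, which is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// What a plain assignment to uint8_t does: the value is reduced modulo 256.
inline uint8_t
narrowUnchecked(std::size_t n)
{
    return static_cast<uint8_t>(n);
}

// One possible alternative: clamp to the largest representable value.
inline uint8_t
narrowSaturating(std::size_t n)
{
    constexpr std::size_t maxVal = std::numeric_limits<uint8_t>::max();
    return static_cast<uint8_t>(n > maxVal ? maxVal : n);
}
```

A size of 300 would silently become 44 with the unchecked conversion. Using a wider type such as `unsigned` or `std::size_t` for the counters avoids the question entirely.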

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 712f365 and 4b83159.

📒 Files selected for processing (2)

src/cpu/o3/fetch.cc
src/cpu/o3/fetch.hh

🧰 Additional context used

🧬 Code graph analysis (1)

src/cpu/o3/fetch.cc (1)

src/cpu/o3/fetch.hh (1)

fetchStats (1120-1120)

🔇 Additional comments (5)

src/cpu/o3/fetch.cc (5)
45-45: LGTM!

The <cstdint> header inclusion is appropriate to support the fixed-width integer types used later in the file.

258-264: LGTM!

The statistics initialization follows the established pattern and correctly describes each metric.

1517-1543: Verify queue fullness threshold is intentional.

The logic correctly handles queue management with merge support for duplicate fsqId entries. However, the threshold check on line 1517 uses a hardcoded value:
`if (resolveQueueSize && resolveQueue.size() > resolveQueueSize - 4)`
The `- 4` leaves 4 slots of headroom before the queue is considered full. While this provides a safety margin, verify:

Is this threshold intentional or should it be configurable?

Does this align with the expected burst size of resolved CFIs from IEW?

Should the reason for choosing 4 slots as the threshold be documented?

Also note that enqueueCount correctly tracks only newly created entries (not merged ones), which is semantically appropriate for an "enqueue" statistic.
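If the headroom were made configurable, the check could be factored into a small helper such as the sketch below. `resolveQueueNearFull` and its `headroom` parameter are hypothetical and not part of the PR. The sketch assumes that a capacity of 0 means "no limit" (mirroring the `resolveQueueSize &&` guard) and, if the size is an unsigned type, clamps the threshold so that `capacity - headroom` cannot wrap around for very small queues.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true when the queue should stop accepting
// new entries. `headroom` is the number of slots kept in reserve.
inline bool
resolveQueueNearFull(std::size_t occupancy, std::size_t capacity,
                     std::size_t headroom)
{
    // Assumption: capacity 0 means the queue is unbounded.
    if (capacity == 0) {
        return false;
    }
    // Clamp instead of subtracting blindly, so an unsigned capacity smaller
    // than the headroom does not wrap to a huge threshold.
    const std::size_t threshold =
        capacity > headroom ? capacity - headroom : 0;
    return occupancy > threshold;
}
```

With a capacity of 16 and a headroom of 4, this reproduces the hardcoded condition: the queue counts as near-full once it holds more than 12 entries.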

1545-1546: LGTM!

The queue occupancy sampling is correctly placed after enqueue processing to capture the current queue state per cycle.

1558-1558: LGTM!

The dequeue counter is correctly incremented only after successful resolve updates, accurately tracking queue removals.

Change-Id: Ic2a77e02704a21611e68598cd5b71cf9542c8462

github-actions · 2025-12-22T03:16:09Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.0662`	-
This PR	`2.0659`	📉 `-0.0004` (`-0.02%`)

✅ Difftest smoke test passed!

Yakkhini · 2025-12-22T03:34:33Z

Conclusion: Resolve queue size has no significant effect on performance. An 8-entry queue, or an even smaller structure, may be worth considering to reduce microarchitectural area.

Yakkhini added the perf label Dec 19, 2025

coderabbitai bot reviewed Dec 19, 2025

src/cpu/o3/fetch.cc (resolved)

src/cpu/o3/fetch.hh (outdated, resolved)

Yakkhini added perf and removed perf labels Dec 19, 2025

Yakkhini force-pushed the resolve-queue-align branch from 6c65505 to 6b88c8c Compare December 19, 2025 06:24

Yakkhini added perf and removed perf labels Dec 19, 2025

Yakkhini force-pushed the resolve-queue-align branch from 712f365 to 4b83159 Compare December 22, 2025 03:02

Yakkhini changed the title ~~Resolve Queue DSE~~ cpu-o3: add more performance counter in resolve queue Dec 22, 2025

Yakkhini assigned jensen-yan Dec 22, 2025

coderabbitai bot reviewed Dec 22, 2025

src/cpu/o3/fetch.cc (outdated, resolved)

cpu-o3: add more performance counter in resolve queue

6df9382

Change-Id: Ic2a77e02704a21611e68598cd5b71cf9542c8462

Yakkhini force-pushed the resolve-queue-align branch from 4b83159 to 6df9382 Compare December 22, 2025 03:09

jensen-yan reviewed Dec 22, 2025

src/cpu/o3/fetch.cc (resolved)

jensen-yan approved these changes Dec 22, 2025

Yakkhini merged commit 8ef695e into xs-dev Dec 22, 2025
3 checks passed

Yakkhini deleted the resolve-queue-align branch December 22, 2025 07:28

coderabbitai bot mentioned this pull request Dec 23, 2025

cpu-o3: 2 entry size resolve queue for evaluation #670

Closed
