Resolve Queue Alignment in Kunminghu-v3#626
Conversation
|
🚀 Performance test triggered: spec06-0.8c |
WalkthroughThis PR modifies the resolve queue management in the O3 CPU's execution engine by removing sorting logic, implementing size-based queue capacity limits, changing dequeue semantics from LIFO to FIFO, and adding statistics to track queue fullness conditions. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Commit 47a2a91 CI |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/cpu/o3/iew.cc (1)
1803-1807: Front-pop behavior is correct; consider moving the entry instead of copyingSwitching to
front()+erase(begin())matches the new FIFO semantics ofresolveQueue. Given the small fixed max (16), theerase(begin())cost is negligible, but you can avoid an extra copy ofResolveQueueEntry(and itsresolvedInstPCpayload) by moving directly into the fetch queue:- if (!resolveQueue.empty()) { - ResolveQueueEntry entry = resolveQueue.front(); - resolveQueue.erase(resolveQueue.begin()); - toFetch->iewInfo[tid].resolveQueue.push_back(entry); - } + if (!resolveQueue.empty()) { + toFetch->iewInfo[tid].resolveQueue.push_back( + std::move(resolveQueue.front())); + resolveQueue.erase(resolveQueue.begin()); + }This is optional but keeps things cheap if the queue size ever grows in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/cpu/o3/iew.cc(3 hunks)src/cpu/o3/iew.hh(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run SPEC Test / XS-GEM5 - Run performance test (spec06-0.8c)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (2)
src/cpu/o3/iew.hh (1)
531-533: NewresolveQueueFullstat member is correctly wired and consistentDeclaration and comment are consistent with the other
Scalarstats and its usage inIEWStats::IEWStatsandSquashCheckAfterExe.src/cpu/o3/iew.cc (1)
152-185:resolveQueueFullstat registration matches declaration and usageThe new
ADD_STAT(resolveQueueFull, ...)entry uses the right units, integrates cleanly into the initializer list, and matches the header declaration and increment site inSquashCheckAfterExe.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
Change-Id: I71b39a45a1a664058edb94fdaae24960b1a362fd
845e8a1 to
caa1f5d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/cpu/o3/iew.cc (1)
1579-1587: Reconsider theresolveQueueFullstat semantics.The current implementation increments
resolveQueueFullwhenever the queue is at capacity, regardless of whether a new FSQ entry was actually declined. This differs from the approach discussed in previous reviews, where the stat should only increment when!found && resolveQueue.size() >= resolveQueueSize(i.e., when a new entry cannot be enqueued due to capacity).If the intent is to track overflow events (declined new entries), consider:
if (!found && resolveQueue.size() < resolveQueueSize) { ResolveQueueEntry newEntry; newEntry.resolvedFSQId = fsqId; newEntry.resolvedInstPC.push_back(pc); resolveQueue.push_back(newEntry); +} else if (!found && resolveQueue.size() >= resolveQueueSize) { + iewStats.resolveQueueFull++; } -if (resolveQueue.size() >= resolveQueueSize) { - iewStats.resolveQueueFull++; -}Alternatively, implement two separate counters as suggested by Yakkhini in previous reviews.
Based on learnings, this continues the unresolved discussion from previous reviews about stat semantics.
🧹 Nitpick comments (1)
src/cpu/o3/iew.cc (1)
1803-1804: Consider usingstd::dequefor efficient front-erase.The dequeue logic now uses
erase(begin())on astd::vector, which is an O(n) operation that shifts all remaining elements. If the resolve queue is dequeued frequently, this could impact performance.Consider changing
resolveQueuefromstd::vector<ResolveQueueEntry>tostd::deque<ResolveQueueEntry>insrc/cpu/o3/iew.hhfor O(1) front removal:In
src/cpu/o3/iew.hh:- std::vector<ResolveQueueEntry> resolveQueue; + std::deque<ResolveQueueEntry> resolveQueue;This change maintains the same interface while improving performance for FIFO operations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/cpu/o3/BaseO3CPU.py(1 hunks)src/cpu/o3/iew.cc(4 hunks)src/cpu/o3/iew.hh(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/iew.cc (1)
src/cpu/o3/iew.hh (1)
iewStats(613-613)
🪛 Ruff (0.14.7)
src/cpu/o3/BaseO3CPU.py
242-242: Param may be undefined, or defined from star imports
(F405)
🔇 Additional comments (4)
src/cpu/o3/iew.hh (1)
498-498: LGTM!The
resolveQueueSizemember is properly initialized from parameters and used to enforce capacity limits in the resolve queue.src/cpu/o3/BaseO3CPU.py (1)
242-242: LGTM!The new
resolveQueueSizeparameter is properly defined and aligns with the IEW implementation. The static analysis warning aboutParambeing undefined is a false positive—this is a standard gem5 configuration pattern using star imports fromm5.params.src/cpu/o3/iew.cc (2)
95-95: LGTM!The
resolveQueueSizeis properly initialized from the parameter system.
183-184: LGTM, but see related comment on stat semantics.The statistic is properly defined. However, refer to the comment on line 533-534 in
src/cpu/o3/iew.hhregarding the unresolved discussion about stat semantics (saturation vs. overflow tracking).
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
caa1f5d to
d1cb617
Compare
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
Change-Id: I3085a57889b6e8955d835aaaa2d0b6f0059b4f42
Change-Id: Ib15c8a7e887fc29baf2a27d616438e8165213179
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
9021a2a to
ed86727
Compare
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
Change-Id: I2a67a614c08321a82c338d040f8180bcfb4d9753
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/cpu/o3/iew.cc (1)
183-189: Clarify statistics semantics and align with past discussion.The three statistics partially address past reviewer feedback, but their semantics should be more precisely documented:
resolveQueueFullCycles: appears to count every cycle whensize >= max(see line 1598)resolveQueueFullEvents: incremented when queue becomes full after an enqueue (line 1590)resolveEnqueueFailEvent: incremented when a new FSQ entry is declined (line 1596)Per past discussion, consider refining the comments to explicitly state:
- Whether
resolveQueueFullCyclescounts all cycles at capacity or only during enqueue attempts- Whether
resolveQueueFullEventstracks transitions to full state or enqueue-triggered fullnessBased on learnings, this stat semantics issue was flagged in previous reviews and warrants clarification.
src/cpu/o3/iew.hh (1)
533-538: Enhance statistics comments for clarity.The comments for the three new statistics could be more precise:
resolveQueueFullCycles: "cycles the resolve queue is full" - clarify if this counts all cycles at capacity or only during enqueue attemptsresolveQueueFullEvents: "events the resolve queue becomes full" - clarify if this is incremented on every transition to full or only after successful enqueuesresolveEnqueueFailEvent: "could not be enqueued" - consider specifying "new FSQ entry" to distinguish from PC appends to existing entriesExample improvement:
- /** Stat for total cycles the resolve queue is full. */ + /** Stat for total cycles when the resolve queue is at capacity. */ statistics::Scalar resolveQueueFullCycles; - /** Stat for total events of the resolve queue becomes full. */ + /** Stat for number of times the queue transitions to full after an enqueue. */ statistics::Scalar resolveQueueFullEvents; - /** Stat for total number of enqueue fail events. */ + /** Stat for number of times a new FSQ entry could not be enqueued due to full queue. */ statistics::Scalar resolveEnqueueFailEvent;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/cpu/o3/iew.cc(4 hunks)src/cpu/o3/iew.hh(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/iew.cc (2)
src/cpu/o3/dyn_inst.hh (1)
fsqId(1493-1497)src/cpu/o3/iew.hh (1)
iewStats(617-617)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (4)
src/cpu/o3/iew.cc (2)
95-95: LGTM: Parameter binding follows standard pattern.The initialization of
resolveQueueSizefrom params is consistent with other member initializations in this constructor.
1814-1818: FIFO dequeue is correctly implemented and aligns with downstream consumers.The change from LIFO to FIFO ordering (
front()+erase(begin())instead ofback()+erase(end()-1)) is intentional and well-integrated. The enqueue usespush_back()and dequeue usesfront()—a standard FIFO pattern. The consumer infetch.cc(line 1498) iterates through the queue sequentially without order-dependent logic, confirming compatibility with the FIFO semantics. ThesortResolveQueue()function has been removed, and no code attempts LIFO access patterns.src/cpu/o3/iew.hh (2)
498-498: LGTM: Queue size member declaration is appropriate.The
resolveQueueSizemember is correctly declared asunsignedin the private section, matching typical capacity/size semantics.
44-44: > Likely an incorrect or invalid review comment.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
Summary by CodeRabbit
Release Notes
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.