
Resolve align #621

Merged
jensen-yan merged 3 commits into xs-dev from resolve-align
Nov 28, 2025

Conversation

@jensen-yan
Collaborator

@jensen-yan jensen-yan commented Nov 28, 2025

Summary by CodeRabbit

  • Chores

    • Updated branch predictor configuration settings for CPU simulation.
  • Bug Fixes

    • Optimized CPU execution logic and resolve queue handling for improved simulation accuracy.


Yakkhini and others added 3 commits November 28, 2025 11:32
Change-Id: I75c3f5f704e39ac4dfd16909ce5da365a2de6a91
for tage aligning

Change-Id: I41078918deeb1b6a514bcfb0bb5df2df578910d0
Change-Id: I732d89766545b668bfce47ffe6be9f9b7161d569
@coderabbitai

coderabbitai bot commented Nov 28, 2025

Walkthrough

This PR modifies the instruction execution window (IEW) module to introduce local resolve queue management with sorting, and adjusts the branch predictor configuration by enabling resolved update flags while disabling specific predictor components.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Branch Predictor Configuration: configs/example/kmhv3.py | Enabled resolvedUpdate flags for mbtb and tage predictors; disabled abtb, ittage, mgsc, and ras components in DecoupledBPUWithBTB configuration. |
| IEW Header: src/cpu/o3/iew.hh | Added local resolveQueue member variable, introduced resolveQueueEntryCompare static comparator sorting by resolvedFSQId (descending), and added sortResolveQueue() helper method. |
| IEW Implementation: src/cpu/o3/iew.cc | Modified SquashCheckAfterExe and executeInsts methods to accumulate resolve queue entries locally, sort them, and forward entries to thread-side queue after execution. |
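
The accumulate, sort, and forward flow summarized in the table can be sketched roughly as follows. This is a standalone illustration, not the PR's code: the `ResolveQueueEntry` struct here is a hypothetical stand-in that models only the two fields named in the review, `recordResolved` and `forwardOldest` are made-up helper names, and creating a new entry when no matching FSQ id exists is an assumption about the not-found path.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the simulator's ResolveQueueEntry; only the two
// fields mentioned in the review are modeled.
struct ResolveQueueEntry {
    std::uint64_t resolvedFSQId;
    std::vector<std::uint64_t> resolvedInstPC;
};

// Descending order by FSQ id, so the oldest (smallest id) entry sorts last.
static bool resolveQueueEntryCompare(const ResolveQueueEntry &a,
                                     const ResolveQueueEntry &b) {
    return a.resolvedFSQId > b.resolvedFSQId;
}

// Record a resolved PC under its FSQ id. Creating a fresh entry when the id
// is not present yet is an assumption made for this sketch.
void recordResolved(std::vector<ResolveQueueEntry> &queue,
                    std::uint64_t fsqId, std::uint64_t pc) {
    for (auto &entry : queue) {
        if (entry.resolvedFSQId == fsqId) {
            entry.resolvedInstPC.push_back(pc);
            return;
        }
    }
    queue.push_back(ResolveQueueEntry{fsqId, {pc}});
}

// Sort the local queue, then move the oldest entry (if any) to the
// consumer-side queue. Returns false when there was nothing to forward.
bool forwardOldest(std::vector<ResolveQueueEntry> &local,
                   std::vector<ResolveQueueEntry> &toFetch) {
    std::sort(local.begin(), local.end(), resolveQueueEntryCompare);
    if (local.empty()) {
        return false;
    }
    toFetch.push_back(local.back());
    local.pop_back();
    return true;
}
```

With a descending sort, `back()` and `pop_back()` remove the oldest entry in constant time from a `std::vector`, which is presumably why the ordering is reversed relative to what a reader might expect.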

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify the sorting logic and comparator correctness for resolvedFSQId ordering
  • Confirm local queue accumulation and forwarding mechanism maintains correct entry state transitions
  • Validate branch predictor flag changes align with intended execution model modifications

Possibly related PRs

  • Add kmhv3 #609: Introduces the kmhv3.py configuration file that this PR modifies with predictor settings.

Suggested reviewers

  • tastynoob
  • Yakkhini

Poem

🐰 A queue resolves with sorted grace,
Each entry finds its rightful place,
The predictor blooms, the flags align,
Through IEW's dance, the paths entwine! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'Resolve align' is vague and generic, providing minimal insight into the actual changes made to the codebase. | Use a more descriptive title that captures the main objective, such as 'Add local resolveQueue buffering in IEW stage' or 'Implement resolve queue sorting and forwarding logic'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@Yakkhini
Collaborator

Related to the implementation in #601 and #602.

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (xs-dev) | 1.9517 | - |
| This PR | 1.8589 | 📉 -0.0929 (-4.76%) |

✅ Difftest smoke test passed!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cpu/o3/iew.cc (1)

1569-1581: Based on my complete verification of the codebase, the original review comment is accurate and well-founded. All three concerns have been confirmed:

  1. Multi-threading correctness issue is real: resolveQueue is a shared vector with no per-thread tracking, yet entries are forwarded only to the first active thread's iewInfo slot.

  2. Lifetime management issue is real: resolveQueue is never cleared in state transition methods, risking stale entry contamination.

  3. Ordering convention is subtle but intentional: Descending sort with back() pop is correct but easy to misread.


Shared resolveQueue without per-thread tracking creates correctness issues for multi-threaded configurations

The implementation that accumulates into IEW::resolveQueue and forwards entries to the front-end raises critical concerns:

  1. Multi-threading semantics are broken

    • resolveQueue is a single shared std::vector<ResolveQueueEntry>, but ResolveQueueEntry contains no ThreadID field to identify which thread generated each entry.
    • In SquashCheckAfterExe() (lines 1569-1581), resolve entries are added without any thread context:
      for (auto &entry : resolveQueue) {
          if (entry.resolvedFSQId == fsqId) {
              entry.resolvedInstPC.push_back(pc);
              found = true;
          }
      }
    • At the end of executeInsts() (lines 1796-1801), all entries are forwarded into a single thread's slot:
      ThreadID tid = *activeThreads->begin();  // Always first thread
      sortResolveQueue();
      if (!resolveQueue.empty()) {
          ResolveQueueEntry entry = resolveQueue.back();
          resolveQueue.pop_back();
          toFetch->iewInfo[tid].resolveQueue.push_back(entry);
      }
    • If O3 is configured for multi-threading (SMT), this means all resolved updates from any thread get funneled into the first active thread's iewInfo slot. Threads 1, 2, etc. will never see their own resolve updates or will see updates destined for other threads.
    • Fix: Either add a ThreadID field to ResolveQueueEntry and partition/forward by thread in executeInsts(), or make resolveQueue per-thread indexed (e.g., std::vector<ResolveQueueEntry> resolveQueue[MaxThreads];).
  2. resolveQueue is never explicitly cleared across pipeline state transitions

    • resolveQueue is never cleared in clearStates(), startupStage(), or equivalent handlers.
    • This allows stale entries to persist across drains or CPU handovers, contaminating subsequent contexts.
    • Add explicit cleanup: resolveQueue.clear(); in clearStates() and other appropriate state-reset methods.
  3. Sort + back() convention is subtle

    • The comparator orders entries in descending FSQ id, so back() returns the smallest FSQ id (oldest). This is intentional but easy to misread.
    • Add a clarifying comment or consider using front() with a reversed comparator for clarity.

Given that this path trains the MBTB/TAGE predictor with resolved updates, and multi-threaded O3 may be used in future configs, these issues should be addressed before merging.

Also applies to: 1796-1801
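
A minimal sketch of the per-thread variant suggested in point 1, combined with the explicit reset from point 2. Everything here is illustrative: `PerThreadResolveQueues`, `record`, `popOldest`, and `clear` are hypothetical names, the `ThreadID` alias and the `MaxThreads` value are assumptions chosen for the sketch, and `ResolveQueueEntry` again models only the fields quoted in the review.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <utility>
#include <vector>

using ThreadID = std::int16_t;      // assumption: a small signed thread index
constexpr ThreadID MaxThreads = 4;  // hypothetical limit for this sketch

// Hypothetical stand-in for the simulator's ResolveQueueEntry.
struct ResolveQueueEntry {
    std::uint64_t resolvedFSQId;
    std::vector<std::uint64_t> resolvedInstPC;
};

// One resolve queue per hardware thread, so entries never cross threads.
class PerThreadResolveQueues {
  public:
    // Append pc to the entry for fsqId on thread tid, creating it if needed.
    void record(ThreadID tid, std::uint64_t fsqId, std::uint64_t pc) {
        auto &q = queues[tid];
        auto it = std::find_if(q.begin(), q.end(),
            [fsqId](const ResolveQueueEntry &e) {
                return e.resolvedFSQId == fsqId;
            });
        if (it != q.end()) {
            it->resolvedInstPC.push_back(pc);
        } else {
            q.push_back(ResolveQueueEntry{fsqId, {pc}});
        }
    }

    // Move the oldest (smallest FSQ id) entry of thread tid into out.
    // Returns false if that thread has nothing pending.
    bool popOldest(ThreadID tid, ResolveQueueEntry &out) {
        auto &q = queues[tid];
        if (q.empty()) {
            return false;
        }
        auto it = std::min_element(q.begin(), q.end(),
            [](const ResolveQueueEntry &a, const ResolveQueueEntry &b) {
                return a.resolvedFSQId < b.resolvedFSQId;
            });
        out = std::move(*it);
        q.erase(it);
        return true;
    }

    // Drop all pending entries, e.g. from a state-reset path.
    void clear() {
        for (auto &q : queues) {
            q.clear();
        }
    }

  private:
    std::array<std::vector<ResolveQueueEntry>, MaxThreads> queues;
};
```

In this shape, a forwarding loop would iterate over the active threads and call `popOldest(tid, entry)` for each one instead of always using the first active thread, and a reset handler would call `clear()`.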

🧹 Nitpick comments (1)
src/cpu/o3/iew.hh (1)

497-503: Ensure header declares its dependency on std::sort

sortResolveQueue() uses std::sort, but this header doesn’t include <algorithm>. Relying on transitive includes can be fragile; it’s safer for iew.hh to include <algorithm> itself so any TU including this header compiles independently.

You may also want to add a brief comment describing that resolveQueueEntryCompare orders entries by descending resolvedFSQId, with callers expected to use back() to get the oldest (smallest) FSQ id, just to make the intent obvious to future readers.
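
For comparison, the alternative mentioned in the review (an ascending comparator read from the front) might look like the sketch below. It is not taken from the PR: `resolveQueueEntryOlderFirst` and `sortOldestFirst` are hypothetical names, `ResolveQueueEntry` is a reduced stand-in, and switching the container to `std::deque` is an assumption made so that removing the front element stays cheap.

```cpp
#include <algorithm>  // std::sort, included directly rather than transitively
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the simulator's ResolveQueueEntry.
struct ResolveQueueEntry {
    std::uint64_t resolvedFSQId;
    std::vector<std::uint64_t> resolvedInstPC;
};

// Ascending order by FSQ id: the oldest (smallest id) entry sorts first,
// so callers read front() instead of back().
inline bool resolveQueueEntryOlderFirst(const ResolveQueueEntry &a,
                                        const ResolveQueueEntry &b) {
    return a.resolvedFSQId < b.resolvedFSQId;
}

// std::deque provides random-access iterators, so std::sort applies, and
// pop_front() is constant time.
inline void sortOldestFirst(std::deque<ResolveQueueEntry> &queue) {
    std::sort(queue.begin(), queue.end(), resolveQueueEntryOlderFirst);
}
```

Whether this reads better than the descending sort with `back()` is a judgment call; the vector-based version avoids changing the container type, while this one makes "oldest first" visible at the call site.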

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 348f8f4 and b50a224.

📒 Files selected for processing (3)
  • configs/example/kmhv3.py (1 hunks)
  • src/cpu/o3/iew.cc (3 hunks)
  • src/cpu/o3/iew.hh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
configs/example/kmhv3.py (3)
src/cpu/pred/btb/timed_base_pred.hh (1)
  • resolvedUpdate (77-77)
src/cpu/pred/btb/test/btb_tage.test.cc (1)
  • tage (275-281)
src/cpu/pred/btb/test/abtb.test.cc (1)
  • abtb (59-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (1)
configs/example/kmhv3.py (1)

93-107: DecoupledBPUWithBTB predictor config looks coherent with resolved-update flow

Enabling mbtb.tage.resolvedUpdate and disabling abtb, ittage, mgsc, and ras for this config is internally consistent and matches the PR’s goal of using resolved updates with a trimmed predictor set. No functional issues spotted here.

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: b50a224
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
| --- | --- | --- | --- |
| Score | 15.05 | 17.40 | -13.55 🔴 |

@jensen-yan jensen-yan merged commit fc34e6b into xs-dev Nov 28, 2025
3 checks passed
@jensen-yan jensen-yan deleted the resolve-align branch November 28, 2025 06:29
@coderabbitai coderabbitai bot mentioned this pull request Dec 29, 2025
This was referenced Jan 19, 2026
@coderabbitai coderabbitai bot mentioned this pull request Jan 28, 2026
@coderabbitai coderabbitai bot mentioned this pull request Mar 3, 2026
@coderabbitai coderabbitai bot mentioned this pull request Mar 12, 2026

3 participants