diff --git a/.gitignore b/.gitignore
index 6825061db1..573d5c5e10 100644
--- a/.gitignore
+++ b/.gitignore
@@ -65,7 +65,6 @@ CLAUDE.md
 node_modules/
 package-lock.json
 package.json
-AGENTS.md
 
 microbench/build/
 microbench/output/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000000..0b9c48f8f5
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,103 @@
+# Repository Guidelines
+
+This file defines the collaboration rules and workflow entry points for this repository.
+
+- For complex tasks, follow the process first instead of jumping straight into code changes.
+- For non-trivial behavioral changes, explain the background, assumptions, risks, and validation plan.
+- If documentation conflicts, treat the source code as the ground truth, then consult architecture and process docs.
+
+## Planning
+
+If a task involves complex feature development, long-running debugging, performance or behavior alignment, larger refactors, or analysis that spans multiple turns, first create or update an ExecPlan according to [PLANS.md](PLANS.md) before continuing.
+
+Typical cases that should use an ExecPlan include:
+
+- gem5 / RTL behavior alignment
+- Frontend / BPU / FTQ / redirect / flush investigations
+- Performance regression analysis
+- Refactors that cross multiple modules
+- New features that need to be landed in stages
+
+## Repository Map
+
+Start with these directories first:
+
+- `src/`: core source code (C++ / Python), especially `arch/riscv/`, `cpu/o3/`, and `cpu/pred/`
+- `configs/`: runtime configurations, especially `configs/example/kmhv3.py`
+- `tests/`: test entry points
+- `util/`: helper scripts and tools
+- `docs/`: documentation, including architecture and execution plans
+
+For a higher-level map of the codebase, see [ARCHITECTURE.md](ARCHITECTURE.md).
+
+## Environment Assumptions
+
+This repository is primarily developed on shared Linux servers.
+
+- For full-system, checkpoint, and difftest-related tasks, prefer assuming `GCBV_REF_SO` is available.
+- The default CI-style reference path is:
+  `GCBV_REF_SO=/nfs/home/share/gem5_ci/ref/normal/riscv64-nemu-interpreter-so`
+- The default explicit setting for `GCB_RESTORER` is:
+  `GCB_RESTORER=""`
+- Whether `GCB_RESTORER` and `AM_HOME` are needed depends on the task:
+  - restore-related workflows may require checking `GCB_RESTORER`
+  - some frontend micro-tests and bare-metal test flows may require checking `AM_HOME`
+- Before running environment-dependent tasks, check the relevant variables instead of assuming local defaults are correct.
+
+## Build, Run, and Test Entry Points
+
+Common entry points:
+
+- Build optimized binary:
+  `scons build/RISCV/gem5.opt --gold-linker -j64`
+- Build debug binary:
+  `scons build/RISCV/gem5.debug --gold-linker -j64 --debug-cycle`
+- Run the XiangShan configuration:
+  `./build/RISCV/gem5.opt ./configs/example/kmhv3.py --raw-cpt --generic-rv-cpt=`
+- SE mode example:
+  `./build/RISCV/gem5.opt ./configs/example/se.py -c `
+- Build all unit tests:
+  `scons build/RISCV/unittests.opt -j100 --unit-test`
+
+If you need a more systematic understanding of module boundaries, configuration entry points, or execution flow, read [ARCHITECTURE.md](ARCHITECTURE.md) first.
+
+## Style and Naming
+
+- C / C++: follow `.clang-format`
+- Python: follow the repository's existing formatting and checking workflow
+- Naming:
+  - types / classes: UpperCamelCase
+  - functions / methods: lower_snake_case
+  - constants: ALL_CAPS
+- Use English for code comments and commit messages
+- Keep changes simple and avoid introducing functionality unrelated to the current task
+
+## Validation Expectations
+
+For non-trivial changes, do not stop at code edits alone. Validation should match the level of risk.
+
+Prefer these principles:
+
+- Behavioral changes: provide a minimal reproduction, key logs, statistics, or test results
+- Refactors: confirm there is no behavioral regression, and compare key statistics when needed
+- Frontend / BPU / timing-related changes: prefer targeted workloads, unit tests, or checkpoint-based regression
+- Analysis tasks: clearly distinguish confirmed facts, current hypotheses, and unresolved questions
+
+If full validation cannot be completed in the current environment, explicitly state the gap and the remaining risk.
+
+## Commit and PR Expectations
+
+- Use imperative English in commit messages
+- Prefer module-prefixed commit titles focused on a single change, for example:
+  `cpu-o3: Fix tage allocation`
+- PRs should explain:
+  - motivation
+  - approach
+  - scope of impact
+  - validation method and results
+- Run the repository's style checks and required tests before submission
+
+## Related Documents
+
+- [PLANS.md](PLANS.md): ExecPlan rules for complex tasks
+- [ARCHITECTURE.md](ARCHITECTURE.md): high-level architecture map of the repository
diff --git a/PLANS.md b/PLANS.md
new file mode 100644
index 0000000000..50eff30819
--- /dev/null
+++ b/PLANS.md
@@ -0,0 +1,268 @@
+# ExecPlan Guide
+
+This document is adapted from [OpenAI's ExecPlan](https://developers.openai.com/cookbook/articles/codex_exec_plans) guidance and tailored to the needs of this repository.
+
+It defines how execution plans (ExecPlans) should be written and maintained in this codebase.
+
+ExecPlans are meant for tasks like these:
+
+- Tasks that cannot realistically be completed in one or two exchanges
+- Work that spans multiple steps, files, or experiments
+- Investigations that need recorded evidence, decisions, intermediate findings, and current status
+- Long efforts where context can easily drift if it is not written down
+
+If the task is just a very small change, a simple bug fix, or a single-file adjustment, a separate ExecPlan is usually unnecessary.
+
+## 1. What an ExecPlan Is
+
+An ExecPlan is not a casual TODO list and not a lightweight checklist.
+
+It is a living execution document that should answer questions like:
+
+- What problem is being solved?
+- Why is it worth doing?
+- What is already known?
+- What exactly happens next?
+- How do we know the result is correct?
+- What did we learn along the way?
+- Why did the plan change midway?
+
+A good ExecPlan should let someone who does not know the previous conversation pick up the task and continue with reasonable confidence.
+
+## 2. When to Use an ExecPlan
+
+Create or update an ExecPlan when one or more of the following is true:
+
+1. The task will likely take significant time or span multiple conversations
+2. The work combines at least two of: research, experimentation, debugging, implementation
+3. The task touches multiple modules, files, or evidence sources
+4. There is meaningful uncertainty and assumptions need to be validated first
+5. The task needs a record of why a decision was made, not just what changed
+6. The task may be paused and resumed later
+
+Typical examples:
+
+- Aligning gem5 behavior with RTL
+- Investigating frontend / BPU / FTQ / flush / redirect behavior
+- Analyzing a performance regression
+- Landing a larger refactor
+- Building a feature that must be implemented in phases
+- Prototyping before committing to a final design
+
+## 3. Writing Principles
+
+### 3.0 Language
+
+By default, ExecPlans may be written in Chinese for internal development efficiency.
+
+Use English when the expected audience includes external contributors, or when the plan is intended to be referenced from public-facing documentation, PR discussion, or broader cross-team communication.
+
+### 3.1 Self-Contained
+
+An ExecPlan should be as self-contained as possible.
+
+Do not assume the reader remembers earlier chat history, and do not write things like "same as discussed above".
+
+If background is necessary to move the task forward, write it into the current document.
+
+### 3.2 Outcome-Oriented
+
+Do not stop at "change function X" or "add field Y".
+
+Explain:
+
+- What effect you expect
+- What the user or developer should be able to observe afterward
+- How that outcome will be validated
+
+### 3.3 Evidence-Oriented
+
+Especially for analysis tasks, do not write guesses as if they were facts.
+
+Clearly separate:
+
+- confirmed facts
+- current hypotheses
+- open questions
+- the evidence supporting a conclusion
+
+### 3.4 Continuously Updated
+
+An ExecPlan is a living document, not a one-time writeup.
+
+As the work progresses, update:
+
+- current status
+- new findings
+- decision changes
+- next actions
+
+### 3.5 Explain Why First
+
+Implementation details can be expanded later, but important decisions must include their rationale.
+
+Someone picking up the task later should be able to understand why this approach was chosen instead of another one.
+
+## 4. Recommended Location
+
+Prefer one ExecPlan file per complex task rather than putting everything into one large document.
+
+Recommended directory structure:
+
+```text
+docs/
+  exec-plans/
+    active/
+    completed/
+    blocked/
+```
+
+Meaning:
+
+- `active/`: tasks currently in progress
+- `completed/`: finished tasks
+- `blocked/`: tasks paused pending external conditions
+
+Use concise, descriptive file names, for example:
+
+- `docs/exec-plans/active/gem5-rtl-fetch-align.md`
+- `docs/exec-plans/active/bpu-override-investigation.md`
+- `docs/exec-plans/active/spec06-regression-debug.md`
+
+## 5. Recommended Structure
+
+Each ExecPlan should usually contain at least the following sections.
+
+## Title
+
+Use a short sentence that describes the goal.
+The title should prefer "action + object" over vague naming.
+
+For example:
+
+- Align gem5 frontend flush behavior with RTL
+- Investigate the IPC regression of a SPEC06 benchmark
+- Add verifiable observability for BPU override
+
+---
+
+## Background and Goal
+
+Use a few paragraphs to explain:
+
+- what the current problem is
+- why it matters
+- what result should be achieved
+- how that result will be observed
+
+Focus first on the value of the task and the end result, not on implementation details.
+
+---
+
+## Current Known Information
+
+Record the facts, observations, and constraints already confirmed.
+This can include:
+
+- relevant modules, files, and paths
+- current behavior and how it differs from expectations
+- logs, counters, traces, waveforms, or test results already observed
+- environment constraints
+
+Do not write guesses as facts in this section.
+
+## Hypotheses and Open Questions
+
+If uncertainty remains, list it explicitly. For example:
+
+- We currently suspect the issue is the timing of override activation
+- We are not yet sure whether the second target comes from mainBTB
+- We need to verify whether a counter covers the split-request case
+
+The purpose of this section is to keep the analysis from becoming muddled over time.
+
+---
+
+## Planned Steps
+
+List the next steps in order.
+Each step should ideally be written as "action + goal + expected output".
+
+For example:
+
+1. Read the frontend redirect path and confirm the actual control flow in gem5
+2. Cross-check RTL documentation and implementation, then summarize the flush taxonomy
+3. Add the required logs or counters and construct a minimal reproduction
+4. Run the chosen workload and verify whether the behavior converges
+5. Decide whether to keep the current approach or revise it based on the results
+
+Avoid cryptic shorthand that only the original author can understand.
+
+---
+
+## Validation
+
+Always specify how success will be judged.
+
+Validation may mean:
+
+- tests pass
+- logs match expectations
+- counters move in the expected direction
+- a workload now behaves like RTL
+- a performance regression is eliminated
+- a scenario is reproducible and then fixed
+
+Even for analysis-only tasks, define what "done" means. For example:
+
+- root cause confirmed
+- minimal reproduction identified
+- candidate causes ruled out
+- a concrete recommendation for the next phase is available
+
+## Progress
+
+Progress must be updated continuously.
+Use checkboxes with timestamps.
+
+Example:
+
+- [x] 2026-03-24 10:00 Read the main frontend redirect path and identify the primary entry points
+- [x] 2026-03-24 11:20 Cross-check RTL docs and discover that the flush taxonomy differs from the earlier assumption
+- [ ] Add counters for the split-request case and verify whether `inflightLoads` fully covers it
+- [ ] Construct a minimal workload to validate the second-target selection logic
+
+If a step is only partially complete, say what has been finished and what remains.
+
+---
+
+## Findings and Surprises
+
+Record important new findings that appear during the work.
+Especially note things like:
+
+- an earlier understanding was wrong
+- docs and code disagree
+- a counter definition is unreliable
+- a path is more important than expected
+- an experiment disproved an earlier hypothesis
+
+This section matters because long tasks often fail when important intermediate learning is not written down.
+
+---
+
+## Decision Log
+
+Whenever an important decision is made or the direction changes, record it.
+
+Recommended format:
+
+- Decision: ...
+- Reason: ...
+- Date: ...
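+
+For example:
+
+- Decision: add observability before changing behavior
+- Reason: the root cause is not fully confirmed yet, so changing behavior immediately is too risky
+- Date: 2026-03-24
diff --git a/docs/exec-plans/completed/phr-rtl-alignment.md b/docs/exec-plans/completed/phr-rtl-alignment.md
new file mode 100644
index 0000000000..1e136b715d
--- /dev/null
+++ b/docs/exec-plans/completed/phr-rtl-alignment.md
@@ -0,0 +1,102 @@
+# Align PHR target update semantics between gem5 and RTL
+
+## Background and Goal
+
+While reviewing PR #814 (`Fix path history info target`), we found that in the old gem5 implementation
+`FullBTBPrediction::getTarget()` and `getPHistInfo()` handled indirect/return targets inconsistently.
+The PR unifies both onto the same target resolution logic.
+
+Whether this fix actually "matches RTL" still needs separate verification. The core question here is not
+"must the PHR use the real target", but rather:
+
+- Which target XiangShan RTL actually uses when updating path history/PHR
+- Whether that target is the raw BTB entry target or the final predicted target after override
+- Whether the behavior of the current gem5 PR is semantically consistent with RTL
+
+The goal of this task is to reach a conclusion based on code evidence rather than on experience alone.
+
+## Current Known Information
+
+- The entry point where gem5 updates path history after prediction is in
+  `src/cpu/pred/btb/decoupled_bpred.cc`: it obtains `(pc, target, taken)` via
+  `finalPred.getPHistInfo()` and then calls `pHistShiftIn(...)`.
+- Before PR #814, `getPHistInfo()` used `entry.target` directly,
+  whereas `getTarget()` applied overrides for indirect and return targets.
+- As a result, the old gem5 implementation could let the "final predicted target" and the "target used by the PHR" diverge.
+- Confirmed: the XiangShan RTL PHR update explicitly depends on a path hash of `(cfiPc, target)`, not on the PC alone.
+
+## Hypotheses and Open Questions
+
+This section has been verified. Across the two branches of the initial hypothesis, the final conclusions are as follows:
+
+1. RTL does not require the PHR, at speculative update time, to equal the target that the backend finally executes.
+2. However, RTL does not accept just any approximate target either; it requires the PHR to use the target that is actually driving fetch forward at that moment.
+3. If a later-stage override or a backend redirect occurs, RTL corrects the PHR based on the new target.
+
+## Planned Steps
+
+1. Read the XiangShan `frontend/bpu/tage` Scala implementation and confirm that `tage` only consumes the folded PHR and does not maintain the PHR itself.
+2. Trace the PHR update logic in `frontend/bpu/history/phr` and confirm that the update uses `pathHash(cfiPc, target)`.
+3. Return to `frontend/bpu/Bpu.scala` and confirm that `s1_prediction.target`, `s3_prediction.target`, and `redirect.bits.target` are all the "currently effective fetch target", covering the RAS / ITTAGE / override paths.
+4. Compare against the gem5 `getPHistInfo()` and `getTarget()` paths and judge whether PR #814 is consistent with RTL.
+5. Add an analysis of the PR's `gcc12-spec06-0.8c` performance data and confirm whether the direction of the gain matches what this kind of fix should deliver.
+
+## Validation
+
+- Locate the path history update code and its input sources in RTL.
+- Be able to answer unambiguously which target the PHR update uses.
+- Be able to map that conclusion back to the specific code changes in gem5 PR #814.
+- If CI data is already available, verify whether the performance change shows up mainly in metrics tied to conditional-path learning quality.
+
+## Conclusions
+
+### RTL Semantics
+
+- XiangShan `tage` itself only reads the folded PHR; the entry points are
+  `io.fromPhr.foldedPathHist` / `foldedPathHistForTrain` in
+  `frontend/bpu/tage/Tage.scala`.
+- The module that actually maintains the PHR is
+  `frontend/bpu/history/phr/Phr.scala`.
+- In `Phr.scala`, RTL updates the PHR with `pathHash(updateCfiPc, updateTarget)`,
+  so the target is explicitly part of the PHR update input.
+- The source priority for `updateTarget` is:
+  `redirect > s3_override > s1_valid`.
+- These targets are not some static BTB entry target:
+  - `s1_prediction.target` is overridden by uRAS in the return case;
+  - `s3_prediction.target` is overridden by RAS in the return case and can be overridden by ITTAGE in other indirect cases;
+  - `redirect.bits.target` comes from the corrected result of the backend/redirect.
+
+### Assessment of gem5 PR #814
+
+- The problem in the old gem5 implementation was not that "the PHR did not use the real target", but that "the PHR did not use the currently effective final predicted target".
+- In the old code, `getTarget()` already applied overrides for indirect / return targets,
+  but `getPHistInfo()` still used `entry.target` directly.
+- This made fetch proceed along the overridden target while the PHR was updated with the non-overridden target.
+- XiangShan RTL's behavior clearly requires the PHR to follow the currently effective fetch target, rather than letting it drift away from the fetch path.
+- Therefore, PR #814's unification of `getPHistInfo()` and `getTarget()` onto the same target resolution logic is a fix consistent with RTL semantics.
+
+### Performance Results
+
+- The PR has been merged. During review, the Ideal BTB performance data for `gcc12-spec06-0.8c` was used for a comparative analysis.
+- Comparing the PR run against the mainline Ideal BTB baseline with `python3 run.py --slice gcc12` gave:
+  - Int score / GHz: `20.6907 -> 20.8375`, `+0.71%`
+  - Total branch wrong MPKI: `4.9768 -> 4.8501`
+  - Conditional branch MPKI: `4.8504 -> 4.7279`
+- The gains are concentrated in `gobmk`, `sjeng`, and `gcc`, and show up most clearly as a drop in conditional-path mispredictions.
+- The aggregate MPKI of indirect / return branches themselves barely changed, which matches the expectation that a PHR/path history consistency fix mainly improves the learning quality of subsequent conditional branches.
+
+## Progress
+
+- [x] 2026-04-09 12:10 Confirm that this issue is a gem5/RTL behavior alignment task and needs its own execution plan.
+- [x] 2026-04-09 12:12 Re-check the old discrepancy between `getTarget()` and `getPHistInfo()` in gem5.
+- [x] 2026-04-09 12:30 Read the `tage`, `history/phr`, and `Bpu.scala` paths in XiangShan RTL and confirm the PHR target sources and update priority.
+- [x] 2026-04-09 12:40 Reach the conclusion on the consistency of gem5 PR #814 with RTL.
+- [x] 2026-04-09 14:10 Analyze the PR's `gcc12-spec06-0.8c` data with `gem5_data_proc/run.py --slice gcc12` and post the conclusion as a PR comment.
+- [x] 2026-04-09 14:20 PR merged; execution plan moved to `completed/`.
+
+## Findings and Surprises
+
+- The user raised a key counter-question: the target used by the PHR does not necessarily have to equal the real target.
+  This means the review cannot only check whether gem5 is "self-consistent" internally; it must also check the actual design semantics of RTL.
+- For some `Manual Performance Test` runs on GitHub, the displayed metadata mixes in `xs-dev` header information, while the commit actually checked out by the perf job may differ.
+  This analysis ultimately relied on `metadata.txt` inside the archive directory to avoid picking the wrong baseline.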
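
The `updateTarget` priority `redirect > s3_override > s1_valid` and the difference between the old and new `getPHistInfo()` behavior, as recorded in `phr-rtl-alignment.md`, can be sketched in Python roughly as follows. This sits outside the patch hunks and is only a minimal model under stated assumptions: `TargetSource`, `select_update_target`, `phist_target_before`, and `phist_target_after` are hypothetical names that exist in neither gem5 nor XiangShan, and the real `pathHash` function is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetSource:
    """One candidate source for the PHR update target (hypothetical model)."""
    valid: bool
    target: int


def select_update_target(redirect: TargetSource,
                         s3_override: TargetSource,
                         s1: TargetSource) -> Optional[int]:
    """Pick the PHR update target by priority: redirect > s3_override > s1_valid.

    Returns None when no source is valid, i.e. no PHR update is modeled.
    """
    for source in (redirect, s3_override, s1):
        if source.valid:
            return source.target
    return None


def phist_target_before(entry_target: int,
                        override_target: Optional[int]) -> int:
    """Old behavior as described in the plan: always the raw BTB entry target."""
    return entry_target


def phist_target_after(entry_target: int,
                       override_target: Optional[int]) -> int:
    """Behavior after the fix: the same target that actually drives fetch."""
    return entry_target if override_target is None else override_target


if __name__ == "__main__":
    # Illustrative addresses only; they do not come from any real trace.
    s1 = TargetSource(valid=True, target=0x80001000)
    s3 = TargetSource(valid=True, target=0x80002000)
    redirect = TargetSource(valid=False, target=0x80003000)
    print(hex(select_update_target(redirect, s3, s1)))
```

In this model the old and new helpers agree whenever no override applies, so the mismatch only appears for indirect / return branches whose target was overridden, which is the case the plan identifies.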