|
22 | 22 | ## Design Decisions |
23 | 23 |
|
24 | 24 | ### 1. Bank Partitioning Scheme |
25 | | -``` |
26 | | -blockSize = 32B (2^5) |
27 | | -pc[4:0] - ignored (offset within the 32B block) |
28 | | -pc[6:5] - bank_id (4 banks, NumBanks=4) |
29 | | -pc[N:7] - used to compute the table index (XORed with the folded history) |
30 | | -``` |
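| 25 | +Aligned with XiangShan semantics: first drop the `instShiftAmt` (currently 1) instruction-granularity bits, then take the next `ceilLog2(numBanks)` bits as the bank id; the remaining higher bits are combined with the folded history to produce the index / tag. This uses the least significant bits of `startPC` directly, masks only halfword (2-byte) alignment by default, and leaves room to change the bank granularity later if needed. |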
31 | 26 |
|
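32 | 27 | ### 2. Cycle Model (actual call order confirmed) ✅ |
33 | 28 | - Within one "cycle": **`update(block B)` is called first to perform the update, then `putPCHistory(block A)` is called to make the prediction** |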
@@ -72,213 +67,17 @@ void putPCHistory() { |
72 | 67 |
|
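73 | 68 | ### 3. Data Structures |
74 | 69 | - **Keep the three-dimensional array**: `tageTable[table][index][way]` |
75 | | -- **Do not switch to four dimensions**: this avoids a large-scale refactor; bank information is separated out when the index is computed |
76 | | -- Bank information is handled implicitly by the computation functions |
| 70 | +- **Do not switch to four dimensions**: stripping the bank bits during index/tag computation is sufficient |
| 71 | +- Bank information is described by `bankBaseShift`/`bankIdWidth`, which the inline functions read in one consistent way |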
77 | 72 |
|
78 | 73 | --- |
79 | 74 |
|
80 | 75 | ## Change List |
81 | 76 |
|
82 | | -### File 1: `btb_tage.hh` |
83 | | - |
84 | | -#### A. Add bank-related member variables (parameterized) ✅ |
85 | | -```cpp |
86 | | -class BTBTAGE { |
87 | | -private: |
88 | | - // Bank configuration (parameterized, passed in from BranchPredictor.py) |
89 | | - const unsigned numBanks; // Number of banks (e.g., 4) |
90 | | - const unsigned bankIdWidth; // log2(numBanks), computed in constructor |
91 | | - const unsigned bankIdShift; // floorLog2(blockSize), e.g., 5 for 32B blocks |
92 | | - const unsigned indexShift; // bankIdShift + bankIdWidth when enabled; fallback uses bankIdShift |
93 | | - |
94 | | - // Track last prediction bank for conflict detection |
95 | | - unsigned lastPredBankId; // Bank ID of last prediction |
96 | | - bool predBankValid; // Whether lastPredBankId is valid |
97 | | -``` |
98 | | -
|
99 | | -#### B. Add bank-related statistics |
100 | | -```cpp |
101 | | -struct TageStats { |
102 | | - // ... existing stats ... |
103 | | -
|
104 | | - Scalar updateBankConflict; // Number of bank conflicts |
105 | | - Scalar updateDroppedDueToConflict; // Number of updates dropped due to a conflict |
106 | | -} |
107 | | -``` |
108 | | - |
109 | | -#### C. Add the bank computation function declaration |
110 | | -```cpp |
111 | | -private: |
112 | | - // Get bank ID from aligned PC |
113 | | - // Extract pc[bankIdShift+bankIdWidth-1 : bankIdShift] |
114 | | - // For 32B blocks with 4 banks: pc[6:5] |
115 | | - unsigned getBankId(Addr alignedPC) const; |
116 | | -``` |
117 | | -
|
118 | | ---- |
119 | | -
|
120 | | -### File 2: `btb_tage.cc` |
121 | | -
|
122 | | -#### A. Implement the bank computation function (parameterized version) ✅ |
123 | | -```cpp |
124 | | -unsigned |
125 | | -BTBTAGE::getBankId(Addr alignedPC) const |
126 | | -{ |
127 | | - // Extract bank ID bits from aligned PC |
128 | | - // bankIdShift is the starting bit position (5 for 32B blocks) |
129 | | - // bankIdWidth is the number of bits (2 for 4 banks) |
130 | | - // Example: pc[6:5] for 32B blocks with 4 banks |
131 | | - return (alignedPC >> bankIdShift) & ((1 << bankIdWidth) - 1); |
132 | | -} |
133 | | -``` |
134 | | - |
135 | | -#### B. Modify the `getTageIndex()` function (parameterized version) ✅ |
136 | | -**Original implementation**: |
137 | | -```cpp |
138 | | -Addr getTageIndex(Addr pc, int t, uint64_t foldedHist) { |
139 | | - Addr mask = (1ULL << tableIndexBits[t]) - 1; |
140 | | - Addr pcBits = (pc >> floorLog2(blockSize)) & mask; // pc >> 5 |
141 | | - Addr foldedBits = foldedHist & mask; |
142 | | - return pcBits ^ foldedBits; |
143 | | -} |
144 | | -``` |
145 | | -
|
146 | | -**Changed to (the shift is selected dynamically based on enableBankConflict)**: |
147 | | -```cpp |
148 | | -Addr |
149 | | -BTBTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist) |
150 | | -{ |
151 | | - // Create mask for tableIndexBits[t] to limit result size |
152 | | - Addr mask = (1ULL << tableIndexBits[t]) - 1; |
153 | | -
|
154 | | - // Index calculation skips bank bits to avoid bank aliasing |
155 | | - // For 32B blocks (5 bits) with 4 banks (2 bits): |
156 | | - // - pc[4:0]: block offset (ignored) |
157 | | - // - pc[6:5]: bank ID (skipped) |
158 | | - // - pc[N:7]: used for index calculation |
159 | | - // Each bank effectively manages 1/4 of the PC space with the same table size |
160 | | - const unsigned pcShift = enableBankConflict ? indexShift : bankIdShift; |
161 | | - Addr pcBits = (pc >> pcShift) & mask; // Skip blockSize + bank bits only when enabled |
162 | | - Addr foldedBits = foldedHist & mask; |
163 | | -
|
164 | | - return pcBits ^ foldedBits; |
165 | | -} |
166 | | -``` |
167 | | - |
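168 | | -**Important notes**: |
169 | | -- The index computation skips the bank bits to avoid bank aliasing |
170 | | -- Each bank manages a different range of the PC space |
171 | | -- The 4 banks share tableSizes, so total capacity is unchanged |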
172 | | - |
173 | | -#### C. Modify `putPCHistory()` - record the predicted bank |
174 | | -```cpp |
175 | | -void |
176 | | -BTBTAGE::putPCHistory(Addr stream_start, const bitset &history, |
177 | | - std::vector<FullBTBPrediction> &stagePreds) { |
178 | | - Addr alignedPC = stream_start & ~(blockSize - 1); |
179 | | - |
180 | | - // Record prediction bank for conflict detection |
181 | | - lastPredBankId = getBankId(alignedPC); |
182 | | - predBankValid = true; |
183 | | - |
184 | | - DPRINTF(TAGE, "putPCHistory startAddr: %#lx, alignedPC: %#lx, bank: %u\n", |
185 | | - stream_start, alignedPC, lastPredBankId); |
186 | | - |
187 | | - // ... rest of the function remains same ... |
188 | | -} |
189 | | -``` |
190 | | -
|
191 | | -#### D. Modify `update()` - detect bank conflicts |
192 | | -```cpp |
193 | | -void |
194 | | -BTBTAGE::update(const FetchStream &stream) { |
195 | | - Addr startAddr = stream.getRealStartPC(); |
196 | | - Addr alignedPC = startAddr & ~(blockSize - 1); |
197 | | - unsigned updateBank = getBankId(alignedPC); |
198 | | -
|
199 | | - DPRINTF(TAGE, "update startAddr: %#lx, alignedPC: %#lx, bank: %u\n", |
200 | | - startAddr, alignedPC, updateBank); |
201 | | -
|
202 | | - // Check bank conflict |
203 | | - if (predBankValid && updateBank == lastPredBankId) { |
204 | | - tageStats.updateBankConflict++; |
205 | | - tageStats.updateDroppedDueToConflict++; |
206 | | - DPRINTF(TAGE, "Bank conflict detected: bank %u, dropping update\n", updateBank); |
207 | | - return; // Drop this update |
208 | | - } |
209 | | -
|
210 | | - // ... rest of the function remains same ... |
211 | | -} |
212 | | -``` |
213 | | - |
214 | | -#### E. Initialize bank state (constructor) |
215 | | -```cpp |
216 | | -BTBTAGE::BTBTAGE(...) |
217 | | - : ..., |
218 | | - lastPredBankId(0), |
219 | | - predBankValid(false), |
220 | | - ... |
221 | | -{ |
222 | | - // ... existing initialization ... |
223 | | -} |
224 | | -``` |
225 | | -
|
226 | | -#### F. Statistics initialization (TageStats constructor) |
227 | | -```cpp |
228 | | -BTBTAGE::TageStats::TageStats(...) |
229 | | - : ..., |
230 | | - ADD_STAT(updateBankConflict, statistics::units::Count::get(), |
231 | | - "Number of bank conflicts detected"), |
232 | | - ADD_STAT(updateDroppedDueToConflict, statistics::units::Count::get(), |
233 | | - "Number of updates dropped due to bank conflict"), |
234 | | - ... |
235 | | -{ |
236 | | - // ... existing stats ... |
237 | | -} |
238 | | -``` |
239 | | - |
240 | | -#### G. Force updateOnRead = true, by adding a warn |
241 | | -In the constructor: |
242 | | -```cpp |
243 | | -#ifndef UNIT_TEST |
244 | | -BTBTAGE::BTBTAGE(const Params& p): |
245 | | - ..., |
246 | | - updateOnRead(p.updateOnRead) |
247 | | - // Add warning if parameter was set differently |
248 | | - if (!p.updateOnRead) { |
249 | | - warn("BTBTAGE: updateOnRead forced to true for bank simulation"); |
250 | | - } |
251 | | -``` |
252 | | -
|
253 | | ---- |
254 | | -
|
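255 | | -## Implementation Steps (all complete) ✅ |
256 | | -
|
257 | | -### Phase 1: Basic bank computation ✅ |
258 | | -1. ✅ Add the parameterized `getBankId()` function |
259 | | -2. ✅ Call it in `putPCHistory()` and `update()` and print the bank information |
260 | | -3. ✅ Verify that the bank computation is correct (debug output shows the expected bank IDs) |
261 | | -
|
262 | | -### Phase 2: Modify the index computation ✅ |
263 | | -1. ✅ Modify `getTageIndex()`: use `indexShift` when bank simulation is enabled, and fall back to `pc >> floorLog2(blockSize)` when it is disabled |
264 | | -2. ✅ Run the tests to confirm functional correctness (builds and runs successfully) |
265 | | -
|
266 | | -### Phase 3: Add bank conflict detection ✅ |
267 | | -1. ✅ Add the `lastPredBankId` and `predBankValid` state |
268 | | -2. ✅ Record the predicted bank in `putPCHistory()` |
269 | | -3. ✅ Detect conflicts in `update()` and drop the update (option 2: clear at the end) |
270 | | -4. ✅ Add the statistics counters (`updateBankConflict`, `updateDroppedDueToConflict`) |
271 | | -
|
272 | | -### Phase 4: updateOnRead warning ✅ |
273 | | -1. ✅ Add a warning message (when updateOnRead=false) |
274 | | -
|
275 | | -### Phase 5: Testing and verification ✅ |
276 | | -1. ✅ Build succeeded (gem5.debug, 365MB) |
277 | | -2. ✅ Ran the dummy test and observed: |
278 | | - - Bank IDs computed correctly (0x800000c0→bank 2, 0x80000100→bank 0) |
279 | | - - A bank conflict was detected (tage.updateBankConflict: 1) |
280 | | - - The update was dropped as expected (tage.updateDroppedDueToConflict: 1) |
281 | | -3. ✅ The debug trace shows bank information for predictions and updates |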
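| 77 | +### Change Log (summary) |
| 78 | +- `btb_tage.hh` adds `blockWidth` and `bankBaseShift`. The former still equals `floorLog2(blockSize)`; the latter defaults to `instShiftAmt` and strips the instruction-alignment bits. The `indexShift` computation depends on these two constants. |
| 79 | +- Helpers such as `getBankId()`/`getTageIndex()`/`getTageTag()` all use these new offsets, so prediction and update work with the same `startPC` semantics. The bank bits are skipped only when conflict simulation is enabled; otherwise the legacy behavior is kept. |
| 80 | +- `putPCHistory()`/`update()` derive the recorded bank id directly from `startPC`, which avoids extra alignment work; the statistics are unchanged. |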
282 | 81 |
|
283 | 82 | --- |
284 | 83 |
|
@@ -342,114 +141,3 @@ private: |
342 | 141 | } |
343 | 142 | } |
344 | 143 | ``` |
345 | | - |
346 | | ---- |
347 | | - |
348 | | -## Verification Criteria (verified) ✅ |
349 | | - |
350 | | -### Functional correctness ✅ |
351 | | -- ✅ **Bank computation is correct**: debug output shows the expected bank IDs |
352 | | - ``` |
353 | | - 0x800000c0 (pc[6:5]=10b) -> bank 2 ✓ |
354 | | - 0x80000100 (pc[6:5]=00b) -> bank 0 ✓ |
355 | | - 0x80000140 (pc[6:5]=10b) -> bank 2 ✓ |
356 | | - 0x80000180 (pc[6:5]=00b) -> bank 0 ✓ |
357 | | - ``` |
358 | | -- ✅ **Index computation is correct**: the bank bits are skipped when bank simulation is enabled, and the old logic is used when it is disabled |
359 | | -- ✅ **Conflict detection is correct**: the statistics show that a bank conflict was detected and the update was dropped |
360 | | - |
361 | | -### Performance metrics ✅ |
362 | | -- ✅ **Bank conflict statistics look normal**: |
363 | | - ``` |
364 | | - tage.updateBankConflict: 1 |
365 | | - tage.updateDroppedDueToConflict: 1 |
366 | | - microtage.updateBankConflict: 0 |
367 | | - ``` |
368 | | -- ✅ **Statistics work as intended**: all newly added statistics are recorded correctly |
369 | | - |
370 | | -### Actual test results ✅ |
371 | | -```bash |
372 | | -# Test program: dummy-riscv64-xs.bin |
373 | | -# Build: gem5.debug (365MB, 2025-11-19 18:42) |
374 | | -# Run: completed successfully, no crashes |
375 | | - |
376 | | -# Sample debug output: |
377 | | -putPCHistory startAddr: 0x800000c0, alignedPC: 0x800000c0, bank: 2 |
378 | | -putPCHistory startAddr: 0x80000100, alignedPC: 0x80000100, bank: 0 |
379 | | -putPCHistory startAddr: 0x80000140, alignedPC: 0x80000140, bank: 2 |
380 | | - |
381 | | -# Statistics: |
382 | | -system.cpu.branchPred.tage.updateBankConflict: 1 |
383 | | -system.cpu.branchPred.tage.updateDroppedDueToConflict: 1 |
384 | | -``` |
385 | | - |
386 | | ---- |
387 | | - |
388 | | -## Parameter Configuration (BranchPredictor.py) ✅ |
389 | | - |
390 | | -**Implemented parameters**: |
391 | | -```python |
392 | | -class BTBTAGE(TimedBaseBTBPredictor): |
393 | | - # ... existing params ... |
394 | | - |
395 | | - # Bank configuration |
396 | | - numBanks = Param.Unsigned(4, "Number of banks for bank conflict simulation") |
397 | | - numDelay = 2 |
398 | | -``` |
399 | | - |
400 | | -**Usage**: |
401 | | -- Default configuration: `numBanks = 4`, with bank conflict simulation enabled |
402 | | -- To change the number of banks: set a different numBanks value in the configuration file |
403 | | -- ⚠️ The current implementation always enables bank conflict detection (there is no separate switch) |
404 | | - |
405 | | -**Future extension** (optional): |
406 | | -```python |
407 | | -enable_bank_simulation = Param.Bool(True, "Enable bank conflict simulation") |
408 | | -``` |
409 | | -If this parameter is added and set to False: |
410 | | -- `getBankId()` always returns 0 |
411 | | -- Bank conflicts are not detected |
412 | | -- Behavior falls back to the original version |
413 | | - |
414 | | ---- |
415 | | - |
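416 | | -## Code Statistics (actual) ✅ |
417 | | -- **New code**: about 130 lines (including comments) |
418 | | -- **Modified code**: about 40 lines |
419 | | -- **Modified files**: 3 (btb_tage.hh, btb_tage.cc, BranchPredictor.py) |
420 | | -- **Build time**: about 2-3 minutes (-j64, on a well-provisioned server) |
421 | | - |
422 | | -## Actual Completion Time ✅ |
423 | | -- **Phases 1-4**: about 1 hour of actual coding time |
424 | | -- **Phase 5**: about 10 minutes for build + test |
425 | | -- **Total**: about 1.5 hours (completed 2025-11-19) |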
426 | | - |
427 | | ---- |
428 | | - |
429 | | -## Follow-up Suggestions |
430 | | - |
431 | | -### 1. Run more tests |
432 | | -```bash |
433 | | -# Run the full SPEC tests to observe the conflict rate |
434 | | -python3 debug/run_cpt.py --debug-dir debug/bank_final |
435 | | -grep 'tage.*BankConflict' debug/bank_final/*/stats.txt |
436 | | - |
437 | | -# Sum the conflict counts across all runs |
438 | | -grep 'updateBankConflict' debug/bank_final/*/stats.txt | awk '{sum+=$2} END {print sum}' |
439 | | -``` |
440 | | - |
441 | | -### 2. Performance comparison |
442 | | -```bash |
443 | | -# Compare performance with and without the bank mechanism |
444 | | -grep 'cpu.ipc' debug/*/stats.txt |
445 | | -grep 'tage.*Mispred' debug/*/stats.txt |
446 | | -``` |
447 | | - |
448 | | -### 3. Optional extensions |
449 | | -- Add per-bank access distribution statistics |
450 | | -- Implement a windowing mechanism (if the conflict rate is too high) |
451 | | -- Support disabling bank simulation dynamically (auto-disable when numBanks=1) |
452 | | - |
453 | | ---- |
454 | | - |
455 | | -**✅ Implementation is complete and tested. The bank mechanism works as intended.** |
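
Reviewer note: the two bank layouts discussed in this diff differ only in where the bank-id field starts. The sketch below is a standalone illustration, not the code in `btb_tage.cc`. `BankLayout`, `makeLayout`, and `getIndex` are hypothetical names, and the assumption that the index shift equals `bankBaseShift + bankIdWidth` is taken from the prose description (drop the alignment bits, take the bank bits, use the rest), not from the actual source.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Smallest number of bits b such that (1 << b) >= n.
constexpr unsigned ceilLog2(unsigned n) {
    unsigned bits = 0;
    while ((1u << bits) < n) ++bits;
    return bits;
}

// Hypothetical container for the two constants named in the change log.
struct BankLayout {
    unsigned bankBaseShift;  // low bits dropped before the bank field
    unsigned bankIdWidth;    // ceilLog2(numBanks)
};

// baseShift = 5 models the removed scheme (32B block offset);
// baseShift = 1 models the new scheme (instShiftAmt).
BankLayout makeLayout(unsigned baseShift, unsigned numBanks) {
    assert(numBanks >= 1);
    return BankLayout{baseShift, ceilLog2(numBanks)};
}

// Bank id: the bankIdWidth bits just above the dropped low bits.
unsigned getBankId(const BankLayout &l, Addr startPC) {
    const Addr mask = (Addr{1} << l.bankIdWidth) - 1;
    return static_cast<unsigned>((startPC >> l.bankBaseShift) & mask);
}

// Index: bits above the bank field, XORed with the folded history.
// Assumes the index shift is bankBaseShift + bankIdWidth.
Addr getIndex(const BankLayout &l, Addr startPC,
              std::uint64_t foldedHist, unsigned indexBits) {
    const Addr mask = (Addr{1} << indexBits) - 1;
    const Addr pcBits = (startPC >> (l.bankBaseShift + l.bankIdWidth)) & mask;
    return pcBits ^ (foldedHist & mask);
}
```

With `makeLayout(5, 4)` the addresses from the removed verification list map to banks 2, 0, 2, 0, matching the debug output quoted there. With `makeLayout(1, 4)` the bank id comes from `pc[2:1]`, so 32B-aligned start addresses all fall into bank 0 and only the offset within the block spreads accesses across banks.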