The Fetch stage is the first pipeline stage in the GEM5 O3 processor model. It is responsible for fetching instructions from the instruction cache and passing them to the Decode stage. In the XiangShan GEM5 customized version, the Fetch stage implements a decoupled frontend design to align with the XiangShan processor architecture.
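Refactoring status: This document describes the refactored Fetch stage. The original 320-line monolithic fetch function has been split into 8 modular functions, giving a clearer code structure that is optimized specifically for RISC-V and lays the groundwork for the planned FDIP and 2fetch features.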
The Fetch stage communicates with other pipeline stages through a time buffer mechanism, which models the delay in communication between different stages:
TimeBuffer<TimeStruct> *timeBuffer; // Main time buffer for communication
// Wires from other stages to Fetch
TimeBuffer<TimeStruct>::wire fromDecode; // For stall signals and instruction counts
TimeBuffer<TimeStruct>::wire fromRename; // For stall signals
TimeBuffer<TimeStruct>::wire fromIEW; // For stall signals and branch resolution
TimeBuffer<TimeStruct>::wire fromCommit; // For squash signals and interrupts
// Wire to Decode stage
TimeBuffer<FetchStruct>::wire toDecode; // For sending fetched instructions
The Fetch stage forwards fetched instructions to the Decode stage through the toDecode wire:
// In tick() method:
// Send instruction packet to decode
if (numInst) {
    toDecode->insts = std::move(insts);
    toDecode->size = numInst;
    wroteToTimeBuffer = true;
}
The Decode stage can send stall signals back to Fetch via the fromDecode wire:
// In checkSignalsAndUpdate() method:
// Check if the decode stage is stalled
if (fromDecode->decodeStall) {
    stalls[tid].decode = true;
}
The Commit stage sends several important signals to the Fetch stage:
- Branch misprediction signals: Indicate a branch was mispredicted and the pipeline needs to be squashed
- Interrupt signals: Indicate an interrupt needs to be processed
- Drain signals: For simulation control
// In checkSignalsAndUpdate() method:
// Check for squash from commit
if (fromCommit->commitInfo[tid].squash) {
    DPRINTF(Fetch, "[tid:%i] Squashing from commit.\n", tid);
    squash(fromCommit->commitInfo[tid].pc,
           fromCommit->commitInfo[tid].doneSeqNum,
           fromCommit->commitInfo[tid].squashInst, tid);
}
// Check for commit's interrupt signals
if (fromCommit->commitInfo[tid].interruptPending) {
    interruptPending = true;
}
The IEW (Issue/Execute/Writeback) stage provides branch resolution feedback to the Fetch stage:
// In checkSignalsAndUpdate() method:
// Check for squash from IEW (mispredicted branch)
if (fromIEW->iewInfo[tid].squash) {
    DPRINTF(Fetch, "[tid:%i] Squashing from IEW.\n", tid);
    squash(fromIEW->iewInfo[tid].pc,
           fromIEW->iewInfo[tid].doneSeqNum,
           fromIEW->iewInfo[tid].squashInst, tid);
}
The Fetch stage interacts with the branch predictor to determine the next PC to fetch:
bool lookupAndUpdateNextPC(const DynInstPtr &inst, PCStateBase &pc) {
    // Access branch predictor and update PC
    bool predicted_taken = getBp()->predict(inst, pc, inst->pcState());
    // ... additional logic ...
    return predicted_taken;
}
The fetch buffer holds raw instruction data fetched from the instruction cache:
uint8_t *fetchBuffer[MaxThreads]; // Raw instruction data
Addr fetchBufferPC[MaxThreads]; // PC of first instruction in buffer
bool fetchBufferValid[MaxThreads]; // Whether buffer data is valid
unsigned fetchBufferSize; // Size of fetch buffer in bytes
Addr fetchBufferMask; // Mask to align PC to fetch buffer boundary
The fetch buffer is temporary storage between the instruction cache and the fetch queue. Instructions are fetched from the instruction cache in cache-line-sized chunks and stored in the fetch buffer.
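To make the buffer bookkeeping concrete, here is a minimal sketch of the check that decides whether a PC can be served from the fetch buffer without a new I-cache access. The names `FetchBufferView`, `coversPC`, and `bufferOffset` are hypothetical and exist only for illustration; the real code operates on the per-thread arrays listed above and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical stand-in for one thread's fetch buffer bookkeeping.
struct FetchBufferView {
    Addr startPC;      // plays the role of fetchBufferPC[tid]
    std::size_t size;  // plays the role of fetchBufferSize (bytes)
    bool valid;        // plays the role of fetchBufferValid[tid]
};

// True when all `bytes` bytes starting at `pc` are already buffered.
inline bool coversPC(const FetchBufferView &buf, Addr pc, std::size_t bytes)
{
    return buf.valid && pc >= buf.startPC &&
           pc + bytes <= buf.startPC + buf.size;
}

// Byte offset of `pc` inside the buffer; only meaningful if coversPC() holds.
inline std::size_t bufferOffset(const FetchBufferView &buf, Addr pc)
{
    return static_cast<std::size_t>(pc - buf.startPC);
}
```

When the check fails, the fetch logic has to request new data from the I-cache before decoding can continue.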
The fetch queue stores the processed dynamic instructions before they are sent to decode:
std::deque<DynInstPtr> fetchQueue[MaxThreads]; // Queue of fetched instructions
unsigned fetchQueueSize; // Maximum size of fetch queue
For I-cache access management:
RequestPtr memReq[MaxThreads]; // Primary memory request
RequestPtr anotherMemReq[MaxThreads]; // Used for unaligned access
PacketPtr firstPkt[MaxThreads]; // First packet for I-cache access
PacketPtr secondPkt[MaxThreads]; // Second packet for unaligned access
std::pair<Addr, Addr> accessInfo[MaxThreads]; // Address info for cache access
// Overall fetch status
enum FetchStatus { Active, Inactive } _status;
// Per-thread status
enum ThreadStatus {
    Running, Idle, Squashing, Blocked, Fetching, TrapPending,
    QuiescePending, ItlbWait, IcacheWaitResponse, IcacheWaitRetry,
    IcacheAccessComplete, NoGoodAddr, NumFetchStatus
} fetchStatus[MaxThreads];
// Stall tracking
struct Stalls {
    bool decode;
    bool drain;
} stalls[MaxThreads];
// Stall reason tracking
std::vector<StallReason> stallReason;
branch_prediction::BPredUnit *branchPred; // Main branch predictor
branch_prediction::stream_pred::DecoupledStreamBPU *dbsp; // Stream predictor
branch_prediction::ftb_pred::DecoupledBPUWithFTB *dbpftb; // FTB predictor
branch_prediction::btb_pred::DecoupledBPUWithBTB *dbpbtb; // BTB predictor
branch_prediction::ftb_pred::LoopBuffer *loopBuffer; // Loop buffer
bool enableLoopBuffer; // Loop buffer enable flag
unsigned currentLoopIter; // Current loop iteration counter
bool currentFetchTargetInLoop; // If current fetch is in a loop
Based on the refactored code, the main execution flow of the Fetch stage is as follows:
tick()
|
+--> initializeTickState() // Initialize state for this tick cycle
|
+--> checkSignalsAndUpdate() // Check signals from other stages for all active threads
|
+--> Update fetch status distribution stats
|
+--> Reset pipelined fetch flags
|
+--> fetchAndProcessInstructions() // Perform fetch operations and instruction delivery
|
+--> fetch() // Fetch instructions from active threads (loop for numFetchingThreads)
| |
| +--> selectFetchThread() // Select thread to fetch from
| |
| +--> checkDecoupledFrontend() // Check FTQ availability for decoupled frontend
| |
| +--> prepareFetchAddress() // Handle status transitions and address preparation
| |
| +--> performInstructionFetch() // Main instruction fetching logic
| |
| +--> Instruction fetch loop (while numInst < fetchWidth)
| |
| +--> checkMemoryNeeds() // Check decoder needs and supply bytes
| |
| +--> Inner loop for macroop handling:
| |
| +--> processInstructionDecoding() // Decode and create DynInst
| |
| +--> handleBranchAndNextPC() // Branch prediction and PC update
| |
| +--> Handle macroop transitions
|
+--> Pass stall reasons to decode stage
|
+--> Record instruction fetch statistics
|
+--> handleInterrupts() // Handle interrupt processing (FullSystem)
|
+--> sendInstructionsToDecode() // Send instructions to decode with stall reason updates
|
+--> updateBranchPredictors() // Handle branch prediction updates (BTB/FTB/Stream)
checkSignalsAndUpdate() // For each active thread
|
+--> Update per-thread stall statuses
|
+--> Process decode block/unblock signals
|
+--> Check squash signals from Commit
|
+--> Handle branch misprediction squash
|
+--> Handle trap squash
|
+--> Handle non-control squash
|
+--> Update decoupled branch predictor (BTB/FTB/Stream)
|
+--> Process normal commit updates (update branch predictor)
|
+--> Check squash signals from Decode
|
+--> Handle branch misprediction from decode
|
+--> Update decoupled branch predictor
|
+--> Check drain stall conditions
|
+--> Update fetch status (Blocked -> Running transition)
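The order of checks in the tree above can be read as a priority scheme. The sketch below assumes that an earlier check wins over a later one (commit squash, then decode squash, then stalls); the `Signals` struct and `nextStatus` function are hypothetical and reduce the real per-thread logic to three states.

```cpp
#include <cassert>

enum class Status { Running, Blocked, Squashing };

// Hypothetical snapshot of the signals visible to one thread in one cycle.
struct Signals {
    bool commitSquash = false;
    bool decodeSquash = false;
    bool decodeStall = false;
    bool drainStall = false;
};

// Assumed priority: squash from commit, squash from decode, then stalls.
inline Status nextStatus(Status cur, const Signals &s)
{
    if (s.commitSquash) return Status::Squashing;
    if (s.decodeSquash) return Status::Squashing;
    if (s.decodeStall || s.drainStall) return Status::Blocked;
    // No squash and no stall: a blocked or squashing thread resumes.
    if (cur == Status::Blocked || cur == Status::Squashing)
        return Status::Running;
    return cur;
}
```

A squash overrides a simultaneous stall in this sketch because the squashed instructions will never reach the stalled stage.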
fetchCacheLine()
|
+--> Create memory request(s)
|
+--> Send request(s) to I-cache
|
+--> Wait for response in recvTimingResp()
|
+--> Process received data
|
+--> Update fetch buffer
|
+--> Update fetch status
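The "wait for response" step matters most when one access is split into two packets, as in the misaligned-fetch case. A minimal sketch of the completion check, assuming exactly two packets per access, follows; `TwoPacketAccess` is a hypothetical helper and not a class from the code base.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical tracker for an I-cache access that is split into two packets.
class TwoPacketAccess {
  public:
    // Record the arrival of packet `idx` (0 or 1).
    // Returns true once both packets have arrived.
    bool markArrived(std::size_t idx)
    {
        arrived.at(idx) = true;
        return complete();
    }

    bool complete() const { return arrived[0] && arrived[1]; }

    // Forget all arrivals, for example after a squash.
    void reset() { arrived = {false, false}; }

  private:
    std::array<bool, 2> arrived{{false, false}};
};
```

The fetch buffer would only be marked valid once `complete()` returns true, regardless of the order in which the packets return.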
lookupAndUpdateNextPC()
|
+--> Check if using decoupled frontend
|
+--> If DecoupledBPUWithBTB: call decoupledPredict()
| |
| +--> Get prediction and usedUpFetchTargets status
| |
| +--> Set instruction loop iteration info
|
+--> If non-decoupled: call traditional branchPred->predict()
|
+--> Handle non-control instructions (advance PC normally)
|
+--> Handle control instructions
|
+--> Set prediction target and taken status
|
+--> Update branch statistics
|
+--> Return prediction result
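The two branches at the end of this flow (non-control versus control instructions) reduce to a small next-PC computation. The sketch below assumes RISC-V instruction sizes of 2 bytes (compressed) and 4 bytes; `InstInfo`, `Prediction`, and `nextPC` are hypothetical names, and the real function works on PCStateBase objects rather than raw addresses.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical, simplified view of a decoded RISC-V instruction.
struct InstInfo {
    bool isControl;   // branch or jump
    bool compressed;  // 16-bit RVC encoding
};

// Hypothetical predictor answer for one control instruction.
struct Prediction {
    bool taken;
    Addr target;
};

// Computes the next PC and reports whether a taken branch was predicted.
inline Addr nextPC(Addr pc, const InstInfo &inst, const Prediction &pred,
                   bool &predictedTaken)
{
    const Addr fallThrough = pc + (inst.compressed ? 2 : 4);
    // Only control instructions can redirect fetch.
    predictedTaken = inst.isControl && pred.taken;
    return predictedTaken ? pred.target : fallThrough;
}
```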
buildInst()
|
+--> Get sequence number from CPU
|
+--> Create DynInst with static instruction info
|
+--> Set thread-specific information
|
+--> For decoupled frontend: set FSQ and FTQ IDs
|
+--> Add to CPU instruction list
|
+--> Add to fetch queue
|
+--> Handle delayed commit flags
The branch predictor currently in primary use is DecoupledBPUWithBTB, a decoupled-frontend design that separates branch prediction from instruction fetch:
// Check if using decoupled frontend
bool isDecoupledFrontend() { return branchPred->isDecoupled(); }
// Different predictor types (BTB is the one mainly used at present)
bool isStreamPred() { return branchPred->isStream(); }
bool isFTBPred() { return branchPred->isFTB(); }
bool isBTBPred() { return branchPred->isBTB(); } // Predictor type mainly used
// Decide whether fetch must stop for this cycle
bool shouldStopFetchThisCycle(bool predictedBranch)
{
    if (waitForVsetvl) {
        return true;
    }
    if (isDecoupledFrontend()) {
        return usedUpFetchTargets;
    }
    return predictedBranch;
}
- Initialization: the constructor detects and initializes the BTB predictor
if (isBTBPred()) {
    dbpbtb = dynamic_cast<branch_prediction::btb_pred::DecoupledBPUWithBTB*>(branchPred);
    assert(dbpbtb);
    usedUpFetchTargets = true;
    dbpbtb->setCpu(_cpu);
}
- Per-cycle update: in updateBranchPredictors()
if (isBTBPred()) {
    assert(dbpbtb);
    dbpbtb->tick();
    usedUpFetchTargets = !dbpbtb->trySupplyFetchWithTarget(pc[0]->instAddr(), currentFetchTargetInLoop);
}
- Fetch target check: at the start of fetch(), check whether the FTQ has a target available
if (isBTBPred()) {
    if (!dbpbtb->fetchTargetAvailable()) {
        dbpbtb->addFtqNotValid();
        DPRINTF(Fetch, "Skip fetch when FTQ head is not available\n");
        return;
    }
}
- Branch prediction call: in lookupAndUpdateNextPC()
if (isBTBPred()) {
    std::tie(predict_taken, usedUpFetchTargets) =
        dbpbtb->decoupledPredict(
            inst->staticInst, inst->seqNum, next_pc, tid, currentLoopIter);
}
- Squash handling: several kinds of squash are supported
// Control squash (branch misprediction)
dbpbtb->controlSquash(ftqId, fsqId, oldPC, newPC, staticInst, instBytes, taken, seqNum, tid, loopIter, fromCommit);
// Trap squash
dbpbtb->trapSquash(targetId, streamId, committedPC, newPC, tid, loopIter);
// Non-control squash
dbpbtb->nonControlSquash(targetId, streamId, newPC, 0, tid, loopIter);
- Fetch Target Queue (FTQ): stores the predicted fetch target addresses
- Stream Queue (FSQ): manages instruction stream information
- Loop Iteration Tracking: tracks loop iteration information
- Target Availability Check: checks every cycle whether a fetch target is available
- Preserved Return Address: special handling of function return addresses
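The interplay between the FTQ and the usedUpFetchTargets flag can be illustrated with a toy queue. This is a sketch under strong assumptions: each fetch target covers a contiguous range, and a predicted-taken branch is always the last instruction of its target. `ToyFtq`, `FetchTarget`, and `advance` are hypothetical names; the real DecoupledBPUWithBTB interface is richer and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>

using Addr = std::uint64_t;

// Hypothetical fetch target: fetch the bytes in [startPC, endPC); if `taken`,
// continue at `target` afterwards, otherwise fall through past endPC.
struct FetchTarget {
    Addr startPC;
    Addr endPC;
    bool taken;
    Addr target;
};

class ToyFtq {
  public:
    void push(const FetchTarget &t) { q.push_back(t); }

    bool fetchTargetAvailable() const { return !q.empty(); }

    // Consume the instruction at `pc` (`size` bytes) from the head target.
    // Returns {predictedTaken, usedUpFetchTargets} and writes the next PC.
    // Precondition: fetchTargetAvailable() is true.
    std::pair<bool, bool> advance(Addr pc, unsigned size, Addr &nextPC)
    {
        const FetchTarget head = q.front();
        const Addr fallThrough = pc + size;
        if (fallThrough < head.endPC) {
            nextPC = fallThrough;  // still inside the current target
            return {false, false};
        }
        // Last instruction of the target: follow the prediction, drop the entry.
        nextPC = head.taken ? head.target : fallThrough;
        q.pop_front();
        return {head.taken, true};
    }

  private:
    std::deque<FetchTarget> q;
};
```

Once `advance` reports that the target is used up, fetch stops for the cycle and waits for the predictor to supply the next target, which mirrors the role of shouldStopFetchThisCycle().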
The Fetch stage has its own port to access the instruction cache:
class IcachePort : public RequestPort {
    // Handles timing responses from the I-cache
    virtual bool recvTimingResp(PacketPtr pkt);
    // Handles retry signals from I-cache
    virtual void recvReqRetry();
};
IcachePort icachePort;
The Fetch stage handles instruction address translation:
class FetchTranslation : public BaseMMU::Translation {
    // Called when translation completes
    void finish(const Fault &fault, const RequestPtr &req,
                ThreadContext *tc, BaseMMU::Mode mode);
};
// Event to handle delayed translation results
class FinishTranslationEvent : public Event {
    // Process translation result
    void process();
};
FinishTranslationEvent finishTranslationEvent;
The Fetch stage has multiple thread selection policies:
// Thread selection policies
ThreadID getFetchingThread(); // Main policy selection function
ThreadID roundRobin(); // Round robin policy
ThreadID iqCount(); // Based on instruction queue count
ThreadID lsqCount(); // Based on load/store queue count
ThreadID branchCount(); // Based on branch count
struct FetchStatGroup : public statistics::Group {
    // Stall statistics
    statistics::Scalar icacheStallCycles;
    statistics::Scalar tlbCycles;
    statistics::Scalar idleCycles;
    statistics::Scalar blockedCycles;
    // Instruction statistics
    statistics::Scalar insts;
    statistics::Scalar branches;
    statistics::Scalar predictedBranches;
    // Performance metrics
    statistics::Formula idleRate;
    statistics::Formula branchRate;
    statistics::Formula rate;
    // Frontend performance metrics
    statistics::Formula frontendBound;
    statistics::Formula frontendLatencyBound;
    statistics::Formula frontendBandwidthBound;
};
- Decoupled frontend design: supports BTB, FTB, and stream-based prediction (DecoupledBPUWithBTB is the one mainly used at present)
- TAGE, ITTAGE, and Loop Predictor: advanced branch prediction aligned with XiangShan
- Instruction latency calibration: timing calibrated to match the characteristics of the Kunminghu hardware
- RISC-V specific support: for example, special handling of the vsetvl instruction
- Loop Buffer: caches loop instructions to improve energy efficiency
- Pipelined I-cache access: allows multiple overlapping I-cache accesses
- Fetch throttling: controls the fetch rate based on backend pressure
- Misaligned access handling: supports instruction fetch across cache lines
- Intel TopDown performance analysis: detailed analysis of frontend performance bottlenecks
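For the TopDown item, the following sketch shows how the three frontend metrics relate under the general Top-Down method. It is not taken from this code base: the actual formulas behind frontendBound, frontendLatencyBound, and frontendBandwidthBound may be defined differently, and the function and parameter names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct FrontendBreakdown {
    double frontendBound;   // share of slots the frontend left empty
    double latencyBound;    // share lost in cycles that delivered nothing
    double bandwidthBound;  // remainder: cycles that delivered only part of the width
};

// bubbleSlots:        empty delivery slots while the backend could accept work
// zeroDeliveryCycles: cycles in which no instruction was delivered at all
// cycles, width:      total cycles and delivery slots per cycle
inline FrontendBreakdown frontendBreakdown(std::uint64_t bubbleSlots,
                                           std::uint64_t zeroDeliveryCycles,
                                           std::uint64_t cycles,
                                           unsigned width)
{
    const double slots = static_cast<double>(cycles) * width;
    if (slots == 0.0) return {0.0, 0.0, 0.0};
    const double bound = bubbleSlots / slots;
    const double latency =
        static_cast<double>(zeroDeliveryCycles) * width / slots;
    return {bound, latency, bound - latency};
}
```

For example, 400 bubble slots over 100 cycles at width 8, with 25 cycles delivering nothing, gives a frontend bound of 0.5 that splits evenly into 0.25 latency bound and 0.25 bandwidth bound.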
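tick() is the main function, executed once per clock cycle. It has three main phases:
- initializeTickState(): initializes the per-cycle state
  - Resets the status-change flag and the time-buffer-written flag
  - Updates the fetch status distribution statistics
  - Resets the pipelined ifetch flags
  - Handles the vsetvl wait state (RISC-V specific)
- fetchAndProcessInstructions(): performs the fetch operations and instruction processing
  - Loops over the fetch operations of all active threads
  - Passes stall reasons to the Decode stage
  - Records instruction fetch statistics
  - Handles interrupts (FullSystem mode)
  - Sends instructions to the Decode stage and measures frontend bubbles
- updateBranchPredictors(): updates the branch predictors
  - Calls the branch predictor's tick() method
  - Tries to supply fetch with a target address
  - Updates the usedUpFetchTargets state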
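The refactored fetch() function is more modular and is divided into four clear phases:
void fetch(bool &status_change) {
    ThreadID tid = selectFetchThread(); // Thread selection
    if (tid == InvalidThreadID) return;
    if (!checkDecoupledFrontend(tid)) return; // Decoupled frontend check
    Addr fetch_addr;
    if (!prepareFetchAddress(tid, status_change, fetch_addr)) return; // Address preparation
    performInstructionFetch(tid, fetch_addr, status_change); // Instruction fetch
}
Details of each phase:
- selectFetchThread(): thread selection and basic checks
  - Calls getFetchingThread() to select the thread to fetch from
  - Handles the invalid thread ID case
  - Updates per-thread fetch statistics
- checkDecoupledFrontend(): decoupled frontend check
  - Checks whether the FTQ (Fetch Target Queue) has a fetch target available
  - Supports the three predictor types BTB, FTB, and Stream
  - Sets the corresponding stall reason and returns when the FTQ is empty
- prepareFetchAddress(): address preparation and status handling
  - Handles the IcacheAccessComplete status transition
  - Checks fetch buffer validity and interrupt conditions
  - Prepares the fetch address and handles the cache access logic
  - Manages fetchStatus transitions
- performInstructionFetch(): main instruction fetch loop
  - Runs the main instruction decode and fetch logic
  - Enforces the fetch width and fetch queue size limits
  - Handles branch prediction and PC updates
The refactored performInstructionFetch() function is modularized further and relies on three dedicated sub-functions: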
void performInstructionFetch(ThreadID tid, Addr fetch_addr, bool &status_change) {
    // Main loop: run until the fetch width or another limit is reached
    while (numInst < fetchWidth && fetchQueue[tid].size() < fetchQueueSize &&
           !shouldStopFetchThisCycle(predictedBranch)) {
        // 1. Check memory needs and feed the decoder
        stall = checkMemoryNeeds(tid, this_pc, curMacroop);
        if (stall != StallReason::NoStall) break;
        // 2. Inner loop: extract as many instructions as possible from the buffered memory
        do {
            instruction = processInstructionDecoding(tid, this_pc, next_pc,
                                                     staticInst, curMacroop, newMacro);
            handleBranchAndNextPC(instruction, this_pc, next_pc,
                                  predictedBranch, newMacro);
        } while (curMacroop && limitChecks);
    }
}
Sub-function descriptions:
checkMemoryNeeds() supplies bytes to the decoder, specifically for the RISC-V architecture:
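StallReason checkMemoryNeeds(ThreadID tid, const PCStateBase &this_pc,
                             StaticInstPtr &curMacroop) {
    // 1. Macroop handling: inside a macroop, no new memory bytes are needed
    if (curMacroop) return StallReason::NoStall;
    // 2. Fetch buffer check: verify buffer validity and range
    //    (pc_out_of_range is a placeholder for "PC falls outside the buffer")
    if (!fetchBufferValid[tid] || pc_out_of_range) {
        return StallReason::IcacheStall;
    }
    // 3. Byte supply: hand 4 bytes of data to the RISC-V decoder
    memcpy(decoder->moreBytesPtr(), fetchBuffer[tid] + offset, 4);
    decoder->moreBytes(this_pc, fetch_pc);
    return StallReason::NoStall;
}
processInstructionDecoding() handles instruction decoding and dynamic instruction creation in one place: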
DynInstPtr processInstructionDecoding(ThreadID tid, PCStateBase &this_pc,
                                      const std::unique_ptr<PCStateBase> &next_pc,
                                      StaticInstPtr &staticInst,
                                      StaticInstPtr &curMacroop, bool &newMacro) {
    // 1. Instruction decode: a regular instruction or the microops of a macroop
    if (!curMacroop) {
        staticInst = decoder->decode(this_pc); // Decode a new instruction
        if (staticInst->isMacroop()) curMacroop = staticInst;
    } else {
        staticInst = curMacroop->fetchMicroop(this_pc.microPC()); // Fetch a microop
        newMacro |= staticInst->isLastMicroop();
    }
    // 2. Dynamic instruction creation: call buildInst() to create the DynInst
    DynInstPtr instruction = buildInst(tid, staticInst, curMacroop, this_pc, *next_pc, true);
    // 3. RISC-V specific handling: vector configuration instructions
    if (staticInst->isVectorConfig()) {
        waitForVsetvl = decoder->stall();
    }
    return instruction;
}
handleBranchAndNextPC() centralizes branch prediction and PC state management:
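void handleBranchAndNextPC(DynInstPtr instruction, PCStateBase &this_pc,
                           std::unique_ptr<PCStateBase> &next_pc,
                           bool &predictedBranch, bool &newMacro) {
    // 1. PC state preparation: copy the current PC into next_pc
    set(next_pc, this_pc);
    // 2. Branch prediction: distinguish decoupled and non-decoupled frontends
    if (!isDecoupledFrontend()) {
        predictedBranch |= this_pc.branching();
    }
    // For the decoupled frontend, lookupAndUpdateNextPC() must be called to update
    // next_pc and to determine whether the current PC has left the current FTQ
    // entry; if it has, fetch moves on to the next FTQ entry
    predictedBranch |= lookupAndUpdateNextPC(instruction, *next_pc);
    // 3. Macroop transition check: detect a move to a new macroop
    newMacro |= this_pc.instAddr() != next_pc->instAddr();
    // 4. PC update: advance this_pc to next_pc
    set(this_pc, *next_pc);
}
checkSignalsAndUpdate() handles control signals from the other pipeline stages: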
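- Decode stage signals: handles block/unblock signals
- Commit stage signals:
  - Handles squash signals (branch misprediction, trap, non-control squash)
  - Updates branch predictor state
  - Handles interrupt signals
- Decode stage squash: handles branch mispredictions detected in Decode
- Status transitions: manages the Blocked/Running transitions
- Per-thread state and operations are distinguished by ThreadID
- Multiple thread selection policies are supported (RoundRobin, IQCount, LSQCount, and others)
- Each thread has its own fetch status and buffers
- Stall mechanism: detailed tracking and forwarding of stall reasons
- Pipelined I-cache access: supports overlapping cache accesses
- State machine management: complete fetch status transition logic
- FTQ management: the Fetch Target Queue supplies predicted fetch targets
- Per-cycle check: fetch proceeds only when the FTQ has a target available
- Loop support: tracks loop iteration information and fetch inside loops
- Intel TopDown method: measures frontend bubbles
- Detailed statistics: collects a range of performance metrics and stall reasons
- Frontend Bound analysis: separates latency bound from bandwidth bound
- Fetch Buffer: buffers instruction data obtained from the I-cache
- Misaligned Access: supports instruction fetch across cache lines
- Memory Request management: handles I-cache accesses and TLB translation
- Squash operations: supports several kinds of pipeline flush
- Translation Fault: handles address translation errors
- Cache Miss: handles I-cache misses and the retry logic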
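The Fetch stage currently supports instruction fetch that spans two cache lines (misaligned fetch). By default, every cycle accesses two cache lines to fill the 66-byte fetchBuffer. The state transition diagram below is based on a detailed analysis of the source code.
Normal fetch flow:
- Running → ItlbWait (fetchCacheLine() starts the address translation)
- ItlbWait → IcacheWaitResponse (translation succeeds in finishTranslation() and the cache request is sent)
- IcacheWaitResponse → IcacheAccessComplete (processCacheCompletion() receives the cache data)
- IcacheAccessComplete → Running (prepareFetchAddress() prepares the next fetch)
Error and retry flow:
- ItlbWait → IcacheWaitRetry (the cache request is rejected because the MSHRs are full)
- IcacheWaitRetry → IcacheWaitResponse (the retry succeeds in recvReqRetry())
- ItlbWait → TrapPending (translation fault)
- ItlbWait → NoGoodAddr (invalid physical address)
Squash flow:
- Any state → Squashing (a squash signal is received in checkSignalsAndUpdate())
- Squashing → Running (squash handling is complete)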
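The transitions listed above can be written as a small transition function. This is a sketch for illustration only: the event names in `Ev` are hypothetical labels for the conditions in the lists, and the real code changes fetchStatus[tid] in several different functions rather than in one place.

```cpp
#include <cassert>

enum class St {
    Running, ItlbWait, IcacheWaitResponse, IcacheWaitRetry,
    IcacheAccessComplete, TrapPending, NoGoodAddr, Squashing
};

// Hypothetical event labels for the conditions named in the lists above.
enum class Ev {
    NeedLine, TranslationOk, CacheRejected, TranslationFault, BadAddress,
    RetryOk, DataArrived, NextFetch, Squash, SquashDone
};

inline St step(St s, Ev e)
{
    if (e == Ev::Squash) return St::Squashing;  // allowed from any state
    switch (s) {
      case St::Running:
        if (e == Ev::NeedLine) return St::ItlbWait;
        break;
      case St::ItlbWait:
        if (e == Ev::TranslationOk) return St::IcacheWaitResponse;
        if (e == Ev::CacheRejected) return St::IcacheWaitRetry;
        if (e == Ev::TranslationFault) return St::TrapPending;
        if (e == Ev::BadAddress) return St::NoGoodAddr;
        break;
      case St::IcacheWaitResponse:
        if (e == Ev::DataArrived) return St::IcacheAccessComplete;
        break;
      case St::IcacheWaitRetry:
        if (e == Ev::RetryOk) return St::IcacheWaitResponse;
        break;
      case St::IcacheAccessComplete:
        if (e == Ev::NextFetch) return St::Running;
        break;
      case St::Squashing:
        if (e == Ev::SquashDone) return St::Running;
        break;
      default:
        break;
    }
    return s;  // events that do not apply leave the state unchanged
}
```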
graph TD
%% Basic states
Idle("Idle<br/>Idle state<br/>Thread inactive")
Running("Running<br/>Normal operation<br/>Can fetch instructions")
Blocked("Blocked<br/>Blocked<br/>checkStall() returns true")
Squashing("Squashing<br/>Cleaning up<br/>pipeline flush")
%% Cache access states
ItlbWait("ItlbWait<br/>Waiting for TLB translation<br/>Started by fetchCacheLine()")
IcacheWaitResponse("IcacheWaitResponse<br/>Waiting for I-cache response<br/>Waiting for 2 packets to complete")
IcacheWaitRetry("IcacheWaitRetry<br/>Waiting for I-cache retry<br/>MSHR full, request rejected")
IcacheAccessComplete("IcacheAccessComplete<br/>I-cache access complete<br/>Data received")
%% Special states
TrapPending("TrapPending<br/>Waiting for trap handling<br/>Translation fault occurred")
QuiescePending("QuiescePending<br/>Waiting for quiesce<br/>Quiesce instruction handling")
NoGoodAddr("NoGoodAddr<br/>Invalid address<br/>Outside physical memory range")
%% Main state transitions - normal fetch flow
Idle -->|"Thread activated<br/>checkSignalsAndUpdate()"| Running
Running -->|"Cache line needed<br/>fetchCacheLine()"| ItlbWait
%% Possible outcomes of the TLB translation
ItlbWait -->|"Translation succeeded<br/>finishTranslation()<br/>Cache request sent"| IcacheWaitResponse
ItlbWait -->|"Cache request rejected<br/>finishTranslation()<br/>MSHR full"| IcacheWaitRetry
ItlbWait -->|"Translation fault<br/>finishTranslation()<br/>handleTranslationFault()"| TrapPending
ItlbWait -->|"Invalid physical address<br/>finishTranslation()<br/>Outside memory range"| NoGoodAddr
%% Cache access flow
IcacheWaitResponse -->|"All packets arrived<br/>processCacheCompletion()<br/>Data complete"| IcacheAccessComplete
IcacheWaitResponse -->|"Cache rejects again<br/>port busy"| IcacheWaitRetry
IcacheWaitRetry -->|"Retry succeeded<br/>recvReqRetry()<br/>Port available"| IcacheWaitResponse
IcacheAccessComplete -->|"Prepare next fetch<br/>prepareFetchAddress()"| Running
%% Squash handling - can happen from any state
Running -->|"Squash signal received<br/>checkSignalsAndUpdate()<br/>Branch misprediction/exception"| Squashing
Blocked -->|"Squash signal received<br/>checkSignalsAndUpdate()"| Squashing
ItlbWait -->|"Squash signal received<br/>doSquash()<br/>Clear requests"| Squashing
IcacheWaitResponse -->|"Squash signal received<br/>doSquash()<br/>Clear pending requests"| Squashing
IcacheWaitRetry -->|"Squash signal received<br/>doSquash()"| Squashing
IcacheAccessComplete -->|"Squash signal received<br/>doSquash()"| Squashing
TrapPending -->|"Squash signal received<br/>doSquash()"| Squashing
NoGoodAddr -->|"Squash signal received<br/>doSquash()"| Squashing
%% Recovery after squash
Squashing -->|"Squash handling complete<br/>checkSignalsAndUpdate()<br/>State cleanup done"| Running
%% Blocking and recovery
Running -->|"Stall detected<br/>checkSignalsAndUpdate()<br/>decode/drain stall"| Blocked
IcacheAccessComplete -->|"Downstream stall<br/>processCacheCompletion()<br/>decode busy"| Blocked
Blocked -->|"Stall cleared<br/>checkSignalsAndUpdate()<br/>Downstream ready"| Running
%% Special state handling
Running -->|"Quiesce instruction<br/>Special system call"| QuiescePending
TrapPending -->|"Trap handling complete<br/>Handled by commit stage"| Running
QuiescePending -->|"Quiesce complete<br/>System resumes"| Running
NoGoodAddr -->|"Address corrected<br/>Fetch again"| Running
%% FTQ-related transitions (decoupled frontend)
Running -.->|"FTQ empty<br/>usedUpFetchTargets=true<br/>New fetch target needed"| Running
%% Style definitions
classDef normal fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef cache fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef special fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef squash fill:#ffebee,stroke:#c62828,stroke-width:2px
class Idle,Running,Blocked normal
class ItlbWait,IcacheWaitResponse,IcacheWaitRetry,IcacheAccessComplete cache
class TrapPending,QuiescePending,NoGoodAddr special
class Squashing squash
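- Squash signal handling: pipeline flushes from the Commit and Decode stages
- Stall signal handling: downstream blocking signals such as decode and drain stalls
- Status validation: checks the stall conditions and updates the status
- Start TLB translation: Running → ItlbWait
- Handle the translation result: dispatch to different states depending on the result
- Multi-cache-line support: creates 2 cache requests to obtain 66 bytes of data
- Data reception: IcacheWaitResponse → IcacheAccessComplete
- Completeness check: ensures that all packets have been received
- Stall detection: moves to Blocked if the downstream stage is busy
- Clear requests: clears all pending cache and TLB requests
- Reset state: clears fetchBuffer and related state
- FTQ reset: forces a new fetch target to be obtained
- Retry mechanism: IcacheWaitRetry → IcacheWaitResponse
- Port management: handles the case where the cache port is busy
Key characteristics of the current implementation:
- Spans 2 cache lines by default: each fetch accesses two 64-byte cache lines
- Unified request management: the CacheRequest structure manages multiple packets
- Completeness guarantee: the transition to IcacheAccessComplete happens only after all packets have arrived
- Error recovery: supports retrying individual packets and error handling
- FTQ check: checks every cycle whether the Fetch Target Queue has a target available
- Target exhaustion: the usedUpFetchTargets flag controls whether a new fetch target is needed
- Validation logic: validateTranslationRequest() ensures that the request is valid
- Loop support: tracks loop iterations and special handling inside loops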
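The two-cache-line default can be checked with a little address arithmetic. The sketch below assumes 64-byte cache lines and a 66-byte fetch window, as quoted above; `lineBase` and `linesTouched` are hypothetical helpers, not functions from the code base.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using Addr = std::uint64_t;

constexpr Addr kLineBytes = 64;   // cache line size quoted in the text
constexpr Addr kFetchBytes = 66;  // fetch buffer size quoted in the text

// Base address of the cache line containing `a`.
inline Addr lineBase(Addr a) { return a & ~(kLineBytes - 1); }

// Base addresses of the first and last cache line touched by a
// kFetchBytes-wide fetch that starts at `start`.
inline std::pair<Addr, Addr> linesTouched(Addr start)
{
    return {lineBase(start), lineBase(start + kFetchBytes - 1)};
}
```

With a 2-byte-aligned start address, which RISC-V instruction alignment with the compressed extension guarantees, the last byte always falls in the line immediately after the first, so exactly two lines are touched. An odd start at offset 63 would reach a third line, which is the case the last assertion-style example `linesTouched(0x103f)` illustrates: it returns 0x1000 and 0x1080.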
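As part of the refactoring that separates cache state, a WaitingCache status is introduced to identify the real state of a fetch thread unambiguously. The new design fully separates the overall fetch status from the cache access status.
enum ThreadStatus {
    Running,       // Can really fetch instructions, with nothing to wait for
    Idle,          // Thread inactive, no fetch demand
    Blocked,       // Blocked by a downstream stage, cannot continue fetching
    Squashing,     // Performing a pipeline flush
    TrapPending,   // Waiting for trap handling (translation fault, etc.)
    WaitingCache,  // Waiting for any kind of cache/TLB response
};
enum CacheRequestStatus {
    CacheIdle,          // No active request
    TlbWait,            // Waiting for TLB translation to finish
    CacheWaitResponse,  // Waiting for cache data to return
    CacheWaitRetry,     // Waiting for a chance to retry the cache access
    AccessComplete,     // Access complete, data available
    AccessFailed,       // Access failed (invalid address, etc.)
    Cancelled           // Request cancelled (squash, etc.)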
};
graph TD
%% Overall Fetch status transitions
subgraph "Fetch ThreadStatus (overall status)"
Idle_F("Idle<br/>Thread inactive")
Running_F("Running<br/>Can fetch immediately<br/>Nothing to wait for")
WaitingCache_F("WaitingCache<br/>Waiting for cache/TLB response<br/>Thread still running")
Blocked_F("Blocked<br/>Blocked by downstream")
Squashing_F("Squashing<br/>Performing pipeline flush")
TrapPending_F("TrapPending<br/>Waiting for trap handling")
end
%% Cache request status transitions
subgraph "Cache RequestStatus (status of each request)"
CacheIdle_C("CacheIdle<br/>No active request")
TlbWait_C("TlbWait<br/>Waiting for TLB translation")
CacheWaitResponse_C("CacheWaitResponse<br/>Waiting for cache response")
CacheWaitRetry_C("CacheWaitRetry<br/>Waiting for a retry opportunity")
AccessComplete_C("AccessComplete<br/>Data has arrived")
AccessFailed_C("AccessFailed<br/>Access failed")
Cancelled_C("Cancelled<br/>Request cancelled")
end
%% Main state transitions - overall Fetch status
Idle_F -->|"Thread activated"| Running_F
Running_F -->|"Cache access needed<br/>fetchCacheLine()"| WaitingCache_F
WaitingCache_F -->|"Cache access complete<br/>processCacheCompletion()"| Running_F
WaitingCache_F -->|"translation fault<br/>handleTranslationFault()"| TrapPending_F
Running_F -->|"Downstream stall<br/>decode busy"| Blocked_F
WaitingCache_F -->|"Downstream stall"| Blocked_F
Blocked_F -->|"Stall cleared"| Running_F
%% Squash can happen from any state
Running_F -->|"Squash signal received"| Squashing_F
WaitingCache_F -->|"Squash signal received"| Squashing_F
Blocked_F -->|"Squash signal received"| Squashing_F
TrapPending_F -->|"Squash signal received"| Squashing_F
Squashing_F -->|"Squash complete"| Running_F
TrapPending_F -->|"Trap handling complete"| Running_F
%% Cache request status transitions
CacheIdle_C -->|"Start TLB translation<br/>fetchCacheLine()"| TlbWait_C
TlbWait_C -->|"Translation succeeded<br/>finishTranslation()"| CacheWaitResponse_C
TlbWait_C -->|"Translation failed"| AccessFailed_C
TlbWait_C -->|"Interrupted by squash"| Cancelled_C
CacheWaitResponse_C -->|"Data returned<br/>processCacheCompletion()"| AccessComplete_C
CacheWaitResponse_C -->|"Cache rejected<br/>MSHR full"| CacheWaitRetry_C
CacheWaitResponse_C -->|"Interrupted by squash"| Cancelled_C
CacheWaitRetry_C -->|"Retry succeeded<br/>recvReqRetry()"| CacheWaitResponse_C
CacheWaitRetry_C -->|"Interrupted by squash"| Cancelled_C
AccessComplete_C -->|"State cleanup<br/>reset()"| CacheIdle_C
AccessFailed_C -->|"State cleanup"| CacheIdle_C
Cancelled_C -->|"State cleanup"| CacheIdle_C
%% Associations (dashed lines denote status checks)
Running_F -.->|"Check cache status<br/>canFetchInstructions()"| CacheIdle_C
WaitingCache_F -.->|"Wait for completion<br/>hasPendingCacheRequests()"| TlbWait_C
WaitingCache_F -.->|"Wait for completion"| CacheWaitResponse_C
WaitingCache_F -.->|"Wait for completion"| CacheWaitRetry_C
%% Style definitions
classDef fetchStatus fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef cacheStatus fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef transition fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef errorStatus fill:#ffebee,stroke:#d32f2f,stroke-width:2px
class Idle_F,Running_F,Blocked_F fetchStatus
class WaitingCache_F transition
class Squashing_F,TrapPending_F errorStatus
class CacheIdle_C,TlbWait_C,CacheWaitResponse_C,CacheWaitRetry_C,AccessComplete_C cacheStatus
class AccessFailed_C,Cancelled_C errorStatus
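- Running: instruction fetch can really proceed immediately, with nothing to wait for
- WaitingCache: the thread is running but is waiting for a cache/TLB response and cannot start a new fetch
- Independent cache status: each cache request has its own lifecycle status
// When a cache request is issued
Running → WaitingCache (in fetchCacheLine())
// When the cache access completes
WaitingCache → Running (in processCacheCompletion())
// Exceptional cases
WaitingCache → TrapPending (translation failed)
WaitingCache → Squashing (squash signal received)
bool canFetchInstructions(ThreadID tid) const {
    // Fetch is allowed only when the thread is really Running and has no pending cache request
    return fetchStatus[tid] == Running &&
           cacheReq[tid].getOverallStatus() == CacheIdle;
}
bool isWaitingForCache(ThreadID tid) const {
    return fetchStatus[tid] == WaitingCache;
}
- Resolves the overloaded Running semantics: Running now only means that fetch can proceed immediately
- Clear state transitions: WaitingCache explicitly means waiting for a cache response
- Easier debugging: the state transition paths are obvious at a glance
- Good extensibility: leaves room for features such as 2fetch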
bool validateFetchCacheConsistency(ThreadID tid) const {
    auto cacheStatus = cacheReq[tid].getOverallStatus();
    auto threadStatus = fetchStatus[tid];
    // In the Running state there should be no pending cache request
    if (threadStatus == Running &&
        (cacheStatus == TlbWait || cacheStatus == CacheWaitResponse ||
         cacheStatus == CacheWaitRetry)) {
        return false;
    }
    // In the WaitingCache state there should be an active cache request
    if (threadStatus == WaitingCache && cacheStatus == CacheIdle) {
        return false;
    }
    return true;
}
This new state design resolves the earlier overloading of the Running state and makes the Fetch state transitions clearer and easier to understand.