Commit 06ca6f3

feat(store_queue): Tomasulo add Store Queue with SQ→LQ forwarding (Week 10)
Implement the Store Queue as an 8-entry circular buffer with commit-ordered
memory writes, store-to-load forwarding, MMIO store handling, and FP64
two-phase commit support.

RTL:
- hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/store_queue.sv
- hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/store_queue.f
- hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/README.md

Integration into tomasulo_wrapper:
- SQ allocation from dispatch (SW/SH/SB/FSW/FSD/SC.W)
- Address and data updates from MEM_RS issue
- Internal SQ↔LQ disambiguation and forwarding wiring
- Commit-ordered store writeback via memory interface
- L0 cache invalidation on store commit

Bug fixes:
- store_queue: forwarding scan now treats committed stores as always older
  than in-flight loads, fixing is_older_than wrap when ROB head advances
  past the store tag
- test_lq_cdb_arbitration: step after memory request to register
  mem_outstanding before driving response, fixing ignored response

Verification:
- 27 store_queue unit tests (cocotb + Verilator)
- 39 tomasulo_wrapper integration tests (all passing)
- Formal properties: allocation, commit ordering, forwarding, flush, MMIO,
  and FP64 phase handling
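
The `is_older_than` wrap named in the bug fix can be illustrated with a small behavioral model. This is a sketch only: it assumes ages are compared as distances from the ROB head modulo a 32-entry ROB (inferred from the 5-bit `rob_tag`), and the function names are hypothetical stand-ins, not the RTL's actual helpers.

```python
ROB_TAG_BITS = 5
ROB_DEPTH = 1 << ROB_TAG_BITS  # 32 entries, assumed from the 5-bit rob_tag


def age(tag, rob_head):
    """Distance of a tag from the ROB head, modulo ROB depth (0 = oldest)."""
    return (tag - rob_head) % ROB_DEPTH


def is_older_than(tag_a, tag_b, rob_head):
    """Head-relative age comparison; only meaningful for tags still in the ROB."""
    return age(tag_a, rob_head) < age(tag_b, rob_head)


def store_is_older_than_load(store_tag, store_committed, load_tag, rob_head):
    """Age check with the committed-store override described in the bug fix."""
    # A committed store has left the ROB, so the head may have advanced past
    # its tag. Its modular age then wraps to a large value and it would look
    # younger than every in-flight load, so it is treated as always older.
    if store_committed:
        return True
    return is_older_than(store_tag, load_tag, rob_head)
```

With the head at 5, a committed store tagged 3 and a load tagged 7, the plain comparison reports the store as younger (age 30 versus age 2), while the override reports it as older.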
1 parent d34479d commit 06ca6f3

19 files changed: +3319 -192 lines changed


formal/store_queue.sby

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+[tasks]
+bmc
+cover
+
+[options]
+bmc: mode bmc
+bmc: depth 12
+cover: mode cover
+cover: depth 20
+
+[engines]
+smtbmc boolector
+
+[script]
+read -formal -sv riscv_pkg.sv
+read -formal -sv store_queue.sv
+prep -top store_queue
+
+[files]
+../hw/rtl/cpu_and_mem/cpu/riscv_pkg.sv
+../hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/store_queue.sv

formal/tomasulo_wrapper.sby

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,7 @@ read -sv int_muldiv_shim.sv
 read -sv load_unit.sv
 read -sv lq_l0_cache.sv
 read -sv load_queue.sv
+read -sv store_queue.sv
 read -sv divider.sv
 read -sv dsp_tiled_multiplier_unsigned.sv
 read -sv multiplier.sv
@@ -72,6 +73,7 @@ prep -top tomasulo_wrapper
 ../hw/rtl/cpu_and_mem/cpu/ma_stage/load_unit.sv
 ../hw/rtl/cpu_and_mem/cpu/tomasulo/load_queue/lq_l0_cache.sv
 ../hw/rtl/cpu_and_mem/cpu/tomasulo/load_queue/load_queue.sv
+../hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/store_queue.sv
 ../hw/rtl/cpu_and_mem/cpu/ex_stage/alu/divider.sv
 ../hw/rtl/cpu_and_mem/cpu/ex_stage/dsp_tiled_multiplier_unsigned.sv
 ../hw/rtl/cpu_and_mem/cpu/ex_stage/alu/multiplier.sv

hw/rtl/cpu_and_mem/cpu/tomasulo/load_queue/load_queue.sv

Lines changed: 2 additions & 2 deletions
@@ -288,8 +288,8 @@ module load_queue #(
 logic sq_can_issue;
 logic sq_do_forward;
 
-assign sq_can_issue = o_sq_check_valid && i_sq_all_older_addrs_known && !i_sq_forward.match;
-assign sq_do_forward = o_sq_check_valid && i_sq_forward.can_forward;
+assign sq_can_issue = o_sq_check_valid && i_sq_all_older_addrs_known && !i_sq_forward.match;
+assign sq_do_forward = o_sq_check_valid && i_sq_forward.can_forward && !lq_is_mmio[issue_mem_idx];
 
 always_comb begin
 o_mem_read_en = 1'b0;
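
The effect of the new `!lq_is_mmio[issue_mem_idx]` term can be seen in a boolean model of the two assigns in this hunk. This is a sketch, with parameter names shortened from the RTL signal names for illustration. It models only these two expressions and nothing else in `load_queue`.

```python
def lq_sq_decision(check_valid, all_older_addrs_known, match, can_forward,
                   load_is_mmio):
    """Plain-boolean mirror of sq_can_issue and sq_do_forward."""
    # Issue to memory only when every older store address is known and no
    # older store overlaps the load.
    can_issue = check_valid and all_older_addrs_known and not match
    # New in this commit: an MMIO load never takes forwarded store data.
    do_forward = check_valid and can_forward and not load_is_mmio
    return can_issue, do_forward
```

Taken on their own, these two expressions mean an MMIO load that matches an older store neither issues nor forwards, so it would wait until the match clears.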

hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/README.md

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
+# Store Queue
+
+Commit-ordered store buffer that holds in-flight store instructions from dispatch
+until they write to memory after ROB commit. Entries are allocated in program order
+and freed when their memory write completes.
+
+## Architecture
+
+```
+Dispatch ──> [Alloc] ──> Tail
+                |
+MEM_RS Issue ──> [Addr Update] ──> CAM search by rob_tag
+             ──> [Data Update] ──> CAM search by rob_tag
+                |
+ROB Commit ──> [Commit Mark] ──> CAM search by rob_tag
+                |
+          [Memory Write] ──> Head (committed + addr_valid + data_valid)
+           |           |
+    [FSD Phase 0] [FSD Phase 1]
+           |           |
+          [Write Done] ──> Free entry, advance head
+                |
+          [L0 Cache Invalidate] ──> to LQ
+```
+
+**Forwarding to Load Queue (combinational):**
+
+```
+LQ ──> [SQ Check: addr, rob_tag, size]
+              |
+SQ ──> scan all older entries
+         |                  |
+  [all_older_addrs_known] [newest matching store]
+                            |
+                      [can_forward?]
+                            |
+SQ ──> [Forward Result: match, can_forward, data] ──> LQ
+```
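
The forwarding path in the diagram above can be sketched as a behavioral reference model. The model rests on several assumptions that the diff does not confirm: entries are passed in allocation order (oldest first), overlap is tested on byte ranges, and age is head-relative over a 32-entry ROB. `SqEntry`, `sq_check`, and the helpers are hypothetical names, not the RTL's.

```python
from dataclasses import dataclass

SIZE_BYTES = {0: 1, 1: 2, 2: 4, 3: 8}  # 00=B, 01=H, 10=W, 11=D
ROB_DEPTH = 32  # assumed from the 5-bit rob_tag


@dataclass
class SqEntry:
    valid: bool
    rob_tag: int
    committed: bool
    addr_valid: bool
    address: int
    data_valid: bool
    data: int
    size: int


def store_is_older(entry, load_tag, rob_head):
    """Committed stores are always older; others compare head-relative age."""
    if entry.committed:
        return True
    return (entry.rob_tag - rob_head) % ROB_DEPTH < (load_tag - rob_head) % ROB_DEPTH


def overlaps(addr_a, size_a, addr_b, size_b):
    """True when the two byte ranges share at least one byte."""
    return addr_a < addr_b + SIZE_BYTES[size_b] and addr_b < addr_a + SIZE_BYTES[size_a]


def sq_check(entries, load_addr, load_tag, load_size, rob_head):
    """Return (all_older_addrs_known, match, can_forward, data)."""
    all_known = True
    hit = None
    for entry in entries:  # oldest first, so the last overlapping hit is newest
        if not entry.valid or not store_is_older(entry, load_tag, rob_head):
            continue
        if not entry.addr_valid:
            all_known = False
        elif overlaps(entry.address, entry.size, load_addr, load_size):
            hit = entry
    if hit is None:
        return all_known, False, False, 0
    # Forward only on exact address, equal size, WORD or DOUBLE, data ready.
    can_forward = (hit.address == load_addr and hit.size == load_size
                   and load_size in (2, 3) and hit.data_valid)
    return all_known, True, can_forward, hit.data if can_forward else 0
```

A partial overlap, such as a byte store inside a word the load reads, reports `match` without `can_forward`, which in the load queue hunk blocks both issue and forwarding.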
+
+## Storage Strategy
+
+All fields in FFs (not LUTRAM/BRAM). 8 entries at ~115 bits each (~920 bits
+total). Rationale:
+
+- **CAM-style tag search**: Address/data update and commit must find matching
+  `rob_tag` across all entries in parallel.
+- **Per-entry invalidation**: Partial flush must clear individual uncommitted
+  entries by age comparison in a single cycle.
+- **Parallel scan**: Forwarding reads all entries to find matching stores.
+- **8 entries**: Too small for BRAM, marginal for LUTRAM.
+
+## Entry Structure
+
+| Field       | Width   | Description                              |
+|-------------|---------|------------------------------------------|
+| valid       | 1 bit   | Entry allocated                          |
+| rob_tag     | 5 bits  | ROB entry for this store                 |
+| is_fp       | 1 bit   | FP store (FSW/FSD)                       |
+| addr_valid  | 1 bit   | Address has been calculated              |
+| address     | 32 bits | Store address                            |
+| data_valid  | 1 bit   | Data is available                        |
+| data        | 64 bits | Store data (FLEN for FSD)                |
+| size        | 2 bits  | 00=B, 01=H, 10=W, 11=D (for FSD)         |
+| is_mmio     | 1 bit   | MMIO address (bypass cache on commit)    |
+| fp64_phase  | 1 bit   | FSD phase: 0=low word, 1=high word       |
+| committed   | 1 bit   | ROB has committed this store             |
+| sent        | 1 bit   | Written to memory                        |
+| **Total**   | **~115 bits** |                                    |
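
As a quick check on the storage estimate, the widths listed in the table can be summed directly. The listed fields add up to 111 bits per entry (888 bits for 8 entries), slightly under the README's approximate ~115 and ~920. The difference may be rounding or fields not shown in the table; that is an assumption, since the RTL struct is not part of this excerpt.

```python
# Field widths copied from the Entry Structure table, in bits.
ENTRY_FIELDS = {
    "valid": 1,
    "rob_tag": 5,
    "is_fp": 1,
    "addr_valid": 1,
    "address": 32,
    "data_valid": 1,
    "data": 64,
    "size": 2,
    "is_mmio": 1,
    "fp64_phase": 1,
    "committed": 1,
    "sent": 1,
}
SQ_DEPTH = 8  # entries, per the commit message

entry_bits = sum(ENTRY_FIELDS.values())  # bits per entry from listed fields
total_bits = SQ_DEPTH * entry_bits       # flip-flops for the whole queue
```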
+
+## Ports
+
+| Port | Dir | Type | Description |
+|------|-----|------|-------------|
+| `i_clk` | in | logic | Clock |
+| `i_rst_n` | in | logic | Active-low reset |
+| `i_alloc` | in | `sq_alloc_req_t` | Allocation from dispatch |
+| `o_full` | out | logic | SQ is full |
+| `i_addr_update` | in | `sq_addr_update_t` | Address from MEM_RS issue |
+| `i_data_update` | in | `sq_data_update_t` | Data from MEM_RS issue (src2) |
+| `i_commit_valid` | in | logic | Store committed by ROB |
+| `i_commit_rob_tag` | in | 5 bits | Tag of committed store |
+| `i_sq_check_valid` | in | logic | LQ disambiguation request |
+| `i_sq_check_addr` | in | XLEN | Load address from LQ |
+| `i_sq_check_rob_tag` | in | 5 bits | Load ROB tag from LQ |
+| `i_sq_check_size` | in | `mem_size_e` | Load size from LQ |
+| `o_sq_all_older_addrs_known` | out | logic | All older stores have addr |
+| `o_sq_forward` | out | `sq_forward_result_t` | Forwarding result to LQ |
+| `o_mem_write_en` | out | logic | Memory write request |
+| `o_mem_write_addr` | out | XLEN | Memory write address |
+| `o_mem_write_data` | out | XLEN | Memory write data |
+| `o_mem_write_byte_en` | out | 4 bits | Byte-lane enables |
+| `i_mem_write_done` | in | logic | Memory write acknowledged |
+| `o_cache_invalidate_valid` | out | logic | L0 cache invalidation |
+| `o_cache_invalidate_addr` | out | XLEN | Address to invalidate |
+| `i_rob_head_tag` | in | 5 bits | ROB head for age comparisons |
+| `i_flush_en` | in | logic | Partial flush enable |
+| `i_flush_tag` | in | 5 bits | Partial flush tag boundary |
+| `i_flush_all` | in | logic | Full pipeline flush |
+| `o_empty` | out | logic | SQ is empty |
+| `o_count` | out | 4 bits | Number of valid entries |
+
+## Key Behaviors
+
+1. **Allocation**: On `i_alloc.valid && !o_full`, write entry at tail, advance
+   tail pointer.
+
+2. **Address Update**: CAM search all entries for matching `rob_tag` with
+   `!addr_valid`. Write address and `is_mmio` flag.
+
+3. **Data Update**: CAM search all entries for matching `rob_tag` with
+   `!data_valid`. Write store data (FLEN-wide).
+
+4. **Commit**: When ROB commits a store (`i_commit_valid`), CAM search for
+   matching `rob_tag` and set `committed = 1`.
+
+5. **Memory Write**: Head entry writes to memory when `committed && addr_valid
+   && data_valid && !sent`. Single outstanding write. Byte enables generated
+   from size and address offset.
+
+6. **FSD Two-Phase**: Phase 0 writes low word (addr), phase 1 writes high word
+   (addr+4). Both must complete before entry is freed.
+
+7. **Store-to-Load Forwarding**: Combinational scan of all entries. For each
+   valid entry older than the load: check addr_valid for disambiguation, check
+   address overlap for matching. Forward when exact address match, same size,
+   WORD/DOUBLE, and data_valid.
+
+8. **L0 Cache Invalidation**: On memory write completion, output the written
+   address to the LQ's L0 cache for invalidation.
+
+9. **Flush**: `i_flush_all` resets all state. `i_flush_en` invalidates
+   uncommitted entries younger than `i_flush_tag`. Committed entries are never
+   flushed (they must complete to memory).
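
Behaviors 5, 6, and 9 above can be sketched in a few lines of Python. This is a sketch under stated assumptions: halfword stores are naturally aligned, byte lanes follow the low two address bits on a 32-bit bus, and an entry whose tag equals `i_flush_tag` survives the partial flush. None of this is confirmed by the RTL in this excerpt, and all function names are hypothetical.

```python
ROB_DEPTH = 32  # assumed from the 5-bit rob_tag


def byte_enables(size, addr):
    """4-bit byte-lane mask for a 32-bit write (size: 0=B, 1=H, 2=W)."""
    offset = addr & 0x3
    if size == 0:
        return 0b0001 << offset
    if size == 1:
        return (0b0011 << offset) & 0xF  # assumes halfword alignment
    return 0b1111


def fsd_writes(addr, data64):
    """Two-phase FSD: (address, 32-bit data, byte enables) per phase."""
    low = data64 & 0xFFFFFFFF
    high = (data64 >> 32) & 0xFFFFFFFF
    return [(addr, low, 0b1111), (addr + 4, high, 0b1111)]


def survives_partial_flush(committed, rob_tag, flush_tag, rob_head):
    """Committed entries always survive; others only if not younger."""
    if committed:
        return True
    age = (rob_tag - rob_head) % ROB_DEPTH
    boundary = (flush_tag - rob_head) % ROB_DEPTH
    return age <= boundary
```

For example, `byte_enables(0, 0x1003)` selects only the top lane, and `fsd_writes(0x2000, x)` yields the low word at `0x2000` followed by the high word at `0x2004`.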
+
+## Verification
+
+- **Formal**: `ifdef FORMAL` block with BMC (depth 12) and cover (depth 20).
+  Assertions check pointer/count consistency, memory write prerequisites,
+  forwarding invariants, committed-survives-flush, and reset behavior.
+- **Cocotb**: Unit tests covering reset, allocation, address/data update,
+  commit + memory write (SW/SH/SB), FSD two-phase, FSW, store-to-load
+  forwarding, MMIO, flush, and constrained random.
+
+## Files
+
+- `store_queue.sv` - Module implementation
+- `store_queue.f` - Cocotb compilation file list
+- `README.md` - This file

hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/store_queue.f

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Store Queue file list
+# Commit-ordered store buffer with store-to-load forwarding
+
+# Package dependency
+$(ROOT)/hw/rtl/cpu_and_mem/cpu/riscv_pkg.sv
+
+# Module
+$(ROOT)/hw/rtl/cpu_and_mem/cpu/tomasulo/store_queue/store_queue.sv
