# Store Queue

Commit-ordered store buffer that holds in-flight store instructions from dispatch
until they write to memory after ROB commit. Entries are allocated in program order
and freed when their memory write completes.

## Architecture

```
  Dispatch ──> [Alloc] ──> Tail
                  |
  MEM_RS Issue ──> [Addr Update] ──> CAM search by rob_tag
               ──> [Data Update] ──> CAM search by rob_tag
                  |
  ROB Commit ──> [Commit Mark] ──> CAM search by rob_tag
                  |
  [Memory Write] ──> Head (committed + addr_valid + data_valid)
       |                   |
  [FSD Phase 0]      [FSD Phase 1]
       |                   |
  [Write Done] ──> Free entry, advance head
       |
  [L0 Cache Invalidate] ──> to LQ
```
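The head/tail bookkeeping in the diagram behaves like a ring buffer. The Python sketch below assumes an explicit occupancy counter alongside the two pointers (the formal checks mention pointer/count consistency, but the RTL's exact encoding is not described here); `SqPointers`, `alloc`, and `free_head` are hypothetical names. With 8 entries the pointers need 3 bits, while the count ranges over 0 to 8 and so needs 4 bits, which matches the width of `o_count`.

```python
DEPTH = 8  # entry count from this README


class SqPointers:
    """Hypothetical model of the SQ ring-buffer bookkeeping (not the RTL)."""

    def __init__(self) -> None:
        self.head = 0   # oldest entry: next one to write to memory
        self.tail = 0   # next free slot for allocation
        self.count = 0  # valid entries, 0..DEPTH

    @property
    def full(self) -> bool:
        return self.count == DEPTH

    @property
    def empty(self) -> bool:
        return self.count == 0

    def alloc(self) -> int:
        """Allocate at the tail in program order and return the slot index."""
        if self.full:
            raise RuntimeError("alloc while full")
        slot = self.tail
        self.tail = (self.tail + 1) % DEPTH
        self.count += 1
        return slot

    def free_head(self) -> int:
        """Free the head entry once its memory write has completed."""
        if self.empty:
            raise RuntimeError("free while empty")
        slot = self.head
        self.head = (self.head + 1) % DEPTH
        self.count -= 1
        return slot


sq = SqPointers()
print([sq.alloc() for _ in range(DEPTH)], sq.full)  # [0, 1, 2, 3, 4, 5, 6, 7] True
print(sq.free_head(), sq.alloc())  # 0 0 (the tail wraps into the freed slot)
print(sq.count, sq.count.bit_length())  # 8 4
```

Keeping a separate count avoids the usual head == tail ambiguity between full and empty, at the cost of one extra 4-bit register.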

**Forwarding to Load Queue (combinational):**

```
  LQ ──> [SQ Check: addr, rob_tag, size]
                  |
  SQ ──> scan all older entries
            |                         |
  [all_older_addrs_known]   [newest matching store]
                                      |
                               [can_forward?]
                                      |
  SQ ──> [Forward Result: match, can_forward, data] ──> LQ
```
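A behavioral sketch of this check in Python. It assumes `rob_tag` age is measured as the distance from the ROB head modulo 32 (`i_rob_head_tag` is the port used for age comparisons) and applies the forwarding rules listed under Key Behaviors: exact address match, equal size, WORD or DOUBLE, and `data_valid`. `StoreEntry`, `ForwardResult`, `age`, `overlaps`, and `sq_check` are hypothetical names; in the RTL this is a single combinational scan, not a sequence of list comprehensions.

```python
from dataclasses import dataclass
from typing import List, NamedTuple, Optional

ROB_TAG_BITS = 5                       # rob_tag width from the entry table
BYTE, HALF, WORD, DOUBLE = 0, 1, 2, 3  # size encoding from the entry table


@dataclass
class StoreEntry:
    valid: bool
    rob_tag: int
    addr_valid: bool = False
    address: int = 0
    data_valid: bool = False
    data: int = 0
    size: int = WORD


class ForwardResult(NamedTuple):
    all_older_addrs_known: bool
    match: bool
    can_forward: bool
    data: int


def age(tag: int, rob_head: int) -> int:
    """Distance from the ROB head modulo 32; smaller means older."""
    return (tag - rob_head) % (1 << ROB_TAG_BITS)


def overlaps(a0: int, s0: int, a1: int, s1: int) -> bool:
    """True if byte ranges [a0, a0 + 2**s0) and [a1, a1 + 2**s1) intersect."""
    return a0 < a1 + (1 << s1) and a1 < a0 + (1 << s0)


def sq_check(entries: List[StoreEntry], load_addr: int, load_tag: int,
             load_size: int, rob_head: int) -> ForwardResult:
    load_age = age(load_tag, rob_head)
    older = [e for e in entries
             if e.valid and age(e.rob_tag, rob_head) < load_age]
    # Disambiguation: every older store must have a known address.
    all_known = all(e.addr_valid for e in older)
    # Matching: the newest older store whose known address overlaps the load.
    hits = [e for e in older
            if e.addr_valid and overlaps(e.address, e.size, load_addr, load_size)]
    newest: Optional[StoreEntry] = max(
        hits, key=lambda e: age(e.rob_tag, rob_head), default=None)
    if newest is None:
        return ForwardResult(all_known, False, False, 0)
    # Forwarding: exact address, same size, WORD/DOUBLE only, data present.
    can_fwd = (newest.address == load_addr and newest.size == load_size
               and load_size in (WORD, DOUBLE) and newest.data_valid)
    data = newest.data & ((1 << (8 << load_size)) - 1) if can_fwd else 0
    return ForwardResult(all_known, True, can_fwd, data)


entries = [
    StoreEntry(True, rob_tag=30, addr_valid=True, address=0x100,
               data_valid=True, data=0x11111111),
    StoreEntry(True, rob_tag=1, addr_valid=True, address=0x100,
               data_valid=True, data=0x22222222),
    StoreEntry(True, rob_tag=5),  # younger than the load: ignored
]
# ROB head at 28, so tags 30 and 1 are older than load tag 3 despite the wrap.
r = sq_check(entries, load_addr=0x100, load_tag=3, load_size=WORD, rob_head=28)
print(r.all_older_addrs_known, r.match, r.can_forward, hex(r.data))  # True True True 0x22222222
```

When a match exists but `can_forward` is false (for example, a byte store partially overlapping a word load), the load has to wait for the store to drain rather than take forwarded data.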

## Storage Strategy

All fields are stored in flip-flops (FFs), not LUTRAM or BRAM: 8 entries of
111 bits each (888 bits total, summing the fields in the entry table below).
Rationale:

- **CAM-style tag search**: Address update, data update, and commit must each
  find the entry with a matching `rob_tag`, comparing against all entries in
  parallel.
- **Per-entry invalidation**: A partial flush must clear individual uncommitted
  entries, selected by age comparison, in a single cycle.
- **Parallel scan**: Forwarding reads every entry at once to find matching stores.
- **8 entries**: Too small for BRAM to be worthwhile, and marginal for LUTRAM.

## Entry Structure

| Field       | Width        | Description                                 |
|-------------|--------------|---------------------------------------------|
| valid       | 1 bit        | Entry allocated                             |
| rob_tag     | 5 bits       | ROB entry of this store                     |
| is_fp       | 1 bit        | FP store (FSW/FSD)                          |
| addr_valid  | 1 bit        | Address has been calculated                 |
| address     | 32 bits      | Store address                               |
| data_valid  | 1 bit        | Data is available                           |
| data        | 64 bits      | Store data (FLEN wide, to hold FSD data)    |
| size        | 2 bits       | 00=B, 01=H, 10=W, 11=D (D used only by FSD) |
| is_mmio     | 1 bit        | MMIO address (bypass cache on commit)       |
| fp64_phase  | 1 bit        | FSD phase: 0=low word, 1=high word          |
| committed   | 1 bit        | ROB has committed this store                |
| sent        | 1 bit        | Written to memory                           |
| **Total**   | **111 bits** |                                             |
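Summing the widths in the table gives a quick check of the storage budget. The dictionary below simply copies the table; `SQ_ENTRY_FIELDS` is a hypothetical name, and its key order does not imply the field order of the actual SystemVerilog struct.

```python
# Widths copied from the entry table (bits); key order does not imply RTL struct order.
SQ_ENTRY_FIELDS = {
    "valid": 1, "rob_tag": 5, "is_fp": 1, "addr_valid": 1,
    "address": 32, "data_valid": 1, "data": 64, "size": 2,
    "is_mmio": 1, "fp64_phase": 1, "committed": 1, "sent": 1,
}
SQ_DEPTH = 8

entry_bits = sum(SQ_ENTRY_FIELDS.values())
total_bits = entry_bits * SQ_DEPTH
print(entry_bits, total_bits)  # 111 888
```

The 32-bit address and 64-bit data fields account for 96 of the 111 bits per entry.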

## Ports

| Port | Dir | Type | Description |
|------|-----|------|-------------|
| `i_clk` | in | logic | Clock |
| `i_rst_n` | in | logic | Active-low reset |
| `i_alloc` | in | `sq_alloc_req_t` | Allocation from dispatch |
| `o_full` | out | logic | SQ is full |
| `i_addr_update` | in | `sq_addr_update_t` | Address from MEM_RS issue |
| `i_data_update` | in | `sq_data_update_t` | Data from MEM_RS issue (src2) |
| `i_commit_valid` | in | logic | Store committed by ROB |
| `i_commit_rob_tag` | in | 5 bits | Tag of committed store |
| `i_sq_check_valid` | in | logic | LQ disambiguation request |
| `i_sq_check_addr` | in | XLEN | Load address from LQ |
| `i_sq_check_rob_tag` | in | 5 bits | Load ROB tag from LQ |
| `i_sq_check_size` | in | `mem_size_e` | Load size from LQ |
| `o_sq_all_older_addrs_known` | out | logic | All older stores have known addresses |
| `o_sq_forward` | out | `sq_forward_result_t` | Forwarding result to LQ |
| `o_mem_write_en` | out | logic | Memory write request |
| `o_mem_write_addr` | out | XLEN | Memory write address |
| `o_mem_write_data` | out | XLEN | Memory write data |
| `o_mem_write_byte_en` | out | 4 bits | Byte-lane enables |
| `i_mem_write_done` | in | logic | Memory write acknowledged |
| `o_cache_invalidate_valid` | out | logic | L0 cache invalidation |
| `o_cache_invalidate_addr` | out | XLEN | Address to invalidate |
| `i_rob_head_tag` | in | 5 bits | ROB head for age comparisons |
| `i_flush_en` | in | logic | Partial flush enable |
| `i_flush_tag` | in | 5 bits | Partial flush tag boundary |
| `i_flush_all` | in | logic | Full pipeline flush |
| `o_empty` | out | logic | SQ is empty |
| `o_count` | out | 4 bits | Number of valid entries |
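Because 5-bit ROB tags wrap, "older" and "younger" are only meaningful relative to the ROB head, which is why `i_rob_head_tag` is an input. The Python sketch below, assuming age is the distance from the head modulo 32, computes the per-entry match vector used by commit (and, with an extra qualifier, by the address update) and the per-entry partial-flush mask. It flushes only entries strictly younger than `i_flush_tag`, following the wording "younger than"; whether the RTL also flushes the boundary tag itself is not stated here. `Entry`, `cam_match`, and `flush_mask` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

ROB_TAG_BITS = 5  # width of i_commit_rob_tag / i_flush_tag / i_rob_head_tag


@dataclass
class Entry:
    valid: bool
    rob_tag: int
    addr_valid: bool = False
    committed: bool = False


def age(tag: int, rob_head: int) -> int:
    """Distance from the ROB head modulo 32; larger means younger."""
    return (tag - rob_head) % (1 << ROB_TAG_BITS)


def cam_match(entries: List[Entry], tag: int,
              require_no_addr: bool = False) -> List[bool]:
    """One match bit per entry, all compared at once (a CAM in hardware).

    require_no_addr=True adds the !addr_valid qualifier used by the address update.
    """
    return [e.valid and e.rob_tag == tag and not (require_no_addr and e.addr_valid)
            for e in entries]


def flush_mask(entries: List[Entry], flush_tag: int, rob_head: int) -> List[bool]:
    """Entries a partial flush invalidates: valid, uncommitted, younger than flush_tag."""
    boundary = age(flush_tag, rob_head)
    return [e.valid and not e.committed and age(e.rob_tag, rob_head) > boundary
            for e in entries]


entries = [Entry(True, 10, committed=True), Entry(True, 12, addr_valid=True),
           Entry(True, 14), Entry(False, 16)]
print(cam_match(entries, 14))                         # [False, False, True, False]
print(cam_match(entries, 12, require_no_addr=True))   # [False, False, False, False]
print(flush_mask(entries, flush_tag=12, rob_head=8))  # [False, False, True, False]
```

Each list element corresponds to one comparator per entry in hardware, so both vectors are available in the same cycle regardless of queue occupancy.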

## Key Behaviors

1. **Allocation**: On `i_alloc.valid && !o_full`, write a new entry at the tail
   and advance the tail pointer.

2. **Address Update**: CAM-search all valid entries for one whose `rob_tag`
   matches and whose `addr_valid` is clear. Write the address and `is_mmio`
   flag, and set `addr_valid`.

3. **Data Update**: CAM-search all valid entries for one whose `rob_tag`
   matches and whose `data_valid` is clear. Write the store data (FLEN-wide)
   and set `data_valid`.

4. **Commit**: When the ROB commits a store (`i_commit_valid`), CAM-search for
   the entry matching `i_commit_rob_tag` and set `committed = 1`.

5. **Memory Write**: The head entry issues a memory write when
   `committed && addr_valid && data_valid && !sent`. Only one write is
   outstanding at a time. Byte enables are generated from `size` and the
   address offset.

6. **FSD Two-Phase**: An FSD is written as two 32-bit writes: phase 0 writes the
   low word to `addr`, and phase 1 writes the high word to `addr + 4`. Both must
   complete before the entry is freed.

7. **Store-to-Load Forwarding**: A combinational scan of all entries. Only valid
   entries older than the load (by `rob_tag` age relative to `i_rob_head_tag`)
   are considered:
   - *Disambiguation*: `o_sq_all_older_addrs_known` is asserted only if every
     older store has `addr_valid` set.
   - *Matching*: among older stores with known addresses, the newest one whose
     address overlaps the load is selected.
   - *Forwarding*: data is forwarded only if that store's address exactly
     matches the load address, its size equals the load size, the size is WORD
     or DOUBLE, and its `data_valid` is set.

8. **L0 Cache Invalidation**: When a memory write completes, output the written
   address to the LQ's L0 cache for invalidation.

9. **Flush**: `i_flush_all` resets all state. `i_flush_en` invalidates the
   uncommitted entries younger than `i_flush_tag`. Committed entries are never
   flushed because they must still complete to memory.
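A Python sketch of how one committed head entry could map onto the 32-bit write port (`o_mem_write_addr`, `o_mem_write_data`, `o_mem_write_byte_en`). It assumes naturally aligned B/H/W stores and that SB/SH data is shifted into its byte lane; the README only says byte enables come from size and address offset, so the lane placement of the data is an assumption. `MemWrite`, `byte_enable`, and `store_writes` are hypothetical names.

```python
from typing import List, NamedTuple

BYTE, HALF, WORD, DOUBLE = 0, 1, 2, 3  # size encoding from the entry table


class MemWrite(NamedTuple):
    addr: int
    data: int     # 32-bit write data (XLEN = 32)
    byte_en: int  # 4-bit byte-lane mask


def byte_enable(size: int, addr: int) -> int:
    """Byte-lane mask for a naturally aligned B/H/W store within a 32-bit word."""
    offset = addr & 0x3
    nbytes = 1 << size
    assert size <= WORD and offset % nbytes == 0, "sketch assumes aligned <= 32-bit"
    return ((1 << nbytes) - 1) << offset


def store_writes(size: int, addr: int, data: int) -> List[MemWrite]:
    """Memory write(s) for one committed head entry; an FSD needs two."""
    if size == DOUBLE:
        return [MemWrite(addr, data & 0xFFFF_FFFF, 0b1111),              # phase 0: low word
                MemWrite(addr + 4, (data >> 32) & 0xFFFF_FFFF, 0b1111)]  # phase 1: high word
    nbytes = 1 << size
    lane_shift = 8 * (addr & 0x3)  # assumption: data is shifted into its byte lane
    lane_data = ((data & ((1 << (8 * nbytes)) - 1)) << lane_shift) & 0xFFFF_FFFF
    return [MemWrite(addr, lane_data, byte_enable(size, addr))]


print(bin(byte_enable(BYTE, 0x1003)), bin(byte_enable(HALF, 0x1002)))  # 0b1000 0b1100
for w in store_writes(DOUBLE, 0x2000, 0x1122334455667788):
    print(hex(w.addr), hex(w.data), bin(w.byte_en))
# 0x2000 0x55667788 0b1111
# 0x2004 0x11223344 0b1111
```

With a single outstanding write, the second FSD write is issued only after `i_mem_write_done` acknowledges the first, and the entry is freed after the second acknowledgment.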

## Verification

- **Formal**: An `` `ifdef FORMAL `` block with BMC (depth 12) and cover (depth 20).
  Assertions check pointer/count consistency, memory write prerequisites,
  forwarding invariants, committed-survives-flush, and reset behavior.
- **Cocotb**: Unit tests covering reset, allocation, address/data update,
  commit + memory write (SW/SH/SB), FSD two-phase, FSW, store-to-load
  forwarding, MMIO, flush, and constrained-random stimulus.

## Files

- `store_queue.sv` - Module implementation
- `store_queue.f` - Cocotb compilation file list
- `README.md` - This file