Skip to content

Commit e746d9a

Browse files
committed
fix(tomasulo): MMIO cache bypass, ICARUS/Yosys compat, AMO test coverage
- Gate cache_fill_valid with !lq_is_mmio to prevent MMIO loads from filling L0 cache (DESIGN.md spec drift) - Fix sc_pending always_ff to separate async reset from sync flush for Yosys formal compatibility - Add !sq_committed guard to SC discard to prevent invalidating committed SQ entries (formal assertion fix) - Replace `inside {}` with explicit comparisons for Yosys compatibility - Rewrite amo_compute function to use assignment instead of return for Yosys compatibility - Add ICARUS guards: `ifndef ICARUS` on concurrent assertion, forward- declare sq_effective_addr, stub mem_rs_next_is_sc - Add missing AMO ports to tomasulo_wrapper_tb and o_next_issue_is_sc to reservation_station_tb for ICARUS elaboration - Mark load_queue as Verilator-only (ICARUS VPI crash on wide structs) - Add 9 wrapper-level integration tests: MMIO store and all 8 AMO ops (AMOADD, AMOXOR, AMOAND, AMOOR, AMOMIN, AMOMAX, AMOMINU, AMOMAXU)
1 parent 5959c04 commit e746d9a

File tree

7 files changed

+416
-35
lines changed

7 files changed

+416
-35
lines changed

hw/rtl/cpu_and_mem/cpu/tomasulo/DESIGN.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1366,12 +1366,12 @@ The existing FROST front-end optimizations are preserved:
13661366
| 6. MEM (load results) |
13671367
| 7. ALU (shortest latency, can tolerate delay) |
13681368
| |
1369-
| DESIGN PARAMETER: NUM_CDB_LANES |
1369+
| DESIGN NOTE: SINGLE CDB LANE |
13701370
| |
1371-
| - Initially NUM_CDB_LANES = 1 (single CDB) |
1372-
| - Parameterized for future expansion |
1373-
| - 2 lanes (INT + FP) would reduce contention significantly |
1374-
| - Trade-off: 2x wakeup comparators in RS, 2x ROB write ports |
1371+
| - Single CDB lane (one broadcast per cycle) |
1372+
| - Multi-lane would require duplicating snoop comparators in every |
1373+
| RS entry and ROB slot, plus multi-port RAT — not justified for |
1374+
| this design's 7-FU configuration with fixed-priority arbitration |
13751375
| |
13761376
+------------------------------------------------------------------------+
13771377
```
@@ -1970,7 +1970,7 @@ existing `cpu_and_mem.sv` interface expectations.
19701970
| |
19711971
| PARAMETERS: |
19721972
| NUM_FUS = 7 (ALU, MUL, DIV, MEM, FP_ADD, FP_MUL, FP_DIV) |
1973-
| NUM_CDB_LANES = 1 (parameterized for future multi-bus expansion) |
1973+
| NUM_CDB_LANES = 1 (single lane; see design note above) |
19741974
| ROB_TAG_WIDTH = 5 bits |
19751975
| VALUE_WIDTH = max(XLEN, FLEN) = 64 bits |
19761976
| |
@@ -2220,7 +2220,7 @@ The schedule incorporates all Tomasulo components plus FROST-specific integratio
22202220
| Week | Dates | Deliverable | Key Tasks |
22212221
|------|-------|-------------|-----------|
22222222
| 1 | 1/20 | **F/D Extensions + Architecture** | FPU integration, F/D instruction support, cocotb tests passing; high-level Tomasulo block diagrams; review existing FROST pipeline for integration points |
2223-
| 2 | 1/27 | **Package Definition + D Timing** | Define tomasulo_pkg.sv (all types, structs, parameters); instruction routing table; ROB/RS entry structures; finish D extension timing closure (~300ps slack) |
2223+
| 2 | 1/27 | **Package Definition + D Timing** | Define Tomasulo types/structs/parameters in riscv_pkg.sv; instruction routing table; ROB/RS entry structures; finish D extension timing closure (~300ps slack) |
22242224
| 3 | 2/3 | **ROB Core Structure** | ROB circular buffer, allocation/deallocation logic, head/tail pointers, unified INT/FP entry fields (dest_rf, fp_flags), basic valid/done tracking |
22252225
| 4 | 2/10 | **ROB Commit + Serialization** | ROB commit logic (INT/FP writeback, FP flag accumulation); exception handling; WFI stall-at-head; CSR execute-at-commit; FENCE/FENCE.I handling |
22262226
| 5 | 2/17 | **RAT + Checkpointing** | INT RAT (x0-x31 mapping), FP RAT (f0-f31 mapping); checkpoint storage (4 checkpoints); checkpoint allocation on branch; restore on misprediction; RAS pointer in checkpoint |
@@ -2326,8 +2326,8 @@ The schedule incorporates all Tomasulo components plus FROST-specific integratio
23262326
- **Core Tomasulo**: ROB, INT RAT, FP RAT, Dispatch unit
23272327
- **Reservation Stations**: INT_RS, MUL_RS, MEM_RS, FP_RS, FMUL_RS, FDIV_RS (generic + instances)
23282328
- **Memory Subsystem**: Load Queue (with L0 cache integration, MMIO, FP64), Store Queue (with forwarding, MMIO, FP64)
2329-
- **Result Broadcast**: CDB arbiter (7 FUs, FLEN-wide, lane-parameterized)
2330-
- **Package**: tomasulo_pkg.sv with all types, structs, parameters
2329+
- **Result Broadcast**: CDB arbiter (7 FUs, FLEN-wide, single-lane)
2330+
- **Package**: All Tomasulo types, structs, and parameters live in riscv_pkg.sv
23312331

23322332
### Integration
23332333
- Integrated FROST-Tomasulo CPU with out-of-order execution for integer, memory, and floating-point operations

hw/rtl/cpu_and_mem/cpu/tomasulo/fu_shims/int_muldiv_shim.sv

Lines changed: 13 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -102,17 +102,21 @@ module int_muldiv_shim (
102102
// ---------------------------------------------------------------------------
103103
// MUL in-flight + flush tracking (single in-flight, unchanged)
104104
// ---------------------------------------------------------------------------
105-
logic mul_in_flight;
106-
logic mul_flushed;
105+
logic mul_in_flight;
106+
logic mul_flushed;
107107

108108
// Forward declarations for valid signals from FUs
109-
logic multiplier_valid_input;
110-
logic multiplier_valid_output;
111-
logic divider_valid_input;
112-
logic divider_valid_output;
109+
logic multiplier_valid_input;
110+
logic multiplier_valid_output;
111+
logic divider_valid_input;
112+
logic divider_valid_output;
113113

114-
logic mul_flush_inflight;
115-
logic mul_flush_launching;
114+
// Forward declarations for signals used before their primary declaration
115+
logic [riscv_pkg::ReorderBufferTagWidth-1:0] mul_tag_reg;
116+
riscv_pkg::instr_op_e mul_op_reg;
117+
118+
logic mul_flush_inflight;
119+
logic mul_flush_launching;
116120

117121
assign mul_flush_inflight = mul_in_flight & (i_flush | (i_flush_en & is_younger(
118122
mul_tag_reg, i_flush_tag, i_rob_head_tag
@@ -173,8 +177,7 @@ module int_muldiv_shim (
173177
.o_completing_next_cycle(mul_completing_next_cycle)
174178
);
175179

176-
logic [riscv_pkg::ReorderBufferTagWidth-1:0] mul_tag_reg;
177-
riscv_pkg::instr_op_e mul_op_reg;
180+
// mul_tag_reg and mul_op_reg declared above (forward declaration for ICARUS)
178181

179182
always_ff @(posedge i_clk or negedge i_rst_n) begin
180183
if (!i_rst_n) begin

hw/rtl/cpu_and_mem/cpu/tomasulo/load_queue/README.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -127,9 +127,10 @@ total). Rationale:
127127
- **Formal**: `ifdef FORMAL` block with BMC (depth 12) and cover (depth 20).
128128
Assertions check pointer/count consistency, issue prerequisites, MMIO
129129
ordering, CDB back-pressure, and flush behavior.
130-
- **Cocotb**: 21 unit tests covering reset, allocation, address update,
130+
- **Cocotb**: 38 unit tests covering reset, allocation, address update,
131131
LW/LB/LBU/LH/LHU, SQ forwarding, MMIO, FLD two-phase, FLW NaN-boxing,
132-
flush, ordering, back-pressure, and constrained random.
132+
flush, ordering, back-pressure, constrained random, LR/SC reservation,
133+
and AMO read-modify-write operations.
133134

134135
## Files
135136

hw/rtl/cpu_and_mem/cpu/tomasulo/load_queue/load_queue.sv

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -497,7 +497,7 @@ module load_queue #(
497497
end
498498

499499
assign cache_fill_valid = i_mem_read_valid && mem_outstanding && lq_valid[issued_idx]
500-
&& !lq_is_lr[issued_idx] && !lq_is_amo[issued_idx];
500+
&& !lq_is_mmio[issued_idx] && !lq_is_lr[issued_idx] && !lq_is_amo[issued_idx];
501501
assign cache_fill_addr = cache_fill_actual_addr;
502502
assign cache_fill_data = i_mem_read_data;
503503

hw/rtl/cpu_and_mem/cpu/tomasulo/reorder_buffer/reorder_buffer.sv

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -182,6 +182,12 @@ module reorder_buffer (
182182
end
183183
endfunction
184184

185+
// Forward declarations for ICARUS (used in debug assigns before main decl)
186+
logic [ReorderBufferTagWidth:0] head_ptr;
187+
logic [ReorderBufferTagWidth:0] tail_ptr;
188+
logic full;
189+
logic empty;
190+
185191
// ===========================================================================
186192
// Debug Signals (for verification)
187193
// ===========================================================================
@@ -250,17 +256,13 @@ module reorder_buffer (
250256
logic [ReorderBufferDepth-1:0] rob_is_sc;
251257
logic [ReorderBufferDepth-1:0] rob_is_compressed;
252258

253-
// Head and tail pointers (with extra bit for full/empty detection)
254-
logic [ReorderBufferTagWidth:0] head_ptr;
255-
logic [ReorderBufferTagWidth:0] tail_ptr;
259+
// Head and tail pointers (declared above for ICARUS forward ref)
256260

257261
// Derived pointer values (without wrap bit)
258262
logic [ReorderBufferTagWidth-1:0] head_idx;
259263
logic [ReorderBufferTagWidth-1:0] tail_idx;
260264

261-
// Status signals
262-
logic full;
263-
logic empty;
265+
// Status signals (full and empty declared above for ICARUS forward ref)
264266
logic [ReorderBufferTagWidth:0] count;
265267

266268
// Head entry fields for commit — RAM-backed fields are driven by RAM

hw/rtl/cpu_and_mem/cpu/tomasulo/tomasulo_wrapper/tomasulo_wrapper.sv

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -371,6 +371,15 @@ module tomasulo_wrapper (
371371
// ===========================================================================
372372
riscv_pkg::cdb_broadcast_t cdb_bus;
373373

374+
// Forward declarations: adapter→arbiter signals (used here, defined below)
375+
riscv_pkg::fu_complete_t alu_adapter_to_arbiter;
376+
riscv_pkg::fu_complete_t mul_adapter_to_arbiter;
377+
riscv_pkg::fu_complete_t div_adapter_to_arbiter;
378+
riscv_pkg::fu_complete_t mem_adapter_to_arbiter;
379+
riscv_pkg::fu_complete_t fp_add_adapter_to_arbiter;
380+
riscv_pkg::fu_complete_t fp_mul_adapter_to_arbiter;
381+
riscv_pkg::fu_complete_t fp_div_adapter_to_arbiter;
382+
374383
`ifdef VERILATOR
375384
// Override slots 0-6: internal FU pipelines replace external i_fu_complete.
376385
// Slots 4-6 use a priority mux: internal FP adapter takes precedence,
@@ -486,7 +495,7 @@ module tomasulo_wrapper (
486495
// ===========================================================================
487496
riscv_pkg::rs_issue_t int_rs_issue_w; // INT_RS issue output
488497
riscv_pkg::fu_complete_t alu_shim_out; // ALU shim → adapter
489-
riscv_pkg::fu_complete_t alu_adapter_to_arbiter; // adapter → arbiter
498+
// alu_adapter_to_arbiter declared above (forward declaration)
490499
logic alu_adapter_result_pending;
491500
logic alu_fu_busy; // always 0 for single-cycle ALU
492501
logic int_rs_fu_ready;
@@ -499,8 +508,7 @@ module tomasulo_wrapper (
499508
riscv_pkg::rs_issue_t mul_rs_issue_w; // MUL_RS issue output (internal)
500509
riscv_pkg::fu_complete_t mul_shim_out; // shim MUL → adapter
501510
riscv_pkg::fu_complete_t div_shim_out; // shim DIV → adapter
502-
riscv_pkg::fu_complete_t mul_adapter_to_arbiter; // adapter → arbiter slot 1
503-
riscv_pkg::fu_complete_t div_adapter_to_arbiter; // adapter → arbiter slot 2
511+
// mul/div_adapter_to_arbiter declared above (forward declaration)
504512
logic mul_adapter_result_pending;
505513
logic div_adapter_result_pending;
506514
logic muldiv_busy;
@@ -521,7 +529,7 @@ module tomasulo_wrapper (
521529
// MEM (Load) Pipeline: LQ → adapter → CDB arbiter slot 3
522530
// ===========================================================================
523531
riscv_pkg::fu_complete_t lq_fu_complete; // LQ → adapter
524-
riscv_pkg::fu_complete_t mem_adapter_to_arbiter; // adapter → arbiter slot 3
532+
// mem_adapter_to_arbiter declared above (forward declaration)
525533
logic mem_adapter_result_pending;
526534

527535
// ===========================================================================
@@ -634,7 +642,7 @@ module tomasulo_wrapper (
634642
// ===========================================================================
635643
riscv_pkg::rs_issue_t fp_rs_issue_w; // FP_RS issue output (internal)
636644
riscv_pkg::fu_complete_t fp_add_shim_out; // shim → adapter
637-
riscv_pkg::fu_complete_t fp_add_adapter_to_arbiter; // adapter → arbiter
645+
// fp_add_adapter_to_arbiter declared above (forward declaration)
638646
logic fp_add_adapter_result_pending;
639647
logic fp_add_busy;
640648
logic fp_rs_fu_ready;
@@ -646,7 +654,7 @@ module tomasulo_wrapper (
646654
// ===========================================================================
647655
riscv_pkg::rs_issue_t fmul_rs_issue_w; // FMUL_RS issue output (internal)
648656
riscv_pkg::fu_complete_t fp_mul_shim_out;
649-
riscv_pkg::fu_complete_t fp_mul_adapter_to_arbiter;
657+
// fp_mul_adapter_to_arbiter declared above (forward declaration)
650658
logic fp_mul_adapter_result_pending;
651659
logic fp_mul_busy;
652660
logic fmul_rs_fu_ready;
@@ -658,7 +666,7 @@ module tomasulo_wrapper (
658666
// ===========================================================================
659667
riscv_pkg::rs_issue_t fdiv_rs_issue_w; // FDIV_RS issue output (internal)
660668
riscv_pkg::fu_complete_t fp_div_shim_out;
661-
riscv_pkg::fu_complete_t fp_div_adapter_to_arbiter;
669+
// fp_div_adapter_to_arbiter declared above (forward declaration)
662670
logic fp_div_adapter_result_pending;
663671
logic fp_div_busy;
664672
logic fdiv_rs_fu_ready;

0 commit comments

Comments
 (0)