Commit 7e4f304

docs: add task workflow spec and test cautions (#845)
- docs/workflow.md: new document defining plan/progress spec, execution practices (context recovery, askQuestions, sub-agent delegation), and debugging escalation order
- docs/test.md: add FPGA_SIM precautions (residual process check, log preservation), reference DB cross-comparison workflow (hex conversion, ATTACH queries), and phased verification strategy (test ladder, per-phase gating)
- docs/README.md: add workflow.md to document index and reading order
- AGENTS.md: reference workflow.md for complex tasks
1 parent 89d27d2 commit 7e4f304

4 files changed

+241
-1
lines changed


AGENTS.md

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,5 @@
11
# Difftest Working Guidelines
22

33
Before working under `difftest/`, determine whether the task is a potentially complex, multi-file change or requires iterative testing/debugging. If so, review the relevant files in `difftest/docs/` as needed before proceeding.
4+
5+
For complex tasks, follow the plan/progress workflow defined in [`difftest/docs/workflow.md`](docs/workflow.md): create a plan, create a progress log, execute in phases, and use `askQuestions` to confirm ambiguities and end-of-conversation next steps.

docs/README.md

Lines changed: 3 additions & 1 deletion
@@ -7,11 +7,13 @@
77
| [layout.md](./layout.md) | Project directory structure, key files, modification guide |
88
| [hw-flow.md](./hw-flow.md) | Hardware transport pipeline: Preprocess → Squash → Delta → Batch → Sink |
99
| [sw-check.md](./sw-check.md) | Software checking flow: difftest_step, checkers, reference model, DPI-C |
10-
| [test.md](./test.md) | Build / run / debug commands: EMU, simv, FPGA Sim, and debug workflow |
10+
| [test.md](./test.md) | Build / run / debug commands: EMU, simv, FPGA Sim, reference DB comparison, phased verification |
11+
| [workflow.md](./workflow.md) | Task workflow: plan/progress specs, execution practices, sub-agent delegation, debugging escalation |
1112

1213
## Recommended Reading Order
1314

1415
1. [layout.md](./layout.md) — Understand the overall project structure
1516
2. [hw-flow.md](./hw-flow.md) — Understand the hardware-side data flow
1617
3. [sw-check.md](./sw-check.md) — Understand the software-side checking logic
1718
4. [test.md](./test.md) — Reference when building, running, or debugging
19+
5. [workflow.md](./workflow.md) — Follow when executing complex multi-phase tasks

docs/test.md

Lines changed: 86 additions & 0 deletions
@@ -180,6 +180,18 @@ bash difftest/scripts/fpga_sim/cosim.sh WORKLOAD=$WORKLOAD DIFF=$REF_SO WAVE=1
180180
| `DIFF=PATH` | Reference SO (required) |
181181
| `WAVE=1` | Enable waveform dump |
182182

183+
#### Precautions
184+
185+
- **Residual process check**: before every FPGA Sim run, check for leftover shared memory and processes:
186+
```bash
187+
lsof /dev/shm/xdma_sim* 2>/dev/null
188+
```
189+
If there is output, kill the listed PIDs and remove stale files (`rm -f /dev/shm/xdma_sim*`) before proceeding. Stale processes can make the next run hang or silently produce incorrect results.
190+
191+
- **Wait for full completion**: always run `bash cosim.sh ... 2>&1 | tee <logfile>` and wait for the script to exit before inspecting results. Do not interrupt (`Ctrl-C`) or background the run; doing so may leave orphan processes.
192+
193+
- **Log preservation**: use `tee` to save each run's output to a distinct log file. Name logs after the change being tested (e.g. `build/cosim-phaseN-microbench.log`) so they can be compared across runs.
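
The residual-state check can be scripted as a pre-flight step. The following is a minimal sketch, not part of the repository: the function names `stale_shm_files` and `preflight_fpga_sim` are hypothetical, and only the `/dev/shm/xdma_sim*` pattern is taken from the precautions. It checks for leftover files only; `lsof` is still needed to find the PIDs that hold them.

```bash
# Hypothetical pre-flight helper; only the xdma_sim* pattern comes from the docs.
# List leftover FPGA Sim shared-memory files in a directory (default: /dev/shm).
stale_shm_files() {
  local dir="${1:-/dev/shm}" f
  for f in "$dir"/xdma_sim*; do
    # An unmatched glob stays literal, so test for existence explicitly.
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}

# Return non-zero (and list the files on stderr) when stale state is present.
preflight_fpga_sim() {
  local dir="${1:-/dev/shm}" files
  files="$(stale_shm_files "$dir")"
  if [ -n "$files" ]; then
    echo "stale FPGA Sim files, clean up before running:" >&2
    echo "$files" >&2
    return 1
  fi
  return 0
}
```

One possible use is to gate the run on the check, for example `preflight_fpga_sim && bash difftest/scripts/fpga_sim/cosim.sh WORKLOAD=$WORKLOAD DIFF=$REF_SO 2>&1 | tee build/cosim-phaseN-microbench.log`.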
194+
183195
#### Cleanup
184196

185197
```bash
@@ -246,6 +258,46 @@ Relevant code: [`src/test/csrc/common/query.cpp`](../src/test/csrc/common/query.
246258
| simv | `make clean && make simv DIFFTEST_QUERY=1 VCS=verilator -j2` | `./build/simv +workload=$WORKLOAD +diff=$REF_SO` |
247259
| FPGA Sim | Add `DIFFTEST_QUERY=1` to Step 2 (fpga-build) in §2.3 | `bash difftest/scripts/fpga_sim/cosim.sh WORKLOAD=$WORKLOAD DIFF=$REF_SO` |
248260

261+
#### Reference DB Comparison
262+
263+
When debugging transport-stage issues (squash/batch/delta), comparing a suspect Query DB against a known-good **reference DB** is usually the most effective approach.
264+
265+
**Creating a reference DB:**
266+
267+
Run the same workload with difftest on a known-good code revision (or with a simpler `--difftest-config` that bypasses the suspect stage). Save the resulting `build/difftest_query.db` as your reference:
268+
269+
```bash
270+
cp build/difftest_query.db ref-microbench.db # or ref-linux.db
271+
```
272+
273+
**Hex conversion (required before comparison):**
274+
275+
Query DB stores values in raw format. Convert both the suspect DB and the reference DB to hex for readable comparison:
276+
277+
```bash
278+
python3 difftest/scripts/query/convert_hex.py build/difftest_query.db
279+
# Produces: build/difftest_query_hex.db
280+
281+
python3 difftest/scripts/query/convert_hex.py ref-microbench.db
282+
# Produces: ref-microbench-hex.db
283+
```
284+
285+
**Cross-DB comparison with ATTACH:**
286+
287+
Use SQLite's `ATTACH` to join tables across the suspect and reference DBs in a single query:
288+
289+
```bash
290+
sqlite3 build/difftest_query_hex.db \
291+
"ATTACH 'ref-microbench-hex.db' AS ref;
292+
SELECT a.STEP, a.NFUSED AS dut, b.NFUSED AS ref
293+
FROM main.InstrCommit a
294+
JOIN ref.InstrCommit b ON a.STEP=b.STEP AND a.MY_INDEX=b.MY_INDEX
295+
WHERE a.NFUSED != b.NFUSED
296+
ORDER BY a.STEP LIMIT 20;"
297+
```
298+
299+
Replace the table name (`InstrCommit`) and column (`NFUSED`) with the checker and field that diverged. The first divergent STEP typically points to the root cause.
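
Swapping the table and column by hand is error-prone when several checkers are under suspicion. A small generator for the query is sketched below; `build_divergence_query` is a hypothetical name, and the sketch assumes every table being compared has the `STEP` and `MY_INDEX` columns used in the `InstrCommit` example.

```bash
# Hypothetical helper: print the cross-DB divergence query for one table/column.
# Usage: build_divergence_query REF_DB TABLE COLUMN [LIMIT]
# Assumes TABLE has STEP and MY_INDEX columns, as in the InstrCommit example.
build_divergence_query() {
  local ref_db="$1" table="$2" column="$3" limit="${4:-20}"
  cat <<SQL
ATTACH '${ref_db}' AS ref;
SELECT a.STEP, a.${column} AS dut, b.${column} AS ref
FROM main.${table} a
JOIN ref.${table} b ON a.STEP=b.STEP AND a.MY_INDEX=b.MY_INDEX
WHERE a.${column} != b.${column}
ORDER BY a.STEP LIMIT ${limit};
SQL
}
```

The generated text can be passed straight to the CLI, for example `sqlite3 build/difftest_query_hex.db "$(build_divergence_query ref-microbench-hex.db InstrCommit NFUSED)"`. Printing the SQL rather than running it keeps the helper easy to inspect before execution.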
300+
249301
### 3.3 Waveforms
250302

251303
Waveforms capture hardware signal transitions and are used to inspect timing and event ordering at the RTL level. Build with waveform support enabled, then dump at runtime.
@@ -288,3 +340,37 @@ EMU runtime waveform options:
288340
2. **Locate** the checker source in [`src/test/csrc/difftest/checkers/`](../src/test/csrc/difftest/checkers). Read its comparison logic and understand which fields diverged from the printed DUT/REF state.
289341
3. **Query DB** (if transport stages are suspected): rebuild with `DIFFTEST_QUERY=1`, collect `build/difftest_query.db` from at least two runs (e.g. different `--difftest-config` settings or code revisions). Compare the DBs to narrow which transport stage is implicated.
290342
4. **Waveform** (for RTL-level verification): after forming a hypothesis, rebuild with `EMU_TRACE=1` (or `EMU_TRACE=fst` for simv). Dump focused waveforms for the suspect time range to validate timing and signal ordering.
343+
344+
---
345+
346+
## 5. Phased Verification Strategy
347+
348+
When making multi-step changes to the hardware transport pipeline (e.g. modifying Squash, Batch, and Delta together), use a phased approach to isolate regressions early.
349+
350+
### Principles
351+
352+
- **One logical change per phase.** Each phase should modify one module or one aspect of the pipeline. Do not combine unrelated changes in the same phase.
353+
- **Gate on tests before proceeding.** A phase is complete only when all required tests pass. Never start the next phase on a failing baseline.
354+
- **Use a progress log.** Record each phase's changes, test results, and any debugging notes in a dedicated progress file (e.g. `.github/difftest-progress.md`). This provides a clear audit trail and makes it easier to bisect regressions.
355+
356+
### Test Ladder
357+
358+
Run tests in order of increasing cost. Stop at the first failure.
359+
360+
| Step | Test | Pass Criteria | Approx. Time |
361+
|------|------|---------------|---------------|
362+
| 1 | microbench | `HIT GOOD TRAP`, no `ABORT`/`mismatch` | ~1–2 min |
363+
| 2 | linux (short) | No `ABORT`/`mismatch` within a `timeout 300` window | 5 min |
364+
| 3 | linux (long) | No `ABORT`/`mismatch` within a `timeout 600` window | 10 min |
365+
366+
- Step 1 catches most functional regressions quickly.
367+
- Step 2 exercises more complex boot code paths and interrupt handling.
368+
- Step 3 is a final confidence check for timing-sensitive or rare-event issues. Run it only after all phases pass Steps 1–2.
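
The pass criteria in the table can be checked mechanically against a saved log. The sketch below is illustrative only: `check_log` is a hypothetical helper, and it assumes the strings `HIT GOOD TRAP`, `ABORT`, and `mismatch` appear verbatim in the run output.

```bash
# Hypothetical pass/fail check for a run log, following the Pass Criteria column.
# Usage: check_log LOGFILE [need_trap]   (need_trap=1 for microbench, 0 for linux)
check_log() {
  local log="$1" need_trap="${2:-1}"
  # Any ABORT or mismatch line fails the run, regardless of test type.
  if grep -qE 'ABORT|mismatch' "$log"; then
    echo "FAIL: ABORT/mismatch in $log"
    return 1
  fi
  # Microbench must additionally reach the good trap.
  if [ "$need_trap" = "1" ] && ! grep -q 'HIT GOOD TRAP' "$log"; then
    echo "FAIL: no HIT GOOD TRAP in $log"
    return 1
  fi
  echo "PASS: $log"
}
```

For the linux steps, pass `0` as the second argument, since those runs are cut off by `timeout` and are not expected to reach a trap.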
369+
370+
### Typical Workflow
371+
372+
1. **Compile** (full three-step build for FPGA Sim, or `make emu` for EMU).
373+
2. **Run microbench.** If it fails, debug and fix before running linux.
374+
3. **Run linux (5 min).** If it fails, the issue is likely related to more complex instruction sequences or interrupt timing.
375+
4. After **all phases pass** Steps 1–2, run the **final 10-min linux test** once as the acceptance gate.
376+
5. Record results in the progress log after each step.
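
The "stop at the first failure" rule can be enforced by a small driver. This is a sketch under assumptions: `run_ladder` is a hypothetical name, and each argument is a placeholder command string that is expected to exit 0 on pass.

```bash
# Hypothetical ladder driver: run commands in order, stop at the first failure.
# Usage: run_ladder "cmd for step 1" "cmd for step 2" ...
# Returns 0 if every step passes, otherwise the 1-based index of the failing step.
run_ladder() {
  local step=1 cmd
  for cmd in "$@"; do
    echo "step $step: $cmd"
    if ! bash -c "$cmd"; then
      echo "ladder stopped at step $step"
      return "$step"
    fi
    step=$((step + 1))
  done
  echo "ladder passed"
}
```

For the linux steps the `timeout` window expiring is the normal outcome, and GNU `timeout` exits with status 124 in that case, so the command given to the driver should derive its exit status from the log check rather than from `timeout` itself.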

docs/workflow.md

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
1+
# DiffTest Task Workflow
2+
3+
This document defines the standard workflow for complex difftest tasks — those involving multi-file changes, pipeline modifications, or iterative debugging. Simple one-shot edits do not require this process.
4+
5+
## Overview
6+
7+
```
8+
Task received
9+
       │
10+
       ▼
11+
Create Plan (.github/<task>-plan.md)
12+
       │
13+
       ▼
14+
Create Progress (.github/<task>-progress.md)
15+
       │
16+
       ▼
17+
┌─ Per Phase ──────────────────────────┐
18+
│  Re-read plan + progress             │
19+
│      │                               │
20+
│      ▼                               │
21+
│  Implement changes                   │
22+
│      │                               │
23+
│      ▼                               │
24+
│  Test (microbench → linux)           │
25+
│      │                               │
26+
│      ├─ PASS → update progress, next │
27+
│      └─ FAIL → debug, fix, re-test   │
28+
└──────────────────────────────────────┘
29+
       │
30+
       ▼
31+
Final verification (10-min linux)
32+
       │
33+
       ▼
34+
askQuestions: confirm completion / next steps
35+
```
36+
37+
---
38+
39+
## 1. Plan Document
40+
41+
Every non-trivial task must start with a plan stored at `.github/<task>-plan.md`.
42+
43+
### Required Sections
44+
45+
| Section | Contents |
46+
|---------|----------|
47+
| **Header** | Task title, modification scope (which directories/files may be changed), target configuration, execution requirements |
48+
| **Design Rationale** | High-level description of *what* changes and *why*; core principles and key trade-offs |
49+
| **Prerequisites** | Environment variables, reference files, common build commands, common test commands, debug workflow |
50+
| **Phases** | One section per Phase: files to modify, specific changes with before/after logic, expected behavior, test instructions |
51+
| **Final Verification** | Acceptance test (typically 10-min linux) and exit criteria |
52+
53+
### Guidelines
54+
55+
- **Explicit build and test commands.** Every Phase must include the exact commands to build and test. Do not rely on "same as before" — copy the commands so each Phase is self-contained.
56+
- **Explicit pass/fail criteria.** For each test, state what constitutes PASS (e.g. `HIT GOOD TRAP`, no `ABORT`/`mismatch`) and FAIL.
57+
- **Logic description before code.** Describe the intended behavior change in words before showing code snippets. State the invariants that must hold.
58+
- **Scope boundaries.** Clearly state which files/directories are in-scope and which are off-limits (e.g. "do not modify `src/test/`").
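
A scaffold makes it harder to forget a required section. The sketch below is one possible shape, not a project convention: `new_plan` is a hypothetical helper, and the placeholder comments simply restate the Required Sections table.

```bash
# Hypothetical scaffold for .github/<task>-plan.md; section names follow the table.
# Usage: new_plan TASK [REPO_ROOT]   (prints the path of the created file)
new_plan() {
  local task="$1" root="${2:-.}"
  local file="$root/.github/${task}-plan.md"
  mkdir -p "$root/.github"
  # Never clobber an existing plan; significant revisions get a new version.
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  cat > "$file" <<EOF
# ${task}: Plan

## Header
<!-- title, modification scope, target configuration, execution requirements -->

## Design Rationale

## Prerequisites

## Phases

### Phase 1
<!-- files to modify, changes, expected behavior, exact build/test commands, pass/fail criteria -->

## Final Verification
EOF
  echo "$file"
}
```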
59+
60+
### Versioning
61+
62+
If a plan needs significant revision (not minor fixes), create a new version: `<task>-plan-v2.md`, `<task>-plan-v3.md`, etc. Keep old versions for reference.
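
The naming scheme can be computed rather than typed. A minimal sketch, assuming plans live in `.github/` and follow the `-v2`, `-v3` suffix pattern described here; `next_plan_path` is a hypothetical name.

```bash
# Hypothetical helper: print the path the next plan version should use.
# Usage: next_plan_path TASK [DIR]
next_plan_path() {
  local task="$1" dir="${2:-.github}" n=2
  local base="$dir/${task}-plan"
  # The first plan has no version suffix.
  if [ ! -e "$base.md" ]; then
    echo "$base.md"
    return 0
  fi
  # Later revisions start at v2 and take the first unused number.
  while [ -e "$base-v$n.md" ]; do
    n=$((n + 1))
  done
  echo "$base-v$n.md"
}
```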
63+
64+
---
65+
66+
## 2. Progress Document
67+
68+
Every task with a plan must have a corresponding progress file at `.github/<task>-progress.md` (versioned to match the plan).
69+
70+
### Required Sections Per Phase
71+
72+
| Section | Contents |
73+
|---------|----------|
74+
| **Changes Made** | Bullet list of actual modifications (file, what changed) |
75+
| **Test Results** | Each test with PASS/FAIL, key metrics (instrCnt, cycleCnt, IPC, duration) |
76+
| **Issues & Debugging** | Problems encountered, root-cause analysis, attempted fixes (including failed ones), and final resolution |
77+
78+
### Status Table
79+
80+
Use a summary table at the top for quick overview:
81+
82+
```markdown
83+
| Phase | Change Summary | Microbench | Linux 5min | Linux 10min |
84+
|-------|---------------|-----------|-----------|-------------|
85+
| 1 | Squasher Decoupled | ✅ 783387 | ✅ 0 errors | |
86+
| 2 | DeltaSplitter Decoupled | ✅ 783387 | ✅ 0 errors | |
87+
| Final | All changes | | | ✅ 600s |
88+
```
89+
90+
### Guidelines
91+
92+
- **Record failed attempts.** When a fix fails, document the attempt, the failure mode, and why it was wrong. This prevents repeating the same mistake and preserves debugging context.
93+
- **Update immediately.** Write progress entries as each phase completes, not at the end. If a conversation is interrupted, the progress file is the recovery point.
94+
- **Include quantitative data.** Log exact instrCnt, cycleCnt, step numbers, error messages — not just "passed" or "failed".
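
Updating the progress file immediately is easier when it takes a single command. The helper below is a hypothetical sketch: the name `log_result` and the one-line entry format are assumptions, not a format the workflow prescribes.

```bash
# Hypothetical helper: append one test result to a progress file as soon as it is known.
# Usage: log_result FILE PHASE TEST PASS|FAIL "details"
log_result() {
  local file="$1" phase="$2" test_name="$3" result="$4" details="$5"
  case "$result" in
    PASS|FAIL) ;;
    *) echo "result must be PASS or FAIL" >&2; return 2 ;;
  esac
  mkdir -p "$(dirname "$file")"
  # Append only, so earlier entries (including failed attempts) are preserved.
  printf -- '- Phase %s, %s: %s (%s) [%s]\n' \
    "$phase" "$test_name" "$result" "$details" "$(date '+%Y-%m-%d %H:%M')" >> "$file"
}
```

The `details` argument is where the quantitative data belongs, for example `log_result .github/<task>-progress.md 1 microbench PASS "instrCnt=<n>, cycleCnt=<n>"`.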
95+
96+
---
97+
98+
## 3. Execution Practices
99+
100+
### Context Recovery
101+
102+
At the start of each conversation or after a context reset:
103+
104+
1. **Re-read the plan** (`.github/<task>-plan.md`) to restore the task definition and Phase structure.
105+
2. **Re-read the progress** (`.github/<task>-progress.md`) to determine which Phase is current and what has already been tried.
106+
3. **Re-read relevant docs** (`difftest/docs/`) if the task involves unfamiliar modules.
107+
108+
Do not rely on memory or assumptions about prior state. The plan and progress files are the source of truth.
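
The first two recovery steps can be bundled into one command. This is a sketch only: `recover_context` is a hypothetical name, and it looks for the unversioned file names, so a versioned plan such as `<task>-plan-v2.md` would need the task argument adjusted.

```bash
# Hypothetical helper: print plan and progress for a task, failing if either is missing.
# Usage: recover_context TASK [DIR]
recover_context() {
  local task="$1" dir="${2:-.github}" f
  # Check both files first so a missing one is reported before any output.
  for f in "$dir/${task}-plan.md" "$dir/${task}-progress.md"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  for f in "$dir/${task}-plan.md" "$dir/${task}-progress.md"; do
    echo "===== $f ====="
    cat "$f"
  done
}
```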
109+
110+
### Confirming Ambiguities
111+
112+
When encountering unclear requirements, design choices, or unexpected test results:
113+
114+
- Use `askQuestions` to confirm details with the user before proceeding.
115+
- Prefer asking early (before implementing a speculative fix) over asking late (after a failed debugging cycle).
116+
- Typical situations: scope clarification, which approach to take when multiple are viable, whether a failing test is a known issue, whether to proceed to next Phase despite a partial result.
117+
118+
### End-of-Conversation Check
119+
120+
At the end of every conversation:
121+
122+
- Use `askQuestions` to confirm whether there are further requirements, open questions, or next steps.
123+
- This ensures nothing is left implicit and gives the user a chance to redirect before the context is lost.
124+
125+
### Sub-Agent Delegation
126+
127+
Delegate the following to sub-agents (e.g. Explore agent) to avoid filling the main conversation context window:
128+
129+
| Task | Why Delegate |
130+
|------|-------------|
131+
| Reading and analyzing log files (mismatch logs, build logs) | Logs are large and detailed; extracting the root cause is a focused subtask |
132+
| Query DB inspection (sqlite3 queries, cross-DB comparisons) | Involves multiple queries and iterative interpretation |
133+
| Waveform analysis hypotheses | Requires reading signal traces and correlating with RTL logic |
134+
| Large-scale code reading for audit or review | Reviewing many files produces verbose context |
135+
136+
The sub-agent should return only the **conclusion and key evidence** (e.g. "mismatch at step 4482, field NFUSED: DUT=65, REF=64, caused by ..."), not raw query output.
137+
138+
---
139+
140+
## 4. Debugging Escalation
141+
142+
When a test fails, follow this escalation order:
143+
144+
1. **Console output** — read the last ~100 lines for checker name, cycle, and DUT/REF state.
145+
2. **Query DB comparison** — convert to hex, compare with reference DB using `ATTACH`. Identify the first divergent STEP.
146+
3. **Waveform** — rebuild with `EMU_TRACE=fst`, dump the suspect time range, inspect signal transitions.
147+
148+
At each level, form a hypothesis before escalating. If the hypothesis can be verified without the next level, do so.
149+
150+
Detailed command references: see [test.md §3 (Debugging Artifacts)](./test.md#3-debugging-artifacts) and [test.md §4 (Debug Workflow)](./test.md#4-debug-workflow).
