Merged
2 changes: 2 additions & 0 deletions AGENTS.md
@@ -1,3 +1,5 @@
# Difftest Working Guidelines

Before working under `difftest/`, determine whether the task is a potentially complex, multi-file change or requires iterative testing/debugging. If so, review the relevant files in `difftest/docs/` as needed before proceeding.

For complex tasks, follow the plan/progress workflow defined in [`difftest/docs/workflow.md`](docs/workflow.md): create a plan, create a progress log, execute in phases, and use `askQuestions` to confirm ambiguities and end-of-conversation next steps.
4 changes: 3 additions & 1 deletion docs/README.md
@@ -7,11 +7,13 @@
| [layout.md](./layout.md) | Project directory structure, key files, modification guide |
| [hw-flow.md](./hw-flow.md) | Hardware transport pipeline: Preprocess → Squash → Delta → Batch → Sink |
| [sw-check.md](./sw-check.md) | Software checking flow: difftest_step, checkers, reference model, DPI-C |
| [test.md](./test.md) | Build / run / debug commands: EMU, simv, FPGA Sim, and debug workflow |
| [test.md](./test.md) | Build / run / debug commands: EMU, simv, FPGA Sim, reference DB comparison, phased verification |
| [workflow.md](./workflow.md) | Task workflow: plan/progress specs, execution practices, sub-agent delegation, debugging escalation |

## Recommended Reading Order

1. [layout.md](./layout.md) — Understand the overall project structure
2. [hw-flow.md](./hw-flow.md) — Understand the hardware-side data flow
3. [sw-check.md](./sw-check.md) — Understand the software-side checking logic
4. [test.md](./test.md) — Reference when building, running, or debugging
5. [workflow.md](./workflow.md) — Follow when executing complex multi-phase tasks
86 changes: 86 additions & 0 deletions docs/test.md
@@ -180,6 +180,18 @@ bash difftest/scripts/fpga_sim/cosim.sh WORKLOAD=$WORKLOAD DIFF=$REF_SO WAVE=1
| `DIFF=PATH` | Reference SO (required) |
| `WAVE=1` | Enable waveform dump |

#### Precautions

- **Residual process check**: before every FPGA Sim run, check for leftover shared memory and processes:
```bash
lsof /dev/shm/xdma_sim* 2>/dev/null
```
  If there is output, kill the listed PIDs and remove the stale files (`rm -f /dev/shm/xdma_sim*`) before proceeding. Stale processes can cause the next run to hang or silently produce incorrect results.

- **Wait for full completion**: always use `bash cosim.sh ... 2>&1 | tee <logfile>` and wait for the script to exit completely before inspecting results. Do not interrupt, background, or `Ctrl-C` mid-run — doing so may leave orphan processes.

- **Log preservation**: use `tee` to save each run's output to a distinct log file. Name logs after the change being tested (e.g. `build/cosim-phaseN-microbench.log`) so they can be compared across runs.
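The residual-state check and cleanup above can be wrapped in a small helper. This is a sketch only: the function name is hypothetical, and it assumes the `/dev/shm/xdma_sim*` naming described in the check above.

```shell
# Hypothetical helper: kill leftover FPGA Sim processes and remove stale
# shared-memory files before a new run. Assumes /dev/shm/xdma_sim* naming.
cleanup_xdma_sim() {
  # lsof -t prints only the PIDs of processes holding the shm files
  pids=$(lsof -t /dev/shm/xdma_sim* 2>/dev/null | sort -u)
  if [ -n "$pids" ]; then
    kill $pids
    sleep 1
  fi
  rm -f /dev/shm/xdma_sim*
}

cleanup_xdma_sim
```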

#### Cleanup

```bash
@@ -246,6 +258,46 @@ Relevant code: [`src/test/csrc/common/query.cpp`](../src/test/csrc/common/query.
| simv | `make clean && make simv DIFFTEST_QUERY=1 VCS=verilator -j2` | `./build/simv +workload=$WORKLOAD +diff=$REF_SO` |
| FPGA Sim | Add `DIFFTEST_QUERY=1` to Step 2 (fpga-build) in §2.3 | `bash difftest/scripts/fpga_sim/cosim.sh WORKLOAD=$WORKLOAD DIFF=$REF_SO` |

#### Reference DB Comparison

When debugging transport-stage issues (squash/batch/delta), comparing a suspect Query DB against a known-good **reference DB** is the most effective approach.

**Creating a reference DB:**

Run the same workload with difftest on a known-good code revision (or with a simpler `--difftest-config` that bypasses the suspect stage). Save the resulting `build/difftest_query.db` as your reference:

```bash
cp build/difftest_query.db ref-microbench.db # or ref-linux.db
```

**Hex conversion (required before comparison):**

Query DB stores values in raw format. Convert both the suspect DB and the reference DB to hex for readable comparison:

```bash
python3 difftest/scripts/query/convert_hex.py build/difftest_query.db
# Produces: build/difftest_query_hex.db

python3 difftest/scripts/query/convert_hex.py ref-microbench.db
# Produces: ref-microbench-hex.db
```

**Cross-DB comparison with ATTACH:**

Use SQLite's `ATTACH` to join tables across the suspect and reference DBs in a single query:

```bash
sqlite3 build/difftest_query_hex.db \
"ATTACH 'ref-microbench-hex.db' AS ref;
SELECT a.STEP, a.NFUSED AS dut, b.NFUSED AS ref
FROM main.InstrCommit a
JOIN ref.InstrCommit b ON a.STEP=b.STEP AND a.MY_INDEX=b.MY_INDEX
WHERE a.NFUSED != b.NFUSED
ORDER BY a.STEP LIMIT 20;"
```

Replace the table name (`InstrCommit`) and column (`NFUSED`) with the checker and field that diverged. The first divergent STEP typically points to the root cause.
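The `ATTACH` pattern can be tried on toy databases first. The schema below is illustrative only (the real Query DB tables carry more columns), but the JOIN/WHERE shape is the same as above:

```shell
# Build two tiny DBs with one divergent row, then run the same
# ATTACH + JOIN comparison as in the real workflow. Illustrative schema.
rm -f dut.db refdb.db
sqlite3 dut.db "CREATE TABLE InstrCommit(STEP,MY_INDEX,NFUSED);
                INSERT INTO InstrCommit VALUES (1,0,2),(2,0,3),(3,0,5);"
sqlite3 refdb.db "CREATE TABLE InstrCommit(STEP,MY_INDEX,NFUSED);
                  INSERT INTO InstrCommit VALUES (1,0,2),(2,0,3),(3,0,4);"
sqlite3 dut.db \
  "ATTACH 'refdb.db' AS ref;
   SELECT a.STEP, a.NFUSED AS dut, b.NFUSED AS ref
   FROM main.InstrCommit a
   JOIN ref.InstrCommit b ON a.STEP=b.STEP AND a.MY_INDEX=b.MY_INDEX
   WHERE a.NFUSED != b.NFUSED;"
# → 3|5|4
```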

### 3.3 Waveforms

Waveforms capture hardware signal transitions and are used to inspect timing and event ordering at the RTL level. Build with waveform support enabled, then dump at runtime.
@@ -288,3 +340,37 @@ EMU runtime waveform options:
2. **Locate** the checker source in [`src/test/csrc/difftest/checkers/`](../src/test/csrc/difftest/checkers). Read its comparison logic and understand which fields diverged from the printed DUT/REF state.
3. **Query DB** (if transport stages are suspected): rebuild with `DIFFTEST_QUERY=1`, collect `build/difftest_query.db` from at least two runs (e.g. different `--difftest-config` settings or code revisions). Compare the DBs to narrow which transport stage is implicated.
4. **Waveform** (for RTL-level verification): after forming a hypothesis, rebuild with `EMU_TRACE=1` (or `EMU_TRACE=fst` for simv). Dump focused waveforms for the suspect time range to validate timing and signal ordering.

---

## 5. Phased Verification Strategy

When making multi-step changes to the hardware transport pipeline (e.g. modifying Squash, Batch, and Delta together), use a phased approach to isolate regressions early.

### Principles

- **One logical change per phase.** Each phase should modify one module or one aspect of the pipeline. Do not combine unrelated changes in the same phase.
- **Gate on tests before proceeding.** A phase is complete only when all required tests pass. Never start the next phase on a failing baseline.
- **Use a progress log.** Record each phase's changes, test results, and any debugging notes in a dedicated progress file (e.g. `.github/difftest-progress.md`). This provides a clear audit trail and makes it easier to bisect regressions.

### Test Ladder

Run tests in order of increasing cost. Stop at the first failure.

| Step | Test | Pass Criteria | Approx. Time |
|------|------|---------------|---------------|
| 1 | microbench | `HIT GOOD TRAP`, no `ABORT`/`mismatch` | ~1–2 min |
| 2 | linux (short) | No `ABORT`/`mismatch` within a `timeout 300` window | 5 min |
| 3 | linux (long) | No `ABORT`/`mismatch` within a `timeout 600` window | 10 min |

- Step 1 catches most functional regressions quickly.
- Step 2 exercises more complex boot code paths and interrupt handling.
- Step 3 is a final confidence check for timing-sensitive or rare-event issues. Only run after all phases pass Steps 1–2.
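A minimal pass/fail gate over a saved run log, applying the `ABORT`/`mismatch` criteria from the table (the helper name and log contents are illustrative; for microbench, additionally require `HIT GOOD TRAP` in the log):

```shell
# Hypothetical helper: apply the ladder's pass criteria to a run log.
check_log() {
  if grep -Eq 'ABORT|mismatch' "$1"; then
    echo FAIL
  else
    echo PASS
  fi
}

printf 'HIT GOOD TRAP\n' > demo-pass.log
check_log demo-pass.log
# → PASS
```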

### Typical Workflow

1. **Compile** (full three-step build for FPGA Sim, or `make emu` for EMU).
2. **Run microbench.** If it fails, debug and fix before running linux.
3. **Run linux (5 min).** If it fails, the issue is likely related to more complex instruction sequences or interrupt timing.
4. After **all phases pass** Steps 1–2, run the **final 10-min linux test** once as the acceptance gate.
5. Record results in the progress log after each step.
150 changes: 150 additions & 0 deletions docs/workflow.md
@@ -0,0 +1,150 @@
# DiffTest Task Workflow

This document defines the standard workflow for complex difftest tasks — those involving multi-file changes, pipeline modifications, or iterative debugging. Simple one-shot edits do not require this process.

## Overview

```
Task received
      │
      ▼
Create Plan (.github/<task>-plan.md)
      │
      ▼
Create Progress (.github/<task>-progress.md)
      │
      ▼
┌─ Per Phase ───────────────────────────┐
│  Re-read plan + progress              │
│        │                              │
│        ▼                              │
│  Implement changes                    │
│        │                              │
│        ▼                              │
│  Test (microbench → linux)            │
│        │                              │
│        ├─ PASS → update progress, next│
│        └─ FAIL → debug, fix, re-test  │
└───────────────────────────────────────┘
      │
      ▼
Final verification (10-min linux)
      │
      ▼
askQuestions: confirm completion / next steps
```

---

## 1. Plan Document

Every non-trivial task must start with a plan stored at `.github/<task>-plan.md`.

### Required Sections

| Section | Contents |
|---------|----------|
| **Header** | Task title, modification scope (which directories/files may be changed), target configuration, execution requirements |
| **Design Rationale** | High-level description of *what* changes and *why*; core principles and key trade-offs |
| **Prerequisites** | Environment variables, reference files, common build commands, common test commands, debug workflow |
| **Phases** | One section per Phase: files to modify, specific changes with before/after logic, expected behavior, test instructions |
| **Final Verification** | Acceptance test (typically 10-min linux) and exit criteria |

### Guidelines

- **Explicit build and test commands.** Every Phase must include the exact commands to build and test. Do not rely on "same as before" — copy the commands so each Phase is self-contained.
- **Explicit pass/fail criteria.** For each test, state what constitutes PASS (e.g. `HIT GOOD TRAP`, no `ABORT`/`mismatch`) and FAIL.
- **Logic description before code.** Describe the intended behavior change in words before showing code snippets. State the invariants that must hold.
- **Scope boundaries.** Clearly state which files/directories are in-scope and which are off-limits (e.g. "do not modify `src/test/`").

### Versioning

If a plan needs significant revision (not minor fixes), create a new version: `<task>-plan-v2.md`, `<task>-plan-v3.md`, etc. Keep old versions for reference.

---

## 2. Progress Document

Every task with a plan must have a corresponding progress file at `.github/<task>-progress.md` (versioned to match the plan).

### Required Sections Per Phase

| Section | Contents |
|---------|----------|
| **Changes Made** | Bullet list of actual modifications (file, what changed) |
| **Test Results** | Each test with PASS/FAIL, key metrics (instrCnt, cycleCnt, IPC, duration) |
| **Issues & Debugging** | Problems encountered, root-cause analysis, attempted fixes (including failed ones), and final resolution |

### Status Table

Use a summary table at the top for quick overview:

```markdown
| Phase | Change Summary | Microbench | Linux 5min | Linux 10min |
|-------|---------------|-----------|-----------|-------------|
| 1 | Squasher Decoupled | ✅ 783387 | ✅ 0 errors | — |
| 2 | DeltaSplitter Decoupled | ✅ 783387 | ✅ 0 errors | — |
| Final | All changes | — | — | ✅ 600s |
```

### Guidelines

- **Record failed attempts.** When a fix fails, document the attempt, the failure mode, and why it was wrong. This prevents repeating the same mistake and preserves debugging context.
- **Update immediately.** Write progress entries as each phase completes, not at the end. If a conversation is interrupted, the progress file is the recovery point.
- **Include quantitative data.** Log exact instrCnt, cycleCnt, step numbers, error messages — not just "passed" or "failed".
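A single phase entry might look like the following. The file path and timings are illustrative (the instrCnt value matches the sample status table above):

```markdown
### Phase 1: Squasher Decoupled

**Changes Made**
- `difftest/src/main/scala/Squash.scala` (illustrative path): decoupled the squasher output handshake

**Test Results**
- microbench: PASS (HIT GOOD TRAP, instrCnt=783387, ~1 min)
- linux (timeout 300): PASS (0 errors)

**Issues & Debugging**
- none
```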

---

## 3. Execution Practices

### Context Recovery

At the start of each conversation or after a context reset:

1. **Re-read the plan** (`.github/<task>-plan.md`) to restore the task definition and Phase structure.
2. **Re-read the progress** (`.github/<task>-progress.md`) to determine which Phase is current and what has already been tried.
3. **Re-read relevant docs** (`difftest/docs/`) if the task involves unfamiliar modules.

Do not rely on memory or assumptions about prior state. The plan and progress files are the source of truth.

### Confirming Ambiguities

When encountering unclear requirements, design choices, or unexpected test results:

- Use `askQuestions` to confirm details with the user before proceeding.
- Prefer asking early (before implementing a speculative fix) over asking late (after a failed debugging cycle).
- Typical situations: scope clarification, which approach to take when multiple are viable, whether a failing test is a known issue, whether to proceed to next Phase despite a partial result.

### End-of-Conversation Check

At the end of every conversation:

- Use `askQuestions` to confirm whether there are further requirements, open questions, or next steps.
- This ensures nothing is left implicit and gives the user a chance to redirect before the context is lost.

### Sub-Agent Delegation

Delegate the following to sub-agents (e.g. Explore agent) to avoid filling the main conversation context window:

| Task | Why Delegate |
|------|-------------|
| Reading and analyzing log files (mismatch logs, build logs) | Logs are large and detailed; extracting the root cause is a focused subtask |
| Query DB inspection (sqlite3 queries, cross-DB comparisons) | Involves multiple queries and iterative interpretation |
| Waveform analysis hypotheses | Requires reading signal traces and correlating with RTL logic |
| Large-scale code reading for audit or review | Reviewing many files produces verbose context |

The sub-agent should return only the **conclusion and key evidence** (e.g. "mismatch at step 4482, field NFUSED: DUT=65, REF=64, caused by ..."), not raw query output.

---

## 4. Debugging Escalation

When a test fails, follow this escalation order:

1. **Console output** — read the last ~100 lines for checker name, cycle, and DUT/REF state.
2. **Query DB comparison** — convert to hex, compare with reference DB using `ATTACH`. Identify the first divergent STEP.
3. **Waveform** — rebuild with `EMU_TRACE=fst`, dump the suspect time range, inspect signal transitions.

At each level, form a hypothesis before escalating. If the hypothesis can be verified without the next level, do so.

Detailed command references: see [test.md §3 (Debugging Artifacts)](./test.md#3-debugging-artifacts) and [test.md §4 (Debug Workflow)](./test.md#4-debug-workflow).