| 1 | +# DiffTest Task Workflow |
| 2 | + |
| 3 | +This document defines the standard workflow for complex difftest tasks — those involving multi-file changes, pipeline modifications, or iterative debugging. Simple one-shot edits do not require this process. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +``` |
| 8 | +Task received |
| 9 | + │ |
| 10 | + ▼ |
| 11 | +Create Plan (.github/<task>-plan.md) |
| 12 | + │ |
| 13 | + ▼ |
| 14 | +Create Progress (.github/<task>-progress.md) |
| 15 | + │ |
| 16 | + ▼ |
| 17 | +┌─ Per Phase ──────────────────────────┐ |
| 18 | +│ Re-read plan + progress │ |
| 19 | +│ │ │ |
| 20 | +│ ▼ │ |
| 21 | +│ Implement changes │ |
| 22 | +│ │ │ |
| 23 | +│ ▼ │ |
| 24 | +│ Test (microbench → linux) │ |
| 25 | +│ │ │ |
| 26 | +│ ├─ PASS → update progress, next │ |
| 27 | +│ └─ FAIL → debug, fix, re-test │ |
| 28 | +└──────────────────────────────────────┘ |
| 29 | + │ |
| 30 | + ▼ |
| 31 | +Final verification (10-min linux) |
| 32 | + │ |
| 33 | + ▼ |
| 34 | +askQuestions: confirm completion / next steps |
| 35 | +``` |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## 1. Plan Document |
| 40 | + |
| 41 | +Every non-trivial task must start with a plan stored at `.github/<task>-plan.md`. |
| 42 | + |
| 43 | +### Required Sections |
| 44 | + |
| 45 | +| Section | Contents | |
| 46 | +|---------|----------| |
| 47 | +| **Header** | Task title, modification scope (which directories/files may be changed), target configuration, execution requirements | |
| 48 | +| **Design Rationale** | High-level description of *what* changes and *why*; core principles and key trade-offs | |
| 49 | +| **Prerequisites** | Environment variables, reference files, common build commands, common test commands, debug workflow | |
| 50 | +| **Phases** | One section per Phase: files to modify, specific changes with before/after logic, expected behavior, test instructions | |
| 51 | +| **Final Verification** | Acceptance test (typically 10-min linux) and exit criteria | |
| 52 | + |
| 53 | +### Guidelines |
| 54 | + |
| 55 | +- **Explicit build and test commands.** Every Phase must include the exact commands to build and test. Do not rely on "same as before" — copy the commands so each Phase is self-contained. |
| 56 | +- **Explicit pass/fail criteria.** For each test, state what constitutes PASS (e.g. `HIT GOOD TRAP`, no `ABORT`/`mismatch`) and what constitutes FAIL. |
| 57 | +- **Logic description before code.** Describe the intended behavior change in words before showing code snippets. State the invariants that must hold. |
| 58 | +- **Scope boundaries.** Clearly state which files/directories are in-scope and which are off-limits (e.g. "do not modify `src/test/`"). |
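The pass/fail guideline above can be made mechanical. The following is a minimal sketch, assuming the example markers from that guideline (`HIT GOOD TRAP`, `ABORT`, `mismatch`) are the whole criterion; the function name `classify_run` is hypothetical, and a real plan may define stricter or different criteria.

```python
def classify_run(log_text: str) -> str:
    """Classify one test log as "PASS" or "FAIL".

    Assumption: the markers below are taken from the guideline's example,
    not from an authoritative list of emulator messages.
    """
    fail_markers = ("ABORT", "mismatch")
    # Any failure marker wins, even if a good trap is also reported.
    if any(marker in log_text for marker in fail_markers):
        return "FAIL"
    # Without a failure marker, the run passes only if the good trap was hit.
    return "PASS" if "HIT GOOD TRAP" in log_text else "FAIL"
```

A log that contains neither kind of marker (for example, a run that timed out) is treated as FAIL, so that an incomplete run is never recorded as a pass.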
| 59 | + |
| 60 | +### Versioning |
| 61 | + |
| 62 | +If a plan needs significant revision (not minor fixes), create a new version: `<task>-plan-v2.md`, `<task>-plan-v3.md`, etc. Keep old versions for reference. |
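The naming scheme above can be sketched as a small helper. This is an illustration only: `next_plan_name` is a hypothetical function, and it assumes the unversioned `<task>-plan.md` counts as version 1.

```python
import re


def next_plan_name(task: str, existing: list[str]) -> str:
    """Return the file name for the next plan version of `task`.

    `existing` holds the file names already present in `.github/`.
    """
    pattern = re.compile(rf"^{re.escape(task)}-plan(?:-v(\d+))?\.md$")
    versions = []
    for name in existing:
        match = pattern.match(name)
        if match:
            # The unversioned file is treated as version 1 (assumption).
            versions.append(int(match.group(1)) if match.group(1) else 1)
    if not versions:
        return f"{task}-plan.md"
    return f"{task}-plan-v{max(versions) + 1}.md"
```

Because old versions are kept, the helper only ever computes a new name and never overwrites an existing file.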
| 63 | + |
| 64 | +--- |
| 65 | + |
| 66 | +## 2. Progress Document |
| 67 | + |
| 68 | +Every task with a plan must have a corresponding progress file at `.github/<task>-progress.md` (versioned to match the plan). |
| 69 | + |
| 70 | +### Required Sections Per Phase |
| 71 | + |
| 72 | +| Section | Contents | |
| 73 | +|---------|----------| |
| 74 | +| **Changes Made** | Bullet list of actual modifications (file, what changed) | |
| 75 | +| **Test Results** | Each test with PASS/FAIL, key metrics (instrCnt, cycleCnt, IPC, duration) | |
| 76 | +| **Issues & Debugging** | Problems encountered, root-cause analysis, attempted fixes (including failed ones), and final resolution | |
| 77 | + |
| 78 | +### Status Table |
| 79 | + |
| 80 | +Use a summary table at the top for a quick overview: |
| 81 | + |
| 82 | +```markdown |
| 83 | +| Phase | Change Summary | Microbench | Linux 5min | Linux 10min | |
| 84 | +|-------|---------------|-----------|-----------|-------------| |
| 85 | +| 1 | Squasher Decoupled | ✅ 783387 | ✅ 0 errors | — | |
| 86 | +| 2 | DeltaSplitter Decoupled | ✅ 783387 | ✅ 0 errors | — | |
| 87 | +| Final | All changes | — | — | ✅ 600s | |
| 88 | +``` |
| 89 | + |
| 90 | +### Guidelines |
| 91 | + |
| 92 | +- **Record failed attempts.** When a fix fails, document the attempt, the failure mode, and why it was wrong. This prevents repeating the same mistake and preserves debugging context. |
| 93 | +- **Update immediately.** Write progress entries as each Phase completes, not at the end. If a conversation is interrupted, the progress file is the recovery point. |
| 94 | +- **Include quantitative data.** Log exact instrCnt, cycleCnt, step numbers, error messages — not just "passed" or "failed". |
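One way to keep test entries quantitative is to generate them from the raw counters. The sketch below is an assumption about layout, not a required format: `format_result` is a hypothetical helper, and IPC is computed as instrCnt divided by cycleCnt.

```python
def format_result(test: str, passed: bool, instr_cnt: int,
                  cycle_cnt: int, seconds: float) -> str:
    """Render one Test Results bullet with the metrics named above."""
    # Guard against a zero cycle count so a crashed run still gets logged.
    ipc = instr_cnt / cycle_cnt if cycle_cnt else float("nan")
    status = "PASS" if passed else "FAIL"
    return (f"- {test}: {status}, instrCnt={instr_cnt}, "
            f"cycleCnt={cycle_cnt}, IPC={ipc:.3f}, duration={seconds:.0f}s")
```

With made-up counters, `format_result("microbench", True, 500, 1000, 42)` yields `- microbench: PASS, instrCnt=500, cycleCnt=1000, IPC=0.500, duration=42s`.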
| 95 | + |
| 96 | +--- |
| 97 | + |
| 98 | +## 3. Execution Practices |
| 99 | + |
| 100 | +### Context Recovery |
| 101 | + |
| 102 | +At the start of each conversation or after a context reset: |
| 103 | + |
| 104 | +1. **Re-read the plan** (`.github/<task>-plan.md`) to restore the task definition and Phase structure. |
| 105 | +2. **Re-read the progress** (`.github/<task>-progress.md`) to determine which Phase is current and what has already been tried. |
| 106 | +3. **Re-read relevant docs** (`difftest/docs/`) if the task involves unfamiliar modules. |
| 107 | + |
| 108 | +Do not rely on memory or assumptions about prior state. The plan and progress files are the source of truth. |
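The first two recovery steps can be sketched as a loader that refuses to continue when either file is missing. This is a minimal sketch: `load_task_context` is a hypothetical name, and it only looks for the unversioned file names, so versioned plans (`-v2`, `-v3`) would need an extra lookup.

```python
from pathlib import Path


def load_task_context(repo_root: str, task: str) -> dict[str, str]:
    """Read the plan and progress files for `task` from `.github/`."""
    github_dir = Path(repo_root) / ".github"
    docs = {}
    for kind in ("plan", "progress"):
        path = github_dir / f"{task}-{kind}.md"
        if not path.is_file():
            # Fail loudly instead of guessing at prior state.
            raise FileNotFoundError(f"missing {kind} file: {path}")
        docs[kind] = path.read_text(encoding="utf-8")
    return docs
```

Raising on a missing file mirrors the rule above: without both documents there is no source of truth to resume from.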
| 109 | + |
| 110 | +### Confirming Ambiguities |
| 111 | + |
| 112 | +When encountering unclear requirements, design choices, or unexpected test results: |
| 113 | + |
| 114 | +- Use `askQuestions` to confirm details with the user before proceeding. |
| 115 | +- Prefer asking early (before implementing a speculative fix) over asking late (after a failed debugging cycle). |
| 116 | +- Typical situations: scope clarification, which approach to take when multiple are viable, whether a failing test is a known issue, whether to proceed to the next Phase despite a partial result. |
| 117 | + |
| 118 | +### End-of-Conversation Check |
| 119 | + |
| 120 | +At the end of every conversation: |
| 121 | + |
| 122 | +- Use `askQuestions` to confirm whether there are further requirements, open questions, or next steps. |
| 123 | +- This ensures nothing is left implicit and gives the user a chance to redirect before the context is lost. |
| 124 | + |
| 125 | +### Sub-Agent Delegation |
| 126 | + |
| 127 | +Delegate the following to sub-agents (e.g. Explore agent) to avoid filling the main conversation context window: |
| 128 | + |
| 129 | +| Task | Why Delegate | |
| 130 | +|------|-------------| |
| 131 | +| Reading and analyzing log files (mismatch logs, build logs) | Logs are large and detailed; extracting the root cause is a focused subtask | |
| 132 | +| Query DB inspection (sqlite3 queries, cross-DB comparisons) | Involves multiple queries and iterative interpretation | |
| 133 | +| Waveform analysis hypotheses | Requires reading signal traces and correlating with RTL logic | |
| 134 | +| Large-scale code reading for audit or review | Reviewing many files produces verbose context | |
| 135 | + |
| 136 | +The sub-agent should return only the **conclusion and key evidence** (e.g. "mismatch at step 4482, field NFUSED: DUT=65, REF=64, caused by ..."), not raw query output. |
| 137 | + |
| 138 | +--- |
| 139 | + |
| 140 | +## 4. Debugging Escalation |
| 141 | + |
| 142 | +When a test fails, follow this escalation order: |
| 143 | + |
| 144 | +1. **Console output** — read the last ~100 lines for checker name, cycle, and DUT/REF state. |
| 145 | +2. **Query DB comparison** — convert to hex, compare with the reference DB using `ATTACH`. Identify the first divergent STEP. |
| 146 | +3. **Waveform** — rebuild with `EMU_TRACE=fst`, dump the suspect time range, inspect signal transitions. |
| 147 | + |
| 148 | +At each level, form a hypothesis before escalating. If the hypothesis can be verified without the next level, do so. |
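The Query DB comparison step can be sketched with Python's standard `sqlite3` module. The schema here is an assumption made for illustration: a table named `trace` with integer columns `STEP` and `VALUE`. The real Query DB layout is described in test.md and will likely differ.

```python
import sqlite3


def first_divergent_step(dut_db: str, ref_db: str, table: str = "trace"):
    """Return (step, dut_hex, ref_hex) for the first differing STEP, or None.

    Assumption: both DBs hold a table `table` with integer columns STEP and
    VALUE. `table` is interpolated into the SQL, so it must be a trusted name.
    """
    conn = sqlite3.connect(dut_db)
    try:
        # ATTACH makes the reference DB queryable under the schema name "ref".
        conn.execute("ATTACH DATABASE ? AS ref", (ref_db,))
        row = conn.execute(
            f"""
            SELECT d.STEP, d.VALUE, r.VALUE
            FROM main.{table} AS d
            JOIN ref.{table} AS r ON r.STEP = d.STEP
            WHERE d.VALUE IS NOT r.VALUE
            ORDER BY d.STEP
            LIMIT 1
            """
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return None
    step, dut_value, ref_value = row
    # Report values in hex, matching the "convert to hex" habit above.
    return step, hex(dut_value), hex(ref_value)
```

The join only compares steps present in both databases; a run that ends early on one side would need a separate row-count check.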
| 149 | + |
| 150 | +Detailed command references: see [test.md §3 (Debugging Artifacts)](./test.md#3-debugging-artifacts) and [test.md §4 (Debug Workflow)](./test.md#4-debug-workflow). |