Commit 22c52b6

Merge branch 'main' into KIL-471/create-run-config-error
2 parents e16f193 + 143b06b commit 22c52b6

Some content is hidden

Large commits have some content hidden by default.

60 files changed: +3227 / -1237 lines

.agents/code_review_guidelines.md

Lines changed: 4 additions & 1 deletion
@@ -13,7 +13,6 @@
 - Editing globals: rarely a good idea. When done it should be thoughtful and clear: singletons clearly designed to be singletons and labeled as such. Never set globals on external libs (structlog) unless this project is an “application” (server always run at top level) and not a library (potentially called from many apps).

 ### Python specific guide
-
 - Code should be "Pythonic"
 - We use `asyncio` where ever possible. Avoid threads unless there's a good reason we can't use async.
 - Python json.dumps should always set `ensure_ascii=False`
@@ -25,3 +24,7 @@ The SDK in `/libs/core` is a SDK/library we expose to third parties. We code rev
 - Changing existing APIs that break current users should be avoided. Call out breaking API changes, and confirm with user that we're okay with this break.
 - All visible classes/vars should have docstrings explaining their purpose. These will be pulled into 3rd party docs automatically. The doc strings should be written for 3rd party devs learning the SDK.
 - Performance: the base_adapter and litellm_adapter are performance critical. They are the core run-loop of our agent system. We should avoid anything that would slow them down (file reads should be done once and passed in, etc). It's critical to avoid blocking IO - a process may be executing hundreds of these in parallel.
+
+### Project specific guide
+
+- **`ModelName` enum and user input:** Do not use the `ModelName` enum for validation or typing of user-provided model identifiers (for example in a Pydantic request body that validates an API payload). Kiln loads additional models over the air; those models can use names that are not members of the locally shipped `ModelName` enum. If request validation is tied to the enum, a model that is valid according to the merged model list will fail validation. Appropriate uses of `ModelName` include aliasing a constant chosen at build time (for example default config that references a known shipped model) and entries inside the `ml_model_list` provider definitions.

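A minimal sketch of the `ModelName` rule above, assuming a Pydantic-style request body; it uses stdlib `dataclasses` instead of Pydantic so it runs without dependencies. The enum members, `RunRequest`, `available_model_names`, and the over-the-air name `ota_model_x` are all hypothetical, not Kiln's actual API. User input is typed as a plain `str` and checked against a merged runtime list, while the enum is only used for a build-time default.

```python
from dataclasses import dataclass
from enum import Enum


class ModelName(str, Enum):
    """Hypothetical stand-in for the locally shipped model enum."""

    glm_5 = "glm_5"
    gpt_4o = "gpt_4o"


# Fine: a constant chosen at build time that references a known shipped model.
DEFAULT_MODEL = ModelName.glm_5


def available_model_names(over_the_air: list[str]) -> set[str]:
    """Hypothetical merge of shipped enum values with models loaded over the air."""
    return {m.value for m in ModelName} | set(over_the_air)


@dataclass
class RunRequest:
    """User-provided payload: model_name is a plain str, never ModelName."""

    model_name: str

    def validate(self, known: set[str]) -> None:
        # Check against the merged runtime list, not the shipped enum.
        if self.model_name not in known:
            raise ValueError(f"Unknown model: {self.model_name!r}")


if __name__ == "__main__":
    known = available_model_names(["ota_model_x"])
    RunRequest("ota_model_x").validate(known)  # accepted
    try:
        ModelName("ota_model_x")  # enum-based validation would reject it
    except ValueError:
        print("enum rejects the over-the-air model")
```

Typing the field as the enum would turn every model added over the air into a validation error, which is exactly the failure mode the guideline describes.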
.cursor/skills/kiln-add-model/SKILL.md

Lines changed: 50 additions & 5 deletions
@@ -181,15 +181,28 @@ If the model supports configurable reasoning effort (not just on/off), add `avai

 ## Phase 4 – Run Tests

-Tests call real LLMs and cost money. Just execute commands directly — Cursor prompts for approval.
+Tests call real LLMs and cost money. Ideally the user only needs to consent to two script executions: the smoke test, then the full parallel suite.

 **Vertex AI authentication:** Vertex tests require active gcloud credentials. If you are changing a model that uses Vertex, you must not run the test until asking the user to run `gcloud auth application-default login` before trying. These failures are auth issues, not model config problems.

 **`-k` filter syntax:** Always use bracket notation for model+provider filtering, never `and`:
 - Good: `-k "test_name[glm_5-fireworks_ai]"` or `-k "glm_5"`
 - Bad: `-k "glm_5 and fireworks"` — `and` is a pytest keyword expression that can match wrong tests

-### 4a. Smoke test — verify slug works
+### 4a. Enable parallel testing
+
+Before running paid tests, enable parallel testing in `pytest.ini`:
+
+```ini
+# Change this line:
+# addopts = -n auto
+# To:
+addopts = -n 8
+```
+
+**Important:** Revert this change after all tests complete (re-comment the line).
+
+### 4b. Smoke test — verify slug works

 Run a single test+provider combo first:

@@ -199,7 +212,7 @@ uv run pytest --runpaid --ollama -k "test_data_gen_sample_all_models_providers[M

 If it fails, fix the slug/config before proceeding. Use `--collect-only` to find exact parameter IDs if unsure.

-### 4b. Full test suite
+### 4c. Full test suite

 ```bash
 uv run pytest --runpaid --ollama -k "MODEL_ENUM" -v 2>&1 | grep -E "PASSED|FAILED|ERROR|short test|=====|collected"
@@ -211,7 +224,7 @@ uv run pytest --runpaid --ollama -k "MODEL_ENUM" -v 2>&1 | grep -E "PASSED|FAILE
 3. Re-run that single test to verify
 4. Only re-run the full suite once the single test passes

-### 4c. Extraction tests (if `supports_doc_extraction=True`)
+### 4d. Extraction tests (if `supports_doc_extraction=True`)

 Tests are in `libs/core/kiln_ai/adapters/extractors/test_litellm_extractor.py`.

@@ -225,10 +238,39 @@ uv run pytest --runpaid --ollama libs/core/kiln_ai/adapters/extractors/test_lite

 If a provider rejects a data type (400 error), remove that `KilnMimeType` and re-run.

+### 4e. Revert parallel testing
+
+After all tests complete, **revert `pytest.ini`** back to the commented-out state:
+
+```ini
+# addopts = -n auto
+```
+
+### 4f. Test output format
+
+After all tests finish, present results to the user as:
+
+1. **Two paragraphs of nuance** – describe any unusual findings, things you tried and reverted, known pre-existing failures vs new failures, API quirks discovered, and any config adjustments made during testing.
+
+2. **Per-model per-test dump** – organized by model name and provider, using this format:
+
+```text
+Model Name (provider):
+✅ test_name[model_enum-provider]
+❌ test_name[model_enum-provider] -- brief failure reason
+⏭️ test_name[model_enum-provider]
+```
+
+Use ✅ for PASSED, ❌ for FAILED (with brief reason), ⏭️ for SKIPPED.
+
 ---

 ## Phase 5 – Discord Announcement

+**Do NOT draft the Discord announcement automatically.** After presenting test results, ask the user if they want a Discord announcement drafted. Only proceed if they confirm.
+
+When requested, use this format:
+
 ```
 New Model: [Model Name] 🚀
 [One-liner about the model and that it's now in Kiln]
@@ -288,9 +330,12 @@ Rules:
 - [ ] Preserve existing comments from predecessor (e.g. reasoning notes, MIME type groupings)
 - [ ] Zero-sum applied if model is suggested for evals/data gen
 - [ ] RAG config templates updated if the new model replaces one used in `app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/add_search_tool/rag_config_templates.ts`
+- [ ] Parallel testing enabled in `pytest.ini` (`addopts = -n 8`)
 - [ ] Smoke test passed
 - [ ] Full test suite passed
-- [ ] Discord announcement drafted
+- [ ] Per-model per-test result dump presented with nuance paragraphs
+- [ ] Parallel testing reverted in `pytest.ini` (re-commented)
+- [ ] Discord announcement drafted (only if user requests it)

 ---

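Part of the `4f. Test output format` step added in this file can be scripted. The sketch below is hypothetical and not part of the Kiln skill. It assumes the `grep`-filtered output of `pytest -v`, in either the serial form (`path::test[id] PASSED`) or the pytest-xdist form produced under `-n 8` (`[gw0] [ 5%] PASSED path::test[id]`). It also assumes parameter IDs follow the `model_enum-provider` pattern shown in the format, and takes failure reasons from short-summary lines (`FAILED path::test[id] - reason`). The two nuance paragraphs still have to be written by hand.

```python
import re
from collections import defaultdict

# Assumption: parametrize IDs look like "model_enum-provider"; the provider is
# whatever follows the last "-".
STATUS_RE = re.compile(r"\b(PASSED|FAILED|ERROR|SKIPPED)\b")
TEST_RE = re.compile(r"::(\w+)\[([^\]]+)\](?:\s+-\s+(.*))?")
ICONS = {"PASSED": "✅", "FAILED": "❌", "ERROR": "❌", "SKIPPED": "⏭️"}


def format_results(pytest_output: str) -> str:
    """Group pytest -v result lines into the per-model, per-provider dump."""
    results: dict[str, dict[str, str]] = {}  # "test[id]" -> status and reason
    for line in pytest_output.splitlines():
        status = STATUS_RE.search(line)
        test = TEST_RE.search(line)
        if not status or not test:
            continue  # headers, separators, "collected ..." lines
        test_id = f"{test.group(1)}[{test.group(2)}]"
        entry = results.setdefault(test_id, {"status": "", "reason": ""})
        entry["status"] = status.group(1)
        if test.group(3):  # short-summary lines carry " - <reason>"
            entry["reason"] = test.group(3).strip()

    groups: dict[str, list[str]] = defaultdict(list)
    for test_id, entry in results.items():
        param = test_id[test_id.index("[") + 1 : -1]
        model, _, provider = param.rpartition("-")
        row = f"{ICONS[entry['status']]} {test_id}"
        if entry["status"] in ("FAILED", "ERROR") and entry["reason"]:
            row += f" -- {entry['reason']}"
        groups[f"{model} ({provider}):"].append(row)

    return "\n".join("\n".join([header, *rows]) for header, rows in groups.items())
```

Grouping keys on the enum value rather than a display name, so the headings would still need the human-readable model name swapped in.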
.cursor/skills/specs/references/cmd_code_review.md

Lines changed: 4 additions & 4 deletions
@@ -17,9 +17,9 @@ If something important is only in conversation history, that's a bug in the proc

 Always run as a sub-agent — spawned fresh, no prior context from coding.

-→ Read [references/spawning_subagents.md](references/spawning_subagents.md) for how to spawn sub-agents.
+→ Read [spawning_subagents.md](.cursor/skills/specs/references/spawning_subagents.md) for how to spawn sub-agents.

-Pass the prompt from [references/cr_agent_prompt.md](references/cr_agent_prompt.md), plus scope description.
+Pass the prompt from [cr_agent_prompt.md](.cursor/skills/specs/references/cr_agent_prompt.md), plus scope description.

 ### Example invocation

@@ -65,5 +65,5 @@ The loop continues until clean.

 ## References

-- [references/spawning_subagents.md](references/spawning_subagents.md) — How to spawn sub-agents
-- [references/cr_agent_prompt.md](references/cr_agent_prompt.md) — Prompt passed to CR sub-agent
+- [spawning_subagents.md](.cursor/skills/specs/references/spawning_subagents.md) — How to spawn sub-agents
+- [cr_agent_prompt.md](.cursor/skills/specs/references/cr_agent_prompt.md) — Prompt passed to CR sub-agent

Lines changed: 112 additions & 63 deletions
@@ -1,6 +1,6 @@
 # `/spec implement` — Implement Project

-Implement the active project. Routes to single-phase or full implementation.
+Implement the active project. The top-level agent acts as a strict manager/coordinator — it orchestrates sub-agents but never writes code or reviews it.

 ## Pre-Checks

@@ -22,7 +22,6 @@ Check that all spec artifacts through `implementation_plan.md` have `status: com
 If any are missing or `status: draft`:

 > Project spec is incomplete. The following artifacts need attention:
->
 > - [missing/draft artifacts]
 >
 > Use `/spec continue` to finish speccing before implementing.
@@ -34,90 +33,140 @@ If any are missing or `status: draft`:
 - `/spec implement all` or `/spec impl all`: All remaining phases
 - `/spec implement phase N` or `/spec impl phase N`: Specific single phase

-## Single Phase Implementation
+## Manager Role
+
+The manager orchestrates the implementation process. It does NOT code, review code, run tests, or make technical decisions.
+
+The manager's responsibilities:
+- Spawn coding sub-agents and CR sub-agents at the right times
+- Route CR feedback back to the coding agent
+- Verify that commits actually landed (via `git status`)
+- Surface phase summaries and roadblocks to the user
+- Send minimal, well-structured prompts that point to reference files — not restate their content
+
+## Single Phase Flow
+
+If the target phase is already complete (checkbox checked in `implementation_plan.md`), tell the user and stop — don't re-implement it.
+
+### Step 1: Spawn Coding Agent
+
+Spawn a new coding sub-agent using the Initial Coding Prompt template below.

-Implement one phase autonomously. The coding agent works without user assistance from start to finish.
+→ Read [spawning_subagents.md](.cursor/skills/specs/references/spawning_subagents.md) for how to spawn sub-agents.

-### Coding Persona
+The coding agent returns either:
+- A summary indicating it's ready for code review
+- A roadblock message (see Escalation below)

-You are a very skilled senior engineer IC. Your code:
+### Step 2: CR Loop

-- Explains itself through great naming and composition
-- Uses comments only for external constraints, not to describe poorly structured code
-- Is test-driven: tests that catch real breakage, don't need constant refactoring, target 95%+ coverage, reuse test helpers
+1. Spawn a fresh CR sub-agent using the CR Agent Prompt template below
+2. CR agent returns structured feedback with severity labels
+3. If the review is clean: proceed to Step 3
+4. If issues exist:
+- Resume the coding agent with the CR Feedback Prompt template, passing the CR output
+- Coding agent addresses issues and returns a summary
+- Spawn a new CR sub-agent, passing prior feedback in a `<prior_cr_feedback>` block
+- Repeat until CR returns clean

-You're willing to flag when a requirement leads to bad technical outcomes — but you don't re-litigate plan-level decisions that were already confirmed during speccing.
+→ Read [spawning_subagents.md](.cursor/skills/specs/references/spawning_subagents.md) for how to spawn sub-agents.

-### Implementation Loop
+### Step 3: Commit

-1. **Read the implementation plan** and identify the target phase
-2. **Read spec and architecture docs** for context
-3. **Write phase plan** to `/phase_plans/phase_N.md`:
-- Overview: what this phase accomplishes and why
-- Steps: ordered, specific. Files to change, exact changes, code snippets for signatures
-- Tests: specific automated test cases by name and what they verify
-- Completion criteria: checklist of what must be true when done
-4. **Build the code** per the phase plan
-5. **Run automated checks** (lint, format, type-check, build). Follow project-specific commands from system prompt. Iterate until clean.
-6. **Write tests** per the phase plan's test section
-7. **Run tests**. Iterate until passing.
-8. **Run automated checks again** (tests/fixes may introduce lint/format issues). Iterate until clean.
-9. **Self code-review via sub-agent**:
-- → Read [references/spawning_subagents.md](references/spawning_subagents.md) for how to spawn
-- Pass the prompt from [references/cr_agent_prompt.md](references/cr_agent_prompt.md) to the sub-agent
-- Include: "A coding agent just implemented phase N of [project]. Review the changes using `git diff`. The spec for this project can be found [here](link_to_spec_folder)."
-- Iterate per CR Iteration Loop below
-10. **Run automated checks one final time** (CR fixes may introduce issues). Iterate until clean.
-11. **Mark phase complete** in `implementation_plan.md` (toggle checkbox only)
-12. **Stop and present summary** of what was built
+Resume the coding agent with the Commit Prompt template below. The coding agent commits all changes, marks the phase complete, and returns the commit message.

-### CR Iteration Loop
+### Step 4: Verify

-1. Spawn CR sub-agent with clean context. Pass the CR prompt from `cr_agent_prompt.md`.
-2. CR returns feedback with severity labels (critical/moderate/mild).
-3. If issues exist:
-- Fix each issue (or rarely, add a code comment explaining the technical rationale)
-- Spawn a new CR sub-agent, passing the same CR prompt plus `<prior_cr_feedback>` block
-4. The re-review agent:
-- Verifies prior issues are addressed
-- Checks for new issues from fixes
-5. Loop until CR returns clean.
+Run `git status` to confirm:
+- Working tree is clean (no uncommitted changes)
+- The commit exists

-### Non-Interactive Rule
+If `git status` shows uncommitted changes, resume the coding agent:

-The coding phase is autonomous. Don't stop to ask the user for help.
+> Commit appears incomplete — `git status` shows uncommitted changes. Please commit all changes.

-**One exception:** You discover a genuinely new technical constraint not known at design time that materially changes the plan (e.g., an API doesn't support an assumed operation, a framework has an undocumented limitation).
+Verify again after.

-In this case — and only this case — pause and surface the issue to the user for a decision.
+### Step 5: Present Summary
+
+Show the phase summary to the user.

 ## Implement All

-A lightweight coordinator that runs all remaining phases in sequence.
+Run all remaining phases in sequence:
+
+1. Read `implementation_plan.md`, find all incomplete phases
+2. For each phase: run the Single Phase Flow above
+3. Between phases: show the phase summary, then immediately continue to next phase (don't stop to ask)
+4. After all phases: present a final summary
+
+If a target phase is already complete (checkbox checked), skip it.
+
+## Prompt Templates
+
+These are the exact prompts the manager sends to sub-agents. Use them verbatim, filling in the bracketed values.
+
+### Initial Coding Prompt
+
+```
+You are a coding agent implementing a phase of a spec-driven project.
+
+**Phase:** [N]
+**Project specs:** [specs/projects/PROJECT_NAME/]
+
+Read `.cursor/skills/specs/references/coding_phase_prompt.md` for your full instructions. Follow them precisely.
+
+Return a short summary of what you built when implementation is complete and ready for code review.
+```
+
+### CR Feedback Prompt (resume coding agent)
+
+```
+A code reviewer found issues with your implementation. Address all feedback below, then run automated checks until clean.
+
+Return a short summary of changes made when ready for re-review.
+
+<cr_feedback>
+[CR agent's output]
+</cr_feedback>
+```
+
+### Commit Prompt (resume coding agent)
+
+```
+Your code has passed review. Commit all changes with a descriptive message summarizing the work done in this phase. Mark the phase checkbox complete in implementation_plan.md.
+
+Return the commit message you used.
+```
+
+### CR Agent Prompt
+
+```
+Review code changes for phase [N] of the project at [specs/projects/PROJECT_NAME/].

-### Coordinator Process
+Read `.cursor/skills/specs/references/cr_agent_prompt.md` for your full review instructions. Follow them precisely.
+```

-1. Get next incomplete phase from `implementation_plan.md`
-2. Spawn a sub-agent with clean context to run the single-phase implementation flow above
-- → Read [references/spawning_subagents.md](references/spawning_subagents.md) for how to spawn
-- Pass: phase number, project path, instruction to follow single-phase implementation
-3. **Auto-commit**: `"Phase N implementation of [project name]\n\n[description of work in phase]"`
-4. Show the phase summary from the subagent to the user
-5. Continue to next phase (don't stop)
-6. Loop until all phases complete
+For re-reviews, append:

-### Coordinator Context
+```
+<prior_cr_feedback>
+[Previous CR output]
+</prior_cr_feedback>
+```

-The coordinator has minimal context — it just manages the loop. Each phase sub-agent gets clean context.
+## Escalation

-CR happens inside each phase's implementation loop, not at coordinator level.
+The coding agent may surface a technical roadblock instead of a "ready for CR" summary. This happens when the coding agent's "one exception" rule triggers — a genuinely new technical constraint not known at design time.

-### Passed to Phase Sub-Agents
+When the manager receives a roadblock message:

-For implement-all, pass the content of [references/coding_phase_prompt.md](references/coding_phase_prompt.md) to each phase sub-agent. This prompt contains the full single-phase implementation instructions.
+1. Present the roadblock to the user and wait for a decision
+2. Resume the coding agent with the user's decision
+3. Continue the single-phase flow from wherever the coding agent left off

 ## References

-- [references/spawning_subagents.md](references/spawning_subagents.md) — How to spawn sub-agents
-- [references/coding_phase_prompt.md](references/coding_phase_prompt.md) — Prompt passed to coding sub-agents
-- [references/cr_agent_prompt.md](references/cr_agent_prompt.md) — Prompt passed to CR sub-agents
+- [spawning_subagents.md](.cursor/skills/specs/references/spawning_subagents.md) — How to spawn and resume sub-agents
+- [coding_phase_prompt.md](.cursor/skills/specs/references/coding_phase_prompt.md) — Full instructions for coding sub-agents
+- [cr_agent_prompt.md](.cursor/skills/specs/references/cr_agent_prompt.md) — Full instructions for CR sub-agents

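The manager role described in the `/spec implement` file is essentially a message router. A minimal sketch of its Single Phase Flow follows, assuming hypothetical hooks for spawning and resuming sub-agents (the real mechanism is whatever `spawning_subagents.md` describes, which this commit does not show). Every name below (`SubAgent`, `Hooks`, `run_single_phase`, and so on) is illustrative, and the prompts are abbreviated where the skill says to send its templates verbatim. Step 4 is modelled with `git status --porcelain`, which prints nothing when the working tree is clean.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class SubAgent(Protocol):
    """Hypothetical handle to a spawned sub-agent that can be resumed."""

    def resume(self, prompt: str) -> str: ...


@dataclass
class Hooks:
    """Hypothetical integration points; names are illustrative only."""

    spawn_coder: Callable[[int], tuple[SubAgent, str]]  # phase -> (agent, first reply)
    spawn_reviewer: Callable[[int, list[str]], str]  # phase, prior CR outputs -> review
    review_is_clean: Callable[[str], bool]
    is_roadblock: Callable[[str], bool]
    ask_user: Callable[[str], str]  # roadblock message -> user's decision
    git_status_porcelain: Callable[[], str]  # output of `git status --porcelain`


def working_tree_clean(porcelain: str) -> bool:
    # `git status --porcelain` prints one line per changed path, nothing if clean.
    return porcelain.strip() == ""


def _resolve_roadblocks(coder: SubAgent, reply: str, hooks: Hooks) -> str:
    # Escalation: surface each roadblock, then resume the coder with the decision.
    while hooks.is_roadblock(reply):
        reply = coder.resume(hooks.ask_user(reply))
    return reply


def run_single_phase(phase: int, hooks: Hooks) -> str:
    """Route messages between sub-agents; the manager never codes or reviews."""
    coder, reply = hooks.spawn_coder(phase)  # Step 1
    _resolve_roadblocks(coder, reply, hooks)
    prior: list[str] = []
    while True:  # Step 2: a fresh reviewer each round, until the review is clean
        review = hooks.spawn_reviewer(phase, prior)
        if hooks.review_is_clean(review):
            break
        prior.append(review)  # later passed as <prior_cr_feedback>
        feedback = f"<cr_feedback>\n{review}\n</cr_feedback>"  # abbreviated template
        _resolve_roadblocks(coder, coder.resume(feedback), hooks)
    commit_message = coder.resume("Commit all changes ...")  # Step 3 (abbreviated)
    while not working_tree_clean(hooks.git_status_porcelain()):  # Step 4
        coder.resume("Commit appears incomplete ...")
    return commit_message  # Step 5: shown to the user with the phase summary
```

Implement All would then just call `run_single_phase` for each incomplete phase in order, showing each summary without pausing.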
.cursor/skills/specs/references/cmd_new_project.md

Lines changed: 4 additions & 4 deletions
@@ -103,7 +103,7 @@ If they approve, mark `status: complete`. If they want changes, make them and as

 ## Step 2: Functional Spec

-→ Read [references/step_functional_spec.md](references/step_functional_spec.md) and follow it.
+→ Read [step_functional_spec.md](.cursor/skills/specs/references/step_functional_spec.md) and follow it.

 ## Step 3: UI Design (Conditional)

@@ -117,11 +117,11 @@ If they confirm, skip to Step 4.

 If UI is needed:

-→ Read [references/step_ui_design.md](references/step_ui_design.md) and follow it.
+→ Read [step_ui_design.md](.cursor/skills/specs/references/step_ui_design.md) and follow it.

 ## Step 4: Architecture

-→ Read [references/step_architecture.md](references/step_architecture.md) and follow it.
+→ Read [step_architecture.md](.cursor/skills/specs/references/step_architecture.md) and follow it.

 ## Step 5: Component Designs (Conditional)

@@ -132,7 +132,7 @@ During the architecture step, you'll decide whether component designs are needed

 If component designs are needed:

-→ Read [references/step_component_designs.md](references/step_component_designs.md) and follow it.
+→ Read [step_component_designs.md](.cursor/skills/specs/references/step_component_designs.md) and follow it.

 If not needed, proceed directly to Step 6.

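Across `cmd_code_review.md`, the `/spec implement` file, and `cmd_new_project.md`, this commit replaces file-relative links such as `[references/step_ui_design.md](references/step_ui_design.md)` with repo-root paths under `.cursor/skills/specs/references/`. The commit does not say how those edits were made. The regex below is only a hypothetical way to apply the same before/after pattern, and `rewrite_reference_links` is an illustrative name.

```python
import re

# Target prefix taken from the "after" side of the diff above.
REPO_PREFIX = ".cursor/skills/specs/references/"
# Matches [references/X.md](references/X.md) where both halves name the same file.
LINK_RE = re.compile(r"\[references/([\w.-]+\.md)\]\(references/\1\)")


def rewrite_reference_links(markdown: str) -> str:
    """Rewrite [references/X.md](references/X.md) to [X.md](<REPO_PREFIX>X.md)."""
    return LINK_RE.sub(lambda m: f"[{m.group(1)}]({REPO_PREFIX}{m.group(1)})", markdown)
```

Because the pattern requires the `references/` prefix, running it on already-rewritten text leaves that text unchanged.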