Changed file: `.cursor/skills/kiln-add-model/SKILL.md` (50 additions, 5 deletions)
## Phase 4 – Run Tests
Tests call real LLMs and cost money. Ideally the user only needs to consent to two script executions: the smoke test, then the full parallel suite.
**Vertex AI authentication:** Vertex tests require active gcloud credentials. If you are changing a model that uses Vertex, ask the user to run `gcloud auth application-default login` before you run any tests. Failures without credentials are auth issues, not model config problems.
**`-k` filter syntax:** Always use bracket notation for model+provider filtering, never `and`:
- Good: `-k "test_name[glm_5-fireworks_ai]"` or `-k "glm_5"`
- Bad: `-k "glm_5 and fireworks"` — `and` is a pytest keyword expression that can match wrong tests
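The over-matching can be illustrated with plain substring logic that approximates pytest's keyword expressions (the test IDs below are hypothetical):

```python
# Rough stand-ins for pytest -k matching; real pytest parses a full
# keyword-expression grammar, but substring containment shows the pitfall.
test_ids = [
    "test_name[glm_5-fireworks_ai]",
    "test_name[glm_5_air-fireworks_ai]",  # also contains both keywords
]

def matches_and_expression(test_id: str) -> bool:
    # Approximates -k "glm_5 and fireworks": both substrings anywhere in the ID.
    return "glm_5" in test_id and "fireworks" in test_id

def matches_bracket_filter(test_id: str) -> bool:
    # Approximates -k "test_name[glm_5-fireworks_ai]": exact parametrized case.
    return "test_name[glm_5-fireworks_ai]" in test_id

print(sum(matches_and_expression(t) for t in test_ids))  # 2, both IDs match
print(sum(matches_bracket_filter(t) for t in test_ids))  # 1, only the exact case
```

Bracket notation pins the exact parametrized case, so unrelated models that happen to share keywords are never selected.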
### 4a. Enable parallel testing
Before running paid tests, enable parallel testing in `pytest.ini`:

```ini
# Change this line:
# addopts = -n auto
# To:
addopts = -n 8
```

**Important:** Revert this change after all tests complete (re-comment the line).

Tests are in `libs/core/kiln_ai/adapters/extractors/test_litellm_extractor.py`.
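The `pytest.ini` toggle can be sanity-checked with a short script. This is a sketch over an inline copy of the file; reading the real `pytest.ini` from the repo root is an assumption about its location:

```python
import configparser

# Inline stand-in for the repo's pytest.ini after enabling parallel tests.
pytest_ini_text = """\
[pytest]
addopts = -n 8
"""

cfg = configparser.ConfigParser()
cfg.read_string(pytest_ini_text)
addopts = cfg.get("pytest", "addopts", fallback="")

# pytest-xdist's -n flag is what turns on parallel workers.
parallel_enabled = "-n" in addopts.split()
print("parallel testing enabled:", parallel_enabled)  # True until reverted
```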
If a provider rejects a data type (400 error), remove that `KilnMimeType` and re-run.
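The removal step amounts to a simple filter. A sketch with an illustrative list (the real config uses `KilnMimeType` enum values, not these raw strings):

```python
# Hypothetical supported-types list; assume the provider returned 400 for PDFs.
supported_mime_types = ["application/pdf", "image/png", "image/jpeg"]
rejected = "application/pdf"

# Drop the rejected type, keep the rest, then re-run the paid tests.
supported_mime_types = [m for m in supported_mime_types if m != rejected]
print(supported_mime_types)  # ['image/png', 'image/jpeg']
```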
### 4e. Revert parallel testing

After all tests complete, **revert `pytest.ini`** back to the commented-out state:

```ini
# addopts = -n auto
```

### 4f. Test output format

After all tests finish, present results to the user as:

1. **Two paragraphs of nuance** – describe any unusual findings, things you tried and reverted, known pre-existing failures vs new failures, API quirks discovered, and any config adjustments made during testing.

2. **Per-model per-test dump** – organized by model name and provider, using this format:

Use ✅ for PASSED, ❌ for FAILED (with brief reason), ⏭️ for SKIPPED.
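A minimal sketch of rendering such a dump (the model, provider, and test names below are hypothetical; the skill's own template governs the exact layout):

```python
# Map pytest outcomes to the markers described above.
STATUS_MARKERS = {"PASSED": "✅", "FAILED": "❌", "SKIPPED": "⏭️"}

# Hypothetical results keyed by (model, provider).
results = {
    ("glm_5", "fireworks_ai"): [
        ("test_basic_completion", "PASSED", ""),
        ("test_pdf_extraction", "FAILED", "provider returned 400 for PDF"),
        ("test_ollama_variant", "SKIPPED", ""),
    ],
}

lines = []
for (model, provider), tests in results.items():
    lines.append(f"{model} ({provider})")
    for name, outcome, reason in tests:
        suffix = f" ({reason})" if reason else ""
        lines.append(f"  {STATUS_MARKERS[outcome]} {name}{suffix}")
print("\n".join(lines))
```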
---
## Phase 5 – Discord Announcement
**Do NOT draft the Discord announcement automatically.** After presenting test results, ask the user if they want a Discord announcement drafted. Only proceed if they confirm.
When requested, use this format:
```
New Model: [Model Name] 🚀
[One-liner about the model and that it's now in Kiln]
```
Rules:

- [ ] Preserve existing comments from predecessor (e.g. reasoning notes, MIME type groupings)
- [ ] Zero-sum applied if model is suggested for evals/data gen
- [ ] RAG config templates updated if the new model replaces one used in `app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/add_search_tool/rag_config_templates.ts`