Commit 76cf9cf

Add GPT-5.4 Mini/Nano, Mistral Small 4, Nemotron 3 Super, MiniMax M2.7, MiMo-V2 Pro/Flash/Omni (#1159)
* Update ml_model_list.py
* Update SKILL.md
* ollama
* rabbit nit
* nemotron family + better example in skill
* Update SKILL.md
1 parent 57702e3 commit 76cf9cf

File tree

2 files changed (+252, −10 lines)

.cursor/skills/kiln-add-model/SKILL.md

Lines changed: 50 additions & 5 deletions
````diff
@@ -181,15 +181,28 @@ If the model supports configurable reasoning effort (not just on/off), add `avai
 
 ## Phase 4 – Run Tests
 
-Tests call real LLMs and cost money. Just execute commands directly — Cursor prompts for approval.
+Tests call real LLMs and cost money. Ideally the user only needs to consent to two script executions: the smoke test, then the full parallel suite.
 
 **Vertex AI authentication:** Vertex tests require active gcloud credentials. If you are changing a model that uses Vertex, do not run the tests until you have asked the user to run `gcloud auth application-default login`. These failures are auth issues, not model config problems.
 
 **`-k` filter syntax:** Always use bracket notation for model+provider filtering, never `and`:
 - Good: `-k "test_name[glm_5-fireworks_ai]"` or `-k "glm_5"`
 - Bad: `-k "glm_5 and fireworks"` – `and` is a pytest keyword expression that can match the wrong tests
 
-### 4a. Smoke test — verify slug works
+### 4a. Enable parallel testing
+
+Before running paid tests, enable parallel testing in `pytest.ini`:
+
+```ini
+# Change this line:
+# addopts = -n auto
+# To:
+addopts = -n 8
+```
+
+**Important:** Revert this change after all tests complete (re-comment the line).
+
+### 4b. Smoke test — verify slug works
 
 Run a single test+provider combo first:
 
````
````diff
@@ -199,7 +212,7 @@ uv run pytest --runpaid --ollama -k "test_data_gen_sample_all_models_providers[M
 
 If it fails, fix the slug/config before proceeding. Use `--collect-only` to find exact parameter IDs if unsure.
 
-### 4b. Full test suite
+### 4c. Full test suite
 
 ```bash
 uv run pytest --runpaid --ollama -k "MODEL_ENUM" -v 2>&1 | grep -E "PASSED|FAILED|ERROR|short test|=====|collected"
````
````diff
@@ -211,7 +224,7 @@ uv run pytest --runpaid --ollama -k "MODEL_ENUM" -v 2>&1 | grep -E "PASSED|FAILE
 3. Re-run that single test to verify
 4. Only re-run the full suite once the single test passes
 
-### 4c. Extraction tests (if `supports_doc_extraction=True`)
+### 4d. Extraction tests (if `supports_doc_extraction=True`)
 
 Tests are in `libs/core/kiln_ai/adapters/extractors/test_litellm_extractor.py`.
 
````
````diff
@@ -225,10 +238,39 @@ uv run pytest --runpaid --ollama libs/core/kiln_ai/adapters/extractors/test_lite
 
 If a provider rejects a data type (400 error), remove that `KilnMimeType` and re-run.
 
+### 4e. Revert parallel testing
+
+After all tests complete, **revert `pytest.ini`** back to the commented-out state:
+
+```ini
+# addopts = -n auto
+```
+
+### 4f. Test output format
+
+After all tests finish, present results to the user as:
+
+1. **Two paragraphs of nuance** – describe any unusual findings, things you tried and reverted, known pre-existing failures vs new failures, API quirks discovered, and any config adjustments made during testing.
+
+2. **Per-model per-test dump** – organized by model name and provider, using this format:
+
+```text
+Model Name (provider):
+✅ test_name[model_enum-provider]
+❌ test_name[model_enum-provider] -- brief failure reason
+⏭️ test_name[model_enum-provider]
+```
+
+Use ✅ for PASSED, ❌ for FAILED (with brief reason), ⏭️ for SKIPPED.
+
 ---
 
 ## Phase 5 – Discord Announcement
 
+**Do NOT draft the Discord announcement automatically.** After presenting test results, ask the user if they want a Discord announcement drafted. Only proceed if they confirm.
+
+When requested, use this format:
+
 ```
 New Model: [Model Name] 🚀
 [One-liner about the model and that it's now in Kiln]
````
````diff
@@ -288,9 +330,12 @@ Rules:
 - [ ] Preserve existing comments from predecessor (e.g. reasoning notes, MIME type groupings)
 - [ ] Zero-sum applied if model is suggested for evals/data gen
 - [ ] RAG config templates updated if the new model replaces one used in `app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/add_search_tool/rag_config_templates.ts`
+- [ ] Parallel testing enabled in `pytest.ini` (`addopts = -n 8`)
 - [ ] Smoke test passed
 - [ ] Full test suite passed
-- [ ] Discord announcement drafted
+- [ ] Per-model per-test result dump presented with nuance paragraphs
+- [ ] Parallel testing reverted in `pytest.ini` (re-commented)
+- [ ] Discord announcement drafted (only if user requests it)
 
 ---
 
````
0 commit comments