## .agents/code_review_guidelines.md (8 additions, 0 deletions)
- Missing comments: comments should document the "why", not the "what". If code does something unexpected and the "why" is non-obvious, document it.
- Code in the incorrect place: adding code to a class/file where it doesn't belong.
- Repeated code: we should use helper functions, test parameterization and other features for code reuse. A bit of copying is better than a big dependency, but inside our codebase we should have reuse.
- `TODO` comments: before the final PR, all `TODO` comments must be resolved. Any code or comment that must be changed before merging to main must include the exact string `TODO` in the comment — `FIXME`, `HACK`, `XXX`, and other alternatives do not count, as only `TODO` is enforced by CI. `TODO` comments are acceptable in intermediate commits but must be cleaned up before the final PR/phase.
- Editing globals: rarely a good idea. When done, it should be thoughtful and clear: singletons clearly designed to be singletons and labeled as such. Never set globals on external libs (structlog) unless this project is an "application" (server always run at top level) and not a library (potentially called from many apps).
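To make the "why"-comment and code-reuse points concrete, here is a hedged sketch; the helper name and behavior are hypothetical, not taken from this codebase:

```python
def normalize_model_id(raw_id: str) -> str:
    """Normalize a provider model id for lookup (hypothetical helper).

    Shared by several call sites instead of repeating the same
    strip/lower logic at each one.
    """
    # Why: some providers return ids with a date suffix such as
    # "model-name@2024-01-01", but our lookups key on the bare name.
    return raw_id.split("@", 1)[0].strip().lower()


print(normalize_model_id("  GLM-4@2024-01-01 "))  # glm-4
```

The comment explains the non-obvious reason for the split, not the mechanics the code already shows.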
### Python specific guide
The SDK in `/libs/core` is an SDK/library we expose to third parties.
- All visible classes/vars should have docstrings explaining their purpose. These will be pulled into 3rd-party docs automatically. The docstrings should be written for 3rd-party devs learning the SDK.
- Performance: the `base_adapter` and `litellm_adapter` are performance critical. They are the core run loop of our agent system. We should avoid anything that would slow them down (file reads should be done once and passed in, etc.). It's critical to avoid blocking IO: a process may be executing hundreds of these in parallel.
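The "read once, pass in" and non-blocking IO guidance above can be sketched as follows; the function names are illustrative, not the real adapters:

```python
import asyncio


def load_prompt_template(path: str) -> str:
    # Why: read the template once at startup, NOT inside the run loop,
    # so hundreds of parallel runs don't each hit the filesystem.
    with open(path) as f:
        return f.read()


async def run_model(template: str, user_input: str) -> str:
    # Hot path: only light string work plus awaited (non-blocking) IO.
    prompt = template.format(user_input=user_input)
    await asyncio.sleep(0)  # stand-in for an async network call
    return prompt


async def main() -> None:
    template = "System: answer briefly.\nUser: {user_input}"  # loaded once
    results = await asyncio.gather(
        *(run_model(template, f"q{i}") for i in range(100))
    )
    print(len(results))  # 100


asyncio.run(main())
```

Because the hot path never blocks, a single process can drive many of these coroutines concurrently.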
### UI-Specific Review Guides

If the change contains UI changes, read:

- `./frontend_design_guide.md`
- `./frontend_controls.md`
### FastAPI / OpenAPI Standards
If the change impacts API endpoints, read `.agents/api_code_review.md` for instructions on how to code review API endpoints.
## .agents/frontend_controls.md (3 additions, 1 deletion)

The following controls are commonly used in our design language:
- `app_page.svelte` - a page of our app including title, subtitle, and action buttons in standard position/size
- `property_list.svelte` - a list of properties in a grid with name, value and optional tooltips/links. Optional list title.
- `form_element.svelte`/`form_container.svelte`/`form_list.svelte` - a series of controls for building forms with submit buttons, spinners, errors, validation, input controls, etc.
- `info_tooltip.svelte` - displays a tooltip from an "i" info icon; uses `@floating-ui/dom` positioning so it doesn't break document flow.
- `warning.svelte` - shows a message box with icon and text. Can be a warning, informational or success.
- `intro.svelte` - used for empty screens before data is added. Teaches the user about the concept, and has buttons guiding them to an action.
- `dialog.svelte` - a modal dialog with close button, title, area for content, and action buttons.
- `edit_dialog.svelte` - a dialog for editing properties like name/description. Has save/cancel buttons.
- `collapse.svelte` - a collapsible section, often titled "Advanced Options", to hide optional controls.
- `float.svelte` - a low-level wrapper for `@floating-ui/dom` positioning. Prefer higher-level components (`floating_menu.svelte`, `info_tooltip.svelte`) when they fit your use case.
- `floating_menu.svelte` / `table_action_menu.svelte` - floating dropdown menus using `@floating-ui/dom`. Use instead of DaisyUI's `dropdown-content` class, which breaks inside tables, dialogs, and scroll areas. `table_action_menu.svelte` is a convenience wrapper that includes the "..." ellipsis button with hover-to-open; `floating_menu.svelte` is the generic version with a trigger slot.
Read the control's code to better understand it and its parameters. Optionally search for an existing use of the control to see it in use.
## .agents/skills/claude-maintain-models/SKILL.md (51 additions, 54 deletions)
---
name: claude-maintain-models
description: Add new AI models to Kiln's ml_model_list.py and produce a Discord announcement. Use when the user wants to add, integrate, or register a new LLM model (e.g. Claude, GPT, DeepSeek, Gemini, Kimi, Qwen, Grok) into the Kiln model list, mentions adding a model to ml_model_list.py, or asks to discover/find new models that are available but not yet in Kiln.
---
After code changes, run paid integration tests, then draft a Discord post.

These apply throughout the entire workflow.
- **Sandbox:** All `curl` and `uv run` commands MUST use `required_permissions: ["all"]`. The sandbox breaks `uv run` (Rust panics) and blocks network access for `curl`.
- **Slug verification:** NEVER guess or infer model slugs from naming patterns. Every `model_id` must come from an authoritative source (LiteLLM catalog, official docs, API reference, or changelog). If you can't verify a slug, tell the user and ask them to provide it.
- **Date awareness:** These models are often released very recently. Web search for current info before assuming you know the details.
After all tests complete, **revert `pytest.ini`** back to the commented-out state.
### 4f. Test output format
Collect test results for use in the PR body (Phase 5). Organize by model name and provider using these symbols:

- ✅ for passed tests
- ⚠️ for tests that failed due to content quality flakes (e.g. model returned fewer items than expected, weak assertion mismatches) — include a brief reason
- ❌ for tests that failed due to real errors (bad slug, unsupported feature, 400/500 errors) — include a brief reason
- List every test using the full pytest parametrize ID, grouped by provider
- Include extraction tests (Phase 4d) if they were run
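The symbol scheme above can be rendered mechanically. A hedged sketch follows; the data shape and function are invented for illustration and are not part of the skill itself:

```python
STATUS_SYMBOLS = {"pass": "✅", "flake": "⚠️", "fail": "❌"}


def format_results(results):
    """Render results grouped under 'Model (provider)' headers.

    `results` maps a group header to a list of
    (pytest_parametrize_id, status, reason) tuples; reason is empty
    for passes.
    """
    lines = []
    for group, tests in results.items():
        lines.append(f"{group}:")
        for test_id, status, reason in tests:
            suffix = f" ({reason})" if reason else ""
            lines.append(f"  {STATUS_SYMBOLS[status]} {test_id}{suffix}")
    return "\n".join(lines)


print(
    format_results(
        {
            "GLM 5.1 (together_ai)": [
                ("test_structured_output[glm-5-1-together_ai]", "pass", ""),
                ("test_data_gen[glm-5-1-together_ai]", "flake", "returned 4/5 items"),
            ]
        }
    )
)
```

Flakes and real failures carry a brief reason, passes stand alone, matching the rules above.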
After all tests pass and `pytest.ini` is reverted, commit the changes and open a PR against `main`.
### 5a. Commit and push
1. Create a new branch named `add-model/MODEL_NAME` (e.g. `add-model/glm-5-1`)
2. Stage only the changed files (typically just `ml_model_list.py`)
3. Commit with a concise message (e.g. "Add GLM 5.1 to model list (together_ai, siliconflow_cn)")
4. Push the branch
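Step 1's branch name can be derived from the model name. A hypothetical sketch (the skill itself just names the branch by hand):

```python
import re


def branch_name(model_name: str) -> str:
    # Lowercase and replace runs of non-alphanumerics with "-",
    # e.g. "GLM 5.1" -> "add-model/glm-5-1".
    slug = re.sub(r"[^a-z0-9]+", "-", model_name.lower()).strip("-")
    return f"add-model/{slug}"


print(branch_name("GLM 5.1"))  # add-model/glm-5-1
```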
### 5b. Create the PR
Use `gh pr create` against `main`. The PR body must follow this exact format:
```
## What does this PR do?

Test Results

[Two paragraphs of nuance — describe any unusual findings, things you tried and reverted, known pre-existing failures vs new failures, API quirks discovered, and any config adjustments made during testing.]

- [X] New tests have been added to any work in /lib
```
**Rules for the PR body:**
311
+
-Every test that ran must appear in the per-test dump, using the full pytest parametrize ID
312
+
-Group tests by `[Model Name] ([provider]):` headers
313
+
-The summary section at the top gives a quick pass/skip/fail count per model+provider
314
+
-The detailed section below the `---` lists every individual test result
315
+
-Use ⚠️ for content quality flakes (not real failures), ❌ for real errors
## AGENTS.md (19 additions, 3 deletions)
### Agent Tools
Agents have access to a range of tools for running tests, linting, formatting and typechecking. Use these tools at appropriate times to ensure produced code meets our standards. All checks must pass before merging. When iterating on a specific failure, use the targeted command before re-running the full suite.

- **All checks:** `uv run ./checks.sh --agent-mode` (agent mode suppresses output unless there's a failure)
| Check | Fix | Description |
|---|---|---|
| `uv run ruff check` | `uv run ruff check --fix` | Python lint |
| `uv run ruff format --check .` | `uv run ruff format .` | Python format |
| `uv run ty check` | — | Python type check |
| `uv run python3 -m pytest --benchmark-quiet -q -n auto .` | — | Python tests |
| `npm run lint` | — | Web lint (from `app/web_ui`) |
| `npm run format_check` | `npm run format` | Web format (from `app/web_ui`) |
| `npm run check` | — | Web type check and svelte check (from `app/web_ui`) |
| `npm run test_run` | — | Web tests (from `app/web_ui`) |
| `npm run build` | — | Web build (from `app/web_ui`) |
| `app/web_ui/src/lib/check_schema.sh` | `app/web_ui/src/lib/generate_schema.sh` | OpenAPI client up to date |
| `misspell` | — | Spelling check (optional if not installed) |
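The check-to-fix pairs in the table can be looked up mechanically when iterating on a failure. A hedged sketch, with the mapping transcribed from the table (this is not an existing script in the repo):

```python
# Maps a failing check command to its auto-fix command, if one exists.
FIXES = {
    "uv run ruff check": "uv run ruff check --fix",
    "uv run ruff format --check .": "uv run ruff format .",
    "npm run format_check": "npm run format",
    "app/web_ui/src/lib/check_schema.sh": "app/web_ui/src/lib/generate_schema.sh",
}


def suggest_fix(check_cmd: str) -> str:
    fix = FIXES.get(check_cmd)
    if fix is None:
        # e.g. type checks and tests have no auto-fix command.
        return f"No auto-fix for '{check_cmd}'; fix manually and re-run."
    return f"Run: {fix}"


print(suggest_fix("uv run ruff check"))  # Run: uv run ruff check --fix
```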
### Agent Prompts
These prompts can be accessed from the `get_prompt` tool.
### General Agent Guidance
- When spawning subagents, always use the same model as the current agent
- Don't include comments in code explaining changes; explain changes in chat instead.
- Use `TODO` comments to mark any temporary code, placeholders, or items that must be addressed before merging to main. CI enforces that no `TODO` comments remain on main, so they are a safe way to flag work-in-progress during development. Clean up all `TODO` comments before the final PR.
- Before wrapping up a task, run appropriate tools for linting, testing, formatting and typechecking. Fix any issues you introduced.
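The `TODO` convention above works because the exact string is greppable. A minimal sketch of the kind of scan CI could run (the real CI job's implementation is not shown here):

```python
def find_todos(files):
    """Return 'filename:lineno' for each line containing the exact string TODO.

    Note: FIXME/HACK/XXX are deliberately ignored; only TODO is enforced.
    """
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if "TODO" in line:
                hits.append(f"{name}:{lineno}")
    return hits


sources = {
    "a.py": "x = 1\n# TODO: remove placeholder\n",
    "b.py": "# FIXME: not flagged by this check\n",
}
print(find_todos(sources))  # ['a.py:2']
```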