Commit 569dacd

committed
merged main
2 parents c68ef22 + e87d1a2 commit 569dacd

66 files changed

+3295
-674
lines changed

.agents/code_review_guidelines.md

Lines changed: 8 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -10,6 +10,7 @@
1010
- Missing comments: comments should document the "why" not the what. If code does something unexpected, and the "why" is non obvious, the why should be documented.
1111
- Code in the incorrect place: adding code to a class/file where it doesn’t belong
1212
- Repeated Code: we should use helper functions, test parameterization and other features for code reuse. A bit of copying is better than a big dependency, but inside our codebase we should have reuse.
13+
- `TODO` comments: before the final PR, all `TODO` comments must be resolved. Any code or comment that must be changed before merging to main must include the exact string `TODO` in the comment — `FIXME`, `HACK`, `XXX`, and other alternatives do not count, as only `TODO` is enforced by CI. `TODO` comments are acceptable in intermediate commits but must be cleaned up before the final PR/phase.
1314
- Editing globals: rarely a good idea. When done it should be thoughtful and clear: singletons clearly designed to be singletons and labeled as such. Never set globals on external libs (structlog) unless this project is an “application” (server always run at top level) and not a library (potentially called from many apps).
1415

1516
### Python specific guide
@@ -25,6 +26,13 @@ The SDK in `/libs/core` is a SDK/library we expose to third parties. We code rev
2526
- All visible classes/vars should have docstrings explaining their purpose. These will be pulled into 3rd party docs automatically. The doc strings should be written for 3rd party devs learning the SDK.
2627
- Performance: the base_adapter and litellm_adapter are performance critical. They are the core run-loop of our agent system. We should avoid anything that would slow them down (file reads should be done once and passed in, etc). It's critical to avoid blocking IO - a process may be executing hundreds of these in parallel.
2728

29+
### UI-Specific Review Guides
30+
31+
If the change contains UI changes read:
32+
33+
- `./frontend_design_guide.md`
34+
- `./frontend_controls.md`
35+
2836
### FastAPI / OpenAPI Standards
2937

3038
If the change impacts API endpoints, read `.agents/api_code_review.md` for instructions on how to code review API endpoints.
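
The performance note above for `base_adapter` and `litellm_adapter` (read files once and pass them in, avoid blocking IO) might look like the following in practice. This is a minimal sketch, not code from the repo: `load_prompt`, `run_task`, and `run_many` are hypothetical names, and `asyncio.sleep(0)` stands in for a real awaitable model call.

```python
import asyncio
from pathlib import Path


def load_prompt(path: Path) -> str:
    """Blocking read. Intended to run once, outside the hot path."""
    return path.read_text(encoding="utf-8")


async def run_task(prompt: str, task_id: int) -> str:
    # The prompt text is passed in, so this coroutine does no file IO.
    await asyncio.sleep(0)  # stand-in for an awaitable model call (assumption)
    return f"task {task_id}: {len(prompt)} chars"


async def run_many(path: Path, count: int) -> list[str]:
    # Offload the single blocking read to a worker thread so the event
    # loop stays free for other tasks running in parallel.
    prompt = await asyncio.to_thread(load_prompt, path)
    return await asyncio.gather(*(run_task(prompt, i) for i in range(count)))
```

The file is read once per batch rather than once per task, and the read itself never blocks the event loop.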

.agents/frontend_controls.md

Lines changed: 3 additions & 1 deletion
@@ -11,12 +11,14 @@ The following controls are commonly used in our design language:
1111
- `app_page.svelte` - a page of our app including title, subtitle, and action buttons in standard position/size
1212
- `property_list.svelte` - a list of properties in a grid with name, value and optional tooltips/links. Optional list title.
1313
- `form_element.svelte`/`form_container.svelte`/`form_list.svelte` - a series of controls for building forms with submit buttons, spinners, errors, validation, input controls, etc.
14-
- `info_tooltip.svelte` - a way to display a tooltip from an “i” info icon
14+
- `info_tooltip.svelte` - a way to display a tooltip from an “i” info icon, and uses floating_ui to not break doc flow.
1515
- `warning.svelte` - show message box with icon and text. Can be a warning, informational or success.
1616
- `intro.svelte` - used for empty screens before data is added. Teaches user about concept, and has buttons guiding them to an action.
1717
- `dialog.svelte` a modal dialog with close button, title, area for content, and action buttons.
1818
- `edit_dialog.svelte` a dialog for editing properties like name/description. Has save/cancel buttons.
1919
- `collapse.svelte` a collapsible section, often titled "Advanced Options" to hide optional controls
20+
- `float.svelte` - a low-level wrapper for `@floating-ui/dom` positioning. Prefer higher-level components (`floating_menu.svelte`, `info_tooltip.svelte`) when they fit your use case.
21+
- `floating_menu.svelte` / `table_action_menu.svelte` - floating dropdown menus using `@floating-ui/dom`. Use instead of DaisyUI's `dropdown-content` class, which breaks inside tables, dialogs, and scroll areas. `table_action_menu.svelte` is a convenience wrapper that includes the "..." ellipsis button with hover-to-open; `floating_menu.svelte` is the generic version with a trigger slot.
2022

2123
Read the control's code to better understand it and its parameters. Optionally search for an existing use of the control to see it in use.
2224

.cursor/skills/kiln-add-model/SKILL.md renamed to .agents/skills/claude-maintain-models/SKILL.md

Lines changed: 51 additions & 54 deletions
@@ -1,6 +1,7 @@
11
---
2-
name: kiln-add-model
2+
name: claude-maintain-models
33
description: Add new AI models to Kiln's ml_model_list.py and produce a Discord announcement. Use when the user wants to add, integrate, or register a new LLM model (e.g. Claude, GPT, DeepSeek, Gemini, Kimi, Qwen, Grok) into the Kiln model list, mentions adding a model to ml_model_list.py, or asks to discover/find new models that are available but not yet in Kiln.
4+
allowed-tools: Read Edit Write Bash Grep Glob Agent WebSearch WebFetch
45
---
56

67
# Add a New AI Model to Kiln
@@ -19,7 +20,6 @@ After code changes, run paid integration tests, then draft a Discord post.
1920

2021
These apply throughout the entire workflow.
2122

22-
- **Sandbox:** All `curl` and `uv run` commands MUST use `required_permissions: ["all"]`. The sandbox breaks `uv run` (Rust panics) and blocks network access for `curl`.
2323
- **Slug verification:** NEVER guess or infer model slugs from naming patterns. Every `model_id` must come from an authoritative source (LiteLLM catalog, official docs, API reference, or changelog). If you can't verify a slug, tell the user and ask them to provide it.
2424
- **Date awareness:** These models are often released very recently. Web search for current info before assuming you know the details.
2525

@@ -248,73 +248,71 @@ After all tests complete, **revert `pytest.ini`** back to the commented-out stat
248248

249249
### 4f. Test output format
250250

251-
After all tests finish, present results to the user as:
252-
253-
1. **Two paragraphs of nuance** – describe any unusual findings, things you tried and reverted, known pre-existing failures vs new failures, API quirks discovered, and any config adjustments made during testing.
251+
Collect test results for use in the PR body (Phase 5). Organize by model name and provider using these symbols:
252+
- ✅ for passed tests
253+
- ⚠️ for tests that failed due to content quality flakes (e.g. model returned fewer items than expected, weak assertion mismatches) — include a brief reason
254+
- ❌ for tests that failed due to real errors (bad slug, unsupported feature, 400/500 errors) — include a brief reason
255+
- List every test using the full pytest parametrize ID, grouped by provider
256+
- Include extraction tests (Phase 4d) if they were run
254257

255-
2. **Per-model per-test dump** – organized by model name and provider, using this format:
258+
---
256259

257-
```text
258-
Model Name (provider):
259-
✅ test_name[model_enum-provider]
260-
❌ test_name[model_enum-provider] -- brief failure reason
261-
⏭️ test_name[model_enum-provider]
262-
```
260+
## Phase 5 – Create Pull Request
263261

264-
Use ✅ for PASSED, ❌ for FAILED (with brief reason), ⏭️ for SKIPPED.
262+
After all tests pass and `pytest.ini` is reverted, commit the changes and open a PR against `main`.
265263

266-
---
264+
### 5a. Commit and push
267265

268-
## Phase 5 – Discord Announcement
266+
1. Create a new branch named `add-model/MODEL_NAME` (e.g. `add-model/glm-5-1`)
267+
2. Stage only the changed files (typically just `ml_model_list.py`)
268+
3. Commit with a concise message (e.g. "Add GLM 5.1 to model list (together_ai, siliconflow_cn)")
269+
4. Push the branch
269270

270-
**Do NOT draft the Discord announcement automatically.** After presenting test results, ask the user if they want a Discord announcement drafted. Only proceed if they confirm.
271+
### 5b. Create the PR
271272

272-
When requested, use this format:
273+
Use `gh pr create` against `main`. The PR body must follow this exact format:
273274

274275
```
275-
New Model: [Model Name] 🚀
276-
[One-liner about the model and that it's now in Kiln]
277-
278-
Kiln Test Pass Results
279-
[Model Name]:
280-
✅ Tool Calling
281-
✅ Structured Data ([mode used])
282-
✅ Synthetic Data Generation
283-
✅ Evals (only if suggested_for_evals=True)
284-
✅ Document extraction: [formats] (only if supports_doc_extraction=True)
285-
✅ Vision: [formats] (only if supports_vision=True)
286-
287-
Model Variants, Hosts and Quirks
288-
[Model Name]:
289-
Available on: [list providers]
290-
[Any quirks or notes]
291-
292-
How to Use These Models in Kiln
293-
Simply restart Kiln, and all these models will appear in your model dropdown if you have the appropriate API configured.
294-
```
276+
## What does this PR do?
295277
296-
Use ⚠️ for flaky features, ❌ for unsupported.
278+
Test Results
297279
298-
### Test Summary
280+
[Two paragraphs of nuance — describe any unusual findings, things you tried and reverted, known pre-existing failures vs new failures, API quirks discovered, and any config adjustments made during testing.]
299281
300-
After the Discord announcement, print a per-test summary listing every test that ran for the model. Use the full pytest parametrize ID so the user can see exactly which test+provider combos passed, failed, or were flaky.
282+
[Model Name] ([provider]):
283+
- [N] passed, [N] skipped[, [N] failed]
284+
- [Any notable failures or flakes]
301285
302-
Format:
303-
```
304-
Test Summary: [Model Name]
286+
[Repeat for each model+provider combo]
287+
288+
---
289+
[Model Name] ([provider]):
305290
✅ test_data_gen_all_models_providers[model_enum-provider]
306291
✅ test_data_gen_sample_all_models_providers[model_enum-provider]
307-
✅ test_tools_all_built_in_models[model_enum-provider]
308-
⚠️ test_structured_input_cot_prompt_builder[model_enum-provider] — assert 3 == 5 (content quality flake)
309-
❌ test_all_built_in_models_structured_output[model_enum-provider] — 400 Bad Request (unsupported feature)
292+
✅ test_data_gen_sample_all_models_providers_with_structured_output[model_enum-provider]
293+
✅ test_all_built_in_models_llm_as_judge[model_enum-provider]
294+
✅ test_all_built_in_models_structured_output[model_enum-provider]
295+
✅ test_all_built_in_models_structured_input[model_enum-provider]
296+
✅ test_structured_output_cot_prompt_builder[model_enum-provider]
297+
✅ test_all_models_providers_plaintext[model_enum-provider]
298+
✅ test_cot_prompt_builder[model_enum-provider]
299+
⚠️ test_structured_input_cot_prompt_builder[model_enum-provider] — brief reason
300+
❌ test_name[model_enum-provider] — brief reason
301+
302+
[Repeat for each model+provider combo]
303+
304+
## Checklists
305+
306+
- [X] Tests have been run locally and passed
307+
- [X] New tests have been added to any work in /lib
310308
```
311309

312-
Rules:
313-
- ✅ for passed tests
314-
- ⚠️ for tests that failed due to content quality flakes (e.g. model returned fewer items than expected, weak assertion mismatches) — include a brief reason
315-
- ❌ for tests that failed due to real errors (bad slug, unsupported feature, 400/500 errors) — include a brief reason
316-
- List every test, grouped by provider if the model has multiple providers
317-
- Include extraction tests (Phase 4c) if they were run
310+
**Rules for the PR body:**
311+
- Every test that ran must appear in the per-test dump, using the full pytest parametrize ID
312+
- Group tests by `[Model Name] ([provider]):` headers
313+
- The summary section at the top gives a quick pass/skip/fail count per model+provider
314+
- The detailed section below the `---` lists every individual test result
315+
- Use ⚠️ for content quality flakes (not real failures), ❌ for real errors
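
The per-test dump described by these rules could be assembled with a small helper. The following is a sketch under stated assumptions, not part of the skill: `ResultRow` and `format_detail` are hypothetical names, the outcome labels (`"passed"`, `"flake"`, `"failed"`) are invented for illustration, and the separator before the reason is written as `\u2014` to match the template's dash.

```python
from collections import defaultdict
from dataclasses import dataclass

# Symbols from the rules: pass, content-quality flake, real error.
SYMBOLS = {"passed": "✅", "flake": "⚠️", "failed": "❌"}
REASON_SEP = " \u2014 "  # separator used before the brief reason in the template


@dataclass
class ResultRow:
    model: str
    provider: str
    test_id: str  # full pytest parametrize ID
    outcome: str  # "passed", "flake", or "failed" (hypothetical labels)
    reason: str = ""


def format_detail(results: list[ResultRow]) -> str:
    """Group results under '[Model Name] ([provider]):' headers."""
    grouped: dict[tuple[str, str], list[ResultRow]] = defaultdict(list)
    for row in results:
        grouped[(row.model, row.provider)].append(row)

    lines: list[str] = []
    for (model, provider), rows in grouped.items():
        lines.append(f"{model} ({provider}):")
        for row in rows:
            line = f"{SYMBOLS[row.outcome]} {row.test_id}"
            # Only flakes and failures carry a reason.
            if row.outcome != "passed" and row.reason:
                line += REASON_SEP + row.reason
            lines.append(line)
        lines.append("")  # blank line between model+provider groups
    return "\n".join(lines).rstrip()
```

Groups appear in the order their first result was seen, so feeding results in pytest's reporting order keeps each provider's tests together.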
318316

319317
---
320318

@@ -333,9 +331,8 @@ Rules:
333331
- [ ] Parallel testing enabled in `pytest.ini` (`addopts = -n 8`)
334332
- [ ] Smoke test passed
335333
- [ ] Full test suite passed
336-
- [ ] Per-model per-test result dump presented with nuance paragraphs
337334
- [ ] Parallel testing reverted in `pytest.ini` (re-commented)
338-
- [ ] Discord announcement drafted (only if user requests it)
335+
- [ ] PR created against `main` with test results in the body
339336

340337
---
341338

.github/workflows/debug_detector.yml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ jobs:
2424
fi
2525
2626
echo "Checking for TODO or FIXME"
27-
notes=$(grep -nR --exclude-dir=node_modules --exclude-dir=.venv --exclude-dir=.git --exclude-dir=.github --exclude-dir=build --exclude-dir=dist --exclude-dir=.svelte-kit -e 'TODO' -e 'FIXME' . || true)
27+
notes=$(grep -nR --exclude-dir=node_modules --exclude-dir=.venv --exclude-dir=.git --exclude-dir=.github --exclude-dir=build --exclude-dir=dist --exclude-dir=.svelte-kit --exclude=AGENTS.md --exclude=code_review_guidelines.md -e 'TODO' -e 'FIXME' . || true)
2828
if [ -n "$notes" ]; then
2929
echo "$notes"
3030
found=1
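
For running the same check locally without `grep`, the updated invocation can be approximated in Python. This is a rough sketch, assuming the same excluded directory and file base names as the workflow line above; `find_notes` is a hypothetical helper and not part of the repo. It does a plain substring match on `TODO` and `FIXME`, and unlike `grep` it does not special-case binary files (they are read with decoding errors ignored).

```python
import os
from pathlib import Path

# Base names copied from the --exclude-dir and --exclude flags in the workflow.
EXCLUDED_DIRS = {"node_modules", ".venv", ".git", ".github", "build", "dist", ".svelte-kit"}
EXCLUDED_FILES = {"AGENTS.md", "code_review_guidelines.md"}
MARKERS = ("TODO", "FIXME")


def find_notes(root: Path) -> list[str]:
    """Return 'path:line_number:line' entries for lines containing a marker."""
    hits: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            if name in EXCLUDED_FILES:
                continue
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
            for lineno, line in enumerate(text.splitlines(), start=1):
                if any(marker in line for marker in MARKERS):
                    hits.append(f"{path.relative_to(root)}:{lineno}:{line}")
    return hits
```

An empty result corresponds to the workflow's passing case, where `$notes` is empty.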

AGENTS.md

Lines changed: 19 additions & 3 deletions
@@ -25,9 +25,23 @@ This repo is a monorepo containing all of the source code, in the following stru
2525

2626
### Agent Tools
2727

28-
Agents have access to a range of tools for running tests, linting, formatting and typechecking. Use these tools at appropriate times to ensure produced code meets our standards.
29-
30-
To run all checks in a CLI, run `uv run ./checks.sh --agent-mode` (agent mode will reduce tokens unless there is an error).
28+
Agents have access to a range of tools for running tests, linting, formatting and typechecking. Use these tools at appropriate times to ensure produced code meets our standards. All checks must pass before merging. When iterating on a specific failure, use the targeted command before re-running the full suite.
29+
30+
- **All checks:** `uv run ./checks.sh --agent-mode` (agent mode suppresses output unless there's a failure)
31+
32+
| Check | Fix | Description |
33+
|---|---|---|
34+
| `uv run ruff check` | `uv run ruff check --fix` | Python lint |
35+
| `uv run ruff format --check .` | `uv run ruff format .` | Python format |
36+
| `uv run ty check` || Python type check |
37+
| `uv run python3 -m pytest --benchmark-quiet -q -n auto .` || Python tests |
38+
| `npm run lint` || Web lint (from `app/web_ui`) |
39+
| `npm run format_check` | `npm run format` | Web format (from `app/web_ui`) |
40+
| `npm run check` || Web type check and svelte check (from `app/web_ui`) |
41+
| `npm run test_run` || Web tests (from `app/web_ui`) |
42+
| `npm run build` || Web build (from `app/web_ui`) |
43+
| `app/web_ui/src/lib/check_schema.sh` | `app/web_ui/src/lib/generate_schema.sh` | OpenAPI client up to date |
44+
| `misspell` || Spelling check (optional if not installed) |
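
The check/fix pairing in this table lends itself to a small lookup when iterating on one failure. The sketch below is illustrative only: it covers just the first three rows, `report` and `shell_runner` are hypothetical names, and the runner is injectable so the logic can be exercised without invoking `uv`.

```python
from __future__ import annotations

import subprocess
from typing import Callable

# (check command, fix command or None), copied from the first three table rows.
PYTHON_CHECKS: list[tuple[str, str | None]] = [
    ("uv run ruff check", "uv run ruff check --fix"),
    ("uv run ruff format --check .", "uv run ruff format ."),
    ("uv run ty check", None),
]


def shell_runner(command: str) -> int:
    """Run a command through the shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def report(
    checks: list[tuple[str, str | None]],
    run: Callable[[str], int] = shell_runner,
) -> list[str]:
    """Run each check; for failures, point at the fix command if one exists."""
    messages: list[str] = []
    for check, fix in checks:
        if run(check) == 0:
            continue
        hint = f"try `{fix}`" if fix else "no automatic fix listed"
        messages.append(f"{check} failed: {hint}")
    return messages
```

Calling `report(PYTHON_CHECKS)` with the default runner executes the real commands, so it should be run from the repo root.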
3145

3246
### Agent Prompts
3347

@@ -37,7 +51,9 @@ These prompts can be accessed from the `get_prompt` tool, and you may request se
3751

3852
### General Agent Guidance
3953

54+
- When spawning subagents, always use the same model as the current agent
4055
- Don't include comments in code explaining changes, explain changes in chat instead.
56+
- Use `TODO` comments to mark any temporary code, placeholders, or items that must be addressed before merging to main. CI enforces that no `TODO` comments remain on main, so they are a safe way to flag work-in-progress during development. Clean up all `TODO` comments before the final PR.
4157
- Before wrapping up a task, run appropriate tools for linting, testing, formatting and typechecking. Fix any issues you introduced.
4258

4359
### Code Review Guidelines
Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
from http import HTTPStatus
from typing import Any

import httpx

from ... import errors
from ...client import AuthenticatedClient, Client
from ...models.create_api_key_response import CreateApiKeyResponse
from ...types import Response


def _get_kwargs() -> dict[str, Any]:

    _kwargs: dict[str, Any] = {
        "method": "post",
        "url": "/v1/create_api_key",
    }

    return _kwargs


def _parse_response(*, client: AuthenticatedClient | Client, response: httpx.Response) -> CreateApiKeyResponse | None:
    if response.status_code == 201:
        response_201 = CreateApiKeyResponse.from_dict(response.json())

        return response_201

    if client.raise_on_unexpected_status:
        raise errors.UnexpectedStatus(response.status_code, response.content)
    else:
        return None


def _build_response(
    *, client: AuthenticatedClient | Client, response: httpx.Response
) -> Response[CreateApiKeyResponse]:
    return Response(
        status_code=HTTPStatus(response.status_code),
        content=response.content,
        headers=response.headers,
        parsed=_parse_response(client=client, response=response),
    )


def sync_detailed(
    *,
    client: AuthenticatedClient,
) -> Response[CreateApiKeyResponse]:
    """Create Api Key

    Create a new API key for the authenticated user.

    Requires a Kinde OAuth access token (not an API key).
    Returns the raw API key which can then be used for subsequent API calls.

    Raises:
        errors.UnexpectedStatus: If the server returns an undocumented status code and Client.raise_on_unexpected_status is True.
        httpx.TimeoutException: If the request takes longer than Client.timeout.

    Returns:
        Response[CreateApiKeyResponse]
    """

    kwargs = _get_kwargs()

    response = client.get_httpx_client().request(
        **kwargs,
    )

    return _build_response(client=client, response=response)


def sync(
    *,
    client: AuthenticatedClient,
) -> CreateApiKeyResponse | None:
    """Create Api Key

    Create a new API key for the authenticated user.

    Requires a Kinde OAuth access token (not an API key).
    Returns the raw API key which can then be used for subsequent API calls.

    Raises:
        errors.UnexpectedStatus: If the server returns an undocumented status code and Client.raise_on_unexpected_status is True.
        httpx.TimeoutException: If the request takes longer than Client.timeout.

    Returns:
        CreateApiKeyResponse
    """

    return sync_detailed(
        client=client,
    ).parsed


async def asyncio_detailed(
    *,
    client: AuthenticatedClient,
) -> Response[CreateApiKeyResponse]:
    """Create Api Key

    Create a new API key for the authenticated user.

    Requires a Kinde OAuth access token (not an API key).
    Returns the raw API key which can then be used for subsequent API calls.

    Raises:
        errors.UnexpectedStatus: If the server returns an undocumented status code and Client.raise_on_unexpected_status is True.
        httpx.TimeoutException: If the request takes longer than Client.timeout.

    Returns:
        Response[CreateApiKeyResponse]
    """

    kwargs = _get_kwargs()

    response = await client.get_async_httpx_client().request(**kwargs)

    return _build_response(client=client, response=response)


async def asyncio(
    *,
    client: AuthenticatedClient,
) -> CreateApiKeyResponse | None:
    """Create Api Key

    Create a new API key for the authenticated user.

    Requires a Kinde OAuth access token (not an API key).
    Returns the raw API key which can then be used for subsequent API calls.

    Raises:
        errors.UnexpectedStatus: If the server returns an undocumented status code and Client.raise_on_unexpected_status is True.
        httpx.TimeoutException: If the request takes longer than Client.timeout.

    Returns:
        CreateApiKeyResponse
    """

    return (
        await asyncio_detailed(
            client=client,
        )
    ).parsed
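
The status handling in `_parse_response` is the part callers are most likely to trip over: only a 201 yields a parsed value, and any other status either raises or returns `None` depending on `raise_on_unexpected_status`. The sketch below mirrors that branching with stand-in types so it runs on its own. `FakeClient`, `FakeResponse`, `parse`, and the local `UnexpectedStatus` are hypothetical substitutes for the generated client's classes, and the parsed value is simplified to the raw bytes.

```python
from __future__ import annotations

from dataclasses import dataclass


class UnexpectedStatus(Exception):
    """Stand-in for errors.UnexpectedStatus in the generated client (assumption)."""

    def __init__(self, status_code: int, content: bytes):
        super().__init__(f"Unexpected status code: {status_code}")
        self.status_code = status_code
        self.content = content


@dataclass
class FakeClient:
    raise_on_unexpected_status: bool = False


@dataclass
class FakeResponse:
    status_code: int
    content: bytes = b""


def parse(client: FakeClient, response: FakeResponse) -> bytes | None:
    # Same three-way branch as _parse_response in the diff above.
    if response.status_code == 201:
        return response.content
    if client.raise_on_unexpected_status:
        raise UnexpectedStatus(response.status_code, response.content)
    return None
```

One consequence worth noting: with `raise_on_unexpected_status` left off, callers of `sync` or `asyncio` need to handle a `None` result for any non-201 response.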

app/desktop/studio_server/api_client/kiln_ai_server_client/models/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@
1313
from .check_model_supported_response import CheckModelSupportedResponse
1414
from .clarify_spec_input import ClarifySpecInput
1515
from .clarify_spec_output import ClarifySpecOutput
16+
from .create_api_key_response import CreateApiKeyResponse
1617
from .examples_for_feedback_item import ExamplesForFeedbackItem
1718
from .examples_with_feedback_item import ExamplesWithFeedbackItem
1819
from .generate_batch_input import GenerateBatchInput
@@ -63,6 +64,7 @@
6364
"CheckModelSupportedResponse",
6465
"ClarifySpecInput",
6566
"ClarifySpecOutput",
67+
"CreateApiKeyResponse",
6668
"ExamplesForFeedbackItem",
6769
"ExamplesWithFeedbackItem",
6870
"GenerateBatchInput",
