Kiln-AI
diff --git a/.agents/api_code_review.md b/.agents/api_code_review.md
Lines changed: 59 additions & 0 deletions
diff --git a/.agents/code_review_guidelines.md b/.agents/code_review_guidelines.md
Lines changed: 9 additions & 0 deletions
diff --git a/app/desktop/studio_server/api_models/copilot_models.py b/app/desktop/studio_server/api_models/copilot_models.py
Lines changed: 7 additions & 5 deletions
diff --git a/app/desktop/studio_server/copilot_api.py b/app/desktop/studio_server/copilot_api.py
Lines changed: 19 additions & 8 deletions
diff --git a/app/desktop/studio_server/data_gen_api.py b/app/desktop/studio_server/data_gen_api.py
Lines changed: 88 additions & 22 deletions
@@ -0,0 +1,59 @@
+# FastAPI / OpenAPI Standards
+
+Our OpenAPI spec drives our SDK, Scalar docs, and agent tool use (Kiln Chat calls our APIs). Every endpoint must be well-documented and consistently named. Flag violations during code review.
+
+**Required on every endpoint:**
+
+1. **`tags=[...]`** on the route decorator. Every endpoint must belong to a tag group (e.g. `tags=["Projects"]`). Untagged endpoints break Scalar navigation and agent tool discovery. Prefer existing tags and create new ones only when necessary. All tags should be documented in `tags_metadata` in `server.py`.
+2. **`summary=`** on the route decorator. A short, unique name for the operation. Summaries must be unambiguous — if two endpoints could share the same summary (e.g. "Edit Tags"), qualify them ("Edit Run Tags", "Edit Document Tags").
+3. **Docstring** on the handler function (optional if behavior is completely obvious from the path, method, and summary). When provided, docstrings should be terse — one sentence or a fragment. Never pad with filler like "This endpoint allows you to...". Longer descriptions (2–3 sentences) are warranted only when distinguishing easily confused endpoints, documenting non-obvious side effects, or noting prerequisites. Exclude if the `summary` string already covers the same level of detail.
+4. **`Path(description=...)`** on every path parameter, using `Annotated[str, Path(description="...")]` syntax. Recurring ID parameters must use consistent standard descriptions (e.g. `"The unique identifier of the project."`, `"The unique identifier of the task within the project."`).
+5. **`Query(description=...)`** on every query parameter.
+6. **`Field(description=...)`** on Pydantic model properties that aren't completely self-evident from name + type.
+7. **Class docstring** on Pydantic models used as API request/response bodies. These become the schema description in the OpenAPI spec, which agents and SDK users see when inspecting request/response types. Optional but suggested if non-obvious from name.
+
+**Correct HTTP methods:**
+
+- **GET** must be idempotent and side-effect-free. If an endpoint creates, modifies, or deletes data, it must not be GET. We previously had GET endpoints that established connections and ran evaluations — this is wrong and confuses both agents and humans.
+- **POST** for creation and actions that trigger execution.
+- **PATCH** for partial updates.
+- **DELETE** for deletion.
+- The only exception is SSE streaming endpoints, which must use GET due to browser `EventSource` constraints. These must have descriptions explicitly noting the mutation and the SSE reason.
+
+**Naming and path conventions:**
+
+- **Always use plural nouns** in path segments: `/tasks/{task_id}`, never `/task/{task_id}`. Same for `/projects`, `/specs`, `/evals`, `/runs`, `/prompts`, `/documents`, `/skills`, `/run_configs`, etc. We had inconsistencies where GET used plural but POST/PATCH/DELETE used singular — this is confusing and must be caught.
+- **Paths should be descriptive and intuitive.** Paths should follow REST conventions and be as clear as possible without docstrings. Paths and descriptions should distinguish similar-sounding endpoints. If a path could reasonably be improved, suggest a rename.
+- **Consistent path structure** for related resources. All operations on the same resource type should share a common path prefix (e.g. all run config operations under `/run_configs`, not split across `/task_run_config`, `/mcp_run_config`, `/run_config`). Do not use similar but different prefixes, as this commonly trips up agents.
+- **No trailing slashes** on paths. Use `/run_configs` not `/run_configs/`. Trailing slashes cause inconsistency between endpoints and can break client routing.
+
+**Example of a well-documented endpoint:**
+
+```python
+@app.delete(
+    "/api/projects/{project_id}",
+    summary="Delete Project",
+    tags=["Projects"],
+)
+async def delete_project(
+    project_id: Annotated[
+        str, Path(description="The unique identifier of the project.")
+    ],
+) -> dict:
+    """Removes the project from Kiln but does not delete the files from disk."""
+```
+
+**What to flag in code review:**
+
+- Missing `tags=` on any route decorator
+- Missing `summary=` on any route decorator
+- Missing `Path(description=...)` or `Query(description=...)` on any parameter
+- GET endpoints that perform mutations (unless SSE with documented justification)
+- Singular nouns in path segments where plural is standard
+- Ambiguous or duplicate summaries across endpoints
+- Trailing slashes on paths
+- Inconsistent path naming for the same resource type
+- Wordy or filler-padded docstrings ("This endpoint allows you to...")
+- Docstrings containing code artifacts, raw `Args:` blocks, or formatting that doesn't read as clean prose in OpenAPI
+- Pydantic models used in API request/response types (nested included) missing a class docstring, if the class name alone isn't obvious
+- Custom string types with validator-based constraints that don't surface in the OpenAPI schema. Use `StringConstraints` in the `Annotated` type definition so `minLength`/`maxLength` appear automatically (see `FilenameString`, `SkillNameString` for examples). Don't duplicate constraints in individual `Field()` calls.
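Several items in the review checklist above (missing tags, trailing slashes, duplicate summaries, undocumented path/query parameters) are mechanical enough that they could be checked against the generated OpenAPI document. The following is a minimal sketch of that idea, not part of this PR: `lint_openapi` is a hypothetical helper, and it assumes the spec has already been parsed into a plain `dict`. A missing `summary=` is deliberately not checked here, because FastAPI fills in a default summary from the handler name, so that case has to be caught in the source rather than in the spec.

```python
from collections import Counter

# Keys under a path item that denote operations (as opposed to e.g. "parameters").
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def lint_openapi(spec: dict) -> list[str]:
    """Return human-readable findings for a parsed OpenAPI document (hypothetical helper)."""
    findings: list[str] = []
    summaries: Counter = Counter()

    for path, operations in spec.get("paths", {}).items():
        # Checklist item: no trailing slashes on paths.
        if len(path) > 1 and path.endswith("/"):
            findings.append(f"{path}: trailing slash")

        for method, operation in operations.items():
            if method not in HTTP_METHODS:
                continue
            label = f"{method.upper()} {path}"

            # Checklist item: every endpoint belongs to a tag group.
            if not operation.get("tags"):
                findings.append(f"{label}: missing tags")

            # Collect summaries so duplicates can be reported afterwards.
            summary = operation.get("summary")
            if summary:
                summaries[summary] += 1

            # Checklist item: path and query parameters carry a description.
            for param in operation.get("parameters", []):
                if param.get("in") in ("path", "query") and not param.get("description"):
                    findings.append(
                        f"{label}: parameter '{param.get('name')}' has no description"
                    )

    # Checklist item: summaries must be unambiguous across endpoints.
    for summary, count in summaries.items():
        if count > 1:
            findings.append(f"duplicate summary '{summary}' used by {count} operations")

    return findings
```

In a FastAPI project the input could come from `app.openapi()`. Checks that need judgment (plural nouns, GET endpoints that mutate state, docstring quality) are left to the human or agent reviewer.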
@@ -25,6 +25,15 @@ The SDK in `/libs/core` is a SDK/library we expose to third parties. We code rev
 - All visible classes/vars should have docstrings explaining their purpose. These will be pulled into 3rd party docs automatically. The doc strings should be written for 3rd party devs learning the SDK.
 - Performance: the base_adapter and litellm_adapter are performance critical. They are the core run-loop of our agent system. We should avoid anything that would slow them down (file reads should be done once and passed in, etc). It's critical to avoid blocking IO - a process may be executing hundreds of these in parallel.
 
+### FastAPI / OpenAPI Standards
+
+If the change impacts API endpoints, read `.agents/api_code_review.md` for instructions on how to code review API endpoints.
+
+Changes impacting APIs include:
+ - adding/removing/modifying a FastAPI endpoint (`@app.get`, `@app.delete`, etc.)
+ - adding/removing/modifying a Pydantic model which is used in an API endpoint as an input/return value (including nested models)
+
 ### Project specific guide
 
 - **`ModelName` enum and user input:** Do not use the `ModelName` enum for validation or typing of user-provided model identifiers (for example in a Pydantic request body that validates an API payload). Kiln loads additional models over the air; those models can use names that are not members of the locally shipped `ModelName` enum. If request validation is tied to the enum, a model that is valid according to the merged model list will fail validation. Appropriate uses of `ModelName` include aliasing a constant chosen at build time (for example default config that references a known shipped model) and entries inside the `ml_model_list` provider definitions.
+
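The `ModelName` guidance above can be illustrated with a small standard-library sketch. Everything here is hypothetical: `ShippedModelName` stands in for the locally shipped enum, and `MERGED_MODEL_NAMES` stands in for the merged list that includes models loaded over the air. It only shows why enum-bound validation rejects a name that the merged list accepts. It is not how Kiln actually implements validation.

```python
from enum import Enum


class ShippedModelName(str, Enum):
    """Hypothetical stand-in for the locally shipped `ModelName` enum."""

    model_a = "model_a"
    model_b = "model_b"


# Hypothetical merged list: shipped names plus a name loaded over the air.
MERGED_MODEL_NAMES = {m.value for m in ShippedModelName} | {"model_c_remote"}


def validate_with_enum(name: str) -> str:
    """Enum-bound validation: raises ValueError for any name not shipped locally."""
    return ShippedModelName(name).value


def validate_with_merged_list(name: str) -> str:
    """Plain-string validation against the merged list, evaluated at request time."""
    if name not in MERGED_MODEL_NAMES:
        raise ValueError(f"unknown model: {name}")
    return name
```

With these definitions, `validate_with_merged_list("model_c_remote")` succeeds while `validate_with_enum("model_c_remote")` raises `ValueError`, which is the failure mode the guideline warns about for request bodies typed with the enum.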
@@ -8,16 +8,18 @@
 class TaskInfoApi(BaseModel):
     """Task information for copilot API calls."""
 
-    task_prompt: str
-    task_input_schema: str
-    task_output_schema: str
+    task_prompt: str = Field(description="The task's prompt.")
+    task_input_schema: str = Field(description="The task's input JSON schema.")
+    task_output_schema: str = Field(description="The task's output JSON schema.")
 
 
 class TaskMetadataApi(BaseModel):
     """Metadata about the model used for a task."""
 
-    model_name: str
-    model_provider_name: ModelProviderName
+    model_name: str = Field(description="The name of the AI model used.")
+    model_provider_name: ModelProviderName = Field(
+        description="The provider hosting the model (e.g. OpenAI, Anthropic)."
+    )
 
 
 class SyntheticDataGenerationStepConfigApi(BaseModel):
 
@@ -1,4 +1,5 @@
 import logging
+from typing import Annotated
 
 from app.desktop.studio_server.api_client.kiln_ai_server_client.api.copilot import (
     clarify_spec_v1_copilot_clarify_spec_post,
@@ -47,7 +48,7 @@
     get_copilot_api_key,
 )
 from app.desktop.studio_server.utils.response_utils import unwrap_response
-from fastapi import FastAPI, HTTPException
+from fastapi import FastAPI, HTTPException, Path
 from kiln_ai.datamodel import TaskRun
 from kiln_ai.datamodel.basemodel import FilenameString
 from kiln_ai.datamodel.datamodel_enums import Priority
@@ -113,7 +114,7 @@ class CreateSpecWithCopilotRequest(BaseModel):
 
 
 def connect_copilot_api(app: FastAPI):
-    @app.post("/api/copilot/clarify_spec")
+    @app.post("/api/copilot/clarify_spec", tags=["Copilot"])
     async def clarify_spec(input: ClarifySpecApiInput) -> ClarifySpecApiOutput:
         api_key = get_copilot_api_key()
         client = get_authenticated_client(api_key)
@@ -139,7 +140,7 @@ async def clarify_spec(input: ClarifySpecApiInput) -> ClarifySpecApiOutput:
             detail="Unknown error.",
         )
 
-    @app.post("/api/copilot/refine_spec")
+    @app.post("/api/copilot/refine_spec", tags=["Copilot"])
     async def refine_spec(input: RefineSpecApiInput) -> RefineSpecApiOutput:
         api_key = get_copilot_api_key()
         client = get_authenticated_client(api_key)
@@ -165,7 +166,7 @@ async def refine_spec(input: RefineSpecApiInput) -> RefineSpecApiOutput:
             detail="Unknown error.",
         )
 
-    @app.post("/api/copilot/generate_batch")
+    @app.post("/api/copilot/generate_batch", tags=["Copilot"])
     async def generate_batch(input: GenerateBatchApiInput) -> GenerateBatchApiOutput:
         api_key = get_copilot_api_key()
         client = get_authenticated_client(api_key)
@@ -191,7 +192,7 @@ async def generate_batch(input: GenerateBatchApiInput) -> GenerateBatchApiOutput
             detail="Unknown error.",
         )
 
-    @app.post("/api/copilot/question_spec")
+    @app.post("/api/copilot/question_spec", tags=["Copilot"])
     async def question_spec(
         input: SpecQuestionerApiInput,
     ) -> QuestionSet:
@@ -219,7 +220,7 @@ async def question_spec(
             detail="Unknown error.",
         )
 
-    @app.post("/api/copilot/refine_spec_with_question_answers")
+    @app.post("/api/copilot/refine_spec_with_question_answers", tags=["Copilot"])
     async def submit_question_answers(
         request: SubmitAnswersRequest,
     ) -> RefineSpecApiOutput:
@@ -245,9 +246,19 @@ async def submit_question_answers(
             detail="Unknown error.",
         )
 
-    @app.post("/api/projects/{project_id}/tasks/{task_id}/spec_with_copilot")
+    @app.post(
+        "/api/projects/{project_id}/tasks/{task_id}/spec_with_copilot",
+        tags=["Copilot"],
+    )
     async def create_spec_with_copilot(
-        project_id: str, task_id: str, request: CreateSpecWithCopilotRequest
+        project_id: Annotated[
+            str, Path(description="The unique identifier of the project.")
+        ],
+        task_id: Annotated[
+            str,
+            Path(description="The unique identifier of the task within the project."),
+        ],
+        request: CreateSpecWithCopilotRequest,
     ) -> Spec:
         """Create a spec using Kiln Copilot.
 
 
@@ -1,6 +1,6 @@
-from typing import Literal
+from typing import Annotated, Literal
 
-from fastapi import FastAPI, HTTPException
+from fastapi import FastAPI, HTTPException, Path, Query
 from kiln_ai.adapters.adapter_registry import adapter_for_task, load_skills_for_task
 from kiln_ai.adapters.data_gen.data_gen_task import (
     DataGenCategoriesTask,
@@ -55,7 +55,7 @@ class DataGenSampleApiInput(BaseModel):
     topic: list[str] = Field(description="Topic path for sample generation", default=[])
     num_samples: int = Field(description="Number of samples to generate", default=8)
     gen_type: Literal["training", "eval"] = Field(
-        description="The type of task to generate topics for"
+        description="The type of data generation: eval or training."
     )
     guidance: str | None = Field(
         description="Optional custom guidance for generation",
@@ -122,9 +122,20 @@ class SaveQnaPairInput(BaseModel):
 
 
 def connect_data_gen_api(app: FastAPI):
-    @app.post("/api/projects/{project_id}/tasks/{task_id}/generate_categories")
+    @app.post(
+        "/api/projects/{project_id}/tasks/{task_id}/generate_categories",
+        summary="Generate Categories",
+        tags=["Synthetic Data"],
+    )
     async def generate_categories(
-        project_id: str, task_id: str, input: DataGenCategoriesApiInput
+        project_id: Annotated[
+            str, Path(description="The unique identifier of the project.")
+        ],
+        task_id: Annotated[
+            str,
+            Path(description="The unique identifier of the task within the project."),
+        ],
+        input: DataGenCategoriesApiInput,
     ) -> TaskRun:
         project = project_from_id(project_id)
         task = task_from_id(project_id, task_id)
@@ -155,9 +166,20 @@ async def generate_categories(
         categories_run = await adapter.invoke(task_input.model_dump())
         return categories_run
 
-    @app.post("/api/projects/{project_id}/tasks/{task_id}/generate_inputs")
+    @app.post(
+        "/api/projects/{project_id}/tasks/{task_id}/generate_inputs",
+        summary="Generate Inputs",
+        tags=["Synthetic Data"],
+    )
     async def generate_samples(
-        project_id: str, task_id: str, input: DataGenSampleApiInput
+        project_id: Annotated[
+            str, Path(description="The unique identifier of the project.")
+        ],
+        task_id: Annotated[
+            str,
+            Path(description="The unique identifier of the task within the project."),
+        ],
+        input: DataGenSampleApiInput,
     ) -> TaskRun:
         project = project_from_id(project_id)
         task = task_from_id(project_id, task_id)
@@ -187,10 +209,19 @@ async def generate_samples(
         samples_run = await adapter.invoke(task_input.model_dump())
         return samples_run
 
-    @app.post("/api/projects/{project_id}/tasks/{task_id}/save_sample")
+    @app.post(
+        "/api/projects/{project_id}/tasks/{task_id}/save_sample",
+        summary="Save Sample",
+        tags=["Synthetic Data"],
+    )
     async def save_sample(
-        project_id: str,
-        task_id: str,
+        project_id: Annotated[
+            str, Path(description="The unique identifier of the project.")
+        ],
+        task_id: Annotated[
+            str,
+            Path(description="The unique identifier of the task within the project."),
+        ],
         task_run: TaskRun,
     ) -> TaskRun:
         """
@@ -202,12 +233,24 @@ async def save_sample(
         task_run.save_to_file()
         return task_run
 
-    @app.post("/api/projects/{project_id}/tasks/{task_id}/generate_sample")
+    @app.post(
+        "/api/projects/{project_id}/tasks/{task_id}/generate_sample",
+        summary="Generate Sample",
+        tags=["Synthetic Data"],
+    )
     async def generate_sample(
-        project_id: str,
-        task_id: str,
+        project_id: Annotated[
+            str, Path(description="The unique identifier of the project.")
+        ],
+        task_id: Annotated[
+            str,
+            Path(description="The unique identifier of the task within the project."),
+        ],
         sample: DataGenSaveSamplesApiInput,
-        session_id: str | None = None,
+        session_id: Annotated[
+            str | None,
+            Query(description="Optional session ID to group generated samples."),
+        ] = None,
     ) -> TaskRun:
         task = task_from_id(project_id, task_id)
 
@@ -260,12 +303,24 @@ async def generate_sample(
 
         return run
 
-    @app.post("/api/projects/{project_id}/tasks/{task_id}/generate_qna")
+    @app.post(
+        "/api/projects/{project_id}/tasks/{task_id}/generate_qna",
+        summary="Generate Q&A Pairs",
+        tags=["Synthetic Data"],
+    )
     async def generate_qna_pairs(
-        project_id: str,
-        task_id: str,
+        project_id: Annotated[
+            str, Path(description="The unique identifier of the project.")
+        ],
+        task_id: Annotated[
+            str,
+            Path(description="The unique identifier of the task within the project."),
+        ],
         input: DataGenQnaApiInput,
-        session_id: str | None = None,
+        session_id: Annotated[
+            str | None,
+            Query(description="Optional session ID to group generated Q&A pairs."),
+        ] = None,
     ) -> TaskRun:
         project = project_from_id(project_id)
         if not project:
@@ -303,12 +358,23 @@ async def generate_qna_pairs(
 
         return qna_run
 
-    @app.post("/api/projects/{project_id}/tasks/{task_id}/save_qna_pair")
+    @app.post(
+        "/api/projects/{project_id}/tasks/{task_id}/save_qna_pair",
+        summary="Save Q&A Pair",
+        tags=["Synthetic Data"],
+    )
     async def save_qna_pair(
-        project_id: str,
-        task_id: str,
+        project_id: Annotated[
+            str, Path(description="The unique identifier of the project.")
+        ],
+        task_id: Annotated[
+            str,
+            Path(description="The unique identifier of the task within the project."),
+        ],
         input: SaveQnaPairInput,
-        session_id: str,
+        session_id: Annotated[
+            str, Query(description="Session ID to group saved Q&A pairs.")
+        ],
     ) -> TaskRun:
         """
         Save a single QnA pair as a TaskRun. We store the task's system prompt