KIL-534 Add Feedback data model on TaskRun #1267

Merged: sfierro merged 6 commits into main from KIL-534/feedback-data-model on Apr 14, 2026

Conversation

@sfierro (Contributor) commented Apr 14, 2026

Linear ticket: https://linear.app/kiln-ai/issue/KIL-534/improve-feedback-on-taskrun

Description

Replace the single user_feedback string field on TaskRun with a proper Feedback data model that supports multiple feedback entries per task run.

Changes

  • New Feedback model (libs/core/kiln_ai/datamodel/feedback.py): A KilnParentedModel under TaskRun with feedback (text) and source (FeedbackSource enum: run-page, spec-feedback)
  • TaskRun now extends KilnParentModel so it can have Feedback children, stored as separate files to avoid write conflicts when multiple people give feedback on the same run
  • Removed user_feedback field from TaskRun — no backwards compat needed since we haven't shipped yet
  • New REST API endpoints: GET/POST /api/.../runs/{run_id}/feedback for listing and creating feedback
  • Updated copilot code: ExampleWithFeedbackApi field renamed from user_feedback to feedback, removed user_feedback from task run creation in copilot utils
  • Frontend: Updated spec builder to use new field name, regenerated API schema
  • Follow-up ticket: Created KIL-537 for replacing repair UI with feedback UI
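The parent/child layout described above can be sketched with stdlib dataclasses. This is a deliberately simplified stand-in, not the actual Pydantic/KilnParentedModel implementation, and the file naming scheme here is hypothetical; it only illustrates why per-entry files avoid write conflicts:

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from enum import Enum
from pathlib import Path


class FeedbackSource(str, Enum):
    run_page = "run-page"
    spec_feedback = "spec-feedback"


@dataclass
class Feedback:
    feedback: str
    source: FeedbackSource
    parent_run_id: str

    def save_to_file(self, root: Path, feedback_id: str) -> Path:
        # Each Feedback child gets its own file under the run's directory,
        # so two people saving feedback on the same run never write the
        # same file (the write-conflict avoidance described in the PR).
        path = root / self.parent_run_id / "feedback" / f"{feedback_id}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        payload = {**asdict(self), "source": self.source.value}
        path.write_text(json.dumps(payload), encoding="utf-8")
        return path


root = Path(tempfile.mkdtemp())
a = Feedback("looks good", FeedbackSource.run_page, "run-1").save_to_file(root, "fb-a")
b = Feedback("needs work", FeedbackSource.spec_feedback, "run-1").save_to_file(root, "fb-b")
print(a != b)  # two feedback entries on the same run land in separate files
```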

Test plan

  • New unit tests for Feedback model (creation, persistence, parent relationship, serialization)
  • New API tests for feedback endpoints (list, create, error cases)
  • Updated copilot model/utils tests
  • All checks pass (checks.sh --agent-mode)
  • Manual verification of spec builder flow
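For manual verification, the new POST endpoint can be exercised with a request shaped like the one below (a stdlib urllib sketch; the host, port, and IDs are hypothetical, while the path shape follows the PR's `POST /api/projects/{p}/tasks/{t}/runs/{r}/feedback` flow):

```python
import json
from urllib import request

# Hypothetical host/port and IDs; only the path shape comes from the PR.
url = "http://localhost:8000/api/projects/p1/tasks/t1/runs/r1/feedback"
body = json.dumps({"feedback": "Output was truncated", "source": "run-page"}).encode()
req = request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would return the created Feedback entry as JSON;
# it is commented out here so the sketch runs without a live server.
print(body.decode())
```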

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added feedback management for task runs: create and list feedback entries with source attribution ("run-page" or "spec-feedback")
    • Feedback is now a first-class structured entity associated with runs (replacing the old single user_feedback field)
  • Tests

    • Added coverage for the feedback data model and feedback API endpoints, including persistence and validation scenarios

Replace the single `user_feedback` string field on TaskRun with a proper
Feedback model that supports multiple feedback entries per run. Feedback
is a parented model under TaskRun, stored as separate files to avoid
write conflicts when multiple people provide feedback.

- Add Feedback model (feedback text + FeedbackSource enum)
- Make TaskRun a parent model with feedback children
- Remove user_feedback field from TaskRun
- Add REST API endpoints (list/create) for feedback on task runs
- Update copilot models, utils, and frontend spec builder
- Create follow-up ticket KIL-537 for repair UI replacement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai (bot) commented Apr 14, 2026

📝 Walkthrough

Refactors feedback handling: removes user_feedback on TaskRun, introduces a typed Feedback model and FeedbackSource enum, adds GET/POST feedback endpoints for task runs, updates API schema and exports, and adjusts copilot utilities and tests to persist feedback as child objects.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Feedback Datamodel**<br>`libs/core/kiln_ai/datamodel/feedback.py`, `libs/core/kiln_ai/datamodel/datamodel_enums.py`, `libs/core/kiln_ai/datamodel/__init__.py` | Add the Feedback model and FeedbackSource enum; export both from the datamodel package. |
| **TaskRun Model**<br>`libs/core/kiln_ai/datamodel/task_run.py` | Remove the user_feedback field; add a KilnParentModel base with a parent_of={"feedback": Feedback} mapping and a typed feedback() accessor. |
| **Server API & Integration**<br>`libs/server/kiln_server/feedback_api.py`, `libs/server/kiln_server/server.py` | Add GET/POST feedback endpoints for task runs, a CreateFeedbackRequest model, and a helper resolver; register routes and an OpenAPI tag. |
| **API Schema / Web UI Types**<br>`app/web_ui/src/lib/api_schema.d.ts` | Add Feedback, CreateFeedbackRequest, and FeedbackSource schemas; add feedback GET/POST operations; remove user_feedback from TaskRun input/output types. |
| **Desktop Copilot Flow**<br>`app/desktop/studio_server/utils/copilot_utils.py`, `app/desktop/studio_server/copilot_api.py`, `app/desktop/studio_server/utils/test_copilot_utils.py`, `app/desktop/studio_server/test_copilot_api.py` | Change task-run creation to return DatasetTaskRuns (collects runs and pending feedback); create_task_run_from_reviewed now returns (TaskRun, feedback_text); save pending feedback after run save; update tests/mocks accordingly. |
| **Tests — Datamodel & Server**<br>`libs/core/kiln_ai/datamodel/test_feedback.py`, `libs/server/kiln_server/test_feedback_api.py` | Add comprehensive tests for the Feedback model (validation, persistence, relationships) and endpoint tests for listing/creating feedback, plus error cases. |
| **Agent Policy Annotations**<br>`libs/server/kiln_server/utils/agent_checks/annotations/*_feedback.json` | Add agent-check annotations for the GET and POST feedback endpoints (allow, no approval). |

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Server
    participant TaskRun
    participant Feedback
    participant Disk

    Note over Client,Server: Create Feedback Flow
    Client->>Server: POST /api/projects/{p}/tasks/{t}/runs/{r}/feedback\n{ feedback, source }
    Server->>TaskRun: resolve run by ids
    TaskRun-->>Server: TaskRun instance
    Server->>Feedback: construct Feedback(feedback, source, parent=TaskRun)
    Feedback-->>Server: Feedback instance
    Server->>Disk: save_to_file(Feedback)
    Disk-->>Server: persisted
    Server-->>Client: 200 Created Feedback (id, created_at, ...)

    Note over Client,Server: List Feedback Flow
    Client->>Server: GET /api/.../runs/{r}/feedback
    Server->>TaskRun: resolve run by ids
    TaskRun-->>Server: TaskRun instance
    Server->>TaskRun: feedback(readonly=True)
    TaskRun->>Disk: load child Feedback objects
    Disk-->>TaskRun: list[Feedback]
    TaskRun-->>Server: list[Feedback]
    Server-->>Client: 200 list[Feedback]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • leonardmq
  • chiang-daniel

Poem

🐰 I hopped a patch from run to tree,

Feedback now sits where it should be.
From run-page chirps to spec-feedback cheer,
POST and GET bring all voices near.
A little save, a stamped ID — hooray for clarity!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

| Check name | Status | Explanation / Resolution |
| --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.91%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding a Feedback data model to TaskRun, which aligns with the core objective of the PR. |
| Description check | ✅ Passed | The description is comprehensive and includes all required sections: it explains what the PR does, links to the related Linear ticket, describes the changes made, includes a test plan with status checkmarks, and confirms the CLA. |

@github-actions (bot) commented Apr 14, 2026

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/main...HEAD

  • app/desktop/studio_server/copilot_api.py (66.7%): Missing lines 434
  • app/desktop/studio_server/utils/copilot_utils.py (76.9%): Missing lines 234-238,243
  • libs/core/kiln_ai/datamodel/__init__.py (100%)
  • libs/core/kiln_ai/datamodel/datamodel_enums.py (100%)
  • libs/core/kiln_ai/datamodel/feedback.py (100%)
  • libs/core/kiln_ai/datamodel/task_run.py (80.0%): Missing lines 155
  • libs/server/kiln_server/feedback_api.py (100%)
  • libs/server/kiln_server/server.py (100%)

Summary

  • Total: 73 lines
  • Missing: 8 lines
  • Coverage: 89%

Line-by-line

View line-by-line diff coverage

app/desktop/studio_server/copilot_api.py

Lines 430-438

  430 
  431             for run in task_runs:
  432                 run.save_to_file()
  433                 saved_models.append(run)
! 434                 dataset_runs.save_pending_feedback(run)
  435 
  436             spec.save_to_file()
  437             saved_models.append(spec)
  438         except Exception:

app/desktop/studio_server/utils/copilot_utils.py

Lines 230-247

  230             self._pending_feedback[task_run.id] = feedback_text
  231 
  232     def save_pending_feedback(self, task_run: TaskRun) -> None:
  233         """Create Feedback children for a saved TaskRun if it has pending feedback."""
! 234         if not task_run.id:
! 235             return
! 236         feedback_text = self._pending_feedback.get(task_run.id)
! 237         if feedback_text:
! 238             fb = Feedback(
  239                 feedback=feedback_text,
  240                 source=FeedbackSource.spec_feedback,
  241                 parent=task_run,
  242             )
! 243             fb.save_to_file()
  244 
  245 
  246 def create_dataset_task_runs(
  247     all_examples: list[SampleApi],

libs/core/kiln_ai/datamodel/task_run.py

Lines 151-159

  151         """
  152         return self.thinking_training_data() is not None
  153 
  154     def feedback(self, readonly: bool = False) -> list[Feedback]:
! 155         return super().feedback(readonly=readonly)  # type: ignore
  156 
  157     # Workaround to return typed parent without importing Task
  158     def parent_task(self) -> Union["Task", None]:
  159         if self.parent is None or self.parent.__class__.__name__ != "Task":


@gemini-code-assist (bot) left a comment
Code Review

This pull request refactors the feedback system by replacing the single user_feedback field on TaskRun with a multi-source Feedback child model. It introduces new API endpoints for creating and listing feedback, updates the data models and schemas, and renames fields in the UI and client models to maintain consistency. Feedback was provided regarding potential data loss in the spec builder flow, as the previous feedback is now discarded without being migrated to the new model structure.

Comment thread: `app/desktop/studio_server/utils/copilot_utils.py`

@coderabbitai (bot) left a comment
Actionable comments posted: 1

🧹 Nitpick comments (2)
libs/server/kiln_server/feedback_api.py (1)

12-21: Consider rejecting whitespace-only feedback.

The min_length=1 constraint rejects empty strings but allows whitespace-only input like " " or "\t". If whitespace-only feedback should be rejected, add pattern or a custom validator.

Proposed fix if whitespace-only should be rejected
+import re
+from pydantic import field_validator
+
 class CreateFeedbackRequest(BaseModel):
     """Request body for creating feedback on a task run."""

     feedback: str = Field(
         min_length=1,
         description="Free-form text feedback on the task run.",
     )
     source: FeedbackSource = Field(
         description="Where this feedback originated.",
     )
+
+    @field_validator("feedback")
+    @classmethod
+    def feedback_not_blank(cls, v: str) -> str:
+        if not v.strip():
+            raise ValueError("Feedback cannot be blank")
+        return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/feedback_api.py` around lines 12 - 21, The
CreateFeedbackRequest model currently uses feedback: str = Field(min_length=1)
which still permits whitespace-only strings; update validation on the feedback
field in CreateFeedbackRequest to reject whitespace-only input by either adding
a pattern to the Field (e.g., require at least one non-whitespace character) or
adding a Pydantic validator on CreateFeedbackRequest.feedback that strips the
value and raises a ValueError if the stripped string is empty, ensuring the
model rejects strings like "   " while preserving valid non-empty feedback.
libs/core/kiln_ai/datamodel/test_feedback.py (1)

134-135: Specify explicit encoding when opening files.

For cross-platform consistency, explicitly specify encoding="utf-8" when opening JSON files.

Proposed fix
-        with open(fb.path) as f:
+        with open(fb.path, encoding="utf-8") as f:
             data = json.load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/datamodel/test_feedback.py` around lines 134 - 135, Change
the file-open call that reads the JSON fixture so it specifies the encoding
explicitly: when opening fb.path (the with open(fb.path) as f: ... block that
assigns json.load(f) to data) pass encoding="utf-8" to open to ensure consistent
cross-platform decoding.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/core/kiln_ai/datamodel/task_run.py`:
- Around line 154-155: Remove the dead runtime feedback method from TaskRun and
replace it with a TYPE_CHECKING-only stub so type checkers see the signature but
the dynamic method generated by KilnParentModel.__init_subclass__ (via
parent_of={"feedback": Feedback}) remains the actual runtime implementation;
specifically delete the existing def feedback(self, readonly: bool = False) ->
list[Feedback]: return super().feedback(...) and instead add an if
TYPE_CHECKING: stub declaring the same signature and return type (no body) to
preserve type information only.
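The TYPE_CHECKING-only stub described above could look roughly like this. The class below is a simplified stand-in, not the real TaskRun; it only demonstrates that the stub is visible to type checkers while leaving the runtime class untouched, so the dynamically generated accessor is not shadowed:

```python
from typing import TYPE_CHECKING


class TaskRun:  # simplified stand-in for the real Pydantic model
    if TYPE_CHECKING:
        # Seen only by type checkers; at runtime the real accessor is
        # generated dynamically by KilnParentModel via
        # parent_of={"feedback": Feedback}, so no method is defined here.
        def feedback(self, readonly: bool = False) -> list: ...


# TYPE_CHECKING is False at runtime, so the stub never materializes:
print(hasattr(TaskRun, "feedback"))  # False
```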

---

Nitpick comments:
In `@libs/core/kiln_ai/datamodel/test_feedback.py`:
- Around line 134-135: Change the file-open call that reads the JSON fixture so
it specifies the encoding explicitly: when opening fb.path (the with
open(fb.path) as f: ... block that assigns json.load(f) to data) pass
encoding="utf-8" to open to ensure consistent cross-platform decoding.

In `@libs/server/kiln_server/feedback_api.py`:
- Around line 12-21: The CreateFeedbackRequest model currently uses feedback:
str = Field(min_length=1) which still permits whitespace-only strings; update
validation on the feedback field in CreateFeedbackRequest to reject
whitespace-only input by either adding a pattern to the Field (e.g., require at
least one non-whitespace character) or adding a Pydantic validator on
CreateFeedbackRequest.feedback that strips the value and raises a ValueError if
the stripped string is empty, ensuring the model rejects strings like "   "
while preserving valid non-empty feedback.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a7599fc3-a24d-4dd3-a2e0-604613fed91f

📥 Commits

Reviewing files that changed from the base of the PR and between 4a3029d and d469cec.

⛔ Files ignored due to path filters (1)
  • app/desktop/studio_server/api_client/kiln_ai_server_client/models/examples_with_feedback_item.py is excluded by !app/desktop/studio_server/api_client/kiln_ai_server_client/**
📒 Files selected for processing (14)
  • app/desktop/studio_server/api_models/copilot_models.py
  • app/desktop/studio_server/api_models/test_copilot_models.py
  • app/desktop/studio_server/utils/copilot_utils.py
  • app/desktop/studio_server/utils/test_copilot_utils.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_builder/+page.svelte
  • libs/core/kiln_ai/datamodel/__init__.py
  • libs/core/kiln_ai/datamodel/datamodel_enums.py
  • libs/core/kiln_ai/datamodel/feedback.py
  • libs/core/kiln_ai/datamodel/task_run.py
  • libs/core/kiln_ai/datamodel/test_feedback.py
  • libs/server/kiln_server/feedback_api.py
  • libs/server/kiln_server/server.py
  • libs/server/kiln_server/test_feedback_api.py
💤 Files with no reviewable changes (2)
  • app/desktop/studio_server/utils/copilot_utils.py
  • app/desktop/studio_server/utils/test_copilot_utils.py

Comment thread: `libs/core/kiln_ai/datamodel/task_run.py`
sfierro and others added 2 commits April 13, 2026 20:39
The ticket only asked to remove user_feedback from TaskRun, not rename
it in the copilot/spec-builder code which uses it for a different purpose.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sfierro and others added 2 commits April 13, 2026 21:06
When creating TaskRuns from reviewed examples in the copilot flow,
create Feedback children (with source=spec-feedback) after saving
the run, so review feedback is not lost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai (bot) left a comment
🧹 Nitpick comments (1)
app/desktop/studio_server/test_copilot_api.py (1)

420-421: This success case still skips the new task-run/feedback loop.

DatasetTaskRuns() here is empty, so the test never exercises run.save_to_file() or dataset_runs.save_pending_feedback(run). Please seed at least one run and assert the feedback hook is invoked; otherwise the main regression this PR addresses can still pass untested.
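The seeded-run assertion the reviewer asks for could be sketched with mocks like this. Everything here is a stand-in (the real test would build an actual DatasetTaskRuns with one run, and the `task_runs` attribute name is assumed for illustration):

```python
from unittest.mock import MagicMock

# Stand-ins for the real TaskRun and DatasetTaskRuns objects.
run = MagicMock(name="task_run")
dataset_runs = MagicMock(name="dataset_runs")
dataset_runs.task_runs = [run]  # seed one run instead of leaving it empty

# Simplified version of the save loop in copilot_api.py:
for r in dataset_runs.task_runs:
    r.save_to_file()
    dataset_runs.save_pending_feedback(r)

# The assertions the review suggests adding:
run.save_to_file.assert_called_once()
dataset_runs.save_pending_feedback.assert_called_once_with(run)
print("feedback hook exercised")
```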

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/desktop/studio_server/test_copilot_api.py` around lines 420 - 421, The
test stubs out create_dataset_task_runs with an empty DatasetTaskRuns, so it
never exercises run.save_to_file() or dataset_runs.save_pending_feedback(run);
update the mock return_value for
"app.desktop.studio_server.copilot_api.create_dataset_task_runs" to include at
least one seeded run instance (e.g., a DatasetTaskRun/Run object with necessary
attributes/state) so the code path executes, and add assertions that the run's
save_to_file() was called and that dataset_runs.save_pending_feedback(run) (or
the mocked equivalent) was invoked; reference the mocked function
create_dataset_task_runs and the run/save_pending_feedback calls when making the
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77796bda-0016-4058-b4de-575dec72bd4c

📥 Commits

Reviewing files that changed from the base of the PR and between 8c58f6f and b0f890f.

📒 Files selected for processing (4)
  • app/desktop/studio_server/copilot_api.py
  • app/desktop/studio_server/test_copilot_api.py
  • app/desktop/studio_server/utils/copilot_utils.py
  • app/desktop/studio_server/utils/test_copilot_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/desktop/studio_server/utils/copilot_utils.py

Comment thread: `libs/core/kiln_ai/datamodel/test_feedback.py` (new file; anchored at line 1, `import json`)

@sfierro (Contributor, Author): double check test works for backward compat

@sfierro (Contributor, Author): doing in next PR

@sfierro merged commit 1c2f985 into main on Apr 14, 2026 (14 checks passed).
@sfierro deleted the KIL-534/feedback-data-model branch on April 14, 2026 at 23:58.