KIL-534 Add Feedback data model on TaskRun#1267
Conversation
Replace the single `user_feedback` string field on TaskRun with a proper Feedback model that supports multiple feedback entries per run. Feedback is a parented model under TaskRun, stored as separate files to avoid write conflicts when multiple people provide feedback.

- Add Feedback model (feedback text + FeedbackSource enum)
- Make TaskRun a parent model with feedback children
- Remove user_feedback field from TaskRun
- Add REST API endpoints (list/create) for feedback on task runs
- Update copilot models, utils, and frontend spec builder
- Create follow-up ticket KIL-537 for repair UI replacement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 Walkthrough

Refactors feedback handling: removes the single `user_feedback` string field from TaskRun and introduces a parented Feedback model with list/create API endpoints.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Server
    participant TaskRun
    participant Feedback
    participant Disk
    Note over Client,Server: Create Feedback Flow
    Client->>Server: POST /api/projects/{p}/tasks/{t}/runs/{r}/feedback\n{ feedback, source }
    Server->>TaskRun: resolve run by ids
    TaskRun-->>Server: TaskRun instance
    Server->>Feedback: construct Feedback(feedback, source, parent=TaskRun)
    Feedback-->>Server: Feedback instance
    Server->>Disk: save_to_file(Feedback)
    Disk-->>Server: persisted
    Server-->>Client: 200 Created Feedback (id, created_at, ...)
    Note over Client,Server: List Feedback Flow
    Client->>Server: GET /api/.../runs/{r}/feedback
    Server->>TaskRun: resolve run by ids
    TaskRun-->>Server: TaskRun instance
    Server->>TaskRun: feedback(readonly=True)
    TaskRun->>Disk: load child Feedback objects
    Disk-->>TaskRun: list[Feedback]
    TaskRun-->>Server: list[Feedback]
    Server-->>Client: 200 list[Feedback]
```
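The two flows above boil down to a small create/list pattern over per-run files. The sketch below is illustrative only: `FeedbackStore` and its methods are hypothetical stand-ins, not the real Kiln server or datamodel API.

```python
import json
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FeedbackStore:
    """Hypothetical stand-in for the server-side flow: each feedback
    entry is persisted as its own file under the parent run's folder,
    so concurrent writers never touch the same file."""

    root: Path

    def create(self, run_id: str, feedback: str, source: str) -> dict:
        entry = {"id": uuid.uuid4().hex, "feedback": feedback, "source": source}
        run_dir = self.root / run_id / "feedback"
        run_dir.mkdir(parents=True, exist_ok=True)
        # One file per feedback entry avoids write conflicts between reviewers.
        (run_dir / f"{entry['id']}.json").write_text(
            json.dumps(entry), encoding="utf-8"
        )
        return entry

    def list(self, run_id: str) -> list[dict]:
        run_dir = self.root / run_id / "feedback"
        if not run_dir.exists():
            return []
        return [
            json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(run_dir.glob("*.json"))
        ]


tmp = Path(tempfile.mkdtemp())
store = FeedbackStore(root=tmp)
store.create("run-1", "Great answer", "run-page")
store.create("run-1", "Too verbose", "spec-feedback")
entries = store.list("run-1")
print(len(entries))  # → 2
```

The one-file-per-entry layout is what makes "multiple people provide feedback" safe without locking: writers only ever create new files.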
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
📊 Coverage Report

Overall Coverage: 91% | Diff: origin/main...HEAD

Line-by-line diff coverage (`!` marks lines not covered by tests):

`app/desktop/studio_server/copilot_api.py`, lines 430-438:

```python
  430
  431         for run in task_runs:
  432             run.save_to_file()
  433             saved_models.append(run)
! 434             dataset_runs.save_pending_feedback(run)
  435
  436         spec.save_to_file()
  437         saved_models.append(spec)
  438     except Exception:
```

`app/desktop/studio_server/utils/copilot_utils.py`, lines 230-247:

```python
  230         self._pending_feedback[task_run.id] = feedback_text
  231
  232     def save_pending_feedback(self, task_run: TaskRun) -> None:
  233         """Create Feedback children for a saved TaskRun if it has pending feedback."""
! 234         if not task_run.id:
! 235             return
! 236         feedback_text = self._pending_feedback.get(task_run.id)
! 237         if feedback_text:
! 238             fb = Feedback(
  239                 feedback=feedback_text,
  240                 source=FeedbackSource.spec_feedback,
  241                 parent=task_run,
  242             )
! 243             fb.save_to_file()
  244
  245
  246 def create_dataset_task_runs(
  247     all_examples: list[SampleApi],
```

`libs/core/kiln_ai/datamodel/task_run.py`, lines 151-159:

```python
  151         """
  152         return self.thinking_training_data() is not None
  153
  154     def feedback(self, readonly: bool = False) -> list[Feedback]:
! 155         return super().feedback(readonly=readonly)  # type: ignore
  156
  157     # Workaround to return typed parent without importing Task
  158     def parent_task(self) -> Union["Task", None]:
  159         if self.parent is None or self.parent.__class__.__name__ != "Task":
```
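The `save_pending_feedback` code in the coverage excerpt above follows a deferred-write pattern: feedback text is queued in a dict keyed by run id and only flushed as a child record once the run itself has been saved. A minimal self-contained sketch of that pattern; all class and field names here are illustrative stand-ins, not the real Kiln datamodel:

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    id: str
    feedback_children: list[dict] = field(default_factory=list)

    def save_feedback(self, text: str, source: str) -> None:
        # The real code writes a Feedback file parented to the run.
        self.feedback_children.append({"feedback": text, "source": source})


@dataclass
class DatasetRuns:
    _pending_feedback: dict[str, str] = field(default_factory=dict)

    def queue_feedback(self, run_id: str, text: str) -> None:
        self._pending_feedback[run_id] = text

    def save_pending_feedback(self, run: Run) -> None:
        """Attach queued feedback to a run once the run exists on disk."""
        if not run.id:
            return
        text = self._pending_feedback.get(run.id)
        if text:
            run.save_feedback(text, source="spec-feedback")


runs = DatasetRuns()
runs.queue_feedback("r1", "needs more detail")
run = Run(id="r1")
run_without_feedback = Run(id="r2")
runs.save_pending_feedback(run)
runs.save_pending_feedback(run_without_feedback)
print(len(run.feedback_children))  # → 1
```

Queuing by id rather than holding object references keeps the pending store valid even if the run objects are rebuilt between review and save.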
Code Review
This pull request refactors the feedback system by replacing the single user_feedback field on TaskRun with a multi-source Feedback child model. It introduces new API endpoints for creating and listing feedback, updates the data models and schemas, and renames fields in the UI and client models to maintain consistency. Feedback was provided regarding potential data loss in the spec builder flow, as the previous feedback is now discarded without being migrated to the new model structure.
Actionable comments posted: 1
🧹 Nitpick comments (2)
libs/server/kiln_server/feedback_api.py (1)
**12-21: Consider rejecting whitespace-only feedback.** The `min_length=1` constraint rejects empty strings but still allows whitespace-only input like `" "` or `"\t"`. If whitespace-only feedback should be rejected, add a `pattern` or a custom validator.

Proposed fix if whitespace-only should be rejected:

```diff
+from pydantic import field_validator
+
 class CreateFeedbackRequest(BaseModel):
     """Request body for creating feedback on a task run."""

     feedback: str = Field(
         min_length=1,
         description="Free-form text feedback on the task run.",
     )
     source: FeedbackSource = Field(
         description="Where this feedback originated.",
     )
+
+    @field_validator("feedback")
+    @classmethod
+    def feedback_not_blank(cls, v: str) -> str:
+        if not v.strip():
+            raise ValueError("Feedback cannot be blank")
+        return v
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@libs/server/kiln_server/feedback_api.py` around lines 12-21: the `CreateFeedbackRequest` model currently uses `feedback: str = Field(min_length=1)`, which still permits whitespace-only strings. Reject whitespace-only input by either adding a `pattern` to the `Field` (requiring at least one non-whitespace character) or adding a Pydantic validator on `feedback` that strips the value and raises a `ValueError` if the stripped string is empty, so the model rejects strings like `" "` while preserving valid non-empty feedback.

`libs/core/kiln_ai/datamodel/test_feedback.py` (1)
**134-135: Specify explicit encoding when opening files.** For cross-platform consistency, explicitly specify `encoding="utf-8"` when opening JSON files.

Proposed fix:

```diff
-    with open(fb.path) as f:
+    with open(fb.path, encoding="utf-8") as f:
         data = json.load(f)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@libs/core/kiln_ai/datamodel/test_feedback.py` around lines 134-135: change the file-open call that reads the JSON fixture so it specifies the encoding explicitly. When opening `fb.path` (the `with open(fb.path) as f:` block that assigns `json.load(f)` to `data`), pass `encoding="utf-8"` to `open` to ensure consistent cross-platform decoding.
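The encoding point matters because `open()` without an `encoding` argument falls back to a platform-dependent default, so non-ASCII feedback text can decode differently across machines. A small round-trip demonstrating the explicit-encoding version (the file name and record here are made up for illustration):

```python
import json
import tempfile
from pathlib import Path

# Write and read JSON with an explicit encoding so non-ASCII feedback
# survives on platforms whose default encoding is not UTF-8.
path = Path(tempfile.mkdtemp()) / "feedback.json"
record = {"feedback": "très utile ✓", "source": "run-page"}

with open(path, "w", encoding="utf-8") as f:
    json.dump(record, f, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    data = json.load(f)

print(data["feedback"])  # → très utile ✓
```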
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/core/kiln_ai/datamodel/task_run.py`:
- Around line 154-155: Remove the dead runtime feedback method from TaskRun and
replace it with a TYPE_CHECKING-only stub so type checkers see the signature but
the dynamic method generated by KilnParentModel.__init_subclass__ (via
parent_of={"feedback": Feedback}) remains the actual runtime implementation;
specifically delete the existing def feedback(self, readonly: bool = False) ->
list[Feedback]: return super().feedback(...) and instead add an if
TYPE_CHECKING: stub declaring the same signature and return type (no body) to
preserve type information only.
---
Nitpick comments:
In `@libs/core/kiln_ai/datamodel/test_feedback.py`:
- Around line 134-135: Change the file-open call that reads the JSON fixture so
it specifies the encoding explicitly: when opening fb.path (the with
open(fb.path) as f: ... block that assigns json.load(f) to data) pass
encoding="utf-8" to open to ensure consistent cross-platform decoding.
In `@libs/server/kiln_server/feedback_api.py`:
- Around line 12-21: The CreateFeedbackRequest model currently uses feedback:
str = Field(min_length=1) which still permits whitespace-only strings; update
validation on the feedback field in CreateFeedbackRequest to reject
whitespace-only input by either adding a pattern to the Field (e.g., require at
least one non-whitespace character) or adding a Pydantic validator on
CreateFeedbackRequest.feedback that strips the value and raises a ValueError if
the stripped string is empty, ensuring the model rejects strings like " "
while preserving valid non-empty feedback.
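The validation described in the prompt above reduces to a single strip check. A framework-free sketch (plain Python rather than Pydantic; `validate_feedback` is a hypothetical helper, not code from this PR):

```python
def validate_feedback(value: str) -> str:
    """Reject empty and whitespace-only feedback, mirroring the
    suggested field validator: strip to test, return the original value."""
    if not value.strip():
        raise ValueError("Feedback cannot be blank")
    return value


ok = validate_feedback("  looks good  ")  # passes, value returned unchanged
try:
    validate_feedback(" \t ")
    blank_rejected = False
except ValueError:
    blank_rejected = True
print(blank_rejected)  # → True
```

Note the helper strips only to test, not to store: valid feedback keeps its original whitespace, matching the proposed validator's behavior.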
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a7599fc3-a24d-4dd3-a2e0-604613fed91f
⛔ Files ignored due to path filters (1)
`app/desktop/studio_server/api_client/kiln_ai_server_client/models/examples_with_feedback_item.py` is excluded by `!app/desktop/studio_server/api_client/kiln_ai_server_client/**`
📒 Files selected for processing (14)
- app/desktop/studio_server/api_models/copilot_models.py
- app/desktop/studio_server/api_models/test_copilot_models.py
- app/desktop/studio_server/utils/copilot_utils.py
- app/desktop/studio_server/utils/test_copilot_utils.py
- app/web_ui/src/lib/api_schema.d.ts
- app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_builder/+page.svelte
- libs/core/kiln_ai/datamodel/__init__.py
- libs/core/kiln_ai/datamodel/datamodel_enums.py
- libs/core/kiln_ai/datamodel/feedback.py
- libs/core/kiln_ai/datamodel/task_run.py
- libs/core/kiln_ai/datamodel/test_feedback.py
- libs/server/kiln_server/feedback_api.py
- libs/server/kiln_server/server.py
- libs/server/kiln_server/test_feedback_api.py
💤 Files with no reviewable changes (2)
- app/desktop/studio_server/utils/copilot_utils.py
- app/desktop/studio_server/utils/test_copilot_utils.py
The ticket only asked to remove user_feedback from TaskRun, not rename it in the copilot/spec-builder code which uses it for a different purpose. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When creating TaskRuns from reviewed examples in the copilot flow, create Feedback children (with source=spec-feedback) after saving the run, so review feedback is not lost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (1)
app/desktop/studio_server/test_copilot_api.py (1)
**420-421: This success case still skips the new task-run/feedback loop.** `DatasetTaskRuns()` here is empty, so the test never exercises `run.save_to_file()` or `dataset_runs.save_pending_feedback(run)`. Please seed at least one run and assert the feedback hook is invoked; otherwise the main regression this PR addresses can still pass untested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/desktop/studio_server/test_copilot_api.py` around lines 420 - 421, The test stubs out create_dataset_task_runs with an empty DatasetTaskRuns, so it never exercises run.save_to_file() or dataset_runs.save_pending_feedback(run); update the mock return_value for "app.desktop.studio_server.copilot_api.create_dataset_task_runs" to include at least one seeded run instance (e.g., a DatasetTaskRun/Run object with necessary attributes/state) so the code path executes, and add assertions that the run's save_to_file() was called and that dataset_runs.save_pending_feedback(run) (or the mocked equivalent) was invoked; reference the mocked function create_dataset_task_runs and the run/save_pending_feedback calls when making the changes.
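A self-contained illustration of the review point: a test that stubs the run-creation step with an empty result never exercises the save/feedback loop, while seeding one run lets you assert on the calls. All names below are stand-ins for the real copilot API code, not imports from it:

```python
from unittest.mock import MagicMock


def build_runs(create_dataset_task_runs):
    """Stand-in for the copilot save loop shown in the coverage report."""
    dataset_runs, task_runs = create_dataset_task_runs()
    saved = []
    for run in task_runs:
        run.save_to_file()
        saved.append(run)
        dataset_runs.save_pending_feedback(run)
    return saved


# An empty stub (the situation the comment flags) exercises nothing:
empty_factory = MagicMock(return_value=(MagicMock(), []))
assert build_runs(empty_factory) == []

# Seeding one run makes the loop execute and lets the test assert on
# both the save and the feedback hook:
run = MagicMock()
dataset_runs = MagicMock()
seeded_factory = MagicMock(return_value=(dataset_runs, [run]))
saved = build_runs(seeded_factory)
run.save_to_file.assert_called_once()
dataset_runs.save_pending_feedback.assert_called_once_with(run)
print(len(saved))  # → 1
```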
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@app/desktop/studio_server/test_copilot_api.py`:
- Around line 420-421: The test stubs out create_dataset_task_runs with an empty
DatasetTaskRuns, so it never exercises run.save_to_file() or
dataset_runs.save_pending_feedback(run); update the mock return_value for
"app.desktop.studio_server.copilot_api.create_dataset_task_runs" to include at
least one seeded run instance (e.g., a DatasetTaskRun/Run object with necessary
attributes/state) so the code path executes, and add assertions that the run's
save_to_file() was called and that dataset_runs.save_pending_feedback(run) (or
the mocked equivalent) was invoked; reference the mocked function
create_dataset_task_runs and the run/save_pending_feedback calls when making the
changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 77796bda-0016-4058-b4de-575dec72bd4c
📒 Files selected for processing (4)
- app/desktop/studio_server/copilot_api.py
- app/desktop/studio_server/test_copilot_api.py
- app/desktop/studio_server/utils/copilot_utils.py
- app/desktop/studio_server/utils/test_copilot_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- app/desktop/studio_server/utils/copilot_utils.py
```diff
@@ -0,0 +1,140 @@
+import json
```
double check test works for backward compat
Linear ticket: https://linear.app/kiln-ai/issue/KIL-534/improve-feedback-on-taskrun
Description
Replace the single `user_feedback` string field on `TaskRun` with a proper Feedback data model that supports multiple feedback entries per task run.

Changes

- `Feedback` model (`libs/core/kiln_ai/datamodel/feedback.py`): a `KilnParentedModel` under `TaskRun` with `feedback` (text) and `source` (`FeedbackSource` enum: `run-page`, `spec-feedback`)
- `TaskRun` now extends `KilnParentModel` so it can have `Feedback` children, stored as separate files to avoid write conflicts when multiple people give feedback on the same run
- Removed the `user_feedback` field from `TaskRun`; no backwards compat needed since we haven't shipped yet
- `GET`/`POST /api/.../runs/{run_id}/feedback` for listing and creating feedback
- `ExampleWithFeedbackApi` field renamed from `user_feedback` to `feedback`; removed `user_feedback` from task run creation in copilot utils

Test plan

- Tests for the `Feedback` model (creation, persistence, parent relationship, serialization)
- Checks pass (`checks.sh --agent-mode`)

🤖 Generated with Claude Code
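The two-value source enum the description mentions is simple to picture; a sketch under the assumption that it is a string-valued enum (the real definition lives in the datamodel package and may differ in detail):

```python
from enum import Enum


class FeedbackSource(str, Enum):
    """Where a feedback entry originated (values per the PR description)."""

    run_page = "run-page"
    spec_feedback = "spec-feedback"


# A str-mixin enum serializes cleanly into JSON API payloads and
# compares equal to its wire value.
print(FeedbackSource.spec_feedback.value)  # → spec-feedback
```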