KIL-534 Add Feedback data model on TaskRun #1267

Merged: sfierro merged 6 commits into main from KIL-534/feedback-data-model on Apr 14, 2026

Conversation

@sfierro (Contributor) commented Apr 14, 2026

Linear ticket: https://linear.app/kiln-ai/issue/KIL-534/improve-feedback-on-taskrun

Description

Replace the single user_feedback string field on TaskRun with a proper Feedback data model that supports multiple feedback entries per task run.

Changes

  • New Feedback model (libs/core/kiln_ai/datamodel/feedback.py): A KilnParentedModel under TaskRun with feedback (text) and source (FeedbackSource enum: run-page, spec-feedback)
  • TaskRun now extends KilnParentModel so it can have Feedback children, stored as separate files to avoid write conflicts when multiple people give feedback on the same run
  • Removed user_feedback field from TaskRun — no backwards compat needed since we haven't shipped yet
  • New REST API endpoints: GET/POST /api/.../runs/{run_id}/feedback for listing and creating feedback
  • Updated copilot code: ExampleWithFeedbackApi field renamed from user_feedback to feedback, removed user_feedback from task run creation in copilot utils
  • Frontend: Updated spec builder to use new field name, regenerated API schema
  • Follow-up ticket: Created KIL-537 for replacing repair UI with feedback UI
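The parent/child layout described above can be sketched with stdlib dataclasses. This is a deliberately simplified stand-in, not the actual Pydantic/KilnParentedModel implementation, and the file naming scheme here is hypothetical; it only illustrates why per-entry files avoid write conflicts:

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from enum import Enum
from pathlib import Path


class FeedbackSource(str, Enum):
    run_page = "run-page"
    spec_feedback = "spec-feedback"


@dataclass
class Feedback:
    feedback: str
    source: FeedbackSource
    parent_run_id: str

    def save_to_file(self, root: Path, feedback_id: str) -> Path:
        # Each Feedback child gets its own file under the run's directory,
        # so two people saving feedback on the same run never write the
        # same file (the write-conflict avoidance described in the PR).
        path = root / self.parent_run_id / "feedback" / f"{feedback_id}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        payload = {**asdict(self), "source": self.source.value}
        path.write_text(json.dumps(payload), encoding="utf-8")
        return path


root = Path(tempfile.mkdtemp())
a = Feedback("looks good", FeedbackSource.run_page, "run-1").save_to_file(root, "fb-a")
b = Feedback("needs work", FeedbackSource.spec_feedback, "run-1").save_to_file(root, "fb-b")
print(a != b)  # two feedback entries on the same run land in separate files
```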

Test plan

  • New unit tests for Feedback model (creation, persistence, parent relationship, serialization)
  • New API tests for feedback endpoints (list, create, error cases)
  • Updated copilot model/utils tests
  • All checks pass (checks.sh --agent-mode)
  • Manual verification of spec builder flow
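For manual verification, the new POST endpoint can be exercised with a request shaped like the one below (a stdlib urllib sketch; the host, port, and IDs are hypothetical, while the path shape follows the PR's `POST /api/projects/{p}/tasks/{t}/runs/{r}/feedback` flow):

```python
import json
from urllib import request

# Hypothetical host/port and IDs; only the path shape comes from the PR.
url = "http://localhost:8000/api/projects/p1/tasks/t1/runs/r1/feedback"
body = json.dumps({"feedback": "Output was truncated", "source": "run-page"}).encode()
req = request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req) would return the created Feedback entry as JSON;
# it is commented out here so the sketch runs without a live server.
print(body.decode())
```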

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added feedback management for task runs: create and list feedback entries with source attribution ("run-page" or "spec-feedback")
    • Feedback is now a first-class structured entity associated with runs (replacing the old single user_feedback field)
  • Tests

    • Added coverage for the feedback data model and feedback API endpoints, including persistence and validation scenarios

Replace the single `user_feedback` string field on TaskRun with a proper
Feedback model that supports multiple feedback entries per run. Feedback
is a parented model under TaskRun, stored as separate files to avoid
write conflicts when multiple people provide feedback.

- Add Feedback model (feedback text + FeedbackSource enum)
- Make TaskRun a parent model with feedback children
- Remove user_feedback field from TaskRun
- Add REST API endpoints (list/create) for feedback on task runs
- Update copilot models, utils, and frontend spec builder
- Create follow-up ticket KIL-537 for repair UI replacement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai (bot) commented Apr 14, 2026

📝 Walkthrough

Refactors feedback handling: removes user_feedback on TaskRun, introduces a typed Feedback model and FeedbackSource enum, adds GET/POST feedback endpoints for task runs, updates API schema and exports, and adjusts copilot utilities and tests to persist feedback as child objects.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Feedback Datamodel**<br>`libs/core/kiln_ai/datamodel/feedback.py`, `libs/core/kiln_ai/datamodel/datamodel_enums.py`, `libs/core/kiln_ai/datamodel/__init__.py` | Add the Feedback model and FeedbackSource enum; export both from the datamodel package. |
| **TaskRun Model**<br>`libs/core/kiln_ai/datamodel/task_run.py` | Remove the user_feedback field; add a KilnParentModel base with a parent_of={"feedback": Feedback} mapping and a typed feedback() accessor. |
| **Server API & Integration**<br>`libs/server/kiln_server/feedback_api.py`, `libs/server/kiln_server/server.py` | Add GET/POST feedback endpoints for task runs, a CreateFeedbackRequest model, and a helper resolver; register routes and an OpenAPI tag. |
| **API Schema / Web UI Types**<br>`app/web_ui/src/lib/api_schema.d.ts` | Add Feedback, CreateFeedbackRequest, and FeedbackSource schemas; add feedback GET/POST operations; remove user_feedback from TaskRun input/output types. |
| **Desktop Copilot Flow**<br>`app/desktop/studio_server/utils/copilot_utils.py`, `app/desktop/studio_server/copilot_api.py`, `app/desktop/studio_server/utils/test_copilot_utils.py`, `app/desktop/studio_server/test_copilot_api.py` | Change task-run creation to return DatasetTaskRuns (collects runs and pending feedback); create_task_run_from_reviewed now returns (TaskRun, feedback_text); save pending feedback after run save; update tests/mocks accordingly. |
| **Tests — Datamodel & Server**<br>`libs/core/kiln_ai/datamodel/test_feedback.py`, `libs/server/kiln_server/test_feedback_api.py` | Add comprehensive tests for the Feedback model (validation, persistence, relationships) and endpoint tests for listing/creating feedback, plus error cases. |
| **Agent Policy Annotations**<br>`libs/server/kiln_server/utils/agent_checks/annotations/*_feedback.json` | Add agent-check annotations for the GET and POST feedback endpoints (allow, no approval). |

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Server
    participant TaskRun
    participant Feedback
    participant Disk

    Note over Client,Server: Create Feedback Flow
    Client->>Server: POST /api/projects/{p}/tasks/{t}/runs/{r}/feedback\n{ feedback, source }
    Server->>TaskRun: resolve run by ids
    TaskRun-->>Server: TaskRun instance
    Server->>Feedback: construct Feedback(feedback, source, parent=TaskRun)
    Feedback-->>Server: Feedback instance
    Server->>Disk: save_to_file(Feedback)
    Disk-->>Server: persisted
    Server-->>Client: 200 Created Feedback (id, created_at, ...)

    Note over Client,Server: List Feedback Flow
    Client->>Server: GET /api/.../runs/{r}/feedback
    Server->>TaskRun: resolve run by ids
    TaskRun-->>Server: TaskRun instance
    Server->>TaskRun: feedback(readonly=True)
    TaskRun->>Disk: load child Feedback objects
    Disk-->>TaskRun: list[Feedback]
    TaskRun-->>Server: list[Feedback]
    Server-->>Client: 200 list[Feedback]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • leonardmq
  • chiang-daniel

Poem

🐰 I hopped a patch from run to tree,

Feedback now sits where it should be.
From run-page chirps to spec-feedback cheer,
POST and GET bring all voices near.
A little save, a stamped ID — hooray for clarity!

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning

| Check name | Status | Explanation / Resolution |
| --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.91%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding a Feedback data model to TaskRun, which aligns with the core objective of the PR. |
| Description check | ✅ Passed | The description is comprehensive and includes all required sections: it explains what the PR does, links to the related Linear ticket, describes the changes made, includes a test plan with status checkmarks, and confirms the CLA. |

@github-actions (bot) commented Apr 14, 2026

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/main...HEAD

  • app/desktop/studio_server/copilot_api.py (66.7%): Missing lines 434
  • app/desktop/studio_server/utils/copilot_utils.py (76.9%): Missing lines 234-238,243
  • libs/core/kiln_ai/datamodel/__init__.py (100%)
  • libs/core/kiln_ai/datamodel/datamodel_enums.py (100%)
  • libs/core/kiln_ai/datamodel/feedback.py (100%)
  • libs/core/kiln_ai/datamodel/task_run.py (80.0%): Missing lines 155
  • libs/server/kiln_server/feedback_api.py (100%)
  • libs/server/kiln_server/server.py (100%)

Summary

  • Total: 73 lines
  • Missing: 8 lines
  • Coverage: 89%

Line-by-line

View line-by-line diff coverage

app/desktop/studio_server/copilot_api.py

Lines 430-438

  430 
  431             for run in task_runs:
  432                 run.save_to_file()
  433                 saved_models.append(run)
! 434                 dataset_runs.save_pending_feedback(run)
  435 
  436             spec.save_to_file()
  437             saved_models.append(spec)
  438         except Exception:

app/desktop/studio_server/utils/copilot_utils.py

Lines 230-247

  230             self._pending_feedback[task_run.id] = feedback_text
  231 
  232     def save_pending_feedback(self, task_run: TaskRun) -> None:
  233         """Create Feedback children for a saved TaskRun if it has pending feedback."""
! 234         if not task_run.id:
! 235             return
! 236         feedback_text = self._pending_feedback.get(task_run.id)
! 237         if feedback_text:
! 238             fb = Feedback(
  239                 feedback=feedback_text,
  240                 source=FeedbackSource.spec_feedback,
  241                 parent=task_run,
  242             )
! 243             fb.save_to_file()
  244 
  245 
  246 def create_dataset_task_runs(
  247     all_examples: list[SampleApi],

libs/core/kiln_ai/datamodel/task_run.py

Lines 151-159

  151         """
  152         return self.thinking_training_data() is not None
  153 
  154     def feedback(self, readonly: bool = False) -> list[Feedback]:
! 155         return super().feedback(readonly=readonly)  # type: ignore
  156 
  157     # Workaround to return typed parent without importing Task
  158     def parent_task(self) -> Union["Task", None]:
  159         if self.parent is None or self.parent.__class__.__name__ != "Task":


@gemini-code-assist (bot) left a comment
Code Review

This pull request refactors the feedback system by replacing the single user_feedback field on TaskRun with a multi-source Feedback child model. It introduces new API endpoints for creating and listing feedback, updates the data models and schemas, and renames fields in the UI and client models to maintain consistency. Feedback was provided regarding potential data loss in the spec builder flow, as the previous feedback is now discarded without being migrated to the new model structure.

Comment thread: `app/desktop/studio_server/utils/copilot_utils.py`

@coderabbitai (bot) left a comment
Actionable comments posted: 1

🧹 Nitpick comments (2)
libs/server/kiln_server/feedback_api.py (1)

12-21: Consider rejecting whitespace-only feedback.

The min_length=1 constraint rejects empty strings but allows whitespace-only input like " " or "\t". If whitespace-only feedback should be rejected, add pattern or a custom validator.

Proposed fix if whitespace-only should be rejected
+import re
+from pydantic import field_validator
+
 class CreateFeedbackRequest(BaseModel):
     """Request body for creating feedback on a task run."""

     feedback: str = Field(
         min_length=1,
         description="Free-form text feedback on the task run.",
     )
     source: FeedbackSource = Field(
         description="Where this feedback originated.",
     )
+
+    @field_validator("feedback")
+    @classmethod
+    def feedback_not_blank(cls, v: str) -> str:
+        if not v.strip():
+            raise ValueError("Feedback cannot be blank")
+        return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/feedback_api.py` around lines 12 - 21, The
CreateFeedbackRequest model currently uses feedback: str = Field(min_length=1)
which still permits whitespace-only strings; update validation on the feedback
field in CreateFeedbackRequest to reject whitespace-only input by either adding
a pattern to the Field (e.g., require at least one non-whitespace character) or
adding a Pydantic validator on CreateFeedbackRequest.feedback that strips the
value and raises a ValueError if the stripped string is empty, ensuring the
model rejects strings like "   " while preserving valid non-empty feedback.
libs/core/kiln_ai/datamodel/test_feedback.py (1)

134-135: Specify explicit encoding when opening files.

For cross-platform consistency, explicitly specify encoding="utf-8" when opening JSON files.

Proposed fix
-        with open(fb.path) as f:
+        with open(fb.path, encoding="utf-8") as f:
             data = json.load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/datamodel/test_feedback.py` around lines 134 - 135, Change
the file-open call that reads the JSON fixture so it specifies the encoding
explicitly: when opening fb.path (the with open(fb.path) as f: ... block that
assigns json.load(f) to data) pass encoding="utf-8" to open to ensure consistent
cross-platform decoding.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/core/kiln_ai/datamodel/task_run.py`:
- Around line 154-155: Remove the dead runtime feedback method from TaskRun and
replace it with a TYPE_CHECKING-only stub so type checkers see the signature but
the dynamic method generated by KilnParentModel.__init_subclass__ (via
parent_of={"feedback": Feedback}) remains the actual runtime implementation;
specifically delete the existing def feedback(self, readonly: bool = False) ->
list[Feedback]: return super().feedback(...) and instead add an if
TYPE_CHECKING: stub declaring the same signature and return type (no body) to
preserve type information only.
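The TYPE_CHECKING-only stub described above could look roughly like this. The class below is a simplified stand-in, not the real TaskRun; it only demonstrates that the stub is visible to type checkers while leaving the runtime class untouched, so the dynamically generated accessor is not shadowed:

```python
from typing import TYPE_CHECKING


class TaskRun:  # simplified stand-in for the real Pydantic model
    if TYPE_CHECKING:
        # Seen only by type checkers; at runtime the real accessor is
        # generated dynamically by KilnParentModel via
        # parent_of={"feedback": Feedback}, so no method is defined here.
        def feedback(self, readonly: bool = False) -> list: ...


# TYPE_CHECKING is False at runtime, so the stub never materializes:
print(hasattr(TaskRun, "feedback"))  # False
```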

---

Nitpick comments:
In `@libs/core/kiln_ai/datamodel/test_feedback.py`:
- Around line 134-135: Change the file-open call that reads the JSON fixture so
it specifies the encoding explicitly: when opening fb.path (the with
open(fb.path) as f: ... block that assigns json.load(f) to data) pass
encoding="utf-8" to open to ensure consistent cross-platform decoding.

In `@libs/server/kiln_server/feedback_api.py`:
- Around line 12-21: The CreateFeedbackRequest model currently uses feedback:
str = Field(min_length=1) which still permits whitespace-only strings; update
validation on the feedback field in CreateFeedbackRequest to reject
whitespace-only input by either adding a pattern to the Field (e.g., require at
least one non-whitespace character) or adding a Pydantic validator on
CreateFeedbackRequest.feedback that strips the value and raises a ValueError if
the stripped string is empty, ensuring the model rejects strings like "   "
while preserving valid non-empty feedback.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a7599fc3-a24d-4dd3-a2e0-604613fed91f

📥 Commits

Reviewing files that changed from the base of the PR and between 4a3029d and d469cec.

⛔ Files ignored due to path filters (1)
  • app/desktop/studio_server/api_client/kiln_ai_server_client/models/examples_with_feedback_item.py is excluded by !app/desktop/studio_server/api_client/kiln_ai_server_client/**
📒 Files selected for processing (14)
  • app/desktop/studio_server/api_models/copilot_models.py
  • app/desktop/studio_server/api_models/test_copilot_models.py
  • app/desktop/studio_server/utils/copilot_utils.py
  • app/desktop/studio_server/utils/test_copilot_utils.py
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/routes/(app)/specs/[project_id]/[task_id]/spec_builder/+page.svelte
  • libs/core/kiln_ai/datamodel/__init__.py
  • libs/core/kiln_ai/datamodel/datamodel_enums.py
  • libs/core/kiln_ai/datamodel/feedback.py
  • libs/core/kiln_ai/datamodel/task_run.py
  • libs/core/kiln_ai/datamodel/test_feedback.py
  • libs/server/kiln_server/feedback_api.py
  • libs/server/kiln_server/server.py
  • libs/server/kiln_server/test_feedback_api.py
💤 Files with no reviewable changes (2)
  • app/desktop/studio_server/utils/copilot_utils.py
  • app/desktop/studio_server/utils/test_copilot_utils.py

Comment thread: `libs/core/kiln_ai/datamodel/task_run.py`
sfierro and others added 2 commits April 13, 2026 20:39
The ticket only asked to remove user_feedback from TaskRun, not rename
it in the copilot/spec-builder code which uses it for a different purpose.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sfierro and others added 2 commits April 13, 2026 21:06
When creating TaskRuns from reviewed examples in the copilot flow,
create Feedback children (with source=spec-feedback) after saving
the run, so review feedback is not lost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai (bot) left a comment
🧹 Nitpick comments (1)
app/desktop/studio_server/test_copilot_api.py (1)

420-421: This success case still skips the new task-run/feedback loop.

DatasetTaskRuns() here is empty, so the test never exercises run.save_to_file() or dataset_runs.save_pending_feedback(run). Please seed at least one run and assert the feedback hook is invoked; otherwise the main regression this PR addresses can still pass untested.
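The seeded-run assertion the reviewer asks for could be sketched with mocks like this. Everything here is a stand-in (the real test would build an actual DatasetTaskRuns with one run, and the `task_runs` attribute name is assumed for illustration):

```python
from unittest.mock import MagicMock

# Stand-ins for the real TaskRun and DatasetTaskRuns objects.
run = MagicMock(name="task_run")
dataset_runs = MagicMock(name="dataset_runs")
dataset_runs.task_runs = [run]  # seed one run instead of leaving it empty

# Simplified version of the save loop in copilot_api.py:
for r in dataset_runs.task_runs:
    r.save_to_file()
    dataset_runs.save_pending_feedback(r)

# The assertions the review suggests adding:
run.save_to_file.assert_called_once()
dataset_runs.save_pending_feedback.assert_called_once_with(run)
print("feedback hook exercised")
```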

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/desktop/studio_server/test_copilot_api.py` around lines 420 - 421, The
test stubs out create_dataset_task_runs with an empty DatasetTaskRuns, so it
never exercises run.save_to_file() or dataset_runs.save_pending_feedback(run);
update the mock return_value for
"app.desktop.studio_server.copilot_api.create_dataset_task_runs" to include at
least one seeded run instance (e.g., a DatasetTaskRun/Run object with necessary
attributes/state) so the code path executes, and add assertions that the run's
save_to_file() was called and that dataset_runs.save_pending_feedback(run) (or
the mocked equivalent) was invoked; reference the mocked function
create_dataset_task_runs and the run/save_pending_feedback calls when making the
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77796bda-0016-4058-b4de-575dec72bd4c

📥 Commits

Reviewing files that changed from the base of the PR and between 8c58f6f and b0f890f.

📒 Files selected for processing (4)
  • app/desktop/studio_server/copilot_api.py
  • app/desktop/studio_server/test_copilot_api.py
  • app/desktop/studio_server/utils/copilot_utils.py
  • app/desktop/studio_server/utils/test_copilot_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/desktop/studio_server/utils/copilot_utils.py

Comment thread: `libs/core/kiln_ai/datamodel/test_feedback.py` (new file; anchored at line 1, `import json`)

@sfierro (Contributor, Author): double check test works for backward compat

@sfierro (Contributor, Author): doing in next PR

@sfierro merged commit 1c2f985 into main on Apr 14, 2026 (14 checks passed).
@sfierro deleted the KIL-534/feedback-data-model branch on April 14, 2026 at 23:58.