Update add-model skill: lagging-provider checks and push-gate rules (#1281)

tawnymanticore merged 4 commits into main from
Conversation
📝 Walkthrough

Adds Phase 1B guidance to run a provider-specific backfill cross-check on every skill invocation, pulling the 10 newest entries from `built_in_models` and cross-checking provider coverage.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Code Review
This pull request updates the SKILL.md documentation to provide guidance on handling the Claude Code Web stop hook and defines a gating process before pushing changes. The feedback suggests enhancing the abandonment procedure by explicitly deleting local branches to ensure a clean state and including authentication errors as critical failures in the push gate criteria.
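The push-gate rule suggested here (treating authentication errors as critical failures) can be sketched as a small log check. The marker strings and function name below are illustrative assumptions, not part of the skill or the Kiln codebase:

```python
# Hypothetical push gate: scan a test log for critical failures,
# including authentication errors, before allowing a push.
# Marker strings are illustrative, not taken from the skill.
CRITICAL_MARKERS = (
    "AuthenticationError",
    "401 Unauthorized",
    "invalid_api_key",
    "FAILED",
)


def push_allowed(test_log: str) -> bool:
    """Return False if any critical failure marker appears in the log."""
    return not any(marker in test_log for marker in CRITICAL_MARKERS)
```

In this sketch, an auth failure in the paid-test output blocks the push the same way an ordinary test failure would, which is the behavior the review asks for.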
📊 Coverage Report
Overall Coverage: 91%
Diff: origin/main...HEAD — no lines with coverage information in this diff.
🧹 Nitpick comments (3)
.agents/skills/claude-maintain-models/SKILL.md (3)
537-537: Minor wording: simplify the Together API key instruction.

Line 537 says "ask the user before prompting them to export it" — the "before prompting them" is redundant. Consider:

```diff
-If the key isn't set, ask the user before prompting them to export it — don't fail silently onto models.dev.
+If the key isn't set, ask the user to export it — don't fail silently onto models.dev.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/claude-maintain-models/SKILL.md at line 537, Edit the SKILL.md text that currently reads "If the key isn't set, ask the user before prompting them to export it — don't fail silently onto models.dev." and simplify the phrasing by removing the redundant "before prompting them" (e.g., "If the key isn't set, ask the user to export it — don't fail silently onto models.dev."). Update the single line at or around the sentence in the Together API key instructions to the simplified wording.
60-75: Phase 1B design looks good; consider frequency optimization.

The backfill check design is sound and addresses the "lagging providers" gap mentioned in the PR objectives. Running it on every invocation ensures recent models don't miss provider additions, which aligns with keeping the model list current.
One observation: checking 10 models across 3 providers means ~30 API calls every time the skill runs. If this becomes a bottleneck, consider throttling (e.g., only run Phase 1B if the user explicitly invokes discovery mode, or cache results for 24 hours). For now, the thoroughness seems justified.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/claude-maintain-models/SKILL.md around lines 60 - 75, Phase 1B currently runs the lagging-provider backfill check on every invocation by pulling the top 10 entries from built_in_models in ml_model_list.py and cross-checking Fireworks/Together/SiliconFlow against KilnModel entries, which can cost ~30 API calls per run; modify the skill to reduce frequency by either gating Phase 1B behind an explicit "discovery mode" flag or adding a 24-hour cache for the backfill results keyed by provider+model (invalidate on model-add operations), and ensure the logic that reads built_in_models and compares against KilnModel honors this gate/cache so behavior remains identical when the check runs.
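The 24-hour cache suggested in this comment could be sketched as follows. The cache file location and function names are assumptions for illustration, not existing Kiln APIs:

```python
# Hypothetical 24-hour cache for Phase 1B backfill checks, keyed by
# provider+model, so the ~30 API calls aren't repeated on every run.
import json
import time
from pathlib import Path

CACHE_PATH = Path(".backfill_cache.json")  # assumed location
TTL_SECONDS = 24 * 60 * 60


def load_cache() -> dict:
    """Load the cache of last-checked timestamps, or start empty."""
    if CACHE_PATH.exists():
        return json.loads(CACHE_PATH.read_text())
    return {}


def should_check(provider: str, model: str, cache: dict) -> bool:
    """Return True if this provider+model pair hasn't been checked in 24h."""
    last = cache.get(f"{provider}:{model}", 0)
    return (time.time() - last) > TTL_SECONDS


def record_check(provider: str, model: str, cache: dict) -> None:
    """Record that the pair was just checked and persist the cache."""
    cache[f"{provider}:{model}"] = time.time()
    CACHE_PATH.write_text(json.dumps(cache))
```

Invalidating on model-add operations, as the prompt suggests, would amount to deleting the relevant `provider:model` keys (or the whole file) whenever a model is added.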
522-524: Consider adding language specifiers to code blocks.

Static analysis flags these `WebFetch` command blocks as missing language specifiers (MD040). While `WebFetch` isn't bash, you could either:

- Add `` ```bash `` if these are meant to be run in a shell context
- Add a generic `` ```text `` or `` ```shell `` for consistency
- Leave unmarked if these are intentionally tool-specific commands
Other WebFetch examples in the document (lines 522-524, 540-542) are inconsistent, so standardizing would improve clarity.
Also applies to: 540-542
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/claude-maintain-models/SKILL.md around lines 522 - 524, Update the fenced code blocks that contain the WebFetch command (e.g., the block with "WebFetch https://fireworks.ai/models/fireworks/{model-slug}") to include a language specifier for consistency (choose one such as ```text, ```shell, or ```bash) and apply the same specifier to the other WebFetch examples in this document so all WebFetch blocks use the same tag; ensure you only modify the triple-backtick opening fence to add the chosen specifier for each WebFetch block (search for the literal "WebFetch" blocks to find all occurrences).
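The MD040 rule referenced here can be approximated with a short script that flags opening fences lacking a language tag. This is an illustrative sketch, not markdownlint's actual implementation, and it ignores edge cases like indented or tilde fences:

```python
# Rough MD040-style check: report 1-based line numbers of opening
# code fences that have no language specifier.
import re

FENCE_RE = re.compile(r"^```(\S*)\s*$")


def fences_missing_language(markdown: str) -> list[int]:
    """Return line numbers of opening fences with no language tag."""
    missing = []
    inside = False  # are we currently inside a fenced block?
    for i, line in enumerate(markdown.splitlines(), start=1):
        m = FENCE_RE.match(line.strip())
        if not m:
            continue
        if inside:
            inside = False  # closing fence
        else:
            inside = True  # opening fence
            if not m.group(1):
                missing.append(i)
    return missing
```

Running such a check over the SKILL.md would locate every unlabeled `WebFetch` block so they can all be standardized on one tag.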
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 937d76cc-beeb-4011-89c5-43264b43b076
📒 Files selected for processing (1)
.agents/skills/claude-maintain-models/SKILL.md
* KIL-517 Fix misc spec builder bugs and improvements
  Addresses 11 items: add X button to dismiss questions, preserve answers on failed request, add Created At to spec details, allow whitespace while typing spec names (trim on submit), add priority selector in advanced options, fix autoselect badge persistence, rename FewShotSelector to TaskSampleSelector, fine tune page max-width, add Re-run button for review examples, disable copilot when full trace enabled, and add archive/unarchive to spec details.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address Gemini review: use specific question numbers in validation messages
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address CodeRabbit review: persist dismissed questions across remounts
  Lift dismissed state to parent like selections/other_texts so dismissals survive component remounts on API failures.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* KIL-522 Restore persisted model selection on Run page
  Initialize model from ui_state store (localStorage) instead of empty string so the previously selected model is restored on page load. Also fix the saved-config dropdown to show "custom" immediately instead of "Select an option" while configs load.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* KIL-522 Add one-shot guard to prevent default config from overriding intentional Custom selection
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* KIL-534 Add Feedback data model on TaskRun
  Replace the single `user_feedback` string field on TaskRun with a proper Feedback model that supports multiple feedback entries per run. Feedback is a parented model under TaskRun, stored as separate files to avoid write conflicts when multiple people provide feedback.
  - Add Feedback model (feedback text + FeedbackSource enum)
  - Make TaskRun a parent model with feedback children
  - Remove user_feedback field from TaskRun
  - Add REST API endpoints (list/create) for feedback on task runs
  - Update copilot models, utils, and frontend spec builder
  - Create follow-up ticket KIL-537 for repair UI replacement
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add agent policy annotations for feedback API endpoints
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Revert unintended user_feedback renames in copilot code
  The ticket only asked to remove user_feedback from TaskRun, not rename it in the copilot/spec-builder code which uses it for a different purpose.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Remove misplaced annotation files, revert copilot renames
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Preserve feedback from spec review as Feedback children
  When creating TaskRuns from reviewed examples in the copilot flow, create Feedback children (with source=spec-feedback) after saving the run, so review feedback is not lost.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* reverts
* KIL-537 Replace repair UI with feedback UI
  Remove all repair UI code (repair form, repair edit form, repair review/accept/delete flows) and replace with a new feedback UI that uses the Feedback data model from KIL-534.
  - Rename "Output Rating" to "Rating and Feedback"
  - Add inline feedback list (up to 3, truncated) with "Add Feedback" link
  - Add "All Feedback" modal with sortable table
  - Add "Add Feedback" modal using FormContainer
  - Delete output_repair_edit_form.svelte
  - Remove model_name/provider/focus_repair_on_appear props from Run
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address AI review feedback: race condition and submit loading state
  - Add request ID tracking and run ID dedup to load_feedback to prevent race conditions and redundant requests when switching runs
  - Set add_feedback_submitting = true at start of submit_feedback
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Show latest 3 feedbacks in inline preview instead of oldest
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* reverted some changes
* fixed add feedback dialog UI
* outline instead of bg for clickable area
* claude compatible mcp.json
* steveback
* policy anno
* Add Fireworks AI provider to GLM 5.1 (#1275)
  https://getkiln.slack.com/archives/C0AG8U78MNG/p1776274097954549?thread_ts=1776273210.799549&cid=C0AG8U78MNG
  Co-authored-by: Claude <noreply@anthropic.com>
* Add Grok 4.20 and Minimax M2.7 (Together AI) (#1269)
  * Add Grok 4.20 and Minimax M2.7 TogetherAI provider
    Added Grok 4.20 (OpenRouter) and TogetherAI provider for Minimax M2.7 to the model list.
    https://claude.ai/code/session_01S77zSCTFnNW52JiCyWpBoV
  * Remove reasoning flags from Grok 4.20
    Other Grok models on OpenRouter don't set reasoning_capable=True. The model doesn't reliably return reasoning, causing 5 test failures. Removing to match the Kiln pattern for Grok on OpenRouter.
    https://claude.ai/code/session_01S77zSCTFnNW52JiCyWpBoV
  * Fix Minimax M2.7 Together AI structured output config
    The json_schema mode was being ignored by M2.7 on Together AI (model returned plain text instead of JSON). Switch to json_instruction_and_object with reasoning_optional_for_structured_output and optional_r1_thinking parser, matching the M2.5 Together AI config that works reliably.
    https://claude.ai/code/session_01F1L5ryuY5t2MxQXbNVjQGj
  Co-authored-by: Claude <noreply@anthropic.com>
* Update add-model skill: lagging-provider checks and push-gate rules (#1281)
  * Update SKILL.md
  * Update SKILL.md
  * Update SKILL.md
  * CR
* Workaround for Claude Code web for using anthropic models in paid tests (#1283)
  * Update SKILL.md
  * Update SKILL.md
  * Update SKILL.md
  * CR
  * Update SKILL.md
* Add Claude Opus 4.7 to model list (#1282)
  * Add Claude Opus 4.7 to model list (anthropic, openrouter)
    Adds Anthropic's new Opus 4.7 model with both Anthropic and OpenRouter providers. Introduces CLAUDE_OPUS_4_7_ANTHROPIC_THINKING_LEVELS to support the new "xhigh" and "max" effort levels exclusive to Opus 4.7.
  * Apply zero-sum swap: demote Opus 4.6 from suggested/featured
    Opus 4.7 now carries featured_rank=2, editorial_notes, suggested_for_evals, and suggested_for_data_gen. Removing the same flags from Opus 4.6 keeps the suggested/featured count stable across the Claude Opus family.
    https://claude.ai/code/session_01Xnfzt91McoMdqaiRv1g6xg
  * Add PDF support to OpenRouter provider for Opus 4.7
    Adds KilnMimeType.PDF to multimodal_mime_types and sets multimodal_requires_pdf_as_image=True (OpenRouter's PDF routing through Mistral OCR breaks LiteLLM parsing, so PDFs must be sent as images).
    https://claude.ai/code/session_01Xnfzt91McoMdqaiRv1g6xg
  Co-authored-by: Claude <noreply@anthropic.com>

Co-authored-by: Sam Fierro <13154106+sfierro@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: scosman <scosman@users.noreply.github.com>
What does this PR do?
The stop hook in Claude Code web is super annoying. "hopeful" fix.
Checklists