Workaround for Claude Code web for using anthropic models in paid tests #1283

Merged
tawnymanticore merged 6 commits into main from mike/update-claude-maintain
Apr 16, 2026

Collaborator

@tawnymanticore tawnymanticore commented Apr 16, 2026

What does this PR do?

Claude Code Web doesn't allow Anthropic API keys to be stored as secrets. This PR documents a workaround.

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

Summary by CodeRabbit

  • Documentation
    • Updated testing workflow documentation with guidance on resolving API key configuration issues when running Anthropic-direct tests, including the recommended command pattern for proper environment variable setup.

Contributor

coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c875bd93-db45-48e6-a085-7a37906486c2

📥 Commits

Reviewing files that changed from the base of the PR and between a358e73 and 3e98d91.

📒 Files selected for processing (1)
  • .agents/skills/claude-maintain-models/SKILL.md

📝 Walkthrough

Added documentation notes to the Anthropic API key testing workflow, highlighting a common environment variable mismatch issue (KILN_ANTHROPIC_API_KEY vs ANTHROPIC_API_KEY) and providing a one-shot environment variable aliasing solution for running tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Workflow: `.agents/skills/claude-maintain-models/SKILL.md` | Added "Anthropic API key gotcha" note explaining environment variable naming mismatch between application and SDK, with recommended command pattern using inline environment variable aliasing instead of global exports. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Suggested reviewers

  • scosman
  • leonardmq
  • chiang-daniel

Poem

🐰 When keys collide and tests refuse to run,
A mismatch hides where API calls begun,
An alias swift makes the confusion done—
ANTHROPIC_API_KEY="$KILN..." under the sun! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: a workaround for Claude Code Web regarding Anthropic API key handling in paid tests. |
| Description check | ✅ Passed | The description covers the main purpose and includes completed checklists, but lacks detail on what the workaround entails and omits the Related Issues section from the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@github-actions

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/main...HEAD

No lines with coverage information in this diff.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the documentation in SKILL.md to include a troubleshooting guide for Anthropic API key mismatches during testing. The review feedback correctly identifies a terminology error, suggesting the use of 'environment variable assignment' instead of 'one-shot alias' to describe the command-line syntax provided.

3. Re-run that single test to verify
4. Only re-run the full suite once the single test passes

**Anthropic API key gotcha:** if an Anthropic-direct test fails with an auth/API key error, check whether the user's environment exports the key as `KILN_ANTHROPIC_API_KEY` instead of `ANTHROPIC_API_KEY` (the Kiln app uses the prefixed name; the Anthropic SDK used by tests expects the unprefixed name). Prepend the test command with a one-shot alias — don't `export` it globally:

medium

The term "one-shot alias" is technically incorrect. In bash, an alias is a command shortcut created with the alias builtin. The example provided is an environment variable assignment for a single command execution. Using precise terminology is important in a skill definition to ensure the AI agent correctly understands the intended action and doesn't attempt to use the alias command incorrectly.
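
To make the distinction concrete, here is a minimal Python sketch of the same per-command scoping. It is only an analogue of the shell prefix described in the SKILL.md note, not the command from the PR. The helper name `run_with_unprefixed_key` is hypothetical; it copies `KILN_ANTHROPIC_API_KEY` into `ANTHROPIC_API_KEY` for one child process and leaves the parent environment untouched.

```python
import os
import subprocess
import sys


def run_with_unprefixed_key(cmd):
    """Run cmd with ANTHROPIC_API_KEY set for that one child process only.

    Hypothetical helper for illustration. Raises KeyError if
    KILN_ANTHROPIC_API_KEY is not set in the parent environment.
    """
    # Copy the parent environment so the parent itself is never modified.
    child_env = dict(os.environ)
    child_env["ANTHROPIC_API_KEY"] = os.environ["KILN_ANTHROPIC_API_KEY"]
    return subprocess.run(
        cmd, env=child_env, capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    # Placeholder value for the demo; a real key would come from the environment.
    os.environ.setdefault("KILN_ANTHROPIC_API_KEY", "dummy-key")
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('ANTHROPIC_API_KEY'))"]
    print(run_with_unprefixed_key(probe).stdout.strip())
```

Nothing here defines a shell alias, and nothing is exported: the unprefixed name exists only inside the child process, which is the behavior the reviewed note is asking for.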

Additionally, as a more robust solution, consider updating the test configuration or the kiln_ai adapter to automatically check for KILN_ANTHROPIC_API_KEY as a fallback when ANTHROPIC_API_KEY is missing. This would remove the need for manual environment variable mapping by the agent.

Suggested change
**Anthropic API key gotcha:** if an Anthropic-direct test fails with an auth/API key error, check whether the user's environment exports the key as `KILN_ANTHROPIC_API_KEY` instead of `ANTHROPIC_API_KEY` (the Kiln app uses the prefixed name; the Anthropic SDK used by tests expects the unprefixed name). Prepend the test command with a one-shot alias — don't `export` it globally:
**Anthropic API key gotcha:** if an Anthropic-direct test fails with an auth/API key error, check whether the user's environment exports the key as KILN_ANTHROPIC_API_KEY instead of ANTHROPIC_API_KEY (the Kiln app uses the prefixed name; the Anthropic SDK used by tests expects the unprefixed name). Prepend the test command with an environment variable assignment — don't export it globally:
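
The fallback suggested in this review could look roughly like the sketch below. It is an assumption about one possible shape, not code from the kiln_ai adapter or its test configuration: `resolve_anthropic_api_key` is a hypothetical name, and the precedence (unprefixed name first, prefixed name as fallback) is inferred from the wording of the suggestion.

```python
import os
from typing import Mapping, Optional


def resolve_anthropic_api_key(
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty Anthropic key found, or None.

    Hypothetical helper: checks ANTHROPIC_API_KEY first, then falls back
    to KILN_ANTHROPIC_API_KEY when the unprefixed name is missing or blank.
    """
    env = os.environ if environ is None else environ
    for name in ("ANTHROPIC_API_KEY", "KILN_ANTHROPIC_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    # Placeholder mapping for the demo; no real key is involved.
    print(resolve_anthropic_api_key({"KILN_ANTHROPIC_API_KEY": "dummy-key"}))
```

Passing the environment as a mapping keeps the lookup easy to test without mutating `os.environ`. Whether such a fallback belongs in the adapter or in the test setup is a design choice the review leaves open.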

@tawnymanticore tawnymanticore merged commit 089dc6c into main Apr 16, 2026
14 checks passed
@tawnymanticore tawnymanticore deleted the mike/update-claude-maintain branch April 16, 2026 16:45
tawnymanticore added a commit that referenced this pull request Apr 16, 2026
* KIL-517 Fix misc spec builder bugs and improvements

Addresses 11 items: add X button to dismiss questions, preserve answers on
failed request, add Created At to spec details, allow whitespace while typing
spec names (trim on submit), add priority selector in advanced options, fix
autoselect badge persistence, rename FewShotSelector to TaskSampleSelector,
fine tune page max-width, add Re-run button for review examples, disable
copilot when full trace enabled, and add archive/unarchive to spec details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address Gemini review: use specific question numbers in validation messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address CodeRabbit review: persist dismissed questions across remounts

Lift dismissed state to parent like selections/other_texts so dismissals
survive component remounts on API failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* KIL-522 Restore persisted model selection on Run page

Initialize model from ui_state store (localStorage) instead of empty
string so the previously selected model is restored on page load.
Also fix the saved-config dropdown to show "custom" immediately
instead of "Select an option" while configs load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* KIL-522 Add one-shot guard to prevent default config from overriding intentional Custom selection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* KIL-534 Add Feedback data model on TaskRun

Replace the single `user_feedback` string field on TaskRun with a proper
Feedback model that supports multiple feedback entries per run. Feedback
is a parented model under TaskRun, stored as separate files to avoid
write conflicts when multiple people provide feedback.

- Add Feedback model (feedback text + FeedbackSource enum)
- Make TaskRun a parent model with feedback children
- Remove user_feedback field from TaskRun
- Add REST API endpoints (list/create) for feedback on task runs
- Update copilot models, utils, and frontend spec builder
- Create follow-up ticket KIL-537 for repair UI replacement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add agent policy annotations for feedback API endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert unintended user_feedback renames in copilot code

The ticket only asked to remove user_feedback from TaskRun, not rename
it in the copilot/spec-builder code which uses it for a different purpose.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove misplaced annotation files, revert copilot renames

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Preserve feedback from spec review as Feedback children

When creating TaskRuns from reviewed examples in the copilot flow,
create Feedback children (with source=spec-feedback) after saving
the run, so review feedback is not lost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* reverts

* KIL-537 Replace repair UI with feedback UI

Remove all repair UI code (repair form, repair edit form, repair
review/accept/delete flows) and replace with a new feedback UI that
uses the Feedback data model from KIL-534.

- Rename "Output Rating" to "Rating and Feedback"
- Add inline feedback list (up to 3, truncated) with "Add Feedback" link
- Add "All Feedback" modal with sortable table
- Add "Add Feedback" modal using FormContainer
- Delete output_repair_edit_form.svelte
- Remove model_name/provider/focus_repair_on_appear props from Run

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address AI review feedback: race condition and submit loading state

- Add request ID tracking and run ID dedup to load_feedback to prevent
  race conditions and redundant requests when switching runs
- Set add_feedback_submitting = true at start of submit_feedback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Show latest 3 feedbacks in inline preview instead of oldest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* reverted some changes

* fixed add feedback dialog UI

* outline instead of bg for clickable area

* claude compatible mcp.json

* steveback

* policy anno

* Add Fireworks AI provider to GLM 5.1 (#1275)

https://getkiln.slack.com/archives/C0AG8U78MNG/p1776274097954549?thread_ts=1776273210.799549&cid=C0AG8U78MNG

Co-authored-by: Claude <noreply@anthropic.com>

* Add Grok 4.20 and Minimax M2.7 (Together AI) (#1269)

* Add Grok 4.20 and Minimax M2.7 TogetherAI provider

Added Grok 4.20 (OpenRouter) and TogetherAI provider for Minimax M2.7 to the model list.

https://claude.ai/code/session_01S77zSCTFnNW52JiCyWpBoV

* Remove reasoning flags from Grok 4.20

Other Grok models on OpenRouter don't set reasoning_capable=True.
The model doesn't reliably return reasoning, causing 5 test failures.
Removing to match the Kiln pattern for Grok on OpenRouter.

https://claude.ai/code/session_01S77zSCTFnNW52JiCyWpBoV

* Fix Minimax M2.7 Together AI structured output config

The json_schema mode was being ignored by M2.7 on Together AI (model
returned plain text instead of JSON). Switch to json_instruction_and_object
with reasoning_optional_for_structured_output and optional_r1_thinking
parser, matching the M2.5 Together AI config that works reliably.

https://claude.ai/code/session_01F1L5ryuY5t2MxQXbNVjQGj

---------

Co-authored-by: Claude <noreply@anthropic.com>

* Update add-model skill: lagging-provider checks and push-gate rules (#1281)

* Update SKILL.md

* Update SKILL.md

* Update SKILL.md

* CR

* Workaround for Claude Code web for using anthropic models in paid tests (#1283)

* Update SKILL.md

* Update SKILL.md

* Update SKILL.md

* CR

* Update SKILL.md

* Add Claude Opus 4.7 to model list (#1282)

* Add Claude Opus 4.7 to model list (anthropic, openrouter)

Adds Anthropic's new Opus 4.7 model with both Anthropic and OpenRouter
providers. Introduces CLAUDE_OPUS_4_7_ANTHROPIC_THINKING_LEVELS to
support the new "xhigh" and "max" effort levels exclusive to Opus 4.7.

* Apply zero-sum swap: demote Opus 4.6 from suggested/featured

Opus 4.7 now carries featured_rank=2, editorial_notes, suggested_for_evals,
and suggested_for_data_gen. Removing the same flags from Opus 4.6 keeps the
suggested/featured count stable across the Claude Opus family.

https://claude.ai/code/session_01Xnfzt91McoMdqaiRv1g6xg

* Add PDF support to OpenRouter provider for Opus 4.7

Adds KilnMimeType.PDF to multimodal_mime_types and sets
multimodal_requires_pdf_as_image=True (OpenRouter's PDF routing through
Mistral OCR breaks LiteLLM parsing, so PDFs must be sent as images).

https://claude.ai/code/session_01Xnfzt91McoMdqaiRv1g6xg

---------

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Sam Fierro <13154106+sfierro@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: scosman <scosman@users.noreply.github.com>