Summary
Custom skills never auto-trigger when invoked via `claude -p` (non-interactive/headless mode), regardless of how explicit or directive the skill description is. Precision is 100% (no false positives), but recall is persistently 0% across multiple description iterations.
Environment
- OS: Windows 11 Home 10.0.26200
- Claude plan: Claude Pro (OAuth; no `ANTHROPIC_API_KEY`)
- Claude Code version: latest (VS Code extension + CLI)
- skill-creator: marketplace version (Skills 2.0)
- Skill location: `C:\Users\<user>\.claude\skills\<skill-name>\SKILL.md`
Steps to Reproduce
- Create a custom skill in `~/.claude/skills/<skill>/SKILL.md` with a description following the Skills 2.0 format
- Create a trigger eval set (`trigger-eval.json`) with 20 queries (10 should-trigger, 10 should-not)
- Run `run_eval.py` from skill-creator:

```shell
python -m scripts.run_eval \
  --eval-set trigger-eval.json \
  --skill-path ~/.claude/skills/<skill> \
  --runs-per-query 3 --verbose
```

- Observe the results
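For context, the eval set's exact schema is defined by skill-creator; the sketch below only illustrates the shape used here, and the field names (`query`, `should_trigger`) are assumptions rather than confirmed parts of the tool's format:

```python
import json

# Hypothetical shape of trigger-eval.json: a list of query/expectation pairs.
# The real schema comes from skill-creator's run_eval.py and may differ.
eval_set = [
    {"query": "can you search youtube for claude code tutorials?", "should_trigger": True},
    {"query": "find me 5 youtube videos about python async programming", "should_trigger": True},
    {"query": "how do i download a youtube video to watch offline?", "should_trigger": False},
    {"query": "search google for recent news about llm agents", "should_trigger": False},
]

with open("trigger-eval.json", "w", encoding="utf-8") as f:
    json.dump(eval_set, f, indent=2)
```

The full set used in this report had 20 such entries, split evenly between should-trigger and should-not.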
Observed Behavior
All should-trigger queries score `trigger_rate: 0.0`; the skill never auto-triggers in any of the `claude -p` subprocess calls spawned by `run_eval.py`.
Example output:
```
[FAIL] rate=0/3 expected=True: can you search youtube for claude code tutorials?
[FAIL] rate=0/3 expected=True: find me 5 youtube videos about python async programming
[FAIL] rate=0/3 expected=True: /yt-search IfcOpenShell tutorial
[PASS] rate=0/3 expected=False: how do i download a youtube video to watch offline?
[PASS] rate=0/3 expected=False: search google for recent news about llm agents

Results: 10/20 passed — precision=100%, recall=0%, accuracy=50%
```
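The reported numbers follow directly from the confusion counts: the skill never fired, so every should-trigger query is a false negative and there are no false positives. With zero true positives, precision is strictly undefined (0/0); `run_eval.py` evidently reports it as 100% when there are no false positives. A quick sketch of the arithmetic:

```python
# Confusion counts from the run above: 10 false negatives (skill never
# fired on should-trigger queries), 10 true negatives, nothing else.
tp, fp, fn, tn = 0, 0, 10, 10

# tp + fp == 0 makes precision 0/0; treat "no false positives" as 100%,
# matching the tool's reported number.
precision = tp / (tp + fp) if (tp + fp) else 1.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"precision={precision:.0%} recall={recall:.0%} accuracy={accuracy:.0%}")
```

This is why "precision=100%" here is not a positive signal: it is vacuous, and recall is the metric that exposes the bug.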
Description Tested (v2 — very explicit)
```yaml
description: >
  Fetches live YouTube search results via yt-dlp — real titles, channels,
  view counts, upload dates, and URLs that Claude cannot know from training data.
  ALWAYS use this skill whenever the user wants to search YouTube or find YouTube
  videos on any topic. Never answer YouTube search requests from memory; only this
  skill can provide current, accurate video results.
  Trigger on any request containing: "search youtube", "find youtube videos",
  "look up youtube tutorials", "youtube results", "yt-search", "/yt-search",
  "search yt", "pull up youtube videos", "what youtube videos", "find me videos about".
  Do NOT trigger for: summarizing a specific video URL, downloading videos, or
  general web searches with no YouTube intent.
```
Even with "ALWAYS" directives, explicit keyword lists, and negative examples, recall remains 0%.
Expected Behavior
Skills with highly directive descriptions matching the query content should auto-trigger with meaningful recall (>50%) in headless mode.
Additional Notes
- The skill works perfectly when invoked explicitly via the `/yt-search <query>` slash command
- `run_loop.py` (the automated optimization loop) is not usable for Claude Pro users: it calls the `anthropic.Anthropic()` SDK directly and requires `ANTHROPIC_API_KEY`, which is not available with OAuth authentication
- The `run_eval.py` script required a Windows-specific fix: `select.select()` does not work with subprocess pipes on Windows (WinError 10038); it was replaced with a `queue.Queue` + background-thread pattern
- Tested across 2 full description iterations with no improvement in recall
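For reference, the Windows fix mentioned above follows a standard pattern: since `select.select()` only works on sockets on Windows, a daemon thread drains the subprocess pipe into a `queue.Queue`, and the main thread polls the queue with a timeout. A minimal sketch (not the exact patch applied to `run_eval.py`):

```python
import queue
import subprocess
import threading


def _drain(pipe, q):
    # Background thread: a blocking readline() is fine here, because the
    # main thread never blocks on the pipe itself.
    for line in iter(pipe.readline, ""):
        q.put(line)
    pipe.close()


def read_with_timeout(cmd, timeout=30.0):
    """Run cmd, collecting stdout lines; stop if no output arrives within timeout."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    q = queue.Queue()
    threading.Thread(target=_drain, args=(proc.stdout, q), daemon=True).start()
    lines = []
    while True:
        try:
            lines.append(q.get(timeout=timeout))
        except queue.Empty:
            break  # no output within the window: stop waiting
        if proc.poll() is not None and q.empty():
            break  # process exited and its output is fully drained
    return lines
```

Unlike `select.select()`, this works identically on Windows and POSIX, at the cost of one extra thread per subprocess.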
Workaround
None for auto-triggering. Users must invoke skills explicitly via slash commands.
Suggested Investigation
- Does `claude -p` pass available skills to the model differently than interactive mode?
- Is there a flag or environment variable to enable skill auto-triggering in headless mode?
- Should `run_eval.py` work differently for Pro/OAuth users (no API key)?