Skill auto-triggering: recall=0% in headless mode (claude -p) regardless of description content #32184

@daviddelven

Summary

Custom skills never auto-trigger when Claude Code is run via claude -p (non-interactive/headless mode), no matter how explicit or directive the skill description is. Precision is 100% (no false positives), but recall stays at 0% across multiple description iterations.

Environment

  • OS: Windows 11 Home 10.0.26200
  • Claude plan: Claude Pro (OAuth — no ANTHROPIC_API_KEY)
  • Claude Code version: latest (VS Code extension + CLI)
  • skill-creator: marketplace version (Skills 2.0)
  • Skill location: C:\Users\<user>\.claude\skills\<skill-name>\SKILL.md

Steps to Reproduce

  1. Create a custom skill at ~/.claude/skills/<skill>/SKILL.md with a description that follows the Skills 2.0 format
  2. Create a trigger eval set (trigger-eval.json) with 20 queries (10 should-trigger, 10 should-not-trigger)
  3. Run run_eval.py from skill-creator:
    python -m scripts.run_eval \
      --eval-set trigger-eval.json \
      --skill-path ~/.claude/skills/<skill> \
      --runs-per-query 3 --verbose
  4. Observe results
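
For step 2, the eval set pairs each query with the expected trigger outcome. The sketch below builds such a set from a few of the queries shown in this issue. The field names "query" and "should_trigger" and the helper build_eval_set are assumptions for illustration, not confirmed by this report; check run_eval.py for the schema it actually reads.

```python
import json

# Queries taken from the example output in this issue.
SHOULD_TRIGGER = [
    "can you search youtube for claude code tutorials?",
    "find me 5 youtube videos about python async programming",
]
SHOULD_NOT_TRIGGER = [
    "how do i download a youtube video to watch offline?",
    "search google for recent news about llm agents",
]


def build_eval_set(positives, negatives):
    """Label each query with the expected trigger outcome.

    ASSUMPTION: the keys "query" and "should_trigger" are hypothetical;
    the real schema is whatever run_eval.py expects.
    """
    labelled = [{"query": q, "should_trigger": True} for q in positives]
    labelled += [{"query": q, "should_trigger": False} for q in negatives]
    return labelled


eval_set = build_eval_set(SHOULD_TRIGGER, SHOULD_NOT_TRIGGER)
eval_json = json.dumps(eval_set, indent=2)  # text to save as trigger-eval.json
```

The full set used here had 20 entries; this sketch lists only four to keep it short.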

Observed Behavior

All should-trigger queries score trigger_rate: 0.0 — the skill never auto-triggers in any of the claude -p subprocess calls spawned by run_eval.py.

Example output:

[FAIL] rate=0/3 expected=True: can you search youtube for claude code tutorials?
[FAIL] rate=0/3 expected=True: find me 5 youtube videos about python async programming
[FAIL] rate=0/3 expected=True: /yt-search IfcOpenShell tutorial
[PASS] rate=0/3 expected=False: how do i download a youtube video to watch offline?
[PASS] rate=0/3 expected=False: search google for recent news about llm agents

Results: 10/20 passed — precision=100%, recall=0%, accuracy=50%
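
The summary line can be reproduced from the per-query rates. The sketch below is not the scoring code from run_eval.py; it assumes a query counts as triggered when at least half of its runs trigger (the 0.5 threshold is a guess), and that precision is reported as 100% when nothing triggered at all.

```python
def trigger_metrics(results, threshold=0.5):
    """Compute query-level precision, recall and accuracy.

    results: list of (expected, triggers, runs) tuples, runs > 0.
    ASSUMPTION: the threshold and the "empty denominator means 1.0"
    convention are guesses, not taken from run_eval.py.
    """
    tp = fp = fn = tn = 0
    for expected, triggers, runs in results:
        predicted = (triggers / runs) >= threshold
        if expected and predicted:
            tp += 1
        elif expected:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    # With no positive predictions, tp + fp is 0 and precision is undefined;
    # report it as 1.0 to match the "no false positives" reading.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    accuracy = (tp + tn) / len(results)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# The run reported above: 10 should-trigger and 10 should-not-trigger
# queries, every one with rate 0/3.
reported = [(True, 0, 3)] * 10 + [(False, 0, 3)] * 10
metrics = trigger_metrics(reported)
```

Under these assumptions the result is precision 1.0, recall 0.0, accuracy 0.5, matching the summary. The 100% precision is vacuous here: the skill never triggered, so there were no positive predictions to get wrong.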

Description Tested (v2 — very explicit)

description: >
  Fetches live YouTube search results via yt-dlp — real titles, channels,
  view counts, upload dates, and URLs that Claude cannot know from training data.
  ALWAYS use this skill whenever the user wants to search YouTube or find YouTube
  videos on any topic. Never answer YouTube search requests from memory; only this
  skill can provide current, accurate video results.
  Trigger on any request containing: "search youtube", "find youtube videos",
  "look up youtube tutorials", "youtube results", "yt-search", "/yt-search",
  "search yt", "pull up youtube videos", "what youtube videos", "find me videos about".
  Do NOT trigger for: summarizing a specific video URL, downloading videos, or
  general web searches with no YouTube intent.

Even with ALWAYS, explicit keyword lists, and negative examples, recall remains 0%.

Expected Behavior

Skills with highly directive descriptions matching the query content should auto-trigger with meaningful recall (>50%) in headless mode.

Additional Notes

  • The skill works perfectly when invoked explicitly via /yt-search <query> slash command
  • run_loop.py (the automated optimization loop) is not usable for Claude Pro users: it calls the anthropic.Anthropic() SDK client directly and requires ANTHROPIC_API_KEY, which is not available with OAuth authentication
  • run_eval.py needed a Windows-specific fix: select.select() does not work with subprocess pipes on Windows (WinError 10038), so it was replaced with a queue.Queue plus background-thread pattern
  • Tested across 2 full description iterations with no improvement in recall
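
The exact patch to run_eval.py is not included here. As a rough illustration of the queue.Queue plus background-thread pattern mentioned above, the sketch below reads a subprocess's stdout with a timeout without calling select.select(). The function names _pump and read_lines are hypothetical and do not come from skill-creator.

```python
import queue
import subprocess
import sys
import threading
import time


def _pump(stream, out_queue):
    """Forward each line from the pipe to the queue, then signal EOF."""
    for line in stream:
        out_queue.put(line)
    out_queue.put(None)  # sentinel: the pipe was closed


def read_lines(cmd, timeout=30.0):
    """Run cmd and return its stdout lines, killing it after timeout seconds.

    A blocking read happens in a daemon thread; the main thread only waits
    on the queue, so no select.select() call on a pipe is needed.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
    )
    lines = queue.Queue()
    threading.Thread(target=_pump, args=(proc.stdout, lines), daemon=True).start()

    collected = []
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise queue.Empty
            line = lines.get(timeout=remaining)
            if line is None:  # EOF sentinel from the reader thread
                break
            collected.append(line.rstrip("\n"))
    except queue.Empty:
        proc.kill()  # timed out: stop the child process
    finally:
        proc.wait()
    return collected


output = read_lines([sys.executable, "-c", "print('hello')"], timeout=10.0)
```

The same function could wrap the claude -p subprocess call in place of the Python one-liner used here.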

Workaround

None for auto-triggering. Users must invoke skills explicitly via slash commands.

Suggested Investigation

  • Does claude -p pass available skills to the model differently than interactive mode?
  • Is there a flag or env variable to enable skill auto-triggering in headless mode?
  • Should run_eval.py work differently for Pro/OAuth users (no API key)?

Labels

area:skills, bug, has repro, platform:windows
