| name | review-skill |
|---|---|
| description | Review a proposed Agent Skill for structural validity and content quality before publishing. Runs the skill-validator CLI to check for structural issues, scores the skill with an LLM judge, and interprets results to advise authors on what to address. Use when a user wants to review, validate, or quality-check an Agent Skill. |
| compatibility | Requires the skill-validator CLI. LLM scoring requires an Anthropic or OpenAI API key or the Claude CLI; it can also be skipped for a structural-only review. |
You are helping a skill author review an Agent Skill before publishing. This is a multi-step process: determine environment, verify prerequisites, run structural validation, review content, optionally run LLM scoring, and interpret results. Follow every step in order.
## Step 1: Determine the environment and verify prerequisites

Check for saved configuration:

```bash
cat ~/.config/skill-validator/review-state.yaml 2>/dev/null
```

If the state file exists with `prereqs_passed: true`, offer:

Found saved settings — configured for [provider/structural-only] reviews.

1. Continue with saved settings — skip to Step 2
2. Re-run prerequisite checks
3. Change environment — switch provider or between LLM and structural-only

Option 1: read `llm_scoring`, `provider`, and `cross_model` from the file and skip to Step 2. Options 2-3: continue below.
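A minimal sketch of reading those saved values, assuming the flat `key: value` layout written under "Save the configuration" below (variable names here are illustrative):

```bash
# Illustrative: load saved settings when prerequisites already passed.
STATE=~/.config/skill-validator/review-state.yaml
if grep -q '^prereqs_passed: true' "$STATE" 2>/dev/null; then
  LLM_SCORING=$(awk '/^llm_scoring:/ {print $2}' "$STATE")
  PROVIDER=$(awk '/^provider:/ {print $2}' "$STATE")
  CROSS_MODEL=$(awk '/^cross_model:/ {print $2}' "$STATE")
fi
```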
If no state file exists, or the user chose to re-check/change, ask:

LLM scoring uses an Anthropic or OpenAI-compatible API, or the Claude CLI. Without an API key or CLI, we run structural validation only.

1. Anthropic — use Claude via the Anthropic API (requires `ANTHROPIC_API_KEY`)
2. OpenAI — use GPT via the OpenAI API (requires `OPENAI_API_KEY`)
3. OpenAI-compatible — use a custom endpoint (Ollama, Groq, Azure, Together, etc.)
4. Claude CLI — use the locally authenticated `claude` binary (no API key needed)
5. Skip LLM scoring — structural validation only

Options 1-4: set `LLM_SCORING=true` and record the provider choice. Option 5: set `LLM_SCORING=false`, run Step 1a only, then jump to Step 2.
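To help the user choose, you can first detect which options are viable. A hedged sketch, assuming the standard environment variable names and that the Claude CLI installs a `claude` binary:

```bash
# Detect available scoring providers before presenting the menu.
[ -n "$ANTHROPIC_API_KEY" ] && echo "Anthropic API key detected (option 1)"
[ -n "$OPENAI_API_KEY" ] && echo "OpenAI API key detected (option 2)"
command -v claude >/dev/null 2>&1 && echo "Claude CLI detected (option 4)"
```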
If the user chose option 1 or 2, ask about cross-model comparison:

Scoring with a second model family gives more robust novelty scores, since each model has different training data. This requires API keys for both Anthropic and OpenAI.

1. Yes, compare across model families — score with both Anthropic and OpenAI
2. No, single provider is fine

Option 1: set `CROSS_MODEL=true`. Option 2: set `CROSS_MODEL=false`.

Do not offer cross-model comparison for option 3 (OpenAI-compatible) or option 4 (Claude CLI), since the second provider would need a standard Anthropic or OpenAI key.
After Step 1a, follow `references/llm-scoring.md` for API key checks before Step 2.
### Step 1a: Verify the skill-validator CLI

```bash
skill-validator --version
```

If not found, search common locations (`/usr/local/bin`, `/opt/homebrew/bin`, `~/go/bin`). If found but not on PATH, tell the user. If not found anywhere, follow `references/install-skill-validator.md`.

Do NOT proceed until this succeeds.
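One possible shape for that fallback search, using only the directories named above (nothing here is specific to how skill-validator installs itself):

```bash
# Check common install locations when skill-validator is not on PATH.
for dir in /usr/local/bin /opt/homebrew/bin "$HOME/go/bin"; do
  if [ -x "$dir/skill-validator" ]; then
    echo "Found $dir/skill-validator, but it is not on PATH"
  fi
done
```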
### Step 1b: Verify the LLM provider

If `LLM_SCORING=true`, complete the provider checks in `references/llm-scoring.md` before continuing.
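The authoritative checks live in that reference file; as a rough illustration of the kind of guard involved (assuming the same environment variable names as above):

```bash
# Rough illustration only - see references/llm-scoring.md for the real checks.
case "$PROVIDER" in
  anthropic)  [ -n "$ANTHROPIC_API_KEY" ] || echo "Missing ANTHROPIC_API_KEY" ;;
  openai)     [ -n "$OPENAI_API_KEY" ] || echo "Missing OPENAI_API_KEY" ;;
  claude-cli) command -v claude >/dev/null 2>&1 || echo "claude binary not found" ;;
esac
```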
### Save the configuration

Persist state so future runs skip this step. Replace placeholders with actual values:

```bash
mkdir -p ~/.config/skill-validator
cat > ~/.config/skill-validator/review-state.yaml << 'EOF'
prereqs_passed: true
llm_scoring: <true or false>
provider: <anthropic, openai, openai-compatible, or claude-cli>
model: <model name if specified, or "default">
base_url: <custom base URL if openai-compatible, or omit>
cross_model: <true or false>
EOF
```
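For example, a filled-in write for an Anthropic-only review with cross-model comparison declined might look like this (values illustrative; `base_url` is omitted because the provider is not openai-compatible):

```bash
cat > ~/.config/skill-validator/review-state.yaml << 'EOF'
prereqs_passed: true
llm_scoring: true
provider: anthropic
model: default
cross_model: false
EOF
```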
## Step 2: Get the skill path

Ask the user for the path to the skill they want to review, unless they have already provided it. Verify the path contains a SKILL.md file:

```bash
ls <path>/SKILL.md
```

If SKILL.md does not exist at the given path, tell the user this is not a valid skill directory and ask them to provide the correct path.
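The same check with the failure branch made explicit (`<path>` stands for the user-supplied directory, as above):

```bash
# Verify the supplied directory actually contains a SKILL.md.
if [ ! -f "<path>/SKILL.md" ]; then
  echo "Not a valid skill directory: no SKILL.md at <path>"
fi
```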
## Step 3: Run structural validation

Run the full check suite:

```bash
skill-validator check <path>
```

Capture the exit code:
| Exit code | Meaning |
|---|---|
| 0 | Clean — no errors or warnings |
| 1 | Errors found — must fix before publishing |
| 2 | Warnings only — review but not blocking |
| 3 | CLI/usage error — check the command |
Exit 0: proceed. Exit 2: note warnings, proceed. Exit 1: list errors — these are blocking; the user must fix them before the skill can be published, and you must NOT proceed to LLM scoring. Exit 3: the validator itself failed to run; fix the invocation and re-run before drawing any conclusions.
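A minimal sketch of that dispatch, using the exit codes from the table above:

```bash
# Branch on the validator's documented exit codes.
skill-validator check "<path>"
case $? in
  0) echo "Clean: proceed" ;;
  1) echo "Errors: blocking; fix before publishing, skip LLM scoring" ;;
  2) echo "Warnings only: note them and proceed" ;;
  3) echo "CLI/usage error: fix the command and re-run" ;;
esac
```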
## Step 4: Content quality review

Read the SKILL.md and any reference files, then evaluate each check below. Report which checks pass and which do not, with specific details on what is missing.
| Check | Criteria |
|---|---|
| Examples | Does the skill provide examples of expected inputs and outputs? |
| Edge cases | Does the skill document common edge cases or failure modes? |
| Scope-gating | Does the skill define when to stop/continue, prerequisites, and conditions for branching paths? |
Flag any failing checks as areas the author should address. These are not blocking but should be resolved before publishing for best results.
## Step 5: LLM scoring

If `LLM_SCORING=false`, skip to Step 6.

If `LLM_SCORING=true`, follow the "Run LLM Scoring" and "Interpret LLM Scores" sections of `references/llm-scoring.md`.
## Step 6: Summarize and advise

If `LLM_SCORING=true`, follow the "Full Review Summary" section of `references/llm-scoring.md`. Include any failing content review checks from Step 4 in the action items.

If `LLM_SCORING=false`, present the structural result, the content review result, areas to address, and a self-assessment checklist using the scoring dimensions from `assets/report.md`. Note that LLM scoring was skipped; advise re-running with an API key or self-assessing against the report dimensions.
Structure the final summary with these sections in order:
- Structural validation — pass/fail with errors or warnings
- SKILL.md scores — overall and per-dimension table
- Reference scores — per-file table with overall and lowest dimension
- Novelty assessment — mean novelty vs threshold of 3; list `novel_info` per file for author verification
- Action items — prioritized list of what to fix
- Recommendation — ready to publish / minor revisions / significant rework