All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Improve error message when the OpenAI API returns an `incorrect_hostname` error due to regional endpoint requirements (#70). The error now tells the user to set `OPENAI_BASE_URL` or `--base-url` with the correct regional host (e.g. `https://us.api.openai.com/v1`).
- Fix false positive in comma-list keyword stuffing heuristic on prose descriptions with inline enumeration lists (#71). The heuristic now checks prose density (average words per segment) so that sentences with enough surrounding prose are not flagged as keyword dumps.
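The prose-density check described above might look roughly like this (a minimal sketch with an illustrative threshold and a hypothetical `proseDense` helper, not the project's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// proseDense reports whether a comma-separated description reads like
// prose rather than a keyword dump, by checking the average number of
// words per comma segment. Name and threshold are illustrative.
func proseDense(desc string, minAvgWords float64) bool {
	segments := strings.Split(desc, ",")
	words := 0
	for _, seg := range segments {
		words += len(strings.Fields(seg))
	}
	avg := float64(words) / float64(len(segments))
	return avg >= minAvgWords
}

func main() {
	keywordDump := "python, testing, ci, docker, kubernetes, aws"
	prose := "Reviews pull requests for style issues, flags risky changes, and suggests fixes"
	fmt.Println(proseDense(keywordDump, 4)) // few words per segment: likely stuffing
	fmt.Println(proseDense(prose, 4))       // enough surrounding prose: not flagged
}
```

A description with many commas but long segments passes; short one-word segments still trip the check.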
- Support `OPENAI_BASE_URL`, `OPENAI_ORG_ID`, and `OPENAI_PROJECT_ID` environment variables for the OpenAI LLM-as-judge provider (#70). Organization and project headers are only sent when the base URL points to an OpenAI endpoint.
- Fix false-positive orphan warnings on Windows: file paths from the filesystem are now normalized to forward slashes before comparing against markdown references (#63).
- Fix code block detection on Windows: fenced code block regexes now handle CRLF line endings, fixing zero code-block counts when files are checked out with Windows-style line endings.
- Fix backslash paths in token count keys, result messages, and GitHub Actions annotations on Windows.
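The normalization behind the two Windows path fixes above amounts to rewriting backslashes to forward slashes before comparison (a minimal sketch; the helper name is hypothetical, and real code would use `filepath.ToSlash`):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRel rewrites backslashes to forward slashes so filesystem
// paths compare equal to markdown references on any OS. Real code would
// call filepath.ToSlash, which does this only on Windows; ReplaceAll is
// used here so the sketch behaves identically everywhere.
func normalizeRel(p string) string {
	return strings.ReplaceAll(p, `\`, "/")
}

func main() {
	fmt.Println(normalizeRel(`references\api.md`)) // references/api.md
}
```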
- Add Windows (`windows-latest`) to the CI test matrix.
- Contamination warnings now only fire when multiple application programming languages are detected. Skills containing only auxiliary languages (shell, config formats) no longer trigger false positives (#60, #62).
- Link checker now falls back to GET when HEAD returns 404 or 405, matching the standard approach used by lychee and other link validators. Fixes false positives on sites that don't handle HEAD correctly (#45).
- Link checker now sends an `Accept: text/html` header, fixing false positives on SPAs like crates.io that require content negotiation to serve pages.
- Block SSRF in link validation: the HTTP client now refuses to connect to private/reserved IP addresses (loopback, RFC 1918, link-local, cloud metadata endpoints). Each hop in a redirect chain is checked independently, preventing redirects to internal addresses.
- Block path traversal in internal link checks: relative links that resolve outside the skill directory (e.g., `../../etc/passwd`) are now rejected instead of being passed to `os.Stat`.
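The containment check can be sketched with `filepath.Rel` (a hypothetical `insideDir` helper, for illustration):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideDir reports whether a relative markdown link stays within the
// skill directory once resolved. Escaping links are rejected before any
// filesystem call such as os.Stat.
func insideDir(root, link string) bool {
	resolved := filepath.Join(root, link) // Join also cleans ".." segments
	rel, err := filepath.Rel(root, resolved)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/skills/my-skill"
	fmt.Println(insideDir(root, "references/api.md")) // stays inside: allowed
	fmt.Println(insideDir(root, "../../etc/passwd"))  // escapes root: rejected
}
```

Resolving first and then testing the relative path catches tricks like `a/../../x` that a plain prefix check on the raw link would miss.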
- Add SECURITY.md with reporting instructions and scope.
- Add CONTRIBUTING.md, CODE_OF_CONDUCT.md, PR template, and issue templates.
- Add `claude-cli` LLM provider for scoring without API keys (#43, #44). Uses the locally authenticated `claude` binary, making LLM scoring accessible to users with team or company subscriptions who don't have an explicit API key. Default model is `sonnet`.
- Preflight check for the `claude` binary at client creation time, giving a clear error when the CLI is not installed.
- Documentation notes that `claude-cli` scores may be less consistent than API-based providers because the CLI loads local context (CLAUDE.md, memory, rules) into each scoring call.
- Add `--allow-dirs` flag to accept specific non-standard directories without warnings (#39). Allowed directories are exempt from deep-nesting checks and skipped for orphan detection (with an informational note). Useful for development directories like `evals/` or `testing/` that aren't part of the spec but are needed during skill development.
- Refactor `--only` and `--skip` flags from manual comma-separated string parsing to `StringSliceVar`, matching the `--allow-dirs` flag style. Both comma-separated (`--only=structure,links`) and repeated (`--only=structure --only=links`) syntax are now supported. Existing comma-separated usage is unaffected.
- Restructure `validate structure` and `check` flag documentation in the README from dense prose paragraphs into scannable tables.
- Add opt-in rate limiting for LLM API calls during evaluation via the `RateLimit` option (#37). Disabled by default (zero value).
- Recognize `OWNERS.yaml` and `OWNERS` as known extraneous files so they produce the more specific "not needed in a skill" warning (#33).
- Deduplicate regex patterns into `util/regex.go`, fixing tilde-fence stripping in content analysis (#35).
- Cache the token encoder with `sync.Once` to avoid repeated initialization in batch runs (#34).
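The `sync.Once` caching pattern looks like this (a generic sketch with a toy encoder standing in for the real initializer):

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical expensive constructor; the caching pattern is the point.
var (
	initCount int
	encOnce   sync.Once
	encoder   func(string) int
)

// getEncoder returns the shared encoder, building it exactly once even
// when called concurrently from many goroutines in a batch run.
func getEncoder() func(string) int {
	encOnce.Do(func() {
		initCount++ // expensive setup happens exactly once
		encoder = func(s string) int { return len(s) / 4 } // toy token count
	})
	return encoder
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // simulate a batch run
		wg.Add(1)
		go func() {
			defer wg.Done()
			getEncoder()("some skill content")
		}()
	}
	wg.Wait()
	fmt.Println(initCount) // 1: initialization ran once despite 8 callers
}
```

`sync.Once.Do` also blocks concurrent callers until the first initialization finishes, so no goroutine can observe a half-built encoder.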
- Rate limiter now respects context cancellation instead of blocking until the next tick interval.
- First rate-limited LLM call no longer incurs an unnecessary delay.
- Add `--allow-extra-frontmatter` flag to suppress warnings for non-spec frontmatter fields (#27). Useful for teams that embed custom metadata (e.g. internal tags or routing hints) alongside standard skill fields.
- Add `--allow-flat-layouts` flag to support skills that keep all files at the root instead of using `references/`, `scripts/`, and `assets/` subdirectories (#23). When enabled, root-level files are treated as standard content for token counting and orphan detection rather than flagged as extraneous.
- Both new flags are available on the `validate structure` and `check` commands.
- Fix false positive in comma-separated keyword stuffing heuristic on multi-sentence descriptions with inline enumeration lists (#26). The heuristic now splits descriptions into sentences before checking, so commas in separate sentences are no longer counted together.
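The sentence-splitting refinement might be sketched as follows (a hypothetical helper; naive splitting on `. ` stands in for real sentence segmentation):

```go
package main

import (
	"fmt"
	"strings"
)

// maxCommasPerSentence splits a description into sentences first, then
// counts commas within each sentence, so commas spread across separate
// sentences are no longer summed into one keyword-stuffing signal.
func maxCommasPerSentence(desc string) int {
	max := 0
	for _, sentence := range strings.Split(desc, ". ") {
		if n := strings.Count(sentence, ","); n > max {
			max = n
		}
	}
	return max
}

func main() {
	desc := "Validates skills, links, and metadata. Supports text, JSON, and markdown output. Runs in CI, pre-commit, or locally."
	fmt.Println(strings.Count(desc, ","))   // naive whole-description count: high
	fmt.Println(maxCommasPerSentence(desc)) // per-sentence maximum stays low
}
```

A multi-sentence description with two commas per sentence no longer looks like one six-comma keyword list.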
- Extract keyword stuffing thresholds into named constants for easier tuning.
- Bump default OpenAI model to GPT 5.2.
- Add CI and review-skill examples to `examples/`.
- Increase model name truncation limit in eval compare report.
First stable release. Includes the complete CLI and importable library packages.
- `validate structure` — spec compliance, frontmatter, token counts, code fence integrity, internal link validation, orphan file detection, keyword stuffing
- `validate links` — external HTTP/HTTPS link validation with template URL support
- `analyze content` — content quality metrics (density, specificity, imperative ratio)
- `analyze contamination` — cross-language contamination detection and scoring
- `check` — run all deterministic checks with `--only`/`--skip` filtering
- `score evaluate` — LLM-as-judge scoring (Anthropic and OpenAI-compatible providers)
- `score report` — view and compare cached LLM scores across models
- Output formats: text, JSON, markdown
- GitHub Actions annotations via `--emit-annotations`
- `--strict` mode for CI (treats warnings as errors)
- Multi-skill directory detection
- Pre-commit hook support for all major agent platforms
- Homebrew install via `agent-ecosystem/tap`
- `orchestrate` — high-level validation coordination
- `evaluate` — LLM scoring orchestration with caching and progress reporting
- `judge` — LLM client abstraction and scoring (EXPERIMENTAL)
- `structure`, `content`, `contamination`, `links` — individual analysis packages
- `skill` — SKILL.md parsing (frontmatter + body)
- `skillcheck` — skill detection and reference file analysis
- `report` — output formatting (text, JSON, markdown, GitHub annotations)
- `types` — shared data types (`Report`, `Result`, `Level`, etc.)
- `judge.LLMClient` interface for custom LLM providers