
feat: review system v2 — shared skills, bugfixes, analysis #304

Draft
nsheaps wants to merge 8 commits into main from claude/review-system-analysis-HieLF

Conversation

Owner

@nsheaps nsheaps commented Mar 24, 2026

Summary

  • Fix CDATA wrapper bug — restructured prompt instructions so the <![CDATA[...]]> prohibition is prominent and explicit
  • Fix non-blocking feedback overuse — raised the bar for COMMENT verdicts; suggestions that would improve code are now REQUEST_CHANGES, and non-blocking items require justification
  • Shared self-review skill — extracted multi-agent review logic into shared/skills/self-review/, symlinked into scm-utils
  • Scoring fixes — aligned emoji/badge thresholds (65/85), added calibration rubric, concrete scoring ceiling (84% max with warnings)
  • De-duplicated verdict criteria — self-review references prompt template instead of maintaining a third copy
  • Verdict criteria synced — plugin template now matches full CI prompt verdict section

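The aligned emoji/badge thresholds (65/85) mentioned in the scoring fixes could be sketched as a small mapping function. This is illustrative only — the function name and signature are assumptions, not part of the actual skill files:

```python
def score_emoji(score: int) -> str:
    """Map a 0-100 review score to an emoji indicator.

    The 65/85 boundaries are the aligned emoji/badge thresholds
    described in this PR's summary; everything else here is a
    hypothetical sketch, not the skill's actual implementation.
    """
    if score < 65:
        return "🚨"  # needs significant work
    if score < 85:
        return "⚠️"  # acceptable, with warnings
    return "✅"      # meets the bar
```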
Files Changed (8 files, +185/-68)

  • .github/prompts/claude-code-review.md — CDATA prohibition + verdict criteria restructured
  • plugins/scm-utils/.../prompt-template.md — verdict criteria synced from CI prompt
  • shared/skills/self-review/SKILL.md — NEW: shared multi-agent review procedure (107 lines)
  • plugins/scm-utils/skills/self-review/ — symlink to shared
  • plugins/scm-utils/skills/code-review/SKILL.md — verdict table updated to v2 phrasing
  • .claude/skills/code-review/SKILL.md — REMOVED (replaced by shared skill)
  • plugins/scm-utils/.claude-plugin/plugin.json — version bump to 0.2.0
  • .claude/.gitignore — ignore ephemeral pr-review reports

Test plan

  • Verify scm-utils plugin loads with new skill
  • Trigger a review on a test PR to confirm CDATA fix
  • Verify non-blocking feedback is stricter in review output
  • Test self-review skill invocation locally
  • Validate symlink resolves correctly after plugin install

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM

@nsheaps nsheaps added the `request-review` label (Request a one-time review from the Claude review bot; label is removed after review starts) Mar 24, 2026 — with Claude
@henry-nsheaps henry-nsheaps Bot removed the `request-review` label Mar 24, 2026
@github-actions

github-actions Bot commented Mar 24, 2026

Plugin Version Status

Versions are auto-bumped in PRs. Manual bumps to higher versions are preserved.

| Plugin | Base | Current | Action |
| --- | --- | --- | --- |
| scm-utils | 0.1.14 | 0.2.0 | Already bumped |

Contributor

@henry-nsheaps henry-nsheaps Bot left a comment


⚠️ Good improvements — a few issues to address before merge

⚠️ Prompt template diverges from CI prompt — missing verdict criteria that the CI prompt includes (see inline thread)
⚠️ self-review SKILL.md has conflicting/arbitrary scoring rules (lines 64-65) that clash with CI prompt format requirements
✅ CDATA prohibition is now explicit and prominent in both prompts
✅ Verdict criteria restructured into clear decision tree
✅ Non-blocking feedback bar raised appropriately with justification requirement
✅ Shared skills architecture with correct symlinks
✅ Version bump 0.1.13 → 0.2.0 appropriate for new features
✅ Analysis doc and migration guide are comprehensive and accurate
✅ All referenced workflow/action files verified to exist
✅ PR description accurately matches code changes

🖱️ Click to expand for full details

Code Quality (90%)

The PR makes well-structured improvements across the review system. The bugfixes (CDATA, verdict overuse) address real problems observed in production reviews. The shared skills architecture is a meaningful DRY improvement — extracting the multi-agent review logic from a project-level skill into shared/skills/ and symlinking into the plugin.

Strengths:

  • The verdict criteria restructuring (from flat CRITICAL lines to a structured decision tree) is significantly more readable and actionable
  • The footnote section cleanup (separating "must include" from "must not include") removes a confusing numbered list where items 3-5 were exclusions mixed into an inclusion list
  • The follow-up recommendation format now requires justification, which prevents lazy "non-blocking" feedback

Score rationale: 90% because the core changes are solid, but the template sync issue (see below) introduces a quality gap between the distributable template and the CI prompt.

Template Synchronization (Key Issue)

The CI prompt (.github/prompts/claude-code-review.md) now includes substantially more verdict guidance than the plugin template (prompt-template.md). Specifically, the template is missing:

  • Style/convention violation criteria for REQUEST_CHANGES
  • Maintainability/complexity criteria for REQUEST_CHANGES
  • The "previous feedback adequately addressed" criterion for APPROVE
  • The "bar for non-blocking feedback" paragraph
  • The APPROVE "if code changes will break something, do NOT approve" CRITICAL line

The analysis document (§3.1, §3.5) acknowledges this divergence as known debt, but this PR actively widens the gap by adding content to the CI prompt that isn't reflected in the template. See the inline comment on prompt-template.md for a concrete suggestion.

Shared Skills Architecture

The self-review and parallel-review skills are well-structured. The symlinks are correct:

  • plugins/scm-utils/skills/self-review/SKILL.md → ../../../../shared/skills/self-review/SKILL.md
  • plugins/scm-utils/skills/parallel-review/SKILL.md → ../../../../shared/skills/parallel-review/SKILL.md
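A symlink check along these lines could confirm the links resolve after plugin install (one of the PR's test-plan items). The helper name is hypothetical; the paths mirror the ones listed above, relative to the repository root:

```python
from pathlib import Path

def symlink_resolves(repo_root: Path, link: str, target: str) -> bool:
    """Return True if `link` is a symlink that resolves to `target`.

    A sketch for validating the shared-skill symlinks; the function
    name and calling convention are assumptions for illustration.
    """
    link_path = repo_root / link
    return (
        link_path.is_symlink()
        and link_path.resolve() == (repo_root / target).resolve()
    )
```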

The removed .claude/skills/code-review/SKILL.md content is fully captured in the new shared/skills/self-review/SKILL.md with improvements (verdict criteria, CDATA prohibition, inline comment conventions).

However, self-review/SKILL.md has two scoring rules (lines 64-65) that are problematic — see inline comments for details.

Documentation

Both new docs are comprehensive and accurate:

  • docs/review-system-analysis.md — thorough architecture overview with honest assessment of weaknesses
  • docs/review-system-v2-migration-guide.md — clear migration steps with before/after diagrams and rollback instructions

All component references in the analysis doc were verified against the actual filesystem.

Simplicity (85%)

The PR adds 899 lines across 11 files. The two documentation files (524 lines combined) are valuable as system documentation but add maintenance surface. The shared skills + symlink pattern is the right approach for DRY. Score reduced from 100% because the docs are comprehensive but lengthy for what is largely a prompt restructuring + architecture reorganization.

Confidence (85%)

High confidence in the prompt/template changes — the diffs are clear and the improvements are straightforward. Moderate confidence in how the shared skills will behave at runtime, since skills are interpreted by the agent and the scoring rules in self-review could produce unexpected interactions with the CI prompt's format requirements.

Recommended follow-ups (non-blocking — each item MUST explain why it shouldn't be addressed in this PR):

  • Unify CI prompt and template from a single source — requires architectural work (§5.1 of analysis) beyond this PR's scope; the analysis doc already tracks this as high priority
  • Add output validation for review markdown (strip CDATA, verify details/summary structure) — requires a new post-processing hook or action, tracked as §5.3 in analysis
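The output-validation follow-up above could start as small as a CDATA-stripping pass over the review markdown. This is a sketch of the idea only — the hook placement, function name, and exact sanitization policy are assumptions, not an existing action:

```python
import re

# Matches <![CDATA[ ... ]]> wrappers, keeping the inner text.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

def sanitize_review(markdown: str) -> str:
    """Strip CDATA wrappers from generated review markdown.

    A minimal sketch of the proposed post-processing validation;
    a real hook would likely also verify <details>/<summary>
    structure and footnote syntax.
    """
    return CDATA_RE.sub(lambda m: m.group(1), markdown)
```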

Notes: ¹ ² ³

Footnotes

  1. Workflow Run: https://github.com/nsheaps/ai-mktpl/actions/runs/23508568710/attempts/1

  2. PR: #304

  3. GitHub autolinked references: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/autolinked-references-and-urls

Comment thread shared/skills/self-review/SKILL.md Outdated

When all agents complete, review each report. Compare results across dimensions to build a complete picture. Create one overall report including:

- Score table with emoji indicators:
Contributor


⚠️ This constraint ("maximum overall score is 94%") is arbitrary and may produce confusing behavior. An agent implementing this skill would need to artificially cap its score even if every individual category scores 85%+. Consider either:

  1. Removing this rule and letting the synthesized score reflect the actual category scores
  2. Documenting the rationale (e.g., "a warning in any category means the PR has room for improvement, so the overall should reflect that")

As-is, this creates a magic number without justification, making it harder for agents to reason about scoring consistently.

Suggested change
- Score table with emoji indicators:
- If any category has ⚠️, the overall score should reflect that no category achieved ✅ level

Owner Author


Applied the suggested change — replaced the arbitrary 94% cap with "the overall score should reflect that no category achieved ✅ level" in f97c75f.

Comment thread shared/skills/self-review/SKILL.md Outdated
When all agents complete, review each report. Compare results across dimensions to build a complete picture. Create one overall report including:

- Score table with emoji indicators:
- 🚨 Score below 70%
Contributor


⚠️ The "keep the final review to just the table" instruction at >95% conflicts with the CI prompt's requirement for <details>/<summary> blocks, footnotes, and badges in every review. An agent following both the CI prompt and this skill would get contradictory instructions.

Consider aligning this with the CI prompt's mandatory format, or adding a qualifier like "when used outside CI":

Suggested change
- 🚨 Score below 70%
- If overall score is >95%, the detailed section can be minimal (just the table and a brief summary)

Owner Author


Applied — changed to "the detailed section can be minimal (just the table and a brief summary)" to avoid conflicting with CI format requirements. f97c75f

Comment thread docs/review-system-analysis.md Outdated

### 3.1 Review Logic Split (Improved in v2)

Previously, review instructions existed in three divergent places. In v2, the architecture was restructured:
Contributor


✅ Good self-awareness here — this is the key architectural debt. The PR improves the situation (shared skills, DRY for review logic) but the CI prompt ↔ template divergence remains. The analysis correctly identifies this as a future improvement area rather than trying to solve it in this PR.

Comment on lines +131 to +141
- Security, performance, or correctness issues must be fixed before merging
- The code would improve meaningfully from a suggested change and the change is straightforward to make
- CRITICAL: If a change would make the code better and it's reasonable to do before merge, it's a requested change, not a suggestion

Use **"APPROVE"** when:
- The PR is ready to merge as-is with no outstanding issues

Use **"COMMENT"** only when ALL of these are true:
- The code won't break if merged as-is
- The suggestions are genuinely optional and there is a clear reason why each should NOT be addressed in this PR
- CRITICAL: If a suggestion would improve the code and is reasonable to implement, use REQUEST_CHANGES. Non-blocking feedback must have a justification.
Contributor


⚠️ The CI prompt (.github/prompts/claude-code-review.md) has significantly more detail in its verdict section — including style/convention violations, maintainability criteria, and the full "bar for non-blocking feedback" paragraph. The template here is a stripped-down version.

This asymmetry means repos using the template will get a less rigorous review than this repo's CI prompt. If the template is the canonical distributable version, it should include the full verdict criteria. The extra ~10 lines from the CI prompt would make the template complete:

Suggested change
- Security, performance, or correctness issues must be fixed before merging
- The code would improve meaningfully from a suggested change and the change is straightforward to make
- CRITICAL: If a change would make the code better and it's reasonable to do before merge, it's a requested change, not a suggestion
Use **"APPROVE"** when:
- The PR is ready to merge as-is with no outstanding issues
Use **"COMMENT"** only when ALL of these are true:
- The code won't break if merged as-is
- The suggestions are genuinely optional and there is a clear reason why each should NOT be addressed in this PR
- CRITICAL: If a suggestion would improve the code and is reasonable to implement, use REQUEST_CHANGES. Non-blocking feedback must have a justification.
Use **"REQUEST_CHANGES"** when:
- Security, performance, or correctness issues must be fixed before merging
- Style or convention violations need correction before merging
- Maintainability or complexity issues would degrade the codebase if merged
- The code would improve meaningfully from a suggested change and the change is straightforward to make
- CRITICAL: If a change would make the code better and it's reasonable to do before merge, it's a requested change, not a suggestion
Use **"APPROVE"** when:
- The PR is ready to merge as-is with no outstanding issues
- Previous feedback has been adequately addressed (e.g., author clarified logic in PR body, commit messages, or code comments)
- CRITICAL: If the code changes will break something after merge, do NOT approve
Use **"COMMENT"** only when ALL of these are true:
- The code won't break if merged as-is
- The suggestions are genuinely optional and there is a clear reason why each should NOT be addressed in this PR
- CRITICAL: If a suggestion would improve the code and is reasonable to implement, use REQUEST_CHANGES. Non-blocking feedback must have a justification.
**The bar for non-blocking feedback:**
The goal is to merge high-quality code. If your feedback would improve the code, and the improvement is within the scope of the PR and straightforward to implement, then it IS blocking and should use REQUEST_CHANGES. Non-blocking feedback should be reserved for items that genuinely warrant discussion about whether they should be addressed at all, or that would require work significantly beyond the PR's scope.

Owner Author


Synced the template with the full CI prompt verdict criteria in f97c75f. Now includes: style/convention violations, maintainability criteria for REQUEST_CHANGES; previous feedback and break-after-merge criteria for APPROVE; and the full "bar for non-blocking feedback" paragraph.

@nsheaps nsheaps force-pushed the claude/review-system-analysis-HieLF branch from ac53a62 to f97c75f Compare March 25, 2026 00:17
claude added 4 commits March 25, 2026 00:45
- Fix CDATA wrapper bug: restructured footnote instructions to make
  CDATA prohibition prominent and explicit (was buried in a confusing
  numbered list that mixed "include" and "exclude" items)
- Fix non-blocking feedback overuse: raised the bar for COMMENT verdicts
  to require justification. If a suggestion would improve code and is
  reasonable to implement, it's now REQUEST_CHANGES, not COMMENT.
  Non-blocking follow-up items must explain why they shouldn't be
  addressed in the current PR.

- Extract multi-agent review logic from project-level skill into
  shared/skills/self-review/ (symlinked into scm-utils plugin)
- Add parallel-review shared skill for Agent Teams fan-out reviews
  (8 concurrent teammates, one per quality dimension)
- Remove project-level .claude/skills/code-review/SKILL.md
- Update scm-utils code-review skill to reference self-review for
  local fallback
- Bump scm-utils to v0.2.0 (new skills, breaking: project skill removed)

- docs/review-system-analysis.md: comprehensive analysis of review
  system architecture, strengths, weaknesses, and improvement suggestions
- docs/review-system-v2-migration-guide.md: migration guide from v1
  to v2 with step-by-step instructions and architecture diagrams

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
- Replace arbitrary 94% scoring cap with "should reflect that no
  category achieved ✅ level" (removes magic number)
- Change >95% "just the table" instruction to "minimal (table + brief
  summary)" to avoid conflicting with CI prompt format requirements
- Sync prompt-template.md verdict criteria with CI prompt: add
  style/convention, maintainability criteria for REQUEST_CHANGES;
  add previous feedback and break-after-merge criteria for APPROVE;
  add "bar for non-blocking feedback" paragraph

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
- Remove parallel-review skill (167 lines) — experimental, can't
  run in CI, violates YAGNI
- Remove migration guide (246 lines) — over-engineered for an
  internal refactor
- Trim analysis doc — remove parallel-review refs, replace 7-item
  roadmap with brief note pointing to GitHub issues
- Fix scoring threshold inconsistency — align emoji boundaries
  (now 65/85) with badge color boundaries
- Add concrete scoring ceiling: max 84% overall when any ⚠️ exists
- Add scoring calibration rubric for sub-agents (0-39/40-64/65-84/85-89/90-100)
- De-duplicate verdict criteria from self-review — now references
  prompt template instead of maintaining a third copy
- De-duplicate design principles — brief reference instead of full list
- Update code-review entry point verdict table to v2 phrasing

Net: -497 lines removed, +22 added

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
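The scoring ceiling from the commit above (max 84% overall when any category carries a ⚠️) could be synthesized roughly as follows. The averaging step is an assumption — the skill text defines the ceiling, not the aggregation function — and the function name is hypothetical:

```python
def overall_score(category_scores: dict[str, int]) -> int:
    """Combine per-category scores into an overall score.

    Applies the 84% ceiling whenever any category scores below 85
    (i.e., carries a ⚠️ under the 65/85 thresholds). Plain averaging
    is used here only as an illustrative aggregation choice.
    """
    avg = round(sum(category_scores.values()) / len(category_scores))
    if any(score < 85 for score in category_scores.values()):
        return min(avg, 84)  # a warning anywhere caps the overall
    return avg
```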
@nsheaps nsheaps force-pushed the claude/review-system-analysis-HieLF branch from e5b94c5 to 1be3350 Compare March 25, 2026 00:45
The 189-line review system analysis doc is valuable documentation but
disproportionate to this PR's functional changes (bugfixes + skill
extraction). Moving it out of scope to keep this PR focused.
The analysis will be submitted as a separate documentation PR.

Also fixes stale "three overlapping locations" text that was already
edited in the working tree.

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
@nsheaps nsheaps force-pushed the claude/review-system-analysis-HieLF branch from f775b5c to 571d839 Compare March 25, 2026 00:49