feat: review system v2 — shared skills, bugfixes, analysis #304
Conversation
Plugin Version Status: Versions are auto-bumped in PRs. Manual bumps to higher versions are preserved.
⚠️ Good improvements — a few issues to address before merge
⚠️ Prompt template diverges from CI prompt — missing verdict criteria that the CI prompt includes (see inline thread)
⚠️ self-review SKILL.md has conflicting/arbitrary scoring rules (lines 64-65) that clash with CI prompt format requirements
✅ CDATA prohibition is now explicit and prominent in both prompts
✅ Verdict criteria restructured into clear decision tree
✅ Non-blocking feedback bar raised appropriately with justification requirement
✅ Shared skills architecture with correct symlinks
✅ Version bump 0.1.13 → 0.2.0 appropriate for new features
✅ Analysis doc and migration guide are comprehensive and accurate
✅ All referenced workflow/action files verified to exist
✅ PR description accurately matches code changes
Code Quality (90%)
The PR makes well-structured improvements across the review system. The bugfixes (CDATA, verdict overuse) address real problems observed in production reviews. The shared skills architecture is a meaningful DRY improvement — extracting the multi-agent review logic from a project-level skill into shared/skills/ and symlinking into the plugin.
Strengths:
- The verdict criteria restructuring (from flat CRITICAL lines to a structured decision tree) is significantly more readable and actionable
- The footnote section cleanup (separating "must include" from "must not include") removes a confusing numbered list where items 3-5 were exclusions mixed into an inclusion list
- The follow-up recommendation format now requires justification, which prevents lazy "non-blocking" feedback
Score rationale: 90% because the core changes are solid, but the template sync issue (see below) introduces a quality gap between the distributable template and the CI prompt.
Template Synchronization (Key Issue)
The CI prompt (.github/prompts/claude-code-review.md) now includes substantially more verdict guidance than the plugin template (prompt-template.md). Specifically, the template is missing:
- Style/convention violation criteria for REQUEST_CHANGES
- Maintainability/complexity criteria for REQUEST_CHANGES
- The "previous feedback adequately addressed" criterion for APPROVE
- The "bar for non-blocking feedback" paragraph
- The APPROVE "if code changes will break something, do NOT approve" CRITICAL line
The analysis document (§3.1, §3.5) acknowledges this divergence as known debt, but this PR actively widens the gap by adding content to the CI prompt that isn't reflected in the template. See the inline comment on prompt-template.md for a concrete suggestion.
Shared Skills Architecture
The self-review and parallel-review skills are well-structured. The symlinks are correct:
- `plugins/scm-utils/skills/self-review/SKILL.md` → `../../../../shared/skills/self-review/SKILL.md` ✅
- `plugins/scm-utils/skills/parallel-review/SKILL.md` → `../../../../shared/skills/parallel-review/SKILL.md` ✅
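The symlink layout can be sanity-checked with a short sketch. This is a hypothetical helper (the `check_link` name is ours, not from the PR); the paths mirror this PR's tree, recreated in a throwaway directory so the snippet is self-contained:

```python
import os
import tempfile
from pathlib import Path

def check_link(path: Path) -> str:
    """Report whether a symlink exists and resolves to a real file."""
    if path.is_symlink() and path.exists():
        return f"ok: {path.name} -> {os.readlink(path)}"
    return f"BROKEN: {path}"

# Throwaway tree mirroring the PR's layout (helper name is illustrative)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = root / "shared/skills/self-review/SKILL.md"
    target.parent.mkdir(parents=True)
    target.write_text("# self-review skill\n")
    link = root / "plugins/scm-utils/skills/self-review/SKILL.md"
    link.parent.mkdir(parents=True)
    # Four levels up from skills/self-review/ lands at the repo root
    link.symlink_to("../../../../shared/skills/self-review/SKILL.md")
    print(check_link(link))
    # → ok: SKILL.md -> ../../../../shared/skills/self-review/SKILL.md
```

The relative target needs exactly four `..` segments because the link lives four directories below the repo root, which is why both symlinks in the PR share the same prefix.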
The removed .claude/skills/code-review/SKILL.md content is fully captured in the new shared/skills/self-review/SKILL.md with improvements (verdict criteria, CDATA prohibition, inline comment conventions).
However, self-review/SKILL.md has two scoring rules (lines 64-65) that are problematic — see inline comments for details.
Documentation
Both new docs are comprehensive and accurate:
- `docs/review-system-analysis.md` — thorough architecture overview with honest assessment of weaknesses
- `docs/review-system-v2-migration-guide.md` — clear migration steps with before/after diagrams and rollback instructions
All component references in the analysis doc were verified against the actual filesystem.
Simplicity (85%)
The PR adds 899 lines across 11 files. The two documentation files (524 lines combined) are valuable as system documentation but add maintenance surface. The shared skills + symlink pattern is the right approach for DRY. Score reduced from 100% because the docs are comprehensive but lengthy for what is largely a prompt restructuring + architecture reorganization.
Confidence (85%)
High confidence in the prompt/template changes — the diffs are clear and the improvements are straightforward. Moderate confidence in how the shared skills will behave at runtime, since skills are interpreted by the agent and the scoring rules in self-review could produce unexpected interactions with the CI prompt's format requirements.
Recommended follow-ups (non-blocking — each item MUST explain why it shouldn't be addressed in this PR):
- Unify CI prompt and template from a single source — requires architectural work (§5.1 of analysis) beyond this PR's scope; the analysis doc already tracks this as high priority
- Add output validation for review markdown (strip CDATA, verify details/summary structure) — requires a new post-processing hook or action, tracked as §5.3 in analysis
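The proposed output-validation step could be as small as the sketch below. Everything here is hypothetical (the function name and the exact checks are illustrative, not part of this PR); it only shows the shape of a post-processing check that flags CDATA wrappers and malformed `<details>`/`<summary>` structure:

```python
import re

def validate_review_markdown(text: str) -> list[str]:
    """Return a list of problems found in a generated review body (sketch)."""
    problems = []
    if "<![CDATA[" in text:
        problems.append("contains a CDATA wrapper")
    # Every <details> block should be balanced and open with a <summary>
    if text.count("<details>") != text.count("</details>"):
        problems.append("unbalanced <details> tags")
    for block in re.findall(r"<details>(.*?)</details>", text, re.DOTALL):
        if "<summary>" not in block:
            problems.append("<details> block missing <summary>")
    return problems
```

A real hook would presumably run this between review generation and comment posting, failing the job (or stripping the offending markup) when the list is non-empty.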
Footnotes
> When all agents complete, review each report. Compare results across dimensions to build a complete picture. Create one overall report including:
> - Score table with emoji indicators:
The 94% cap here is an arbitrary threshold. Consider either:
- Removing this rule and letting the synthesized score reflect the actual category scores
- Documenting the rationale (e.g., "a warning in any category means the PR has room for improvement, so the overall should reflect that")
As-is, this creates a magic number without justification, making it harder for agents to reason about scoring consistently.
Suggested change:
> - Score table with emoji indicators:
> - If any category has ⚠️, the overall score should reflect that no category achieved ✅ level
Applied the suggested change — replaced the arbitrary 94% cap with "the overall score should reflect that no category achieved ✅ level" in f97c75f.
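The agreed rule can be illustrated with a small sketch (hypothetical helper; assumes the ✅ band starts at 85% and the ⚠️ band tops out at 84%, matching the 65/85 boundaries and 84% ceiling adopted later in this PR):

```python
def overall_score(category_scores: list[int]) -> int:
    """Synthesize an overall score from per-category scores (illustrative)."""
    raw = round(sum(category_scores) / len(category_scores))
    if any(s < 85 for s in category_scores):
        # No category reached the ✅ band, so the overall stays in ⚠️ (<= 84)
        raw = min(raw, 84)
    return raw

print(overall_score([90, 90, 90]))  # → 90, all categories ✅
print(overall_score([95, 95, 80]))  # → 84, one ⚠️ caps the overall
```

Unlike a fixed 94% cap, the ceiling here falls out of the band definitions themselves, so an agent can derive it rather than memorize a magic number.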
> When all agents complete, review each report. Compare results across dimensions to build a complete picture. Create one overall report including:
> - Score table with emoji indicators:
> - 🚨 Score below 70%
The CI prompt mandates `<details>`/`<summary>` blocks, footnotes, and badges in every review. An agent following both the CI prompt and this skill would get contradictory instructions.
Consider aligning this with the CI prompt's mandatory format, or adding a qualifier like "when used outside CI":
Suggested change:
> - 🚨 Score below 70%
> - If overall score is >95%, the detailed section can be minimal (just the table and a brief summary)
Applied — changed to "the detailed section can be minimal (just the table and a brief summary)" to avoid conflicting with CI format requirements. f97c75f
> ### 3.1 Review Logic Split (Improved in v2)
>
> Previously, review instructions existed in three divergent places. In v2, the architecture was restructured:
✅ Good self-awareness here — this is the key architectural debt. The PR improves the situation (shared skills, DRY for review logic) but the CI prompt ↔ template divergence remains. The analysis correctly identifies this as a future improvement area rather than trying to solve it in this PR.
> - Security, performance, or correctness issues must be fixed before merging
> - The code would improve meaningfully from a suggested change and the change is straightforward to make
> - CRITICAL: If a change would make the code better and it's reasonable to do before merge, it's a requested change, not a suggestion
>
> Use **"APPROVE"** when:
> - The PR is ready to merge as-is with no outstanding issues
>
> Use **"COMMENT"** only when ALL of these are true:
> - The code won't break if merged as-is
> - The suggestions are genuinely optional and there is a clear reason why each should NOT be addressed in this PR
> - CRITICAL: If a suggestion would improve the code and is reasonable to implement, use REQUEST_CHANGES. Non-blocking feedback must have a justification.
The CI prompt (.github/prompts/claude-code-review.md) has significantly more detail in its verdict section — including style/convention violations, maintainability criteria, and the full "bar for non-blocking feedback" paragraph. The template here is a stripped-down version.
This asymmetry means repos using the template will get a less rigorous review than this repo's CI prompt. If the template is the canonical distributable version, it should include the full verdict criteria. The extra ~10 lines from the CI prompt would make the template complete:
Suggested change (sync the template's verdict section with the full CI prompt criteria):
> Use **"REQUEST_CHANGES"** when:
> - Security, performance, or correctness issues must be fixed before merging
> - Style or convention violations need correction before merging
> - Maintainability or complexity issues would degrade the codebase if merged
> - The code would improve meaningfully from a suggested change and the change is straightforward to make
> - CRITICAL: If a change would make the code better and it's reasonable to do before merge, it's a requested change, not a suggestion
>
> Use **"APPROVE"** when:
> - The PR is ready to merge as-is with no outstanding issues
> - Previous feedback has been adequately addressed (e.g., author clarified logic in PR body, commit messages, or code comments)
> - CRITICAL: If the code changes will break something after merge, do NOT approve
>
> Use **"COMMENT"** only when ALL of these are true:
> - The code won't break if merged as-is
> - The suggestions are genuinely optional and there is a clear reason why each should NOT be addressed in this PR
> - CRITICAL: If a suggestion would improve the code and is reasonable to implement, use REQUEST_CHANGES. Non-blocking feedback must have a justification.
>
> **The bar for non-blocking feedback:**
> The goal is to merge high-quality code. If your feedback would improve the code, and the improvement is within the scope of the PR and straightforward to implement, then it IS blocking and should use REQUEST_CHANGES. Non-blocking feedback should be reserved for items that genuinely warrant discussion about whether they should be addressed at all, or that would require work significantly beyond the PR's scope.
Synced the template with the full CI prompt verdict criteria in f97c75f. Now includes: style/convention violations, maintainability criteria for REQUEST_CHANGES; previous feedback and break-after-merge criteria for APPROVE; and the full "bar for non-blocking feedback" paragraph.
Force-pushed ac53a62 to f97c75f
- Fix CDATA wrapper bug: restructured footnote instructions to make CDATA prohibition prominent and explicit (was buried in a confusing numbered list that mixed "include" and "exclude" items)
- Fix non-blocking feedback overuse: raised the bar for COMMENT verdicts to require justification. If a suggestion would improve code and is reasonable to implement, it's now REQUEST_CHANGES, not COMMENT. Non-blocking follow-up items must explain why they shouldn't be addressed in the current PR.
- Extract multi-agent review logic from project-level skill into shared/skills/self-review/ (symlinked into scm-utils plugin)
- Add parallel-review shared skill for Agent Teams fan-out reviews (8 concurrent teammates, one per quality dimension)
- Remove project-level .claude/skills/code-review/SKILL.md
- Update scm-utils code-review skill to reference self-review for local fallback
- Bump scm-utils to v0.2.0 (new skills, breaking: project skill removed)
- docs/review-system-analysis.md: comprehensive analysis of review system architecture, strengths, weaknesses, and improvement suggestions
- docs/review-system-v2-migration-guide.md: migration guide from v1 to v2 with step-by-step instructions and architecture diagrams

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
- Replace arbitrary 94% scoring cap with "should reflect that no category achieved ✅ level" (removes magic number)
- Change >95% "just the table" instruction to "minimal (table + brief summary)" to avoid conflicting with CI prompt format requirements
- Sync prompt-template.md verdict criteria with CI prompt: add style/convention, maintainability criteria for REQUEST_CHANGES; add previous feedback and break-after-merge criteria for APPROVE; add "bar for non-blocking feedback" paragraph

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
- Remove parallel-review skill (167 lines) — experimental, can't run in CI, violates YAGNI
- Remove migration guide (246 lines) — over-engineered for an internal refactor
- Trim analysis doc — remove parallel-review refs, replace 7-item roadmap with brief note pointing to GitHub issues
- Fix scoring threshold inconsistency — align emoji boundaries (now 65/85) with badge color boundaries
- Add concrete scoring ceiling: max 84% overall when any ⚠️ exists
- Add scoring calibration rubric for sub-agents (0-39/40-64/65-84/85-89/90-100)
- De-duplicate verdict criteria from self-review — now references prompt template instead of maintaining a third copy
- De-duplicate design principles — brief reference instead of full list
- Update code-review entry point verdict table to v2 phrasing

Net: -497 lines removed, +22 added

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
Force-pushed e5b94c5 to 1be3350
The 189-line review system analysis doc is valuable documentation but disproportionate to this PR's functional changes (bugfixes + skill extraction). Moving it out of scope to keep this PR focused. The analysis will be submitted as a separate documentation PR. Also fixes stale "three overlapping locations" text that was already edited in the working tree. https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM
Force-pushed f775b5c to 571d839
Summary
- `<![CDATA[...]]>` prohibition is prominent and explicit
- Review logic extracted into `shared/skills/self-review/`, symlinked into scm-utils

Files Changed (8 files, +185/-68)
- `.github/prompts/claude-code-review.md` — CDATA prohibition + verdict criteria restructured
- `plugins/scm-utils/.../prompt-template.md` — verdict criteria synced from CI prompt
- `shared/skills/self-review/SKILL.md` — NEW: shared multi-agent review procedure (107 lines)
- `plugins/scm-utils/skills/self-review/` — symlink to shared
- `plugins/scm-utils/skills/code-review/SKILL.md` — verdict table updated to v2 phrasing
- `.claude/skills/code-review/SKILL.md` — REMOVED (replaced by shared skill)
- `plugins/scm-utils/.claude-plugin/plugin.json` — version bump to 0.2.0
- `.claude/.gitignore` — ignore ephemeral pr-review reports

Test plan
- `self-review` skill invocation locally

https://claude.ai/code/session_01RB1y75VoeNNB7sucmusnmM