
Commit 6ddc1c7

chore: add Claude skills (#260)
This PR adds some Claude skills that support the review and formatting of the solution files.
1 parent 7171a97 commit 6ddc1c7

File tree: 3 files changed (+478 −0 lines)
Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
---
name: code-review
description: Performs thorough code reviews of individual human-eval problems. Use when explicitly asked to review code or analyze code quality.
---

# Code Review

Conduct systematic, thorough code reviews that examine multiple dimensions of code quality.

## Review Process

Review the human-eval problem/solution the user told you to review. The purpose of the solution is to assess and showcase Lean's aptitude for writing and verifying everyday programming tasks. The solutions should be understandable to outsiders who have not done much work in Lean, and should ideally convince them that Lean is a good choice for writing and verifying software. In the best case, the elegance and simplicity of the solutions enabled by Lean should amaze them.

A second purpose is that implementing these solutions will turn up missing API and API defects in the standard library.

### 1. **Understand the Context**

- Read the whole file that contains the problem. Note that the prompt at the end of the file is given as relevant context but is not itself the subject of the review.
- Ask questions if needed.

### 2. **File Structure**

Check that the file follows the project's structural conventions:

- **Section organization**: Are the standard sections present and in the correct order?
  - Implementation section: `/-! ## Implementation -/`
  - Tests section: `/-! ## Tests -/`
  - Missing API section (if needed): `/-! ## Missing API -/`
  - Verification section: `/-! ## Verification -/`
  - Prompt section: `/-! ## Prompt ... -/` (auto-generated, immutable)

  Notable exception: if the implementation depends on declarations from the Missing API section, then the Missing API section naturally comes first.

- **Missing API lemmas**: This is a key structural requirement. Helper lemmas that would naturally belong in Lean's standard library MUST be:
  - Placed in a dedicated `/-! ## Missing API -/` section (typically between Tests and Verification)
  - Properly namespaced with standard library types (e.g., `List.someHelper`, `Nat.someProperty`, `Array.someTheorem`)

  Look specifically for lemmas about `List`, `Array`, `Nat`, `String`, `Option`, and other core types. This documentation is essential for identifying gaps in Lean's out-of-the-box verification capabilities.

- **Section headers**: Are all major sections clearly marked with `/-! ## Section Name -/`?

- **Subsections**: Within Verification, are there helpful subsections (e.g., `/-! ### Loop invariants -/`, `/-! ### Main theorems -/`) that guide the reader?

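Put together, a conforming file might be skeletonized as below. This is an illustrative sketch only: the task, `solution`, and its lemmas are placeholders, not taken from the repository.

```lean
/-! ## Implementation -/

/-- Placeholder task: sum of a list of naturals. -/
def solution : List Nat → Nat
  | [] => 0
  | x :: xs => x + solution xs

/-! ## Tests -/

example : solution [1, 2, 3] = 6 := by decide

/-! ## Missing API -/

-- Helper lemmas that belong in the standard library would go here,
-- properly namespaced (e.g. `List.someHelper`).

/-! ## Verification -/

theorem solution_nil : solution [] = 0 := rfl

theorem solution_cons (x : Nat) (xs : List Nat) :
    solution (x :: xs) = x + solution xs := rfl

/-! ## Prompt ... -/  -- auto-generated, immutable
```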
### 3. **Documentation and Narrative**

- **Module docstrings**: Does the file have clear section headers (`/-! ... -/`) that guide the reader through the solution?
- **Narrative flow**: Do the docstrings create a coherent story from problem statement → preliminaries → implementation → tests → verification?
- **Section ordering**: Are sections presented in a logical learning order (e.g., simpler approaches before complex ones)?
- **Pedagogical clarity**: Would a reader unfamiliar with the codebase understand the approach and motivation?
- **Redundancy**: Are there redundant or contradictory explanations across different docstrings?
- **Completeness**: Are key design decisions, trade-offs, or unusual techniques explained?

### 4. **Style and readability**

- Do the changes adhere to the [standard library style guide](https://github.com/leanprover/lean4/blob/master/doc/std/style.md)?
- **Naming**: Are variable/function names clear and descriptive?
- **Structure**: Is the code organized logically? Are functions too large? Is the file structured in a way that lends itself to a single comfortable read-through?
- **Comments**: Are complex sections explained? Are comments accurate?
- **Consistency**: Does the code follow the project's conventions?
- **Clarity**: Could an outsider who has worked with other theorem provers understand the code, or at least what it does, quickly?
- **Design**: Are the implementation and verification approaches appropriate? Are there simpler solutions?

**Lemma style guidelines**

- Are the lemmas' names consistent and descriptive?
- Do the lemmas' names adhere to the [naming convention](https://github.com/leanprover/lean4/blob/master/doc/std/naming.md)?
- Do the proofs avoid non-terminal `simp` and `simp_all`? Note: `simp only` and `simp_all only` are fine, as is `simp; omega`.
- Are lemma arguments implicit where this makes sense?

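The non-terminal `simp` guideline can be illustrated with a small sketch (the statement below is a made-up placeholder): a `simp only` call with an explicit lemma list, followed by a terminal decision procedure, stays robust when the default simp set changes.

```lean
example (n : Nat) : n + 0 < n + 1 := by
  simp only [Nat.add_zero]  -- explicit lemma list; goal is now `n < n + 1`
  omega                     -- terminal decision procedure closes the goal
```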
### 5. **Correctness**

- Are there tests for the Lean implementation? Do the tests match the tests in the prompt sections?
- Does the solution actually solve the problem in the prompt as instructed?
- Do the verification lemmas (excluding lemmas that are pure helpers with technical and difficult statements) uniquely determine the solution's behavior?
- Are the lemmas that seem to be intended as missing API plausible and consistent additions to Lean's standard library?
- Does the code avoid using `sorry` and `admit`?
- Check terminology, grammar, and spelling (function names, variable names, comments, docstrings)

**CRITICAL: Semantic Interpretation & Verification Gaps**

This is one of the most important aspects of the review. Carefully examine whether the formal specification matches the informal prompt:

- **Semantic mismatches**: Does the formalization interpret the problem differently than the prompt describes it? Examples:
  - Prompt says "closed intervals [a, b]" but code uses half-open ranges [a, b)
  - Prompt says "length" meaning element count but code computes a difference
  - Prompt says "empty list" but code uses `Option.none`
  - Prompt describes 1-indexed positions but code uses 0-indexed

- **Undocumented semantic choices**: Even if the formalization is mathematically equivalent, are non-obvious interpretation choices explained? A reader should understand:
  - Why a particular data structure was chosen
  - How the formal model relates to the informal description
  - What assumptions or simplifications were made

- **Verification gap**: Is there a gap between what the prompt asks for and what the theorems prove? Could a bug hide in this gap?

- **Severity guidelines for semantic issues**:
  - **High**: Semantic mismatch that could hide bugs or make verification meaningless (e.g., proving properties of the wrong data structure)
  - **Medium**: Undocumented semantic choice that works correctly but would confuse readers unfamiliar with the approach
  - **Low**: Minor interpretation difference that's obvious from context

When you find semantic interpretation differences, ALWAYS classify them under "Correctness/Verification Gaps" in your review, not under "Design" or other categories.

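The first kind of mismatch (closed vs. half-open intervals) can be made concrete in Lean. The definitions `inClosed` and `inHalfOpen` are illustrative names for this sketch, not taken from any solution file:

```lean
-- The prompt speaks of the closed interval [a, b] ...
def inClosed (a b x : Nat) : Prop := a ≤ x ∧ x ≤ b

-- ... while a formalization might silently use the half-open range [a, b).
def inHalfOpen (a b x : Nat) : Prop := a ≤ x ∧ x < b

-- The two disagree exactly at the right endpoint, so theorems proved
-- about `inHalfOpen` leave a gap at `b` where a bug could hide.
example : inClosed 1 5 5 := by unfold inClosed; omega
example : ¬ inHalfOpen 1 5 5 := by unfold inHalfOpen; omega
```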
### 6. **Performance and Efficiency**

Solutions should be reasonably efficient.

- **Algorithms**: Are appropriate algorithms used? Any O(n²) where O(n) is possible?
- **Resource usage**: Are resources allocated and freed properly?
- **Redundancy**: Is there unnecessary duplication or recomputation?

### 7. **Overall rating**

Evaluate how well the solution and its verification match their purpose of being compelling examples of simple execution and formal verification of small programming tasks.

## Review Structure

Present your review organized by category:

```markdown
## Summary
Brief overview of the changes and overall assessment.

## ✅ Strengths
What the code does well.

## 🔍 Issues Found

### File Structure
- [Severity] Description (missing sections, missing API not separated, incorrect section ordering)

### Correctness/Verification Gaps
- [Severity] Description (semantic mismatches, undocumented interpretation choices, verification gaps)

### Documentation
- [Severity] Description (unclear section headers, poor narrative flow, redundant explanations)

### Style/Readability
- [Severity] Description

### Performance
- [Severity] Description

### Design
- [Severity] Description

## 🤔 Questions
Clarifications needed about intent or design decisions.

## 📋 Suggestions (Optional)
Nice-to-haves that aren't blocking.

## Overall Assessment
Final recommendation: Approve / Request Changes / Comment
```

## Severity Levels

- **Critical**: Causes incorrect behavior, a security vulnerability, data loss, or completely invalidates the verification
- **High**:
  - Semantic mismatches between prompt and formalization that could hide bugs
  - Significant verification gaps where theorems don't prove what the prompt asks
  - Significant design flaws or performance issues
  - Maintainability concerns
  - Missing API lemmas not separated into their own section
- **Medium**:
  - Undocumented semantic interpretation choices that work correctly but confuse readers
  - Code quality issues or readability problems
  - Non-obvious bugs
  - Missing or unclear section headers
- **Low**:
  - Minor style issues
  - Documentation improvements
  - Suggestions for enhancement

## Tips for Effective Reviews

1. **Be constructive**: Focus on the code, not the person. Use "this could be clearer", not "you wrote this unclearly"
2. **Ask questions**: If you don't understand, ask. That's a code clarity issue
3. **Know when to stop**: Don't demand perfection; distinguish between blocking and nice-to-have
4. **Reference standards**: Point to style guides, documentation, or existing patterns when suggesting changes
5. **Test mentally**: Walk through the code with example inputs to catch edge cases *where appropriate*
6. **Read as a newcomer**: When reviewing documentation, imagine you're encountering the code for the first time. Are section headers clear? Does the narrative flow? Would you understand the motivation and approach?
7. **Check file structure first**: Before diving into code details, verify the file has the correct sections in the right order, and that missing API lemmas are properly separated.
8. **Scrutinize semantics**: Always read the prompt carefully and compare it line by line with the formalization. Look for:
   - Different data structure interpretations (arrays vs. lists, closed vs. half-open intervals)
   - Different counting conventions (0-indexed vs. 1-indexed, inclusive vs. exclusive)
   - Different notions of "length", "size", "count", "empty", etc.
   - Any implicit assumptions in the formalization not stated in the prompt

   Even if mathematically equivalent, these differences should be documented and flagged appropriately.
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
---
name: restructure-solutions
description: Restructures a Lean file with one or many solution implementations so that each solution is self-contained with its own Implementation, Tests, and Verification sections.
---

# Restructure Solutions

This skill takes a Lean file of a human-eval problem that has multiple solution variants (like `solution`, `solution'`, `solution''`) and restructures it so that:

1. Each solution gets its own section group:
   - `## Implementation`
   - `## Tests`
   - `## Missing API` (only if there are any lemmas/attributes/... that were added because they are missing from the standard library; usually those prefixed with `List`, `Nat`, etc.)
   - `## Verification`

   If there are multiple solutions, these sections exist for every solution and are numbered.

2. Solutions are no longer interleaved; each is completely self-contained.

3. The original Prompt section (if present) is preserved at the end.

## Examples

**Single solution before:**
```lean
def solution := ...

example : solution ... := ...

theorem solution_spec := ...
```

**Single solution after:**
```lean
## Implementation
def solution := ...

## Tests
example : solution ... := ...

## Verification
theorem solution_spec := ...
```

**Multiple solutions before:**
```lean
def solution := ...
def solution' := ...
def solution'' := ...

example : solution ... := ...
example : solution' ... := ...
example : solution'' ... := ...

theorem solution_spec := ...
theorem solution'_spec := ...
```

**Multiple solutions after:**
```lean
## Implementation 1
def solution := ...

## Tests 1
example : solution ... := ...

## Verification 1
theorem solution_spec := ...

## Implementation 2
def solution' := ...

## Tests 2
example : solution' ... := ...

## Verification 2
theorem solution'_spec := ...

## Implementation 3
def solution'' := ...

## Tests 3
example : solution'' ... := ...

## Verification 3
...
```

## Instructions for Claude

1. Read the file specified by the user
2. Identify all solution definitions (typically named according to the function in the prompt at the end of the file)
3. For each solution variant:
   - Extract the implementation definition
   - Extract all test examples that reference that solution
   - Extract all verification theorems/lemmas that reference that solution
   - Group them into sections that are numbered ONLY IF there is more than one solution (Implementation N, Tests N, Verification N)
4. Preserve any helper lemmas/theorems in the Verification section where they're first needed
5. Keep the Prompt section (and Canonical solution, Tests sections) at the very end, unchanged
6. Maintain all imports, opens, and options at the top of the file
7. Use the Edit tool to restructure the file with the new organization
8. Double-check that you have added the necessary tests

## Notes

- Use `native_decide` for tests involving iterators or complex computations that don't reduce well
- Use `decide` for tests involving simple recursive functions or monadic code
- Preserve all comments, attributes, and proof code exactly as they are
- Helper lemmas that are used across multiple solutions should stay in the first Verification section where they appear
- Problem number X can be found in `HumanEvalLean/HumanEvalX.lean`
- If in doubt, check `HumanEvalLean/HumanEval114.lean` for a reference
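
The first two notes might look as follows in practice (a hypothetical `solution`, not taken from the repository):

```lean
def solution (n : Nat) : Nat := 2 * n

-- Small, simple computation: kernel reduction via `decide` is enough.
example : solution 21 = 42 := by decide

-- Larger computation that is slow to reduce in the kernel:
-- `native_decide` evaluates it with compiled code instead.
example : ((List.range 1000).map solution).foldl (· + ·) 0 = 999000 := by
  native_decide
```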
