|
| 1 | +--- |
| 2 | +name: code-review |
| 3 | +description: Performs thorough code reviews of individual human-eval problems. Use when asked explicity to review code or analyze code quality. |
| 4 | +--- |
| 5 | + |
| 6 | +# Code Review |
| 7 | + |
| 8 | +Conduct systematic, thorough code reviews that examine multiple dimensions of code quality. |
| 9 | + |
| 10 | +## Review Process |
| 11 | + |
| 12 | +Review the human-eval problem/solution the user told you to review. The purpose of the solution is to assess and showcase Lean's aptitude for writing an verifying everyday programming tasks. The solutions should be understandable for outsiders that have not done much work in Lean and ideally convince them that Lean is a good choice for writing and verifying software. In the best case, the elegance and simplicity of the solutions enabled by Lean should baffle them. |
| 13 | + |
| 14 | +A second purpose is that implementing these solutions will turn up missing API and API defects in the standard library. |
| 15 | + |
| 16 | +### 1. **Understand the Context** |
| 17 | +- Read the whole file that contains the problem. Note that the prompt at the end of the file has been given and is relevant context but not subject of the review itself. |
| 18 | +- Ask questions if needed. |
| 19 | + |
| 20 | +### 2. **File Structure** |
| 21 | + |
| 22 | +Check that the file follows the project's structural conventions: |
| 23 | + |
| 24 | +- **Section organization**: Are the standard sections present and in the correct order? |
| 25 | + - Implementation section: `/-! ## Implementation -/` |
| 26 | + - Tests section: `/-! ## Tests -/` |
| 27 | + - Missing API section (if needed): `/-! ## Missing API -/` |
| 28 | + - Verification section: `/-! ## Verification -/` |
| 29 | + - Prompt section: `/-! ## Prompt ... -/` (auto-generated, immutable) |
| 30 | + Notable exception: If the implementation depends on declarations from the Missing API section, then the Missing API section naturally comes first. |
| 31 | + |
| 32 | +- **Missing API lemmas**: This is a key structural requirement. Helper lemmas that would naturally belong in Lean's standard library MUST be: |
| 33 | + - Placed in a dedicated `/-! ## Missing API -/` section (typically between Tests and Verification) |
| 34 | + - Properly namespaced with standard library types (e.g., `List.someHelper`, `Nat.someProperty`, `Array.someTheorem`) |
| 35 | + - Look specifically for lemmas about `List`, `Array`, `Nat`, `String`, `Option`, and other core types |
| 36 | + - This documentation is essential for identifying gaps in Lean's out-of-the-box verification capabilities |
| 37 | + |
| 38 | +- **Section headers**: Are all major sections clearly marked with `/-! ## Section Name -/`? |
| 39 | + |
| 40 | +- **Subsections**: Within Verification, are there helpful subsections (e.g., `/-! ### Loop invariants -/`, `/-! ### Main theorems -/`) that guide the reader? |
| 41 | + |
| 42 | +### 3. **Documentation and Narrative** |
| 43 | + |
| 44 | +- **Module docstrings**: Does the file have clear section headers (`/-! ... -/`) that guide the reader through the solution? |
| 45 | +- **Narrative flow**: Do the docstrings create a coherent story from problem statement → preliminaries → implementation → tests → verification? |
| 46 | +- **Section ordering**: Are sections presented in a logical learning order (e.g., simpler approaches before complex ones)? |
| 47 | +- **Pedagogical clarity**: Would a reader unfamiliar with the codebase understand the approach and motivation? |
| 48 | +- **Redundancy**: Are there redundant or contradictory explanations across different docstrings? |
| 49 | +- **Completeness**: Are key design decisions, trade-offs, or unusual techniques explained? |
| 50 | + |
| 51 | +### 4. **Style and readability** |
| 52 | + |
| 53 | +- Do the changes adhere to [standard library style](https://github.com/leanprover/lean4/blob/master/doc/std/style.md)? |
| 54 | +- **Naming**: Are variable/function names clear and descriptive? |
| 55 | +- **Structure**: Is the code organized logically? Are functions too large? Is the file structured in a way that lends itself to a single comfortable read-through? |
| 56 | +- **Comments**: Are complex sections explained? Are comments accurate? |
| 57 | +- **Consistency**: Does code follow the project's conventions? |
| 58 | +- **Clarity**: Could an outsider that has worked with other theorem provers before understand the code, or at least what it does, quickly? |
| 59 | +- **Design**: Is the implementation and verification approach appropriate? Are there simpler solutions? |
| 60 | + |
| 61 | +**Lemma style guidelines** |
| 62 | + |
| 63 | +- Are the lemmas' names consistent and descriptive? |
| 64 | +- Do the lemmas' names adhere to the [naming convention](https://github.com/leanprover/lean4/blob/master/doc/std/naming.md)? |
| 65 | +- Do the lemmas avoid non-terminal `simp` and `simp_all`? Note: `simp only` and `simp_all only` is fine, `simp; omega` too. |
| 66 | +- Are lemma arguments implicit where this makes sense? |
| 67 | + |
| 68 | +### 5. **Correctness** |
| 69 | + |
| 70 | +- Are there tests for the Lean implementation? Do the tests match the tests in the prompt sections? |
| 71 | +- Does the solution actually solve the problem in the prompt as instructed? |
| 72 | +- Do the verification lemmas (excluding lemmas that are pure helpers with technical and difficult statements) uniquely determine the solution's behavior? |
| 73 | +- Are the lemmas that seem to be intended as missing API plausible and consistent additions to Lean's standard library? |
| 74 | +- Does the code avoid using `sorry` and `admit`? |
| 75 | +- Check terminology, grammar and orthography (function names, variable names, comments, docstrings) |
| 76 | + |
| 77 | +**CRITICAL: Semantic Interpretation & Verification Gaps** |
| 78 | + |
| 79 | +This is one of the most important aspects of the review. Carefully examine whether the formal specification matches the informal prompt: |
| 80 | + |
| 81 | +- **Semantic mismatches**: Does the formalization interpret the problem differently than the prompt describes it? Examples: |
| 82 | + - Prompt says "closed intervals [a, b]" but code uses half-open ranges [a, b) |
| 83 | + - Prompt says "length" meaning element count but code computes difference |
| 84 | + - Prompt says "empty list" but code uses Option.none |
| 85 | + - Prompt describes 1-indexed positions but code uses 0-indexed |
| 86 | + |
| 87 | +- **Undocumented semantic choices**: Even if the formalization is mathematically equivalent, are non-obvious interpretation choices explained? A reader should understand: |
| 88 | + - Why a particular data structure was chosen |
| 89 | + - How the formal model relates to the informal description |
| 90 | + - What assumptions or simplifications were made |
| 91 | + |
| 92 | +- **Verification gap**: Is there a gap between what the prompt asks for and what the theorems prove? Could a bug hide in this gap? |
| 93 | + |
| 94 | +- **Severity guidelines for semantic issues**: |
| 95 | + - **High**: Semantic mismatch that could hide bugs or make verification meaningless (e.g., proving properties of the wrong data structure) |
| 96 | + - **Medium**: Undocumented semantic choice that works correctly but would confuse readers unfamiliar with the approach |
| 97 | + - **Low**: Minor interpretation difference that's obvious from context |
| 98 | + |
| 99 | +When you find semantic interpretation differences, ALWAYS classify them under "Correctness/Verification Gaps" in your review, not under "Design" or other categories. |
| 100 | + |
| 101 | +### 6. **Performance and Efficiency** |
| 102 | + |
| 103 | +Solutions should be reasonably efficient. |
| 104 | + |
| 105 | +- **Algorithms**: Are appropriate algorithms used? Any O(n²) where O(n) is possible? |
| 106 | +- **Resource usage**: Are resources allocated and freed properly? |
| 107 | +- **Redundancy**: Is there unnecessary duplication or recomputation? |
| 108 | + |
| 109 | +### 7. **Overall rating** |
| 110 | + |
| 111 | +Evaluate how well the solution, and its verification, match their purpose of being compelling examples of simple execution and formal verification of small programming tasks. |
| 112 | + |
| 113 | +## Review Structure |
| 114 | + |
| 115 | +Present your review organized by category: |
| 116 | + |
| 117 | +```markdown |
| 118 | +## Summary |
| 119 | +Brief overview of the changes and overall assessment. |
| 120 | + |
| 121 | +## ✅ Strengths |
| 122 | +What the code does well. |
| 123 | + |
| 124 | +## 🔍 Issues Found |
| 125 | + |
| 126 | +### File Structure |
| 127 | +- [Severity] Description (missing sections, missing API not separated, incorrect section ordering) |
| 128 | + |
| 129 | +### Correctness/Verification Gaps |
| 130 | +- [Severity] Description (semantic mismatches, undocumented interpretation choices, verification gaps) |
| 131 | + |
| 132 | +### Documentation |
| 133 | +- [Severity] Description (unclear section headers, poor narrative flow, redundant explanations) |
| 134 | + |
| 135 | +### Style/Readability |
| 136 | +- [Severity] Description |
| 137 | + |
| 138 | +### Performance |
| 139 | +- [Severity] Description |
| 140 | + |
| 141 | +### Design |
| 142 | +- [Severity] Description |
| 143 | + |
| 144 | +## 🤔 Questions |
| 145 | +Clarifications needed about intent or design decisions. |
| 146 | + |
| 147 | +## 📋 Suggestions (Optional) |
| 148 | +Nice-to-haves that aren't blocking. |
| 149 | + |
| 150 | +## Overall Assessment |
| 151 | +Final recommendation: Approve / Request Changes / Comment |
| 152 | +``` |
| 153 | + |
| 154 | +## Severity Levels |
| 155 | + |
| 156 | +- **Critical**: Causes incorrect behavior, security vulnerability, data loss, or completely invalidates the verification |
| 157 | +- **High**: |
| 158 | + - Semantic mismatches between prompt and formalization that could hide bugs |
| 159 | + - Significant verification gaps where theorems don't prove what the prompt asks |
| 160 | + - Significant design flaws or performance issues |
| 161 | + - Maintainability concerns |
| 162 | + - Missing API lemmas not separated into their own section |
| 163 | +- **Medium**: |
| 164 | + - Undocumented semantic interpretation choices that work correctly but confuse readers |
| 165 | + - Code quality issues or readability problems |
| 166 | + - Non-obvious bugs |
| 167 | + - Missing or unclear section headers |
| 168 | +- **Low**: |
| 169 | + - Minor style issues |
| 170 | + - Documentation improvements |
| 171 | + - Suggestions for enhancement |
| 172 | + |
| 173 | +## Tips for Effective Reviews |
| 174 | + |
| 175 | +1. **Be constructive**: Focus on the code, not the person. Use "this could be clearer" not "you wrote this unclearly" |
| 176 | +3. **Ask questions**: If you don't understand, ask. That's a code clarity issue |
| 177 | +5. **Know when to stop**: Don't demand perfection; distinguish between blocking and nice-to-have |
| 178 | +6. **Reference standards**: Point to style guides, documentation, or existing patterns when suggesting changes |
| 179 | +7. **Test mentally**: Walk through the code with example inputs to catch edge cases *where appropriate* |
| 180 | +8. **Read as a newcomer**: When reviewing documentation, imagine you're encountering the code for the first time. Are section headers clear? Does the narrative flow? Would you understand the motivation and approach? |
| 181 | +9. **Check file structure first**: Before diving into code details, verify the file has the correct sections in the right order, and that missing API lemmas are properly separated. |
| 182 | +10. **Scrutinize semantics**: Always read the prompt carefully and compare it line-by-line with the formalization. Look for: |
| 183 | + - Different data structure interpretations (arrays vs lists, closed vs half-open intervals) |
| 184 | + - Different counting conventions (0-indexed vs 1-indexed, inclusive vs exclusive) |
| 185 | + - Different notions of "length", "size", "count", "empty", etc. |
| 186 | + - Any implicit assumptions in the formalization not stated in the prompt |
| 187 | + Even if mathematically equivalent, these differences should be documented and flagged appropriately. |
0 commit comments