Commit 117f0df

eiennohito and claude committed
agent: introduce CLAUDE.md and modernize build docs
Add CLAUDE.md so cold-start agent sessions pick up project context without re-deriving it: work modes (evolve as default, patch as exceptional), subsystem map, build/test commands, and style rules.

Bundle the supporting docs changes in the same commit since agentic coding is defined by the documentation it points to:

- CMakeLists.txt: cmake_minimum_required 3.10 -> 3.13, matching what the modern -S/-B invocation needs.
- README.md: CMake requirement 3.1 -> 3.13, and modernized build snippets to cmake -S/-B, cmake --build, cmake --install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 50e600a commit 117f0df

3 files changed: +116 -14 lines changed

CLAUDE.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+# General
+
+> **Meta-rule**: Rules in this file include their rationale. Rules with rationales are followed more reliably and transfer better to novel situations.
+- **Documentation is dual-use**: Agents start each session with no memory. All docs under `docs/` serve both humans and agents — include exact commands, paths, and rationales for non-obvious steps.
+- **Do not be a yes-man**: Humans make bad decisions and forget to tell the whole picture. Ask the user to clarify and use data to improve your decisions.
+- **Markdown line style**: One sentence per line, no hard-wrapping at 80 columns. Sentence-per-line keeps diffs readable and lets editors soft-wrap to the viewer's width.
+
+# Codebase Stage & Work Modes
+
+jumanpp v2 is a released, mature C++ morphological analyzer — **but the codebase is under active evolution**.
+Treat existing code as *provisional where it serves the new direction* and *load-bearing where downstream users depend on it* (CLI, output format, model-file compatibility with released tarballs).
+
+**Continuous refactor is the correct default, not the exception.**
+Past experience with minimum-diff / patch-mode on this codebase produced technical-debt accumulation that later refactors had to pay off at higher cost.
+"Prefer minimal edits" means: minimize comprehension cost for session N+5, not line count in this diff.
+
+**Work modes** (user sets at session start or switches mid-session):
+- **Evolve** (~80%): evolve domain model and codebase toward correct modeling. Refactors and rewrites welcome, including revolutionary ones. **Default mode.**
+- **Analyze** (~15%): read code/logs, make plans, no code changes.
+- **Meta** (~5%): improve interaction workflows (this file, `docs/`).
+- **Patch**: minimal diff for a specific problem. **Never assume this mode** — user must request it explicitly. Defensive minimalism on this codebase has historically produced debt, not safety.
+
+**Design rules** (evolve mode):
+- Domain objects over god services. If logic only needs one object's data, it belongs on that object.
+- Make invalid states non-representable. If two values are meaningless without each other, they're one type. If a pipeline has stages, the stage outputs are types.
+- Concepts map to domain objects. Nouns are types. Verbs can be both methods and types.
+
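The design rules above can be illustrated with a small C++14 sketch. Every name in it (`ByteSpan`, `RawText`, `Segmented`, `segment`, `longestToken`) is hypothetical and does not come from the jumanpp sources, and the split on ASCII spaces only stands in for real analysis. It shows two paired values folded into one type, logic living on the object whose data it needs, and each pipeline stage having its own output type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of these types exist in jumanpp.

// Two values that are meaningless without each other form one type:
// a byte range is always a (begin, end) pair with begin <= end.
class ByteSpan {
 public:
  ByteSpan() = default;

  // The only way to build a non-empty span; rejects begin > end up front.
  static bool make(std::size_t begin, std::size_t end, ByteSpan* out) {
    if (begin > end) return false;
    *out = ByteSpan(begin, end);
    return true;
  }

  std::size_t begin() const { return begin_; }
  std::size_t end() const { return end_; }

  // Logic that needs only this object's data lives on the object.
  std::size_t length() const { return end_ - begin_; }

 private:
  ByteSpan(std::size_t b, std::size_t e) : begin_(b), end_(e) {}
  std::size_t begin_ = 0;
  std::size_t end_ = 0;
};

// Each pipeline stage has its own output type, so a later stage
// cannot be handed the output of the wrong stage.
struct RawText {
  std::string bytes;
};

struct Segmented {
  std::string bytes;
  std::vector<ByteSpan> tokens;
};

// Stage 1 -> stage 2: split on ASCII spaces (a stand-in for real analysis).
inline Segmented segment(RawText in) {
  Segmented out;
  std::size_t start = 0;
  for (std::size_t i = 0; i <= in.bytes.size(); ++i) {
    if (i == in.bytes.size() || in.bytes[i] == ' ') {
      ByteSpan span;
      if (i > start && ByteSpan::make(start, i, &span)) {
        out.tokens.push_back(span);
      }
      start = i + 1;
    }
  }
  out.bytes = std::move(in.bytes);
  return out;
}

// Accepts only Segmented, so passing RawText is a compile error.
inline std::size_t longestToken(const Segmented& s) {
  std::size_t best = 0;
  for (const ByteSpan& t : s.tokens) {
    if (t.length() > best) best = t.length();
  }
  return best;
}
```

Calling `longestToken(segment(RawText{"a bb ccc"}))` type-checks, while `longestToken(RawText{...})` does not compile, which is the point of making stage outputs distinct types.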
+# Project Rules & Guidelines
+
+## Environment & Configuration
+- **Git Protocol**: User intent is always partial; they edit files between turns. Never commit until triggered. Inspect actual state when committing.
+- **Commit Prefixes** (match existing log style):
+  - `fix:` bug fix
+  - `build:` CMake / dependency / packaging
+  - `refactor:` structural change, no behavior
+  - `feat:` new user-visible capability
+  - `docs:` human-and-agent system knowledge under `docs/` or README
+  - `agent:` changes to `CLAUDE.md` or agent-only artifacts
+  - `test:` test-only changes
+
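A check for the prefix list above could look like the following C++14 sketch. `hasKnownCommitPrefix` is a hypothetical helper, not something the repository provides, and it only tests that a commit subject begins with one of the listed prefixes.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper (not part of jumanpp): returns true when `subject`
// starts with one of the commit prefixes listed in CLAUDE.md.
inline bool hasKnownCommitPrefix(const std::string& subject) {
  static const std::array<const char*, 7> kPrefixes = {{
      "fix:", "build:", "refactor:", "feat:", "docs:", "agent:", "test:"}};
  for (const char* prefix : kPrefixes) {
    const std::string p(prefix);
    // compare(0, n, p) == 0 holds only if the first n characters equal p;
    // a subject shorter than the prefix never matches.
    if (subject.compare(0, p.size(), p) == 0) return true;
  }
  return false;
}
```

With this sketch, the subject of this commit, `agent: introduce CLAUDE.md and modernize build docs`, is accepted, while a subject such as `update readme` is rejected.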
+## Workflow & Planning
+- **Hypothesis Protocol**: When investigating:
+  1. State multiple conflicting hypotheses. A single hypothesis is a conclusion in disguise.
+  2. State them to the user before running off to validate for 10 minutes.
+  3. Persist what survives in `docs/` (e.g. `docs/knowledge/` — create if needed). Session-scoped hypotheses stay in the plan/issue.
+- **Review Mode**: For non-trivial changes: propose in text, read-only checks only, wait for approval. Surface everything relevant, not just what was asked.
+- **Incremental Implementation**: Don't assume you know full scope. Work on small sub-tasks, verify alignment after each.
+- **Plan Fidelity**: Don't silently reduce plan scope. If A turns out wrong mid-implementation, stop and update the plan — don't deliver partial work and call it done.
+- **Scope Decisions Are Not Yours to Make Silently**: Related work is either included (if excluding breaks coherence) or asked about. "That's separate work" is never a silent conclusion.
+- **Meta Feedback**: On `meta:` prefix, pause immediately. Propose instruction-file changes via Review Mode, apply after approval, resume. Update this file, not memory.
+
+## Plans
+Use GitHub Issues on `ku-nlp/jumanpp` for plan tracking.
+Never use per-project memory — it's local only.
+
+# Build & Test
+
+Out-of-source build (CMake refuses in-source):
+
+```bash
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug   # or Release for perf work
+cmake --build build -j
+ctest --test-dir build --output-on-failure
+```
+
+For formatting before commits: `./do_format.sh` (clang-format).
+
+Ubuntu 22.04 needs `libprotobuf-dev protobuf-compiler`.
+
+**Model compatibility warning**: Current git HEAD is not compatible with released model files (2.0-rc1 / rc2).
+End-to-end analysis requires either rebuilding the dictionary or using a matching release tarball.
+Do not commit model or dictionary binaries.
+
+# Subsystem Map
+
+- `src/core/` — analysis engine (lattice), feature computation, dictionary compilation, training, codegen, spec DSL.
+  - `analysis/` — runtime lattice, beam search, char lattice, analyzer
+  - `spec/` — DSL for declaring dictionary fields, features, unks
+  - `training/` — structured perceptron / loss
+  - `codegen/` — generated feature-compute C++
+  - `dic/` — dictionary builder & reader
+- `src/jumandic/` — Juman dictionary schema + `jumanpp` CLI.
+- `src/rnn/` — RNNLM scorer. **Experimental replacement target** (transformer).
+- `src/util/` — containers, mmap, serialization, flatmap, logging.
+- `src/testing/` — standalone test harness used by `*_test.cc` files.
+
+# Language & Style
+
+- **C++14 baseline.** Widely supported everywhere we build (gcc, clang, MSVC, mingw64). No reason to artificially avoid its features; also no reason to reach for C++17/20 without discussing — CMake and CI expect C++14.
+- Headers and sources colocated under `src/`; tests sit next to the code they test as `*_test.cc`.
+- Run `./do_format.sh` before committing. It wraps `script/git-clang-format.py` and formats only changed hunks, not full files.
+- **Do not mass-reformat the codebase.** Formatting migrates per-hunk as files are edited, so the tree gradually picks up whatever clang-format version contributors have locally. Full-file passes break `git blame` and produce churn with no real benefit.
+
+# Testing
+
+- In-tree harness: `src/testing/standalone_test.h` wraps Catch-style `TEST_CASE`. Tests are `*_test.cc` files next to their subject and are discovered by CMake.
+- Run the full suite: `ctest --test-dir build --output-on-failure`. For one test: `ctest --test-dir build -R <name> --output-on-failure`.
+
+# Documentation
+
+> Agents start cold. Docs exist so session N+5 doesn't re-derive session N.
+
+- **System docs** (`docs/`): existing files — `analysis.md`, `building.md`, `dictionary.md`, `output.md`, `spec.md`. High-level *what* and *why*; implementation lives in code.
+- **Knowledge** (`docs/knowledge/`): instructive dead-ends and confirmed non-obvious facts from investigations. Create the directory on first use.
+- **Code comments**: describe the *goal*, not the mechanism. If a comment only narrates what the code already says, delete it. Non-obvious code with no clear goal is a refactor target, not a comment target.

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.10)
+cmake_minimum_required(VERSION 3.13)
 project(jumanpp)

 include(version.cmake)
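The commit message ties the 3.13 floor to the `-S`/`-B` invocation. The C++14 sketch below works through that reasoning for every command-line form used in this commit's docs, by taking the highest "introduced in" version among them. The version numbers are recalled from the CMake documentation (`cmake --build -j` in 3.12, `-S`/`-B` in 3.13, `cmake --install` in 3.15, `ctest --test-dir` in 3.20) and should be treated as assumptions to verify against the upstream docs. The names `Feature`, `featuresUsed`, and `requiredVersion` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// (major, minor); std::pair compares lexicographically.
using Version = std::pair<int, int>;

// Hypothetical record: a command-line form and the CMake release that added it.
struct Feature {
  std::string usage;
  Version since;
};

// Assumed "introduced in" versions; verify against the CMake documentation.
inline std::vector<Feature> featuresUsed() {
  return {
      {"cmake --build <dir> -j", {3, 12}},
      {"cmake -S <src> -B <dir>", {3, 13}},
      {"cmake --install <dir>", {3, 15}},
      {"ctest --test-dir <dir>", {3, 20}},
  };
}

// The oldest CMake that supports every listed feature is the maximum
// of their individual "since" versions.
inline Version requiredVersion(const std::vector<Feature>& features) {
  Version need{0, 0};
  for (const Feature& f : features) {
    if (f.since > need) need = f.since;
  }
  return need;
}
```

If those version numbers hold, the configure and build steps work on CMake 3.13, while the `cmake --install` step in the README and the `ctest --test-dir` form in CLAUDE.md need a newer CMake than `cmake_minimum_required` enforces.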

README.md

Lines changed: 10 additions & 13 deletions
@@ -15,7 +15,7 @@ the original Juman++.
 * Compiler: C++14 compatible
 * For example gcc 5.1+, clang 3.4+, MSVC 2017
 * We test on GCC and clang on Linux/MacOS, mingw64-gcc and MSVC2017 on Windows
-- CMake v3.1 or later
+- CMake v3.13 or later
 - For Ubuntu22.04, you need to install additional packages as follows: `sudo apt install libprotobuf-dev protobuf-compiler`

 Read [this document](docs/building.md) for CentOS and RHEL derivatives or non-CMake alternatives.
@@ -27,25 +27,22 @@ Download the package from [Releases](https://github.com/ku-nlp/jumanpp/releases)
 **Important**: The download should be around 300 MB. If it is not you have probably downloaded a source snapshot which does not contain a model.

 ```bash
-$ tar xf jumanpp-<version>.tar.xz # decompress the package
-$ cd jumanpp-<version> # move into the directory
-$ mkdir bld # make a subdirectory for build
-$ cd bld
-$ cmake .. \
--DCMAKE_BUILD_TYPE=Release \ # you want to do this for performance
--DCMAKE_INSTALL_PREFIX=<prefix> # where to install Juman++
-$ make install -j<parallelism>
+$ tar xf jumanpp-<version>.tar.xz # decompress the package
+$ cd jumanpp-<version>
+$ cmake -S . -B build \
+-DCMAKE_BUILD_TYPE=Release \
+-DCMAKE_INSTALL_PREFIX=<prefix>
+$ cmake --build build -j<parallelism>
+$ cmake --install build
 ```
 ## Building from git

 **Important**: Only the package distribution contains a pretrained model and can be used for analysis.
 The current git version is not compatible with the models of 2.0-rc1 and 2.0-rc2.

 ```bash
-$ mkdir cmake-build-dir # CMake does not support in-source builds
-$ cd cmake-build-dir
-$ cmake ..
-$ make # -j
+$ cmake -S . -B build
+$ cmake --build build -j
 ```

 # Usage
