# LLM Documentation-Code Convergence: A Build-Document Loop Experiment

## Motivation

Software documentation drifts from code over time. Humans forget to update specs, add undocumented features, and let the gap widen. But what happens when *LLMs* are both the builder and the documenter? If a model builds code from a spec, then another model re-documents that code into a new spec, and the cycle repeats -- does the spec degrade into noise, or does something else happen?

The intuition might be that each pass introduces small errors that compound -- a game of telephone where meaning is gradually lost. This experiment tests that assumption.
## Method

**Seed.** A documenter LLM generates a product spec from a one-line prompt: *"Build a to-do list web app."* This produces a structured PRD with user stories, functional requirements, non-functional requirements, and an out-of-scope section (~300 words).

**Loop (10 iterations).** Each iteration has three steps:

1. **Build.** A builder LLM reads the current spec and writes a complete HTML/CSS/JS application. If code already exists, it rewrites from scratch to match the spec.
2. **Re-document.** A documenter LLM reads only the source code and writes a new product spec describing what the application does. It is instructed not to speculate about unimplemented features.
3. **Judge.** A judge LLM compares the new spec against the *original* (iteration 0) spec and scores intent preservation on a 0-10 scale, notes feature drift, and classifies specificity shift.

All three roles use the same model (Claude Sonnet). The workspace is a git repository; every build and re-document step is committed, creating a full diff history.
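One iteration of the loop can be sketched as a single function. This is a minimal sketch, not the actual harness: `call_model` is a hypothetical wrapper around the LLM API, and the prompt strings are paraphrases of the instructions described above, not the real prompts.

```python
import json
import subprocess
from pathlib import Path

def run_iteration(call_model, workspace: Path, current_spec: str,
                  original_spec: str, commit: bool = True) -> dict:
    """One build -> re-document -> judge pass of the loop."""
    # 1. Build: rewrite the app from scratch to match the current spec.
    code = call_model(
        "Write a complete HTML/CSS/JS to-do app implementing this spec. "
        "If code already exists, rewrite it from scratch.\n\n" + current_spec
    )
    (workspace / "index.html").write_text(code)

    # 2. Re-document: describe only what the code does; no speculation
    #    about unimplemented features.
    new_spec = call_model(
        "Read this source code and write a product spec describing what "
        "the application does. Do not speculate about unimplemented "
        "features.\n\n" + code
    )
    (workspace / "SPEC.md").write_text(new_spec)

    # 3. Judge: always compare against the iteration-0 spec, not the
    #    previous iteration's spec.
    verdict = json.loads(call_model(
        "Compare the NEW spec to the ORIGINAL. Reply as JSON with keys "
        '"intent_score" (0-10), "feature_drift", "specificity_shift".\n\n'
        "ORIGINAL:\n" + original_spec + "\n\nNEW:\n" + new_spec
    ))

    if commit:  # every build and re-document step leaves a diff in git history
        subprocess.run(["git", "-C", str(workspace), "add", "-A"], check=True)
        subprocess.run(["git", "-C", str(workspace), "commit", "-m",
                        "iteration"], check=True)
    return {"spec": new_spec, **verdict}
```

Judging against the iteration-0 spec (rather than the previous iteration's) is what lets the experiment measure cumulative drift instead of per-step drift.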

**Metrics collected per iteration:** lines of code (by language), file count, JS cyclomatic complexity (keyword heuristic), spec word count, spec section count, intent score, feature drift description, and specificity shift classification.
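The complexity metric is only described as a keyword heuristic. A minimal sketch of that idea, assuming the usual approach of counting branch-introducing tokens plus one (the exact keyword set used by the harness is an assumption):

```python
import re

# Branch-introducing JS tokens; one decision point each. This is a rough
# proxy for cyclomatic complexity, not a real parse of the source.
BRANCH_KEYWORDS = ("if", "for", "while", "case", "catch", "&&", "||", "?")

def js_complexity(source: str) -> int:
    # Strip line comments so commented-out code doesn't inflate the count.
    # (Crude: also hits "//" inside string literals, e.g. URLs.)
    code = re.sub(r"//.*", "", source)
    count = 1  # the single path through straight-line code
    for kw in BRANCH_KEYWORDS:
        if kw.isalpha():
            count += len(re.findall(rf"\b{kw}\b", code))
        else:
            count += code.count(kw)
    return count
```

A heuristic like this overcounts (keywords in strings) and undercounts (early returns), but it is stable across iterations, which is all the drift comparison needs.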

## Findings

### Intent is preserved

Intent scores ranged from **8 to 9 out of 10** across all 10 iterations, never dropping below 8. The judge consistently found that every core feature from the original spec was present in each subsequent version.

### The spec changes structure, not meaning

The original spec was a traditional PRD: user stories, functional requirements, non-functional requirements, out-of-scope. By iteration 2, the re-documented specs had shifted to a behavioral prose style -- describing what the user sees and does rather than listing requirements in formal categories. The *information* was the same; the *format* changed.

### Non-functional requirements and out-of-scope sections vanish

The documenter, instructed to describe what the code *does*, correctly omits things the code doesn't explicitly express: responsiveness goals, accessibility commitments, and the out-of-scope list. These aren't "lost" -- they were never in the code to begin with. The documenter is doing its job.

### Code stabilizes early

Lines of code, file count, and JS complexity all stabilized by iteration 6 and remained identical through iteration 10. The builder, given a spec that faithfully describes the existing code, produces the same code. This is a fixed point.

| Metric | Iter 1 | Iter 6 | Iter 10 |
|--------|--------|--------|---------|
| Total LOC | 254 | 188 | 188 |
| JS lines | 123 | 71 | 71 |
| JS complexity | 26 | 35 | 35 |
| Spec words | 495 | 438 | 466 |
| Intent score | 9 | 9 | 9 |

### Specificity always increases

Every iteration was classified as `more_specific` by the judge. The re-documenter, working from concrete code, naturally adds implementation details the original abstract spec didn't have (e.g., "modal dialogs," "drag handle," "localStorage persistence"). Specificity is a one-way ratchet: once a detail is in the code, the documenter captures it.

## Interpretation

The system **converges rather than diverges.** The build-document loop acts as a compression function: abstract requirements are compiled into code, then decompiled back into concrete behavioral descriptions. Information about *what the product does* is preserved. Information about *what the product should aspire to* (NFRs, scope boundaries) is lost -- because it was never encoded in the artifact the documenter reads.

This suggests that LLM-driven documentation loops are more stable than the telephone-game intuition predicts, at least for small, well-scoped applications. The interesting failure mode isn't catastrophic drift -- it's the quiet disappearance of intent that lives outside the code.