Convergence: LLM Docs-Code Drift Experiment

What happens when LLMs repeatedly build code from a spec, then re-document the code they just built? Does the spec drift into nonsense, or does the system converge?

This experiment answers that question by running a build-document loop for 10 iterations and measuring what changes.

How it works

Three LLM roles operate in a cycle:

  1. Documenter -- writes a product spec from a seed prompt (iteration 0) or from reading the current code (iterations 1-N)
  2. Builder -- implements the spec as a plain HTML/CSS/JS application
  3. Judge -- compares the current spec against the original and scores intent preservation (0-10)

Each iteration: build from spec, re-document from code, judge the result. The workspace is a git repo, so every step is committed and diffable.
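The cycle above can be sketched in a few lines of Python. This is an illustrative sketch only: `build`, `document`, and `judge` stand in for the real LLM calls in `entropy.py`, and their names and signatures are assumptions, not the repo's actual API.

```python
# Illustrative sketch of the build-document-judge loop.
# build/document/judge are placeholders for LLM calls, NOT the repo's API.

def build(spec: str) -> str:
    """Placeholder for the Builder role: spec -> HTML/CSS/JS source."""
    return f"<!-- app built from: {spec[:20]} -->"

def document(source: str) -> str:
    """Placeholder for the Documenter role: code -> regenerated spec."""
    return f"spec derived from {len(source)} chars of code"

def judge(original_spec: str, current_spec: str) -> int:
    """Placeholder for the Judge role: 0-10 intent-preservation score."""
    return 10 if original_spec == current_spec else 7

def run_loop(seed_spec: str, iterations: int = 10) -> list[int]:
    spec, scores = seed_spec, []
    for _ in range(iterations):
        code = build(spec)                      # 1. build from spec
        spec = document(code)                   # 2. re-document from code
        scores.append(judge(seed_spec, spec))   # 3. judge vs. original
    return scores

print(run_loop("a to-do list app", iterations=3))
```

In the real experiment each step also commits to the workspace git repo, which is what makes every build and re-document step diffable.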

Repository contents

Path               Purpose
entropy.py         Main orchestrator -- runs the full experiment
prompts.py         System/user prompts for all three LLM roles
metrics.py         Per-iteration metrics collection (LOC, complexity, doc stats)
git_ops.py         Git helper functions for workspace commits
run_iterations.sh  Shell script for running individual iterations via Claude CLI
viewer.html        Interactive results viewer with embedded spec versions and charts
workspace/         The LLM-generated application (a to-do list) and its evolving spec
output/            Experiment artifacts: metrics.json, original_spec.md

Running the experiment

Prerequisites: Python 3.10+, an Anthropic API key set as ANTHROPIC_API_KEY.

pip install -r requirements.txt
python entropy.py --clean --verbose

Options:

  • --iterations N -- number of cycles (default: 10)
  • --model MODEL -- Anthropic model ID (default: claude-sonnet-4-6)
  • --clean -- remove existing workspace before starting
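A minimal `argparse` sketch of these options, assuming the parser in `entropy.py` uses the standard library (the flag names mirror the list above; the actual parser may differ):

```python
# Illustrative CLI parser matching the documented flags.
# The real parser in entropy.py may define these differently.
import argparse

parser = argparse.ArgumentParser(description="LLM docs-code drift experiment")
parser.add_argument("--iterations", type=int, default=10,
                    help="number of build-document cycles")
parser.add_argument("--model", default="claude-sonnet-4-6",
                    help="Anthropic model ID")
parser.add_argument("--clean", action="store_true",
                    help="remove existing workspace before starting")
parser.add_argument("--verbose", action="store_true",
                    help="print per-step progress")

# Example: parse the flags from the command shown above, plus a shorter run.
args = parser.parse_args(["--clean", "--verbose", "--iterations", "5"])
print(args.iterations, args.model, args.clean)
```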

Viewing results

GitHub Pages: https://theletterf.github.io/convergence-llm-experiment/

Locally (macOS; on other systems, open the file in any browser):

open viewer.html

The viewer shows iteration-over-iteration charts for intent score, lines of code, spec word count, and JS complexity, plus the full text of each spec version.

Artifacts

  • output/metrics.json -- structured metrics for all iterations (intent scores, LOC, word counts, complexity, drift descriptions)
  • workspace/ -- the generated application with git history showing every build and re-document step
  • viewer.html -- self-contained results viewer with all data embedded
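Since `output/metrics.json` is plain JSON, it can be inspected without the viewer. The snippet below is a sketch that assumes a list-of-records layout with fields like `iteration`, `intent_score`, `loc`, and `spec_words`; the actual schema may use different names, so check the file first. The sample data here is invented for illustration.

```python
# Hypothetical summary of output/metrics.json.
# Field names and sample values are assumptions, not the real schema/data.
import json

sample = json.loads("""
[
  {"iteration": 1, "intent_score": 9, "loc": 180, "spec_words": 420},
  {"iteration": 2, "intent_score": 8, "loc": 210, "spec_words": 455}
]
""")

for row in sample:
    print(f"iter {row['iteration']}: intent={row['intent_score']}/10, "
          f"loc={row['loc']}, spec words={row['spec_words']}")
```

To read the real file, replace the embedded string with `json.load(open("output/metrics.json"))`.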
