Convergence: LLM Docs-Code Drift Experiment

What happens when LLMs repeatedly build code from a spec, then re-document the code they just built? Does the spec drift into nonsense, or does the system converge?

This experiment answers that question by running a build-document loop for 10 iterations and measuring what changes.

How it works

Three LLM roles operate in a cycle:

  1. Documenter -- writes a product spec from a seed prompt (iteration 0) or from reading the current code (iterations 1-N)
  2. Builder -- implements the spec as a plain HTML/CSS/JS application
  3. Judge -- compares the current spec against the original and scores intent preservation (0-10)

Each iteration: build from spec, re-document from code, judge the result. The workspace is a git repo, so every step is committed and diffable.
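The cycle above can be sketched in a few lines of Python. This is an illustrative sketch only: `build`, `document`, and `judge` stand in for the real LLM calls in `entropy.py`, and their names and signatures are assumptions, not the repo's actual API.

```python
# Illustrative sketch of the build-document-judge loop.
# build/document/judge are placeholders for LLM calls, NOT the repo's API.

def build(spec: str) -> str:
    """Placeholder for the Builder role: spec -> HTML/CSS/JS source."""
    return f"<!-- app built from: {spec[:20]} -->"

def document(source: str) -> str:
    """Placeholder for the Documenter role: code -> regenerated spec."""
    return f"spec derived from {len(source)} chars of code"

def judge(original_spec: str, current_spec: str) -> int:
    """Placeholder for the Judge role: 0-10 intent-preservation score."""
    return 10 if original_spec == current_spec else 7

def run_loop(seed_spec: str, iterations: int = 10) -> list[int]:
    spec, scores = seed_spec, []
    for _ in range(iterations):
        code = build(spec)                      # 1. build from spec
        spec = document(code)                   # 2. re-document from code
        scores.append(judge(seed_spec, spec))   # 3. judge vs. original
    return scores

print(run_loop("a to-do list app", iterations=3))
```

In the real experiment each step also commits to the workspace git repo, which is what makes every build and re-document step diffable.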

Repository contents

Path               Purpose
entropy.py         Main orchestrator -- runs the full experiment
prompts.py         System/user prompts for all three LLM roles
metrics.py         Per-iteration metrics collection (LOC, complexity, doc stats)
git_ops.py         Git helper functions for workspace commits
run_iterations.sh  Shell script for running individual iterations via Claude CLI
viewer.html        Interactive results viewer with embedded spec versions and charts
workspace/         The LLM-generated application (a to-do list) and its evolving spec
output/            Experiment artifacts: metrics.json, original_spec.md

Running the experiment

Prerequisites: Python 3.10+, an Anthropic API key set as ANTHROPIC_API_KEY.

pip install -r requirements.txt
python entropy.py --clean --verbose

Options:

  • --iterations N -- number of cycles (default: 10)
  • --model MODEL -- Anthropic model ID (default: claude-sonnet-4-6)
  • --clean -- remove existing workspace before starting
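A minimal `argparse` sketch of these options, assuming the parser in `entropy.py` uses the standard library (the flag names mirror the list above; the actual parser may differ):

```python
# Illustrative CLI parser matching the documented flags.
# The real parser in entropy.py may define these differently.
import argparse

parser = argparse.ArgumentParser(description="LLM docs-code drift experiment")
parser.add_argument("--iterations", type=int, default=10,
                    help="number of build-document cycles")
parser.add_argument("--model", default="claude-sonnet-4-6",
                    help="Anthropic model ID")
parser.add_argument("--clean", action="store_true",
                    help="remove existing workspace before starting")
parser.add_argument("--verbose", action="store_true",
                    help="print per-step progress")

# Example: parse the flags from the command shown above, plus a shorter run.
args = parser.parse_args(["--clean", "--verbose", "--iterations", "5"])
print(args.iterations, args.model, args.clean)
```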

Viewing results

GitHub Pages: https://theletterf.github.io/convergence-llm-experiment/

Locally (macOS; on other systems, open the file in any browser):

open viewer.html

The viewer shows iteration-over-iteration charts for intent score, lines of code, spec word count, and JS complexity, plus the full text of each spec version.

Artifacts

  • output/metrics.json -- structured metrics for all iterations (intent scores, LOC, word counts, complexity, drift descriptions)
  • workspace/ -- the generated application with git history showing every build and re-document step
  • viewer.html -- self-contained results viewer with all data embedded
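Since `output/metrics.json` is plain JSON, it can be inspected without the viewer. The snippet below is a sketch that assumes a list-of-records layout with fields like `iteration`, `intent_score`, `loc`, and `spec_words`; the actual schema may use different names, so check the file first. The sample data here is invented for illustration.

```python
# Hypothetical summary of output/metrics.json.
# Field names and sample values are assumptions, not the real schema/data.
import json

sample = json.loads("""
[
  {"iteration": 1, "intent_score": 9, "loc": 180, "spec_words": 420},
  {"iteration": 2, "intent_score": 8, "loc": 210, "spec_words": 455}
]
""")

for row in sample:
    print(f"iter {row['iteration']}: intent={row['intent_score']}/10, "
          f"loc={row['loc']}, spec words={row['spec_words']}")
```

To read the real file, replace the embedded string with `json.load(open("output/metrics.json"))`.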
