Commit e5f5738

Write "Map-reducing-myself" article (#15)
* Write blog
* Improve
* Fix proofer
* wip fix htmlproofer
1 parent 6d59167 commit e5f5738

2 files changed

Lines changed: 103 additions & 4 deletions

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
---
title: "Map-Reducing Myself"
date: 2026-03-28
description: "312 conversations with Claude, compressed to 15 words."
tags: ["ai", "reflection", "data"]
---

I talk to Claude every day. It writes code with me, reviews my PRs, debugs my specs, validates my homework. At 3am it is the thing I talk to when I am still awake and thinking about something I cannot put into words yet. Over six months that added up to 312 conversations and 21MB of JSON.

Claude Code has a `/insights` command that analyzes your usage. It told me I am a "persistent, correction-driven iterator" who steers Claude through "frequent wrong approaches with direct feedback." 98 sessions, 198 hours. Accurate. Shallow.

I wanted to go deeper. So I built a pipeline to summarize all 312 conversations into a profile of myself.

## The pipeline

The raw export is noisy: metadata, UUIDs, timestamps, tool calls, thinking blocks, and the actual text buried inside. The first step is stripping everything that is not what I said or what Claude said.

```
21MB raw JSON -> 4MB normalized -> 2.9MB plain text
```

Most of the file was structure, not content. The actual conversations compress to 2.9MB.
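
A minimal sketch of that stripping step, not the code from the gist. The raw export shape used here (`chat_messages`, `sender`, and `content` blocks with a `type` field) is an assumption for illustration; the output shape (`messages` with a `text` field) is chosen to match what the chunking step reads.

```python
def normalize(raw_conversations):
    """Keep only sender and visible text; drop IDs, timestamps, tool calls, thinking."""
    normalized = []
    for conv in raw_conversations:
        messages = []
        for msg in conv.get("chat_messages", []):  # hypothetical field name
            # Keep only blocks of type "text"; tool calls and thinking blocks are skipped
            parts = [
                block.get("text", "")
                for block in msg.get("content", [])
                if block.get("type") == "text"
            ]
            text = "\n".join(p for p in parts if p).strip()
            if text:
                messages.append({"sender": msg.get("sender", "unknown"), "text": text})
        if messages:
            normalized.append({"messages": messages})
    return normalized


def to_plain_text(normalized):
    """Flatten normalized conversations into 'sender: text' lines."""
    lines = []
    for conv in normalized:
        for m in conv["messages"]:
            lines.append(f"{m['sender']}: {m['text']}")
        lines.append("")  # blank line between conversations
    return "\n".join(lines)
```

The two functions correspond to the two arrows in the diagram: `normalize` drops structure, `to_plain_text` drops the remaining JSON syntax.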
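
Next, chunking. Each chunk needs to fit in a context window, so I targeted 80K tokens per chunk, roughly 320KB of text:

```python
TARGET_BYTES = 320_000  # ~80K tokens at roughly 4 bytes per token

chunks, current_chunk, current_size = [], [], 0
for conv in data:
    # len() counts characters, a close proxy for bytes on mostly-ASCII text
    conv_size = sum(len(m.get('text', '')) for m in conv['messages'])
    # Close the current chunk before this conversation would push it past the target
    if current_size + conv_size > TARGET_BYTES and current_chunk:
        chunks.append(current_chunk)
        current_chunk, current_size = [], 0
    current_chunk.append(conv)
    current_size += conv_size
if current_chunk:
    chunks.append(current_chunk)  # keep the final partial chunk
```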

That gives 10 chunks. Each gets fed to Claude Sonnet with a prompt asking for everything the conversations reveal about me: identity, preferences, code style, communication style, personality, interests, struggles, projects. Each observation labeled `[stated]`, `[demonstrated]`, or `[inferred]`. The 10 summaries then merge into one profile. Five parallel workers, a couple of minutes, $3.31 via the API.

```
10 chunks -> Sonnet -> 10 summaries (~28K tokens) -> Sonnet -> 1 profile
```
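
The shape of that map-reduce, as a hedged sketch rather than the gist's code. The model calls are replaced by stand-in functions (`fake_summarize` and `fake_merge` are hypothetical names), so this runs offline; a real run would send a prompt to the API inside each of them.

```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce(chunks, summarize, merge, max_workers=5):
    """Summarize each chunk in parallel (map), then merge the summaries (reduce)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so summaries line up with chunks
        summaries = list(pool.map(summarize, chunks))
    return merge(summaries)


# Stand-ins for the two model calls (assumptions, not real API calls).
def fake_summarize(chunk):
    return f"[demonstrated] {len(chunk)} conversations"


def fake_merge(summaries):
    return "\n".join(summaries)


profile = map_reduce([["a", "b"], ["c"]], fake_summarize, fake_merge)
```

Threads fit here because each worker spends its time waiting on a network response, not computing.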

## What it found

The profile was accurate. Renuo, ZHAW, Neovim, Ruby, Rust, it/its pronouns, strong opinions about naming, many typos, rejects em-dashes and emojis in technical contexts. It noted that I ask for 20 naming options, then 50, then "more generic," and called this "an intellectual interest, not thoroughness." It catalogued my projects, my coursework, my side interests. All correct. And shallow.

It described what I do. Not why.

## What it missed

The profile found that I write inline asserts in my numerical methods code. Pre/postcondition checks inside the functions, not in a test file. It noted this as "inline verification" and moved on.

The asserts are not checking the math. I understand the math. They are checking the transcription. I type fast and produce "everutjing" when I mean "everything." The asserts exist because I do not trust that what I typed is what I meant. The mind is precise. The hands are not. Everything I build sits in that gap.

Claude could not find this. It sees patterns but not the thing that connects them. During the conversation that followed the pipeline, we kept pulling on that thread, and it turned out to be the thread that tied everything together. The typos, the naming obsession, the formatting rules, the inline asserts, the pronouns. All the same gap. None of that was in the data. It came out of the conversation.

## The compression

After the pipeline produced the profile, I spent the better part of a day in conversation with Claude, questioning it, correcting it, adding context the data did not contain. Claude read my blog posts and gallery captions on cb341.dev to understand how I write. It asked me things the data could not answer. Then we started compressing.

```
21MB    raw JSON
 4MB    normalized
2.9MB   plain text
 28K    tokens of summaries
  6K    tokens of merged profile
  200   words of dense description
```

At each stage, something is lost. The metadata, the structure, the individual conversations, the contradictions, the context. But at each stage, what survives is more essential than what was removed.

There is a concept in information theory called Kolmogorov complexity: the length of the shortest program that produces a given output. It is uncomputable in general, but the idea is useful. The 200-word profile described me accurately. But so did the 6K-token version, and so did the 28K-token version. Each was shorter, none was minimal. Still describing, not yet true.

200 words felt close. Not close enough. So we kept compressing.

## 15 words

Back and forth, rejecting drafts, cutting what was description and keeping what was true, until there was nothing left to remove.

> dani understands and cannot express.
> so it builds until it can.
> and gives it away.

21 megabytes to 15 words.

## Still figuring things out

I went into this thinking I could automate self-knowledge. I could not. Claude could not ask "when was the last time someone took care of you." Some things only surface when someone asks the right question.

But the automated profile was accurate enough to see myself in and say "yes, but also this." Claude was the first draft. The conversation was the edit.

Claude drafted the words. I shaped them until they were mine. The [code](https://gist.github.com/cb341/032d32bc2d8c161f7c414865a6e3e1e6) and the [result](https://gist.github.com/cb341/80fa11ce8df67e289241585d3b67e06b) are MIT licensed.

bin/htmlproofer

Lines changed: 10 additions & 4 deletions
@@ -1,8 +1,14 @@
 #!/usr/bin/env bash
 set -e
 
+MAX_RETRIES=3
+
 # linkedin.com, amd.com, seagate.com, corsair.com, reddit.com, apple.com: bot-blocking (links are valid, fail without browser UA)
-# TODO: remove /cb341\.dev\/gallery/ and /github\.com\/cb341/ once gallery is deployed
-bundle exec htmlproofer _site \
-  --ignore-urls "/linkedin\.com/,/amd\.com/,/seagate\.com/,/corsair\.com/,/reddit\.com/,/apple\.com/,/cb341\.dev\/blog\/tags/,/cb341\.dev\/gallery/,/github\.com\/cb341/" \
-  --ignore-status-codes "429,999"
+for i in $(seq 1 $MAX_RETRIES); do
+  bundle exec htmlproofer _site \
+    --ignore-urls "/linkedin\.com/,/amd\.com/,/seagate\.com/,/corsair\.com/,/reddit\.com/,/apple\.com/,/cb341\.dev\/blog\/tags/,/cb341\.dev\/blog\/map-reduce-myself/,/github\.com\/cb341/" \
+    --ignore-status-codes "429,999" \
+    && exit 0
+  echo "Attempt $i failed, retrying..."
+done
+exit 1
