Commit f298edc

Merge pull request #1889 from link-assistant/issue-1886-d58595c4ffc5
fix(cost): accumulate Anthropic cost across limit-reset resumes (#1886)
2 parents 3f5cde8 + f7d7732 commit f298edc

12 files changed

Lines changed: 887 additions & 13 deletions
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
---
'@link-assistant/hive-mind': patch
---

fix(cost): accumulate Anthropic cost across limit-reset resumes (#1886)

The session cost summary could report a large negative "Difference" (e.g.
`$-11.422796 (-31.66%)`) between the public pricing estimate and the Anthropic
figure. Root cause: the public estimate is computed from the session JSONL,
which accumulates the **entire** session across every limit-reset resume, while
the Anthropic `total_cost_usd` from the stream-json `result` event is scoped to a
**single** Claude process (only the resumed run). Comparing a full-session
estimate against a single-process figure produced a misleading gap even though
both numbers were individually correct.

The per-token math (`calculateModelCost`) was audited and is correct; this is a
scope mismatch, not a pricing error.

Fix:

- New `src/anthropic-cost-accumulator.lib.mjs` keeps a model-agnostic running
  total of Anthropic's per-process `total_cost_usd` (it sums dollars, never
  inspecting per-token prices, so it is correct for all models).
- `runClaude` seeds from and returns the cumulative total on every terminal path;
  the cross-process limit-reset resume threads it via a new hidden
  `--previous-anthropic-cost` option (`autoContinueWhenLimitResets`).
- A usage-limit hit ends as `is_error` with no `success` result event, so its
  cost was previously discarded. The cost from a non-success terminal `result`
  event is now kept as a fallback and folded into the accumulator, closing the
  gap in the reported scenario.
- `displayCostComparison` / `displaySessionTokenUsage` print a verbose
  accumulation breakdown ("cumulative across resume iterations: this run … +
  carried forward … = …") so the figure is never mysterious again.

A deep case study (timeline, proven root causes, exact reproduced numbers, online
prior art incl. `anthropics/claude-code#13088`) is compiled under
`docs/case-studies/issue-1886/`.
Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
# Case study — Issue #1886: "Calculation of cost has difference"

- Issue: <https://github.com/link-assistant/hive-mind/issues/1886>
- Observed in: <https://github.com/link-assistant/formal-ai/pull/396#issuecomment-4672854592>
- Source log (gist): <https://gist.githubusercontent.com/konard/4c233f1134b97d5ca4b20482743a85fb/raw/1e72d523a79073c2c81e7bdfe4089dd7a0baf2c8/solution-draft-log-pr-1781113643393.txt>
- Fix PR: <https://github.com/link-assistant/hive-mind/pull/1889>

## Summary

A working-session log reported a cost discrepancy in its final summary:

```
💰 Cost estimation:
Public pricing estimate: $36.085016
Calculated by Anthropic: $24.662220
Difference: $-11.422796 (-31.66%)
```

The instinct is "the per-token pricing math is wrong." **It is not.** Both numbers
are individually correct — they simply cover **different scopes**:

- **"Public pricing estimate" ($36.085016)** is computed from the session JSONL
  file, which accumulates the **entire** session across every limit-reset resume.
- **"Calculated by Anthropic" ($24.662220)** comes from the stream-json `result`
  event's `total_cost_usd`, which is scoped to a **single Claude process** — only
  the last (resumed) run.

This session hit the Anthropic usage limit during the first run, was auto-resumed
into a second process ~2.5 hours later, and the second process's `result` event
naturally only knew about its own cost. The summary then compared a **full-session
estimate** against a **single-process Anthropic figure**, producing the misleading
`-31.66%`.

The fix accumulates Anthropic's per-process `total_cost_usd` across resume
iterations so the displayed Anthropic figure shares the same full-session scope as
the public estimate. The accumulation is **model-agnostic** — it sums dollar
amounts and never inspects per-token prices, so it is correct for all models.

## Timeline (reconstructed from the gist log)

All times UTC, 2026-06-10. Session id: `160da4c5-d2f8-4488-873e-5936eacfac37`.
Raw excerpts are preserved under [`data/`](./data).

| Time | Event |
| -------- | ----- |
| 14:08:21 | **Run 1** starts — original `solve` process for formal-ai issue #395 / PR #396. Writes to session JSONL `160da4c5…jsonl`. |
| 14:43:41 | **Usage limit reached.** Run 1 is interrupted (ends as `is_error` — no `success` result event). Comment "⏳ Usage Limit Reached" posted. |
| 17:03:10 | **Auto Resume (on limit reset).** `autoContinueWhenLimitResets` spawns **Run 2**: `solve … --resume 160da4c5… --auto-resume-on-limit-reset --auto-resume-iteration 1 --session-type auto-resume` (a fresh node process). |
| 17:03:20 | Run 2's Claude process starts with `claude --resume 160da4c5…`. It **appends** to the same JSONL, which already holds Run 1's turns. |
| 17:45:44 | Run 2's context auto-compacts ("This session is being continued from a previous conversation…"). |
| 17:47:07 | Run 2 emits its `success` `result` event: `total_cost_usd: 24.662219…`, `modelUsage.claude-fable-5` = 31 490 / 137 297 / 13 211 220 / 341 700 (in/out/cache-read/cache-write). Captured: `💰 Anthropic official cost captured from success result: $24.662220`. |
| 17:47:12 | Final **Token Usage Summary** computed from the **full JSONL**: 45 265 / 185 995 / 16 444 028 / 791 087 → **$36.085016**. Cost comparison prints the `-31.66%` difference. |

The key structural fact: **Run 1 and Run 2 are separate OS processes that share one
JSONL file.** The JSONL is cumulative; the `result` event is per-process.

## Reproducing the discrepancy

The exact numbers reproduce from the real token counts (see
[`../../../experiments/issue-1886-costcheck.mjs`](../../../experiments/issue-1886-costcheck.mjs)
and [`../../../tests/test-issue-1886-cost-accumulation.mjs`](../../../tests/test-issue-1886-cost-accumulation.mjs)):

```bash
node experiments/issue-1886-costcheck.mjs
# result-event scope cost (should ~= 24.662220): 24.662220
# full-session scope cost (should ~= 36.085016): 36.085015
# reported difference -31.66% reproduced: -31.66%
# run1 folded from non-success fallback (should ~= 11.42): 11.422795
# cumulative anthropic after resume: 36.085015 -> matches full estimate: true
```

Fable 5 pricing (per million tokens, from models.dev): input $10, cache-write
$12.5, cache-read $1, output $50.

| Scope | input | cache-write | cache-read | output | × prices = cost |
| --------------------------------- | ------ | ----------- | ---------- | ------- | --------------- |
| Run 2 (result-event `modelUsage`) | 31 490 | 341 700 | 13 211 220 | 137 297 | **$24.662220** |
| Full session (JSONL summary) | 45 265 | 791 087 | 16 444 028 | 185 995 | **$36.085016** |
| Run 1 (difference) | 13 775 | 449 387 | 3 232 808 | 48 698 | **~$11.422796** |

`(24.662220 − 36.085016) / 36.085016 × 100 = −31.66%` — the reported gap, exactly.
This proves the per-token math is correct and the gap is purely a **scope mismatch**.
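
As a cross-check, the arithmetic in the table can be redone in a few lines. This is a minimal sketch, not the project's `calculateModelCost`: the helper name `estimateCostUSD` and the object field names are assumptions made for illustration, plain floating point stands in for `decimal.js-light`, and web-search charges are ignored.

```javascript
// Prices in USD per million tokens, as quoted in this case study.
const PRICES_PER_MILLION = { input: 10, cacheWrite: 12.5, cacheRead: 1, output: 50 };

// Hypothetical helper for illustration only (not the repository's calculateModelCost).
function estimateCostUSD(tokens, prices = PRICES_PER_MILLION) {
  const weighted =
    tokens.input * prices.input +
    tokens.cacheWrite * prices.cacheWrite +
    tokens.cacheRead * prices.cacheRead +
    tokens.output * prices.output;
  return weighted / 1e6; // prices are per million tokens
}

// Token counts copied from the table.
const run2Tokens = { input: 31490, cacheWrite: 341700, cacheRead: 13211220, output: 137297 };
const fullSessionTokens = { input: 45265, cacheWrite: 791087, cacheRead: 16444028, output: 185995 };

const run2CostUSD = estimateCostUSD(run2Tokens); // result-event scope
const fullCostUSD = estimateCostUSD(fullSessionTokens); // full-session scope
const run1CostUSD = fullCostUSD - run2CostUSD; // what Run 1 must have cost
const differencePercent = ((run2CostUSD - fullCostUSD) / fullCostUSD) * 100;

console.log('run 2 cost (USD):', run2CostUSD);
console.log('full session cost (USD):', fullCostUSD);
console.log('run 1 cost (USD):', run1CostUSD);
console.log('difference (%):', differencePercent.toFixed(2)); // -31.66
```

The unrounded full-session value sits exactly on a rounding boundary at the sixth decimal, which is consistent with the experiment printing `36.085015` while the session summary shows `$36.085016`.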

## Requirements (from the issue body)

1. **Find the root cause of the cost-calculation difference and fix it for all models.**
2. **Double-check the logs; make sure all usage tokens are properly calculated.**
3. **Download all related logs/data into `docs/case-studies/issue-1886`.**
4. **Deep case study analysis** (incl. online search): reconstruct timeline, list every
   requirement, find root causes per problem, propose solutions/plans, and check
   known existing components/libraries that solve similar problems.
5. **If data is insufficient for the root cause, add debug output / verbose mode** for
   the next iteration.
6. **If the issue is related to another repository, report it** with reproducible
   examples, workarounds, and code-fix suggestions.
7. **Apply the fix in all places** in the codebase where the issue exists.
8. **Plan and execute everything in a single PR** (#1889).

## Root cause analysis

### Primary root cause — scope mismatch (proven)

`displayCostComparison` (in `src/claude.budget-stats.lib.mjs`) compares:

- `publicCost`: `calculateModelCost(usage, modelInfo)` over the **full session JSONL**
  (the JSONL accumulates every resume iteration; limit-reset resumes append to the
  same `<session-id>.jsonl`), and
- `anthropicCost`: the `result` event's `total_cost_usd`, **scoped to one Claude
  process** (`src/claude.lib.mjs`, captured at the `subtype === 'success'` branch).

When a session spans more than one process (limit-reset resume, fallback-model
switch, etc.), these scopes diverge and the comparison is apples-to-oranges. The
per-token cost function `calculateModelCost` was audited and is **correct** — it
multiplies input/cache-write/cache-read/output tokens by the model's per-million
prices using `decimal.js-light`, plus a per-request web-search charge. No pricing
bug exists.

### Secondary root cause — limit-hit cost was discarded

The Anthropic cost was only captured from a `result` event with
`subtype === 'success'`. A usage-limit hit (Run 1) ends as `is_error`, so **its
`total_cost_usd` was explicitly ignored** (the old code logged
`💰 Anthropic cost from … result ignored`). That meant Run 1's ~$11.42 could never
be folded into a cumulative total even in principle — so accumulation alone would
still have under-counted the very scenario in the report.

### External corroboration

This is a known, documented property of the Claude Code Agent SDK, not a
hive-mind-specific miscalculation:

- The official **"Track cost and usage"** docs state each `query()` call returns its
  own `total_cost_usd` and _"The SDK does not provide a session-level total… you
  need to accumulate the totals yourself"_
  (<https://platform.claude.com/docs/en/agent-sdk/cost-tracking>).
- Upstream bug **anthropics/claude-code#13088**, _"`/cost` Command Resets on Session
  Resume"_ — describes exactly this: after resuming a session, `/cost` shows only the
  cost since resume, not the cumulative cost from the beginning
  (<https://github.com/anthropics/claude-code/issues/13088>).

Because the upstream SDK deliberately scopes cost per-process and leaves
session-level aggregation to the caller, the correct place to fix this is **in
hive-mind** (the caller), which is what this PR does. No new upstream issue is
warranted — #13088 already tracks the SDK-side behavior, and this PR links to it.

## The fix

### 1. A centralized cumulative-cost accumulator

`src/anthropic-cost-accumulator.lib.mjs` (new) holds a module-level running total
per node process:

- `seedCumulativeAnthropicCost(previousAnthropicCostUSD)` — seeds the total **once**
  per process from the carried-forward value (idempotent, so the in-process
  auto-merge / keep-working loop can call it repeatedly without double-seeding).
- `addAnthropicRunCost(runCostUSD)` — folds one finished process's cost into the
  total (non-positive / non-finite values add nothing). Returns the cumulative.
- `getCumulativeAnthropicCost()`, `hasCumulativeAnthropicCost()`,
  `resetCumulativeAnthropicCost()` (test helper).

Summing dollar amounts makes it **model-agnostic** — it satisfies "fix it for all
models" without ever touching per-token prices.
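
The accumulator's contract can be sketched as follows. The function names come from the description above, but the bodies are a guess at one possible implementation, not the contents of `src/anthropic-cost-accumulator.lib.mjs`. In particular, treating the first seed call as final even when its value is unusable, and defining `hasCumulativeAnthropicCost()` as "total is greater than zero", are assumptions.

```javascript
// Module-level state: one running total per node process (sketch only).
let cumulativeCostUSD = 0;
let seeded = false;

// Only positive, finite numbers count as a usable cost.
const isUsableCost = value => typeof value === 'number' && Number.isFinite(value) && value > 0;

// Seeds the total once per process; later calls are no-ops (idempotent).
function seedCumulativeAnthropicCost(previousAnthropicCostUSD) {
  if (!seeded) {
    seeded = true; // assumption: the first call wins, even with an unusable value
    if (isUsableCost(previousAnthropicCostUSD)) cumulativeCostUSD += previousAnthropicCostUSD;
  }
  return cumulativeCostUSD;
}

// Folds one finished process's cost into the total and returns the cumulative.
function addAnthropicRunCost(runCostUSD) {
  if (isUsableCost(runCostUSD)) cumulativeCostUSD += runCostUSD;
  return cumulativeCostUSD;
}

const getCumulativeAnthropicCost = () => cumulativeCostUSD;
const hasCumulativeAnthropicCost = () => cumulativeCostUSD > 0; // assumed meaning

// Test helper: forget both the total and the seeded flag.
function resetCumulativeAnthropicCost() {
  cumulativeCostUSD = 0;
  seeded = false;
}

// Usage with the numbers from this case study (Run 2 resuming after Run 1).
seedCumulativeAnthropicCost(11.422796); // carried forward from Run 1
seedCumulativeAnthropicCost(11.422796); // repeated call adds nothing
addAnthropicRunCost(Number.NaN); // ignored
console.log(addAnthropicRunCost(24.66222).toFixed(6)); // 36.085016
```

Keeping the seeded flag separate from the total is what lets an in-process loop call the seed function on every iteration without adding the carried-forward amount twice.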

### 2. Thread the cumulative total across the cross-process resume

- `src/solve.config.lib.mjs`: adds a hidden `--previous-anthropic-cost` option.
- `src/claude.lib.mjs`: on every terminal path (success **and** all failure paths:
  limit hit, stuck-retry, retries-exhausted, exception) it seeds from
  `argv.previousAnthropicCost`, folds this process's cost, and returns the
  **cumulative** total as `anthropicTotalCostUSD`.
- `src/solve.auto-continue.lib.mjs`: `autoContinueWhenLimitResets` reads the
  cumulative total and passes `--previous-anthropic-cost <total>` to the resumed
  `solve` process, so Run 2 continues Run 1's running total.

Because `runClaude` now returns the cumulative value, the **in-process** auto-merge
/ watch / keep-working loops in `solve.mjs` pick it up automatically
(`latestAnthropicCost = toolResult.anthropicTotalCostUSD`) — no extra `+=` needed.
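
The cross-process hand-off might look roughly like this. `buildResumeArgs` is a hypothetical helper, not a function from the repository; it only assembles the resume-related flags that appear in the timeline plus the new `--previous-anthropic-cost` option, and it leaves out whatever other arguments the real `solve` invocation carries.

```javascript
// Hypothetical helper: assembles the resume-related flags for the next `solve` process.
function buildResumeArgs({ sessionId, iteration, cumulativeAnthropicCostUSD }) {
  const args = [
    '--resume', sessionId,
    '--auto-resume-on-limit-reset',
    '--auto-resume-iteration', String(iteration),
    '--session-type', 'auto-resume',
  ];
  // Only thread the total forward when there is something worth carrying.
  if (Number.isFinite(cumulativeAnthropicCostUSD) && cumulativeAnthropicCostUSD > 0) {
    args.push('--previous-anthropic-cost', String(cumulativeAnthropicCostUSD));
  }
  return args;
}

// Run 1 ended on the usage limit having cost about $11.42; Run 2 inherits that total.
const resumeArgs = buildResumeArgs({
  sessionId: '160da4c5-d2f8-4488-873e-5936eacfac37',
  iteration: 1,
  cumulativeAnthropicCostUSD: 11.422796,
});
console.log(resumeArgs.join(' '));
```

Passing the total as a command-line value fits the fact that Run 1 and Run 2 are separate OS processes: module-level state in Run 1 is gone by the time Run 2 starts.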

### 3. Capture the limit-hit cost (secondary root cause)

`src/claude.lib.mjs` now keeps the `total_cost_usd` from a **non-success** terminal
`result` event as a fallback (`anthropicCostFromAnyResult`) and folds
`successCost ?? nonSuccessResultCost` on the failure paths. This lets Run 1's
~$11.42 be carried into Run 2, fully closing the gap in the reported scenario.
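
The fallback rule `successCost ?? nonSuccessResultCost` can be illustrated with a small event scan. `pickRunCost` is a hypothetical name, and the event objects are simplified stand-ins for stream-json `result` events; the `error_during_execution` subtype is only a placeholder, since the source says no more than that a limit hit ends as `is_error` without a `success` result.

```javascript
// Hypothetical helper: choose the cost to fold in for one finished Claude process.
function pickRunCost(events) {
  let successCost;
  let nonSuccessResultCost;
  for (const event of events) {
    if (event.type !== 'result' || typeof event.total_cost_usd !== 'number') continue;
    if (event.subtype === 'success') {
      successCost = event.total_cost_usd;
    } else {
      nonSuccessResultCost = event.total_cost_usd; // previously ignored
    }
  }
  // Prefer the success figure; fall back to the non-success one if that is all we have.
  return successCost ?? nonSuccessResultCost;
}

// Run 1: interrupted by the usage limit, so there is no success result.
const run1Events = [
  { type: 'assistant' },
  { type: 'result', subtype: 'error_during_execution', is_error: true, total_cost_usd: 11.422796 },
];
// Run 2: finished normally.
const run2Events = [{ type: 'result', subtype: 'success', is_error: false, total_cost_usd: 24.66222 }];

console.log(pickRunCost(run1Events)); // 11.422796
console.log(pickRunCost(run2Events)); // 24.66222
```

Using `??` rather than `||` means a success cost of `0` would still be preferred over a non-success figure, because only `undefined` or `null` triggers the fallback.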

### 4. Scope-aware diagnostics (so the number is never mysterious again)

`displayCostComparison` / `displaySessionTokenUsage` now accept
`previousAnthropicCost`. When a carried-forward cost is present, verbose mode prints
an explicit breakdown:

```
↳ Anthropic cost is cumulative across resume iterations (issue #1886):
this run: $24.662220 + carried forward: $11.422796 = $36.085016
```

If a future scenario still can't capture an earlier process's cost (e.g. the SDK
emits no cost at all on a hard limit), this breakdown makes the residual scope
difference visible instead of surfacing a bare misleading percentage.
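
A formatter for that breakdown line could be as small as the sketch here. `formatAccumulationBreakdown` and `formatUSD` are hypothetical names chosen for illustration; the real rendering lives in `displayCostComparison` / `displaySessionTokenUsage` and may differ in wording and in the conditions under which it prints.

```javascript
// Six decimal places, matching the figures shown elsewhere in this case study.
const formatUSD = value => `$${value.toFixed(6)}`;

// Hypothetical helper: returns the breakdown line, or null when nothing was carried forward.
function formatAccumulationBreakdown(thisRunCostUSD, previousAnthropicCost) {
  if (!(previousAnthropicCost > 0)) return null; // also rejects NaN and undefined
  const cumulativeUSD = thisRunCostUSD + previousAnthropicCost;
  return (
    `this run: ${formatUSD(thisRunCostUSD)}` +
    ` + carried forward: ${formatUSD(previousAnthropicCost)}` +
    ` = ${formatUSD(cumulativeUSD)}`
  );
}

console.log(formatAccumulationBreakdown(24.66222, 11.422796));
// this run: $24.662220 + carried forward: $11.422796 = $36.085016
```

Returning `null` for a session with no carried-forward cost keeps single-process runs free of the extra line.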

## Verification

- `node experiments/issue-1886-costcheck.mjs` — reproduces $24.662220 / $36.085016 /
  −31.66% and shows accumulation closing the gap to the full-session estimate.
- `node tests/test-issue-1886-cost-accumulation.mjs` — 12 tests covering the
  reproduction, the accumulator (idempotent seed, accumulation, input sanitization),
  the non-success fallback, and the display breakdown.
- `node tests/test-display-cost-comparison.mjs` — existing display tests still pass.
- `node scripts/run-tests.mjs --suite default` — all 237 default test files pass.
- `npm run lint` — clean.

## Solution alternatives considered

| Option | Verdict |
| ------ | ------- |
| Compute the public estimate over the per-process `modelUsage` scope (shrink the public number to match Anthropic). | Rejected — it would hide the true full-session cost, which is the number users actually care about. |
| Accumulate Anthropic `total_cost_usd` across resume iterations (chosen). | Adopted — both numbers end up at full-session scope; model-agnostic; matches the official SDK guidance to "accumulate the totals yourself". |
| Drop the Anthropic figure entirely on resumed sessions. | Rejected — loses Anthropic's authoritative cost and the useful public-vs-actual comparison. |

## Existing components / libraries checked

- **Anthropic Claude Code Agent SDK cost-tracking guidance**: the canonical pattern
  is exactly "accumulate `total_cost_usd` yourself across `query()` calls"; this PR
  implements that pattern (<https://platform.claude.com/docs/en/agent-sdk/cost-tracking>).
- **`decimal.js-light`**: already used by `src/claude.cost.lib.mjs` for precise
  per-token math; reused, unchanged.
- **In-repo precedent**: `src/claude.cost.lib.mjs` / `src/claude.budget-stats.lib.mjs`
  already centralize cost computation/rendering (Issues #1557, #1703, #1834); the new
  accumulator follows the same single-responsibility, well-tested module convention.

## Sources

- Anthropic, Track cost and usage (Agent SDK): <https://platform.claude.com/docs/en/agent-sdk/cost-tracking>
- anthropics/claude-code#13088, `/cost` resets on session resume: <https://github.com/anthropics/claude-code/issues/13088>
- Original observation: <https://github.com/link-assistant/formal-ai/pull/396#issuecomment-4672854592>
- Full session log: <https://gist.github.com/konard/4c233f1134b97d5ca4b20482743a85fb>
