|
| 1 | +# Issue 1900: Interactive Thinking Token Comment Noise |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Issue #1900 reported that interactive mode created large numbers of "Unrecognized Event" PR comments while solving an external PR. The main source was Codex system events with `type: "system"` and `subtype: "thinking_tokens"`. These events can arrive hundreds of times during one turn, so treating each one as an unknown event caused PR comment spam and hid more useful progress updates. |
| 6 | + |
| 7 | +The fix groups consecutive `system.thinking_tokens` events into one editable "Thinking..." comment, throttles live edits, and finalizes the comment when the next non-thinking event arrives or the handler flushes. It also handles the lower-volume observed system lifecycle subtypes `status`, `compact_boundary`, and `task_updated` without falling through to the unrecognized-event renderer. |
| 8 | + |
| 9 | +## Evidence |
| 10 | + |
| 11 | +Raw investigation data is checked in under this directory: |
| 12 | + |
| 13 | +- `raw/github/hive-mind-issue-1900.json` |
| 14 | +- `raw/github/hive-mind-issue-1900-comments.json` |
| 15 | +- `raw/github/hive-mind-pr-1908.json` |
| 16 | +- `raw/github/lefinepro-kefine-pr-173.json` |
| 17 | +- `raw/github/lefinepro-kefine-pr-173-conversation-comments.json` |
| 18 | +- `raw/github/lefinepro-kefine-pr-173-review-comments.json` |
| 19 | +- `raw/github/lefinepro-kefine-pr-173-reviews.json` |
| 20 | +- `raw/github/lefinepro-kefine-pr-173.diff` |
| 21 | +- `raw/github/lefinepro-kefine-pr-173-runs.json` |
| 22 | +- `raw/logs/solution-draft-log-pr-1781180008338.txt` |
| 23 | +- `raw/logs/solution-draft-log-pr-1781183077272.txt` |
| 24 | +- `ci-logs/lefinepro-kefine-run-*.log` |
| 25 | + |
| 26 | +The external PR was `lefinepro/kefine#173`, created on 2026-06-11 at 10:34:20 UTC and merged on 2026-06-11 at 13:25:07 UTC. Its conversation comments contained 1,095 comments. Among comments rendered as unrecognized events, the observed system subtypes were: |
| 27 | + |
| 28 | +| Subtype | Count | First observed | |
| 29 | +| ------------------ | ----: | -------------------- | |
| 30 | +| `thinking_tokens` | 732 | 2026-06-11T10:35:14Z | |
| 31 | +| `status` | 4 | 2026-06-11T11:03:12Z | |
| 32 | +| `task_updated` | 3 | 2026-06-11T12:10:20Z | |
| 33 | +| `compact_boundary` | 2 | 2026-06-11T11:03:25Z | |
| 34 | + |
| 35 | +Example event shapes from the captured comments: |
| 36 | + |
| 37 | +```json |
| 38 | +{ |
| 39 | + "type": "system", |
| 40 | + "subtype": "thinking_tokens", |
| 41 | + "estimated_tokens": 50, |
| 42 | + "estimated_tokens_delta": 50, |
| 43 | + "uuid": "..." |
| 44 | +} |
| 45 | +``` |
| 46 | + |
| 47 | +```json |
| 48 | +{ |
| 49 | + "type": "system", |
| 50 | + "subtype": "compact_boundary", |
| 51 | + "compact_metadata": { |
| 52 | + "trigger": "auto", |
| 53 | + "pre_tokens": 117063, |
| 54 | + "post_tokens": 8499, |
| 55 | + "duration_ms": 84620 |
| 56 | + } |
| 57 | +} |
| 58 | +``` |
| 59 | + |
| 60 | +## Root Cause |
| 61 | + |
| 62 | +`src/interactive-mode.lib.mjs` already recognized several `system.*` subtypes used by Claude-style task events, but it did not recognize Codex lifecycle subtypes emitted during long turns. The fallback for unknown event types posts a public "Unrecognized Event" comment with raw JSON for debugging. That fallback is useful for new event discovery, but it is not acceptable for a high-frequency progress event. |
| 63 | + |
| 64 | +The implementation also lacked a way to remember the GitHub comment ID returned by queued `postComment` calls. Grouping a stream of thinking events into one comment requires posting once and then editing that same comment. GitHub's REST issue-comments API supports both creating and updating PR conversation comments because PR conversation comments are issue comments; the handler already had a safe `editComment` path for PATCH updates. |
| 65 | + |
| 66 | +Reference: [GitHub REST issue comments documentation](https://docs.github.com/rest/issues/comments). |
| 67 | + |
| 68 | +## Behavioral Contract |
| 69 | + |
| 70 | +The new behavior is: |
| 71 | + |
| 72 | +1. Consecutive `system.thinking_tokens` events create one "Thinking..." comment. |
| 73 | +2. Live updates edit that comment no more often than `CONFIG.MIN_THINKING_COMMENT_UPDATE_INTERVAL`. |
| 74 | +3. The comment finalizes to "Thought for ..." when a non-thinking event arrives or `flush()` is called. |
| 75 | +4. The raw JSON section contains the grouped array of thinking-token events for debugging. |
| 76 | +5. `system.status` is verbose-log-only by default. |
| 77 | +6. `system.compact_boundary` posts one context-compaction summary comment. |
| 78 | +7. `system.task_updated` updates known pending tasks when possible and otherwise stays out of public PR comments. |
| 79 | + |
| 80 | +## Test Evidence |
| 81 | + |
| 82 | +The reproducing test was first run against the pre-fix behavior in a temporary worktree. The preserved log is `raw/logs/test-interactive-mode-before-fix-real.log`: |
| 83 | + |
| 84 | +- Line 249: `Expected initial Thinking comment` |
| 85 | +- Line 268: `Expected no unrecognized comments/edits, got 3` |
| 86 | +- Lines 431-433: 111 passed, 2 failed |
| 87 | + |
| 88 | +After the implementation, the same behavior is covered by `tests/test-interactive-mode-thinking-1900.mjs`. Final verification logs include: |
| 89 | + |
| 90 | +- `raw/logs/test-interactive-mode-thinking-1900-after-fix.log`: 2 passed, 0 failed. |
| 91 | +- `raw/logs/test-interactive-mode-after-fix.log`: 111 passed, 0 failed. |
| 92 | +- `raw/logs/npm-lint.log`: `npm run lint` completed successfully. |
| 93 | +- `raw/logs/npm-format-check.log`: `npm run format:check` completed successfully. |
| 94 | +- `raw/logs/check-file-line-limits.log`: all checked files were within the 1,500-line limit. |
| 95 | +- `raw/logs/npm-test.log`: all 243 selected default-suite test files passed. |
0 commit comments