feat(solve): experimental --keep-working-until-all-requirements-are-fully-done (#1883) by konard · Pull Request #1884 · link-assistant/hive-mind

konard · 2026-06-10T09:24:43Z

Summary

Implements the experimental solve option requested in #1883:
--keep-working-until-all-requirements-are-fully-done.

After the main run (and any --finalize pass), the feature scans three cheap,
token-free sources — the PR description, the AI solution summary, and the
added lines of changed markdown documents — for strong indicators of deferred
work ("out of scope", "future work", "follow-up PR", "deferred", "delayed",
"TODO"/"TBD", etc.) using ~14 regular expressions. When indicators are found it
auto-restarts the AI tool with the concrete detected reasons plus the
verbatim reinforcement prompt from the issue, and repeats until the scan is
clean or the restart limit is reached.
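
To make the scan concrete, here is a minimal sketch of regex-based detection over named text sources. The pattern list is an illustrative subset written for this example, not the PR's actual DEFERRED_WORK_PATTERNS, and detectDeferredWork and the finding shape are hypothetical names.

```javascript
// Illustrative subset only: the real pattern list is not reproduced in this PR text.
// No `g` flag, so RegExp.prototype.test stays stateless between calls.
const SKETCH_PATTERNS = [
  { label: 'out of scope', regex: /\bout[- ]of[- ]scope\b/i },
  { label: 'future work', regex: /\bfuture\s+work\b/i },
  { label: 'follow-up PR', regex: /\bfollow[- ]?up\s+(?:PR|pull\s+request)\b/i },
  { label: 'deferred', regex: /\bdeferred\b/i },
  { label: 'delayed', regex: /\bdelayed\b/i },
  { label: 'TODO/TBD', regex: /\b(?:TODO|TBD)\b/ },
];

// Hypothetical helper: `sources` is an array of { name, text } objects,
// e.g. the PR description, the AI summary, and added markdown lines.
function detectDeferredWork(sources) {
  const findings = [];
  for (const { name, text } of sources) {
    for (const line of String(text ?? '').split('\n')) {
      for (const { label, regex } of SKETCH_PATTERNS) {
        // Any match counts: the feature is biased toward restarting.
        if (regex.test(line)) findings.push({ source: name, label, line: line.trim() });
      }
    }
  }
  return findings;
}
```

An empty result corresponds to a clean scan; any non-empty result would trigger a restart under the behavior described above.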

Closes #1883.

Usage

solve <issue-url> --keep-working-until-all-requirements-are-fully-done       # 5 restarts
solve <issue-url> --keep-working-until-all-requirements-are-fully-done 3     # explicit count
solve <issue-url> --keep-working-until-all-requirements-are-fully-done forever  # no limit
# aliases:
solve <issue-url> --keep-working
solve <issue-url> --keep-going unlimited

Limit semantics: bare flag → 5; explicit number → that count;
forever / unlimited / infinite / 0 → no limit (with a hard safety cap of
3 consecutive errors so a broken tool can never spin forever).
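
The limit rules above can be sketched as a small pure function. This is an assumption-laden illustration, not the PR's normalizeKeepWorkingLimit: the name sketchNormalizeLimit, the treatment of a bare flag as `true` or an empty string, and the fallback for unrecognized values are all hypothetical.

```javascript
const DEFAULT_RESTARTS = 5;
const UNLIMITED_WORDS = new Set(['forever', 'unlimited', 'infinite', '0']);

// Hypothetical normalizer, assumed to be called only when the flag is present.
function sketchNormalizeLimit(raw) {
  // Assumption: a bare flag arrives as true, '' or undefined from the CLI parser.
  if (raw === undefined || raw === null || raw === true || raw === '') return DEFAULT_RESTARTS;
  const value = String(raw).trim().toLowerCase();
  // forever / unlimited / infinite / 0 all mean "no limit".
  if (UNLIMITED_WORDS.has(value)) return Infinity;
  const count = Number.parseInt(value, 10);
  // Accept only clean positive integers such as "3".
  if (Number.isInteger(count) && count > 0 && String(count) === value) return count;
  // Assumption: unrecognized input falls back to the default.
  return DEFAULT_RESTARTS;
}
```

Returning Infinity for the unlimited case lets a caller compare `restarts >= limit` without a special branch; the consecutive-error cap has to be enforced separately by the loop.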

How it maps to the issue requirements

| Issue requirement | Where |
| --- | --- |
| Experimental `--keep-working-until-all-requirements-are-fully-done` | `src/solve.config.lib.mjs` (`[EXPERIMENTAL]`) |
| Find unfinished/planned/delayed work and auto-restart | `runKeepWorkingUntilDone` in `src/solve.keep-working.lib.mjs` |
| Inject the verbatim reinforcement prompt in addition to the detected reason | `KEEP_WORKING_PROMPT` + `buildKeepWorkingFeedback` |
| Use regex / partial parsing for strong deferral indicators | `DEFERRED_WORK_PATTERNS` (14 regexes) |
| Scan only PR description + AI summary + changed markdown (no token waste) | `collectDeferredWorkSources` |
| Ignore false positives (bias to keep going) | high-recall patterns, any match restarts |
| Default 5 restarts; support `forever`/`unlimited` | `normalizeKeepWorkingLimit` |
| Shorter alias `--keep-going-until-all-requirements-are-fully-done` | yargs aliases |
| Compile case study to `docs/case-studies/issue-{id}` | `docs/case-studies/issue-1883/` |

A full breakdown lives in
docs/case-studies/issue-1883/requirements.md.

Design

Pure, network-free detection in src/solve.keep-working.detect.lib.mjs
(regexes, limit normalization, feedback building) → fully unit-testable without
mocks, mirroring the repo's auto-iteration-limits.lib.mjs idiom.
Orchestration (source collection via gh api + the restart loop) in
src/solve.keep-working.lib.mjs.
Wired into the post-solve flow in src/solve.mjs (via a small shared
applyRestartResult helper that also de-duplicates the existing
restart/finalize cost-merge blocks).
Infinite-loop safety: patterns are anchored on deferral semantics so the
reinforcement prompt does not self-trigger (unit-tested); the prompt and the
feedback block are never scanned; restarts are bounded; and forever mode still
aborts after 3 consecutive tool errors. Each restart disables nested
keep-working to prevent recursion.
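
The loop-safety rules above can be sketched as follows. This is a simplified illustration, not the PR's runKeepWorkingUntilDone: sketchKeepWorkingLoop, the injected scan and restart callbacks, the keepWorking option, and the returned status strings are hypothetical.

```javascript
// Hypothetical restart loop. `scan` resolves to an array of findings;
// `restart` re-runs the AI tool with those findings and may throw.
async function sketchKeepWorkingLoop({ scan, restart, limit = 5, maxConsecutiveErrors = 3 }) {
  let restarts = 0;
  let consecutiveErrors = 0;
  for (;;) {
    const findings = await scan();
    if (findings.length === 0) return { status: 'clean', restarts };
    // Bounded restarts; with limit = Infinity this check never fires.
    if (restarts >= limit) return { status: 'limit-reached', restarts };
    restarts += 1;
    try {
      // Nested keep-working is switched off so a restart cannot recurse.
      await restart(findings, { keepWorking: false });
      consecutiveErrors = 0;
    } catch (error) {
      consecutiveErrors += 1;
      // Safety cap: applies even in unlimited mode.
      if (consecutiveErrors >= maxConsecutiveErrors) return { status: 'aborted', restarts, error };
    }
  }
}
```

Injecting scan and restart keeps the loop free of network and process calls, which is one way to make the termination conditions testable in isolation.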

Tests

tests/test-keep-working-until-done-1883.mjs — 31 tests, all passing
(detection, self-match avoidance, limit normalization for every CLI variant,
patch extraction, feedback rendering, and end-to-end CLI parsing of the flag and
its aliases).
npm run lint clean; docs-sync tests (test-docs-options-sync,
test-docs-language-sync) pass across CONFIGURATION.md + .ru/.zh/.hi.
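
As an illustration of the patch-extraction step mentioned above, the sketch below pulls the added lines out of unified-diff patches for markdown files. It assumes input shaped like the entries of GitHub's pull request files listing (objects with `filename` and `patch` fields); sketchAddedLines and sketchMarkdownAdditions are hypothetical names, not the PR's functions.

```javascript
// Return the added lines of a unified-diff patch, without the leading '+'.
// Known limitation of this sketch: an added line whose own content starts
// with '++' looks like a '+++' file header and is skipped.
function sketchAddedLines(patch) {
  return String(patch ?? '')
    .split('\n')
    .filter((line) => line.startsWith('+') && !line.startsWith('+++'))
    .map((line) => line.slice(1));
}

// Keep only markdown files and expose their added text as scan sources.
function sketchMarkdownAdditions(files) {
  return files
    .filter(({ filename }) => /\.(?:md|markdown)$/i.test(filename))
    .map(({ filename, patch }) => ({ name: filename, text: sketchAddedLines(patch).join('\n') }));
}
```

Scanning only added lines avoids flagging deferral phrases that already existed in a document before the PR touched it.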

How to reproduce the problem this fixes

Run an AI solver on a large issue without the flag: it frequently ships a partial
PR whose description says things like "caching is out of scope for this PR" or
leaves TODOs, and reports the issue done. Because this workflow has no follow-up
PR, that work is lost. With
--keep-working-until-all-requirements-are-fully-done, those phrases are detected
and the AI is restarted to finish them.

Docs

docs/CONFIGURATION.md (+ .ru, .zh, .hi) — new option row.
docs/case-studies/issue-1883/ — deep case study (overview, full requirement
list with solution plans, root-cause analysis, existing-components/prior-art
survey, and the indicator catalogue).
Changeset: minor.

Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #1883

…ully-done (#1883)

Scan PR description, AI solution summary, and changed markdown for deferred-work indicators (out of scope, future work, follow-up PR, deferred, delayed, TODO/TBD, etc.) using ~14 regexes. When found, auto-restart the AI with the detected reasons plus a verbatim reinforcement prompt, until the scan is clean or the restart limit is hit (default 5; explicit count; forever/unlimited/0 -> no limit, with a 3 consecutive-error safety cap).

- Pure, network-free detection in src/solve.keep-working.detect.lib.mjs (unit-tested)
- Orchestration in src/solve.keep-working.lib.mjs
- CLI option + normalization in src/solve.config.lib.mjs, aliases keep-going/keep-working
- Wired into post-solve flow via applyRestartResult() in src/solve.mjs
- 31 tests in tests/test-keep-working-until-done-1883.mjs
- Docs in CONFIGURATION.md (+ru/zh/hi) and case study under docs/case-studies/issue-1883/
- Changeset (minor)

konard · 2026-06-10T10:08:28Z

Working session summary

CI is fully green (conclusion: success, no failed jobs). The work is complete.

Summary

Issue #1883 is implemented and shipped on PR #1884, which is now marked ready for review with all CI checks passing.

What was built

An experimental solve flag — --keep-working-until-all-requirements-are-fully-done (aliases: --keep-going-until-all-requirements-are-fully-done, --keep-working, --keep-going). After the main run (and any --finalize pass), it:

Collects three cheap, token-free sources — PR description, the in-memory AI solution summary, and the added lines of changed markdown — via collectDeferredWorkSources.
Scans them with ~14 regexes (out of scope, future work, follow-up PR, deferred, delayed, TODO/TBD, …) in DEFERRED_WORK_PATTERNS.
Auto-restarts the AI with the concrete detected reasons plus the verbatim reinforcement prompt from the issue, repeating until the scan is clean or the limit is hit.
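
The restart message described in the last step could be assembled roughly as below. This is a sketch under assumptions: sketchBuildFeedback, the finding shape, and the message layout are hypothetical, and the prompt is a placeholder because the verbatim reinforcement prompt from the issue is not quoted in this text.

```javascript
// Placeholder only: stands in for the verbatim prompt from issue #1883.
const PLACEHOLDER_PROMPT = '<verbatim reinforcement prompt from the issue>';

// Hypothetical builder: lists each detected reason, then appends the prompt.
function sketchBuildFeedback(findings, prompt = PLACEHOLDER_PROMPT) {
  const reasons = findings.map(
    ({ source, label, line }, index) => `${index + 1}. [${source}] ${label}: "${line}"`
  );
  return ['Deferred-work indicators were detected:', ...reasons, '', prompt].join('\n');
}
```

Quoting the matching line and its source gives the restarted tool a concrete item to finish, rather than only a general instruction to keep working.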

Limit semantics: bare → 5; explicit count → that number; forever/unlimited/infinite/0 → no limit, with a hard 3-consecutive-error safety cap.

Key files

src/solve.keep-working.detect.lib.mjs — pure, network-free detection/normalization (unit-tested)
src/solve.keep-working.lib.mjs — orchestration (source collection + restart loop)
src/solve.config.lib.mjs — CLI option + value normalization
src/solve.mjs — wired into the post-solve flow (via a shared applyRestartResult helper that also de-duplicated the existing restart/finalize cost-merge blocks, keeping the file at the 1500-line limit)
tests/test-keep-working-until-done-1883.mjs — 31 tests, all passing
docs/CONFIGURATION.md (+ .ru/.zh/.hi) and docs/case-studies/issue-1883/ (5-file deep case study)
.changeset/keep-working-until-done-1883.md (minor)

Verification

31 feature tests pass; docs-sync tests pass; npm run lint clean.
Infinite-loop safety: patterns are anchored so the reinforcement prompt does not self-trigger (unit-tested); the prompt/feedback block are never scanned; restarts are bounded; nested keep-working is disabled to prevent recursion.
Full CI on commit d2adf6b4: success — lint, line-limits, validate-docs, changesets, test-compilation, test-execution, test-suites, memory-check all green.

PR: #1884

This summary was automatically extracted from the AI working session output.

konard · 2026-06-10T10:08:39Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $10.426117

📊 Context and tokens usage:

Claude Opus 4.8: (3 sub-sessions)

109.2K / 1M (11%) input tokens, 34.7K / 128K (27%) output tokens
115.6K / 1M (12%) input tokens, 32.7K / 128K (26%) output tokens
83.3K / 1M (8%) input tokens, 26.4K / 128K (21%) output tokens

Total: (36.3K new + 338.5K cache writes + 10.7M cache reads) input tokens, 108.4K output tokens, $10.379106 cost

Claude Haiku 4.5:

22.6K / 200K (11%) input tokens, 888 / 64K (1%) output tokens

Total: 22.6K input tokens, 888 output tokens, $0.047011 cost

🤖 Models used:

Tool: Anthropic Claude Code
Requested: opus
Main model: Claude Opus 4.8 (claude-opus-4-8)
Additional models:
- Claude Haiku 4.5 (claude-haiku-4-5-20251001)

📎 Log file uploaded as Gist (6720KB)

View complete solution draft log

The working session has ended. Feel free to review the solution draft and add any feedback.

konard · 2026-06-10T10:11:00Z

✅ Ready to merge

This pull request is now ready to be merged:

All CI checks have passed
No merge conflicts
No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag


Initial commit with task details

f4593f1


konard self-assigned this Jun 10, 2026

konard changed the title ~~[WIP] Auto restart/resume on "out of scope", "future work", "deferred", "delayed", planned for other pull requests~~ feat(solve): experimental --keep-working-until-all-requirements-are-fully-done (#1883) Jun 10, 2026

konard marked this pull request as ready for review June 10, 2026 09:56

Revert "Initial commit with task details"

fd4b295

This reverts commit f4593f1.

konard merged commit 7acf733 into main Jun 10, 2026
25 checks passed

konard merged 3 commits into main from issue-1883-1a8c72928617
