
feat(solve): experimental --keep-working-until-all-requirements-are-fully-done (#1883) #1884

Merged
konard merged 3 commits into main from issue-1883-1a8c72928617 on Jun 10, 2026

konard commented on Jun 10, 2026

Summary

Implements the experimental solve option requested in #1883:
--keep-working-until-all-requirements-are-fully-done.

After the main run (and any --finalize pass), the feature scans three cheap,
token-free sources — the PR description, the AI solution summary, and the
added lines of changed markdown documents — for strong indicators of deferred
work ("out of scope", "future work", "follow-up PR", "deferred", "delayed",
"TODO"/"TBD", etc.) using ~14 regular expressions. When indicators are found it
auto-restarts the AI tool with the concrete detected reasons plus the
verbatim reinforcement prompt from the issue, and repeats until the scan is
clean or the restart limit is reached.
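
The scan described above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the pattern list is a hypothetical five-entry subset (the real DEFERRED_WORK_PATTERNS holds about 14 regexes that are not reproduced here), and the names EXAMPLE_DEFERRAL_PATTERNS and detectDeferredWorkSketch are made up for illustration.

```javascript
// Hypothetical subset of deferral patterns, for illustration only.
// The regexes deliberately have no "g" flag, so .test() keeps no state.
const EXAMPLE_DEFERRAL_PATTERNS = [
  { label: 'out of scope', regex: /\bout[- ]of[- ]scope\b/i },
  { label: 'future work', regex: /\bfuture\s+work\b/i },
  { label: 'follow-up PR', regex: /\bfollow[- ]?up\s+(?:pr|pull\s+request)\b/i },
  { label: 'deferred', regex: /\bdeferred\b/i },
  { label: 'TODO/TBD', regex: /\b(?:TODO|TBD)\b/ },
];

// Scan named text sources line by line and return one finding
// per matching (line, pattern) pair.
function detectDeferredWorkSketch(sources) {
  const findings = [];
  for (const { name, text } of sources) {
    for (const line of String(text ?? '').split('\n')) {
      for (const { label, regex } of EXAMPLE_DEFERRAL_PATTERNS) {
        if (regex.test(line)) {
          findings.push({ source: name, indicator: label, line: line.trim() });
        }
      }
    }
  }
  return findings;
}

const exampleFindings = detectDeferredWorkSketch([
  { name: 'pr-description', text: 'Caching is out of scope for this PR.' },
  { name: 'ai-summary', text: 'All requirements are implemented.' },
]);
console.log(exampleFindings);
```

Because any single match triggers a restart, a sketch like this favors recall over precision, which matches the "bias to keep going" requirement.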

Closes #1883.

Usage

solve <issue-url> --keep-working-until-all-requirements-are-fully-done       # 5 restarts
solve <issue-url> --keep-working-until-all-requirements-are-fully-done 3     # explicit count
solve <issue-url> --keep-working-until-all-requirements-are-fully-done forever  # no limit
# aliases:
solve <issue-url> --keep-working
solve <issue-url> --keep-going unlimited

Limit semantics: bare flag → 5; explicit number → that count;
forever / unlimited / infinite / 0 → no limit (with a hard safety cap of
3 consecutive errors so a broken tool can never spin forever).
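
The limit rules above could be normalized along these lines. This is a hedged sketch and not the actual normalizeKeepWorkingLimit: the function name, the assumption that a bare flag arrives as `true` or an empty string, and the choice to throw on unrecognized values are all illustrative.

```javascript
// Assumed default, taken from the "bare flag → 5" rule above.
const DEFAULT_RESTARTS = 5;
const UNLIMITED_WORDS = new Set(['forever', 'unlimited', 'infinite']);

// Map a raw CLI value to a restart limit; Infinity stands for "no limit".
function normalizeLimitSketch(raw) {
  // Assumption: the parser reports a bare flag as true or ''.
  if (raw === true || raw === '') return DEFAULT_RESTARTS;
  const text = String(raw).trim().toLowerCase();
  if (UNLIMITED_WORDS.has(text)) return Infinity;
  // Assumption: anything that is not a non-negative integer is rejected.
  if (!/^\d+$/.test(text)) throw new Error(`Invalid restart limit: ${raw}`);
  const count = Number(text);
  return count === 0 ? Infinity : count;
}

console.log(normalizeLimitSketch(true));      // bare flag
console.log(normalizeLimitSketch('3'));       // explicit count
console.log(normalizeLimitSketch('forever')); // no limit
```

Representing "no limit" as Infinity keeps the restart loop's comparison (`restarts >= limit`) the same for bounded and unbounded modes.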

How it maps to the issue requirements

| Issue requirement | Where |
| --- | --- |
| Experimental --keep-working-until-all-requirements-are-fully-done | src/solve.config.lib.mjs ([EXPERIMENTAL]) |
| Find unfinished/planned/delayed work and auto-restart | runKeepWorkingUntilDone in src/solve.keep-working.lib.mjs |
| Inject the verbatim reinforcement prompt in addition to the detected reason | KEEP_WORKING_PROMPT + buildKeepWorkingFeedback |
| Use regex / partial parsing for strong deferral indicators | DEFERRED_WORK_PATTERNS (14 regexes) |
| Scan only PR description + AI summary + changed markdown (no token waste) | collectDeferredWorkSources |
| Ignore false positives (bias to keep going) | high-recall patterns, any match restarts |
| Default 5 restarts; support forever/unlimited | normalizeKeepWorkingLimit |
| Shorter alias --keep-going-until-all-requirements-are-fully-done | yargs aliases |
| Compile case study to docs/case-studies/issue-{id} | docs/case-studies/issue-1883/ |

A full breakdown lives in
docs/case-studies/issue-1883/requirements.md.
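
To make the "detected reason plus reinforcement prompt" row concrete, a feedback message might be assembled as sketched below. The function name buildFeedbackSketch, the finding shape, and the message wording are hypothetical; the real KEEP_WORKING_PROMPT text lives in issue #1883 and is represented here only by a placeholder.

```javascript
// Placeholder only: the verbatim prompt is defined in the issue, not here.
const PLACEHOLDER_PROMPT = '<verbatim reinforcement prompt from issue #1883>';

// Render detected findings as a numbered list followed by the prompt.
function buildFeedbackSketch(findings, prompt = PLACEHOLDER_PROMPT) {
  const reasons = findings.map(
    (f, i) => `${i + 1}. [${f.source}] "${f.indicator}" in: ${f.line}`
  );
  return ['Deferred work was detected:', ...reasons, '', prompt].join('\n');
}

const exampleFeedback = buildFeedbackSketch([
  { source: 'pr-description', indicator: 'out of scope', line: 'Caching is out of scope for this PR.' },
]);
console.log(exampleFeedback);
```

Keeping the prompt as the final block, separate from the reasons, makes it easy to exclude the whole feedback message from later scans.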

Design

  • Pure, network-free detection in src/solve.keep-working.detect.lib.mjs
    (regexes, limit normalization, feedback building) → fully unit-testable without
    mocks, mirroring the repo's auto-iteration-limits.lib.mjs idiom.
  • Orchestration (source collection via gh api + the restart loop) in
    src/solve.keep-working.lib.mjs.
  • Wired into the post-solve flow in src/solve.mjs (via a small shared
    applyRestartResult helper that also de-duplicates the existing
    restart/finalize cost-merge blocks).
  • Infinite-loop safety: patterns are anchored on deferral semantics so the
    reinforcement prompt does not self-trigger (unit-tested); the prompt and the
    feedback block are never scanned; restarts are bounded; and forever mode still
    aborts after 3 consecutive tool errors. Each restart disables nested
    keep-working to prevent recursion.
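
The loop-safety rules in the last bullet can be sketched as below. This is an illustration under assumptions, not runKeepWorkingUntilDone itself: the `scan` and `restart` callbacks, the `keepWorking: false` option name, and the choice to count failed attempts as restarts are all hypothetical.

```javascript
const MAX_CONSECUTIVE_ERRORS = 3;

// Restart until the scan is clean, the limit is reached, or the tool
// fails three times in a row. `limit` may be Infinity for forever mode.
async function keepWorkingLoopSketch({ limit, scan, restart }) {
  let restarts = 0;
  let consecutiveErrors = 0;
  for (;;) {
    const findings = await scan();
    if (findings.length === 0) return { status: 'clean', restarts };
    if (restarts >= limit) return { status: 'limit-reached', restarts };
    restarts += 1;
    try {
      // Nested keep-working is switched off to prevent recursion.
      await restart(findings, { keepWorking: false });
      consecutiveErrors = 0;
    } catch (error) {
      consecutiveErrors += 1;
      if (consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
        return { status: 'aborted-after-errors', restarts };
      }
    }
  }
}

// Usage: a broken tool in forever mode still terminates.
keepWorkingLoopSketch({
  limit: Infinity,
  scan: async () => [{ indicator: 'TODO/TBD' }],
  restart: async () => { throw new Error('tool crashed'); },
}).then((result) => console.log(result));
```

A successful restart resets the error counter, so only an unbroken run of failures trips the cap.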

Tests

  • tests/test-keep-working-until-done-1883.mjs: 31 tests, all passing
    (detection, self-match avoidance, limit normalization for every CLI variant,
    patch extraction, feedback rendering, and end-to-end CLI parsing of the flag and
    its aliases).
  • npm run lint clean; docs-sync tests (test-docs-options-sync,
    test-docs-language-sync) pass across CONFIGURATION.md + .ru/.zh/.hi.
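
For readers wondering what the "patch extraction" tests cover, the idea of scanning only the added lines of changed markdown can be sketched as follows. The function name is hypothetical, and the input shape assumes entries with `filename` and `patch` fields in the style of GitHub's pull request files listing, where `patch` is unified-diff hunk text without `---`/`+++` file headers.

```javascript
// Keep only lines added ("+" prefix) in markdown files.
// Assumption: `patch` contains hunks only, so every line starting
// with "+" is added content rather than a file header.
function addedMarkdownLinesSketch(files) {
  return files
    .filter((f) => /\.(?:md|markdown)$/i.test(f.filename) && typeof f.patch === 'string')
    .flatMap((f) =>
      f.patch
        .split('\n')
        .filter((line) => line.startsWith('+'))
        .map((line) => ({ file: f.filename, text: line.slice(1) }))
    );
}

const exampleAdded = addedMarkdownLinesSketch([
  {
    filename: 'docs/PLAN.md',
    patch: '@@ -1,2 +1,2 @@\n # Plan\n-old line\n+Caching is deferred to a follow-up PR.',
  },
  { filename: 'src/a.mjs', patch: '@@ -0,0 +1 @@\n+// TODO later' },
]);
console.log(exampleAdded);
```

Ignoring removed and context lines means pre-existing TODOs in a document do not trigger restarts; only text introduced by the PR is considered.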

How to reproduce the problem this fixes

Run an AI solver on a large issue without the flag: it frequently ships a partial
PR whose description says things like "caching is out of scope for this PR" or
leaves TODOs, and reports the issue done. Because this workflow has no follow-up
PR, that work is lost. With
--keep-working-until-all-requirements-are-fully-done, those phrases are detected
and the AI is restarted to finish them.

Docs

  • docs/CONFIGURATION.md (+ .ru, .zh, .hi) — new option row.
  • docs/case-studies/issue-1883/ — deep case study (overview, full requirement
    list with solution plans, root-cause analysis, existing-components/prior-art
    survey, and the indicator catalogue).
  • Changeset: minor.

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #1883
konard self-assigned this on Jun 10, 2026
…ully-done (#1883)

Scan PR description, AI solution summary, and changed markdown for deferred-work
indicators (out of scope, future work, follow-up PR, deferred, delayed, TODO/TBD,
etc.) using ~14 regexes. When found, auto-restart the AI with the detected reasons
plus a verbatim reinforcement prompt, until the scan is clean or the restart limit
is hit (default 5; explicit count; forever/unlimited/0 -> no limit, with a 3
consecutive-error safety cap).

- Pure, network-free detection in src/solve.keep-working.detect.lib.mjs (unit-tested)
- Orchestration in src/solve.keep-working.lib.mjs
- CLI option + normalization in src/solve.config.lib.mjs, aliases keep-going/keep-working
- Wired into post-solve flow via applyRestartResult() in src/solve.mjs
- 31 tests in tests/test-keep-working-until-done-1883.mjs
- Docs in CONFIGURATION.md (+ru/zh/hi) and case study under docs/case-studies/issue-1883/
- Changeset (minor)
konard changed the title from "[WIP] Auto restart/resume on "out of scope", "future work", "deferred", "delayed", planned for other pull requests" to "feat(solve): experimental --keep-working-until-all-requirements-are-fully-done (#1883)" on Jun 10, 2026
konard marked this pull request as ready for review on June 10, 2026, 09:56
konard commented Jun 10, 2026

Working session summary

CI is fully green (conclusion: success, no failed jobs). The work is complete.

Summary

Issue #1883 is implemented and shipped on PR #1884, which is now marked ready for review with all CI checks passing.

What was built

An experimental solve flag — --keep-working-until-all-requirements-are-fully-done (aliases: --keep-going-until-all-requirements-are-fully-done, --keep-working, --keep-going). After the main run (and any --finalize pass), it:

  1. Collects three cheap, token-free sources — PR description, the in-memory AI solution summary, and the added lines of changed markdown — via collectDeferredWorkSources.
  2. Scans them with ~14 regexes (out of scope, future work, follow-up PR, deferred, delayed, TODO/TBD, …) in DEFERRED_WORK_PATTERNS.
  3. Auto-restarts the AI with the concrete detected reasons plus the verbatim reinforcement prompt from the issue, repeating until the scan is clean or the limit is hit.

Limit semantics: bare → 5; explicit count → that number; forever/unlimited/infinite/0 → no limit, with a hard 3-consecutive-error safety cap.

Key files

  • src/solve.keep-working.detect.lib.mjs — pure, network-free detection/normalization (unit-tested)
  • src/solve.keep-working.lib.mjs — orchestration (source collection + restart loop)
  • src/solve.config.lib.mjs — CLI option + value normalization
  • src/solve.mjs — wired into the post-solve flow (via a shared applyRestartResult helper that also de-duplicated the existing restart/finalize cost-merge blocks, keeping the file at the 1500-line limit)
  • tests/test-keep-working-until-done-1883.mjs — 31 tests, all passing
  • docs/CONFIGURATION.md (+ .ru/.zh/.hi) and docs/case-studies/issue-1883/ (5-file deep case study)
  • .changeset/keep-working-until-done-1883.md (minor)

Verification

  • 31 feature tests pass; docs-sync tests pass; npm run lint clean.
  • Infinite-loop safety: patterns are anchored so the reinforcement prompt does not self-trigger (unit-tested); the prompt/feedback block are never scanned; restarts are bounded; nested keep-working is disabled to prevent recursion.
  • Full CI on commit d2adf6b4: success — lint, line-limits, validate-docs, changesets, test-compilation, test-execution, test-suites, memory-check all green.

PR: #1884


This summary was automatically extracted from the AI working session output.

konard commented Jun 10, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $10.426117

📊 Context and tokens usage:

Claude Opus 4.8: (3 sub-sessions)

  1. 109.2K / 1M (11%) input tokens, 34.7K / 128K (27%) output tokens
  2. 115.6K / 1M (12%) input tokens, 32.7K / 128K (26%) output tokens
  3. 83.3K / 1M (8%) input tokens, 26.4K / 128K (21%) output tokens

Total: (36.3K new + 338.5K cache writes + 10.7M cache reads) input tokens, 108.4K output tokens, $10.379106 cost

Claude Haiku 4.5:

  • 22.6K / 200K (11%) input tokens, 888 / 64K (1%) output tokens

Total: 22.6K input tokens, 888 output tokens, $0.047011 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Main model: Claude Opus 4.8 (claude-opus-4-8)
  • Additional models:
    • Claude Haiku 4.5 (claude-haiku-4-5-20251001)

📎 Log file uploaded as Gist (6720KB)


The working session has now ended; feel free to review the solution draft and add any feedback.

konard commented Jun 10, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

konard merged commit 7acf733 into main on Jun 10, 2026
25 checks passed