Changelog

0.4.0 (2026-05-05)

Compare the full difference.

Fixes

  • Fix ResourceWarning: close TaskQueue and sqlite3 connections properly. 02ae42f

    Wrap TaskQueue in a with-block in _run_start so the connection is closed on all exit paths, including sys.exit() from container startup errors.

    Replace five with sqlite3.connect(...) as conn: patterns in test_executor.py with explicit open/close — the with-form only manages transactions, leaving connections open until GC.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
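The difference between the two forms can be illustrated with the standard library alone. This is a minimal sketch, not the project's test code; the table name `t` is an assumption made for the demo.

```python
import sqlite3
from contextlib import closing

# Transaction-only form: the with-block commits or rolls back,
# but the connection is still open afterwards.
with sqlite3.connect(":memory:") as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
still_open = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0] == 0
conn.close()  # an explicit close is still required

# contextlib.closing() calls close() when the block exits, on every path.
with closing(sqlite3.connect(":memory:")) as conn2:
    conn2.execute("CREATE TABLE t (x INTEGER)")

try:
    conn2.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    # Operating on a closed connection raises ProgrammingError.
    closed = True
```

Either an explicit `close()` or `closing()` avoids the ResourceWarning; which one the commit uses in each place is described in the entry above.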

New

  • Add heartbeat thread to _process_task in reference agent. 9499c19

    Task 6 from pr-21-fixes.md:

    • _process_task now starts a daemon threading.Thread that calls client.heartbeat(task.task_id) every _HEARTBEAT_INTERVAL (25 s) while triage runs so the harness does not re-queue the task mid-flight
    • threading.Event stops the heartbeat thread in a finally block after triage returns or raises
    • import threading added; _HEARTBEAT_INTERVAL module constant added

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
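The heartbeat pattern described above can be sketched as follows. `FakeClient`, `process_task`, and the shortened interval are hypothetical stand-ins for illustration; the reference agent's actual code may differ.

```python
import threading
import time

# The changelog states 25 s; shortened here so the demo finishes quickly.
_HEARTBEAT_INTERVAL = 0.01


class FakeClient:
    """Hypothetical stand-in for ForemanClient; records heartbeat calls."""

    def __init__(self) -> None:
        self.beats: list[str] = []

    def heartbeat(self, task_id: str) -> None:
        self.beats.append(task_id)


def process_task(client: FakeClient, task_id: str, work):
    stop = threading.Event()

    def _beat() -> None:
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop sends one heartbeat per interval until stopped.
        while not stop.wait(_HEARTBEAT_INTERVAL):
            client.heartbeat(task_id)

    thread = threading.Thread(target=_beat, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        # Runs whether work() returned or raised.
        stop.set()
        thread.join()


client = FakeClient()
result = process_task(client, "task-1", lambda: time.sleep(0.1) or "done")
```

Waiting on the event instead of calling `time.sleep()` lets the thread stop promptly once the work finishes.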

  • Add task_id identity callout to complete_task docs. 596e38d

    Warns readers that task_id and decision.task_id must match; a silent mismatch causes the drain loop to miss the result.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Add configurable timeout parameter to ForemanClient. 05d81cc

    Exposes timeout: float = 5.0 on ForemanClient.__init__ and forwards it to httpx.Client, so callers can tune per-deployment latency requirements without monkey-patching the transport.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Add integration test for agent restart resilience (Task 17, Phase 6). 1549117

    Implements the MVP acceptance criterion: zero task loss under a simulated agent restart. The test uses a minimal in-process harness (real TaskQueue, real MemoryStore) and exercises the actual ForemanClient + agent startup-poll code path without live network sockets.

    Also adds --run-integration pytest flag and integration marker so the test is skipped in CI by default.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Add write-an-agent how-to guide (Task 16, Phase 6). 789dbc2

    Documents the foreman-client SDK for agent authors: install, ForemanClient constructor args, next_task/complete_task/heartbeat methods, claim timeout, heartbeat cadence, idempotency contract, and a ≤30-line minimal example.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Add initial .superset/config.json and .memsearch/memory/ tooling artifacts. e59100e

    • Introduce .superset/config.json with an empty setup, teardown, and run configuration.
    • Add .memsearch/memory/2026-04-26.md for session logging and transcript retention.
  • Address Phase 3 code review: fix resource leak, export types, clean up tests. 38c72c0

    • Add close(), __enter__, __exit__ to ForemanClient to prevent httpx connection pool leak
    • Export LLMBackendRef and TaskContext from foremanclient package __init__
    • Move import json to module level in test_client.py; remove misleading call-ordering comment
    • Add TestForemanClientLifecycle tests for close() and context manager behaviour
    • Mark Phase 3 plan tasks and checkpoint complete; add phase-3-review.md

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
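A minimal sketch of the close()/context-manager lifecycle, assuming a hypothetical `SketchClient` whose `_pool_open` flag stands in for the real httpx connection pool:

```python
class SketchClient:
    """Hypothetical client; _pool_open stands in for an HTTP connection pool."""

    def __init__(self) -> None:
        self._pool_open = True

    def close(self) -> None:
        # Safe to call more than once.
        self._pool_open = False

    def __enter__(self) -> "SketchClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Always release the pool; returning None lets exceptions propagate.
        self.close()


with SketchClient() as client:
    open_inside = client._pool_open
open_after = client._pool_open
```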

  • Add threading lock to TaskQueue for improved concurrency safety. a3633be

    Refactored claim_next to use threading lock in conjunction with BEGIN IMMEDIATE for same-process thread serialization. Updated related tests and improved cleanup with explicit resource management using close().
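One way to combine a process-local lock with BEGIN IMMEDIATE is sketched below. `MiniQueue`, its schema, and its column names are assumptions for illustration, not the actual TaskQueue implementation.

```python
import sqlite3
import threading


class MiniQueue:
    """Hypothetical, simplified queue; schema and names are assumptions."""

    def __init__(self, path: str = ":memory:") -> None:
        # isolation_level=None selects autocommit mode, so BEGIN IMMEDIATE
        # and COMMIT can be issued explicitly.
        self._conn = sqlite3.connect(
            path, isolation_level=None, check_same_thread=False
        )
        self._lock = threading.Lock()
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks "
            "(id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
        )

    def enqueue(self) -> int:
        with self._lock:
            cur = self._conn.execute(
                "INSERT INTO tasks (status) VALUES ('pending')"
            )
            return cur.lastrowid

    def claim_next(self):
        # The lock serializes threads sharing this connection; BEGIN IMMEDIATE
        # takes the database write lock against other connections.
        with self._lock:
            self._conn.execute("BEGIN IMMEDIATE")
            try:
                row = self._conn.execute(
                    "SELECT id FROM tasks WHERE status = 'pending' "
                    "ORDER BY id LIMIT 1"
                ).fetchone()
                if row is not None:
                    self._conn.execute(
                        "UPDATE tasks SET status = 'claimed' WHERE id = ?", row
                    )
                self._conn.execute("COMMIT")
            except Exception:
                self._conn.execute("ROLLBACK")
                raise
            return row[0] if row else None

    def close(self) -> None:
        self._conn.close()


queue = MiniQueue()
ids = [queue.enqueue() for _ in range(20)]
claimed: list[int] = []


def worker() -> None:
    while (task_id := queue.claim_next()) is not None:
        claimed.append(task_id)


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
queue.close()
```

In this sketch every task is claimed exactly once even though four threads compete for the same connection.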

  • Add QueueConfig to config.py (Task 1). f3a548d

    Extends ForemanConfig with a new QueueConfig model matching the queue-mediated agent protocol spec. Adds corresponding tests and documents the new section in config.example.yaml.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

Other

  • Resolve high-priority issues from phase-3 review. f01758a

    • Wrap _drain_loop and _requeue_loop bodies in exception handlers to ensure background loops do not terminate on errors.
    • Split drain_completed into a new mark_done method for per-task completion after successful execution.
    • Update startup poll to drain all queued tasks on agent boot.
    • Add heartbeat thread to _process_task to prevent requeue during long-running LLM calls.
    • Publicize Dispatcher.executor to remove private attribute access between modules.
  • Mark all pr-21-fixes.md acceptance criteria complete. 0a36267

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Publicize Dispatcher.executor (remove private-attribute cross-module access). 74cf0e0

    Task 5 from pr-21-fixes.md:

    • Rename Dispatcher._executor → Dispatcher.executor (public attribute)
    • main.py updated to use dispatcher.executor instead of dispatcher._executor

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Drain all queued tasks on agent startup (loop until empty). 1a9bf71

    Task 3 from pr-21-fixes.md:

    • _lifespan startup poll now loops calling next_task() until it returns None, processing each task before moving to the next; previously only one task was claimed, leaving N-1 accumulated tasks permanently stuck
    • New test: startup poll with 3 queued tasks drains all 3 before yield

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
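The loop-until-None pattern can be sketched as follows. `FakeClient` and `drain_on_startup` are hypothetical names; only `next_task()` returning None on an empty queue is taken from the entry above.

```python
from collections import deque


class FakeClient:
    """Hypothetical stand-in: next_task() returns None when the queue is empty."""

    def __init__(self, tasks) -> None:
        self._tasks = deque(tasks)

    def next_task(self):
        return self._tasks.popleft() if self._tasks else None


def drain_on_startup(client, handle) -> int:
    """Claim and process tasks one at a time until none remain."""
    count = 0
    while (task := client.next_task()) is not None:
        handle(task)
        count += 1
    return count


processed = []
drained = drain_on_startup(FakeClient(["a", "b", "c"]), processed.append)
```

A single `next_task()` call in place of the loop would process only "a" and leave the rest queued.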

  • Wrap _requeue_loop body in exception handler. 0712ad8

    Task 4 from pr-21-fixes.md:

    • requeue_stale() + fail_exhausted() wrapped in try/except Exception so one bad cycle does not kill the requeue loop permanently
    • _lifespan finally uses suppress(CancelledError, Exception) for requeue_task

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
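A minimal sketch of a background loop that survives a failing cycle and is cancelled cleanly on shutdown. `FlakyQueue` and the short interval are assumptions made for the demo; a real loop would log the exception, not discard it.

```python
import asyncio
import contextlib


class FlakyQueue:
    """Hypothetical queue whose first requeue cycle raises."""

    def __init__(self) -> None:
        self.calls = 0

    def requeue_stale(self) -> None:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient failure")

    def fail_exhausted(self, max_retries: int) -> None:
        pass


async def requeue_loop(queue, interval: float, max_retries: int) -> None:
    while True:
        try:
            queue.requeue_stale()
            queue.fail_exhausted(max_retries=max_retries)
        except Exception:
            # One bad cycle is swallowed so the loop keeps running.
            pass
        await asyncio.sleep(interval)


async def main() -> int:
    queue = FlakyQueue()
    task = asyncio.create_task(requeue_loop(queue, 0.01, 3))
    await asyncio.sleep(0.1)
    task.cancel()
    # CancelledError derives from BaseException, so it is listed explicitly.
    with contextlib.suppress(asyncio.CancelledError, Exception):
        await task
    return queue.calls


cycles = asyncio.run(main())
```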

  • Split drain_completed/add mark_done; wrap _drain_loop in exception handlers. 4cf9097

    Tasks 2+1 from pr-21-fixes.md:

    • drain_completed() no longer marks rows done; rows stay 'completed'
    • New mark_done(task_id) transitions completed→done after successful execute
    • _drain_loop wraps drain_completed() in outer try/except (loop never dies)
    • _drain_loop wraps per-task execute+memory+mark_done in inner try/except (one bad task does not abort others in the same batch)
    • _lifespan finally uses suppress(CancelledError, Exception) for drain_task

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
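The two levels of exception handling can be sketched for a single drain cycle as follows. `FakeQueue`, `drain_once`, and the in-memory status map are hypothetical; the memory update step is omitted for brevity.

```python
class FakeQueue:
    """Hypothetical in-memory stand-in for the queue's status column."""

    def __init__(self, completed) -> None:
        self.status = {task_id: "completed" for task_id in completed}

    def drain_completed(self):
        # Returns completed rows without changing their status.
        return [tid for tid, state in self.status.items() if state == "completed"]

    def mark_done(self, task_id) -> None:
        self.status[task_id] = "done"


def drain_once(queue, execute) -> None:
    try:
        batch = queue.drain_completed()
    except Exception:
        return  # outer guard: a failed fetch skips this cycle only
    for task_id in batch:
        try:
            execute(task_id)
            queue.mark_done(task_id)  # only after execute succeeds
        except Exception:
            continue  # inner guard: the row stays 'completed' for a later cycle


def execute(task_id) -> None:
    if task_id == "b":
        raise RuntimeError("executor failed")


queue = FakeQueue(["a", "b", "c"])
drain_once(queue, execute)
```

Because `mark_done` runs only after a successful execute, the failing task "b" remains eligible for the next cycle while "a" and "c" complete.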

  • [pre-commit.ci] pre-commit autoupdate. 16cc083

    updates: - github.com/astral-sh/ruff-pre-commit: v0.15.11 → v0.15.12

  • Mark verification steps complete for Phase 6 tasks in plan. f10729f

  • Mark Phase 6 Task 17 complete in plan. 525e797

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Mark Phase 5 tasks complete in plan. 1156699

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Implement Phase 5: update issue-triage agent to use ForemanClient. 4ce2bec

    POST /task now returns 202 immediately and fires a background task that claims the pending task via ForemanClient.next_task(), runs triage, and reports back via complete_task(). Lifespan startup poll picks up any tasks queued while the agent was down.

    Inline protocol models removed; foremanclient.models is the single source of truth for TaskMessage / DecisionMessage across agent and tests.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Convert TaskQueue tests to use context manager and update installed packages. a0ba8dd

  • Implement Phase 4 Task 12: add --queue-db CLI arg and wire TaskQueue. 170d707

    Add --queue-db argument to the start subcommand so users can override the queue database path without changing config. Priority: --queue-db > config db_path > ~/.agent-harness/queue.db default. Update plan.md to mark Tasks 11, 12, 13 and Phase 4 checkpoint complete.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
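The stated precedence can be sketched with argparse. `resolve_queue_db` and the example paths are hypothetical; only the flag name, the subcommand, and the default path come from the entry above.

```python
import argparse
from pathlib import Path

DEFAULT_QUEUE_DB = "~/.agent-harness/queue.db"


def resolve_queue_db(cli_value, config_value) -> Path:
    """Apply the precedence: --queue-db > config db_path > default."""
    chosen = cli_value or config_value or DEFAULT_QUEUE_DB
    return Path(chosen).expanduser()


parser = argparse.ArgumentParser(prog="foreman")
subparsers = parser.add_subparsers(dest="command")
start = subparsers.add_parser("start")
start.add_argument("--queue-db", default=None)

args = parser.parse_args(["start", "--queue-db", "/tmp/q.db"])
from_cli = resolve_queue_db(args.queue_db, "/var/lib/foreman/queue.db")
from_config = resolve_queue_db(None, "/var/lib/foreman/queue.db")
from_default = resolve_queue_db(None, None)
```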

  • Implement Phase 4 Task 11: drain and requeue background loops in lifespan. 3ba1ceb

    Add two background asyncio tasks started in a FastAPI lifespan context manager:

    • _drain_loop: wakes on drain_event or drain_interval_seconds; calls TaskQueue.drain_completed(), executor.execute(), and memory.upsert_memory_summary() for each completed task.
    • _requeue_loop: runs every requeue_interval_seconds; calls requeue_stale() and fail_exhausted(max_retries=config.queue.max_retries).

    Both tasks cancel cleanly on shutdown. The lifespan also initialises app.state.drain_event so /harness/result and /queue/complete can signal it. main.py wires app.state.executor, .memory, and .config for the lifespan.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
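Waking on either an event or an interval is commonly written with `asyncio.wait_for`. The sketch below assumes a hypothetical helper, `wait_for_nudge`, and does not reproduce the actual `_drain_loop`.

```python
import asyncio


async def wait_for_nudge(event: asyncio.Event, interval: float) -> bool:
    """Return True if woken by the event, False if the interval elapsed."""
    try:
        await asyncio.wait_for(event.wait(), timeout=interval)
    except asyncio.TimeoutError:
        return False
    # Clear the event so the next wait blocks until a new nudge arrives.
    event.clear()
    return True


async def main():
    event = asyncio.Event()
    woke_on_timeout = await wait_for_nudge(event, 0.01)
    event.set()  # simulates /harness/result or /queue/complete signalling
    woke_on_event = await wait_for_nudge(event, 5.0)
    return woke_on_timeout, woke_on_event


woke_on_timeout, woke_on_event = asyncio.run(main())
```

In both cases the loop would go on to drain; the return value only distinguishes why it woke.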

  • Implement Phase 4 Task 10: refactor Dispatcher to enqueue + nudge. 1e2283e

    Replace synchronous POST→parse dispatch with durable enqueue:

    • Dispatcher.dispatch() now enqueues the TaskMessage in TaskQueue and sends a fire-and-forget nudge ({"task_id": ...}) to the agent endpoint.
    • DecisionMessage parsing and executor.execute() are removed from dispatch(); those belong to the drain loop (Task 11).
    • Dispatcher.__init__ gains a required task_queue: TaskQueue parameter.
    • main.py creates TaskQueue from config.queue and passes it to Dispatcher.
    • Integration and server tests updated to reflect new enqueue-based protocol.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Implement Phase 3: foreman-client package with ForemanClient. adffcef

    Creates the standalone foreman-client/ package that agent authors install to communicate with the harness queue. Exposes next_task(), complete_task(), and heartbeat() over synchronous httpx, with structlog events and ForemanClientError on non-2xx responses. 100% line and branch coverage via respx HTTP mocks.

    Also excludes foreman-client/ and agents/ from root pytest collection, and excludes foreman-client/ from the root mypy pre-commit hook to prevent duplicate module name conflicts.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Implement Phase 2: queue HTTP endpoints and harness result nudge. 89316f3

    • foreman/routers/queue.py: POST /queue/next (claim task or 204), POST /queue/complete (store decision + signal drain), POST /queue/heartbeat
    • foreman/routers/result.py: POST /harness/result (drain-loop nudge)
    • server.py: register both new routers on the FastAPI app
    • tests/test_queue_router.py, tests/test_result_router.py: HTTP contract tests using FastAPI TestClient with dependency_overrides (no SQLite in router tests)
    • pyproject.toml: per-file-ignores for FastAPI router B008/TC001/TC003 patterns

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Implement TaskQueue and tests (Tasks 2 & 3). 73cf3cb

    SQLite-backed task queue with enqueue, claim_next (concurrency-safe via BEGIN IMMEDIATE), complete, heartbeat, drain_completed, requeue_stale, and fail_exhausted. 21 tests cover all methods including concurrent claim.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

Updates

  • Update minimal example and Startup Poll docs to use drain loop lifespan. a7f2905

    Task 7 from pr-21-fixes.md:

    • Minimal example now uses @asynccontextmanager lifespan: creates ForemanClient, drains queued tasks via while-loop, yields, closes client
    • FastAPI(lifespan=lifespan) used instead of bare FastAPI()
    • Startup Poll section updated from single next_task() call to the correct loop-until-None pattern with an explanation of why a single call is wrong

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Remove obsolete "How Tos" index and fix installation link in write-an-agent guide. 23a836e

  • Update messaging protocol design spec to propose queue-mediated agent architecture. 99b02c8

    Adds detailed problem statement, design rationale, MVP scope, key assumptions, and open questions for implementing a robust task queue backed by SQLite. Documents at-least-once delivery, claim/requeue logic, and API adjustments. Addresses gaps in current synchronous dispatch handling.

0.3.0 (2026-05-01)

Compare the full difference.

New

  • Add design system assets, CSS variables, and comprehensive API reference structure. 38cfce0

  • Add CHANGELOG.md to excluded files in linter configuration. a3fa809

Other

  • Restructure and update design specs; add messaging update proposal and index file. f8027a5

Updates

  • Remove outdated tutorials and API docs; add home page layout, visual assets, and updated CSS. 8b2a2fc

0.2.5 (2026-04-22)

Compare the full difference.

New

  • Add reference documentation for agent protocol, CLI commands, and configuration schema. b35c600

  • Add rumdl linting support, update README link, and configure pre-commit hooks. 68c7d76

Other

  • Reformat several Markdown files. b97ec91

  • Mark Phase 5 and Final Checkpoint tasks as complete in todo.md. 9149c15

  • Task 17: mark Phase 7 tasks complete; final coverage at 96%. f8f6d35

    config.example.yaml already matches full schema and loads cleanly. CHANGELOG.md already maintained by bump-my-version toolchain. 214 tests passing, 96% line coverage (target ≥85%), pre-commit clean.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Task 16: End-to-end integration test for full issue triage pipeline. 440ecec

    Covers the complete path: poller event → router → dispatcher → executor → memory (real SQLite DB). Mocks are limited to PyGithub and httpx boundaries.

    Six tests across two classes:

    • TestFullTriagePipeline: label+comment applied, memory updated, action logged before GitHub call, prior summary injected, close_issue blocked when allow_close=False
    • TestPollerFeedsDispatcher: poller.poll_all callback routes and dispatches a polled issue end-to-end

    214 tests passing.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • [pre-commit.ci] pre-commit autoupdate. 068ab20

    updates: - github.com/astral-sh/ruff-pre-commit: v0.15.10 → v0.15.11

Updates

  • Remove redundant sections from CONTRIBUTING.md and fix Code of Conduct link. 5c891a5

  • Remove outdated agent-harness spec, update CLAUDE.md with spec-driven development process. bc252ba

0.2.4 (2026-04-20)

Compare the full difference.

Other

  • Wire ContainerManager and agent lifecycle into foreman start. Update agent paths, config, tests, and Dockerfile to align with refactored issue-triage structure. Mark Phase 6 tasks as complete. 7e7846d

  • Use SecretStr for sensitive fields in configuration and GitHubPoller, removing custom masking logic. Update tests accordingly. d2e437a

  • Task 15: Triage logic and prompt (prompts/triage.py). 6518095

    • build_prompt: formats issue title/body/author/labels + memory_summary
    • parse_llm_response: extracts JSON from prose, validates decision type, applies allow_close guard, defaults to skip on parse failure
    • _call_llm: LiteLLM wrapper (provider/model from task context)
    • run_triage: duplicate-comment guard (memory keyword check) before LLM call
    • 18 triage tests + full suite at 195 passing

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
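A rough sketch of the parsing behaviour described for parse_llm_response. The JSON field name `decision`, the allowed set, and the return shape are assumptions; the actual function in prompts/triage.py may differ.

```python
import json

# Assumed set of decision types, based on actions named elsewhere in this changelog.
ALLOWED_DECISIONS = {"add_label", "comment", "close_issue", "skip"}
SKIP = {"decision": "skip"}


def parse_llm_response(text: str, allow_close: bool = False) -> dict:
    # Take the outermost {...} span so surrounding prose is ignored.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return dict(SKIP)
    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return dict(SKIP)
    decision = data.get("decision") if isinstance(data, dict) else None
    if not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        return dict(SKIP)
    if decision == "close_issue" and not allow_close:
        return dict(SKIP)  # allow_close guard
    return data


labelled = parse_llm_response('Sure! {"decision": "add_label", "label": "bug"} Done.')
no_json = parse_llm_response("I could not decide.")
blocked = parse_llm_response('{"decision": "close_issue"}', allow_close=False)
allowed = parse_llm_response('{"decision": "close_issue"}', allow_close=True)
```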

  • Task 14: Agent HTTP server scaffold + Dockerfile. 60778eb

    • FastAPI app with POST /task (DecisionMessage) and GET /health (200 ok)
    • Self-contained protocol models (TaskMessage, DecisionMessage, ActionItem)
    • triage() delegates to prompts/triage.run_triage() — stub for Task 15
    • Dockerfile installs deps and runs uvicorn on port 8000
    • agents/issue-triage/pyproject.toml with runtime deps
    • 7 agent server tests; full suite at 177 passing

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Task 13: Container lifecycle manager (foreman/containers.py). 7e7c407

    • ContainerManager pulls images on demand, starts containers, waits for /health
    • stop_all() stops all managed containers; safe to call multiple times
    • handle_container_exit() logs error and restarts once; marks failed on second exit
    • ContainerError raised when Docker socket is unavailable at init
    • 14 tests covering all acceptance criteria; full suite at 170 passing

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Set environment to github-pages for publish-docs workflow. e2f100f

0.2.3 (2026-04-19)

Compare the full difference.

New

  • Add .api-env to .gitignore. ff63ae3

    Prevents accidental commit of local env file containing GitHub token and API keys.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Add initial README with project description, features, requirements, and setup instructions. 3a9e9ba

Other

  • Phase 5 — Harness Core + polling error visibility. 0a3c781

    Implements router, server dispatch loop, and main entrypoint (Tasks 10–12). Fixes two bugs found during integration testing:

    • SQLite connection used across threads now opens with check_same_thread=False
    • Poller task was created but never awaited; fixed by running concurrently in _run_loop

    Also fixes silent failure on GitHub API errors: non-rate-limit exceptions (including 401 bad credentials) are now logged immediately at critical/error level instead of being swallowed until process shutdown. Done callback on the poller task surfaces any unexpected crash in real time.

    156 tests passing, all pre-commit hooks green.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

Updates

  • Update license in README to MIT. 64a1e71

  • Update dependency versions in uv.lock file, including FastAPI (0.136.0), FastAPI Cloud CLI (0.17.0), FileLock (3.28.0), HuggingFace Hub (1.11.0), Identify (2.6.19), MkDocStrings (1.0.4), Packaging (26.1), and Virtualenv (21.2.4). e0bf184

0.2.2 (2026-04-18)

Compare the full difference.

Other


updated-dependencies: - dependency-name: litellm dependency-version: 1.83.9 dependency-type: direct: production update-type: version-update:semver-patch dependency-group: uv

signed-off-by: dependabot[bot] support@github.com

0.2.1 (2026-04-18)

Compare the full difference.

Other

  • Use TYPE_CHECKING for imports in test files and update Phase 4 todo items. 6043d54

  • Phase 4: implement GitHub executor and poller (Tasks 8 & 9). 9efa175

    executor.py:

    • GitHubExecutor.execute() logs decision to action_log BEFORE any GitHub API call
    • Handles add_label, comment, close_issue (with allow_close guard)
    • Raises UnknownActionError for unrecognized action types

    poller.py:

    • GitHubPoller.poll_repo() fetches issues since last_polled, skips collaborator issues
    • poll_all() runs repos concurrently via asyncio + semaphore (default max 5)
    • Exponential backoff on 403/429; other GithubExceptions propagate
    • Continuous run() loop at configurable interval

    memory.py:

    • Add poll_state table with get_last_polled() / set_last_polled() methods
    • Timestamps stored as ISO-8601 strings, returned as timezone-aware datetime

    39 new tests; 125 total passing.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
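Bounding concurrency with a semaphore, as described for poll_all(), can be sketched as below. `fake_poll_repo`, the `stats` counter, and the function signature are hypothetical; the backoff logic is not shown.

```python
import asyncio

stats = {"active": 0, "peak": 0}


async def fake_poll_repo(repo: str) -> str:
    """Hypothetical poller that records how many calls overlap."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return f"{repo}: ok"


async def poll_all(repos, poll_repo, max_concurrent: int = 5):
    semaphore = asyncio.Semaphore(max_concurrent)

    async def _bounded(repo):
        # At most max_concurrent coroutines run poll_repo at once.
        async with semaphore:
            return await poll_repo(repo)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(_bounded(repo) for repo in repos))


repos = [f"org/repo-{i}" for i in range(12)]
results = asyncio.run(poll_all(repos, fake_poll_repo))
```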

Updates

  • Remove draft flag from release creation script. 131ea10

0.2.0 (2026-04-18)

Compare the full difference.

Fixes

  • Fix unclosed DB connection warnings in test_memory.py. 982f6ec

    Switch store fixtures to yield+context-manager so the connection is closed after each test, and remove manual store.close() calls that were no longer needed with WAL mode + committed writes.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

New

  • Add docstrings for clarity in LLM backend tests, remove unused imports, and update CLAUDE.md with test-writing guidance. 0b73671

Other


updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct: production update-type: version-update:semver-major dependency-group: github-actions

signed-off-by: dependabot[bot] support@github.com

  • Phase 3 Tasks 6-7: implement LLM backend abstraction. 02733dc

    • LLMBackend ABC with complete() method and from_config() factory in base.py
    • AnthropicBackend and OllamaBackend wrapping LiteLLM
    • Recorded fixture files for both backends (no live LLM calls in tests)
    • 16 new tests across test_llm_base.py and test_llm_backends.py

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Refine type annotations and optimize imports in protocol and memory tests. 12a6bd8

  • Phase 2 human review approved. 3846ea8

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Mark Phase 2 tasks complete in todo.md. 78318a0

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Phase 2 Task 5: implement SQLite memory store. 6b39f0b

    Add MemoryStore with action_log and memory_summary tables (WAL mode). log_action(), get_memory_summary(), upsert_memory_summary() covered by 13 tests using real temp-file DBs — no mocks.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Phase 2 Task 4: implement agent protocol Pydantic models. 829f47f

    Add TaskMessage, DecisionMessage, ActionItem, LLMBackendRef, TaskContext, and DecisionType to foreman/protocol.py with 22 tests.

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com

  • Phase 1: scaffold, config system, and credential injection. 9f21485

    • pyproject.toml: add runtime deps (PyYAML, PyGithub, litellm, httpx, docker), uncomment [project.scripts] entry pointing to foreman.main:main
    • Add stub modules for all planned foreman/ submodules and llm/ package
    • Add agents/issue-triage/ scaffolding (Dockerfile placeholder, prompts/)
    • Implement foreman/config.py: YAML loader with ${VAR} env resolution, Pydantic validation, ConfigError, secret-masking repr for tokens/keys
    • Implement foreman/credentials.py: resolve_env_refs(), get_github_token(), CredentialError (variable name only — no secrets in error messages)
    • Add config.example.yaml matching the full schema from spec §5
    • Add types-PyYAML to mypy pre-commit additional_dependencies
    • 35 tests pass; coverage >85% on new modules

    co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com
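The ${VAR} resolution can be illustrated with a short sketch. The signature of `resolve_env_refs`, the regular expression, and the sample config keys are assumptions; the names themselves and the "variable name only" error rule come from the entry above.

```python
import os
import re

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


class CredentialError(Exception):
    """Raised when a referenced variable is unset; carries the name only."""


def resolve_env_refs(value, env=None):
    """Recursively replace ${VAR} references in strings, dicts, and lists."""
    env = os.environ if env is None else env

    def _replace(match):
        name = match.group(1)
        if name not in env:
            # Only the variable name appears in the message, never a secret.
            raise CredentialError(f"environment variable {name!r} is not set")
        return env[name]

    if isinstance(value, str):
        return _ENV_REF.sub(_replace, value)
    if isinstance(value, dict):
        return {key: resolve_env_refs(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item, env) for item in value]
    return value


config = {"github": {"token": "${GITHUB_TOKEN}"}, "repos": ["org/${REPO}"], "interval": 60}
resolved = resolve_env_refs(config, env={"GITHUB_TOKEN": "t0ken", "REPO": "demo"})

try:
    resolve_env_refs("${MISSING}", env={})
    missing_message = None
except CredentialError as exc:
    missing_message = str(exc)
```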

Updates


updated-dependencies: - dependency-name: httpx dependency-version: 0.28.1 dependency-type: direct:production

signed-off-by: dependabot[bot] support@github.com


updated-dependencies: - dependency-name: pydantic-settings dependency-version: 2.13.1 dependency-type: direct: production

signed-off-by: dependabot[bot] support@github.com


updated-dependencies: - dependency-name: opentelemetry-api dependency-version: 1.41.0 dependency-type: direct: production

signed-off-by: dependabot[bot] support@github.com


updated-dependencies: - dependency-name: docker dependency-version: 7.1.0 dependency-type: direct:production

signed-off-by: dependabot[bot] support@github.com


updated-dependencies: - dependency-name: structlog dependency-version: 25.5.0 dependency-type: direct:production

signed-off-by: dependabot[bot] support@github.com

  • Update HealthCheckModel dependencies type annotation for clarity. a0ad023

  • Remove outdated test, add CLAUDE.md for developer guidance, and update scaffolding notes. dae2a06

0.1.0 (2026-04-14)

Other