Skip to content

Add additional support for autonomous ML experiments#521

Open
Saba9 wants to merge 25 commits into
mainfrom
saba/auto-ml
Open

Add additional support for autonomous ML experiments#521
Saba9 wants to merge 25 commits into
mainfrom
saba/auto-ml

Conversation

@Saba9
Copy link
Copy Markdown
Collaborator

@Saba9 Saba9 commented Apr 20, 2026

Short description

Apologies for the large PR. About half the additions are from tests and docs.

This PR adds features designed for use in AI-driven training loops where an agent or automated script needs to monitor metric health, compare runs, and query results
programmatically without human supervision.

Metric watchers (trackio.watch() / trackio.should_stop()): Register rules upfront and have Trackio fire alerts automatically on every trackio.log() call. Supported
conditions:

  • NaN/Inf detection, max/min threshold breaches
  • Spike detection against a moving average
  • Stagnation — no improvement for N steps
  • Custom conditions via an fn callback: trackio.watch("loss", fn=my_check)

Any condition that warrants stopping sets trackio.should_stop() to True, which training loops can poll to exit early. Custom conditions can signal early stopping by including "stop": True in a returned alert dict.

Run status tracking: Runs now record a lifecycle status (runningfinished / failed) in SQLite. Status is set to running on init(), finished on finish(), and failed on unexpected process exit. The best, compare, and summary commands default to finished runs only; --include-all overrides this.

New CLI commands:

  • trackio best --project X --metric Y [--direction min|max] [--mode last|min|max] — find the best run for a metric
  • trackio compare --project X [--runs ...] [--metrics ...] — side-by-side run comparison
  • trackio summary --project X [--runs ...] — per-run metric summary table

Python API additions on Run: status, final_metrics, metrics(), history().

AlertReason constants: Every watcher-generated alert includes a data["reason"] field matching one of the trackio.AlertReason constants, allowing agents to identify alert types programmatically. Custom conditions that return True use AlertReason.CUSTOM.

AI Disclosure

  • I used AI to write everything. I then manually reviewed and revised the code.

Type of Change

  • New feature (non-breaking)
  • Documentation update

Related Issues

Closes:

Testing and linting

python -m pytest

ruff check --fix --select I && ruff format

Saba9 and others added 4 commits April 20, 2026 10:45
…ers, run status, structured alerts

Adds agent-facing query APIs and tooling to make Trackio the definitive tracker
for autonomous ML experiments driven by AI coding agents.

New CLI commands:
- `trackio best` — rank runs by metric, return winner + leaderboard
- `trackio compare` — side-by-side run comparison across metrics
- `trackio summary` — full experiment overview with status/configs/metrics

New features:
- Run status tracking (running/finished/failed) with automatic lifecycle mgmt
- Structured alert data (`data={}` param on `trackio.alert()`)
- Metric watchers (`trackio.watch()`) for auto NaN/spike/stagnation detection
- `trackio.should_stop()` for training loop early stopping
- Python API: `run.metrics()`, `run.history()`, `run.summary`, `run.status`

Test infrastructure:
- Synthetic training simulator (no ML deps, runs in seconds)
- Agent test runner with 5 experiment types

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Resolve merge conflicts integrating autonomous ML features with main branch
changes (run_id support, RemoteClient, query command, OAuth, error handling).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@gradio-pr-bot
Copy link
Copy Markdown
Contributor

gradio-pr-bot commented Apr 20, 2026

🪼 branch checks and previews

Name Status URL
🦄 Changes detected! Details

@gradio-pr-bot
Copy link
Copy Markdown
Contributor

gradio-pr-bot commented Apr 20, 2026

🦄 change detected

This Pull Request includes changes to the following packages.

Package Version
trackio minor

  • Add additional support for autonomous ML experiments

    • trackio.watch() / trackio.should_stop(): register metric watchers (NaN/Inf, threshold, spike, stagnation, custom fn) that fire alerts automatically on every trackio.log() call
  • AlertReason constants for programmatic alert filtering

  • Run lifecycle status tracking (runningfinished / failed) persisted in SQLite

  • New CLI commands: trackio best, trackio compare, trackio summary

  • Run.status, Run.final_metrics, Run.metrics(), Run.history() on the Python API

  • alerts.data column (SQL migration) for structured alert metadata


‼️ Changeset not approved. Ensure the version bump is appropriate for all packages before approving.

  • Maintainers can approve the changeset by checking this checkbox.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

HuggingFaceDocBuilderDev commented Apr 20, 2026

🪼 branch checks and previews

Name Status URL
Spaces ready! Spaces preview

Install Trackio from this PR (includes built frontend)

pip install "https://huggingface.co/buckets/trackio/trackio-wheels/resolve/e58200b25c87922e98bbc1241009e5f2c26848eb/trackio-0.25.1-py3-none-any.whl"

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Saba9 and others added 2 commits April 20, 2026 11:50
Fixes:
- Fix 1: Run status "failed" no longer overwritten by "finished" —
  finish() now accepts a status parameter, _cleanup_current_run passes
  status="failed" directly
- Fix 2: Watcher patience supports both min and max mode via new
  mode parameter (was hardcoded to minimization only)
- Fix 3: Replace dead --minimize/--maximize flags with --direction
  {min,max} on trackio best
- Fix 5: Watcher _values now uses deque(maxlen=window) to bound memory;
  alert dedup via per-condition flags that reset on return to normal
- Fix 6: Warn if watchers exist when init() clears them; docstring
  documents ordering requirement
- Fix 7: Drop Api.Run.summary cache — recompute on each access
- Fix 9: set_run_status/get_run_status now accept and use run_id,
  INSERT handles run_id NOT NULL column in new schema
- Fix 12: Remove unused enumerate variable in agent_runner

Tests:
- test_watchers.py: 18 tests covering nan, spike, max/min threshold,
  dedup, patience min/max mode, window bounds, manager propagation
- test_run_status.py: 6 tests covering running→finished, failed status,
  idempotent finish, Api.Run.status, multi-run status
- test_cli_agent_commands.py: 10 tests covering best/compare/summary
  in JSON and human-readable modes, error cases

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@Saba9 Saba9 changed the title Add autonomous ML experiment support Add additional support for autonomous ML experiments Apr 27, 2026
gradio-pr-bot and others added 14 commits April 27, 2026 16:52
…ests

- Watchers: 18 → 9 tests by merging nan+inf, combining dedup with
  trigger tests, merging patience modes
- Run status: 6 → 4 tests by dropping idempotent (covered by overwrite)
- CLI: 10 → 4 tests by seeding project once via module fixture,
  dropping human-readable format tests, merging maximize/subset into
  main tests

Total: 34 → 17 tests, 494 → 336 lines, same code path coverage.
CLI tests 3x faster (7.5s vs 25s) from shared fixture.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@Saba9 Saba9 marked this pull request as ready for review May 4, 2026 23:45
@Saba9 Saba9 requested review from abidlabs, qgallouedec and znation May 4, 2026 23:45
@znation
Copy link
Copy Markdown
Collaborator

znation commented May 5, 2026

I had Claude analyze it and write up some actionable feedback. At first I was going to go through and leave comments for each item in the source code, but I realized it's probably more efficient to just include the Markdown file here.
autonomous-ml-trackio-pr-review.md

Note that Claude considers two issues blockers:

B1. Run-status writes hit the local SQLite DB unconditionally — even for Space / self-hosted runs
B2. Remote-mode `Run.alert(..., data=...)` silently drops the `data` field

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants