Add additional support for autonomous ML experiments #521
Conversation
…ers, run status, structured alerts
Adds agent-facing query APIs and tooling to make Trackio the definitive tracker
for autonomous ML experiments driven by AI coding agents.
New CLI commands:
- `trackio best` — rank runs by metric, return winner + leaderboard
- `trackio compare` — side-by-side run comparison across metrics
- `trackio summary` — full experiment overview with status/configs/metrics
New features:
- Run status tracking (running/finished/failed) with automatic lifecycle mgmt
- Structured alert data (`data={}` param on `trackio.alert()`)
- Metric watchers (`trackio.watch()`) for auto NaN/spike/stagnation detection
- `trackio.should_stop()` for training loop early stopping
- Python API: `run.metrics()`, `run.history()`, `run.summary`, `run.status`
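For orientation, a minimal sketch of these accessors in use; it assumes `trackio.init()` returns the `Run` object (as the existing API does) and uses placeholder project and metric names:

```python
import trackio

# Placeholder project/metric names; the accessors below are the ones listed
# above (run.metrics(), run.history(), run.summary, run.status).
run = trackio.init(project="agent-experiments")
trackio.log({"loss": 0.42})
trackio.finish()

print(run.status)     # lifecycle status: "running", "finished", or "failed"
print(run.summary)    # summary of logged metrics
print(run.metrics())  # logged metrics (exact return shape not specified here)
print(run.history())  # step-by-step metric history
```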
Test infrastructure:
- Synthetic training simulator (no ML deps, runs in seconds)
- Agent test runner with 5 experiment types
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Resolve merge conflicts integrating autonomous ML features with main branch changes (run_id support, RemoteClient, query command, OAuth, error handling). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
🪼 branch checks and previews
Install Trackio from this PR (includes built frontend): `pip install "https://huggingface.co/buckets/trackio/trackio-wheels/resolve/e58200b25c87922e98bbc1241009e5f2c26848eb/trackio-0.25.1-py3-none-any.whl"`
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Fixes:
- Fix 1: Run status "failed" no longer overwritten by "finished" —
finish() now accepts a status parameter, _cleanup_current_run passes
status="failed" directly
- Fix 2: Watcher patience supports both min and max mode via new
mode parameter (was hardcoded to minimization only)
- Fix 3: Replace dead --minimize/--maximize flags with --direction
{min,max} on trackio best
- Fix 5: Watcher _values now uses deque(maxlen=window) to bound memory;
  alert dedup via per-condition flags that reset on return to normal
  (see the sketch after this list)
- Fix 6: Warn if watchers exist when init() clears them; docstring
documents ordering requirement
- Fix 7: Drop Api.Run.summary cache — recompute on each access
- Fix 9: set_run_status/get_run_status now accept and use run_id,
INSERT handles run_id NOT NULL column in new schema
- Fix 12: Remove unused enumerate variable in agent_runner
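For context on Fix 5, a generic sketch of the bounded-window plus per-condition dedup pattern it describes; this is illustrative only, not Trackio's actual watcher code, and the class and attribute names are invented:

```python
from collections import deque

class WindowedWatcher:
    """Illustrative only: bounded history plus alert dedup via a flag
    that resets once the metric returns to normal."""

    def __init__(self, threshold: float, window: int = 50):
        self.threshold = threshold
        self._values = deque(maxlen=window)  # memory bounded by the window size
        self._alerted = False                # per-condition dedup flag

    def update(self, value: float) -> bool:
        """Return True exactly once per excursion above the threshold."""
        self._values.append(value)
        if value > self.threshold:
            if not self._alerted:
                self._alerted = True
                return True          # fire a single alert for this excursion
            return False             # still above threshold: deduplicated
        self._alerted = False        # back to normal: allow future alerts
        return False
```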
Tests:
- test_watchers.py: 18 tests covering nan, spike, max/min threshold,
dedup, patience min/max mode, window bounds, manager propagation
- test_run_status.py: 6 tests covering running→finished, failed status,
idempotent finish, Api.Run.status, multi-run status
- test_cli_agent_commands.py: 10 tests covering best/compare/summary
in JSON and human-readable modes, error cases
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ests
- Watchers: 18 → 9 tests by merging nan+inf, combining dedup with trigger tests, merging patience modes
- Run status: 6 → 4 tests by dropping idempotent (covered by overwrite)
- CLI: 10 → 4 tests by seeding project once via module fixture, dropping human-readable format tests, merging maximize/subset into main tests
Total: 34 → 17 tests, 494 → 336 lines, same code path coverage. CLI tests 3x faster (7.5s vs 25s) from shared fixture.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
I had Claude analyze it and write up some actionable feedback. At first I was going to go through and leave comments for each item in the source code, but I realized it's probably more efficient to just include the Markdown file here. Note that Claude considers two issues blockers:
Short description
Apologies for the large PR. About half the additions are from tests and docs.
This PR adds features designed for use in AI-driven training loops where an agent or automated script needs to monitor metric health, compare runs, and query results
programmatically without human supervision.
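As a rough sketch of that workflow, the loop below registers a watcher, logs a synthetic loss, and polls `trackio.should_stop()`. The custom-callback argument (`values`) and the project name are assumptions; `watch("loss", fn=...)`, the `{"stop": True}` convention, and `should_stop()` follow the descriptions below.

```python
import random
import trackio

def my_check(values):
    # Assumption: the callback receives recent metric values. Returning a dict
    # containing {"stop": True} signals early stopping, per the description below.
    if values and values[-1] > 10.0:
        return {"stop": True}
    return None

trackio.init(project="agent-experiments")   # placeholder project name
trackio.watch("loss", fn=my_check)          # custom condition callback

for step in range(1000):
    loss = 1.0 / (step + 1) + random.random() * 0.01   # synthetic stand-in loss
    trackio.log({"loss": loss})
    if trackio.should_stop():               # set when any watcher condition fires
        break

trackio.finish()
```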
Metric watchers (`trackio.watch()` / `trackio.should_stop()`): Register rules upfront and have Trackio fire alerts automatically on every `trackio.log()` call. Supported conditions: `NaN`/`Inf` detection, max/min threshold breaches, stagnation (no improvement) over `N` steps, and a custom `fn` callback: `trackio.watch("loss", fn=my_check)`. Any condition that warrants stopping sets `trackio.should_stop()` to `True`, which training loops can poll to exit early. Custom conditions can signal early stopping by including `"stop": True` in a returned alert dict.

Run status tracking: Runs now record a lifecycle status (`running` → `finished`/`failed`) in SQLite. Status is set to `running` on `init()`, `finished` on `finish()`, and `failed` on unexpected process exit. The `best`, `compare`, and `summary` commands default to finished runs only; `--include-all` overrides this.

New CLI commands:
- `trackio best --project X --metric Y [--direction min|max] [--mode last|min|max]` — find the best run for a metric
- `trackio compare --project X [--runs ...] [--metrics ...]` — side-by-side run comparison
- `trackio summary --project X [--runs ...]` — per-run metric summary table

Python API additions on `Run`: `status`, `final_metrics`, `metrics()`, `history()`.

`AlertReason` constants: Every watcher-generated alert includes a `data["reason"]` field matching one of the `trackio.AlertReason` constants, allowing agents to identify alert types programmatically. Custom conditions that return `True` use `AlertReason.CUSTOM`.

AI Disclosure
Type of Change
Related Issues
Closes:
Testing and linting
python -m pytest
ruff check --fix --select I && ruff format