Add alerts with webhooks, CLI, and documentation by abidlabs · Pull Request #439 · gradio-app/trackio

abidlabs · 2026-02-24T17:40:04Z

Summary

Adds a complete alerts system to Trackio. Alerts let users flag important events during training runs — they're printed to the terminal, stored in the database, displayed in the dashboard, and optionally sent to webhooks.

In Slack (check the #trackio-alerts channel internally), an alert looks like this:

Basic Usage

import trackio

trackio.init(project="my-project", webhook_url="https://hooks.slack.com/services/T.../B.../xxx")

for epoch in range(100):
    loss = train(...)
    trackio.log({"loss": loss})

    if epoch > 10 and loss > 5.0:
        trackio.alert(
            title="Loss spike",
            text=f"Loss jumped to {loss:.2f} at epoch {epoch}",
            level=trackio.AlertLevel.ERROR,
        )

trackio.finish()
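For readers unfamiliar with webhooks, here is a rough sketch of what delivering an alert to a Slack incoming webhook could involve. It is not Trackio's implementation: `build_slack_payload` and `post_webhook` are hypothetical helpers, and the message layout is an assumption. The only thing relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.request


def build_slack_payload(title: str, text: str, level: str) -> dict:
    """Build a plain-text Slack message body (hypothetical helper, not Trackio's format)."""
    return {"text": f"[{level.upper()}] {title}\n{text}"}


def post_webhook(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Only the payload is built here; post_webhook is left uncalled so the
# sketch runs without network access or a real webhook URL.
payload = build_slack_payload("Loss spike", "Loss jumped to 7.31 at epoch 12", "error")
print(json.dumps(payload))
```

With `webhook_url` passed to `trackio.init` as above, none of this is needed in user code; the sketch only illustrates the kind of request a webhook receives.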

Using with Transformers / TRL

When using report_to="trackio", the TrackioCallback handles init/log/finish. To add alerts, pass a custom callback:

import trackio
from transformers import Trainer, TrainerCallback, TrainingArguments

class AlertCallback(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        if "trackio" not in args.report_to or logs is None:
            return
        if logs.get("loss", 0) > 5.0:
            trackio.alert(
                title="Training loss spike",
                text=f"loss={logs['loss']:.4f} at step {state.global_step}",
                level=trackio.AlertLevel.ERROR,
            )

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if "trackio" not in args.report_to or metrics is None:
            return
        if metrics.get("eval_loss", 0) > 2.0:
            trackio.alert(
                title="High eval loss",
                text=f"eval_loss={metrics['eval_loss']:.4f}",
                level=trackio.AlertLevel.WARN,
            )

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./output",
        report_to="trackio",
        project="my-project",
    ),
    train_dataset=train_dataset,
    callbacks=[AlertCallback()],
)
trainer.train()

The same pattern works with TRL trainers (GRPOTrainer, SFTTrainer, etc.):

class RLAlertCallback(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        if "trackio" not in args.report_to or logs is None:
            return
        if logs.get("train/reward", 0) < -1.0:
            trackio.alert(title="Reward collapse", level=trackio.AlertLevel.ERROR)
        if logs.get("train/kl", 0) > 10.0:
            trackio.alert(title="KL divergence too high", level=trackio.AlertLevel.WARN)
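One thing to keep in mind with the callbacks above: they call `trackio.alert` on every logging step where the threshold is exceeded, which could flood a webhook during a long bad stretch. A small cooldown helper is one possible way to limit repeats. `AlertCooldown` is a hypothetical class written for this example, not part of Trackio or Transformers.

```python
class AlertCooldown:
    """Allow an alert with a given title at most once every `min_steps` steps."""

    def __init__(self, min_steps: int = 100):
        self.min_steps = min_steps
        self._last_fired = {}  # title -> step at which the alert last fired

    def should_fire(self, title: str, step: int) -> bool:
        last = self._last_fired.get(title)
        if last is not None and step - last < self.min_steps:
            return False  # still inside the cooldown window
        self._last_fired[title] = step
        return True


cooldown = AlertCooldown(min_steps=100)
fired = [
    step
    for step in (10, 20, 110, 150, 210)
    if cooldown.should_fire("Training loss spike", step)
]
print(fired)
```

Inside `on_log`, the alert call would then be wrapped as `if cooldown.should_fire(title, state.global_step): trackio.alert(...)`.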


This PR was authored with AI assistance, but I tested and reviewed it myself.

gradio-pr-bot · 2026-02-24T17:40:36Z

🦄 change detected

This Pull Request includes changes to the following packages.

Package	Version
`trackio`	`minor`

Add alerts with webhooks, CLI, and documentation

‼️ Changeset not approved. Ensure the version bump is appropriate for all packages before approving.

Maintainers can approve the changeset by checking this checkbox.

Something isn't right?

Maintainers can change the version label to modify the version bump.
If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

gradio-pr-bot · 2026-02-24T17:40:37Z

🪼 branch checks and previews

•	Name	Status	URL
🦄	Changes	detected!	Details

HuggingFaceDocBuilderDev · 2026-02-24T18:03:31Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2026-02-24T19:58:14Z

Looking forward to using it with Slack!!

abidlabs · 2026-02-24T20:04:07Z

just testing it rn @qgallouedec!

abidlabs · 2026-02-25T19:54:06Z

Also added an agent skill (/.agents/skills/trackio/) so that LLM coding agents (Cursor, Claude Code, etc.) can automatically discover and use Trackio when running ML experiments.

This same skill is also published to hf-skills as hugging-face-trackio so it can be installed by any agent.

abidlabs · 2026-02-25T19:54:20Z

Also added two capabilities for inspecting metrics at specific points in time — designed for the workflow where an alert fires and you (or an agent) need to quickly understand what happened.

trackio get metric now supports --step, --around, --at-time, and --window

Filter a single metric to a specific step or a window around a step/timestamp:

# Exact step
trackio get metric --project P --run R --metric loss --step 200 --json

# Window of ±10 steps (default) around step 200
trackio get metric --project P --run R --metric loss --around 200 --json

# Window of ±60 seconds around a timestamp
trackio get metric --project P --run R --metric loss --at-time "2025-06-01T12:05:30" --window 60 --json
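To make the window semantics concrete, here is a sketch of the equivalent client-side filter for `--around`. It assumes the window is symmetric and inclusive at both ends, which the description above suggests but does not state; `filter_around` is a hypothetical helper, not a Trackio function.

```python
def filter_around(points, center, window=10):
    """Keep points whose step lies within `window` steps of `center` (inclusive)."""
    return [p for p in points if abs(p["step"] - center) <= window]


# Synthetic history logged every 5 steps, for illustration only.
history = [{"step": s, "value": round(1.0 / (s + 1), 4)} for s in range(0, 400, 5)]
nearby = filter_around(history, center=200, window=10)
print([p["step"] for p in nearby])
```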

trackio get snapshot

Returns all metrics at/around a step or timestamp in a single call. This is the fastest way to understand the full state of a run at a specific point:

trackio get snapshot --project P --run R --around 200 --window 5 --json

Returns:

{
  "project": "P",
  "run": "R",
  "around": 200,
  "window": 5,
  "metrics": {
    "loss": [{"step": 198, "value": 0.42}, {"step": 200, "value": 0.45}, ...],
    "accuracy": [{"step": 198, "value": 0.88}, {"step": 200, "value": 0.87}, ...],
    "lr": [{"step": 198, "value": 0.0001}, {"step": 200, "value": 0.0001}, ...]
  }
}
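As a sketch of how a script or agent might consume this output, the snippet below parses a snapshot shaped like the example above and picks, for each metric, the value logged closest to the requested step. The sample values are copied from the example with the elided entries dropped, and `value_nearest_step` is a hypothetical helper.

```python
import json

snapshot_json = """
{
  "project": "P",
  "run": "R",
  "around": 200,
  "window": 5,
  "metrics": {
    "loss": [{"step": 198, "value": 0.42}, {"step": 200, "value": 0.45}],
    "accuracy": [{"step": 198, "value": 0.88}, {"step": 200, "value": 0.87}]
  }
}
"""


def value_nearest_step(snapshot, step):
    """Map each metric name to the value logged closest to `step`."""
    nearest = {}
    for name, points in snapshot["metrics"].items():
        if points:  # skip metrics with no points inside the window
            closest = min(points, key=lambda p: abs(p["step"] - step))
            nearest[name] = closest["value"]
    return nearest


snapshot = json.loads(snapshot_json)
summary = value_nearest_step(snapshot, snapshot["around"])
print(summary)
```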

The typical agent workflow is: see alert at step N → inspect metrics around step N → decide to continue or adjust. Previously, the agent would need to fetch the entire metric history and filter client-side. Now it's a single CLI call.
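A minimal sketch of the "inspect metrics around step N" part of that workflow, driven from Python. It uses only the flags shown above and assumes the command prints nothing but the JSON document to stdout; `build_snapshot_command` and `fetch_snapshot` are hypothetical wrappers, not part of Trackio.

```python
import json
import subprocess


def build_snapshot_command(project, run, step, window=5):
    """Assemble the CLI arguments shown above for `trackio get snapshot`."""
    return [
        "trackio", "get", "snapshot",
        "--project", project,
        "--run", run,
        "--around", str(step),
        "--window", str(window),
        "--json",
    ]


def fetch_snapshot(project, run, step, window=5):
    """Run the CLI and parse its JSON output (requires trackio to be installed)."""
    result = subprocess.run(
        build_snapshot_command(project, run, step, window),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return json.loads(result.stdout)


# Only the command is built here, so the sketch runs without a Trackio project.
command = build_snapshot_command("P", "R", 200)
print(" ".join(command))
```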

qgallouedec · 2026-02-25T21:21:40Z

Do you think we should also call gr.Warning? At first I was a bit surprised that nothing was showing on my dashboard.

qgallouedec · 2026-02-25T21:27:29Z

Another question: did you consider having a dedicated “panel” within the Metrics view instead of creating a new tab? I’m wondering if switching tabs back and forth might become cumbersome when monitoring a run (i.e., looking at the curves while also keeping an eye on the latest alerts).

abidlabs · 2026-02-26T19:38:42Z

> Do you think we should also call gr.Warning? At first I was a bit surprised that nothing was showing on my dashboard.

> Another question: did you consider having a dedicated “panel” within the Metrics view instead of creating a new tab? I’m wondering if switching tabs back and forth might become cumbersome when monitoring a run (i.e., looking at the curves while also keeping an eye on the latest alerts).

Great feedback @qgallouedec! I've redesigned the UI in the Trackio dashboard, replacing the dedicated Alerts page with an Alerts box that appears on the bottom right of every page:

This box can be expanded to view the latest alerts or collapsed. (It only appears if there is at least one alert.) Let me know what you think!

This box will show the alerts that have been generated since you launched the Trackio dashboard. You can also view the historical alerts by going to the Reports page.

qgallouedec · 2026-02-26T20:11:38Z

Awesome! I think it's a better design indeed. I will try it again today.

abidlabs · 2026-03-03T20:22:33Z

Will go ahead and merge this and release tomorrow. If you have time to take a look before then, @qgallouedec, great, but no pressure if not.

abidlabs and others added 2 commits February 24, 2026 09:36

changes

1fde6a1

add changeset

3a555e9

abidlabs and others added 3 commits February 24, 2026 09:56

changes

fcecb13

changes

87d5705

Co-authored-by: Cursor <cursoragent@cursor.com>

changes

6b50ccc

abidlabs changed the title ~~Add alerts~~ Add alerts with webhooks, CLI, and documentation Feb 24, 2026

add changeset

6f0b480

abidlabs added 2 commits February 24, 2026 10:11

changes

4adc195

changes

64230e6

abidlabs and others added 5 commits February 24, 2026 12:04

changes

935a216

changes

4d19458

Enhance alerts.md with Slack message details

ae5450a

Added details about Slack Block Kit messages in alerts documentation.

Update alerts documentation for Slack messages

16a7b79

Removed an image and added a description of Slack Block Kit messages.

changes

8249074

Co-authored-by: Cursor <cursoragent@cursor.com>

abidlabs marked this pull request as ready for review February 24, 2026 20:27

abidlabs requested review from Saba9, qgallouedec and znation February 24, 2026 20:28

abidlabs mentioned this pull request Feb 25, 2026

Update Trackio skill with alerts and new CLI commands huggingface/skills#60

Merged

abidlabs added 4 commits February 25, 2026 12:01

changes

af8b062

changes

b1995cc

changes

397f702

changes

71dd6f1

sohrabjony37 approved these changes Feb 26, 2026

abidlabs and others added 3 commits February 26, 2026 11:39

changes

780a6fc

changes

6edaae6

Update alerts.md

86a1589

abidlabs merged commit 18e9650 into main Mar 3, 2026
8 checks passed

gradio-pr-bot mentioned this pull request Mar 3, 2026

chore: update versions #440

Merged

5 participants
