Conversation
🦄 change detected: This Pull Request includes changes to the following packages.
🪼 branch checks and previews
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Looking forward to using it with Slack!!
just testing it rn @qgallouedec! |
Added details about Slack Block Kit messages to the alerts documentation: removed an image and added a description of the Block Kit messages instead.
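For context, this is roughly the shape of a Slack Block Kit webhook body. The helper name is ours and the exact blocks Trackio sends are not reproduced here; this is just a minimal sketch of the format the docs now describe:

```python
import json

def build_alert_blocks(title: str, message: str) -> dict:
    """Build a minimal Slack Block Kit payload for an alert.

    Hypothetical helper: the real Trackio payload may differ; this only
    illustrates the general Block Kit structure (header + section).
    """
    return {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": title}},
            {"type": "section", "text": {"type": "mrkdwn", "text": message}},
        ]
    }

payload = build_alert_blocks("Trackio alert", "*loss* spiked at step 200")
print(json.dumps(payload))
```

A webhook handler would POST this JSON to the Slack incoming-webhook URL.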
Also added an agent skill; the same skill is also published to hf-skills.
Also added two capabilities for inspecting metrics at specific points in time — designed for the workflow where an alert fires and you (or an agent) need to quickly understand what happened.
Filter a single metric to a specific step, or to a window around a step/timestamp:

```shell
# Exact step
trackio get metric --project P --run R --metric loss --step 200 --json

# Window of ±10 steps (default) around step 200
trackio get metric --project P --run R --metric loss --around 200 --json

# Window of ±60 seconds around a timestamp
trackio get metric --project P --run R --metric loss --at-time "2025-06-01T12:05:30" --window 60 --json
```
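Conceptually, the `--around` flag does the window filtering that a client previously had to do itself. A minimal sketch of that filtering, assuming the metric history is a list of `{"step", "value"}` dicts mirroring the CLI's JSON output (the function name is ours, not part of the trackio CLI):

```python
def window_around(history, center, window=10):
    """Keep only points whose step is within +/-window of center.

    `history` is assumed to look like the CLI's JSON output:
    a list of {"step": int, "value": float} dicts.
    """
    return [p for p in history if abs(p["step"] - center) <= window]

# Synthetic history: one point every 10 steps.
history = [{"step": s, "value": 0.5 - s * 0.001} for s in range(0, 300, 10)]
print(window_around(history, 200, window=10))  # steps 190, 200, 210
```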
Returns all metrics at/around a step or timestamp in a single call. This is the fastest way to understand the full state of a run at a specific point:

```shell
trackio get snapshot --project P --run R --around 200 --window 5 --json
```

Returns:

```json
{
  "project": "P",
  "run": "R",
  "around": 200,
  "window": 5,
  "metrics": {
    "loss": [{"step": 198, "value": 0.42}, {"step": 200, "value": 0.45}, ...],
    "accuracy": [{"step": 198, "value": 0.88}, {"step": 200, "value": 0.87}, ...],
    "lr": [{"step": 198, "value": 0.0001}, {"step": 200, "value": 0.0001}, ...]
  }
}
```

The typical agent workflow is: see alert at step N → inspect metrics around step N → decide to continue or adjust. Previously, the agent would need to fetch the entire metric history and filter client-side. Now it's a single CLI call.
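In an agent loop, that snapshot JSON can be consumed directly. A minimal sketch, assuming the output shape shown above (the `latest` helper is ours, and the values are the illustrative ones from the example, abbreviated):

```python
import json

# Abbreviated snapshot following the example output shape above.
snapshot = json.loads("""
{
  "project": "P", "run": "R", "around": 200, "window": 5,
  "metrics": {
    "loss": [{"step": 198, "value": 0.42}, {"step": 200, "value": 0.45}],
    "lr":   [{"step": 198, "value": 0.0001}, {"step": 200, "value": 0.0001}]
  }
}
""")

def latest(snapshot: dict, metric: str) -> float:
    """Return the value at the highest step for one metric."""
    points = snapshot["metrics"][metric]
    return max(points, key=lambda p: p["step"])["value"]

print(latest(snapshot, "loss"))  # 0.45
```

In practice the agent would obtain the JSON by running the `trackio get snapshot` command and parsing its stdout.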
do you think we should also call
Another question: did you consider having a dedicated “panel” within the Metrics view instead of creating a new tab? I’m wondering if switching tabs back and forth might become cumbersome when monitoring a run (i.e., looking at the curves while also keeping an eye on the latest alerts).
Great feedback @qgallouedec! I've redesigned the UI in the Trackio dashboard, replacing the dedicated Alerts page with an Alerts box that appears on the bottom right of every page:
The box can be expanded to view the latest alerts, or collapsed (it only appears if there is at least one alert). Let me know what you think! It shows the alerts generated since you launched the Trackio dashboard; historical alerts are available on the Reports page.
Awesome! I think it's a better design indeed. I will try it again today. |
Will go ahead and merge this and release tomorrow. If you have time to take a look before then @qgallouedec, great, but no pressure if not.

Summary
Adds a complete alerts system to Trackio. Alerts let users flag important events during training runs — they're printed to the terminal, stored in the database, displayed in the dashboard, and optionally sent to webhooks.
In Slack (check the `#trackio-alerts` channel internally), it looks like this:

Basic Usage
Using with Transformers / TRL
When using `report_to="trackio"`, the `TrackioCallback` handles init/log/finish. To add alerts, pass a custom callback. The same pattern works with TRL trainers (`GRPOTrainer`, `SFTTrainer`, etc.).

This PR was authored with AI assistance, but I tested and reviewed it myself.
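As a sketch of the custom-callback pattern mentioned above: a real version would subclass `transformers.TrainerCallback` and call Trackio's alert API, but both the class name and the alert sink below are hypothetical stand-ins so the logic runs without a training job:

```python
class LossSpikeAlertCallback:
    """Sketch of an alerting callback (hypothetical, not Trackio's API).

    A real implementation would subclass transformers.TrainerCallback
    and emit a Trackio alert; here alerts go into a plain list.
    """

    def __init__(self, loss_threshold: float = 1.0):
        self.loss_threshold = loss_threshold
        self.alerts = []  # stand-in for the real alert sink

    def on_log(self, logs: dict, step: int) -> None:
        # Called on each logging step; `logs` mirrors a trainer log dict.
        loss = logs.get("loss")
        if loss is not None and loss > self.loss_threshold:
            self.alerts.append(f"loss spiked to {loss} at step {step}")

cb = LossSpikeAlertCallback(loss_threshold=1.0)
cb.on_log({"loss": 0.8}, step=100)  # below threshold, no alert
cb.on_log({"loss": 1.7}, step=200)  # above threshold, alert recorded
print(cb.alerts)
```

The same callback object would be passed in the trainer's `callbacks` list for Transformers or TRL trainers.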