.changeset/every-spoons-smash.md
+8 -1 (8 additions, 1 deletion)
@@ -2,4 +2,11 @@
 "trackio": minor
 ---
 
-feat:Add additional support for autonomous ML experiments
+feat: Add additional support for autonomous ML experiments
+
+- `trackio.watch()` / `trackio.should_stop()`: register metric watchers (NaN/Inf, threshold, spike, stagnation, custom fn) that fire alerts automatically on every `trackio.log()` call
+- `AlertReason` constants for programmatic alert filtering
+- Run lifecycle status tracking (`running` → `finished` / `failed`) persisted in SQLite
+- New CLI commands: `trackio best`, `trackio compare`, `trackio summary`
+- `Run.status`, `Run.final_metrics`, `Run.metrics()`, `Run.history()` on the Python API
+- `alerts.data` column (SQL migration) for structured alert metadata
docs/source/alerts.md
+4 -4 (4 additions, 4 deletions)
@@ -73,7 +73,7 @@ Watcher-generated alerts are stored, displayed in the dashboard, and delivered t
 |---|---|---|---|
 | `metric` | `str` | *(required)* | The metric name to watch (e.g., `"train/loss"`). |
 | `nan` | `bool` | `True` | Fire an ERROR alert if the value becomes NaN or Inf. |
-| `spike_factor` | `float \| None` | `None` | Fire a WARN alert when the value deviates from the recent moving average by this factor (e.g., `3.0` = 3× the average). |
+| `spike_factor` | `float \| None` | `None` | Fire a WARN alert when `\|value − recent_avg\| > (spike_factor − 1) × \|recent_avg\|` (e.g., `3.0` triggers when the deviation exceeds 2× `\|avg\|`). Symmetric — drops trigger too. |
 | `patience` | `int \| None` | `None` | Fire a WARN alert if no improvement is seen for this many log steps. Also sets `should_stop()` to `True`. |
 | `min_delta` | `float` | `0.0` | Minimum change to count as an improvement (used with `patience`). |
 | `max_value` | `float \| None` | `None` | Fire an ERROR alert if the value exceeds this threshold. Also sets `should_stop()` to `True`. |
@@ -96 +96 @@
-`max_value` fires an **ERROR** alert (and stops) when the metric exceeds the threshold. `min_value` fires a **WARN** alert when it falls below. Each alert fires once when the threshold is crossed and resets if the value recovers.
+`max_value` fires an **ERROR** alert (and stops) when the metric exceeds the threshold. `min_value` fires a **WARN** alert when it falls below, but — unlike `max_value` — does **not** set `should_stop()`. Each alert fires once when the threshold is crossed and resets if the value recovers.
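
Note on the threshold wording above: the fire-once-then-reset behavior can be illustrated with a minimal standalone sketch. This is not trackio's implementation and does not call trackio. `ThresholdSketch` and `check` are hypothetical names, and the strict `>` / `<` comparisons are an assumption read off the words "exceeds" and "falls below".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThresholdSketch:
    """Hypothetical model of the documented max_value / min_value semantics."""

    max_value: Optional[float] = None
    min_value: Optional[float] = None
    _above: bool = False  # True while the value stays above max_value
    _below: bool = False  # True while the value stays below min_value

    def check(self, value: float) -> List[Tuple[str, bool]]:
        """Return (level, requests_stop) for each alert newly fired by `value`."""
        fired = []
        above = self.max_value is not None and value > self.max_value
        below = self.min_value is not None and value < self.min_value
        if above and not self._above:
            fired.append(("ERROR", True))   # max_value: ERROR, requests a stop
        if below and not self._below:
            fired.append(("WARN", False))   # min_value: WARN only, no stop
        # Storing the current state re-arms each alert once the value recovers.
        self._above, self._below = above, below
        return fired
```

With `ThresholdSketch(max_value=10.0, min_value=0.1)`, logging `11.0` twice in a row produces one ERROR entry, not two; after a value of `5.0`, a later `11.0` fires again.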
@@ -105 +105 @@
-Fires a **WARN** alert when the value deviates from the recent moving average by more than `(spike_factor - 1) × avg`. The alert resets automatically once the value returns to normal.
+Fires a **WARN** alert when the value deviates from the recent moving average by more than `(spike_factor - 1) × |recent_avg|` — that is, when `|value − recent_avg| > (spike_factor − 1) × |recent_avg|`. Detection is symmetric: sudden drops trigger the alert in addition to sudden rises. With `spike_factor=3.0` and a recent average of `1.0`, the alert fires once `|value − 1.0| > 2.0`. The alert resets automatically once the value returns to normal.
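
Note on the spike condition above: the inequality can be checked directly with a small standalone function. This is a sketch of the documented formula only. `is_spike` is a hypothetical name, and how trackio computes the recent moving average (window size, warm-up) is not stated in this diff, so the average is passed in as an argument.

```python
def is_spike(value: float, recent_avg: float, spike_factor: float) -> bool:
    """Documented condition: |value - recent_avg| > (spike_factor - 1) * |recent_avg|."""
    return abs(value - recent_avg) > (spike_factor - 1) * abs(recent_avg)


# spike_factor=3.0 with a recent average of 1.0: fires once |value - 1.0| > 2.0
rise = is_spike(3.5, recent_avg=1.0, spike_factor=3.0)      # deviation 2.5
drop = is_spike(-1.5, recent_avg=1.0, spike_factor=3.0)     # deviation 2.5, symmetric
boundary = is_spike(3.0, recent_avg=1.0, spike_factor=3.0)  # deviation exactly 2.0
```

Because the comparison is strict, a value exactly `(spike_factor − 1) × |recent_avg|` away from the average does not fire.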
@@ -121 +121 @@
-[`should_stop`] returns `True` if any watcher has triggered a stop condition (NaN/Inf, `max_value` exceeded, or `patience` exhausted):
+[`should_stop`] returns `True` if any watcher has triggered a stop condition (NaN/Inf, `max_value` exceeded, `patience` exhausted, or a custom watcher returned `{"stop": True}`):
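
Note on the stop conditions listed above: as a rough model, the four conditions combine with a logical OR. The sketch below is standalone and does not call trackio. `stop_requested` and its parameters are hypothetical, the patience bookkeeping is reduced to a boolean flag, and the function evaluates a single value, whereas the real `should_stop()` is described as reflecting whether any watcher has triggered so far.

```python
import math
from typing import Optional


def stop_requested(
    value: float,
    max_value: Optional[float] = None,
    patience_exhausted: bool = False,
    custom_result: Optional[dict] = None,
) -> bool:
    """Hypothetical OR of the stop conditions named in the docs."""
    if math.isnan(value) or math.isinf(value):
        return True                      # NaN/Inf
    if max_value is not None and value > max_value:
        return True                      # max_value exceeded
    if patience_exhausted:
        return True                      # patience exhausted
    # A custom watcher requests a stop by returning {"stop": True}.
    return bool(custom_result and custom_result.get("stop"))
```

`min_value` is deliberately absent from the model: per the threshold paragraph in this diff, it raises a WARN alert without setting `should_stop()`.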