Commit 3b397df

Add docs on SQL & Parquet schema / format, as well as a new CLI command: trackio query --sql SQL_QUERY (#502)
Co-authored-by: gradio-pr-bot <gradio-pr-bot@users.noreply.github.com>
1 parent 1b96db3 commit 3b397df

13 files changed

Lines changed: 704 additions & 8 deletions

.agents/skills/trackio/SKILL.md

Lines changed: 5 additions & 1 deletion
@@ -14,6 +14,7 @@ Trackio is an experiment tracking library for logging and visualizing ML trainin
1414
| **Logging metrics** during training | Python API | [logging_metrics.md](logging_metrics.md) |
1515
| **Firing alerts** for training diagnostics | Python API | [alerts.md](alerts.md) |
1616
| **Retrieving metrics & alerts** after/during training | CLI | [retrieving_metrics.md](retrieving_metrics.md) |
17+
| **Inspecting storage schema and running direct SQL** | CLI | [storage_schema.md](storage_schema.md) |
1718

1819
## When to Use Each
1920

@@ -47,15 +48,17 @@ Use the `trackio` command to query logged metrics and alerts:
4748

4849
- `trackio list projects/runs/metrics` — discover what's available
4950
- `trackio get project/run/metric` — retrieve summaries and values
51+
- `trackio query project --project <name> --sql "SELECT ..."` — run catch-all read-only SQL
5052
- `trackio list alerts --project <name> --json` — retrieve alerts
5153
- `trackio show` — launch the dashboard
5254
- `trackio sync` — sync to HF Space
5355

5456
**Key concept**: Add `--json` for programmatic output suitable for automation and LLM agents.
5557

56-
**Remote Spaces**: Add `--space <space_id_or_url>` to any `list`/`get` command to query a remote HF Space instead of local data. Use `--hf-token` for private Spaces.
58+
**Remote Spaces**: Add `--space <space_id_or_url>` to any `list`/`get`/`query` command to query a remote HF Space instead of local data. Use `--hf-token` for private Spaces.
5759

5860
→ See [retrieving_metrics.md](retrieving_metrics.md) for all commands, workflows, and JSON output formats.
61+
→ See [storage_schema.md](storage_schema.md) for SQLite tables, parquet layout, and direct query examples.
5962

6063
## Minimal Logging Setup
6164

@@ -73,6 +76,7 @@ trackio.finish()
7376
```bash
7477
trackio list projects --json
7578
trackio get metric --project my-project --run my-run --metric loss --json
79+
trackio query project --project my-project --sql "SELECT name FROM sqlite_master WHERE type = 'table'" --json
7680

7781
# Query a remote Space
7882
trackio list projects --space username/my-space --json

.agents/skills/trackio/retrieving_metrics.md

Lines changed: 37 additions & 4 deletions
@@ -18,6 +18,7 @@ The `trackio` CLI provides direct terminal access to query Trackio experiment tr
1818
| Get metric around step | `trackio get metric ... --metric <name> --around <N> --window <W>` |
1919
| Get all metrics snapshot | `trackio get snapshot --project <name> --run <name> --step <N>` |
2020
| Get system metrics | `trackio get system-metric --project <name> --run <name>` |
21+
| Run direct SQL | `trackio query project --project <name> --sql "SELECT ..."` |
2122
| Query remote Space | `trackio list projects --space <space_id_or_url>` |
2223
| Show dashboard | `trackio show [--project <name>]` |
2324
| Sync to Space | `trackio sync --project <name> --space-id <space_id>` |
@@ -69,14 +70,23 @@ trackio get system-metric --project <name> --run <name> --metric <name> # Speci
6970
trackio get system-metric --project <name> --run <name> --json
7071
```
7172

73+
### Query Command
74+
75+
```bash
76+
trackio query project --project <name> --sql "SELECT name FROM sqlite_master WHERE type = 'table'"
77+
trackio query project --project <name> --sql "PRAGMA table_info(metrics)" --json
78+
trackio query project --project <name> --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name"
79+
```
80+
7281
### Remote Space Queries
7382

74-
All `list` and `get` commands support querying a remote HF Space with `--space`:
83+
All `list`, `get`, and `query` commands support querying a remote HF Space with `--space`:
7584

7685
```bash
7786
trackio list projects --space user/my-space # Space ID
7887
trackio list projects --space https://user-my-space.hf.space # Space URL
7988
trackio get metric --project <name> --run <name> --metric loss --space user/my-space
89+
trackio query project --project <name> --sql "SELECT COUNT(*) AS num_alerts FROM alerts" --space user/my-space
8090
trackio list projects --space user/private-space --hf-token hf_xxx # Private Space
8191
```
8292

@@ -100,7 +110,7 @@ trackio sync --project <name> --space-id <space_id> --force # Overwrite
100110

101111
## Output Formats
102112

103-
All `list` and `get` commands support two output formats:
113+
All `list`, `get`, and `query` commands support two output formats:
104114

105115
- **Human-readable** (default): Formatted text for terminal viewing
106116
- **JSON** (with `--json` flag): Structured JSON for programmatic use
@@ -157,6 +167,9 @@ trackio get run --project my-project --run my-run --json > run_summary.json
157167

158168
# Filter runs with jq
159169
trackio list runs --project my-project --json | jq '.runs[] | select(startswith("train"))'
170+
171+
# Run a direct SQL aggregate
172+
trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name" --json
160173
```
161174

162175
### LLM Agent Workflow
@@ -179,6 +192,9 @@ trackio list alerts --project my-project --json --since "2025-06-01T00:00:00"
179192

180193
# 6. When an alert fires at step N, get all metrics around that point
181194
trackio get snapshot --project my-project --run my-run --around 200 --window 5 --json
195+
196+
# 7. Fall back to direct SQL for one-off inspection
197+
trackio query project --project my-project --sql "SELECT timestamp, run_name, level, title FROM alerts ORDER BY timestamp DESC LIMIT 20" --json
182198
```
183199

184200
## Error Handling
@@ -196,9 +212,10 @@ All errors exit with non-zero status code and write to stderr.
196212
- `--project`: Project name (required for most commands)
197213
- `--run`: Run name (required for run-specific commands)
198214
- `--metric`: Metric name (required for metric-specific commands)
215+
- `--sql`: Read-only SQL query (for `trackio query`)
199216
- `--json`: Output in JSON format instead of human-readable
200-
- `--space`: HF Space ID (e.g. `user/space`) or Space URL to query remotely (for `list`/`get` commands)
201-
- `--hf-token`: HF token for accessing private Spaces (for `list`/`get` commands with `--space`)
217+
- `--space`: HF Space ID (e.g. `user/space`) or Space URL to query remotely (for `list`/`get`/`query` commands)
218+
- `--hf-token`: HF token for accessing private Spaces (for `list`/`get`/`query` commands with `--space`)
202219
- `--step`: Exact step filter (for `get metric`, `get snapshot`)
203220
- `--around`: Center step for window filter (for `get metric`, `get snapshot`)
204221
- `--at-time`: Center ISO timestamp for window filter (for `get metric`, `get snapshot`)
@@ -258,8 +275,24 @@ All errors exit with non-zero status code and write to stderr.
258275
}
259276
```
260277

278+
### Query Result
279+
```json
280+
{
281+
"project": "my-project",
282+
"query": "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name",
283+
"columns": ["name"],
284+
"rows": [
285+
{"name": "alerts"},
286+
{"name": "configs"},
287+
{"name": "metrics"}
288+
],
289+
"row_count": 3
290+
}
291+
```
292+
261293
## References
262294

263295
- **Complete CLI documentation**: See [docs/source/cli_commands.md](docs/source/cli_commands.md)
296+
- **Storage schema and direct SQL**: See [storage_schema.md](storage_schema.md)
264297
- **API and MCP Server**: See [docs/source/api_mcp_server.md](docs/source/api_mcp_server.md)
265298

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
# Trackio Storage Schema and Direct SQL

Use this reference when you need to inspect Trackio data directly instead of going through higher-level `trackio list` or `trackio get` commands.

## Where Data Is Stored

- Local project databases live in `TRACKIO_DIR`, which defaults to `~/.cache/huggingface/trackio`.
- Each project is stored in its own SQLite file: `{project}.db`.
- Media files live under `TRACKIO_DIR/media/`.
- Parquet files are derived exports written from SQLite for syncing and static Spaces.
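
The storage rules above can be sketched as a small path helper. This is an illustrative sketch, not Trackio's own code: `project_db_path` is a hypothetical name, and it assumes that `TRACKIO_DIR` is read from the environment and that the project name is used unmodified as the file stem.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location stated in the reference above.
DEFAULT_TRACKIO_DIR = Path.home() / ".cache" / "huggingface" / "trackio"


def project_db_path(project: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the expected SQLite path for a project (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: TRACKIO_DIR is an environment variable that overrides the default.
    override = env.get("TRACKIO_DIR")
    base = Path(override) if override else DEFAULT_TRACKIO_DIR
    # Assumption: the project name maps directly to `{project}.db`.
    return base / f"{project}.db"


custom = project_db_path("my-project", {"TRACKIO_DIR": "/tmp/trackio"})
default = project_db_path("my-project", {})
```

If the real library sanitizes project names before building the filename, the result may differ, so check that the file exists before opening it.
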

## SQLite Tables

Trackio defines its live schema in `trackio/sqlite_storage.py` inside `SQLiteStorage.init_db()`.

### `metrics`

- `id`: integer primary key
- `timestamp`: ISO timestamp
- `run_name`: run identifier
- `step`: integer step
- `metrics`: JSON text payload
- `log_id`: optional deduplication key
- `space_id`: optional pending-sync marker

Indexes:

- `(run_name, step)`
- `(run_name, timestamp)`
- unique partial index on `log_id`
- partial index on `space_id`
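
To make the `metrics` layout concrete, here is a minimal sketch that builds a stand-in table in memory and decodes the JSON payload in Python. The column names come from the list above; the column types, the index name, the sample rows, and the `loss_by_step` helper are assumptions for illustration, and the real DDL in `SQLiteStorage.init_db()` may differ.

```python
import json
import sqlite3

# In-memory stand-in for a project database; types are assumed, not copied from Trackio.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE metrics (
        id INTEGER PRIMARY KEY,
        timestamp TEXT,
        run_name TEXT,
        step INTEGER,
        metrics TEXT,
        log_id TEXT,
        space_id TEXT
    )
    """
)
# Mirrors the (run_name, step) index described above; the index name is hypothetical.
conn.execute("CREATE INDEX idx_metrics_run_step ON metrics (run_name, step)")

sample_rows = [
    ("2025-06-01T00:00:00", "run-a", 0, json.dumps({"loss": 1.0})),
    ("2025-06-01T00:01:00", "run-a", 1, json.dumps({"loss": 0.5})),
]
conn.executemany(
    "INSERT INTO metrics (timestamp, run_name, step, metrics) VALUES (?, ?, ?, ?)",
    sample_rows,
)


def loss_by_step(connection, run_name):
    """Return (step, loss) pairs by decoding the JSON text payload per row."""
    cursor = connection.execute(
        "SELECT step, metrics FROM metrics WHERE run_name = ? ORDER BY step",
        (run_name,),
    )
    return [(step, json.loads(payload).get("loss")) for step, payload in cursor]


losses = loss_by_step(conn, "run-a")
```

Because each row stores all logged values in one JSON text column, per-metric filtering happens after decoding unless the SQLite build in use offers JSON functions.
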

### `configs`

- `id`: integer primary key
- `run_name`: run identifier
- `config`: JSON text payload
- `created_at`: ISO timestamp

Constraints:

- unique `run_name`
- index on `run_name`

### `system_metrics`

- `id`: integer primary key
- `timestamp`: ISO timestamp
- `run_name`: run identifier
- `metrics`: JSON text payload
- `log_id`: optional deduplication key
- `space_id`: optional pending-sync marker

Indexes:

- `(run_name, timestamp)`
- unique partial index on `log_id`
- partial index on `space_id`

### `project_metadata`

- `key`: primary key
- `value`: metadata value

### `pending_uploads`

- `id`
- `space_id`
- `run_name`
- `step`
- `file_path`
- `relative_path`
- `created_at`

### `alerts`

- `id`
- `timestamp`
- `run_name`
- `title`
- `text`
- `level`
- `step`
- `alert_id`

Indexes:

- `run_name`
- `timestamp`
- unique partial index on `alert_id`

## Parquet Layout

Trackio flattens JSON blobs when exporting parquet:

- `{project}.parquet` comes from `metrics`
- `{project}_system.parquet` comes from `system_metrics`
- `{project}_configs.parquet` comes from `configs`

Static export layout:

- `metrics.parquet`
- `aux/system_metrics.parquet`
- `aux/configs.parquet`
- `runs.json`
- `settings.json`

The flattened parquet files keep structural columns such as `timestamp`, `run_name`, and `step`, then add one column per JSON key found in the source payload.
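
The flattening described above can be illustrated in plain Python. This is a sketch of the general idea only: `flatten_rows` is a hypothetical helper, and filling missing keys with `None`, keeping keys in first-seen order, and letting structural columns win on a name clash are assumptions rather than documented Trackio behavior.

```python
import json

# Structural columns named in the paragraph above.
STRUCTURAL = ("timestamp", "run_name", "step")


def flatten_rows(rows):
    """Turn rows with a JSON `metrics` payload into one column per JSON key."""
    payload_keys = []
    decoded = []
    for row in rows:
        payload = json.loads(row["metrics"])
        for key in payload:
            # Keep first-seen order so the column layout is deterministic.
            if key not in payload_keys and key not in STRUCTURAL:
                payload_keys.append(key)
        decoded.append(({name: row[name] for name in STRUCTURAL}, payload))

    columns = list(STRUCTURAL) + payload_keys
    records = []
    for structural, payload in decoded:
        record = {column: None for column in columns}
        record.update({k: v for k, v in payload.items() if k in payload_keys})
        # Assumption: structural columns take precedence over payload keys.
        record.update(structural)
        records.append(record)
    return columns, records


columns, records = flatten_rows(
    [
        {"timestamp": "2025-06-01T00:00:00", "run_name": "run-a", "step": 0,
         "metrics": json.dumps({"loss": 1.0})},
        {"timestamp": "2025-06-01T00:01:00", "run_name": "run-a", "step": 1,
         "metrics": json.dumps({"loss": 0.5, "acc": 0.9})},
    ]
)
```

A wide layout like this trades sparse columns for direct column access in tools that read parquet.
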

## Direct SQL With The CLI

Use `trackio query` for read-only SQL:

```bash
trackio query project --project my-project --sql "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name" --json
trackio query project --project my-project --sql "PRAGMA table_info(metrics)"
trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name ORDER BY last_step DESC"
```

Remote query works too:

```bash
trackio query project --project my-project --sql "SELECT COUNT(*) AS num_alerts FROM alerts" --space username/my-space --json
```

`trackio query` accepts read-only `SELECT`, `WITH`, and safe schema `PRAGMA` queries.
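
If you open a project database yourself instead of going through the CLI, SQLite's read-only URI mode gives a comparable guarantee. The sketch below uses only the standard `sqlite3` module on a throwaway database; it does not describe how `trackio query` enforces read-only access internally, and `open_read_only` and the demo table are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path):
    """Open an existing SQLite file so that any write attempt fails."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "demo.db"

    # Create a small throwaway database with a normal read-write connection.
    writer = sqlite3.connect(db_file)
    writer.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, title TEXT)")
    writer.execute("INSERT INTO alerts (title) VALUES ('loss spike')")
    writer.commit()
    writer.close()

    # Reads succeed on the read-only connection; writes raise OperationalError.
    reader = open_read_only(db_file)
    num_alerts = reader.execute("SELECT COUNT(*) FROM alerts").fetchone()[0]
    try:
        reader.execute("INSERT INTO alerts (title) VALUES ('blocked')")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    reader.close()
```

Opening the file this way also avoids creating an empty database by accident when the path is wrong, since `mode=ro` requires the file to exist.
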

## Common Query Patterns

Recent alerts:

```bash
trackio query project --project my-project --sql "SELECT timestamp, run_name, level, title, step FROM alerts ORDER BY timestamp DESC LIMIT 20"
```

Latest step per run:

```bash
trackio query project --project my-project --sql "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name ORDER BY last_step DESC"
```

Recent configs:

```bash
trackio query project --project my-project --sql "SELECT run_name, created_at, config FROM configs ORDER BY created_at DESC"
```

Schema inspection:

```bash
trackio query project --project my-project --sql "PRAGMA index_list(metrics)"
```

## Agent Guidance

- Start with `trackio list projects --json` if you do not know the project name yet.
- Use `trackio get` for common summaries and metric retrieval.
- Fall back to `trackio query` when you need one-off aggregates, joins, or schema introspection.
- Prefer `--json` when another agent or script needs to consume the result.
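
For a script or agent that follows the guidance above, a thin wrapper might look like the sketch below. The flags (`--project`, `--sql`, `--json`, `--space`) and the result keys (`columns`, `rows`, `row_count`) are taken from the examples in this commit; the helper names `build_query_command` and `rows_from_result` are hypothetical, and the sample output is hand-written rather than captured from a real run.

```python
import json


def build_query_command(project, sql, space=None):
    """Build the argv list for a JSON-formatted `trackio query` call."""
    command = ["trackio", "query", "project", "--project", project, "--sql", sql, "--json"]
    if space:
        # Optional remote target, as in the remote query example.
        command += ["--space", space]
    return command


def rows_from_result(stdout_text):
    """Extract column names and row dicts from the JSON query result."""
    payload = json.loads(stdout_text)
    return payload["columns"], payload["rows"]


command = build_query_command(
    "my-project",
    "SELECT run_name, MAX(step) AS last_step FROM metrics GROUP BY run_name",
)

# Hand-written sample in the documented result shape, used here instead of a real run.
sample_stdout = json.dumps(
    {
        "project": "my-project",
        "query": "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name",
        "columns": ["name"],
        "rows": [{"name": "alerts"}, {"name": "configs"}, {"name": "metrics"}],
        "row_count": 3,
    }
)
result_columns, result_rows = rows_from_result(sample_stdout)
```

In practice the list can be passed to `subprocess.run(command, capture_output=True, text=True, check=True)`, with the captured `stdout` fed to `rows_from_result`. Passing an argv list rather than a shell string avoids quoting problems with the SQL text.
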

.changeset/cyan-forks-hang.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
---
"trackio": minor
---

feat:Add docs on SQL & Parquet schema / format, as well as a new CLI command: `trackio query project --project PROJECT --sql SQL_QUERY`

README.md

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,8 @@ Trackio's main features:
3939
- Persists logs in a Sqlite database locally (or, if you provide a `space_id`, in a private Hugging Face Dataset)
4040
- Visualize experiments with a **Svelte 5** dashboard locally (or, if you provide a `space_id`, on Hugging Face Spaces)
4141
- **LLM-friendly**: Built with autonomous ML experiments in mind, Trackio includes a CLI for programmatic access and a Python API for run management, making it easy for LLMs to log metrics and query experiment data.
42+
- Use `trackio query project --project <name> --sql "SELECT ..."` for read-only SQL when `trackio list` and `trackio get` are not enough
43+
- See the storage schema and direct query reference at https://huggingface.co/docs/trackio/storage_schema
4244

4345
- **Free**: Everything here, including hosting on Hugging Face, is free!
4446

@@ -266,6 +268,8 @@ These numbers were measured against a free-tier Hugging Face Space (2 vCPU / 16
266268

267269
Note that Trackio is in pre-release right now and we may release breaking changes. In particular, the schema of the Trackio sqlite database may change, which may require migrating or deleting existing database files (located by default at: `~/.cache/huggingface/trackio`).
268270

271+
The current SQLite and parquet layout is documented in the [Storage Schema and Direct Queries](https://huggingface.co/docs/trackio/storage_schema) guide, including examples for `trackio query`.
272+
269273
Since Trackio is in beta, your feedback is welcome! Please create issues with bug reports or feature requests.
270274

271275
## License

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,8 @@
2121
title: Python API for Managing Runs
2222
- local: cli_commands
2323
title: CLI Commands
24+
- local: storage_schema
25+
title: Storage Schema and Direct Queries
2426
- local: api_mcp_server
2527
title: API and MCP Server
2628
- local: alerts
