Traces in Trackio#518
Conversation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🦄 change detectedThis Pull Request includes changes to the following packages.
|
🪼 branch checks and previews
|
🪼 branch checks and previews
Install Trackio from this PR (includes built frontend) pip install "https://huggingface.co/buckets/trackio/trackio-wheels/resolve/aa0f89bd16f476ebf7b7ea8e56544a89e3f148f5/trackio-0.24.2-py3-none-any.whl" |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
sergiopaniego
left a comment
There was a problem hiding this comment.
Thanks for the proposal!!
One question: would it support rendering images?
possible use case: you're training with GRPO + an env (e.g., OpenEnv), and the env returns a list of images (e.g., a browser env returning screenshots). It'd be nice to render them inline with the messages
|
Very cool! looking forward to integrate this! |
yep can do! We already support images in tables, so we should be able to do the same here |
# Conflicts: # trackio/frontend/src/App.svelte
|
Ok based on great feedback from everyone, have updated this PR. Here's a basic example: Screen.Recording.2026-04-20.at.2.00.32.PM.mov(I've removed many of the earliers to make the UI less opinionated, thanks @adithya-s-k for the suggestion) A more complex example including images and tool calls: Screen.Recording.2026-04-20.at.2.03.29.PM.movAnd a potential example of how to use it with TRL: Any other suggestions/improvements are welcome! |
There was a problem hiding this comment.
Pull request overview
Adds first-class “trace” logging and a UI for browsing conversational/agent traces in Trackio, integrating with the existing metrics/log storage and dashboard routing.
Changes:
- Introduce
Tracepayload type that serializes nested Trackio media inside messages/metadata. - Add
SQLiteStorage.get_traces()+ server API/get_tracesto extract/search/sort traces from metric logs. - Add a new Svelte “Traces” page and navigation wiring (dynamic + static modes).
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| trackio/trace.py | New Trace object with nested media serialization. |
| trackio/run.py | Logs Trace instances and recursively queues nested media uploads. |
| trackio/sqlite_storage.py | Extracts trace records from metric logs; supports search/sort/limit/offset. |
| trackio/server.py | Exposes get_traces via the server API registry. |
| trackio/frontend/src/pages/Traces.svelte | New UI page to list/search/sort and expand trace conversations. |
| trackio/frontend/src/lib/api.js | Adds getTraces() client wrapper (static + server modes). |
| trackio/frontend/src/lib/staticApi.js | Implements static-mode trace extraction/search/sort from exported logs. |
| trackio/frontend/src/lib/router.js | Adds /traces route mapping. |
| trackio/frontend/src/components/Navbar.svelte | Adds “Traces” nav link. |
| trackio/frontend/src/App.svelte | Renders the Traces page and includes it in sidebar-enabled pages. |
| trackio/init.py | Exports Trace from the top-level package API. |
| tests/unit/test_trace.py | Unit coverage for trace serialization + storage search/sort. |
| tests/e2e-local/test_trace_e2e.py | E2E round-trip test for logging and reading traces. |
| examples/traces/basic-trace.py | Example: minimal trace logging. |
| examples/traces/complex-trace.py | Example: rich trace with tool calls + images. |
| examples/traces/trl-trace-integration.py | Example: TRL callback logging traces during training. |
| .changeset/easy-apes-hammer.md | Changeset marking a minor feature release. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const trace = { | ||
| id: `${normalizeRun(run).id || normalizeRun(run).name || "run"}:${log.step}:${key}${traceIndex !== null ? `:${traceIndex}` : ""}`, | ||
| key, | ||
| index: traceIndex, | ||
| run: normalizeRun(run).name, | ||
| run_id: normalizeRun(run).id, | ||
| step: log.step, |
| elif isinstance(value, Trace): | ||
| metrics[key] = value._to_dict( | ||
| project=self.project, run=self.name, step=step | ||
| ) | ||
| self._scan_and_queue_media_uploads(metrics[key], step) |
| offset: int = 0, | ||
| run_id: str | None = None, | ||
| ) -> list[dict[str, Any]]: | ||
| logs = SQLiteStorage.get_logs(project, run, max_points=None, run_id=run_id) |
There was a problem hiding this comment.
Good point — this is a real concern for very large runs, but filtering server-side is non-trivial because trace payloads are stored inline inside metric rows (no separate trace index), so SQLite has no cheap way to skip non-trace rows without a schema change. The input normalization is addressed in the follow-up commit; I'd like to defer the scan-reduction work to a dedicated PR that introduces a lightweight trace index table so pagination/sort can be pushed down to SQL.
| if offset > 0: | ||
| traces = traces[offset:] | ||
| if limit is not None: | ||
| traces = traces[:limit] | ||
|
|
| def get_traces( | ||
| project: str, | ||
| run: str | None = None, | ||
| run_id: str | None = None, | ||
| search: str | None = None, | ||
| sort: str | None = None, | ||
| limit: int | None = None, | ||
| offset: int | None = 0, | ||
| ) -> list[dict[str, Any]]: | ||
| return SQLiteStorage.get_traces( | ||
| project, | ||
| run, | ||
| search=search, | ||
| sort=sort, | ||
| limit=limit, | ||
| offset=offset or 0, | ||
| run_id=run_id, | ||
| ) |
| {#each visibleTraces as trace} | ||
| <tr class="trace-row" onclick={() => toggleTrace(trace.id)}> | ||
| <td class="trace-id-cell"> | ||
| <span class="trace-id">{trace.id}</span> | ||
| </td> | ||
| <td class="request-cell"> |
| async function loadTraces() { | ||
| if (!project || selectedRuns.length === 0) { | ||
| traces = []; | ||
| expandedTraceId = null; | ||
| return; | ||
| } | ||
|
|
||
| loading = true; | ||
| try { | ||
| const batches = await Promise.all( | ||
| selectedRuns.map(async (run) => { | ||
| const runTraces = await getTraces(project, run); | ||
| return runTraces.map((trace) => normalizeTrace(trace, run.name)); | ||
| }), | ||
| ); | ||
| traces = batches.flat(); | ||
| if (!traces.find((trace) => trace.id === expandedTraceId)) { | ||
| expandedTraceId = null; | ||
| } | ||
| } catch (error) { | ||
| console.error("Failed to load traces:", error); | ||
| traces = []; | ||
| } finally { | ||
| loading = false; | ||
| } |
- Cache normalizeRun result in staticApi getTraces - Normalize step (None -> _next_step) before queuing trace/table media - Validate offset/limit/sort inputs in server.get_traces and storage - Make trace rows keyboard-accessible (role/tabindex/keydown) - Guard Traces.svelte loadTraces against stale responses via request id
znation
left a comment
There was a problem hiding this comment.
Overall looks good (I only skimmed, Claude reviewed more thoroughly). Left some optional comments for issues that Claude found.
| normalized_offset = max(0, int(offset)) if offset is not None else 0 | ||
| except (TypeError, ValueError): | ||
| normalized_offset = 0 | ||
| normalized_limit: int | None |
There was a problem hiding this comment.
Double-sanitization of offset/limit
trackio/server.py:843-856 normalizes offset and limit, then passes them to trackio/sqlite_storage.py:1968-1974 which normalizes them again with identical
logic. One layer should own this.
Recommendation: Remove sanitization from sqlite_storage.py and let the API layer (server.py) be the sole validator. The storage layer can trust its internal
callers.
| self._queue_upload(absolute_path, step) | ||
| return | ||
| for nested in value.values(): | ||
| self._scan_and_queue_media_uploads(nested, step) |
There was a problem hiding this comment.
Recursive _scan_and_queue_media_uploads has no depth limit
trackio/run.py:767-786 — The refactored _scan_and_queue_media_uploads now recurses into arbitrary dicts/lists. A deeply nested trace payload (or even an
accidental circular reference via a custom dict) could blow the stack. The old version was bounded to exactly 2 levels of nesting (table rows → values →
list items).
Recommendation: Add a max_depth parameter (e.g., 10) and stop recursing beyond it. This matches the practical ceiling for trace messages.
| continue | ||
|
|
||
| trace_index = index if isinstance(value, list) else None | ||
| trace_id_parts = [run_id or run or "run", str(step), key] |
There was a problem hiding this comment.
Trace ID collisions across runs
trackio/sqlite_storage.py:1934-1937 — Trace IDs are constructed as run_id_or_name:step:key[:index]. When run_id is None and run is None, the fallback is the
string "run". If two different runs are both queried with run=None, run_id=None, they'll produce identical trace IDs, causing collisions in the frontend (the
expand/collapse toggle uses trace.id).
The frontend in Traces.svelte:57-61 fetches traces for multiple selectedRuns, flattening them into one array. If two runs share a step number + key, the IDs
will collide.
Recommendation: Include the actual run name or run ID in the trace ID unconditionally (the caller always has it from the selectedRuns list), or generate a
unique ID (e.g., hash).
| } | ||
| } | ||
|
|
||
| let visibleTraces = $derived.by(() => { |
There was a problem hiding this comment.
Client-side search duplicates server-side search
Traces.svelte:79-96 — visibleTraces does full client-side filtering/sorting on the loaded traces. But getTraces in api.js also passes search/sort options to
the server. Currently loadTraces() at line 59 calls getTraces(project, run) with no options — so the server-side search/sort/pagination is never used from
the UI. The toolbar controls only drive the client-side $derived block.
This means the server endpoint accepts search/sort/limit/offset parameters that the frontend never sends. The two code paths (server-side in
sqlite_storage.py and client-side in Traces.svelte) are duplicated logic that can drift.
Recommendation: Either remove the unused server-side filtering (YAGNI) or wire it up in the frontend and remove the client-side duplicate. Given Issue 1,
moving filtering server-side would also be the path to fixing the performance problem.
| <p>Try a different search query or model filter.</p> | ||
| </div> | ||
| {:else} | ||
| <div class="toolbar"> |
There was a problem hiding this comment.
Toolbar duplication in Traces.svelte
Traces.svelte:175-189 and Traces.svelte:198-213 — The toolbar markup (search input, sort dropdown, count display) is duplicated verbatim in both the "no
matching traces" and "has traces" branches. If you change one, you'll need to change the other.
Recommendation: Extract the toolbar into its own {#snippet} or move it above the conditional so it renders once regardless of whether traces match.
|
Thanks so much for the review @znation! Cleaned up the frontend based on your comments, will merge this in once CI is green |
Edit: see below: #518 (comment)