Add docs on SQL & Parquet schema / format, as well as a new CLI command: `trackio query --sql SQL_QUERY` (#502)
Conversation
🪼 branch checks and previews

🦄 change detected — This Pull Request includes changes to the following packages.

Install Trackio from this PR (includes built frontend):

pip install "https://huggingface.co/buckets/trackio/trackio-wheels/resolve/f85a62c1b8bb6bd1a167656c95ce3fbd83525b02/trackio-0.23.0-py3-none-any.whl"

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull request overview
Adds end-user documentation for Trackio’s SQLite/Parquet storage format and introduces a new CLI/API path to run read-only SQL queries against a project database (locally or via a remote Space), intended as a “catch-all” when trackio list / trackio get aren’t sufficient.
Changes:
- Add `SQLiteStorage.query_project()` with a read-only authorizer and expose it via the server API (`query_project`).
- Add a `trackio query project --project ... --sql ...` CLI command with human-readable and JSON output formatting.
- Add new docs pages/entries describing the SQLite schema, Parquet export layout, and query examples (plus agent skill docs updates).
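A minimal sketch of what such read-only validation can look like, before SQLite's authorizer does the real enforcement (the helper name `validate_read_only_query` and the exact rules here are illustrative assumptions, not the PR's code):

```python
import re

def validate_read_only_query(query: str) -> str:
    """Reject anything that doesn't look like a read-only statement.

    This is only a first line of defense; the durable safety comes from
    SQLite's authorizer callback, which vets every action at prepare time.
    """
    normalized = query.strip().rstrip(";").strip()
    if not re.match(r"^(select|with)\b", normalized, re.IGNORECASE):
        raise ValueError("Only read-only SELECT/WITH queries are allowed.")
    return normalized

print(validate_read_only_query("  SELECT 1;  "))
```

String-prefix checks alone are easy to bypass (e.g. a `WITH` clause wrapping a writable CTE in other databases), which is why pairing this with the authorizer matters.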
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| `trackio/sqlite_storage.py` | Implements read-only SQL validation/authorizer and `query_project()` result shaping |
| `trackio/server.py` | Exposes `query_project` as a server API endpoint |
| `trackio/cli_helpers.py` | Adds table-like formatting for query results |
| `trackio/cli.py` | Adds the `trackio query project ...` command, local/remote execution paths, and skills doc inclusion |
| `docs/source/storage_schema.md` | New documentation page for SQLite schema + Parquet layout + query usage |
| `docs/source/cli_commands.md` | Documents the new `query` command and notes remote support |
| `docs/source/_toctree.yml` | Adds the new storage schema doc to the docs nav |
| `README.md` | Mentions `trackio query` and links to the storage schema docs |
| `.changeset/cyan-forks-hang.md` | Announces the feature for release notes |
| `.agents/skills/trackio/storage_schema.md` | New agent skill reference for storage schema + direct SQL |
| `.agents/skills/trackio/retrieving_metrics.md` | Updates agent CLI guidance to include `trackio query` |
| `.agents/skills/trackio/SKILL.md` | Adds the new storage schema / SQL querying skill link |
```python
    cursor = conn.cursor()
    cursor.execute(normalized_query)
    description = cursor.description or []
    columns = [column[0] for column in description]
    rows = []
    for row in cursor.fetchall():
        rows.append({column: row[column] for column in columns})
except sqlite3.DatabaseError as e:
```
query_project fetches all rows into memory (cursor.fetchall()) and then materializes a list of dicts. A query like SELECT * FROM metrics on a large project can cause high memory usage / long blocking time (especially on a remote Space), effectively becoming a DoS vector. Consider enforcing a server-side max row limit (e.g., fetchmany(max_rows+1) and error if exceeded) and/or adding a progress handler / timeout to abort overly expensive queries.
Good catch — added a max_rows cap (default 10,000) via fetchmany(max_rows + 1) that raises ValueError when exceeded. Users can add a LIMIT clause to refine.
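The cap described above can be sketched as follows, assuming a plain `sqlite3` connection (the function name `fetch_capped` and the table are illustrative, not Trackio's actual code):

```python
import sqlite3

def fetch_capped(conn: sqlite3.Connection, sql: str, max_rows: int = 10_000) -> list[dict]:
    """Run a query but refuse to materialize more than max_rows rows.

    Fetching max_rows + 1 distinguishes "exactly at the limit" from
    "over the limit" without counting the whole result set.
    """
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(sql)
    rows = cursor.fetchmany(max_rows + 1)
    if len(rows) > max_rows:
        raise ValueError(f"Query returned more than {max_rows} rows; add a LIMIT clause.")
    return [dict(row) for row in rows]

# Demo against an in-memory database with 20 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (step INTEGER, value REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)", [(i, i * 0.5) for i in range(20)])

print(len(fetch_capped(conn, "SELECT * FROM metrics", max_rows=50)))  # within the cap
try:
    fetch_capped(conn, "SELECT * FROM metrics", max_rows=10)  # exceeds the cap
except ValueError as e:
    print(e)
```

Note this only bounds the result set, not query cost; a `Connection.set_progress_handler` timeout would be the complementary mitigation for expensive-but-small queries.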
```python
return {
    "project": project,
    "query": normalized_query,
    "columns": columns,
    "rows": rows,
    "row_count": len(rows),
}
```
The query result may include non-JSON-serializable values (e.g., SELECT randomblob(4) returns bytes). Since the CLI --json path uses json.dumps without a default=..., this will raise TypeError at runtime. Consider normalizing result values before returning (e.g., bytes -> base64/hex string) so JSON output is robust.
Agreed — added _normalize_query_value that converts bytes/bytearray/memoryview to hex strings so json.dumps works on the CLI --json output.
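The normalization can be sketched like this (the name mirrors the `_normalize_query_value` helper mentioned above, but the body is an assumption):

```python
import json
import sqlite3

def normalize_query_value(value):
    """Make SQLite result values JSON-serializable.

    sqlite3 returns BLOB columns as bytes (or memoryview via some
    adapters), which json.dumps rejects; encode them as hex instead.
    """
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value).hex()
    return value

# randomblob(4) is exactly the failure case from the review comment.
conn = sqlite3.connect(":memory:")
(blob,) = conn.execute("SELECT randomblob(4)").fetchone()
row = {"value": normalize_query_value(blob)}
print(json.dumps(row))  # e.g. {"value": "9f1c02ab"} -- 8 hex chars
```

An alternative is passing `default=...` to `json.dumps` in the CLI, but normalizing in `query_project` keeps the server API's return value uniformly serializable for all callers.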
```python
def query_project(project: str, query: str) -> dict[str, Any]:
    SQLiteStorage._ensure_hub_loaded()
    db_path = SQLiteStorage.get_project_db_path(project)
    if not db_path.exists():
        raise FileNotFoundError(f"Project '{project}' not found.")

    normalized_query = SQLiteStorage._validate_read_only_query(query)
    with SQLiteStorage._get_connection(db_path) as conn:
        conn.set_authorizer(SQLiteStorage._query_authorizer)
```
query_project is newly introduced behavior but there are no corresponding unit tests in tests/unit/test_sqlite_storage.py. Adding tests for (1) allowed SELECT/WITH queries, (2) denied write attempts (e.g., INSERT/UPDATE), and (3) safe PRAGMA allowlist behavior would prevent regressions and verify the authorizer is doing what the docs promise.
Added tests in tests/unit/test_sqlite_storage.py covering allowed SELECT/WITH/safe PRAGMA, denied writes (INSERT/UPDATE/DELETE/DROP and unsafe PRAGMA), the new row limit, bytes normalization, and missing-project handling.
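The allow/deny behavior those tests exercise can be reproduced standalone with `sqlite3`'s authorizer hook (a self-contained sketch of the technique, not Trackio's test code; the allowed-action set is an assumption):

```python
import sqlite3

# Actions a read-only query legitimately needs; everything else is denied.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}

def read_only_authorizer(action, arg1, arg2, db_name, trigger):
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (step INTEGER, value REAL)")
conn.execute("INSERT INTO metrics VALUES (1, 0.5)")
conn.set_authorizer(read_only_authorizer)

# SELECT passes the authorizer.
assert conn.execute("SELECT value FROM metrics").fetchone() == (0.5,)

# Any write is rejected at prepare time with "not authorized".
try:
    conn.execute("INSERT INTO metrics VALUES (2, 0.7)")
    raise AssertionError("write should have been denied")
except sqlite3.DatabaseError:
    pass
```

Because the authorizer fires during statement preparation, denial happens before any write can execute, which is what makes it stronger than string inspection of the SQL.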
```python
server.api(fn=get_system_logs, name="get_system_logs")
server.api(fn=get_snapshot, name="get_snapshot")
server.api(fn=get_logs, name="get_logs")
server.api(fn=query_project, name="query_project")
```
Registering query_project as a public server API enables arbitrary read-only SQL execution against the live project DB on a remote Space. Even with the authorizer, this expands the attack surface (large result sets, expensive queries/functions, and access to internal tables like pending_uploads that can contain absolute local file paths). Consider gating this API behind the same auth used for run mutations, and/or restricting accessible tables/adding server-side limits to mitigate data-leak and DoS risk.
Partially disagree. All the other read endpoints on this server (get_logs, get_metrics_for_run, get_runs_for_project, etc.) are exposed under the same unauthenticated policy — query_project doesn't newly expand the read surface, it just gives a generic SQL handle to the same data. Gating only this one behind the write-token would be inconsistent and would break trackio query against public Spaces, which is the feature's point.
The DoS angle is a fair concern, and that's now addressed with the server-side max_rows cap. pending_uploads is worth flagging separately — happy to scope the authorizer to user-facing tables in a follow-up if we decide the local-path leak is a real risk in practice.
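Scoping the authorizer to user-facing tables, as floated above, could look like this (a sketch under the assumption that `pending_uploads` is the internal table to hide; not the current implementation):

```python
import sqlite3

BLOCKED_TABLES = {"pending_uploads"}  # internal table(s) to hide from ad-hoc queries

def scoped_authorizer(action, table, column, db_name, trigger):
    # Reads against blocked tables are rejected outright; other read-style
    # actions are allowed, and all writes remain denied.
    if action == sqlite3.SQLITE_READ and table in BLOCKED_TABLES:
        return sqlite3.SQLITE_DENY
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION):
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (step INTEGER)")
conn.execute("CREATE TABLE pending_uploads (local_path TEXT)")
conn.set_authorizer(scoped_authorizer)

conn.execute("SELECT step FROM metrics")  # allowed
try:
    conn.execute("SELECT local_path FROM pending_uploads")
    raise AssertionError("read of blocked table should fail")
except sqlite3.DatabaseError:
    pass
```

Returning `sqlite3.SQLITE_IGNORE` instead of `SQLITE_DENY` for the blocked columns would silently substitute NULLs rather than erroring, which is another option if hard failures are too noisy.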
```md
---
"trackio": minor
---

feat: Add docs on SQL & Parquet schema / format, as well as a new CLI command: `trackio query --sql SQL_QUERY`
```
The changeset message says the new command is trackio query --sql SQL_QUERY, but the CLI implemented in this PR requires the project subcommand (trackio query project --project ... --sql ...). Please update the changeset text to match the actual CLI interface so release notes aren’t misleading.
Fixed — updated the changeset to trackio query project --project PROJECT --sql SQL_QUERY.
- query_project: enforce max_rows cap via fetchmany(max_rows+1)
- normalize bytes/bytearray/memoryview to hex for JSON safety
- add unit tests for allow/deny/limit/bytes/missing-project
- update changeset to match actual CLI (trackio query project ...)
This is a new CLI addition and doesn't really touch existing code, so I'm going to merge it in to unblock other PRs, such as #505.
Adds a new `trackio query project` command so users and agents can run read-only SQL against Trackio project data locally or via `--space`. Also documents the SQLite/Parquet storage schema and direct-query workflow across the docs, README, and Trackio skill references for easier data inspection and automation.