Commit 246fce0

abidlabs and claude authored
Deprecate dataset backend (#471)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent bea8c9d commit 246fce0

11 files changed

Lines changed: 30 additions & 139 deletions

.changeset/open-meals-smoke.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"trackio": minor
+---
+
+feat:Deprecate dataset backend in favor of buckets
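
For callers that still pass `dataset_id`, one way to handle the deprecation is a thin shim that drops the argument and warns. The following is a minimal sketch, not code from this commit: `strip_deprecated_dataset_id` is a hypothetical helper, and the project, Space, and Dataset names are placeholders. It relies only on what the diff states, namely that `dataset_id` is deprecated in favor of `bucket_id`.

```python
import warnings


def strip_deprecated_dataset_id(kwargs: dict) -> dict:
    """Return a copy of kwargs without the deprecated `dataset_id` key.

    Hypothetical helper for illustration; it is not part of trackio.
    """
    cleaned = dict(kwargs)
    # Warn only when a real value was supplied, not for an explicit None.
    if cleaned.pop("dataset_id", None) is not None:
        warnings.warn(
            "`dataset_id` is deprecated; use `bucket_id` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return cleaned


# Old-style arguments that still name a Dataset (placeholder values).
old_kwargs = {
    "project": "my-project",
    "space_id": "username/my-space",
    "dataset_id": "username/my-dataset",
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_kwargs = strip_deprecated_dataset_id(old_kwargs)

print(new_kwargs)
print([str(w.message) for w in caught])
```

The shim deliberately does not map the Dataset ID onto `bucket_id`, since a Dataset name is not a Bucket name.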

README.md

Lines changed: 0 additions & 1 deletion
@@ -237,7 +237,6 @@ You can query alerts via the CLI (`trackio get alerts --project "my-project" --j
 To get started and see basic examples of usage, see these files:

 - [Basic example of logging metrics locally](https://github.com/gradio-app/trackio/blob/main/examples/fake-training.py)
-- [Persisting metrics in a Hugging Face Dataset](https://github.com/gradio-app/trackio/blob/main/examples/persist-dataset.py)
 - [Deploying the dashboard to Spaces](https://github.com/gradio-app/trackio/blob/main/examples/deploy-on-spaces.py)

 ## Throughput & Rate Limits

docs/source/environment_variables.md

Lines changed: 1 addition & 10 deletions
@@ -106,18 +106,9 @@ export TRACKIO_WEBHOOK_MIN_LEVEL="warn"

 With `warn`, only `WARN` and `ERROR` alerts are sent to webhook URLs.

-### `TRACKIO_DATASET_ID`
-
-Sets the Hugging Face Dataset ID where logs will be stored when running on Hugging Face Spaces. If not provided, the dataset name will be set automatically when deploying to Spaces.
-
-
-```bash
-export TRACKIO_DATASET_ID="username/dataset_name"
-```
-
 ### `HF_TOKEN`

-Your Hugging Face authentication token. Required for creating Spaces and Datasets on Hugging Face. Set this locally when deploying to Spaces from your machine. Must have `write` permissions for the namespace that you are deploying the Trackio dashboard.
+Your Hugging Face authentication token. Required for creating Spaces and Buckets on Hugging Face. Set this locally when deploying to Spaces from your machine. Must have `write` permissions for the namespace that you are deploying the Trackio dashboard.

 ```bash
 export HF_TOKEN="hf_xxxxxxxxxxxxx"

examples/persist-dataset.py

Lines changed: 0 additions & 88 deletions
This file was deleted.

examples/sync-static-space.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@

 This will:
 1. Log a few runs of fake training metrics locally
-2. Call trackio.sync() which exports data as Parquet to an HF Dataset
+2. Call trackio.sync() which uploads the local project to an HF Bucket
 and deploys a static dashboard Space (no running server needed)

 Set HF_TOKEN or run `huggingface-cli login` first.
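
The second step described in this docstring might look roughly like the sketch below. It is an illustration, not the contents of `examples/sync-static-space.py`: `publish_static_dashboard` is a hypothetical wrapper, and the `project` and `space_id` keyword names are assumptions about the signature of `trackio.sync()`. Only the `sdk="static"` option and the returned Space ID are documented in this commit.

```python
def publish_static_dashboard(project: str, space_id: str) -> str:
    """Upload a local project and deploy a static dashboard Space.

    Assumes trackio is installed and that HF_TOKEN is set or
    `huggingface-cli login` has been run.
    """
    import trackio  # imported lazily so this module loads without trackio

    # sdk="static" is documented in this commit; the `project` and
    # `space_id` keyword names are assumed, not confirmed by the diff.
    return trackio.sync(project=project, space_id=space_id, sdk="static")
```

A call such as `publish_static_dashboard("my-project", "username/my-space")` (placeholder names) would then return the Space ID of the synced project.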

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
 "pandas<3.0.0",
-"huggingface-hub>=1.9.0,<2",
+"huggingface-hub>=1.9.2,<2",
 "gradio[oauth]>=6.10.0,<7.0.0",
 "numpy<3.0.0",
 "pillow<12.0.0",

trackio/__init__.py

Lines changed: 3 additions & 7 deletions
@@ -139,15 +139,11 @@ def init(
 space_storage ([`~huggingface_hub.SpaceStorage`], *optional*):
 Choice of persistent storage tier.
 dataset_id (`str`, *optional*):
-If provided, uses the legacy Hugging Face Dataset backend for metric
-persistence (metrics are exported to Parquet and committed every 5 minutes).
-Specify a Dataset with name like `"username/datasetname"` or
-`"orgname/datasetname"`, or `"datasetname"` (uses currently-logged-in
-Hugging Face user's namespace). Cannot be used together with `bucket_id`.
+Deprecated. Use `bucket_id` instead.
 bucket_id (`str`, *optional*):
 The ID of the Hugging Face Bucket to use for metric persistence. By default,
-when a `space_id` is provided and neither `dataset_id` nor `bucket_id` is
-explicitly set, a bucket is auto-generated from the space_id. Buckets provide
+when a `space_id` is provided and `bucket_id` is not explicitly set, a
+bucket is auto-generated from the space_id. Buckets provide
 S3-like storage without git overhead - the SQLite database is stored directly
 via `hf-mount` in the Space. Specify a Bucket with name like
 `"username/bucketname"` or just `"bucketname"`.
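
The two accepted forms of `bucket_id` can be illustrated with a small normalization helper. This is a sketch under a stated assumption: the docstring does not say which namespace a bare `"bucketname"` resolves to, so the hypothetical `qualify_bucket_id` below takes the namespace as an explicit argument. It is not trackio's own resolution logic.

```python
def qualify_bucket_id(bucket_id: str, default_namespace: str) -> str:
    """Expand a bare bucket name to `namespace/name`.

    Hypothetical helper: assumes a bare name belongs to `default_namespace`.
    """
    parts = bucket_id.split("/")
    if len(parts) == 1 and parts[0]:
        # Bare form, e.g. "bucketname".
        return f"{default_namespace}/{parts[0]}"
    if len(parts) == 2 and all(parts):
        # Already qualified, e.g. "username/bucketname".
        return bucket_id
    raise ValueError(f"Invalid bucket id: {bucket_id!r}")


print(qualify_bucket_id("bucketname", "username"))
print(qualify_bucket_id("orgname/bucketname", "username"))
```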

trackio/cli.py

Lines changed: 1 addition & 1 deletion
@@ -223,7 +223,7 @@ def main():
 "--sdk",
 choices=["gradio", "static"],
 default="gradio",
-help="The type of Space to deploy. 'gradio' (default) deploys a live Gradio server. 'static' deploys a static Space that reads from an HF Dataset.",
+help="The type of Space to deploy. 'gradio' (default) deploys a live Gradio server. 'static' deploys a static Space that reads from an HF Bucket.",
 )

 list_parser = subparsers.add_parser(
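
To show how the `--sdk` option behaves in isolation, here is a self-contained `argparse` sketch. It reproduces only the option visible in this hunk; the `sync` subcommand name and the `build_parser` function are assumptions made for the example and are not taken from `trackio/cli.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="trackio")
    subparsers = parser.add_subparsers(dest="command")
    # "sync" is an assumed subcommand name; the diff shows only --sdk.
    sync_parser = subparsers.add_parser("sync")
    sync_parser.add_argument(
        "--sdk",
        choices=["gradio", "static"],
        default="gradio",
        help=(
            "The type of Space to deploy. 'gradio' (default) deploys a live "
            "Gradio server. 'static' deploys a static Space that reads from "
            "an HF Bucket."
        ),
    )
    return parser


args = build_parser().parse_args(["sync", "--sdk", "static"])
print(args.sdk)  # static
```

Because `choices` is set, any other value for `--sdk` makes `argparse` exit with a usage error before the command runs.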

trackio/deploy.py

Lines changed: 6 additions & 7 deletions
@@ -288,10 +288,10 @@ def create_space_if_not_exists(
 space_storage ([`~huggingface_hub.SpaceStorage`], *optional*):
 Choice of persistent storage tier for the Space.
 dataset_id (`str`, *optional*):
-The ID of the Dataset to add to the Space as a space variable.
+Deprecated. Use `bucket_id` instead.
 bucket_id (`str`, *optional*):
 Full Hub bucket id (`namespace/name`) to attach via the Hub volumes API (platform mount).
-Sets `TRACKIO_DIR` to the mount path; do not combine with `dataset_id`.
+Sets `TRACKIO_DIR` to the mount path.
 private (`bool`, *optional*):
 Whether to make the Space private. If `None` (default), the repo will be
 public unless the organization's default is private. This value is ignored
@@ -725,14 +725,13 @@ def sync(
 If `False`, all the steps will be run synchronously.
 sdk (`str`, *optional*, defaults to `"gradio"`):
 The type of Space to deploy. `"gradio"` deploys a Gradio Space with a live
-server. `"static"` deploys a static Space that reads from an HF Dataset
-or HF Bucket (no server needed).
+server. `"static"` deploys a static Space that reads from an HF Bucket
+(no server needed).
 dataset_id (`str`, *optional*):
-The ID of the HF Dataset to sync to. When provided, uses the legacy
-Dataset backend instead of Buckets.
+Deprecated. Use `bucket_id` instead.
 bucket_id (`str`, *optional*):
 The ID of the HF Bucket to sync to. By default, a bucket is auto-generated
-from the space_id. Set `dataset_id` to use the legacy Dataset backend instead.
+from the space_id.
 Returns:
 `str`: The Space ID of the synced project.
 """

trackio/imports.py

Lines changed: 2 additions & 20 deletions
@@ -42,16 +42,7 @@ def import_csv(
 Space does not exist, it will be created. If the Space already exists, the
 project will be logged to it.
 dataset_id (`str`, *optional*):
-If provided, a persistent Hugging Face Dataset will be created and the
-metrics will be synced to it every 5 minutes. Should be a complete Dataset
-name like `"username/datasetname"` or `"orgname/datasetname"`, or just
-`"datasetname"` in which case the Dataset will be created in the
-currently-logged-in Hugging Face user's namespace. If the Dataset does not
-exist, it will be created. If the Dataset already exists, the project will
-be appended to it. If not provided, the metrics will be logged to a local
-SQLite database, unless a `space_id` is provided, in which case a Dataset
-will be automatically created with the same name as the Space but with the
-`"_dataset"` suffix.
+Deprecated. Use `bucket_id` instead.
 private (`bool`, *optional*):
 Whether to make the Space private. If None (default), the repo will be
 public unless the organization's default is private. This value is ignored
@@ -182,16 +173,7 @@ def import_tf_events(
 Space does not exist, it will be created. If the Space already exists, the
 project will be logged to it.
 dataset_id (`str`, *optional*):
-If provided, a persistent Hugging Face Dataset will be created and the
-metrics will be synced to it every 5 minutes. Should be a complete Dataset
-name like `"username/datasetname"` or `"orgname/datasetname"`, or just
-`"datasetname"` in which case the Dataset will be created in the
-currently-logged-in Hugging Face user's namespace. If the Dataset does not
-exist, it will be created. If the Dataset already exists, the project will
-be appended to it. If not provided, the metrics will be logged to a local
-SQLite database, unless a `space_id` is provided, in which case a Dataset
-will be automatically created with the same name as the Space but with the
-`"_dataset"` suffix.
+Deprecated. Use `bucket_id` instead.
 private (`bool`, *optional*):
 Whether to make the Space private. If None (default), the repo will be
 public unless the organization's default is private. This value is ignored
