Merged
Changes from 2 commits
5 changes: 5 additions & 0 deletions .changeset/large-ravens-decide.md
@@ -0,0 +1,5 @@
+---
+"trackio": minor
+---
+
+feat:Use server-side bucket copy when freezing Spaces
2 changes: 1 addition & 1 deletion docs/source/cli_commands.md
@@ -76,6 +76,7 @@ trackio freeze --space-id "username/my-space" --project "my-project" --new-space
| `--private` | Make the new static Space private |

> **Note:** The source must be a Gradio Space with a bucket mounted at `/data`. If the destination Space already exists and is not a Trackio static Space, `freeze` will refuse to overwrite it.
+> The frozen Space is a snapshot. Later metrics synced to the original Gradio Space do not appear in the frozen static Space unless you run `freeze` again.

## List Commands

@@ -416,4 +417,3 @@ trackio list runs --project "my-project" --json | jq '.runs[] | select(startswit
# Export to file
trackio get run --project "my-project" --run "my-run" --json > run_summary.json
```

2 changes: 2 additions & 0 deletions docs/source/deploy_embed.md
@@ -60,6 +60,8 @@ trackio freeze --space-id "username/my-space" --project "my-project"

This creates a new static Space (by default named `{space_id}_static`) containing a snapshot of the project's data from the source Space's bucket. The original Space is not modified.

+Unlike `trackio.sync(..., sdk="static")`, `freeze()` is a one-time snapshot. If new metrics are later uploaded to the original Gradio Space, the frozen static Space will not update automatically.

You can customize the destination:

```py
2 changes: 1 addition & 1 deletion docs/source/quickstart.md
@@ -128,4 +128,4 @@ trackio.sync(project="my-project", space_id="username/space_id")
</hfoption>
</hfoptions>

-This will create the Space if it does not already exist, and upload all runs and associated data to the Space. You can also sync to a lightweight static Space with `sdk="static"`, or create a read-only snapshot of a live Space with [`freeze`](deploy_embed.md#freezing-a-space-snapshot). See the [Deploy and Embed Dashboards](deploy_embed.md) page for more details.
+This will create the Space if it does not already exist, and upload all runs and associated data to the Space. You can also sync to a lightweight static Space with `sdk="static"`, or create a read-only snapshot of a live Space with [`freeze`](deploy_embed.md#freezing-a-space-snapshot). A frozen Space is a point-in-time snapshot and will not pick up later metrics from the original Gradio Space unless you freeze again. See the [Deploy and Embed Dashboards](deploy_embed.md) page for more details.
2 changes: 2 additions & 0 deletions examples/convert-gradio-to-static.py
@@ -5,6 +5,7 @@
1. Log training metrics locally
2. Sync the project to a live Gradio Space
3. Freeze the Gradio Space into a read-only static Space (no server needed)
+4. Keep the frozen Space as a point-in-time snapshot that will not auto-update

Usage:
python examples/convert-gradio-to-static.py
@@ -46,3 +47,4 @@
space_id=PROJECT, project=PROJECT, new_space_id=f"{PROJECT}_static"
)
print(f"Static snapshot: https://huggingface.co/spaces/{static_space_id}")
+print("Future metrics synced to the Gradio Space will not appear here unless you freeze again.")
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -14,7 +14,7 @@ readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"pandas<3.0.0",
-"huggingface-hub>=1.9.2,<2",
+"huggingface-hub>=1.10.0,<2",
"gradio[oauth]>=6.10.0,<7.0.0",
"numpy<3.0.0",
"pillow<12.0.0",
11 changes: 11 additions & 0 deletions tests/e2e-spaces/test_sync_and_freeze.py
@@ -144,6 +144,17 @@ def test_sync_gradio_then_freeze_to_static(test_space_id, temp_dir):
         assert "loss" in df.columns
         assert "acc" in df.columns
         assert sorted(df["loss"].tolist()) == [0.1, 0.3, 0.5]
+
+        trackio.init(project=project_name, name=run_name)
+        trackio.log({"loss": 0.05, "acc": 0.95})
+        trackio.log({"loss": 0.02, "acc": 0.97})
+        trackio.finish()
+
+        deploy.sync(project=project_name, space_id=test_space_id)
+
+        frozen_df_after_source_update = _download_parquet_from_bucket(frozen_bucket_id)
+        assert len(frozen_df_after_source_update) == 3
+        assert sorted(frozen_df_after_source_update["loss"].tolist()) == [0.1, 0.3, 0.5]
     finally:
         _cleanup_space(frozen_space_id)
         _cleanup_bucket(frozen_bucket_id)
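The snapshot behaviour that this end-to-end test pins down can be modelled without any Hub access. The sketch below is a toy in-memory stand-in, not Trackio's implementation: `ToySpace`, `sync_to`, and this local `freeze` are hypothetical names invented for illustration, and a deep copy stands in for the bucket copy.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToySpace:
    """Hypothetical stand-in for a Space: just a list of logged metric rows."""

    rows: list = field(default_factory=list)


def sync_to(space: ToySpace, new_rows: list) -> None:
    """Append rows to a live Space, as a later sync of new metrics would."""
    space.rows.extend(new_rows)


def freeze(source: ToySpace) -> ToySpace:
    """Return an independent point-in-time copy of the source Space."""
    return ToySpace(rows=copy.deepcopy(source.rows))


live = ToySpace()
sync_to(live, [{"loss": 0.5}, {"loss": 0.3}, {"loss": 0.1}])
frozen = freeze(live)

# Metrics logged after the freeze reach the live Space only.
sync_to(live, [{"loss": 0.05}, {"loss": 0.02}])

frozen_losses = sorted(row["loss"] for row in frozen.rows)
live_losses = sorted(row["loss"] for row in live.rows)
```

In this model, as in the assertions above, the frozen copy keeps its three original rows until `freeze` is called again on the updated source.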
32 changes: 31 additions & 1 deletion tests/unit/test_deploy.py
@@ -4,7 +4,10 @@
 from huggingface_hub import Volume

 from trackio import deploy
-from trackio.bucket_storage import _list_bucket_file_paths
+from trackio.bucket_storage import (
+    _list_bucket_file_paths,
+    export_from_bucket_for_static,
+)


 @patch("trackio.deploy.huggingface_hub.HfApi")
@@ -45,3 +48,30 @@ def test_list_bucket_file_paths_uses_list_bucket_tree(mock_list_bucket_tree):
         prefix="trackio/media/proj/",
         recursive=True,
     )
+
+
+@patch("trackio.bucket_storage.huggingface_hub.download_bucket_files")
+@patch("trackio.bucket_storage.copy_files")
+@patch("trackio.bucket_storage._export_and_upload_static")
+@patch("trackio.bucket_storage._list_bucket_file_paths")
+@patch("trackio.bucket_storage._download_db_from_bucket")
+def test_export_from_bucket_for_static_copies_media_server_side(
+    mock_download_db,
+    mock_list_bucket_file_paths,
+    mock_export_and_upload_static,
+    mock_copy_files,
+    mock_download_bucket_files,
+):
+    mock_download_db.return_value = True
+    mock_list_bucket_file_paths.return_value = ["trackio/media/proj/image.png"]
+
+    export_from_bucket_for_static(
+        "abidlabs/source-bucket", "abidlabs/dest-bucket", "proj"
+    )
+
+    mock_export_and_upload_static.assert_called_once()
+    mock_copy_files.assert_called_once_with(
+        "hf://buckets/abidlabs/source-bucket/trackio/media/proj/",
+        "hf://buckets/abidlabs/dest-bucket/media/",
+    )
+    mock_download_bucket_files.assert_not_called()
41 changes: 19 additions & 22 deletions trackio/bucket_storage.py
@@ -3,7 +3,7 @@
 from pathlib import Path

 import huggingface_hub
-from huggingface_hub import sync_bucket
+from huggingface_hub import copy_files, sync_bucket

 from trackio.sqlite_storage import SQLiteStorage
 from trackio.utils import MEDIA_DIR, TRACKIO_DIR
@@ -108,6 +108,22 @@ def _export_and_upload_static(
huggingface_hub.batch_bucket_files(dest_bucket_id, add=files_to_add)


+def _copy_project_media_between_buckets(
+    source_bucket_id: str, dest_bucket_id: str, project: str
+) -> None:
+    source_media_prefix = f"trackio/media/{project}/"
+    media_to_copy = _list_bucket_file_paths(
+        source_bucket_id, prefix=source_media_prefix
+    )
+    if not media_to_copy:
+        return
+
+    copy_files(
+        f"hf://buckets/{source_bucket_id}/{source_media_prefix}",
+        f"hf://buckets/{dest_bucket_id}/media/",
+    )


 def upload_project_to_bucket_for_static(project: str, bucket_id: str) -> None:
     if not _local_db_has_data(project):
         _download_db_from_bucket(project, bucket_id)
@@ -131,24 +147,5 @@ def export_from_bucket_for_static(
f"from bucket '{source_bucket_id}'."
)

-        media_dest = work_path / "media"
-        source_media_prefix = f"trackio/media/{project}/"
-        media_to_download = _list_bucket_file_paths(
-            source_bucket_id, prefix=source_media_prefix
-        )
-        if media_to_download:
-            media_dest.mkdir(parents=True, exist_ok=True)
-            dl_pairs = []
-            for remote_path in media_to_download:
-                rel = remote_path[len(source_media_prefix) :]
-                local_file = media_dest / rel
-                local_file.parent.mkdir(parents=True, exist_ok=True)
-                dl_pairs.append((remote_path, str(local_file)))
-            huggingface_hub.download_bucket_files(source_bucket_id, files=dl_pairs)
-
-        _export_and_upload_static(
-            project,
-            dest_bucket_id,
-            db_path,
-            media_dest if media_dest.exists() else None,
-        )
+        _export_and_upload_static(project, dest_bucket_id, db_path)
+        _copy_project_media_between_buckets(source_bucket_id, dest_bucket_id, project)
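The list-then-copy flow added in this file can be exercised offline with injected callables. The following is a minimal sketch under stated assumptions, not the code in the diff: `media_copy_uris`, `copy_media_if_any`, and the `list_paths` and `copy` parameters are hypothetical names, and the two callables stand in for `_list_bucket_file_paths` and `huggingface_hub.copy_files` so that nothing touches the network.

```python
from typing import Callable, Sequence


def media_copy_uris(
    source_bucket_id: str, dest_bucket_id: str, project: str
) -> tuple[str, str]:
    """Build the (source, destination) URIs for one project's media folder."""
    source_media_prefix = f"trackio/media/{project}/"
    return (
        f"hf://buckets/{source_bucket_id}/{source_media_prefix}",
        f"hf://buckets/{dest_bucket_id}/media/",
    )


def copy_media_if_any(
    source_bucket_id: str,
    dest_bucket_id: str,
    project: str,
    list_paths: Callable[[str, str], Sequence[str]],  # hypothetical lister
    copy: Callable[[str, str], None],  # hypothetical copier
) -> bool:
    """Copy the media folder only when the source lists at least one file.

    Returns True if a copy was requested, False if it was skipped.
    """
    prefix = f"trackio/media/{project}/"
    if not list_paths(source_bucket_id, prefix):
        return False
    source_uri, dest_uri = media_copy_uris(source_bucket_id, dest_bucket_id, project)
    copy(source_uri, dest_uri)
    return True


# Record copy requests instead of performing them.
recorded = []
did_copy = copy_media_if_any(
    "abidlabs/source-bucket",
    "abidlabs/dest-bucket",
    "proj",
    list_paths=lambda bucket, prefix: ["trackio/media/proj/image.png"],
    copy=lambda src, dst: recorded.append((src, dst)),
)
```

Passing the lister and copier as arguments keeps the sketch testable in the same spirit as the mocked unit test in this pull request, where the expected URIs are the same pair of `hf://buckets/...` strings.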
4 changes: 2 additions & 2 deletions trackio/cli.py
@@ -228,7 +228,7 @@ def main():

     freeze_parser = subparsers.add_parser(
         "freeze",
-        help="Create a new static Space snapshot from a project's data.",
+        help="Create a one-time static Space snapshot from a project's data.",
     )
     freeze_parser.add_argument(
         "--space-id",
@@ -238,7 +238,7 @@ def main():
     freeze_parser.add_argument(
         "--project",
         required=True,
-        help="The name of the project to freeze.",
+        help="The name of the project to freeze into a static snapshot.",
     )
     freeze_parser.add_argument(
         "--new-space-id",
13 changes: 7 additions & 6 deletions trackio/deploy.py
@@ -743,10 +743,10 @@ def sync(
Syncs a local Trackio project's database to a Hugging Face Space.
If the Space does not exist, it will be created. Local data is never deleted.

-**Freezing:** Passing ``sdk="static"`` *freezes* the Space: it converts a live Gradio
-Space into a static Space backed by an HF Bucket (read-only dashboard, no Gradio
-server). You cannot log new metrics to a frozen Space; use a different ``space_id``
-or a new Gradio Space for further training runs.
+**Freezing:** Passing ``sdk="static"`` deploys a static Space backed by an HF Bucket
+(read-only dashboard, no Gradio server). You can sync the same project again later to
+refresh that static Space. If you want a one-time snapshot of an existing Gradio Space,
+use ``freeze()`` instead.

Args:
project (`str`): The name of the project to upload.
@@ -863,8 +863,9 @@ def freeze(
"""
Creates a new static Hugging Face Space containing a read-only snapshot of
the data for the specified project from the source Gradio Space. The data is
-read from the bucket attached to the source Space, so it always reflects the
-remote state. The original Space is not modified.
+read from the bucket attached to the source Space at freeze time. The original
+Space is not modified, and the new static Space does not automatically reflect
+metrics uploaded to the original Gradio Space after the freeze completes.

Args:
space_id (`str`):