
Commit d82a7f7

[CLI] [API] Add HfApi.copy_files method to copy files remotely and update 'hf buckets cp' (#3874)
* Add HfApi copy_files and remote hf buckets cp support
* Adjust copy_files return type and cp output message
* remove useless
* docs
* do not catch
* comment
* much better
* Server-side copies
* check remote path
* simpler
* docs
* type
* no dummy check
* fix imports and types
* add todo
* simplified
* useless
* type
* all good
* review tests
* mypy happy
* aze
* Update src/huggingface_hub/hf_api.py

Co-authored-by: célina <hanouticelina@gmail.com>

* rename to _build_copy_op
* Update src/huggingface_hub/hf_api.py

Co-authored-by: célina <hanouticelina@gmail.com>

* type ignore
* type
* parallel download of small files
* fix

---------

Co-authored-by: célina <hanouticelina@gmail.com>
1 parent 3a8ee52 commit d82a7f7

12 files changed

Lines changed: 695 additions & 56 deletions


docs/source/en/guides/buckets.md

Lines changed: 52 additions & 1 deletion
@@ -348,6 +348,21 @@ Use [`batch_bucket_files`] to upload files to a bucket. You can upload from loca
 ... )
 ```
 
+You can also copy xet files from another bucket or repository using the `copy` parameter. This is a server-side operation — no data is downloaded or re-uploaded:
+
+```python
+# Copy files by xet hash (source_repo_type, source_repo_id, xet_hash, destination)
+>>> batch_bucket_files(
+...     "username/my-bucket",
+...     copy=[
+...         ("bucket", "username/source-bucket", "<xethash_1>", "models/model.safetensors"),
+...         ("model", "username/my-model", "<xethash_2>", "models/config.safetensors"),
+...     ],
+... )
+```
+
+Xet hashes can be retrieved using `list_repo_tree`.
+
 You can also delete files while uploading others.
 
 ```python
@@ -360,7 +375,7 @@ You can also delete files while uploading others.
 ```
 
 > [!WARNING]
-> Calls to [`batch_bucket_files`] are non-transactional. If an error occurs during the process, some files may have been uploaded or deleted while others haven't.
+> Calls to [`batch_bucket_files`] are non-transactional. If an error occurs during the process, some files may have been uploaded, copied, or deleted while others haven't.
 
 ### Upload a single file with the CLI
 
@@ -470,6 +485,42 @@ Use `hf buckets sync` to download all files from a bucket to a local directory:
 
 See the [Sync directories](#sync-directories) section below for the full set of sync options.
 
+## Copy files to Bucket
+
+Use [`copy_files`] to copy files already hosted on the Hub to a Bucket:
+
+```py
+>>> from huggingface_hub import copy_files
+
+# Bucket to bucket (same or different bucket)
+>>> copy_files(
+...     "hf://buckets/username/source-bucket/checkpoints/model.safetensors",
+...     "hf://buckets/username/destination-bucket/archive/model.safetensors",
+... )
+
+# Repo to bucket
+>>> copy_files(
+...     "hf://datasets/username/my-dataset/processed/",
+...     "hf://buckets/username/my-bucket/datasets/processed/",
+... )
+```
+
+The same is available from the CLI:
+
+```bash
+# Bucket to bucket
+>>> hf buckets cp hf://buckets/username/source-bucket/logs/ hf://buckets/username/destination-bucket/logs/
+
+# Repo to bucket
+>>> hf buckets cp hf://username/my-model/config.json hf://buckets/username/my-bucket/models/config.json
+```
+
+Notes:
+
+- Bucket-to-repo copy is not yet supported.
+- Files tracked with Xet (in buckets or repos) are copied server-side by hash — no data is downloaded or re-uploaded.
+- Small text files not tracked with Xet on repo sources are downloaded and re-uploaded to the destination bucket.
+
 ## Sync directories
 
 The `hf buckets sync` command (and its API equivalent [`sync_bucket`]) is the most powerful way to transfer files between a local directory and a bucket. It compares source and destination, and only transfers files that have changed.

docs/source/en/guides/cli.md

Lines changed: 15 additions & 1 deletion
@@ -673,7 +673,7 @@ To filter by prefix, append the prefix to the bucket path:
 
 ### Copy single files
 
-Use `hf buckets cp` to copy individual files to and from a bucket. Bucket paths use the `hf://buckets/` prefix.
+Use `hf buckets cp` to copy individual files to and from a bucket, or to copy any file hosted on the Hub to a Bucket.
 
 To upload a file:
 
@@ -703,6 +703,20 @@ You can also stream to stdout or from stdin using `-`:
 >>> echo "hello" | hf buckets cp - hf://buckets/username/my-bucket/hello.txt
 ```
 
+To copy from a repo or a bucket on the Hub:
+
+```bash
+# Bucket to bucket
+>>> hf buckets cp hf://buckets/username/source-bucket/logs/ hf://buckets/username/archive-bucket/logs/
+
+# Repo to bucket
+>>> hf buckets cp hf://datasets/username/my-dataset/data/train/ hf://buckets/username/my-bucket/datasets/train/
+```
+
+Notes:
+
+- Bucket-to-repo copy is not supported.
+
 ### Sync directories
 
 Use `hf buckets sync` to synchronize directories between your local machine and a bucket. It compares source and destination and transfers only changed files.

docs/source/en/package_reference/cli.md

Lines changed: 6 additions & 4 deletions
@@ -208,7 +208,7 @@ $ hf buckets [OPTIONS] COMMAND [ARGS]...
 
 **Commands**:
 
-* `cp`: Copy a single file to or from a bucket.
+* `cp`: Copy files to or from buckets.
 * `create`: Create a new bucket.
 * `delete`: Delete a bucket.
 * `info`: Get info about a bucket.
@@ -219,7 +219,7 @@ $ hf buckets [OPTIONS] COMMAND [ARGS]...
 
 ### `hf buckets cp`
 
-Copy a single file to or from a bucket.
+Copy files to or from buckets.
 
 **Usage**:
 
@@ -229,8 +229,8 @@ $ hf buckets cp [OPTIONS] SRC [DST]
 
 **Arguments**:
 
-* `SRC`: Source: local file, hf://buckets/... path, or - for stdin [required]
-* `[DST]`: Destination: local path, hf://buckets/... path, or - for stdout
+* `SRC`: Source: local file, HF handle (hf://...), or - for stdin [required]
+* `[DST]`: Destination: local path, HF handle (hf://...), or - for stdout
 
 **Options**:
 
@@ -247,6 +247,8 @@ Examples
 $ hf buckets cp my-config.json hf://buckets/user/my-bucket/logs/
 $ hf buckets cp my-config.json hf://buckets/user/my-bucket/remote-config.json
 $ hf buckets cp - hf://buckets/user/my-bucket/config.json
+$ hf buckets cp hf://buckets/user/my-bucket/logs/ hf://buckets/user/archive-bucket/logs/
+$ hf buckets cp hf://datasets/user/my-dataset/processed/ hf://buckets/user/my-bucket/dataset/processed/
 
 Learn more
 Use `hf <command> --help` for more information about a command.

src/huggingface_hub/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -203,6 +203,7 @@
         "cancel_job",
         "change_discussion_status",
         "comment_discussion",
+        "copy_files",
         "create_branch",
         "create_bucket",
         "create_collection",
@@ -903,6 +904,7 @@
     "check_cli_update",
     "close_session",
     "comment_discussion",
+    "copy_files",
     "create_branch",
     "create_bucket",
     "create_collection",
@@ -1327,6 +1329,7 @@ def __dir__():
         cancel_job,  # noqa: F401
         change_discussion_status,  # noqa: F401
         comment_discussion,  # noqa: F401
+        copy_files,  # noqa: F401
         create_branch,  # noqa: F401
         create_bucket,  # noqa: F401
         create_collection,  # noqa: F401

src/huggingface_hub/_buckets.py

Lines changed: 15 additions & 0 deletions
@@ -119,6 +119,21 @@ def __post_init__(self) -> None:
         )
 
 
+@dataclass
+class _BucketCopyFile:
+    destination: str
+    xet_hash: str
+    source_repo_type: str  # "model", "dataset", "space", "bucket"
+    source_repo_id: str
+    size: int | None = field(default=None)
+    mtime: int = field(init=False)
+    content_type: str | None = field(init=False)
+
+    def __post_init__(self) -> None:
+        self.content_type = mimetypes.guess_type(self.destination)[0]
+        self.mtime = int(time.time() * 1000)
+
+
 @dataclass
 class _BucketDeleteFile:
     path: str
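
Reviewer note: the new `_BucketCopyFile` dataclass declares `mtime` and `content_type` with `field(init=False)` and fills them in `__post_init__`. The snippet below is a standalone sketch of that pattern, not the library code. The class name `CopyEntry` and the sample values are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import mimetypes
import time
from dataclasses import dataclass, field


@dataclass
class CopyEntry:
    """Hypothetical stand-in that mirrors the fields of `_BucketCopyFile`."""

    destination: str
    xet_hash: str
    source_repo_type: str  # "model", "dataset", "space", "bucket"
    source_repo_id: str
    size: int | None = field(default=None)
    # init=False: these are not constructor arguments; __post_init__ sets them.
    mtime: int = field(init=False)
    content_type: str | None = field(init=False)

    def __post_init__(self) -> None:
        # Guess the MIME type from the destination extension (None if unknown).
        self.content_type = mimetypes.guess_type(self.destination)[0]
        # Timestamp taken at construction, in milliseconds since the epoch.
        self.mtime = int(time.time() * 1000)


entry = CopyEntry(
    destination="models/config.json",
    xet_hash="<xethash>",  # placeholder, not a real hash
    source_repo_type="model",
    source_repo_id="username/my-model",
)
print(entry.content_type, entry.mtime)
```

Under this pattern the content type is derived from the destination path rather than the source file, and `mtime` reflects when the copy entry was built, not when the source file was last modified.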

src/huggingface_hub/cli/buckets.py

Lines changed: 26 additions & 6 deletions
@@ -57,6 +57,10 @@
 buckets_cli = typer_factory(help="Commands to interact with buckets.")
 
 
+def _is_hf_handle(path: str) -> bool:
+    return path.startswith("hf://")
+
+
 def _parse_bucket_argument(argument: str) -> tuple[str, str]:
     """Parse a bucket argument accepting both 'namespace/name(/prefix)' and 'hf://buckets/namespace/name(/prefix)'.
 
@@ -928,28 +932,44 @@ def sync(
         "hf buckets cp my-config.json hf://buckets/user/my-bucket/logs/",
         "hf buckets cp my-config.json hf://buckets/user/my-bucket/remote-config.json",
         "hf buckets cp - hf://buckets/user/my-bucket/config.json",
+        "hf buckets cp hf://buckets/user/my-bucket/logs/ hf://buckets/user/archive-bucket/logs/",
+        "hf buckets cp hf://datasets/user/my-dataset/processed/ hf://buckets/user/my-bucket/dataset/processed/",
     ],
 )
 def cp(
-    src: Annotated[str, typer.Argument(help="Source: local file, hf://buckets/... path, or - for stdin")],
+    src: Annotated[str, typer.Argument(help="Source: local file, HF handle (hf://...), or - for stdin")],
     dst: Annotated[
-        str | None, typer.Argument(help="Destination: local path, hf://buckets/... path, or - for stdout")
+        str | None, typer.Argument(help="Destination: local path, HF handle (hf://...), or - for stdout")
     ] = None,
     quiet: QuietOpt = False,
     token: TokenOpt = None,
 ) -> None:
-    """Copy a single file to or from a bucket."""
+    """Copy files to or from buckets."""
     api = get_hf_api(token=token)
 
+    src_is_hf = _is_hf_handle(src)
+    dst_is_hf = dst is not None and _is_hf_handle(dst)
     src_is_bucket = _is_bucket_path(src)
     dst_is_bucket = dst is not None and _is_bucket_path(dst)
     src_is_stdin = src == "-"
     dst_is_stdout = dst == "-"
 
-    # --- Validation ---
-    if src_is_bucket and dst_is_bucket:
-        raise typer.BadParameter("Remote-to-remote copy not supported.")
+    # Remote to remote copy
+    if src_is_hf and dst_is_hf:
+        if quiet:
+            disable_progress_bars()
+        try:
+            api.copy_files(src, dst)  # type: ignore
+        finally:
+            if quiet:
+                enable_progress_bars()
+
+        if not quiet:
+            print(f"Copied: {src} -> {dst}")
+        return
 
+    # Local to remote copy
+    # --- Validation ---
     if not src_is_bucket and not dst_is_bucket and not src_is_stdin:
         if dst is None:
             raise typer.BadParameter("Missing destination. Provide a bucket path as DST.")
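
Reviewer note: with this change, `cp` short-circuits to `api.copy_files` whenever both arguments start with `hf://`, and every other combination falls through to the pre-existing local/stream handling. The sketch below isolates that routing decision so it can be checked without the CLI. `classify_copy` and its two return labels are hypothetical names used only for illustration; only the `startswith("hf://")` check comes from the diff.

```python
from __future__ import annotations


def is_hf_handle(path: str) -> bool:
    # Same prefix check as `_is_hf_handle` in the diff.
    return path.startswith("hf://")


def classify_copy(src: str, dst: str | None) -> str:
    """Return which branch of `cp` a (src, dst) pair would reach (illustrative labels)."""
    if is_hf_handle(src) and dst is not None and is_hf_handle(dst):
        # Both sides are hf:// handles: the command delegates to copy_files and returns.
        return "remote-to-remote"
    # Local paths, "-" for stdin/stdout, or a missing DST: existing validation applies.
    return "local-or-stream"


print(classify_copy("hf://buckets/user/my-bucket/logs/", "hf://buckets/user/archive-bucket/logs/"))
print(classify_copy("my-config.json", "hf://buckets/user/my-bucket/logs/"))
```

Note that the check is on the `hf://` prefix only, so a repo destination also takes the remote branch. Whether such a pair is accepted is decided inside `copy_files`, which is not shown in this excerpt.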
