Commit ea1f4b7

Wauplin and claude authored:
Support volumes at repo creation and duplication (#4035)

* Support volumes at repo creation and duplication

  Add `space_volumes` parameter to `create_repo` and `duplicate_repo` in `HfApi`, and wire it up as `--volume`/`-v` in the CLI `repos create` and `repos duplicate` commands. Shared volume parsing logic is moved from `jobs.py` to `_cli_utils.py` to avoid duplication.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix tests
* update deprecation warnings
* irrelevant json key
* revert manually added bug
* define a space_args list of tuples
* some parse_volumes optim

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 993d645 · commit ea1f4b7

7 files changed

Lines changed: 238 additions & 166 deletions

docs/source/en/guides/manage-spaces.md

Lines changed: 29 additions & 5 deletions
````diff
@@ -201,16 +201,27 @@ Upgraded hardware will be automatically assigned to your Space once it's built.
 
 **6. Mount volumes in your Space**
 
-You can mount Hub resources (models, datasets, or storage buckets) as volumes in your Space's container. This gives your Space direct filesystem access to these resources without having to download them in your code.
+You can mount Hub resources (models, datasets, or storage buckets) as volumes in your Space's container. This gives your Space direct filesystem access to these resources without having to download them in your code. Volumes can be set directly when creating or duplicating a Space:
 
 ```py
 >>> from huggingface_hub import Volume
->>> api.set_space_volumes(
+>>> api.create_repo(
 ...     repo_id=repo_id,
-...     volumes=[
+...     repo_type="space",
+...     space_sdk="gradio",
+...     space_volumes=[
 ...         Volume(type="model", source="username/my-model", mount_path="/models", read_only=True),
-...         Volume(type="dataset", source="username/my-dataset", mount_path="/data", read_only=True),
-...         Volume(type="bucket", source="username/my-bucket", mount_path="/output"),
+...         Volume(type="bucket", source="username/my-bucket", mount_path="/data"),
+...     ],
+... )
+```
+```py
+>>> api.duplicate_repo(
+...     from_id=repo_id,
+...     repo_type="space",
+...     space_volumes=[
+...         Volume(type="model", source="username/my-model", mount_path="/models", read_only=True),
+...         Volume(type="bucket", source="username/my-bucket", mount_path="/data"),
 ...     ],
 ... )
 ```
@@ -223,6 +234,19 @@ You can check which volumes are currently mounted via the Space runtime:
 [Volume(type='model', source='username/my-model', mount_path='/models', read_only=True), ...]
 ```
 
+If you need to update volumes on an existing Space, use [`set_space_volumes`]. Note that this replaces all previously mounted volumes.
+
+```py
+>>> api.set_space_volumes(
+...     repo_id=repo_id,
+...     volumes=[
+...         Volume(type="model", source="username/my-model", mount_path="/models", read_only=True),
+...         Volume(type="dataset", source="username/my-dataset", mount_path="/data", read_only=True),
+...         Volume(type="bucket", source="username/my-bucket", mount_path="/output"),
+...     ],
+... )
+```
+
 To remove all volumes from your Space:
 
 ```py
````
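The docs change distinguishes volumes set at creation time (`create_repo` / `duplicate_repo`) from [`set_space_volumes`], which replaces the full list rather than appending to it. A toy in-memory sketch of that replacement semantics (the `FakeSpace` class is purely illustrative, not part of `huggingface_hub`):

```python
class FakeSpace:
    """Toy stand-in for a Space's volume configuration; not part of huggingface_hub."""

    def __init__(self, volumes=None):
        # Volumes can be set at creation time (create_repo / duplicate_repo).
        self.volumes = list(volumes or [])

    def set_space_volumes(self, volumes):
        # Mirrors the documented behavior: replaces ALL previously mounted volumes.
        self.volumes = list(volumes)


space = FakeSpace(volumes=["model:/models", "bucket:/data"])
space.set_space_volumes(["dataset:/data"])
print(space.volumes)  # the creation-time mounts are gone
```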

docs/source/en/package_reference/cli.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -2934,12 +2934,14 @@ $ hf repos create [OPTIONS] REPO_ID
 * `--secrets-file TEXT`: Read in a file of secret environment variables.
 * `-e, --env TEXT`: Set environment variables. E.g. --env ENV=value
 * `--env-file TEXT`: Read in a file of environment variables.
+* `-v, --volume TEXT`: Mount a volume. Format: hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. TYPE is one of: models, datasets, spaces, buckets. TYPE defaults to models if omitted. models, datasets and spaces are always mounted read-only. buckets are read+write by default.E.g. -v hf://gpt2:/data or -v hf://datasets/org/ds:/data or -v hf://buckets/org/b:/mnt:ro
 * `--help`: Show this message and exit.
 
 Examples
 $ hf repos create my-model
 $ hf repos create my-dataset --repo-type dataset --private
 $ hf repos create my-space --type space --space-sdk gradio --flavor t4-medium --secrets HF_TOKEN -e THEME=dark --protected
+$ hf repos create my-space --type space --space-sdk gradio -v hf://gpt2:/models -v hf://buckets/org/b:/data
 
 Learn more
 Use `hf <command> --help` for more information about a command.
@@ -3040,11 +3042,13 @@ $ hf repos duplicate [OPTIONS] FROM_ID [TO_ID]
 * `--secrets-file TEXT`: Read in a file of secret environment variables.
 * `-e, --env TEXT`: Set environment variables. E.g. --env ENV=value
 * `--env-file TEXT`: Read in a file of environment variables.
+* `-v, --volume TEXT`: Mount a volume. Format: hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. TYPE is one of: models, datasets, spaces, buckets. TYPE defaults to models if omitted. models, datasets and spaces are always mounted read-only. buckets are read+write by default.E.g. -v hf://gpt2:/data or -v hf://datasets/org/ds:/data or -v hf://buckets/org/b:/mnt:ro
 * `--help`: Show this message and exit.
 
 Examples
 $ hf repos duplicate openai/gdpval --type dataset
 $ hf repos duplicate multimodalart/dreambooth-training my-dreambooth --type space --flavor l4x4 --secrets HF_TOKEN --private
+$ hf repos duplicate org/my-space my-space --type space -v hf://gpt2:/models -v hf://buckets/org/b:/data
 
 Learn more
 Use `hf <command> --help` for more information about a command.
```

src/huggingface_hub/cli/_cli_utils.py

Lines changed: 118 additions & 1 deletion
```diff
@@ -31,7 +31,8 @@
 import typer
 from typer.core import TyperCommand, TyperGroup
 
-from huggingface_hub import __version__, constants
+from huggingface_hub import Volume, __version__, constants
+from huggingface_hub.errors import CLIError
 from huggingface_hub.utils import ANSI, get_session, hf_raise_for_status, installation_method, logging, tabulate
 from huggingface_hub.utils._dotenv import load_dotenv
 
@@ -575,6 +576,122 @@ def env_map_to_key_value_list(env_map: dict[str, str | None]) -> list[dict[str,
     return [{"key": k, "value": v or ""} for k, v in env_map.items()]
 
 
+VolumesOpt = Annotated[
+    list[str] | None,
+    typer.Option(
+        "-v",
+        "--volume",
+        help="Mount a volume. Format: hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. "
+        "TYPE is one of: models, datasets, spaces, buckets. "
+        "TYPE defaults to models if omitted. "
+        "models, datasets and spaces are always mounted read-only. buckets are read+write by default."
+        "E.g. -v hf://gpt2:/data or -v hf://datasets/org/ds:/data or -v hf://buckets/org/b:/mnt:ro",
+    ),
+]
+
+_HF_PREFIX = "hf://"
+_HF_VOLUME_TYPES = {
+    "models": constants.REPO_TYPE_MODEL,
+    "datasets": constants.REPO_TYPE_DATASET,
+    "spaces": constants.REPO_TYPE_SPACE,
+    "buckets": "bucket",
+}
+
+
+def parse_volumes(volumes: list[str] | None) -> "list[Volume] | None":
+    """Parse volume specs from CLI arguments.
+
+    Format: hf://[TYPE/]SOURCE[/PATH]:/MOUNT_PATH[:ro|:rw]
+    Where TYPE is one of: models, datasets, spaces, buckets (defaults to models if omitted).
+    SOURCE is the repo/bucket identifier (e.g. 'username/my-model').
+    PATH is an optional subfolder inside the repo/bucket.
+    MOUNT_PATH starts with '/'.
+    Optional ':ro' or ':rw' suffix for read-only or read-write.
+
+    Examples:
+        hf://gpt2:/data                    (model, implicit type)
+        hf://my-org/my-model:/data         (model, implicit type)
+        hf://models/my-org/my-model:/data  (model, explicit type)
+        hf://datasets/my-org/my-dataset:/data:ro
+        hf://buckets/my-org/my-bucket:/mnt
+        hf://spaces/my-org/my-space:/app
+        hf://datasets/org/ds/train:/data   (with path inside repo)
+        hf://buckets/org/b/sub/dir:/mnt    (with path inside bucket)
+    """
+
+    if not volumes:
+        return None
+
+    result: list[Volume] = []
+    for raw_spec in volumes:
+        # Strip :ro/:rw suffix
+        spec = raw_spec
+        read_only = None
+        if spec.endswith(":ro"):
+            read_only = True
+            spec = spec[:-3]
+        elif spec.endswith(":rw"):
+            read_only = False
+            spec = spec[:-3]
+
+        # Validate hf:// prefix
+        if not spec.startswith(_HF_PREFIX):
+            raise CLIError(
+                f"Invalid volume format: '{raw_spec}'. Source must start with 'hf://'. "
+                f"Expected hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. E.g. hf://gpt2:/data"
+            )
+        spec = spec[len(_HF_PREFIX) :]
+
+        # Find the mount path: look for :/ pattern
+        colon_slash_idx = spec.find(":/")
+        if colon_slash_idx == -1:
+            raise CLIError(
+                f"Invalid volume format: '{raw_spec}'. Expected hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. E.g. hf://gpt2:/data"
+            )
+        source_part = spec[:colon_slash_idx]
+        mount_path = spec[colon_slash_idx + 1 :]
+
+        # Parse type from source_part (first segment before /)
+        # Then split remaining into source (namespace/name or name) and optional path.
+        slash_idx = source_part.find("/")
+        if slash_idx == -1:
+            # No slash: bare source like "gpt2" -> model type
+            vol_type_str = constants.REPO_TYPE_MODEL
+            source = source_part
+            path = None
+        else:
+            first_segment = source_part[:slash_idx]
+            if first_segment in _HF_VOLUME_TYPES:
+                vol_type_str = _HF_VOLUME_TYPES[first_segment]
+                remaining = source_part[slash_idx + 1 :]
+            else:
+                # First segment isn't a known type -> model type
+                vol_type_str = constants.REPO_TYPE_MODEL
+                remaining = source_part
+
+            # Split remaining into source (namespace/name) and optional path.
+            # Repo/bucket IDs are "namespace/name" (2 segments) or "name" (1 segment).
+            # Any extra segments are the path inside the repo/bucket.
+            parts = remaining.split("/", 2)
+            if len(parts) >= 3:
+                source = parts[0] + "/" + parts[1]
+                path = parts[2]
+            else:
+                source = remaining
+                path = None
+
+        result.append(
+            Volume(
+                type=vol_type_str,
+                source=source,
+                mount_path=mount_path,
+                read_only=read_only,
+                path=path,
+            )
+        )
+    return result
+
+
 class OutputFormat(str, Enum):
     """Output format for CLI list commands."""
 
```
src/huggingface_hub/cli/jobs.py

Lines changed: 7 additions & 118 deletions
```diff
@@ -75,7 +75,7 @@
 
 import typer
 
-from huggingface_hub import SpaceHardware, Volume, constants
+from huggingface_hub import SpaceHardware
 from huggingface_hub.errors import CLIError, HfHubHTTPError
 from huggingface_hub.utils import logging
 from huggingface_hub.utils._cache_manager import _format_size
@@ -88,10 +88,12 @@
     SecretsFileOpt,
     SecretsOpt,
     TokenOpt,
+    VolumesOpt,
     _format_cell,
     api_object_to_dict,
     get_hf_api,
     parse_env_map,
+    parse_volumes,
     print_list_output,
     typer_factory,
 )
@@ -237,18 +239,6 @@ def _parse_namespace_from_job_id(job_id: str, namespace: str | None) -> tuple[st
     ),
 ]
 
-VolumesOpt = Annotated[
-    list[str] | None,
-    typer.Option(
-        "-v",
-        "--volume",
-        help="Mount a volume. Format: hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. "
-        "TYPE is one of: models, datasets, spaces, buckets. "
-        "TYPE defaults to models if omitted. "
-        "models, datasets and spaces are always mounted read-only. buckets are read+write by default."
-        "E.g. -v hf://gpt2:/data or -v hf://datasets/org/ds:/data or -v hf://buckets/org/b:/mnt:ro",
-    ),
-]
 
 CommandArg = Annotated[
     list[str],
@@ -318,7 +308,7 @@ def jobs_run(
         env=env_map,
         secrets=secrets_map,
         labels=_parse_labels_map(label),
-        volumes=_parse_volumes(volume),
+        volumes=parse_volumes(volume),
         flavor=flavor,
         timeout=timeout,
         namespace=namespace,
@@ -783,7 +773,7 @@ def jobs_uv_run(
         env=env_map,
         secrets=secrets_map,
         labels=_parse_labels_map(label),
-        volumes=_parse_volumes(volume),
+        volumes=parse_volumes(volume),
         flavor=flavor,  # type: ignore[arg-type,misc]
         timeout=timeout,
         namespace=namespace,
@@ -838,7 +828,7 @@ def scheduled_run(
         env=env_map,
         secrets=secrets_map,
         labels=_parse_labels_map(label),
-        volumes=_parse_volumes(volume),
+        volumes=parse_volumes(volume),
         flavor=flavor,
         timeout=timeout,
         namespace=namespace,
@@ -1062,7 +1052,7 @@ def scheduled_uv_run(
         env=env_map,
         secrets=secrets_map,
         labels=_parse_labels_map(label),
-        volumes=_parse_volumes(volume),
+        volumes=parse_volumes(volume),
        flavor=flavor,  # type: ignore[arg-type,misc]
         timeout=timeout,
         namespace=namespace,
@@ -1073,107 +1063,6 @@ scheduled_uv_run(
 ### UTILS
 
 
-def _parse_volumes(volumes: list[str] | None) -> list[Volume] | None:
-    """Parse volume specs from CLI arguments.
-
-    Format: hf://[TYPE/]SOURCE[/PATH]:/MOUNT_PATH[:ro|:rw]
-    Where TYPE is one of: models, datasets, spaces, buckets (defaults to models if omitted).
-    SOURCE is the repo/bucket identifier (e.g. 'username/my-model').
-    PATH is an optional subfolder inside the repo/bucket.
-    MOUNT_PATH starts with '/'.
-    Optional ':ro' or ':rw' suffix for read-only or read-write.
-
-    Examples:
-        hf://gpt2:/data                    (model, implicit type)
-        hf://my-org/my-model:/data         (model, implicit type)
-        hf://models/my-org/my-model:/data  (model, explicit type)
-        hf://datasets/my-org/my-dataset:/data:ro
-        hf://buckets/my-org/my-bucket:/mnt
-        hf://spaces/my-org/my-space:/app
-        hf://datasets/org/ds/train:/data   (with path inside repo)
-        hf://buckets/org/b/sub/dir:/mnt    (with path inside bucket)
-    """
-    if not volumes:
-        return None
-
-    HF_PREFIX = "hf://"
-    HF_TYPES_MAPPING = {
-        "models": constants.REPO_TYPE_MODEL,
-        "datasets": constants.REPO_TYPE_DATASET,
-        "spaces": constants.REPO_TYPE_SPACE,
-        "buckets": "bucket",
-    }
-
-    result: list[Volume] = []
-    for raw_spec in volumes:
-        # Strip :ro/:rw suffix
-        spec = raw_spec
-        read_only = None
-        if spec.endswith(":ro"):
-            read_only = True
-            spec = spec[:-3]
-        elif spec.endswith(":rw"):
-            read_only = False
-            spec = spec[:-3]
-
-        # Validate hf:// prefix
-        if not spec.startswith(HF_PREFIX):
-            raise CLIError(
-                f"Invalid volume format: '{raw_spec}'. Source must start with 'hf://'. "
-                f"Expected hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. E.g. hf://gpt2:/data"
-            )
-        spec = spec[len(HF_PREFIX) :]
-
-        # Find the mount path: look for :/ pattern
-        colon_slash_idx = spec.find(":/")
-        if colon_slash_idx == -1:
-            raise CLIError(
-                f"Invalid volume format: '{raw_spec}'. Expected hf://[TYPE/]SOURCE:/MOUNT_PATH[:ro]. E.g. hf://gpt2:/data"
-            )
-        source_part = spec[:colon_slash_idx]
-        mount_path = spec[colon_slash_idx + 1 :]
-
-        # Parse type from source_part (first segment before /)
-        # Then split remaining into source (namespace/name or name) and optional path.
-        slash_idx = source_part.find("/")
-        if slash_idx == -1:
-            # No slash: bare source like "gpt2" -> model type
-            vol_type_str = constants.REPO_TYPE_MODEL
-            source = source_part
-            path = None
-        else:
-            first_segment = source_part[:slash_idx]
-            if first_segment in HF_TYPES_MAPPING:
-                vol_type_str = HF_TYPES_MAPPING[first_segment]
-                remaining = source_part[slash_idx + 1 :]
-            else:
-                # First segment isn't a known type -> model type
-                vol_type_str = constants.REPO_TYPE_MODEL
-                remaining = source_part
-
-            # Split remaining into source (namespace/name) and optional path.
-            # Repo/bucket IDs are "namespace/name" (2 segments) or "name" (1 segment).
-            # Any extra segments are the path inside the repo/bucket.
-            parts = remaining.split("/", 2)
-            if len(parts) >= 3:
-                source = parts[0] + "/" + parts[1]
-                path = parts[2]
-            else:
-                source = remaining
-                path = None
-
-        result.append(
-            Volume(
-                type=vol_type_str,
-                source=source,
-                mount_path=mount_path,
-                read_only=read_only,
-                path=path,
-            )
-        )
-    return result
-
-
 def _parse_labels_map(labels: list[str] | None) -> dict[str, str] | None:
     """Parse label key-value pairs from CLI arguments.
 
```
0 commit comments