subdirectory in `tool_data_path` (in a subdirectory that has the name of the
toolshed). By default this is `tool-data/toolshed.g2.bx.psu.edu/`. Note that
these directories will also contain tool data table config files, but they are unused.

## The `huggingface` shared data table

Galaxy tools that consume pre-downloaded Hugging Face models share a single
data table named `huggingface`. Using one shared table means admins maintain
one `.loc` file, and all tools benefit from every registered model entry.

### Declaring the table

Add the following block to `tool_data_table_conf.xml`:

```xml
<!-- Hugging Face models -->
<table name="huggingface" comment_char="#" allow_duplicate_entries="False">
    <columns>value, name, pipeline_tag, domain, free_tag, version, path</columns>
    <file path="/opt/galaxy/tool-data/huggingface.loc" />
</table>
```

Each tool ships a `tool-data/huggingface.loc.sample` that uses the same
7-column layout.
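
As a quick sanity check, you can read the declaration back with Python's standard `xml.etree.ElementTree`. This confirms that the `huggingface` table declares the expected seven columns. It is a minimal sketch, not part of Galaxy, and the function name `table_columns` is hypothetical. The snippet wraps the block in a `<tables>` root element (the root used by `tool_data_table_conf.xml`) so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# The <table> block from above, wrapped in a <tables> root so it parses standalone.
CONF_XML = """<tables>
    <table name="huggingface" comment_char="#" allow_duplicate_entries="False">
        <columns>value, name, pipeline_tag, domain, free_tag, version, path</columns>
        <file path="/opt/galaxy/tool-data/huggingface.loc" />
    </table>
</tables>"""


def table_columns(conf_xml: str, table_name: str) -> list[str]:
    """Return the column names declared for `table_name`, in declaration order."""
    root = ET.fromstring(conf_xml)
    for table in root.iter("table"):
        if table.get("name") == table_name:
            text = table.findtext("columns", default="")
            return [col.strip() for col in text.split(",") if col.strip()]
    raise KeyError(f"no data table named {table_name!r}")


columns = table_columns(CONF_XML, "huggingface")
print(len(columns), columns)
```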

### Column reference

| # | Column | Purpose |
|---|--------|---------|
| 0 | `value` | Unique row ID across the whole table |
| 1 | `name` | Human-readable label shown in the Galaxy select widget |
| 2 | `pipeline_tag` | Model role; see the controlled vocabulary below |
| 3 | `domain` | Coarse data domain; see the controlled vocabulary below |
| 4 | `free_tag` | Optional narrowing tag; fallback filter when `pipeline_tag`/`domain` alone are not specific enough |
| 5 | `version` | Model version string |
| 6 | `path` | Path to the model data: a directory or a single file, depending on the model's structure |
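
Because the columns are positional, each `.loc` row maps onto these names by index. The sketch below is a simplified reader written for illustration; it is not Galaxy's own loader. It skips blank lines and lines that start with the comment character, then splits each remaining line on TAB. It assumes every data row carries all seven fields, so an unused `free_tag` is an empty field between two TABs. The function name `parse_loc` is hypothetical.

```python
COLUMNS = ("value", "name", "pipeline_tag", "domain", "free_tag", "version", "path")


def parse_loc(text: str, comment_char: str = "#") -> list[dict[str, str]]:
    """Split huggingface.loc content into one dict per data row."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith(comment_char):
            continue  # blank line or comment line
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {lineno}: expected {len(COLUMNS)} TAB-separated fields, got {len(fields)}"
            )
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


SAMPLE = (
    "# value\tname\tpipeline_tag\tdomain\tfree_tag\tversion\tpath\n"
    "black-forest-labs/FLUX.1-dev\tFLUX.1 [dev]\ttext-to-image\timage\tflux\t1\t/data/hf_models\n"
)
rows = parse_loc(SAMPLE)
print(rows[0]["pipeline_tag"], rows[0]["path"])  # → text-to-image /data/hf_models
```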

**`value` (column 0)**

Must be globally unique across every row in `huggingface.loc`, regardless of
which tool added it. Use the Hugging Face model ID (`<owner>/<model-name>`)
directly; it is stable and unambiguous. If the same model is registered at
more than one version, append the version as a suffix:

```
black-forest-labs/FLUX.1-dev
black-forest-labs/FLUX.1-dev_2
```
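
Uniqueness is easy to check before deploying an edited `huggingface.loc`. This sketch only counts column-0 values and makes no claim about how Galaxy itself treats duplicates. `duplicate_values` is a hypothetical helper name.

```python
from collections import Counter


def duplicate_values(values: list[str]) -> list[str]:
    """Return each `value` (column 0) that occurs more than once, sorted."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


ids = [
    "black-forest-labs/FLUX.1-dev",
    "black-forest-labs/FLUX.1-dev_2",  # same model, second version: suffixed ID
]
print(duplicate_values(ids))  # → []
print(duplicate_values(ids + ["black-forest-labs/FLUX.1-dev"]))  # → ['black-forest-labs/FLUX.1-dev']
```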

**`pipeline_tag` (column 2)**

Use the official [Hugging Face pipeline tag](https://huggingface.co/models).
Common values:

| Value | When to use |
|-------|-------------|
| `text-to-image` | Image generation models |
| `automatic-speech-recognition` | ASR / transcription models |
| `feature-extraction` | Sentence / document embedding models |
| `tabular-classification` | Tabular ML classifiers |
| `tabular-regression` | Tabular ML regressors |
| `text-generation` | Causal / instruction-tuned LLMs |

Do not invent synonyms for existing Hugging Face tags.

**`domain` (column 3)**

A broad category for the data type the model works with, one of:
`image` · `text` · `audio` · `tabular` · `sequence` · `video` · `multimodal`
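
A row's `pipeline_tag` and `domain` can be linted against these lists. In the sketch below, `domain` is treated as a closed vocabulary, so an unknown value is an error. `pipeline_tag` is only compared with the common values from the table above. Because the official Hugging Face list is longer, an unlisted tag produces a warning to verify, not an error. The names `check_vocabulary`, `DOMAINS` and `COMMON_PIPELINE_TAGS` are hypothetical.

```python
DOMAINS = {"image", "text", "audio", "tabular", "sequence", "video", "multimodal"}

# Only the common values listed above; the official Hugging Face list is longer.
COMMON_PIPELINE_TAGS = {
    "text-to-image",
    "automatic-speech-recognition",
    "feature-extraction",
    "tabular-classification",
    "tabular-regression",
    "text-generation",
}


def check_vocabulary(pipeline_tag: str, domain: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one row's pipeline_tag and domain columns."""
    errors: list[str] = []
    warnings: list[str] = []
    if domain not in DOMAINS:
        errors.append(f"unknown domain {domain!r}")
    if pipeline_tag not in COMMON_PIPELINE_TAGS:
        warnings.append(
            f"pipeline_tag {pipeline_tag!r} is not in the common list; "
            "confirm it is an official Hugging Face tag"
        )
    return errors, warnings


print(check_vocabulary("text-to-image", "image"))  # → ([], [])
```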

**`free_tag` (column 4)**

An optional short identifier used as a fallback narrowing filter when
`pipeline_tag` and `domain` alone are not specific enough. Because a model
can be consumed by multiple tools, `free_tag` must not encode a specific tool
name. Choose a short, lowercase, descriptive value and document it alongside
the tool that introduces it.
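
A small lint can enforce the `free_tag` rules. The exact pattern is an assumption chosen for illustration, not an existing Galaxy convention: lowercase letters, digits, `-` and `_`, at most 32 characters. The `tool_ids` set passed in is likewise assumed. An empty `free_tag` is accepted because the column is optional.

```python
import re

# Hypothetical rule: short, lowercase, starts with a letter or digit.
FREE_TAG_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,31}")


def check_free_tag(free_tag: str, tool_ids: set[str]) -> list[str]:
    """Return a list of problems with a free_tag value (empty list = OK)."""
    if free_tag == "":
        return []  # column 4 is optional
    problems = []
    if not FREE_TAG_RE.fullmatch(free_tag):
        problems.append(f"{free_tag!r} is not a short lowercase identifier")
    if free_tag.lower() in {t.lower() for t in tool_ids}:
        problems.append(f"{free_tag!r} encodes a tool name")
    return problems


known_tools = {"hypothetical_flux_tool"}  # illustrative tool IDs
print(check_free_tag("flux", known_tools))  # → []
```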

**`version` (column 5)**

The model version string. A tool declares in its XML which version(s) it
accepts (through a filter on this column), so multiple versions of the same
model can coexist. Where possible, only add rows; do not remove or edit
existing ones, so that tools relying on an older version keep working.
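
You can check the append-only guideline by comparing the old and new versions of the file: every row that existed before must still be present and unchanged. This sketch treats rows as tuples of their seven fields. `changed_or_removed` is a hypothetical helper, and the rows are illustrative. An edited row shows up as its old form going missing.

```python
Row = tuple[str, ...]


def changed_or_removed(old_rows: list[Row], new_rows: list[Row]) -> list[Row]:
    """Return rows from the old file that no longer appear verbatim in the new one."""
    remaining = set(new_rows)
    return [row for row in old_rows if row not in remaining]


# Illustrative rows: the same model registered at two versions.
v1 = ("black-forest-labs/FLUX.1-dev", "FLUX.1 [dev]", "text-to-image", "image", "flux", "1", "/data/hf_models")
v2 = ("black-forest-labs/FLUX.1-dev_2", "FLUX.1 [dev]", "text-to-image", "image", "flux", "2", "/data/hf_models")

print(changed_or_removed([v1], [v1, v2]))  # appending a version → []
```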

**`path` (column 6)**

The path to the model data on the production server (maintained by admins).
It can be a directory (when the tool reads the whole Hugging Face cache layout)
or a specific file (e.g. a `.ckpt` checkpoint).
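
Since `path` is maintained by hand, it is worth verifying on the server that each entry exists. It should also be the kind of object (directory or file) that the consuming tool expects. Below is a minimal sketch using `pathlib`. `model_path_kind` is a hypothetical helper, and the temporary directory only stands in for a real model location.

```python
import tempfile
from pathlib import Path


def model_path_kind(path_str: str) -> str:
    """Return "directory" or "file" for an existing path; raise if it is missing."""
    path = Path(path_str)
    if path.is_dir():
        return "directory"  # e.g. a whole Hugging Face cache layout
    if path.is_file():
        return "file"  # e.g. a single .ckpt checkpoint
    raise FileNotFoundError(f"model path does not exist: {path_str}")


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "model.ckpt"
    ckpt.touch()
    kinds = (model_path_kind(tmp), model_path_kind(str(ckpt)))
    try:
        model_path_kind(str(Path(tmp) / "missing"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(kinds, missing_raised)  # → ('directory', 'file') True
```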

### XML filter convention

Filter primarily by `pipeline_tag` (column 2) and/or `domain` (column 3) so
only relevant model types are shown to the user. Add a `version` or
`free_tag` filter only when you need to narrow the selection further:

```xml
<param name="model" type="select" label="Model">
    <options from_data_table="huggingface">
        <!-- replace these values with the pipeline_tag and domain your tool needs -->
        <filter type="static_value" column="2" value="text-to-image"/>
        <filter type="static_value" column="3" value="image"/>
        <!-- optional: narrow further by version or free_tag -->
        <!-- <filter type="static_value" column="5" value="<version>"/> -->
        <!-- <filter type="static_value" column="4" value="<free_tag>"/> -->
    </options>
</param>
```
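
To see how the filters narrow the options, the sketch below mimics static-value filtering in plain Python. Each filter keeps only the rows whose given column equals the given value, and stacked filters combine as a logical AND. It models the behaviour described above and is not Galaxy's implementation. `select_options` is hypothetical, and the rows (including the `example-org/asr-model` entry) are illustrative.

```python
Row = list[str]  # the 7 columns, indexed 0..6 as in the column reference

ROWS: list[Row] = [
    ["black-forest-labs/FLUX.1-dev", "FLUX.1 [dev]", "text-to-image", "image", "flux", "1", "/data/hf_models"],
    ["black-forest-labs/FLUX.1-dev_2", "FLUX.1 [dev]", "text-to-image", "image", "flux", "2", "/data/hf_models"],
    ["example-org/asr-model", "Example ASR", "automatic-speech-recognition", "audio", "", "1", "/data/hf_models"],
]


def select_options(rows: list[Row], filters: dict[int, str]) -> list[str]:
    """Return column-0 IDs of rows matching every {column: value} filter."""
    return [row[0] for row in rows if all(row[col] == val for col, val in filters.items())]


print(select_options(ROWS, {2: "text-to-image", 3: "image"}))
print(select_options(ROWS, {2: "text-to-image", 3: "image", 5: "2"}))  # → ['black-forest-labs/FLUX.1-dev_2']
```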

### Example `.loc` entry

Each row is TAB-separated (7 columns):

```
# Columns: value <TAB> name <TAB> pipeline_tag <TAB> domain <TAB> free_tag <TAB> version <TAB> path
#
# Flux
black-forest-labs/FLUX.1-dev	FLUX.1 [dev]	text-to-image	image	flux	1	/data/hf_models
```
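
Editors and copy-paste often turn TABs into spaces, so generating rows programmatically avoids malformed entries. The sketch below uses a hypothetical helper, `format_loc_row`, that joins the seven fields with TABs and rejects values containing a TAB or newline. It treats every column except `free_tag` as required. That is an assumption based on `free_tag` being the only column described as optional.

```python
COLUMNS = ("value", "name", "pipeline_tag", "domain", "free_tag", "version", "path")
OPTIONAL = {"free_tag"}  # assumption: every other column must be non-empty


def format_loc_row(entry: dict[str, str]) -> str:
    """Build one TAB-separated huggingface.loc line (without trailing newline)."""
    fields = []
    for col in COLUMNS:
        value = entry.get(col, "")
        if value == "" and col not in OPTIONAL:
            raise ValueError(f"missing required column {col!r}")
        if "\t" in value or "\n" in value:
            raise ValueError(f"column {col!r} must not contain TAB or newline")
        fields.append(value)
    return "\t".join(fields)


line = format_loc_row({
    "value": "black-forest-labs/FLUX.1-dev",
    "name": "FLUX.1 [dev]",
    "pipeline_tag": "text-to-image",
    "domain": "image",
    "free_tag": "flux",
    "version": "1",
    "path": "/data/hf_models",
})
print(line.split("\t"))
```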