@@ -103,3 +103,128 @@ When a new tool is installed that uses a data table a new entry is added to
103 103 subdirectory in `tool_data_path` (in a subdirectory that has the name of the
104 104 toolshed). By default this is `tool-data/toolshed.g2.bx.psu.edu/`. Note that
105 105 these directories will also contain tool data table config files, but they are unused.
106+
107+ ## The ` huggingface ` shared data table
108+
109+ Galaxy tools that consume pre-downloaded Hugging Face models share a single
110+ data table named ` huggingface ` . Using one shared table means admins maintain
111+ one ` .loc ` file and all tools benefit from every registered model entry.
112+
113+ ### Declaring the table
114+
115+ Add the following block to ` tool_data_table_conf.xml ` :
116+
117+ ```xml
118+ <!-- Hugging Face models -->
119+ <table name="huggingface" comment_char="#" allow_duplicate_entries="False">
120+     <columns>value, name, pipeline_tag, domain, free_tag, version, path</columns>
121+     <file path="/opt/galaxy/tool-data/huggingface.loc" />
122+ </table>
123+ ```
125+ Each tool ships a ` tool-data/huggingface.loc.sample ` that uses the same
126+ 7-column layout.
127+
128+ ### Column reference
129+
130+ | # | Column | Purpose |
131+ | ---| --------| ---------|
132+ | 0 | ` value ` | Unique row ID across the whole table |
133+ | 1 | ` name ` | Human-readable label shown in the Galaxy select widget |
134+ | 2 | ` pipeline_tag ` | Model role — see controlled vocabulary below |
135+ | 3 | ` domain ` | Coarse data domain — see controlled vocabulary below |
136+ | 4 | ` free_tag ` | Optional narrowing tag; fallback filter when ` pipeline_tag ` /` domain ` alone are not specific enough |
137+ | 5 | ` version ` | Model version string |
138+ | 6 | ` path ` | Path to the model data, a directory or a specific file, depending on the model structure |
139+
140+ **`value` (column 0)**
141+
142+ Must be globally unique across every row in ` huggingface.loc ` , regardless of
143+ which tool added it. Use the Hugging Face model ID (` <owner>/<model-name> ` )
144+ directly — it is stable and unambiguous. If the same model is registered at
145+ more than one version, append the version:
146+
147+ ```
148+ black-forest-labs/FLUX.1-dev
149+ sentence-transformers/all-MiniLM-L6-v2
150+ openai/whisper-large-v3_3.0
151+ ```
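
As a minimal sketch of the naming rule above, the helper below builds a `value` from a model ID and an optional version. The function name `make_value` is hypothetical, and the underscore separator is an assumption inferred from the `openai/whisper-large-v3_3.0` example; the text does not state the separator explicitly.

```python
from typing import Optional


def make_value(model_id: str, version: Optional[str] = None) -> str:
    """Return a candidate `value` for column 0 of huggingface.loc.

    Assumption: the version is joined with an underscore, as in the
    `openai/whisper-large-v3_3.0` example above.
    """
    owner, sep, name = model_id.partition("/")
    if not (owner and sep and name):
        raise ValueError(f"expected '<owner>/<model-name>', got {model_id!r}")
    # Only append the version when one is given, i.e. when the same
    # model is registered at more than one version.
    return f"{model_id}_{version}" if version else model_id
```

For instance, `make_value("openai/whisper-large-v3", "3.0")` yields the third example above, while calling it without a version returns the model ID unchanged.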
152+
153+ **`pipeline_tag` (column 2)**
154+
155+ Use the official [Hugging Face pipeline tag](https://huggingface.co/models).
156+ Common values:
157+
158+ | Value | When to use |
159+ | -------| -------------|
160+ | ` text-to-image ` | Image generation models |
161+ | ` automatic-speech-recognition ` | ASR / transcription models |
162+ | ` feature-extraction ` | Sentence / document embedding models |
163+ | ` tabular-classification ` | Tabular ML classifiers |
164+ | ` tabular-regression ` | Tabular ML regressors |
165+ | ` text-generation ` | Causal / instruction-tuned LLMs |
166+
167+ Do not invent synonyms for existing Hugging Face tags.
168+
169+ **`domain` (column 3)**
170+
171+ A broad category for the data type the model works with:
172+ ` image ` · ` text ` · ` audio ` · ` tabular ` · ` sequence ` · ` video ` · ` multimodal `
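
The two vocabularies above could be checked before a row is added. The following is only a sketch: `check_tags` and both constant names are hypothetical, the `pipeline_tag` set holds just the common values listed in this document (Hugging Face defines more), and an unlisted tag is therefore reported as something to verify rather than as an error.

```python
from typing import List

# Domain values as listed in this document.
DOMAINS = frozenset(
    {"image", "text", "audio", "tabular", "sequence", "video", "multimodal"}
)

# Only the common pipeline tags from the table above, not the full
# Hugging Face list.
COMMON_PIPELINE_TAGS = frozenset(
    {
        "text-to-image",
        "automatic-speech-recognition",
        "feature-extraction",
        "tabular-classification",
        "tabular-regression",
        "text-generation",
    }
)


def check_tags(pipeline_tag: str, domain: str) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if domain not in DOMAINS:
        problems.append(f"unknown domain: {domain!r}")
    if pipeline_tag not in COMMON_PIPELINE_TAGS:
        problems.append(
            f"pipeline_tag not in the common list, verify it on Hugging Face: {pipeline_tag!r}"
        )
    return problems
```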
173+
174+ **`free_tag` (column 4)**
175+
176+ An optional short identifier used as a fallback narrowing filter when
177+ ` pipeline_tag ` and ` domain ` alone are not specific enough. Because a model
178+ can be consumed by multiple tools, ` free_tag ` must not encode a specific tool
179+ name. Choose a short, lowercase, descriptive value and document it alongside
180+ the tool that introduces it.
181+
182+ **`version` (column 5)**
183+
184+ The model version string. A tool declares in its XML which version(s) it
185+ accepts, allowing multiple versions of the same model to coexist. Where
186+ possible, treat the table as append-only: add rows, but do not remove or edit existing ones.
187+
188+ **`path` (column 6)**
189+
190+ The path to the model data on the production server (maintained by admins).
191+ Can be a directory (when the tool reads the whole Hugging Face cache layout)
192+ or a specific file (e.g. a ` .ckpt ` checkpoint).
193+
194+ ### XML filter convention
195+
196+ Filter primarily by ` pipeline_tag ` (column 2) and/or ` domain ` (column 3) so
197+ only relevant model types are shown to the user. Add a ` version ` or
198+ ` free_tag ` filter only when you need to narrow the selection further:
199+
200+ ```xml
201+ <param name="model" type="select" label="Model">
202+     <options from_data_table="huggingface">
203+         <filter type="static_value" column="2" value="<pipeline_tag>"/>
204+         <filter type="static_value" column="3" value="<domain>"/>
205+         <!-- optional: narrow further by version or free_tag -->
206+         <!-- <filter type="static_value" column="5" value="<version>"/> -->
207+         <!-- <filter type="static_value" column="4" value="<free_tag>"/> -->
208+     </options>
209+ </param>
210+ ```
211+
212+ Example from the Flux tool (filters by `free_tag` and `version` to restrict the list to Flux-specific model variants):
213+
214+ ```xml
215+ <options from_data_table="huggingface">
216+     <filter type="static_value" column="4" value="flux"/>
217+     <filter type="static_value" column="5" value="1"/>
218+ </options>
219+ ```
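
To make the effect of the two filters concrete, the sketch below models a `static_value` filter as a plain equality test on one column. It is a simplified illustration and not Galaxy's implementation: `static_value_filter` is a hypothetical name, successive filters are assumed to combine as a logical AND, and the second row is invented for the example.

```python
from typing import List, Sequence


def static_value_filter(
    rows: Sequence[Sequence[str]], column: int, value: str
) -> List[Sequence[str]]:
    """Keep only the rows whose field at `column` equals `value`."""
    return [row for row in rows if row[column] == value]


rows = [
    # Row taken from the example .loc entry in this document.
    ["black-forest-labs/FLUX.1-dev", "FLUX.1 [dev]", "text-to-image",
     "image", "flux", "1", "/data/hf_models"],
    # Hypothetical row, for illustration only.
    ["example-org/example-asr", "Example ASR", "automatic-speech-recognition",
     "audio", "", "1", "/data/hf_models"],
]

# Equivalent of the Flux example: free_tag (column 4) == "flux",
# then version (column 5) == "1".
selected = static_value_filter(static_value_filter(rows, 4, "flux"), 5, "1")
```

Under these assumptions, only the FLUX row remains in `selected`, because the hypothetical ASR row has an empty `free_tag`.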
220+
221+ ### Example ` .loc ` entry
222+
223+ Each row is TAB-separated (7 columns):
224+
225+ ```
226+ # Columns: value <TAB> name <TAB> pipeline_tag <TAB> domain <TAB> free_tag <TAB> version <TAB> path
227+ #
228+ # Flux
229+ black-forest-labs/FLUX.1-dev	FLUX.1 [dev]	text-to-image	image	flux	1	/data/hf_models
230+ ```
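
Because TAB separators are easy to lose when a `.loc` file is edited by hand, a small check can help before the file is deployed. The sketch below is not part of Galaxy: `parse_loc` and `COLUMNS` are hypothetical names, and the checks (exactly 7 TAB-separated fields, no repeated `value`) are stricter than what Galaxy itself may enforce.

```python
from typing import Dict, List

# Column order as declared in tool_data_table_conf.xml.
COLUMNS = ["value", "name", "pipeline_tag", "domain", "free_tag", "version", "path"]


def parse_loc(text: str) -> List[Dict[str, str]]:
    """Parse huggingface.loc content into a list of column-name -> field dicts.

    Blank lines and lines starting with '#' are skipped. Raises ValueError
    on a wrong column count or a duplicate `value`.
    """
    rows: List[Dict[str, str]] = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {lineno}: expected {len(COLUMNS)} TAB-separated columns, "
                f"got {len(fields)}"
            )
        row = dict(zip(COLUMNS, fields))
        if row["value"] in seen:
            raise ValueError(f"line {lineno}: duplicate value {row['value']!r}")
        seen.add(row["value"])
        rows.append(row)
    return rows
```

A row whose TABs were replaced by spaces fails the column-count check, which is the most likely mistake when copying the example entry out of rendered documentation.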