Dependencies and Conda
===========================================

----------------------------------------------------------------
Specifying and Using Tool Requirements
----------------------------------------------------------------

.. note:: This document discusses using Conda to satisfy tool dependencies from a tool
    developer's perspective. An in-depth discussion of using Conda to satisfy dependencies
    from an administrator's perspective can be found `here <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__.
    That document also serves as good background for this discussion.

.. note:: Planemo's various Conda-related commands require a Conda installation to
    target. A properly configured Conda installation can be initialized with the
    ``conda_init`` command, which should only need to be executed once per
    development machine.

    ::

        $ planemo conda_init
        wget -q --recursive -O '/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/conda_installLuGDHE.sh' 'https://repo.continuum.io/miniconda/Miniconda3-4.2.12-MacOSX-x86_64.sh' && bash '/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/conda_installLuGDHE.sh' -b -p '/Users/john/miniconda2' && /Users/john/miniconda2/bin/conda install -y -q conda=4.2.13
        PREFIX=/Users/john/miniconda2
        installing: python-3.5.2-0 ...
        installing: conda-env-2.6.0-0 ...
        installing: openssl-1.0.2j-0 ...
        installing: pycosat-0.6.1-py35_1 ...
        installing: readline-6.2-2 ...
        installing: requests-2.11.1-py35_0 ...
        installing: ruamel_yaml-0.11.14-py35_0 ...
        installing: sqlite-3.13.0-0 ...
        installing: tk-8.5.18-0 ...
        installing: xz-5.2.2-0 ...
        installing: yaml-0.1.6-0 ...
        installing: zlib-1.2.8-3 ...
        installing: conda-4.2.12-py35_0 ...
        installing: pycrypto-2.6.1-py35_4 ...
        installing: pip-8.1.2-py35_0 ...
        installing: wheel-0.29.0-py35_0 ...
        installing: setuptools-27.2.0-py35_0 ...
        Python 3.5.2 :: Continuum Analytics, Inc.
        creating default environment...
        installation finished.
        Fetching package metadata .......
        Solving package specifications: ..........

        Package plan for installation in environment /Users/john/miniconda2:

        The following packages will be downloaded:

            package                    |            build
            ---------------------------|-----------------
            conda-4.2.13               |           py35_0         389 KB

        The following packages will be UPDATED:

            conda: 4.2.12-py35_0 --> 4.2.13-py35_0

        Conda installation succeeded - Conda is available at '/Users/john/miniconda2/bin/conda'
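
If you are unsure whether ``conda_init`` has already been run on a machine, one quick check
is whether a ``conda`` executable exists under the install prefix. The Python sketch below
assumes the Miniconda layout seen in the output above (``<prefix>/bin/conda``); the helper
name ``find_conda`` and the ``~/miniconda3`` prefix are hypothetical, so adjust the prefix
to match your own installation.

```python
import os
from pathlib import Path
from typing import Optional


def find_conda(prefix: str) -> Optional[Path]:
    """Return ``<prefix>/bin/conda`` if it exists and is executable, else None.

    Assumes the Miniconda layout shown in the conda_init output above.
    """
    candidate = Path(prefix).expanduser() / "bin" / "conda"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    # Hypothetical prefix; point this at wherever Conda was installed.
    conda = find_conda("~/miniconda3")
    if conda is None:
        print("No Conda found here; `planemo conda_init` may be needed.")
    else:
        print(f"Conda is available at {conda}")
```
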

While Galaxy can be configured to resolve dependencies in various ways, Planemo
is configured with opinionated defaults geared toward making it as easy as
possible to build tools that target Conda_.

During the introductory tool development tutorial, we called ``planemo tool_init``
with the argument ``--requirement seqtk@1.2`` and the resulting tool contained
the XML::

    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>

As configured by Planemo, when Galaxy encounters these ``requirement`` tags it
will attempt to install Conda, check for the referenced packages (such as
``seqtk``), and install them as needed for tool testing.
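
Conceptually, each ``requirement`` tag with ``type="package"`` corresponds to a Conda package
specification such as the ``seqtk=1.2`` argument that appears in the ``conda create`` command
in the ``conda_install`` output of this document. The Python sketch below (standard library
only) extracts those name/version pairs from tool XML; the helper names are hypothetical, and
this is an illustration of the idea rather than Planemo's or Galaxy's actual code.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def package_requirements(tool_xml: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, version) for each <requirement type="package"> element."""
    root = ET.fromstring(tool_xml)
    pairs = []
    for req in root.iter("requirement"):
        if req.get("type") == "package":
            # The package name is the element text; the version is an attribute.
            pairs.append(((req.text or "").strip(), req.get("version")))
    return pairs


def conda_spec(name: str, version: Optional[str]) -> str:
    """Format a requirement in the ``name=version`` form seen in conda commands."""
    return f"{name}={version}" if version else name


# Minimal, hypothetical tool wrapper containing the requirement from above.
TOOL_XML = """<tool id="seqtk_seq">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
</tool>"""

print([conda_spec(name, version) for name, version in package_requirements(TOOL_XML)])
# prints ['seqtk=1.2']
```
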

We can check whether the requirements of a tool are available in best practice
Conda channels using an extended form of the ``planemo lint`` command. Passing
the ``--conda_requirements`` flag checks that every listed requirement is found.

::

    $ planemo lint --conda_requirements seqtk_seq.xml
    Linting tool /Users/john/workspace/planemo/docs/writing/seqtk_seq_v6.xml
    ...
    Applying linter requirements_in_conda... CHECK
    .. INFO: Requirement [seqtk@1.2] matches target in best practice Conda channel [bioconda].

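
To make the idea behind the ``requirements_in_conda`` linter concrete, here is a simplified
Python sketch that looks a requirement up in an ordered list of channels and reports the first
match. The channel index is a hand-written, hypothetical stand-in (filled with the seqtk
versions from the ``conda_search`` output in this document); the real linter queries the
actual channels, and its implementation may differ.

```python
from typing import Dict, Optional, Sequence, Set

# Hypothetical, hand-written index: channel -> package -> known versions.
# The seqtk versions are taken from the conda_search output in this document.
CHANNEL_INDEX: Dict[str, Dict[str, Set[str]]] = {
    "bioconda": {"seqtk": {"r75", "r82", "r93", "1.2"}},
    "conda-forge": {},
    "iuc": {},
}
BEST_PRACTICE_CHANNELS = ("bioconda", "conda-forge", "iuc")


def matching_channel(name: str, version: Optional[str],
                     index: Dict[str, Dict[str, Set[str]]] = CHANNEL_INDEX,
                     channels: Sequence[str] = BEST_PRACTICE_CHANNELS) -> Optional[str]:
    """Return the first channel offering ``name`` (at ``version``, if given), else None."""
    for channel in channels:
        versions = index.get(channel, {}).get(name)
        if versions and (version is None or version in versions):
            return channel
    return None


print(matching_channel("seqtk", "1.2"))  # bioconda
print(matching_channel("seqtk", "9.9"))  # None
```
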

.. note:: You can download the final version of the seqtk tool from the Planemo tutorial
    using the command::

        $ planemo project_init --template=seqtk_complete seqtk_example
        $ cd seqtk_example

We can verify that these tool requirements install with the ``conda_install`` command.
With its default parameters, ``conda_install`` processes tools and creates isolated
environments for their declared requirements.

::

    $ planemo conda_install seqtk_seq.xml
    Install conda target CondaTarget[seqtk,version=1.2]
    /home/john/miniconda2/bin/conda create -y --name __seqtk@1.2 seqtk=1.2
    Fetching package metadata ...............
    Solving package specifications: ..........

    Package plan for installation in environment /home/john/miniconda2/envs/__seqtk@1.2:

    The following packages will be downloaded:

        package                    |            build
        ---------------------------|-----------------
        seqtk-1.2                  |                0          29 KB  bioconda

    The following NEW packages will be INSTALLED:

        seqtk: 1.2-0   bioconda
        zlib:  1.2.8-3

    Fetching packages ...
    seqtk-1.2-0.ta 100% |#############################################################| Time: 0:00:00 444.71 kB/s
    Extracting packages ...
    [ COMPLETE ]|################################################################################| 100%
    Linking packages ...
    [ COMPLETE ]|################################################################################| 100%
    #
    # To activate this environment, use:
    # > source activate __seqtk@1.2
    #
    # To deactivate this environment, use:
    # > source deactivate __seqtk@1.2
    #
    $ which seqtk
    seqtk not found
    $

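The ``conda create`` line in the output above shows how this single requirement was turned
into a command: the environment is named ``__seqtk@1.2`` and the package is requested as
``seqtk=1.2``. The Python sketch below rebuilds that command line. The naming scheme is
inferred from this one example rather than from Planemo documentation, the helper name is
hypothetical, and tools with several requirements may be handled differently.

```python
import shlex
from typing import List


def conda_create_argv(conda: str, name: str, version: str) -> List[str]:
    """Rebuild the single-package ``conda create`` call from the output above.

    The ``__<name>@<version>`` environment name is inferred from that output.
    """
    env_name = f"__{name}@{version}"
    return [conda, "create", "-y", "--name", env_name, f"{name}={version}"]


argv = conda_create_argv("/home/john/miniconda2/bin/conda", "seqtk", "1.2")
print(shlex.join(argv))  # shlex.join requires Python 3.8+
# prints: /home/john/miniconda2/bin/conda create -y --name __seqtk@1.2 seqtk=1.2
```

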
The above install worked properly, but ``seqtk`` is not on your ``PATH`` because this
merely created an environment within the Conda directory for the seqtk installation.
Planemo will configure Galaxy to use this installation. If you wish to interactively
explore the resulting environment, for example to try out the installed tool or to
produce test data, source the output of the ``conda_env`` command.

::

    $ . <(planemo conda_env seqtk_seq.xml)
    Deactivate environment with conda_env_deactivate
    (seqtk_seq) $ which seqtk
    /home/planemo/miniconda2/envs/jobdepsiJClEUfecc6d406196737781ff4456ec60975c137e04884e4f4b05dc68192f7cec4656/bin/seqtk
    (seqtk_seq) $ seqtk seq

    Usage:   seqtk seq [options] <in.fq>|<in.fa>

    Options: -q INT    mask bases with quality lower than INT [0]
             -X INT    mask bases with quality higher than INT [255]
             -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
             -l INT    number of residues per line; 0 for 2^32-1 [0]
             -Q INT    quality shift: ASCII-INT gives base quality [33]
             -s INT    random seed (effective with -f) [11]
             -f FLOAT  sample FLOAT fraction of sequences [1]
             -M FILE   mask regions in BED or name list FILE [null]
             -L INT    drop sequences with length shorter than INT [0]
             -c        mask complement region (effective with -M)
             -r        reverse complement
             -A        force FASTA output (discard quality)
             -C        drop comments at the header lines
             -N        drop sequences containing ambiguous bases
             -1        output the 2n-1 reads only
             -2        output the 2n reads only
             -V        shift quality by '(-Q) - 33'
             -U        convert all bases to uppercases
             -S        strip of white spaces in sequences
    (seqtk_seq) $ conda_env_deactivate
    $

As shown above, sourcing the environment defines a ``conda_env_deactivate`` command that
can be used to restore your initial shell configuration.
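
Sourcing the environment helps because it changes where the shell looks up commands. The
Python sketch below approximates that lookup by searching ``<env_prefix>/bin`` before the
existing ``PATH`` with ``shutil.which``. It illustrates the idea only; the helper name
``which_in_env`` is hypothetical and this is not a description of what ``conda_env`` does
internally.

```python
import os
import shutil
from typing import Optional


def which_in_env(command: str, env_prefix: str) -> Optional[str]:
    """Find ``command`` as if ``<env_prefix>/bin`` were first on PATH."""
    entries = [os.path.join(env_prefix, "bin"), os.environ.get("PATH", "")]
    search_path = os.pathsep.join(entry for entry in entries if entry)
    return shutil.which(command, path=search_path)


if __name__ == "__main__":
    # Outside the environment this is typically None, as with `which seqtk` above.
    print(shutil.which("seqtk"))
    # Hypothetical prefix, modeled on the conda_install output above.
    print(which_in_env("seqtk", "/home/john/miniconda2/envs/__seqtk@1.2"))
```
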

Now that we are confident the underlying application works, we can use ``planemo test``
or ``planemo serve``; either will reuse this environment and find our dependency (in
this case ``seqtk``) as needed.

Here is a portion of the output from the testing command ``planemo test seqtk_seq.xml``
demonstrating the tool in use.

::

    $ planemo test seqtk_seq.xml
    ...
    2017-02-22 10:13:28,902 INFO [galaxy.tools.actions] Handled output named output1 for tool seqtk_seq (20.136 ms)
    2017-02-22 10:13:28,914 INFO [galaxy.tools.actions] Added output datasets to history (12.782 ms)
    2017-02-22 10:13:28,935 INFO [galaxy.tools.actions] Verified access to datasets for Job[unflushed,tool_id=seqtk_seq] (10.954 ms)
    2017-02-22 10:13:28,936 INFO [galaxy.tools.actions] Setup for job Job[unflushed,tool_id=seqtk_seq] complete, ready to flush (21.053 ms)
    2017-02-22 10:13:28,962 INFO [galaxy.tools.actions] Flushed transaction for job Job[id=2,tool_id=seqtk_seq] (26.510 ms)
    2017-02-22 10:13:29,064 INFO [galaxy.jobs.handler] (2) Job dispatched
    2017-02-22 10:13:29,281 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
    2017-02-22 10:13:29,282 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
    2017-02-22 10:13:29,317 INFO [galaxy.jobs.command_factory] Built script [/tmp/tmpLvKwta/job_working_directory/000/2/tool_script.sh] for tool command [[ "$CONDA_DEFAULT_ENV" = "/Users/john/miniconda2/envs/__seqtk@1.2" ] || . /Users/john/miniconda2/bin/activate '/Users/john/miniconda2/envs/__seqtk@1.2' >conda_activate.log 2>&1 ; seqtk seq -a '/tmp/tmpLvKwta/files/000/dataset_1.dat' > '/tmp/tmpLvKwta/files/000/dataset_2.dat']
    2017-02-22 10:13:29,516 DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
    2017-02-22 10:13:29,516 DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
    ok

    ----------------------------------------------------------------------
    XML: /private/tmp/tmpLvKwta/xunit.xml
    ----------------------------------------------------------------------
    Ran 1 test in 15.936s

    OK
    2017-02-22 10:13:37,014 INFO [test_driver] Shutting down
    2017-02-22 10:13:37,014 INFO [test_driver] Shutting down embedded galaxy web server
    2017-02-22 10:13:37,016 INFO [test_driver] Embedded web server galaxy stopped
    2017-02-22 10:13:37,017 INFO [test_driver] Stopping application galaxy
    ....
    2017-02-22 10:13:37,018 INFO [galaxy.jobs.handler] sending stop signal to worker thread
    2017-02-22 10:13:37,018 INFO [galaxy.jobs.handler] job handler stop queue stopped
    Testing complete. HTML report is in "/Users/john/workspace/planemo/project_templates/seqtk_complete/tool_test_output.html".
    All 1 test(s) executed passed.
    seqtk_seq[0]: passed

In this case the tests passed, and the line containing ``[galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda``
indicates that Galaxy dependency resolution was successful and that it found the environment we previously installed with ``conda_install``.
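
When a test run produces a long log, it can help to pull out just the dependency-resolution
lines. This Python sketch matches the ``Using dependency ... of type ...`` message format seen
in the log above with a regular expression. The pattern is based only on these sample lines
(the helper name is hypothetical) and may need adjusting for other Galaxy versions.

```python
import re
from typing import List, Tuple

# Pattern based on log lines such as:
#   ... DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
DEPENDENCY_LINE = re.compile(
    r"\[galaxy\.tools\.deps\] Using dependency (?P<name>\S+) "
    r"version (?P<version>\S+) of type (?P<type>\S+)"
)


def resolved_dependencies(log_text: str) -> List[Tuple[str, str, str]]:
    """Return unique (name, version, type) triples in order of first appearance."""
    found: List[Tuple[str, str, str]] = []
    for match in DEPENDENCY_LINE.finditer(log_text):
        triple = (match["name"], match["version"], match["type"])
        if triple not in found:
            found.append(triple)
    return found


SAMPLE_LOG = """\
2017-02-22 10:13:29,281 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
2017-02-22 10:13:29,282 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
2017-02-22 10:13:29,516 DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
"""

print(resolved_dependencies(SAMPLE_LOG))
# prints [('seqtk', '1.2', 'conda'), ('samtools', 'None', 'conda')]
```
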

----------------------------------------------------------------
Finding Existing Conda Packages
----------------------------------------------------------------

How did we know what software name and version to use? We found the existing
packages available for Conda and referenced them. To do this yourself, you can
use the Planemo command ``conda_search``. A search for ``seqt`` shows all the
software, and all the versions of it, matching that search term, including ``seqtk``.

::

    $ planemo conda_search seqt
    Fetching package metadata ...............
    seqtk                        r75                           0  bioconda
                                 r82                           0  bioconda
                                 r93                           0  bioconda
                                 1.2                           0  bioconda

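If you want to process ``conda_search`` results in a script, the listing above can be grouped
by package. The Python sketch below assumes the older ``conda search`` layout shown here
(package name in the first column, further versions on indented continuation rows); newer
Conda releases print a different table, and ``conda search --json`` is generally a sturdier
input to parse. The helper name is hypothetical.

```python
from typing import Dict, List, Optional


def parse_conda_search(text: str) -> Dict[str, List[str]]:
    """Group versions by package for the older ``conda search`` layout above.

    Assumes a package name starts in column 0 and its further versions
    appear on indented continuation rows (version, build, channel).
    """
    packages: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("Fetching"):
            continue  # skip blank lines and the metadata progress line
        fields = line.split()
        if not line[0].isspace():
            current = fields.pop(0)  # a new package name starts here
            packages.setdefault(current, [])
        if current is not None and fields:
            packages[current].append(fields[0])  # first remaining field is the version
    return packages


SAMPLE = "\n".join([
    "Fetching package metadata ...............",
    "seqtk                        r75                           0  bioconda",
    "                             r82                           0  bioconda",
    "                             r93                           0  bioconda",
    "                             1.2                           0  bioconda",
])

print(parse_conda_search(SAMPLE))
# prints {'seqtk': ['r75', 'r82', 'r93', '1.2']}
```

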
.. note:: The Planemo command ``conda_search`` is a light wrapper around the underlying
    ``conda search`` command, configured to use the same channels and other options as
    Planemo and Galaxy. The following Conda command would also work to search::

        $ $HOME/miniconda3/bin/conda search -c bioconda -c conda-forge -c iuc seqt

Alternatively, the Anaconda_ website can be used to search for packages. Typing ``seqtk``
into the search form on that page and clicking the top result will bring you to `this page
<https://anaconda.org/bioconda/seqtk>`__ with information about the Bioconda package.

When searching on the website, though, you need to be aware of which channel a package
comes from. By default, Planemo and Galaxy search a few specific Conda channels. While it
is possible to configure a local Planemo or Galaxy to target different channels, the current
best practice is to add packages to the existing channels.

The existing channels include:

* Bioconda (`github <https://github.com/bioconda/bioconda-recipes>`__ | `conda <https://anaconda.org/bioconda>`__) - best practice channel for various bioinformatics packages.
* Conda-Forge (`github <https://github.com/conda-forge/staged-recipes>`__ | `conda <https://anaconda.org/conda-forge>`__) - best practice channel for general purpose and widely useful computing packages and libraries.
* iuc (`github <https://github.com/galaxyproject/conda-iuc>`__ | `conda <https://anaconda.org/iuc>`__) - best practice channel for other, more Galaxy-specific packages.
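
Package pages on the Anaconda website follow the pattern of the seqtk link in this section,
``https://anaconda.org/<channel>/<package>``. A small Python sketch can list the candidate
page for a package in each best practice channel, which makes it easier to check which
channel actually hosts it. The URL pattern is inferred from the links above, the helper name
is hypothetical, and a page only exists if that channel really publishes the package.

```python
from typing import List, Sequence
from urllib.parse import quote

# Best practice channels listed above, in the order used by the
# `conda search -c ...` example earlier in this section.
BEST_PRACTICE_CHANNELS = ("bioconda", "conda-forge", "iuc")


def anaconda_pages(package: str,
                   channels: Sequence[str] = BEST_PRACTICE_CHANNELS) -> List[str]:
    """Candidate URLs of the form https://anaconda.org/<channel>/<package>."""
    return [f"https://anaconda.org/{quote(channel)}/{quote(package)}"
            for channel in channels]


for url in anaconda_pages("seqtk"):
    print(url)
```
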

----------------------------------------------------------------
Exercise - Leveraging Bioconda
----------------------------------------------------------------

Use the ``project_init`` command to download this exercise.

::

    $ planemo project_init --template conda_exercise conda_exercise
    $ cd conda_exercise
    $ ls
    pear.xml test-data

This will download a tool for `PEAR - Paired-End reAd mergeR
<http://sco.h-its.org/exelixis/web/software/pear/>`__. This tool, however, lacks
``requirement`` tags and so will not work properly.

1. Run ``planemo test pear.xml`` to verify the tool does not function
   without dependencies defined.
2. Use the ``--conda_requirements`` flag with ``planemo lint`` to verify that it
   does indeed lack requirements.
3. Use ``planemo conda_search`` or the Anaconda_ website to search for the
   correct package and version in a best practice channel.
4. Update ``pear.xml`` with the correct ``requirement`` tags.
5. Re-run the ``lint`` command from above to verify the tool now has the
   correct dependency definition.
6. Re-run the ``test`` command from above to verify the tool test now
   works properly.
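
For the update step, the change amounts to adding a ``<requirement>`` inside a
``<requirements>`` block, just as in the seqtk example. Editing ``pear.xml`` by hand is usually
simplest, but the Python sketch below makes the same change with ``xml.etree.ElementTree``.
The helper is hypothetical, ``PACKAGE`` and ``VERSION`` are placeholders for whatever your
search finds, and ElementTree drops XML comments and original formatting, so review the
result before saving it over a real tool file.

```python
import xml.etree.ElementTree as ET


def add_package_requirement(tool_xml: str, name: str, version: str) -> str:
    """Return ``tool_xml`` with a <requirement type="package"> added.

    A <requirements> block is created (right after <description>, if any)
    when the tool has none. ElementTree drops comments and reflows markup.
    """
    root = ET.fromstring(tool_xml)
    requirements = root.find("requirements")
    if requirements is None:
        requirements = ET.Element("requirements")
        tags = [child.tag for child in root]
        position = tags.index("description") + 1 if "description" in tags else 0
        root.insert(position, requirements)
    requirement = ET.SubElement(requirements, "requirement",
                                type="package", version=version)
    requirement.text = name
    return ET.tostring(root, encoding="unicode")


# Hypothetical tool skeleton; PACKAGE and VERSION are placeholders.
EXAMPLE = '<tool id="example"><description>demo</description><command>run</command></tool>'
print(add_package_requirement(EXAMPLE, "PACKAGE", "VERSION"))
```
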

----------------------------------------------------------------
Building New Conda Packages
----------------------------------------------------------------

Frequently, packages your tool requires are not yet available in Bioconda_
or conda-forge. In these cases, it is likely best to contribute
your package to one of these projects. Unless the tool is exceedingly
general, Bioconda_ is usually the correct starting point.

.. note:: Many things that are not strictly or even remotely "bio" have
    been accepted into Bioconda_, including tools for image analysis,
    natural language processing, and cheminformatics.

At this time, the most relevant source of information on building Conda packages for Galaxy
is probably the Bioconda_ documentation; in particular, check out the `contributing documentation
<https://bioconda.github.io/contributing.html>`__.

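Bioconda recipes are written as conda-build ``meta.yaml`` files. As rough orientation only,
the Python sketch below renders a minimal skeleton with the common ``package``, ``source``,
``build``, ``requirements``, ``test``, and ``about`` sections. The helper name and every value
in the usage example are hypothetical, and real recipes typically need more (for example
build and host requirements and a build script), so follow the Bioconda contributing
documentation for the authoritative format.

```python
from typing import Sequence


def meta_yaml(name: str, version: str, url: str, sha256: str,
              run: Sequence[str], test_commands: Sequence[str],
              home: str, license_id: str, summary: str) -> str:
    """Render a minimal conda-build ``meta.yaml`` skeleton as plain text."""
    lines = [
        "package:",
        f"  name: {name}",
        f'  version: "{version}"',
        "",
        "source:",
        f"  url: {url}",
        f"  sha256: {sha256}",
        "",
        "build:",
        "  number: 0",
        "",
        "requirements:",
        "  run:",
        *[f"    - {dep}" for dep in run],
        "",
        "test:",
        "  commands:",
        *[f"    - {cmd}" for cmd in test_commands],
        "",
        "about:",
        f"  home: {home}",
        f"  license: {license_id}",
        f'  summary: "{summary}"',
    ]
    return "\n".join(lines) + "\n"


print(meta_yaml(
    name="mytool",  # hypothetical package
    version="1.0",
    url="https://example.org/mytool-1.0.tar.gz",
    sha256="<sha256 of the source archive>",
    run=["python"],
    test_commands=["mytool --help"],
    home="https://example.org/mytool",
    license_id="MIT",
    summary="Hypothetical example tool",
))
```

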
----------------------------------------------------------------
Exercise - Package a Tool
----------------------------------------------------------------

1. Package a tool for Bioconda.

.. _Bioconda: https://github.com/bioconda/bioconda-recipes
.. _Conda: https://conda.io/docs/
.. _Anaconda: https://anaconda.org/