
Commit 055202c

Initial pass at full documentation for Conda-based dependency development.
- Add a new advanced topic for developing tools with Conda dependencies with 4 main sections:

  1. Using the planemo conda commands such as conda_init, conda_install, conda_env, lint --conda_requirements, and test.
  2. Finding existing Conda packages - using the Anaconda site, conda search, or a new "planemo conda_search" command for searching best practice channels.
  3. A formal exercise based on the tools-iuc pear.xml tool.
  4. A short stub of a section containing resources on building and contributing Conda recipes.

- Add a project template for downloading the completed intro seqtk_seq.xml example for testing out planemo conda commands.
- Add a project template for a Conda exercise based on the pear tool from tools-iuc.
- Add a ``planemo conda_search`` command for searching best practice channels from the command line.

I think this can serve as the template and example for the first third of the "Conda and Containers - A Developer Perspective" workshop at the GCC 2017.
1 parent db73773 commit 055202c

18 files changed

Lines changed: 870 additions & 3 deletions
Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
1+
Dependencies and Conda
2+
===========================================
3+
4+
----------------------------------------------------------------
5+
Specifying and Using Tool Requirements
6+
----------------------------------------------------------------
7+
8+
.. note:: This document discusses using Conda to satisfy tool dependencies from a tool developer
9+
perspective. An in-depth discussion of using Conda to satisfy dependencies from an
10+
administrator's perspective can be found `here <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__.
11+
That document also serves as good background for this discussion.
12+
13+
.. note:: Planemo requires a Conda installation to target with its various Conda
14+
related commands. A properly configured Conda installation can be initialized
15+
with the ``conda_init`` command. This should only need to be executed once
16+
per development machine.
17+
18+
::
19+
20+
$ planemo conda_init
21+
wget -q --recursive -O '/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/conda_installLuGDHE.sh' 'https://repo.continuum.io/miniconda/Miniconda3-4.2.12-MacOSX-x86_64.sh' && bash '/var/folders/78/zxz5mz4d0jn53xf0l06j7ppc0000gp/T/conda_installLuGDHE.sh' -b -p '/Users/john/miniconda2' && /Users/john/miniconda2/bin/conda install -y -q conda=4.2.13
22+
PREFIX=/Users/john/miniconda2
23+
installing: python-3.5.2-0 ...
24+
installing: conda-env-2.6.0-0 ...
25+
installing: openssl-1.0.2j-0 ...
26+
installing: pycosat-0.6.1-py35_1 ...
27+
installing: readline-6.2-2 ...
28+
installing: requests-2.11.1-py35_0 ...
29+
installing: ruamel_yaml-0.11.14-py35_0 ...
30+
installing: sqlite-3.13.0-0 ...
31+
installing: tk-8.5.18-0 ...
32+
installing: xz-5.2.2-0 ...
33+
installing: yaml-0.1.6-0 ...
34+
installing: zlib-1.2.8-3 ...
35+
installing: conda-4.2.12-py35_0 ...
36+
installing: pycrypto-2.6.1-py35_4 ...
37+
installing: pip-8.1.2-py35_0 ...
38+
installing: wheel-0.29.0-py35_0 ...
39+
installing: setuptools-27.2.0-py35_0 ...
40+
Python 3.5.2 :: Continuum Analytics, Inc.
41+
creating default environment...
42+
installation finished.
43+
Fetching package metadata .......
44+
Solving package specifications: ..........
45+
46+
Package plan for installation in environment /Users/john/miniconda2:
47+
48+
The following packages will be downloaded:
49+
50+
package | build
51+
---------------------------|-----------------
52+
conda-4.2.13 | py35_0 389 KB
53+
54+
The following packages will be UPDATED:
55+
56+
conda: 4.2.12-py35_0 --> 4.2.13-py35_0
57+
58+
Conda installation succeeded - Conda is available at '/Users/john/miniconda2/bin/conda'
59+
60+
While Galaxy can be configured to resolve dependencies in various ways, Planemo
61+
is configured with opinionated defaults geared toward making the building of tools that
62+
target Conda_ as easy as possible.
63+
64+
During the introductory tool development tutorial, we called ``planemo tool_init``
65+
with the argument ``--requirement seqtk@1.2`` and the resulting tool contained
66+
the XML::
67+
68+
<requirements>
69+
<requirement type="package" version="1.2">seqtk</requirement>
70+
</requirements>
71+
72+
As configured by Planemo, when Galaxy encounters these ``requirement`` tags it
73+
will attempt to install Conda, check for referenced packages (such as
74+
``seqtk``), and install them as needed for tool testing.
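
To make the lookup concrete, here is a minimal sketch that lists the package requirements declared in a tool file. It is not Planemo's or Galaxy's actual implementation: the helper name ``package_requirements`` is hypothetical, the tool XML is trimmed to the ``requirements`` block shown above, and only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

# Tool XML trimmed to the parts relevant here (other tags omitted for brevity).
TOOL_XML = """
<tool id="seqtk_seq">
    <requirements>
        <requirement type="package" version="1.2">seqtk</requirement>
    </requirements>
</tool>
"""


def package_requirements(tool_xml):
    """Return (name, version) pairs for each package requirement.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(tool_xml.strip())
    pairs = []
    for req in root.findall("./requirements/requirement"):
        # Only type="package" requirements map onto Conda packages.
        if req.get("type") == "package":
            pairs.append((req.text.strip(), req.get("version")))
    return pairs


print(package_requirements(TOOL_XML))
```

Each resulting pair corresponds to one Conda target, such as ``seqtk`` at version ``1.2``.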
75+
76+
We can check if the requirements on a tool are available in best practice
77+
Conda channels using an extended form of the ``planemo lint`` command. Passing the
78+
``--conda_requirements`` flag will ensure all listed requirements are found.
79+
80+
::
81+
82+
$ planemo lint --conda_requirements seqtk_seq.xml
83+
Linting tool /Users/john/workspace/planemo/docs/writing/seqtk_seq_v6.xml
84+
...
85+
Applying linter requirements_in_conda... CHECK
86+
.. INFO: Requirement [seqtk@1.2] matches target in best practice Conda channel [bioconda].
87+
88+
89+
.. note:: You can download the final version of the seqtk tool from the Planemo tutorial using
90+
the command::
91+
92+
$ planemo project_init --template=seqtk_complete seqtk_example
93+
$ cd seqtk_example
94+
95+
We can verify these tool requirements install with the ``conda_install`` command. With
96+
its default parameters, ``conda_install`` processes tools and creates isolated environments
97+
for their declared requirements.
98+
99+
::
100+
101+
$ planemo conda_install seqtk_seq.xml
102+
Install conda target CondaTarget[seqtk,version=1.2]
103+
/home/john/miniconda2/bin/conda create -y --name __seqtk@1.2 seqtk=1.2
104+
Fetching package metadata ...............
105+
Solving package specifications: ..........
106+
107+
Package plan for installation in environment /home/john/miniconda2/envs/__seqtk@1.2:
108+
109+
The following packages will be downloaded:
110+
111+
package | build
112+
---------------------------|-----------------
113+
seqtk-1.2 | 0 29 KB bioconda
114+
115+
The following NEW packages will be INSTALLED:
116+
117+
seqtk: 1.2-0 bioconda
118+
zlib: 1.2.8-3
119+
120+
Fetching packages ...
121+
seqtk-1.2-0.ta 100% |#############################################################| Time: 0:00:00 444.71 kB/s
122+
Extracting packages ...
123+
[ COMPLETE ]|################################################################################| 100%
124+
Linking packages ...
125+
[ COMPLETE ]|################################################################################| 100%
126+
#
127+
# To activate this environment, use:
128+
# > source activate __seqtk@1.2
129+
#
130+
# To deactivate this environment, use:
131+
# > source deactivate __seqtk@1.2
132+
#
133+
$ which seqtk
134+
seqtk not found
135+
$
136+
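
The environment name and the ``conda create`` invocation in the log above follow a simple pattern. The sketch below reproduces that pattern for a versioned requirement. The function names are hypothetical, and the naming rule is inferred only from the log output shown here rather than taken from Planemo's source, so treat it as an illustration.

```python
import posixpath


def conda_env_name(package, version):
    """Environment name following the ``__<package>@<version>`` pattern in the log."""
    return "__{}@{}".format(package, version)


def conda_env_path(conda_prefix, package, version):
    """Directory of the environment under ``<conda_prefix>/envs``."""
    return posixpath.join(conda_prefix, "envs", conda_env_name(package, version))


def conda_create_command(conda_prefix, package, version):
    """Argument list equivalent to the ``conda create`` line in the log."""
    return [
        posixpath.join(conda_prefix, "bin", "conda"),
        "create", "-y",
        "--name", conda_env_name(package, version),
        "{}={}".format(package, version),
    ]


prefix = "/home/john/miniconda2"  # Conda prefix taken from the log above
print(" ".join(conda_create_command(prefix, "seqtk", "1.2")))
print(conda_env_path(prefix, "seqtk", "1.2"))
```

Because each requirement gets its own named environment, installing one tool's dependencies does not disturb another's.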
137+
The above install worked properly, but seqtk is not on your ``PATH`` because this merely
138+
created an environment within the Conda directory for the seqtk installation. Planemo
139+
will configure Galaxy to exploit this installation. If you wish to interactively explore
140+
the resulting environment to inspect the installed tool or produce test data, the output
141+
of the ``conda_env`` command can be sourced.
142+
143+
::
144+
145+
$ . <(planemo conda_env seqtk_seq.xml)
146+
Deactivate environment with conda_env_deactivate
147+
(seqtk_seq) $ which seqtk
148+
/home/planemo/miniconda2/envs/jobdepsiJClEUfecc6d406196737781ff4456ec60975c137e04884e4f4b05dc68192f7cec4656/bin/seqtk
149+
(seqtk_seq) $ seqtk seq
150+
151+
Usage: seqtk seq [options] <in.fq>|<in.fa>
152+
153+
Options: -q INT mask bases with quality lower than INT [0]
154+
-X INT mask bases with quality higher than INT [255]
155+
-n CHAR masked bases converted to CHAR; 0 for lowercase [0]
156+
-l INT number of residues per line; 0 for 2^32-1 [0]
157+
-Q INT quality shift: ASCII-INT gives base quality [33]
158+
-s INT random seed (effective with -f) [11]
159+
-f FLOAT sample FLOAT fraction of sequences [1]
160+
-M FILE mask regions in BED or name list FILE [null]
161+
-L INT drop sequences with length shorter than INT [0]
162+
-c mask complement region (effective with -M)
163+
-r reverse complement
164+
-A force FASTA output (discard quality)
165+
-C drop comments at the header lines
166+
-N drop sequences containing ambiguous bases
167+
-1 output the 2n-1 reads only
168+
-2 output the 2n reads only
169+
-V shift quality by '(-Q) - 33'
170+
-U convert all bases to uppercases
171+
-S strip of white spaces in sequences
172+
(seqtk_seq) $ conda_env_deactivate
173+
$
174+
175+
As shown above, a ``conda_env_deactivate`` command is created in this environment and can
176+
be used to restore your initial shell configuration.
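
For non-interactive scripts (for example, when generating test data), sourcing shell output may be inconvenient. The following is a rough alternative sketch and not what ``conda_env`` itself does: it assumes only that a Conda environment keeps its executables in a ``bin`` directory (POSIX layout) and puts that directory first on ``PATH``. Full activation may set other variables as well. Both helper names are hypothetical.

```python
import os
import shutil


def env_with_conda_bin(env_dir, base_env=None):
    """Return a copy of the environment with ``<env_dir>/bin`` first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(env_dir, "bin")
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env


def find_in_conda_env(program, env_dir):
    """Resolve ``program`` the way a shell would with the modified PATH."""
    env = env_with_conda_bin(env_dir)
    return shutil.which(program, path=env["PATH"])
```

The returned mapping can be passed as the ``env`` argument of ``subprocess.run``, and ``find_in_conda_env`` plays the role of ``which seqtk`` in the session above.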
177+
178+
Confident the underlying application works, we can now use ``planemo test`` or
179+
``planemo serve`` and it will reuse this environment and find our dependency (in this
180+
case ``seqtk``) as needed.
181+
182+
Here is a portion of the output from the testing command ``planemo test seqtk_seq.xml``
183+
demonstrating the use of this tool.
184+
185+
::
186+
187+
$ planemo test seqtk_seq.xml
188+
...
189+
2017-02-22 10:13:28,902 INFO [galaxy.tools.actions] Handled output named output1 for tool seqtk_seq (20.136 ms)
190+
2017-02-22 10:13:28,914 INFO [galaxy.tools.actions] Added output datasets to history (12.782 ms)
191+
2017-02-22 10:13:28,935 INFO [galaxy.tools.actions] Verified access to datasets for Job[unflushed,tool_id=seqtk_seq] (10.954 ms)
192+
2017-02-22 10:13:28,936 INFO [galaxy.tools.actions] Setup for job Job[unflushed,tool_id=seqtk_seq] complete, ready to flush (21.053 ms)
193+
2017-02-22 10:13:28,962 INFO [galaxy.tools.actions] Flushed transaction for job Job[id=2,tool_id=seqtk_seq] (26.510 ms)
194+
2017-02-22 10:13:29,064 INFO [galaxy.jobs.handler] (2) Job dispatched
195+
2017-02-22 10:13:29,281 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
196+
2017-02-22 10:13:29,282 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
197+
2017-02-22 10:13:29,317 INFO [galaxy.jobs.command_factory] Built script [/tmp/tmpLvKwta/job_working_directory/000/2/tool_script.sh] for tool command [[ "$CONDA_DEFAULT_ENV" = "/Users/john/miniconda2/envs/__seqtk@1.2" ] || . /Users/john/miniconda2/bin/activate '/Users/john/miniconda2/envs/__seqtk@1.2' >conda_activate.log 2>&1 ; seqtk seq -a '/tmp/tmpLvKwta/files/000/dataset_1.dat' > '/tmp/tmpLvKwta/files/000/dataset_2.dat']
198+
2017-02-22 10:13:29,516 DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
199+
2017-02-22 10:13:29,516 DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
200+
ok
201+
202+
----------------------------------------------------------------------
203+
XML: /private/tmp/tmpLvKwta/xunit.xml
204+
----------------------------------------------------------------------
205+
Ran 1 test in 15.936s
206+
207+
OK
208+
2017-02-22 10:13:37,014 INFO [test_driver] Shutting down
209+
2017-02-22 10:13:37,014 INFO [test_driver] Shutting down embedded galaxy web server
210+
2017-02-22 10:13:37,016 INFO [test_driver] Embedded web server galaxy stopped
211+
2017-02-22 10:13:37,017 INFO [test_driver] Stopping application galaxy
212+
....
213+
2017-02-22 10:13:37,018 INFO [galaxy.jobs.handler] sending stop signal to worker thread
214+
2017-02-22 10:13:37,018 INFO [galaxy.jobs.handler] job handler stop queue stopped
215+
Testing complete. HTML report is in "/Users/john/workspace/planemo/project_templates/seqtk_complete/tool_test_output.html".
216+
All 1 test(s) executed passed.
217+
seqtk_seq[0]: passed
218+
219+
In this case the tests passed and the line containing ``[galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda``
220+
indicates Galaxy dependency resolution was successful and it found the environment we previously installed with ``conda_install``.
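
When scanning a long test log, the same check can be scripted. This is a rough sketch that assumes the log line format shown above, which may differ between Galaxy versions. ``resolved_dependencies`` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
DEPENDENCY_LINE = re.compile(
    r"\[galaxy\.tools\.deps\] Using dependency (?P<name>\S+) "
    r"version (?P<version>\S+) of type (?P<type>\S+)"
)


def resolved_dependencies(log_text):
    """Return the sorted, de-duplicated (name, version, type) triples found in a log."""
    found = set()
    for match in DEPENDENCY_LINE.finditer(log_text):
        found.add((match.group("name"), match.group("version"), match.group("type")))
    return sorted(found)


SAMPLE_LOG = """\
2017-02-22 10:13:29,281 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
2017-02-22 10:13:29,282 DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
2017-02-22 10:13:29,516 DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
"""

for name, version, kind in resolved_dependencies(SAMPLE_LOG):
    print(name, version, kind)
```

An empty result for a tool that declares requirements would suggest that dependency resolution did not find the expected environment.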
221+
222+
----------------------------------------------------------------
223+
Finding Existing Conda Packages
224+
----------------------------------------------------------------
225+
226+
How did we know what software name and software version to use? We found the existing
227+
packages available for Conda and referenced them. To do this yourself, you can simply
228+
use the planemo command ``conda_search``. If we do a search for ``seqt`` it will show
229+
all the software and all the versions available matching that search term - including
230+
``seqtk``.
231+
232+
::
233+
234+
$ planemo conda_search seqt
235+
Fetching package metadata ...............
236+
seqtk r75 0 bioconda
237+
r82 0 bioconda
238+
r93 0 bioconda
239+
1.2 0 bioconda
240+
241+
.. note:: The Planemo command ``conda_search`` is a light wrapper around the underlying
242+
``conda search`` command but configured to use the same channels and other options as
243+
Planemo and Galaxy. The following Conda command would also work to search::
244+
245+
$ $HOME/miniconda3/bin/conda search -c bioconda -c conda-forge -c iuc seqt
246+
247+
Alternatively the Anaconda_ website can be used to search for packages. Typing ``seqtk``
248+
into the search form on that page and clicking the top result will bring you to `this page
249+
<https://anaconda.org/bioconda/seqtk>`__ with information about the Bioconda package.
250+
251+
When searching on the website, though, you need to be aware of which channel you are using. By
252+
default, Planemo and Galaxy will search a few different Conda channels. While it is possible
253+
to configure a local Planemo or Galaxy to target different channels - the current best practice
254+
is to add tools to the existing channels.
255+
256+
The existing channels include:
257+
258+
* Bioconda (`github <https://github.com/bioconda/bioconda-recipes>`__ | `conda <https://anaconda.org/bioconda>`__) - best practice channel for various bioinformatics packages.
259+
* Conda-Forge (`github <https://github.com/conda-forge/staged-recipes>`__ | `conda <https://anaconda.org/conda-forge>`__) - best practice channel for general purpose and widely useful computing packages and libraries.
260+
* iuc (`github <https://github.com/galaxyproject/conda-iuc>`__ | `conda <https://anaconda.org/iuc>`__) - best practice channel for other more Galaxy specific packages.
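
To show how the channel list feeds into a search, the sketch below assembles a plain ``conda search`` argument list. The channel names and their order are copied from the note above and are an assumption; they are not read from Planemo's configuration. ``conda_search_command`` is a hypothetical helper.

```python
# Channels as listed in this section; the order is an assumption for illustration.
BEST_PRACTICE_CHANNELS = ("bioconda", "conda-forge", "iuc")


def conda_search_command(term, conda_exec="conda", channels=BEST_PRACTICE_CHANNELS):
    """Build the argument list for searching ``term`` in the given channels."""
    command = [conda_exec, "search"]
    for channel in channels:
        # Each channel is passed with its own -c flag.
        command.extend(["-c", channel])
    command.append(term)
    return command


print(" ".join(conda_search_command("seqt")))
```

The list form can be handed directly to ``subprocess.run`` without shell quoting concerns.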
261+
262+
----------------------------------------------------------------
263+
Exercise - Leveraging Bioconda
264+
----------------------------------------------------------------
265+
266+
Use the ``project_init`` command to download this exercise.
267+
268+
::
269+
270+
$ planemo project_init --template conda_exercise conda_exercise
271+
$ cd conda_exercise
272+
$ ls
273+
pear.xml test-data
274+
275+
This will download a tool for `PEAR - Paired-End reAd mergeR
276+
<http://sco.h-its.org/exelixis/web/software/pear/>`__. This tool, however, has no
277+
``requirement`` tags and so will not work properly.
278+
279+
1. Run ``planemo test pear.xml`` to verify the tool does not function
280+
without dependencies defined.
281+
2. Use the ``--conda_requirements`` flag with ``planemo lint`` to verify it does
282+
indeed lack requirements.
283+
3. Use ``planemo conda_search`` or the Anaconda_ website to search for the
284+
correct package and version in a best practice channel.
285+
4. Update ``pear.xml`` with the correct ``requirement`` tags.
286+
5. Re-run the ``lint`` command from above to verify the tool now has the
287+
correct dependency definition.
288+
6. Re-run the ``test`` command from above to verify the tool test now
289+
works properly.
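
For the step that updates the tool XML, the shape of the block to add can be generated rather than typed by hand. The sketch below uses ``seqtk`` from the earlier section so as not to give away the exercise. ``requirements_block`` is a hypothetical helper, not a Planemo command.

```python
from xml.sax.saxutils import escape, quoteattr


def requirements_block(packages):
    """Render a ``<requirements>`` block from (name, version) pairs."""
    lines = ["<requirements>"]
    for name, version in packages:
        lines.append(
            "    <requirement type=\"package\" version={}>{}</requirement>".format(
                quoteattr(version), escape(name)
            )
        )
    lines.append("</requirements>")
    return "\n".join(lines)


print(requirements_block([("seqtk", "1.2")]))
```

Substituting the package name and version found in the search step yields the tags for ``pear.xml``.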
290+
291+
----------------------------------------------------------------
292+
Building New Conda Packages
293+
----------------------------------------------------------------
294+
295+
Frequently, the packages your tool requires are not found in Bioconda_
296+
or conda-forge yet. In these cases, it is likely best to contribute
297+
your package to one of these projects. Unless the tool is exceedingly
298+
general, Bioconda_ is usually the correct starting point.
299+
300+
.. note:: Many things that are not strictly or even remotely "bio" have
301+
been accepted into Bioconda_ - including tools for image analysis,
302+
natural language processing, and cheminformatics.
303+
304+
At this time, the most relevant source for information on building Conda packages for Galaxy
305+
is probably the Bioconda_ documentation - in particular check out the `contributing documentation
306+
<https://bioconda.github.io/contributing.html>`__.
307+
308+
----------------------------------------------------------------
309+
Exercise - Package a Tool
310+
----------------------------------------------------------------
311+
312+
1. Package a tool for Bioconda.
313+
314+
.. _Bioconda: https://github.com/bioconda/bioconda-recipes
315+
.. _Conda: https://conda.io/docs/
316+
.. _Anaconda: https://anaconda.org/
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
Dependencies and Docker
2+
===========================================
3+

docs/_writing_using_seqtk.rst

Lines changed: 1 addition & 3 deletions
@@ -7,9 +7,7 @@ install Seqtk - but however you obtain it should be fine.
77

88
::
99

10-
$ conda config --add channels r
11-
$ conda config --add channels bioconda
12-
$ conda install seqtk
10+
$ conda install -c bioconda seqtk=1.2
1311
... seqtk installation ...
1412
$ seqtk seq
1513
Usage: seqtk seq [options] <in.fq>|<in.fa>

docs/commands/conda_search.rst

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
2+
``conda_search`` command
3+
======================================
4+
5+
This section is auto-generated from the help text for the planemo command
6+
``conda_search``. This help message can be generated with ``planemo conda_search
7+
--help``.
8+
9+
**Usage**::
10+
11+
planemo conda_search [OPTIONS] TERM
12+
13+
**Help**
14+
15+
Perform conda search with Planemo's conda.
16+
17+
Implicitly adds channels Planemo is configured with.
18+
19+
**Options**::
20+
21+
22+
--conda_prefix DIRECTORY Conda prefix to use for conda dependency
23+
commands.
24+
--conda_exec PATH Location of conda executable.
25+
--conda_debug Enable more verbose conda logging.
26+
--conda_channels, --conda_ensure_channels TEXT
27+
Ensure conda is configured with specified
28+
comma separated list of channels.
29+
--help Show this message and exit.
30+

docs/planemo.commands.rst

Lines changed: 16 additions & 0 deletions
@@ -100,6 +100,14 @@ planemo.commands.cmd_conda_lint module
100100
:undoc-members:
101101
:show-inheritance:
102102

103+
planemo.commands.cmd_conda_search module
104+
----------------------------------------
105+
106+
.. automodule:: planemo.commands.cmd_conda_search
107+
:members:
108+
:undoc-members:
109+
:show-inheritance:
110+
103111
planemo.commands.cmd_config_init module
104112
---------------------------------------
105113

@@ -204,6 +212,14 @@ planemo.commands.cmd_normalize module
204212
:undoc-members:
205213
:show-inheritance:
206214

215+
planemo.commands.cmd_open module
216+
--------------------------------
217+
218+
.. automodule:: planemo.commands.cmd_open
219+
:members:
220+
:undoc-members:
221+
:show-inheritance:
222+
207223
planemo.commands.cmd_profile_create module
208224
------------------------------------------
209225
