Merge pull request #206 from nf-core/indices-documentation-update

apeltzer · web-flow · commit 3d42d0c689fe · 2019-04-16T10:31:26.000+02:00
Documented new reference indices flag system
diff --git a/docs/usage.md b/docs/usage.md
@@ -5,7 +5,7 @@
 <!-- Install Atom plugin markdown-toc-auto for this ToC to auto-update on save -->
 <!-- TOC START min:2 max:3 link:true asterisk:true update:true -->
 * [Table of contents](#table-of-contents)
-* [Introduction](#introduction)
+* [Introduction](#general-nextflow-info)
 * [Running the pipeline](#running-the-pipeline)
 * [Updating the pipeline](#updating-the-pipeline)
 * [Reproducibility](#reproducibility)
@@ -174,12 +174,14 @@ A normal glob pattern, enclosed in quotation marks, can then be used for `--read
 ```
 
 ### `--fasta`
-If you prefer, you can specify the full path to your reference genome when you run the pipeline:
+Specify the full path to your reference genome here. The FASTA file can have any file suffix, such as `.fasta`, `.fna`, `.fa`, `.FastA` etc. You may also supply a gzipped reference file, which will be unzipped automatically for you.
+
+For example:
 
 ```bash
---fasta '[path to Fasta reference]'
+--fasta '/<path>/<to>/my_reference.fasta'
 ```
-> If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters, the pipeline will create these indices for you automatically. Note, that saving these for later has to be turned on using `--saveReference`. You may also specify the path to a gzipped (`*.gz` file extension) FastA as reference genome - this will be uncompressed by the pipeline automatically for you. Note that other file extensions such as `.fna`, `.fa` are also supported but will be renamed to `.fasta` automatically by the pipeline.
+> If you don't specify the `--bwa_index` and `--fasta_index` parameters (see [below](#optional-reference-options)), the pipeline will create these indices for you automatically. To keep the generated indices for later runs, add the `--saveReference` flag.
 
 ### `--large_ref`
 
@@ -220,23 +222,55 @@ params {
 }
 ```
 
-### Optional Reference Utility Files
+## Optional Reference Options
+
+### Generating Fresh Indices
+
+#### `--saveReference`
+
+Use this if you do not have pre-made reference FASTA indices for `bwa`, `samtools` and `picard`. If you turn this on, the indices EAGER2 generates will be stored in `<your_output_dir>/results/reference_genomes`.
+
+### Premade Indices
+
+Supplying pre-made indices saves time in pipeline execution and is especially advised when running multiple times on the same cluster system, for example. You can even add a resource-specific profile that sets paths to pre-computed reference genomes, saving even more time when specifying these.
+
+#### `--bwa_index`
+
+If you want to use pre-existing `bwa index` indices, please supply the path **and file** to the FASTA you also specified in `--fasta` (see above). EAGER2 will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.
+
+For example:
+
+```
+nextflow run nf-core/eager \
+-profile test_fna,docker \
+--pairedEnd \
+--reads '*{R1,R2}*.fq.gz' \
+--fasta results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta \
+--bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta
+```
+
+> `bwa index` does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command _must not_ be changed, otherwise EAGER2 will not be able to find them.
 
-### `--bwa_index`
+#### `--seq_dict`
 
-Use this to specify a _directory_ containing previously created BWA index files. This saves time in pipeline execution and is especially advised when running multiple times on the same cluster system for example. You can even add a resource specific profile that sets paths to pre-computed reference genomes, saving even time when specifying these.
+If you want to use a pre-existing `picard CreateSequenceDictionary` dictionary file, use this to specify the required `.dict` file for the selected reference genome.
 
-### `--seq_dict` false
+For example:
 
-Use this to specify the required sequence dictionary file for the selected reference genome.
+```
+--seq_dict Mammoth_MT_Krause.dict
+```
+
+#### `--fasta_index`
 
-### `--fasta_index` false
+If you want to use a pre-existing `samtools faidx` index, use this to specify the required FASTA index file for the selected reference genome. This should be generated by `samtools faidx` and has a file suffix of `.fai`.
 
-Use this to specify the required FastA index file for the selected reference genome.
+For example:
 
-### `--saveReference` false
+```
+--fasta_index Mammoth_MT_Krause.fasta.fai
+```
 
-If you turn this on, the generated indices will be stored in the `./results/reference_genomes` for you. 
 
 ## Other command line parameters
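
The `--bwa_index` note in this diff says the index files must keep the names `bwa index` gave them, sitting next to the FASTA. A minimal sketch of a pre-flight check for that is shown here, assuming the five suffixes that `bwa index` normally writes (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The function name `check_bwa_index` is hypothetical and is not part of EAGER2 or `bwa`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of EAGER2: report which bwa index files are
# missing next to a FASTA. Assumes the standard suffixes written by `bwa index`.
check_bwa_index() {
  local fasta="$1"
  local missing=0
  local suffix
  for suffix in amb ann bwt pac sa; do
    if [ ! -f "${fasta}.${suffix}" ]; then
      echo "missing: ${fasta}.${suffix}"
      missing=1
    fi
  done
  # Exit status 0 when all five files exist, 1 otherwise.
  return "$missing"
}
```

It could be called on the same path that is passed to `--bwa_index`, for example `check_bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta`, before launching the pipeline.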