Commit 3d42d0c

Merge pull request #206 from nf-core/indices-documentation-update
Documented new reference indices flag system
2 parents 89ff7e3 + 9bb31fa commit 3d42d0c

1 file changed

Lines changed: 47 additions & 13 deletions

docs/usage.md

@@ -5,7 +5,7 @@
55
<!-- Install Atom plugin markdown-toc-auto for this ToC to auto-update on save -->
66
<!-- TOC START min:2 max:3 link:true asterisk:true update:true -->
77
* [Table of contents](#table-of-contents)
8-
* [Introduction](#introduction)
8+
* [Introduction](#general-nextflow-info)
99
* [Running the pipeline](#running-the-pipeline)
1010
* [Updating the pipeline](#updating-the-pipeline)
1111
* [Reproducibility](#reproducibility)
@@ -174,12 +174,14 @@ A normal glob pattern, enclosed in quotation marks, can then be used for `--read
174174
```
175175

176176
### `--fasta`
177-
If you prefer, you can specify the full path to your reference genome when you run the pipeline:
177+
Specify the full path to your reference genome here. The FASTA file can have any file suffix, such as `.fasta`, `.fna`, `.fa` or `.FastA`. You may also supply a gzipped reference file, which will be unzipped automatically for you.
178+
179+
For example:
178180

179181
```bash
180-
--fasta '[path to Fasta reference]'
182+
--fasta '/<path>/<to>/my_reference.fasta'
181183
```
182-
> If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters, the pipeline will create these indices for you automatically. Note, that saving these for later has to be turned on using `--saveReference`. You may also specify the path to a gzipped (`*.gz` file extension) FastA as reference genome - this will be uncompressed by the pipeline automatically for you. Note that other file extensions such as `.fna`, `.fa` are also supported but will be renamed to `.fasta` automatically by the pipeline.
184+
> If you don't specify the `--bwa_index` and `--fasta_index` parameters (see [below](#optional-reference-options)), the pipeline will create these indices for you automatically. To keep the generated indices for later runs, add the `--saveReference` flag.
183185
184186
### `--large_ref`
185187

@@ -220,23 +222,55 @@ params {
220222
}
221223
```
222224

223-
### Optional Reference Utility Files
225+
## Optional Reference Options
226+
227+
### Generating Fresh Indices
228+
229+
#### `--saveReference`
230+
231+
Use this if you do not have pre-made reference FASTA indices for `bwa`, `samtools` and `picard`. If you turn this on, the indices EAGER2 generates for you will be stored in `<your_output_dir>/results/reference_genomes`.
232+
233+
### Premade Indices
234+
235+
Supplying pre-made indices saves time in pipeline execution and is especially advised when running the pipeline multiple times on the same cluster system. You can even add a resource-specific profile that sets the paths to pre-computed reference genomes, so that you do not have to specify them on every run.
236+
237+
#### `--bwa_index`
238+
239+
If you want to use pre-existing `bwa index` indices, supply the full path, **including the file name**, of the FASTA you also specified in `--fasta` (see above). EAGER2 will automagically detect the index files by searching for the FASTA filename with the corresponding `bwa` index file suffixes.
240+
241+
For example:
242+
243+
```
244+
nextflow run nf-core/eager \
245+
-profile test_fna,docker \
246+
--pairedEnd \
247+
--reads '*{R1,R2}*.fq.gz' \
248+
--fasta results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta \
249+
--bwa_index results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta
250+
```
251+
252+
> `bwa index` does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command _must not_ be changed, otherwise EAGER2 will not be able to find them.
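
The suffix-based lookup described above can be checked by hand before a run. The following is a minimal sketch, assuming the five suffixes that `bwa index` writes by default (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The function name `check_bwa_index` is hypothetical and is not part of EAGER2.

```bash
# Hypothetical helper (not part of EAGER2): report whether every file that
# `bwa index` normally writes next to a FASTA is present.
check_bwa_index() {
    local fasta="$1"
    local missing=0
    local suffix
    # Assumption: the default `bwa index` output suffixes.
    for suffix in amb ann bwt pac sa; do
        if [ ! -e "${fasta}.${suffix}" ]; then
            echo "missing: ${fasta}.${suffix}" >&2
            missing=1
        fi
    done
    # Returns 0 when all index files exist, 1 otherwise.
    return "$missing"
}
```

It could be used as `check_bwa_index /<path>/<to>/my_reference.fasta && echo "index complete"`, passing the same path given to both `--fasta` and `--bwa_index`.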
224253
225-
### `--bwa_index`
254+
#### `--seq_dict`
226255

227-
Use this to specify a _directory_ containing previously created BWA index files. This saves time in pipeline execution and is especially advised when running multiple times on the same cluster system for example. You can even add a resource specific profile that sets paths to pre-computed reference genomes, saving even time when specifying these.
256+
If you want to use a pre-existing `picard CreateSequenceDictionary` dictionary file, use this to specify the required `.dict` file for the selected reference genome.
228257

229-
### `--seq_dict` false
258+
For example:
230259

231-
Use this to specify the required sequence dictionary file for the selected reference genome.
260+
```
261+
--seq_dict Mammoth_MT_Krause.dict
262+
```
263+
264+
#### `--fasta_index`
232265

233-
### `--fasta_index` false
266+
If you want to use a pre-existing `samtools faidx` index, use this to specify the required FASTA index file for the selected reference genome. This should be generated by `samtools faidx` and has a file suffix of `.fai`.
234267

235-
Use this to specify the required FastA index file for the selected reference genome.
268+
For example:
236269

237-
### `--saveReference` false
270+
```
271+
--fasta_index Mammoth_MT_Krause.fasta.fai
272+
```
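
The two example file names above differ in how they relate to the FASTA name: the `.fai` suffix is appended to the whole file name, while the `.dict` example replaces the FASTA suffix. A minimal sketch of that naming, assuming your files follow the same pattern as the examples and carry a single suffix (both function names are hypothetical):

```bash
# Hypothetical helpers: derive the companion file names used in the examples
# above from a FASTA path. Assumes the files sit next to the FASTA.
fai_path() {
    # The .fai suffix is appended to the whole FASTA file name.
    printf '%s.fai\n' "$1"
}

dict_path() {
    # The .dict example replaces the final suffix (e.g. .fasta).
    printf '%s.dict\n' "${1%.*}"
}

fai_path Mammoth_MT_Krause.fasta    # prints Mammoth_MT_Krause.fasta.fai
dict_path Mammoth_MT_Krause.fasta   # prints Mammoth_MT_Krause.dict
```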
238273

239-
If you turn this on, the generated indices will be stored in the `./results/reference_genomes` for you.
240274

241275
## Other command line parameters
242276
