* [#147](https://github.com/nf-core/eager/pull/147) - Fix `samtools index` for [large references](https://github.com/nf-core/eager/issues/146)
README.md (24 additions, 3 deletions)
@@ -45,20 +45,28 @@ Additional functionality contained by the pipeline currently includes:
 ## Quick Start

 1. Install [`nextflow`](docs/installation.md)
+
 2. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
+
 3. Download the EAGER pipeline

 ```bash
 nextflow pull nf-core/eager
 ```

-4.Set up your job with default parameters
+4. Test the pipeline using the provided test data

 ```bash
-nextflow run nf-core -profile <docker/singularity/conda> --reads'*_R{1,2}.fastq.gz' --fasta '<REFERENCE>.fasta'
+nextflow run nf-core/eager -profile <docker/singularity/conda>,test --pairedEnd
 ```

-5. See the overview of the run with under `<OUTPUT_DIR>/MultiQC/multiqc_report.html`
+5. Start running your own ancient DNA analysis!
+
+```bash
+nextflow run nf-core/eager -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --fasta '<REFERENCE>.fasta'
+```
+
+NB. You can see an overview of the run in the MultiQC report located at `<OUTPUT_DIR>/MultiQC/multiqc_report.html`

 Modifications to the default pipeline are easily made using various options
 as described in the documentation.
@@ -84,6 +92,19 @@ James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to
 contribute, please open an issue and ask to be added to the project - happy to
 do so and everyone is welcome to contribute here!

+## Contributors
+
+- [James A. Fellows-Yates](https://github.com/jfy133)
docs/usage.md (29 additions, 3 deletions)
@@ -170,6 +170,10 @@ If you prefer, you can specify the full path to your reference genome when you r
 ```
 > If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters, the pipeline will create these indices for you automatically. Note that saving these for later has to be turned on using `--saveReference`. You may also specify the path to a gzipped (`*.gz` file extension) FastA as reference genome - this will be uncompressed by the pipeline automatically for you. Note that other file extensions such as `.fna`, `.fa` are also supported but will be renamed to `.fasta` automatically by the pipeline.

+### `--large_ref`
+
+This parameter is required for large reference genomes. If your reference genome is larger than 3.5 GB, the `samtools index` calls in the pipeline need to generate `CSI` indices instead of `BAI` indices to compensate for the size of the reference genome. This parameter is not required for smaller references (including human `hg19` or `GRCh37`/`GRCh38` references), but genomes larger than 4 GB have been shown to need `CSI` indices.
+
 ### `--genome` (using iGenomes)

 The pipeline config files come bundled with paths to the illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource.
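You can check in advance whether a reference is likely to need `--large_ref` by looking at the lengths of its individual sequences. The BAI format cannot handle a single sequence longer than 2^29 bp (536,870,912 bp), while CSI supports longer sequences. This per-sequence check complements the genome-size guidance for `--large_ref`. The sketch below assumes that a samtools FASTA index (`.fai`, e.g. created with `samtools faidx <REFERENCE>.fasta`) already exists. The function name `check_bai_limit` is hypothetical and is not part of nf-core/eager.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/eager).
# Prints every sequence in a samtools .fai index that is longer than 2^29 bp,
# the longest sequence a BAI index can handle, and returns exit status 1 if any
# were found (a hint that CSI indices, i.e. --large_ref, may be needed).
# .fai columns (tab-separated): NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
check_bai_limit() {
  local fai="$1"
  local limit=$(( 1 << 29 ))   # 536870912 bp
  awk -F'\t' -v limit="$limit" '
    $2 + 0 > limit + 0 { print $1 "\t" $2; n++ }
    END { exit (n > 0 ? 1 : 0) }
  ' "$fai"
}
```

For example, `check_bai_limit <REFERENCE>.fasta.fai || echo 'consider --large_ref'` prints the offending sequences and the hint. It also prints the hint if the `.fai` file cannot be read, because awk then exits with a non-zero status as well.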
@@ -237,7 +241,7 @@ Use to set a top-limit for the default time requirement for each process.
 Should be a string in the format integer-unit, e.g. `--max_time '2.h'`. If not specified, will be taken from the configuration in the `-profile` flag.

 ### `--max_cpus`
-Use to set a top-limit for the default CPU requirement for each process.
+Use to set a top-limit for the default CPU requirement for each **process**. This is not the maximum number of CPUs that can be used by the whole pipeline, but the maximum number of CPUs each program can use per submission (known as a process). Do not set this higher than what your workstation or compute node can provide. If you're unsure, ask your local IT administrator for details on compute node capabilities!
 Should be an integer, e.g. `--max_cpus 1`. If not specified, will be taken from the configuration in the `-profile` flag.

 ### `--email`
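The sketch below illustrates the per-process behaviour of `--max_cpus`. It assumes the cap works like the `check_max()` helper found in nf-core pipeline configs, where each process receives the smaller of its own CPU request and `--max_cpus`. The function name `capped_cpus` and the CPU requests are made up for illustration. On Linux, `nproc` prints the number of available CPU cores, which is a sensible upper bound for `--max_cpus` on a single workstation.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of a per-process CPU cap (not nf-core/eager code):
# a process gets min(requested CPUs, --max_cpus).
capped_cpus() {
  local requested="$1" max_cpus="$2"
  if (( requested < max_cpus )); then
    echo "$requested"
  else
    echo "$max_cpus"
  fi
}

max_cpus=4
# Each (made-up) request is capped on its own. The cap does not limit how many
# CPUs several processes running at the same time use in total.
for requested in 8 2 4; do
  echo "process requesting ${requested} CPUs gets $(capped_cpus "$requested" "$max_cpus")"
done
```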
@@ -279,12 +283,16 @@ This part of the documentation contains a list of user-adjustable parameters in

 ## Step skipping parameters

-Some of the steps in the pipeline can be executed optionally. If you specify specific steps to be skipped, there won't be any output related to these modules.
+Some of the steps in the pipeline can be executed optionally. If you specify specific steps to be skipped, there won't be any output related to these modules.

 ### `--skip_preseq`

 Turns off the computation of library complexity estimation.

+### `--skip_adapterremoval`
+
+Turns off adaptor trimming and paired-end read merging. Equivalent to setting both `--skip_collapse` and `--skip_trim`.
+
 ### `--skip_damage_calculation`

 Turns off the DamageProfiler module to compute DNA damage profiles.
@@ -299,7 +307,7 @@ Turns off duplicate removal methods DeDup and MarkDuplicates respectively. No du

 ## Complexity Filtering Options

-### `--complexity_filter`
+### `--complexity_filter_poly_g`

 Performs a poly-G tail removal step at the beginning of the pipeline, if turned on. This can be useful for trimming poly-G tails from short fragments sequenced on two-colour Illumina chemistry such as NextSeqs (where the absence of fluorescence is read as a G). Such poly-G tails can inflate reported GC content values.
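Here is a minimal sketch of what poly-G tail removal does. It is not the pipeline's own implementation. For each read it trims a trailing run of G bases when the run is at least a minimum length, and it shortens the quality string by the same amount. The sketch assumes uncompressed FASTQ with exactly four lines per record. The default minimum run length of 10 is an arbitrary choice for illustration, and the function name `trim_poly_g` is hypothetical. Dedicated trimmers also handle things this sketch ignores, such as sequencing errors inside the G run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not nf-core/eager code): remove trailing poly-G runs
# from reads in a 4-line-per-record FASTQ stream on stdin.
# Usage: trim_poly_g [MIN_RUN_LENGTH] < reads.fastq > trimmed.fastq
trim_poly_g() {
  local min_len="${1:-10}"   # assumed default, chosen for illustration only
  awk -v min="$min_len" '
    NR % 4 == 2 {
      # Count consecutive G bases at the 3-prime end of the read.
      len = length($0); n = 0
      while (n < len && substr($0, len - n, 1) == "G") n++
      cut = (n >= min) ? n : 0        # only trim runs of at least min bases
      print substr($0, 1, len - cut)
      next
    }
    NR % 4 == 0 {
      # Shorten the quality string by the same number of characters.
      print substr($0, 1, length($0) - cut)
      next
    }
    { print }                          # header and "+" separator lines unchanged
  '
}
```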
@@ -329,6 +337,24 @@ Defines the minimum read quality per base that is required for a base to be kept
 ### `--clip_min_adap_overlap` 1
 Sets the minimum overlap between two reads when read merging is performed. Default is set to `1` base overlap.