
Commit 291990f

Merge pull request #142 from maxibor/dev
Add optional merging and trimming
2 parents 2786af3 + 01d056c commit 291990f

8 files changed

Lines changed: 262 additions & 90 deletions


.travis.yml

Lines changed: 6 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -40,6 +40,12 @@ script:
4040
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --saveReference
4141
# Run the basic pipeline with single end data (pretending it's single end actually)
4242
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --singleEnd --bwa_index results/reference_genome/bwa_index/bwa_index/
43+
# Run the basic pipeline with paired end data without collapsing
44+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --skip_collapse --saveReference
45+
# Run the basic pipeline with paired end data without trimming
46+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --skip_trim --saveReference
47+
# Run the basic pipeline with paired end data without adapterRemoval
48+
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --skip_adapterremoval --saveReference
4349
# Run the same pipeline testing optional step: fastp, complexity
4450
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pairedEnd --complexity_filter --bwa_index results/reference_genome/bwa_index/bwa_index/
4551
# Test BAM Trimming

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -6,6 +6,17 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
66

77
## [Unpublished / Dev Branch]
88

9+
### `Added`
10+
11+
* [#152](https://github.com/nf-core/eager/pull/152) - Clarified `--complexity_filter` flag to be specifically for poly G trimming.
12+
* [#155](https://github.com/nf-core/eager/pull/155) - Added [Dedup log to output folders](https://github.com/nf-core/eager/issues/154)
13+
14+
### `Fixed`
15+
16+
* [#151](https://github.com/nf-core/eager/pull/151) - Fixed [post-deduplication step errors](https://github.com/nf-core/eager/issues/128)
17+
* [#147](https://github.com/nf-core/eager/pull/147) - Fix Samtools Index for [large references](https://github.com/nf-core/eager/issues/146)
18+
* [#145](https://github.com/nf-core/eager/pull/145) - Added Picard Memory Handling [fix](https://github.com/nf-core/eager/issues/144)
19+
920
## [2.0.5] - 2019-01-28
1021

1122
### `Added`

README.md

Lines changed: 23 additions & 3 deletions
@@ -45,20 +45,28 @@ Additional functionality contained by the pipeline currently includes:
4545
## Quick Start
4646

4747
1. Install [`nextflow`](docs/installation.md)
48+
4849
2. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
50+
4951
3. Download the EAGER pipeline
5052

5153
```bash
5254
nextflow pull nf-core/eager
5355
```
5456

55-
4. Set up your job with default parameters
57+
4. Test the pipeline using the provided test data
5658

5759
```bash
58-
nextflow run nf-core -profile <docker/singularity/conda> --reads'*_R{1,2}.fastq.gz' --fasta '<REFERENCE>.fasta'
60+
nextflow run nf-core/eager -profile <docker/singularity/conda>,test --pairedEnd
5961
```
6062

61-
5. See the overview of the run with under `<OUTPUT_DIR>/MultiQC/multiqc_report.html`
63+
5. Start running your own ancient DNA analysis!
64+
65+
```bash
66+
nextflow run nf-core/eager -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --fasta '<REFERENCE>.fasta'
67+
```
68+
69+
NB. You can see an overview of the run in the MultiQC report located at `<OUTPUT_DIR>/MultiQC/multiqc_report.html`
6270

6371
Modifications to the default pipeline are easily made using various options
6472
as described in the documentation.
@@ -84,6 +92,18 @@ James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to
8492
contribute, please open an issue and ask to be added to the project - happy to
8593
do so and everyone is welcome to contribute here!
8694

95+
## Contributors
96+
97+
- [James A. Fellows-Yates](https://github.com/jfy133)
98+
- [Stephen Clayton](https://github.com/sc13-bioinf)
99+
- [Judith Neukamm](https://github.com/JudithNeukamm)
100+
- [Raphael Eisenhofer](https://github.com/EisenRa)
101+
- [Maxime Garcia](https://github.com/MaxUlysse)
102+
- [Luc Venturini](https://github.com/lucventurini)
103+
- [Hester van Schalkwyk](https://github.com/hesterjvs)
104+
105+
If you've contributed and you're missing in here, please let me know and I'll add you in.
106+
87107
## Tool References
88108

89109
* **EAGER v1**, CircularMapper, DeDup* Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z) Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)

conf/base.config

Lines changed: 4 additions & 1 deletion
@@ -31,7 +31,10 @@ process {
3131
withName:convertBam {
3232
cpus = { check_max(8 * task.attempt, 'cpus') }
3333
}
34-
34+
withName:makeSeqDict {
35+
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
36+
}
37+
3538
withName:bwa {
3639
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
3740
cpus = { check_max(8 * task.attempt, 'cpus') }
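Reviewer note on the `makeSeqDict` block above: the expression `check_max( 16.GB * task.attempt, 'memory' )` asks for more memory on each retry. The following is a minimal bash sketch of that arithmetic, assuming `check_max` caps the request at a configured maximum. The function name `scaled_memory_gb` and the 128 GB ceiling are illustrative assumptions, not part of the pipeline.

```bash
#!/usr/bin/env bash
# Sketch only: memory request that grows with each retry but never exceeds a ceiling.
# Mirrors the shape of `check_max( 16.GB * task.attempt, 'memory' )`.
# `scaled_memory_gb` and the 128 GB ceiling are hypothetical, for illustration.
scaled_memory_gb() {
    local base_gb=$1 attempt=$2 max_gb=$3
    local requested=$(( base_gb * attempt ))
    if (( requested > max_gb )); then
        echo "$max_gb"      # request exceeds the ceiling: cap it
    else
        echo "$requested"   # request fits: use it as-is
    fi
}

scaled_memory_gb 16 1 128    # first attempt
scaled_memory_gb 16 3 128    # third attempt
scaled_memory_gb 16 10 128   # would be 160, capped at the ceiling
```

Under these assumptions the three calls print `16`, `48` and `128`, so a `makeSeqDict` task that fails for lack of memory is retried with a larger allocation until the ceiling is reached.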

conf/multiqc_config.yaml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ top_modules:
99
- '*_fastqc.zip'
1010
path_filters_exclude:
1111
- '*.combined.prefixed_fastqc.zip'
12+
- 'fastp'
1213
- 'adapterRemoval'
1314
- 'fastqc':
1415
name: 'FastQC (post-AdapterRemoval)'

docs/usage.md

Lines changed: 30 additions & 3 deletions
@@ -170,6 +170,10 @@ If you prefer, you can specify the full path to your reference genome when you r
170170
```
171171
> If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters, the pipeline will create these indices for you automatically. Note, that saving these for later has to be turned on using `--saveReference`. You may also specify the path to a gzipped (`*.gz` file extension) FastA as reference genome - this will be uncompressed by the pipeline automatically for you. Note that other file extensions such as `.fna`, `.fa` are also supported but will be renamed to `.fasta` automatically by the pipeline.
172172
173+
### `--large_ref`
174+
175+
This parameter must be set for large reference genomes. If your reference genome is larger than 3.5GB, the `samtools index` calls in the pipeline need to generate `CSI` indices instead of `BAI` indices to compensate for the size of the reference genome. This parameter is not required for smaller references (including a human `hg19` or `grch37`/`grch38` reference), but `>4GB` genomes have been shown to need `CSI` indices.
176+
173177
### `--genome` (using iGenomes)
174178

175179
The pipeline config files come bundled with paths to the illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource.
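Reviewer note on the `--large_ref` section added in this hunk: `samtools index` selects the index format with `-b` (BAI, the default) or `-c` (CSI). This diff does not show how the pipeline wires the parameter into the call, so the sketch below is an assumption about the general shape. The function `index_flag` and the file name `sample.bam` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: choose the `samtools index` format flag from a large-reference switch.
# `index_flag` and `sample.bam` are hypothetical names used for illustration.
index_flag() {
    local large_ref=$1
    if [[ "$large_ref" == "true" ]]; then
        echo "-c"   # CSI index, for large references
    else
        echo "-b"   # BAI index, the samtools default
    fi
}

# Print the command that would be run, without requiring samtools to be installed.
echo "samtools index $(index_flag true) sample.bam"
```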
@@ -237,7 +241,7 @@ Use to set a top-limit for the default time requirement for each process.
237241
Should be a string in the format integer-unit. eg. `--max_time '2.h'`. If not specified, will be taken from the configuration in the `-profile` flag.
238242

239243
### `--max_cpus`
240-
Use to set a top-limit for the default CPU requirement for each process.
244+
Use to set a top-limit for the default CPU requirement for each **process**. This is not the maximum number of CPUs that can be used for the whole pipeline, but the maximum number of CPUs each program can use for each program submission (known as a process). Do not set this higher than what your workstation or computing node can provide. If you're unsure, ask your local IT administrator for details on compute node capabilities!
241245
Should be an integer, e.g. `--max_cpus 1`. If not specified, will be taken from the configuration in the `-profile` flag.
242246

243247
### `--email`
@@ -279,12 +283,17 @@ This part of the documentation contains a list of user-adjustable parameters in
279283

280284
## Step skipping parameters
281285

282-
Some of the steps in the pipeline can be executed optionally. If you specify specific steps to be skipped, there won't be any output related to these modules.
286+
Some of the steps in the pipeline can be executed optionally. If you specify specific steps to be skipped, there won't be any output related to these modules.
283287

284288
### `--skip_preseq`
285289

286290
Turns off the computation of library complexity estimation.
287291

292+
### `--skip_adapterremoval`
293+
294+
Turns off adaptor trimming and paired-end read merging.
295+
Equivalent to setting both `--skip_collapse` and `--skip_trim`.
296+
288297
### `--skip_damage_calculation`
289298

290299
Turns off the DamageProfiler module to compute DNA damage profiles.
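Reviewer note on the `--skip_adapterremoval` text added in this hunk: the documentation states that the flag is equivalent to setting both `--skip_collapse` and `--skip_trim`. The bash sketch below shows one way such a relationship could be resolved. It is not the pipeline's actual implementation, and the function name `resolve_adapterremoval_steps` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: resolve which AdapterRemoval steps stay on for a set of flags.
# `resolve_adapterremoval_steps` is a hypothetical helper, not pipeline code.
resolve_adapterremoval_steps() {
    local skip_all=false skip_collapse=false skip_trim=false arg
    for arg in "$@"; do
        case "$arg" in
            --skip_adapterremoval) skip_all=true ;;
            --skip_collapse)       skip_collapse=true ;;
            --skip_trim)           skip_trim=true ;;
        esac
    done

    # Per the documentation, skipping AdapterRemoval implies skipping both steps.
    if [[ "$skip_all" == true ]]; then
        skip_collapse=true
        skip_trim=true
    fi

    local collapse=on trim=on
    if [[ "$skip_collapse" == true ]]; then collapse=off; fi
    if [[ "$skip_trim" == true ]]; then trim=off; fi
    echo "collapse=$collapse trim=$trim"
}

resolve_adapterremoval_steps --pairedEnd --skip_adapterremoval
resolve_adapterremoval_steps --pairedEnd --skip_collapse
```

The first call prints `collapse=off trim=off` and the second prints `collapse=off trim=on`, matching the three Travis test cases added in `.travis.yml`, which exercise each flag separately.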
@@ -299,7 +308,7 @@ Turns off duplicate removal methods DeDup and MarkDuplicates respectively. No du
299308

300309
## Complexity Filtering Options
301310

302-
### `--complexity_filter`
311+
### `--complexity_filter_poly_g`
303312

304313
Performs a poly-G tail removal step in the beginning of the pipeline, if turned on. This can be useful for trimming poly-G tails from short fragments sequenced on two-colour Illumina chemistry such as NextSeqs (where no-fluorescence is read as a G on two-colour chemistry), which can inflate reported GC content values.
305314

@@ -329,6 +338,24 @@ Defines the minimum read quality per base that is required for a base to be kept
329338
### `--clip_min_adap_overlap` 1
330339
Sets the minimum overlap between two reads when read merging is performed. Default is set to `1` base overlap.
331340

341+
### `--skip_collapse`
342+
343+
Turns off the paired-end read merging.
344+
345+
For example
346+
```bash
347+
--pairedEnd --skip_collapse --reads '*.fastq'
348+
```
349+
350+
### `--skip_trim`
351+
352+
Turns off the adaptor and quality trimming.
353+
354+
For example
355+
```bash
356+
--pairedEnd --skip_trim --reads '*.fastq'
357+
```
358+
332359
## Read Mapping Parameters
333360

334361
## BWA (default)
