Commit 80f263f

Merge pull request #604 from nf-core/adapterremoval-prefix-fix: Adapterremoval prefix fix

Parents: 2232c2c + a706788

4 files changed: 22 additions & 18 deletions

Files changed:

- CHANGELOG.md
- docs/usage.md
- main.nf
- nextflow_schema.json

CHANGELOG.md (1 addition & 0 deletions)

```diff
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 ### `Fixed`
 
 - Fixed AWS full test profile.
+- [#587](https://github.com/nf-core/eager/issues/587) - Re-implemented AdapterRemovalFixPrefix for DeDup compatibility of including singletons
 
 ## [2.2.1] - 2020-10-20
 
```
docs/usage.md (8 additions & 11 deletions)

```diff
@@ -8,7 +8,9 @@
 
 <!-- TOC depthfrom:2 depthto:3 -->
 
+- [:warning: Please read this documentation on the nf-core website: https://nf-co.re/eager/usage](#warning-please-read-this-documentation-on-the-nf-core-website-httpsnf-coreeagerusage)
 - [Table of contents](#table-of-contents)
+- [Introduction](#introduction)
 - [Running the pipeline](#running-the-pipeline)
 - [Quick Start](#quick-start)
 - [Updating the pipeline](#updating-the-pipeline)
@@ -1351,21 +1353,16 @@ Picard. Alternatively an ancient DNA specific read deduplication tool `dedup`
 
 This utilises both ends of paired-end data to remove duplicates (i.e. true exact
 duplicates, as markduplicates will over-zealously deduplicate anything with the
-same starting position even if the ends are different). DeDup should only be
-used solely on paired-end data otherwise suboptimal deduplication can occur if
-applied to either single-end or a mix of single-end/paired-end data.
-
-Note that if you run without the `--mergedonly` flag for AdapterRemoval, DeDup
-will likely fail. If you absolutely want to use both PE and SE data, you can
-supply the `--dedup_all_merged` flag to consider singletons to also be merged
-paired-end reads. This may result in over-zealous deduplication.
+same starting position even if the ends are different). DeDup should generally
+only be used solely on paired-end data otherwise suboptimal deduplication can
+occur if applied to either single-end or a mix of single-end/paired-end data.
 
 #### `--dedup_all_merged`
 
 Sets DeDup to treat all reads as merged reads. This is useful if reads are for
-example not prefixed with `M_` in all cases. Therefore, this can be used as a
-workaround when also using a mixture of paired-end and single-end data, however
-this is not recommended (see above).
+example not prefixed with `M_`, `R_`, or `L_` in all cases. Therefore, this can
+be used as a workaround when also using a mixture of paired-end and single-end
+data, however this is not recommended (see above).
 
 > Modifies dedup parameter: `-m`
 
```
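The read-name prefix convention that this documentation change refers to can be sketched with invented read names. This is illustrative only, with the prefix semantics summarised from the docs above (the read names themselves are made up):

```shell
# Illustrative only: DeDup keys on read-name prefixes to tell merged and
# unmerged reads apart. M_ marks a merged pair; this PR prefixes unmerged
# mates with R_ / L_ so DeDup can handle them correctly.
for name in M_seq001 R_seq002 L_seq002; do
  case "$name" in
    M_*)     echo "$name -> merged read" ;;
    R_*|L_*) echo "$name -> unmerged mate" ;;
    *)       echo "$name -> unprefixed (would need --dedup_all_merged)" ;;
  esac
done
```

Without any prefix, reads fall into the last branch, which is the mixed-data situation `--dedup_all_merged` works around.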

main.nf (12 additions & 6 deletions)

```diff
@@ -1175,14 +1175,17 @@ process adapter_removal {
 
    #Combine files
    if [ ${preserve5p} = "--preserve5p" ] && [ ${mergedonly} = "N" ]; then
-     cat *.collapsed.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.fq.gz
+     cat *.collapsed.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz
    elif [ ${preserve5p} = "--preserve5p" ] && [ ${mergedonly} = "Y" ] ; then
-     cat *.collapsed.gz > output/${base}.pe.combined.fq.gz
+     cat *.collapsed.gz > output/${base}.pe.combined.tmp.fq.gz
    elif [ ${mergedonly} = "Y" ] ; then
-     cat *.collapsed.gz *.collapsed.truncated.gz > output/${base}.pe.combined.fq.gz
+     cat *.collapsed.gz *.collapsed.truncated.gz > output/${base}.pe.combined.tmp.fq.gz
    else
-     cat *.collapsed.gz *.collapsed.truncated.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.fq.gz
+     cat *.collapsed.gz *.collapsed.truncated.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz
    fi
+
+   ## Add R_ and L_ for unmerged reads for DeDup compatibility
+   AdapterRemovalFixPrefix output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus} > output/${base}.pe.combined.fq.gz
 
    mv *.settings output/
    """
@@ -1200,11 +1203,14 @@ process adapter_removal {
    AdapterRemoval --file1 ${r1} --file2 ${r2} --basename ${base}.pe --gzip --threads ${task.cpus} ${collapse_me} ${trim_me}
 
    if [ ${mergedonly} = "Y" ]; then
-     cat *.collapsed.gz *.collapsed.truncated.gz > output/${base}.pe.combined.fq.gz
+     cat *.collapsed.gz *.collapsed.truncated.gz > output/${base}.pe.combined.tmp.fq.gz
    else
-     cat *.collapsed.gz *.collapsed.truncated.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.fq.gz
+     cat *.collapsed.gz *.collapsed.truncated.gz *.singleton.truncated.gz *.pair1.truncated.gz *.pair2.truncated.gz > output/${base}.pe.combined.tmp.fq.gz
    fi
 
+   ## Add R_ and L_ for unmerged reads for DeDup compatibility
+   AdapterRemovalFixPrefix output/${base}.pe.combined.tmp.fq.gz | pigz -p ${task.cpus} > output/${base}.pe.combined.fq.gz
+
    mv *.settings output/
    """
 } else if ( seqtype != 'PE' ) {
```
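The combine-then-fix flow introduced above can be mimicked outside the pipeline. AdapterRemovalFixPrefix is a separate tool, so the awk command below is only a rough stand-in for its header rewriting; the file names and the single `R_` prefix are assumptions for illustration, not the tool's actual behaviour:

```shell
# Sketch of the fixed flow: write the combined reads to a .tmp file first,
# rewrite the headers, then emit the final file name the rest of the
# pipeline reads.
printf '@seq001/1\nACGT\n+\nIIII\n' > combined.tmp.fq

# Hypothetical stand-in for AdapterRemovalFixPrefix: prefix each FASTQ
# record header (every 4th line) with R_. The real tool distinguishes
# forward (R_) and reverse (L_) unmerged mates.
awk 'NR % 4 == 1 { sub(/^@/, "@R_") } { print }' combined.tmp.fq > combined.fq

head -1 combined.fq   # prints: @R_seq001/1
```

Writing to a `.tmp` name first is what lets the final `${base}.pe.combined.fq.gz` keep its original name for the downstream channels while still passing through the prefix fix.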

nextflow_schema.json (1 addition & 1 deletion)

```diff
@@ -694,7 +694,7 @@
       "default": "markduplicates",
       "description": "Deduplication method to use. Options: 'markduplicates', 'dedup'.",
       "fa_icon": "fas fa-object-group",
-      "help_text": "Sets the duplicate read removal tool. By default uses `markduplicates` from Picard. Alternatively an ancient DNA specific read deduplication tool `dedup` ([Peltzer et al. 2016](http://dx.doi.org/10.1186/s13059-016-0918-z)) is offered.\n\nThis utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should only be used solely on paired-end data otherwise suboptimal deduplication can occur if applied to either single-end or a mix of single-end/paired-end data.\n\nNote that if you run without the `--mergedonly` flag for AdapterRemoval, DeDup will\nlikely fail. If you absolutely want to use both PE and SE data, you can supply the\n`--dedup_all_merged` flag to consider singletons to also be merged paired-end reads. This\nmay result in over-zealous deduplication.",
+      "help_text": "Sets the duplicate read removal tool. By default uses `markduplicates` from Picard. Alternatively an ancient DNA specific read deduplication tool `dedup` ([Peltzer et al. 2016](http://dx.doi.org/10.1186/s13059-016-0918-z)) is offered.\n\nThis utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should generally only be used solely on paired-end data otherwise suboptimal deduplication can occur if applied to either single-end or a mix of single-end/paired-end data.\n",
       "enum": [
         "markduplicates",
         "dedup"
```
