Commit 9a7e32e

Merge pull request #524 from nf-core/trim-bam-fix
Merge trim-bam-fix into flxitrim
2 parents de6e686 + 762c8e5 commit 9a7e32e

3 files changed: 19 additions & 4 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ jobs:
           nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_bam_filtering --bam_mapping_quality_threshold 37 --bam_discard_unmapped --bam_unmapped_type 'fastq'
       - name: DEDUPLICATION Test with dedup
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --dedupper 'dedup'
+          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --dedupper 'dedup' --dedup_all_merged
       - name: GENOTYPING_HC Test running GATK HaplotypeCaller
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_fna,docker --run_genotyping --genotyping_tool 'hc' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_hc_emitrefconf 'BP_RESOLUTION'

docs/usage.md

Lines changed: 14 additions & 3 deletions
@@ -588,7 +588,11 @@ Turns off quality based trimming at the 5p end of reads when any of the --trimns
 
 #### `--mergedonly`
 
-This flag means that only merged reads are sent downstream for analysis. Singletons (i.e. reads missing a pair), or un-merged reads (where there wasn't sufficient overlap) are discarded. You may want to use this if you want to ensure only the best quality reads for your analysis, but with the penalty of potentially losing still valid data (even if some reads have slightly lower quality).
+Specify that only merged reads are sent downstream for analysis.
+
+Singletons (i.e. reads missing a pair) and un-merged reads (where there wasn't sufficient overlap) are discarded.
+
+You may want to use this to ensure only the best quality reads for your analysis, but with the penalty of potentially losing still valid data (even if some reads have slightly lower quality). It is highly recommended when using `--dedupper 'dedup'` (see below).
 
 ### Read Mapping Parameters
 
@@ -707,11 +711,18 @@ If using TSV input, deduplication is performed per library, i.e. after lane merging.
 
 #### `--dedupper`
 
-Sets the duplicate read removal tool. By default uses `markduplicates` from Picard. Alternatively, an ancient DNA specific read deduplication tool 'dedup' ([Peltzer et al. 2016](http://dx.doi.org/10.1186/s13059-016-0918-z)) is offered. This utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should only be used on paired-end data; suboptimal deduplication can occur if it is applied to single-end or a mix of single-end/paired-end data.
+Sets the duplicate read removal tool. By default uses `markduplicates` from Picard. Alternatively, an ancient DNA specific read deduplication tool 'dedup' ([Peltzer et al. 2016](http://dx.doi.org/10.1186/s13059-016-0918-z)) is offered.
+
+This utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should only be used on paired-end data; suboptimal deduplication can occur if it is applied to single-end or a mix of single-end/paired-end data.
+
+Note that if you run without the `--mergedonly` flag for AdapterRemoval, DeDup will likely fail. If you absolutely want to use both PE and SE data, you can supply the `--dedup_all_merged` flag to consider singletons to also be merged paired-end reads. This may result in over-zealous deduplication.
 
 #### `--dedup_all_merged`
 
-Sets DeDup to treat all reads as merged reads. This is useful if reads are, for example, not prefixed with `M_` in all cases.
+Sets DeDup to treat all reads as merged reads. This is useful if reads are, for example, not prefixed with `M_` in all cases. Therefore, this can be used as a workaround when also using a mixture of paired-end and single-end data; however, this is not recommended (see above).
 
 ### Library Complexity Estimation Parameters
 
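Taken together, the documentation changes above recommend pairing the two flags. A minimal sketch of such an invocation, borrowing the `test_tsv,docker` profiles from this repository's CI config (a real run would use your own input and profiles):

```shell
# Sketch only: run nf-core/eager with DeDup plus the recommended
# --mergedonly flag, so AdapterRemoval only passes merged reads downstream.
# Profile names are taken from this commit's CI; substitute your own setup.
nextflow run nf-core/eager -profile test_tsv,docker \
    --mergedonly \
    --dedupper 'dedup'
```

Omitting `--mergedonly` here would trigger the new warning added in `main.nf` below, since DeDup is then likely to fail on unmerged reads.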

main.nf

Lines changed: 4 additions & 0 deletions
@@ -385,6 +385,10 @@ if (params.dedupper != 'dedup' && params.dedupper != 'markduplicates') {
     exit 1, "[nf-core/eager] error: Selected deduplication tool is not recognised. Options: 'dedup' or 'markduplicates'. You gave: --dedupper '${params.dedupper}'."
 }
 
+if (params.dedupper == 'dedup' && !params.mergedonly) {
+    log.warn "[nf-core/eager] Warning: you are using DeDup but without specifying --mergedonly for AdapterRemoval, dedup will likely fail! See documentation for more information."
+}
+
 // Genotyping validation
 if (params.run_genotyping){
     if (params.genotyping_tool != 'ug' && params.genotyping_tool != 'hc' && params.genotyping_tool != 'freebayes' && params.genotyping_tool != 'pileupcaller' && params.genotyping_tool != 'angsd' ) {
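The new guard is plain boolean logic over two parameters. As an illustration only, here is a shell re-expression of the same condition (the pipeline itself implements this in Nextflow/Groovy; the helper name `dedup_guard` is hypothetical):

```shell
#!/bin/sh
# Hypothetical re-expression of the main.nf guard added in this commit.
# Prints WARN exactly when 'dedup' is selected without --mergedonly,
# mirroring the log.warn condition; OK otherwise.
dedup_guard() {
    dedupper="$1"    # value of --dedupper
    mergedonly="$2"  # "true" if --mergedonly was given, else "false"
    if [ "$dedupper" = "dedup" ] && [ "$mergedonly" != "true" ]; then
        echo "WARN"
    else
        echo "OK"
    fi
}

dedup_guard dedup false            # prints WARN
dedup_guard dedup true             # prints OK
dedup_guard markduplicates false   # prints OK
```

The warning deliberately does not abort the run (unlike the `exit 1` for an unrecognised `--dedupper`), since DeDup without `--mergedonly` is discouraged but not strictly invalid.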
