
Commit 69c2373

Merge pull request #220 from jfy133/postmapfilter-stats
Added post-mapping filter statistics generation
2 parents 74867fe + 27d0d87 commit 69c2373

4 files changed

Lines changed: 82 additions & 36 deletions

File tree

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
1414
/(https://github.com/nf-core/eager/issues/182)
1515
* Merged in [nf-core/tools](https://github.com/nf-core/tools) release V1.6 template changes
1616
* A lot more automated tests using Travis CI
17-
* Don't ignore DamageProfiler errors anymore
17+
* Don't ignore DamageProfiler errors anymore
18+
* [#220](https://github.com/nf-core/eager/pull/220) - Added post-mapping filtering statistics module and corresponding MultiQC statistics [#217](https://github.com/nf-core/eager/issues/217)
19+
1820

1921
### `Fixed`
2022

assets/multiqc_config.yaml

Lines changed: 22 additions & 9 deletions
@@ -10,11 +10,18 @@ top_modules:
1010
- 'fastp'
1111
- 'adapterRemoval'
1212
- 'fastqc':
13-
name: 'FastQC (post-AdapterRemoval)'
14-
path_filters:
15-
- '*.truncated_fastqc.zip'
16-
- '*.combined*_fastqc.zip'
17-
- 'samtools'
13+
name: 'FastQC (post-AdapterRemoval)'
14+
path_filters:
15+
- '*.truncated_fastqc.zip'
16+
- '*.combined*_fastqc.zip'
17+
- 'samtools':
18+
name: 'Samtools Flagstat (pre-samtools filter)'
19+
path_filters:
20+
- '*.sorted.stats'
21+
- 'samtools':
22+
name: 'Samtools Flagstat (post-samtools filter)'
23+
path_filters:
24+
- '*.sorted.bam.filtered.stats'
1825
- 'dedup'
1926
- 'preseq'
2027
- 'qualimap'
@@ -43,11 +50,15 @@ table_columns_visible:
4350
percent_duplicates: False
4451
total_sequences: True
4552
percent_gc: True
53+
Samtools Flagstat (pre-samtools filter):
54+
mapped_passed: True
55+
Samtools Flagstat (post-samtools filter):
56+
mapped_passed: True
4657
QualiMap:
4758
1_x_pc: True
4859
5_x_pc: True
4960
percentage_aligned: False
50-
DamageProfiler::
61+
DamageProfiler:
5162
3 Prime1: True
5263
3 Prime2: True
5364
5 Prime1: True
@@ -67,9 +78,11 @@ table_columns_placement:
6778
total_sequences: 400
6879
avg_sequence_length: 410
6980
percent_gc: 420
70-
Samtools Flagstat:
81+
Samtools Flagstat (pre-samtools filter):
7182
mapped_passed: 500
72-
DeDup:
83+
Samtools Flagstat (post-samtools filter):
84+
mapped_passed: 510
85+
DeDup:
7386
clusterfactor: 600
7487
duplication_rate: 610
7588
QualiMap:
@@ -84,7 +97,7 @@ table_columns_placement:
8497
mtreads: 800
8598
mt_cov_avg: 810
8699
mt_nuc_ratio: 820
87-
DamageProfiler::
100+
DamageProfiler:
88101
3 Prime1: 900
89102
3 Prime2: 910
90103
5 Prime1: 920
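
A quick check on the two new `path_filters` entries above: both `samtools` sections read `*.stats` files from the same report, so the two glob patterns must not overlap. The sketch below assumes shell-style glob matching on the file name (which is how MultiQC `path_filters` are understood to behave). The sample file names are hypothetical and chosen only to fit the patterns in the config.

```python
from fnmatch import fnmatchcase

# Patterns copied from the path_filters entries in multiqc_config.yaml.
PRE_PATTERN = "*.sorted.stats"
POST_PATTERN = "*.sorted.bam.filtered.stats"


def classify(filename):
    """Return the MultiQC section labels whose pattern matches the file name."""
    labels = []
    if fnmatchcase(filename, PRE_PATTERN):
        labels.append("pre-samtools filter")
    if fnmatchcase(filename, POST_PATTERN):
        labels.append("post-samtools filter")
    return labels


# Hypothetical file names, not taken from a real pipeline run.
for name in ["sampleA.sorted.stats", "sampleA.sorted.bam.filtered.stats"]:
    print(name, "->", classify(name))
```

Under that assumption each file lands in exactly one section, because a name ending in `.filtered.stats` cannot also end in `.sorted.stats`.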

docs/output.md

Lines changed: 4 additions & 0 deletions
@@ -72,6 +72,7 @@ The default columns are as follows:
7272
* **Seqs** This is from Post-AdapterRemoval FastQC. Represents the number of preprocessed reads in your adapter trimmed (paired end) merged FASTQ file. The loss between this number and the Pre-AdapterRemoval FastQC can give you an idea of the quality of trimming and merging.
7373
* **%GC** This is from Post-AdapterRemoval FastQC. Represents the average GC of all preprocessed reads in your adapter trimmed (paired end) merged FASTQ file.
7474
* **Reads Mapped** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _prior to_ map quality filtering and deduplication.
75+
* **Reads Mapped** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _after_ map quality filtering but before deduplication. Note that the column name is identical to that of the pre-filtering column; the post-filter column is always the second of the two.
7576
* **Duplication Rate** This is from DeDup. This is the percentage of all mapped reads that were an exact duplicate of another read. The number of reads removed by DeDup can be calculated by multiplying this rate by the number of mapped reads (if no map quality filtering was applied!).
7677
* **Coverage** This is from Qualimap. This is the median number of times a base on your reference genome was covered by a read (i.e. depth coverage). This value includes bases with 0 reads covering that position.
7778
* **>= 1X** to **>= 5X** These are from Qualimap. This is the percentage of the genome covered at that particular depth coverage.
@@ -233,6 +234,9 @@ The third row 'Mapped' represents the number of reads that found a place that co
233234

234235
The remaining rows will be 0 when running `bwa aln` as these characteristics of the data are not considered by the algorithm by default.
235236

237+
> **NB:** The Samtools (pre-samtools filter) plots displayed in the MultiQC report show mapped reads without mapping quality filtering. These will include reads that can map to multiple places on your reference genome with an equal or slightly lower mapping quality score. To see how your reads look after mapping quality filtering, look at the Samtools (post-samtools filter) plots. You should expect to have fewer reads after mapping quality filtering.
238+
239+
236240
### DeDup
237241
#### Background
238242
DeDup is a duplicate removal tool which searches for PCR duplicates and removes them from your BAM file. We remove these duplicates because otherwise you would be artificially increasing your coverage and subsequently confidence in genotyping, by considering these lab artefacts which are not biologically meaningful. DeDup looks for reads with the same start and end coordinates, and whether they have exactly the same sequence. The main difference of DeDup versus e.g. `samtools markduplicates` is that DeDup considers _both_ ends of a read, not just the start position, so it is more precise in removing actual duplicates without penalising often already low aDNA data.
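
The pre- and post-filter **Reads Mapped** columns documented in this file both come from the `mapped` line of `samtools flagstat` output. As a rough illustration of how the two `.stats` files could be compared by hand, the sketch below parses that line and reports how many mapped reads the filter removed. The function names and the example flagstat text are made up for illustration; they are not part of the pipeline.

```python
import re

# Matches the "mapped" summary line of `samtools flagstat`, e.g.
# "800 + 0 mapped (80.00% : N/A)". The first number counts QC-passed reads.
MAPPED_LINE = re.compile(r"^(\d+) \+ (\d+) mapped \(", re.MULTILINE)


def mapped_passed(flagstat_text):
    """Return the QC-passed mapped read count from flagstat output text."""
    match = MAPPED_LINE.search(flagstat_text)
    if match is None:
        raise ValueError("no 'mapped' line found in flagstat output")
    return int(match.group(1))


def reads_removed_by_filter(pre_text, post_text):
    """Difference in mapped reads between pre- and post-filter flagstat text."""
    return mapped_passed(pre_text) - mapped_passed(post_text)


# Invented example output (abbreviated), not from a real sample.
PRE_EXAMPLE = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
800 + 0 mapped (80.00% : N/A)
"""
POST_EXAMPLE = """650 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
650 + 0 mapped (100.00% : N/A)
"""

print(reads_removed_by_filter(PRE_EXAMPLE, POST_EXAMPLE))
```

In practice the two inputs would be the contents of a sample's pre-filter and post-filter `.stats` files under `samtools/stats` in the output directory.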

main.nf

Lines changed: 53 additions & 26 deletions
@@ -443,7 +443,7 @@ ${summary.collect { k,v -> " <dt>$k</dt><dd><samp>${v ?: '<span style
443443

444444

445445
/*
446-
* Create BWA indices if they are not present
446+
* PREPROCESSING - Create BWA indices if they are not present
447447
*/
448448

449449
if(!params.bwa_index && !params.fasta.isEmpty() && (params.aligner == 'bwa' || params.bwamem)){
@@ -528,7 +528,7 @@ process makeSeqDict {
528528
}
529529

530530
/*
531-
* Convert BAM to FastQ if BAM input is specified instead of FastQ file(s)
531+
* PREPROCESSING - Convert BAM to FastQ if BAM input is specified instead of FastQ file(s)
532532
*
533533
*/
534534

@@ -554,7 +554,7 @@ process convertBam {
554554

555555

556556
/*
557-
* STEP 1 - FastQC
557+
* STEP 1a - FastQC
558558
*/
559559
process fastqc {
560560
tag "$name"
@@ -578,7 +578,7 @@ process fastqc {
578578
}
579579

580580

581-
/* STEP 2.0 - FastP
581+
/* STEP 1b - FastP
582582
* Optional poly-G complexity filtering step before read merging/adapter clipping etc
583583
* Note: Clipping, Merging, Quality Trimming are turned off here - we leave this to adapter removal itself!
584584
*/
@@ -676,7 +676,7 @@ process adapter_removal {
676676

677677

678678
/*
679-
* STEP 2.1 - FastQC after clipping/merging (if applied!)
679+
* STEP 2b - FastQC after clipping/merging (if applied!)
680680
*/
681681
process fastqc_after_clipping {
682682
tag "${name}"
@@ -698,7 +698,7 @@ process fastqc_after_clipping {
698698
}
699699

700700
/*
701-
Step 3: Mapping with BWA, SAM to BAM, Sort BAM
701+
Step 3a - Mapping with BWA, SAM to BAM, Sort BAM
702702
*/
703703

704704
process bwa {
@@ -713,7 +713,7 @@ process bwa {
713713

714714

715715
output:
716-
file "*.sorted.bam" into ch_mapped_reads_idxstats,ch_mapped_reads_filter,ch_mapped_reads_preseq, ch_mapped_reads_damageprofiler, ch_bwa_mapped_reads_strip
716+
file "*.sorted.bam" into ch_mapped_reads_flagstat,ch_mapped_reads_filter,ch_mapped_reads_preseq, ch_mapped_reads_damageprofiler, ch_bwa_mapped_reads_strip
717717
file "*.{bai,csi}" into ch_bam_index_for_damageprofiler
718718

719719

@@ -780,7 +780,7 @@ process circularmapper{
780780
file fasta from fasta_for_indexing
781781

782782
output:
783-
file "*.sorted.bam" into ch_mapped_reads_idxstats_cm,ch_mapped_reads_filter_cm,ch_mapped_reads_preseq_cm, ch_mapped_reads_damageprofiler_cm, ch_circular_mapped_reads_strip
783+
file "*.sorted.bam" into ch_mapped_reads_flagstat_cm,ch_mapped_reads_filter_cm,ch_mapped_reads_preseq_cm, ch_mapped_reads_damageprofiler_cm, ch_circular_mapped_reads_strip
784784
file "*.{bai,csi}"
785785

786786
script:
@@ -824,7 +824,7 @@ process bwamem {
824824
file index from bwa_index_bwamem.collect()
825825

826826
output:
827-
file "*.sorted.bam" into ch_bwamem_mapped_reads_idxstats,ch_bwamem_mapped_reads_filter,ch_bwamem_mapped_reads_preseq, ch_bwamem_mapped_reads_damageprofiler, ch_bwamem_mapped_reads_strip
827+
file "*.sorted.bam" into ch_bwamem_mapped_reads_flagstat,ch_bwamem_mapped_reads_filter,ch_bwamem_mapped_reads_preseq, ch_bwamem_mapped_reads_damageprofiler, ch_bwamem_mapped_reads_strip
828828
file "*.{bai,csi}"
829829

830830

@@ -848,18 +848,18 @@ process bwamem {
848848
}
849849

850850
/*
851-
* Step 4 - IDXStats
851+
* Step 3b - flagstat
852852
*/
853853

854-
process samtools_idxstats {
854+
process samtools_flagstat {
855855
tag "$prefix"
856856
publishDir "${params.outdir}/samtools/stats", mode: 'copy'
857857

858858
input:
859-
file(bam) from ch_mapped_reads_idxstats.mix(ch_mapped_reads_idxstats_cm,ch_bwamem_mapped_reads_idxstats)
859+
file(bam) from ch_mapped_reads_flagstat.mix(ch_mapped_reads_flagstat_cm,ch_bwamem_mapped_reads_flagstat)
860860

861861
output:
862-
file "*.stats" into ch_idxstats_for_multiqc
862+
file "*.stats" into ch_flagstat_for_multiqc
863863

864864
script:
865865
prefix = "$bam" - ~/(\.bam)?$/
@@ -870,7 +870,7 @@ process samtools_idxstats {
870870

871871

872872
/*
873-
* Step 5: Keep unmapped/remove unmapped reads
873+
* Step 4a - Keep unmapped/remove unmapped reads
874874
*/
875875

876876
process samtools_filter {
@@ -887,7 +887,7 @@ process samtools_filter {
887887
file bam from ch_mapped_reads_filter.mix(ch_mapped_reads_filter_cm,ch_bwamem_mapped_reads_filter)
888888

889889
output:
890-
file "*filtered.bam" into ch_bam_filtered_qualimap, ch_bam_filtered_dedup, ch_bam_filtered_markdup, ch_bam_filtered_pmdtools, ch_bam_filtered_angsd, ch_bam_filtered_gatk
890+
file "*filtered.bam" into ch_bam_filtered_flagstat, ch_bam_filtered_qualimap, ch_bam_filtered_dedup, ch_bam_filtered_markdup, ch_bam_filtered_pmdtools, ch_bam_filtered_angsd, ch_bam_filtered_gatk
891891
file "*.fastq.gz" optional true
892892
file "*.unmapped.bam" optional true
893893
file "*.{bai,csi}"
@@ -959,9 +959,31 @@ process strip_input_fastq {
959959

960960
}
961961

962+
/*
963+
* Step 4b: samtools flagstat on the filtered BAM (after keeping/removing unmapped reads)
964+
*/
965+
966+
967+
process samtools_flagstat_after_filter {
968+
tag "$prefix"
969+
publishDir "${params.outdir}/samtools/stats", mode: 'copy'
970+
971+
input:
972+
file(bam) from ch_bam_filtered_flagstat
973+
974+
output:
975+
file "*.stats" into ch_bam_filtered_flagstat_for_multiqc
976+
977+
script:
978+
prefix = "$bam" - ~/(\.bam)?$/
979+
"""
980+
samtools flagstat $bam > ${prefix}.stats
981+
"""
982+
}
983+
962984

963985
/*
964-
Step 6: DeDup / MarkDuplicates
986+
Step 5a: DeDup / MarkDuplicates
965987
*/
966988

967989
process dedup{
@@ -1004,7 +1026,7 @@ process dedup{
10041026
}
10051027

10061028
/*
1007-
Step 5.1: Preseq
1029+
Step 6: Preseq
10081030
*/
10091031

10101032
process preseq {
@@ -1034,7 +1056,7 @@ process preseq {
10341056
}
10351057

10361058
/*
1037-
Step 5.2: DMG Assessment
1059+
Step 7a: DMG Assessment
10381060
*/
10391061

10401062
process damageprofiler {
@@ -1062,7 +1084,7 @@ process damageprofiler {
10621084
}
10631085

10641086
/*
1065-
Step 5.3: Qualimap
1087+
Step 8: Qualimap
10661088
*/
10671089

10681090
process qualimap {
@@ -1090,7 +1112,7 @@ process qualimap {
10901112

10911113

10921114
/*
1093-
Step 6: MarkDuplicates
1115+
Step 5b: MarkDuplicates
10941116
*/
10951117

10961118
process markDup{
@@ -1130,6 +1152,10 @@ if(!params.run_pmdtools){
11301152
ch_dedup_for_pmdtools.close()
11311153
}
11321154

1155+
/*
1156+
Step 9: PMDtools
1157+
*/
1158+
11331159
process pmdtools {
11341160
tag "${bam.baseName}"
11351161
publishDir "${params.outdir}/pmdtools", mode: 'copy'
@@ -1162,7 +1188,7 @@ process pmdtools {
11621188
}
11631189

11641190
/*
1165-
* Optional BAM Trimming step using bamUtils
1191+
* Step 10 - BAM Trimming step using bamUtils
11661192
* Can be used for UDGhalf protocols to clip off -n bases of each read
11671193
*/
11681194

@@ -1217,7 +1243,7 @@ Downstream VCF tools:
12171243

12181244

12191245
/*
1220-
* STEP 3 - Output Description HTML
1246+
* Step 11a - Output Description HTML
12211247
*/
12221248
process output_documentation {
12231249
publishDir "${params.outdir}/Documentation", mode: 'copy'
@@ -1236,7 +1262,7 @@ process output_documentation {
12361262

12371263

12381264
/*
1239-
* Parse software version numbers
1265+
* Step 11b - Parse software version numbers
12401266
*/
12411267
process get_software_versions {
12421268

@@ -1271,7 +1297,7 @@ process get_software_versions {
12711297

12721298

12731299
/*
1274-
* STEP 2 - MultiQC
1300+
* Step 11c - MultiQC
12751301
*/
12761302
process multiqc {
12771303
publishDir "${params.outdir}/MultiQC", mode: 'copy'
@@ -1282,7 +1308,8 @@ process multiqc {
12821308
file('fastqc/*') from ch_fastqc_after_clipping.collect().ifEmpty([])
12831309
file ('software_versions/software_versions_mqc*') from software_versions_yaml.collect().ifEmpty([])
12841310
file ('adapter_removal/*') from ch_adapterremoval_logs.collect().ifEmpty([])
1285-
file ('idxstats/*') from ch_idxstats_for_multiqc.collect().ifEmpty([])
1311+
file ('flagstat/*') from ch_flagstat_for_multiqc.collect().ifEmpty([])
1312+
file ('flagstat_filtered/*') from ch_bam_filtered_flagstat_for_multiqc.collect().ifEmpty([])
12861313
file ('preseq/*') from ch_preseq_results.collect().ifEmpty([])
12871314
file ('damageprofiler/dmgprof*/*') from ch_damageprofiler_results.collect().ifEmpty([])
12881315
file ('qualimap/qualimap*/*') from ch_qualimap_results.collect().ifEmpty([])
@@ -1308,7 +1335,7 @@ process multiqc {
13081335

13091336

13101337
/*
1311-
* Completion e-mail notification
1338+
* Step 11d - Completion e-mail notification
13121339
*/
13131340
workflow.onComplete {
13141341
