Commit 16fedae

Merge branch 'dev' into dev
2 parents 6d13545 + e85b58f commit 16fedae

5 files changed

Lines changed: 53 additions & 32 deletions

CHANGELOG.md

Lines changed: 5 additions & 2 deletions
@@ -8,10 +8,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 ### `Added`

-### `Fixed`
-* [#145](https://github.com/nf-core/eager/issues/145) - Added Picard Memory Handling [fix](https://github.com/nf-core/eager/issues/144)
 * Clarified `--complexity_filter` flag to be specifically for poly G trimming.

+### `Fixed`
+
+* [#151](https://github.com/nf-core/eager/pull/151) - Fixed [post-deduplication step errors](https://github.com/nf-core/eager/issues/128)
+* [#147](https://github.com/nf-core/eager/pull/147) - Fix Samtools Index for [large references](https://github.com/nf-core/eager/issues/146)
+* [#145](https://github.com/nf-core/eager/pull/145) - Added Picard Memory Handling [fix](https://github.com/nf-core/eager/issues/144)

 ## [2.0.5] - 2019-01-28

README.md

Lines changed: 12 additions & 0 deletions
@@ -92,6 +92,18 @@ James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to
 contribute, please open an issue and ask to be added to the project - happy to
 do so and everyone is welcome to contribute here!

+## Contributors
+
+- [James A. Fellows-Yates](https://github.com/jfy133)
+- [Stephen Clayton](https://github.com/sc13-bioinf)
+- [Judith Neukamm](https://github.com/JudithNeukamm)
+- [Raphael Eisenhofer](https://github.com/EisenRa)
+- [Maxime Garcia](https://github.com/MaxUlysse)
+- [Luc Venturini](https://github.com/lucventurini)
+- [Hester van Schalkwyk](https://github.com/hesterjvs)
+
+If you've contributed and you're missing in here, please let me know and I'll add you in.
+
 ## Tool References

 * **EAGER v1**, CircularMapper, DeDup* Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z) Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)

docs/usage.md

Lines changed: 4 additions & 0 deletions
@@ -170,6 +170,10 @@ If you prefer, you can specify the full path to your reference genome when you r
 ```
 > If you don't specify appropriate `--bwa_index`, `--fasta_index` parameters, the pipeline will create these indices for you automatically. Note that saving these for later has to be turned on using `--saveReference`. You may also specify the path to a gzipped (`*.gz` file extension) FastA as reference genome - this will be uncompressed by the pipeline automatically for you. Note that other file extensions such as `.fna`, `.fa` are also supported but will be renamed to `.fasta` automatically by the pipeline.

+### `--large_ref`
+
+This parameter is required to be set for large reference genomes. If your reference genome is larger than 3.5GB, the `samtools index` calls in the pipeline need to generate `CSI` indices instead of `BAI` indices to accommodate the size of the reference genome. This parameter is not required for smaller references (including a human `hg19` or `grch37`/`grch38` reference), but `>4GB` genomes have been shown to need `CSI` indices.
+
 ### `--genome` (using iGenomes)

 The pipeline config files come bundled with paths to the illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource.
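
A minimal shell sketch of the choice described under `--large_ref` above: `samtools index -c` requests a `CSI` index, while no option gives the default `BAI`. The helper names `index_flag` and `build_index_cmd` are hypothetical and not part of the pipeline, and the sketch only prints the command it would run rather than calling `samtools`.

```shell
# Hypothetical helper: map a true/false setting to the samtools index option.
# "-c" asks for a CSI index; printing nothing leaves the default (BAI).
index_flag() {
    if [ "$1" = "true" ]; then
        printf '%s' "-c"
    fi
}

# Hypothetical helper: print the indexing command for a given BAM file.
# The flag is only added when it is non-empty, so no stray argument appears.
build_index_cmd() {
    flag="$(index_flag "$1")"
    if [ -n "$flag" ]; then
        printf 'samtools index %s %s\n' "$flag" "$2"
    else
        printf 'samtools index %s\n' "$2"
    fi
}

build_index_cmd false sample.sorted.bam   # small reference
build_index_cmd true  sample.sorted.bam   # large reference
```

The first call prints a plain `samtools index` command and the second adds `-c`, which mirrors what the `size` variable in `main.nf` is meant to select.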

main.nf

Lines changed: 30 additions & 29 deletions
@@ -240,12 +240,6 @@ if("${params.fasta}".endsWith(".gz")){
 .ifEmpty { exit 1, "No genome specified! Please specify one with --fasta"}
 .into {ch_fasta_for_bwa_indexing;ch_fasta_for_faidx_indexing;ch_fasta_for_dict_indexing; ch_fasta_for_damageprofiler; ch_fasta_for_qualimap; ch_fasta_for_pmdtools; ch_fasta_for_circularmapper_index}
 }
-
-
-
-
-
-

 //Index files provided? Then check whether they are correct and complete
 if (params.aligner != 'bwa' && !params.circularmapper && !params.bwamem){
@@ -346,6 +340,7 @@ summary['Pipeline Version'] = workflow.manifest.version
 summary['Run Name'] = custom_runName ?: workflow.runName
 summary['Reads'] = params.reads
 summary['Fasta Ref'] = params.fasta
+summary['BAM Index Type'] = (params.large_ref == "") ? 'BAI' : 'CSI'
 if(params.bwa_index) summary['BWA Index'] = params.bwa_index
 summary['Data Type'] = params.singleEnd ? 'Single-End' : 'Paired-End'
 summary['Max Memory'] = params.max_memory
@@ -649,16 +644,17 @@ process bwa {

 output:
 file "*.sorted.bam" into ch_mapped_reads_idxstats,ch_mapped_reads_filter,ch_mapped_reads_preseq, ch_mapped_reads_damageprofiler
-file "*.bai" into ch_bam_index_for_damageprofiler
+file "*.{bai,csi}" into ch_bam_index_for_damageprofiler


 script:
 prefix = reads[0].toString() - ~/(_R1)?(\.combined\.)?(prefixed)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/
 fasta = "${index}/*.fasta"
+size = "${params.large_ref}" ? '-c' : ''
 """
 bwa aln -t ${task.cpus} $fasta $reads -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f "${reads.baseName}.sai"
 bwa samse -r "@RG\\tID:ILLUMINA-${prefix}\\tSM:${prefix}\\tPL:illumina" $fasta "${reads.baseName}".sai $reads | samtools sort -@ ${task.cpus} -O bam - > "${prefix}".sorted.bam
-samtools index "${prefix}".sorted.bam
+samtools index "${size}" "${prefix}".sorted.bam
 """
 }
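
The new `samtools index "${size}"` calls wrap the optional flag in double quotes. The sketch below shows, in plain shell, how quoting changes the argument list when such a variable is empty. `count_args` is a hypothetical helper, not part of the pipeline, and how `samtools` itself reacts to an empty argument is not shown in this commit.

```shell
# Hypothetical helper: report how many arguments a command would receive.
count_args() {
    printf '%s\n' "$#"
}

size=""   # the value the script would see if no CSI flag were requested

# Quoted: the empty string is still passed as its own (empty) argument.
count_args "${size}" sample.sorted.bam

# Unquoted: the empty value disappears and only the file name remains.
count_args ${size} sample.sorted.bam
```

With `size="-c"` both forms pass two arguments, so any difference only appears in the empty case.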

@@ -703,19 +699,20 @@ process circularmapper{

 output:
 file "*.sorted.bam" into ch_mapped_reads_idxstats_cm,ch_mapped_reads_filter_cm,ch_mapped_reads_preseq_cm, ch_mapped_reads_damageprofiler_cm
-file "*.bai"
+file "*.{bai,csi}"

 script:
 filter = "${params.circularfilter}" ? '' : '-f true -x false'
 prefix = reads[0].toString() - ~/(_R1)?(\.combined\.)?(prefixed)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/
 fasta = "${index}/*_*.fasta"
+size = "${params.large_ref}" ? '-c' : ''

 """
 bwa aln -t ${task.cpus} $fasta $reads -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f "${reads.baseName}.sai"
 bwa samse -r "@RG\\tID:ILLUMINA-${prefix}\\tSM:${prefix}\\tPL:illumina" $fasta "${reads.baseName}".sai $reads > tmp.out
 realignsamfile -e ${params.circularextension} -i tmp.out -r $fasta $filter
 samtools sort -@ ${task.cpus} -O bam tmp_realigned.bam > "${prefix}".sorted.bam
-samtools index "${prefix}".sorted.bam
+samtools index "${size}" "${prefix}".sorted.bam
 """
 }

@@ -731,15 +728,16 @@ process bwamem {

 output:
 file "*.sorted.bam" into ch_bwamem_mapped_reads_idxstats,ch_bwamem_mapped_reads_filter,ch_bwamem_mapped_reads_preseq, ch_bwamem_mapped_reads_damageprofiler
-file "*.bai"
+file "*.{bai,csi}"


 script:
 prefix = reads[0].toString() - ~/(_R1)?(\.combined\.)?(prefixed)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/
 fasta = "${index}/*.fasta"
+size = "${params.large_ref}" ? '-c' : ''
 """
 bwa mem -t ${task.cpus} $fasta $reads -R "@RG\\tID:ILLUMINA-${prefix}\\tSM:${prefix}\\tPL:illumina" | samtools sort -@ ${task.cpus} -O bam - > "${prefix}".sorted.bam
-samtools index -@ ${task.cpus} "${prefix}".sorted.bam
+samtools index "${size}" -@ ${task.cpus} "${prefix}".sorted.bam
 """
 }

@@ -786,38 +784,39 @@ process samtools_filter {
 file "*filtered.bam" into ch_bam_filtered_qualimap, ch_bam_filtered_dedup, ch_bam_filtered_markdup, ch_bam_filtered_pmdtools, ch_bam_filtered_angsd, ch_bam_filtered_gatk
 file "*.fastq.gz" optional true
 file "*.unmapped.bam" optional true
-file "*.bai"
+file "*.{bai,csi}"

 script:
 prefix="$bam" - ~/(\.bam)?/
+size = "${params.large_ref}" ? '-c' : ''

 if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "discard"){
 """
 samtools view -h -b $bam -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam
-samtools index ${prefix}.filtered.bam
+samtools index "${size}" ${prefix}.filtered.bam
 """
 } else if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "bam"){
 """
 samtools view -h $bam | tee >(samtools view - -@ ${task.cpus} -f4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.unmapped.bam) >(samtools view - -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam)
-samtools index ${prefix}.filtered.bam
+samtools index "${size}" ${prefix}.filtered.bam
 """
 } else if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "fastq"){
 """
 samtools view -h $bam | tee >(samtools view - -@ ${task.cpus} -f4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.unmapped.bam) >(samtools view - -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam)
-samtools index ${prefix}.filtered.bam
+samtools index "${size}" ${prefix}.filtered.bam
 samtools fastq -tn ${prefix}.unmapped.bam | pigz -p ${task.cpus} > ${prefix}.unmapped.fastq.gz
 rm ${prefix}.unmapped.bam
 """
 } else if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "both"){
 """
 samtools view -h $bam | tee >(samtools view - -@ ${task.cpus} -f4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.unmapped.bam) >(samtools view - -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam)
-samtools index ${prefix}.filtered.bam
+samtools index "${size}" ${prefix}.filtered.bam
 samtools fastq -tn ${prefix}.unmapped.bam | pigz -p ${task.cpus} > ${prefix}.unmapped.fastq.gz
 """
 } else { //Only apply quality filtering, default
 """
 samtools view -h -b $bam -@ ${task.cpus} -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam
-samtools index ${prefix}.filtered.bam
+samtools index "${size}" ${prefix}.filtered.bam
 """
 }
 }
@@ -841,25 +840,26 @@ process dedup{
 file "*.hist" into ch_hist_for_preseq
 file "*.log" into ch_dedup_results_for_multiqc
 file "${prefix}.sorted.bam" into ch_dedup_bam
-file "*.bai"
+file "*.{bai,csi}"

 script:
 prefix="${bam.baseName}"
 treat_merged="${params.dedup_all_merged}" ? '-m' : ''
-
+size = "${params.large_ref}" ? '-c' : ''
+
 if(params.singleEnd) {
 """
 dedup -i $bam $treat_merged -o . -u
 mv *.log dedup.log
 samtools sort -@ ${task.cpus} "$prefix"_rmdup.bam -o "$prefix".sorted.bam
-samtools index "$prefix".sorted.bam
+samtools index "${size}" "$prefix".sorted.bam
 """
 } else {
 """
 dedup -i $bam $treat_merged -o . -u
 mv *.log dedup.log
 samtools sort -@ ${task.cpus} "$prefix"_rmdup.bam -o "$prefix".sorted.bam
-samtools index "$prefix".sorted.bam
+samtools index "${size}" "$prefix".sorted.bam
 """
 }
 }
@@ -907,7 +907,7 @@ process damageprofiler {

 input:
 file bam from ch_mapped_reads_damageprofiler.mix(ch_mapped_reads_damageprofiler_cm,ch_bwamem_mapped_reads_damageprofiler)
-file fasta from ch_fasta_for_damageprofiler
+file fasta from ch_fasta_for_damageprofiler.first()
 file bai from ch_bam_index_for_damageprofiler


@@ -934,7 +934,7 @@ process qualimap {

 input:
 file bam from ch_bam_filtered_qualimap
-file fasta from ch_fasta_for_qualimap
+file fasta from ch_fasta_for_qualimap.first()

 output:
 file "*" into ch_qualimap_results
@@ -1037,15 +1037,16 @@ process bam_trim {

 output:
 file "*.trimmed.bam" into ch_trimmed_bam_for_genotyping
-file "*.bai"
+file "*.{bai,csi}"

 script:
 prefix="${bam.baseName}"
 softclip = "${params.bamutils_softclip}" ? '-c' : ''
+size = "${params.large_ref}" ? '-c' : ''
 """
 bam trimBam $bam tmp.bam -L ${params.bamutils_clip_left} -R ${params.bamutils_clip_right} ${softclip}
 samtools sort -@ ${task.cpus} tmp.bam -o ${prefix}.trimmed.bam
-samtools index ${prefix}.trimmed.bam
+samtools index "${size}" ${prefix}.trimmed.bam
 """
 }

@@ -1139,12 +1140,12 @@ process multiqc {
 file multiqc_config from ch_multiqc_config.collect().ifEmpty([])
 file ('fastqc_raw/*') from ch_fastqc_results.collect().ifEmpty([])
 file('fastqc/*') from ch_fastqc_after_clipping.collect().ifEmpty([])
-file ('software_versions/*') from software_versions_yaml.collect().ifEmpty([])
+file ('software_versions/software_versions_mqc*') from software_versions_yaml.collect().ifEmpty([])
 file ('adapter_removal/*') from ch_adapterremoval_logs.collect().ifEmpty([])
 file ('idxstats/*') from ch_idxstats_for_multiqc.collect().ifEmpty([])
 file ('preseq/*') from ch_preseq_results.collect().ifEmpty([])
-file ('damageprofiler/*') from ch_damageprofiler_results.collect().ifEmpty([])
-file ('qualimap/*') from ch_qualimap_results.collect().ifEmpty([])
+file ('damageprofiler/dmgprof*/*') from ch_damageprofiler_results.collect().ifEmpty([])
+file ('qualimap/qualimap*/*') from ch_qualimap_results.collect().ifEmpty([])
 file ('markdup/*') from ch_markdup_results_for_multiqc.collect().ifEmpty([])
 file ('dedup*/*') from ch_dedup_results_for_multiqc.collect().ifEmpty([])
 file ('fastp/*') from ch_fastp_for_multiqc.collect().ifEmpty([])

nextflow.config

Lines changed: 2 additions & 1 deletion
@@ -23,7 +23,8 @@ params {
 tracedir = "${params.outdir}/pipeline_info"
 readPaths = false
 bam = false
-
+large_ref = false
+
 //More defaults
 complexity_filter = false
 complexity_filter_poly_g_min = 10
