
Commit 51681fb

committed
update kraken process for multiqc updated kraken module
1 parent 43958a0 commit 51681fb

2 files changed

Lines changed: 8 additions & 20 deletions


docs/output.md

Lines changed: 5 additions & 1 deletion
```diff
@@ -669,7 +669,11 @@ Each module has it's own output directory which sit alongside the `MultiQC/` dir
 - `metagenomic_complexity_filter` - this contains the output from filtering of input reads to metagenomic classification of low-sequence complexity reads as performed by `bbduk`. This will include the filtered FASTQ files (`*_lowcomplexityremoved.fq.gz`) and also the run-time log (`_bbduk.stats`) for each sample. **Note:** there are no sections in the MultiQC report for this module, therefore you must check the `._bbduk.stats` files to get summary statistics of the filtering.
 - `metagenomic_classification/` - this contains the output for a given metagenomic classifier.
 - Running MALT will contain RMA6 files that can be loaded into MEGAN6 or MaltExtract for phylogenetic visualisation of read taxonomic assignments and aDNA characteristics respectively. Additional a `malt.log` file is provided which gives additional information such as run-time, memory usage and per-sample statistics of numbers of alignments with taxonomic assignment etc. This will also include gzip SAM files if requested.
-- Running kraken will contain the Kraken output and report files, as well as a merged Taxon count table. You will also get a Kraken kmer duplication table, in a [KrakenUniq](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1568-0) fashion. This is very useful to check for breadth of coverage and detect read stacking. A small number of aligned reads (low coverage) and a kmer duplication >1 is usually a sign of read stacking, usually indicative of a false positive hit (e.g. from over-amplified libraries). *Kmer duplication is defined as: number of kmers / number of unique kmers*
+- Running kraken will contain the Kraken output and report files, as well as a merged Taxon count table. You will also get a Kraken kmer duplication table, in a [KrakenUniq](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1568-0) fashion. This is very useful to check for breadth of coverage and detect read stacking. A small number of aligned reads (low coverage) and a kmer duplication >1 is usually a sign of read stacking, usually indicative of a false positive hit (e.g. from over-amplified libraries). *Kmer duplication is defined as: number of kmers / number of unique kmers*. You will find two kraken reports formats available:
+- the `*.kreport` which is the old report format, without distinct minimizer count information, used by some tools such as [Pavian](https://github.com/fbreitwieser/pavian)
+- the `*.kraken2_report` which is the new kraken report format, with the distinct minimizer count information.
+
+Finally, the `*.kraken.out` file are the direct output of Kraken2
 - `maltextract/` - this contains a `results` directory in which contains the output from MaltExtract - typically one folder for each filter type, an error and a log file. The characteristics of each node (e.g. damage, read lengths, edit distances - each in different txt formats) can be seen in each sub-folder of the filter folders. Output can be visualised either with the [HOPS postprocessing script](https://github.com/rhuebler/HOPS) or [MEx-IPA](https://github.com/jfy133/MEx-IPA)
 - `consensus_sequence/` - this contains three FASTA files from VCF2Genome of a consensus sequence based on the reference FASTA with each sample's unique modifications. The main FASTA is a standard file with bases not passing the specified thresholds as Ns. The two other FASTAS (`_refmod.fasta.gz`) and (`_uncertainity.fasta.gz`) are IUPAC uncertainty codes (rather than Ns) and a special number-based uncertainty system used for other downstream tools, respectively.
 - `librarymerged_bams/` - these contain the final BAM files that would go into genotyping (if genotyping is turned on). This means the files will contain all libraries of a given sample (including trimmed non-UDG or half-UDG treated libraries, if BAM trimming turned on)
```
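The updated kraken bullet defines kmer duplication as number of kmers / number of unique kmers. Below is a minimal Python sketch of that ratio, assuming it is computed per taxon from a `*.kraken2_report` written with `--report-minimizer-data`, where column 4 holds the number of minimizers and column 5 the estimated number of distinct minimizers (the minimizer counts standing in for kmers). The helper names and the `max_reads` cut-off are hypothetical illustrations: the text only speaks of "a small number of aligned reads" and "a kmer duplication >1".

```python
from dataclasses import dataclass


@dataclass
class TaxonDuplication:
    taxid: str
    name: str
    clade_reads: int          # column 2: reads in the clade rooted at this taxon
    minimizers: int           # column 4: minimizers associated with this taxon
    distinct_minimizers: int  # column 5: estimated distinct minimizers

    @property
    def duplication(self) -> float:
        # "number of kmers / number of unique kmers", using minimizers as kmers (assumption).
        if self.distinct_minimizers == 0:
            return float("nan")
        return self.minimizers / self.distinct_minimizers


def parse_kraken2_minimizer_report(lines):
    """Parse an 8-column Kraken2 report produced with --report-minimizer-data."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cols = line.split("\t")
        if len(cols) != 8:
            raise ValueError(f"expected 8 columns, got {len(cols)}: {line!r}")
        rows.append(
            TaxonDuplication(
                taxid=cols[6],
                name=cols[7].strip(),  # names are indented to show the taxonomy depth
                clade_reads=int(cols[1]),
                minimizers=int(cols[3]),
                distinct_minimizers=int(cols[4]),
            )
        )
    return rows


def flag_possible_stacking(rows, max_reads=50, min_duplication=1.0):
    """Return taxa with few reads and duplication above the threshold.

    max_reads is a hypothetical cut-off for "a small number of aligned reads".
    NaN duplication (no distinct minimizers) never compares greater, so it is not flagged.
    """
    return [r for r in rows if r.clade_reads <= max_reads and r.duplication > min_duplication]
```

For a hypothetical `sample.kraken2_report`, opening the file and passing its handle through `parse_kraken2_minimizer_report` and then `flag_possible_stacking` would list the low-coverage, high-duplication taxa that the bullet describes as likely false positives.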

main.nf

Lines changed: 3 additions & 19 deletions
```diff
@@ -2894,33 +2894,17 @@ process kraken {
 
 output:
 file "*.kraken.out" into ch_kraken_out
-tuple prefix, path("*.kraken2_report") into ch_kraken_report, ch_kraken_report_backward_compatibility
+tuple prefix, path("*.kraken2_report") into ch_kraken_report, ch_kraken_for_multiqc
 
 script:
 prefix = fastq.toString().tokenize('.')[0]
 out = prefix+".kraken.out"
 kreport = prefix+".kraken2_report"
+kreport_old = prefix+".kreport"
 
 """
 kraken2 --db ${krakendb} --threads ${task.cpus} --output $out --report-minimizer-data --report $kreport $fastq
-"""
-}
-
-process kraken_report_backward_compatibility {
-tag "$prefix"
-label 'sc_tiny'
-
-input:
-tuple val(prefix), path(kraken_r) from ch_kraken_report_backward_compatibility
-
-output:
-tuple prefix, path("*.kreport") into ch_kraken_for_multiqc
-
-script:
-kreport = prefix+".kreport"
-
-"""
-cut -f1-3,6-8 $kraken_r > $kreport
+cut -f1-3,6-8 $kreport > $kreport_old
 """
 }
 
```
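The updated `kraken` script now derives the old report from the new one in the same task, with `cut -f1-3,6-8 $kreport > $kreport_old`, i.e. by dropping the two minimizer columns (4 and 5) that `--report-minimizer-data` adds. Below is a minimal Python sketch of the same column selection, for example to regenerate a `*.kreport` outside the pipeline. `kraken2_report_to_kreport` and `convert_report` are hypothetical helper names, not part of this commit.

```python
def kraken2_report_to_kreport(line: str) -> str:
    """Keep 1-based fields 1-3 and 6-8 of one tab-separated report line.

    Mirrors `cut -f1-3,6-8`: the minimizer columns (4 and 5) are dropped.
    A line without tabs passes through unchanged, as it does with cut.
    """
    fields = line.rstrip("\n").split("\t")
    kept = fields[0:3] + fields[5:8]
    return "\t".join(kept)


def convert_report(src_path: str, dst_path: str) -> None:
    """Write the 6-column (old format) report for an 8-column Kraken2 report."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(kraken2_report_to_kreport(line) + "\n")
```

Doing the conversion inside `kraken` is what lets this commit drop the separate `kraken_report_backward_compatibility` process and send the `*.kraken2_report` straight to `ch_kraken_for_multiqc`.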