Add cram support + read splitting with seqkit for speedup #388

maxulysse merged 73 commits into nf-core:dsl2 from
Conversation
Will close #63
@FriederikeHanssen your branch is out-of-date with the base branch due to #390
Oops, thanks for the heads up
@maxulysse I cleaned up the code a bit now. There are still quite a few things that need to be fixed/discussed in separate PRs. I'll add a collection here for context and turn it into cards on the project board: things that still need to be fixed from PR #388.
The CI tests are not passing anymore, since the outputs are no longer BAMs but CRAMs. I don't really know what is going on with the nf-core lint testing; the error message is a bit cryptic to me. I'll undraft it for you to take a look, then we can see what I should fix here before merge and what can wait for a later PR.
```nextflow
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'

params.options = [:]
options        = initOptions(params.options)

process INDEX_TARGET_BED {
    tag "$target_bed"
    label 'process_medium'
    publishDir "${params.outdir}",
        mode: params.publish_dir_mode,
        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) }

    conda (params.enable_conda ? "bioconda::htslib=1.12" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        // TODO: No Singularity container at the moment, use the Docker container for now
        container "quay.io/biocontainers/htslib:1.12--h9093b5e_1"
    } else {
        container "quay.io/biocontainers/htslib:1.12--hd3b49d5_0"
    }

    input:
    path target_bed

    output:
    tuple path("${target_bed}.gz"), path("${target_bed}.gz.tbi")

    script:
    """
    bgzip --threads ${task.cpus} -c ${target_bed} > ${target_bed}.gz
    tabix ${target_bed}.gz
    """
}
```
I added that in nf-core/modules
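For context, a minimal sketch of how this module could be wired into a workflow (the include path and `params.target_bed` are assumptions, not part of this PR):

```nextflow
// Hypothetical usage: index a target BED once, then reuse the (gz, tbi) tuple downstream
include { INDEX_TARGET_BED } from './modules/local/index_target_bed'

workflow {
    target_bed = file(params.target_bed)
    INDEX_TARGET_BED(target_bed)
    // INDEX_TARGET_BED.out is a tuple: (target_bed.gz, target_bed.gz.tbi)
    INDEX_TARGET_BED.out.view()
}
```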
```nextflow
try {
    includeConfig 'conf/base.config'
} catch (Exception e) {
    System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/base.config")
}
```
ok, yeah, I had problems where the error message for failing to load configs was very confusing. But maybe this is also something to deal with upstream and not in Sarek.
```nextflow
try {
    includeConfig 'conf/modules.config'
} catch (Exception e) {
    System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/modules.config")
}
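One source of the confusion mentioned above is that the warning references `params.custom_config_base` even though the file being loaded is the local `conf/modules.config`. A hypothetical tweak (not part of this PR) that names the file that actually failed:

```nextflow
// Hypothetical: report the path that was actually passed to includeConfig
try {
    includeConfig 'conf/modules.config'
} catch (Exception e) {
    System.err.println("WARNING: Could not load pipeline config: conf/modules.config")
}
```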
BamQC is still a bit of an open problem, especially after base recalibration.
Important change: BamQC + Samtools stats are now run only ONCE, BEFORE BaseRecalibration: if duplicates are marked, they run after duplicate marking; if not, they run after mapping. This reduces the runtime, since MarkDuplicates can merge the split reads internally without a runtime penalty.
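The branching described above can be sketched roughly as follows (channel and process names are hypothetical, not the actual Sarek code):

```nextflow
// Hypothetical channel wiring: QC runs once, on whichever BAM feeds BaseRecalibrator
bam_for_qc = params.skip_markduplicates ? bam_mapped : bam_markduplicates

SAMTOOLS_STATS(bam_for_qc)
QUALIMAP_BAMQC(bam_for_qc)
BASERECALIBRATOR(bam_for_qc)
```

The key design point is that QC and recalibration share one input channel, so the merge of split reads only has to happen once, inside MarkDuplicates.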
The Spark implementation currently only works with Singularity. The Docker image has issues and would possibly have to be rebuilt :(
PR checklist

- New tools are added to `scrape_software_versions.py`.
- Code lints (`nf-core lint .`).
- Tests pass (`nextflow run . -profile test,docker`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).