Add cram support + read splitting with seqkit for speedup by FriederikeHanssen · Pull Request #388 · nf-core/sarek

FriederikeHanssen · 2021-06-15T20:43:15Z

BAMQC is still a bit of an open problem, especially after base recalibration....

Important change: BamQC + Samtools Stats is only done ONCE BEFORE BaseRecalibration right now: if duplicates are marked it is run after duplicate marking, if they are not it is run after mapping. This reduces the runtime, since MarkDuplicates can take care of merging split reads internally without runtime punishments.
Spark implementation currently only works with singularity. The docker image has issue, and would have to possibly be rebuild :(

PR checklist

github-actions · 2021-06-15T20:44:20Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 22d55ab

+| ✅ 135 tests passed       |+
#| ❔  11 tests were ignored |#
!| ❗  84 tests had warnings |!

Details

❗ Test warnings:

files_exist - File not found: environment.yml
files_exist - File not found: Dockerfile
nextflow_config - Config variable not found: process.container
params_used - Config variable not found in main.nf: params.input
params_used - Config variable not found in main.nf: params.step
params_used - Config variable not found in main.nf: params.genome
params_used - Config variable not found in main.nf: params.genomes_base
params_used - Config variable not found in main.nf: params.save_reference
params_used - Config variable not found in main.nf: params.help
params_used - Config variable not found in main.nf: params.no_intervals
params_used - Config variable not found in main.nf: params.nucleotides_per_second
params_used - Config variable not found in main.nf: params.sentieon
params_used - Config variable not found in main.nf: params.skip_qc
params_used - Config variable not found in main.nf: params.target_bed
params_used - Config variable not found in main.nf: params.tools
params_used - Config variable not found in main.nf: params.trim_fastq
params_used - Config variable not found in main.nf: params.clip_r1
params_used - Config variable not found in main.nf: params.clip_r2
params_used - Config variable not found in main.nf: params.three_prime_clip_r1
params_used - Config variable not found in main.nf: params.three_prime_clip_r2
params_used - Config variable not found in main.nf: params.trim_nextseq
params_used - Config variable not found in main.nf: params.save_trimmed
params_used - Config variable not found in main.nf: params.split_fastq
params_used - Config variable not found in main.nf: params.aligner
params_used - Config variable not found in main.nf: params.markdup_java_options
params_used - Config variable not found in main.nf: params.use_gatk_spark
params_used - Config variable not found in main.nf: params.save_bam_mapped
params_used - Config variable not found in main.nf: params.skip_markduplicates
params_used - Config variable not found in main.nf: params.ascat_ploidy
params_used - Config variable not found in main.nf: params.ascat_purity
params_used - Config variable not found in main.nf: params.cf_coeff
params_used - Config variable not found in main.nf: params.cf_contamination
params_used - Config variable not found in main.nf: params.cf_contamination_adjustment
params_used - Config variable not found in main.nf: params.cf_ploidy
params_used - Config variable not found in main.nf: params.cf_window
params_used - Config variable not found in main.nf: params.generate_gvcf
params_used - Config variable not found in main.nf: params.no_strelka_bp
params_used - Config variable not found in main.nf: params.pon
params_used - Config variable not found in main.nf: params.pon_index
params_used - Config variable not found in main.nf: params.ignore_soft_clipped_bases
params_used - Config variable not found in main.nf: params.umi
params_used - Config variable not found in main.nf: params.read_structure1
params_used - Config variable not found in main.nf: params.read_structure2
params_used - Config variable not found in main.nf: params.annotate_tools
params_used - Config variable not found in main.nf: params.annotation_cache
params_used - Config variable not found in main.nf: params.cadd_cache
params_used - Config variable not found in main.nf: params.cadd_indels
params_used - Config variable not found in main.nf: params.cadd_indels_tbi
params_used - Config variable not found in main.nf: params.cadd_wg_snvs
params_used - Config variable not found in main.nf: params.cadd_wg_snvs_tbi
params_used - Config variable not found in main.nf: params.genesplicer
params_used - Config variable not found in main.nf: params.snpeff_cache
params_used - Config variable not found in main.nf: params.config_profile_contact
params_used - Config variable not found in main.nf: params.config_profile_description
params_used - Config variable not found in main.nf: params.config_profile_url
params_used - Config variable not found in main.nf: params.outdir
params_used - Config variable not found in main.nf: params.publish_dir_mode
params_used - Config variable not found in main.nf: params.sequencing_center
params_used - Config variable not found in main.nf: params.multiqc_config
params_used - Config variable not found in main.nf: params.monochrome_logs
params_used - Config variable not found in main.nf: params.email
params_used - Config variable not found in main.nf: params.email_on_fail
params_used - Config variable not found in main.nf: params.plaintext_email
params_used - Config variable not found in main.nf: params.max_multiqc_email_size
params_used - Config variable not found in main.nf: params.hostnames
params_used - Config variable not found in main.nf: params.validate_params
params_used - Config variable not found in main.nf: params.tracedir
params_used - Config variable not found in main.nf: params.enable_conda
params_used - Config variable not found in main.nf: params.pull_docker_container
params_used - Config variable not found in main.nf: params.cpus
params_used - Config variable not found in main.nf: params.max_cpus
params_used - Config variable not found in main.nf: params.max_memory
params_used - Config variable not found in main.nf: params.max_time
actions_awsfulltest - .github/workflows/awsfulltest.yml should test full datasets, not -profile test
readme - README did not have a Nextflow minimum version badge.
pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
pipeline_todos - TODO string in awstest.yml: You can customise CI pipeline run tests as required
pipeline_todos - TODO string in freebayes.nf: Named file extensions MUST be emitted for ALL output channels
pipeline_todos - TODO string in freebayes.nf: List additional required output channels/values here
pipeline_todos - TODO string in main.nf: It MUST be possible to pass additional parameters to the tool as a command-line string via the "$ioptions.args" variable
pipeline_todos - TODO string in main.nf: If the tool supports multi-threading then you MUST provide the appropriate parameter
pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
schema_description - No description provided in schema for parameter: cpus

❔ Tests ignored:

files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.md
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo.png
files_unchanged - File ignored due to lint config: lib/NfcoreSchema.groovy
files_unchanged - File ignored due to lint config: .gitignore or foo
files_unchanged - File does not exist: .github/workflows/push_dockerhub_dev.yml
files_unchanged - File does not exist: .github/workflows/push_dockerhub_release.yml
conda_env_yaml - No environment.yml file found - skipping conda_env_yaml test
conda_dockerfile - No environment.yml / Dockerfile file found - skipping conda_dockerfile test

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: CHANGELOG.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File found: .github/markdownlint.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-sarek_logo.png
files_exist - File found: bin/markdown_to_html.py
files_exist - File found: docs/images/nf-core-sarek_logo.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreSchema.groovy
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.show_hidden_params
nextflow_config - Config variable found: params.schema_ignore_params
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.version
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .svg
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: '3.0dev'
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
params_used - Config variable found in main.nf: params.vep_cache
params_used - Config variable found in main.nf: params.ac_loci
params_used - Config variable found in main.nf: params.ac_loci_gc
params_used - Config variable found in main.nf: params.bwa
params_used - Config variable found in main.nf: params.chr_dir
params_used - Config variable found in main.nf: params.chr_length
params_used - Config variable found in main.nf: params.dbsnp
params_used - Config variable found in main.nf: params.dbsnp_index
params_used - Config variable found in main.nf: params.dict
params_used - Config variable found in main.nf: params.fasta
params_used - Config variable found in main.nf: params.fasta_fai
params_used - Config variable found in main.nf: params.germline_resource
params_used - Config variable found in main.nf: params.germline_resource_index
params_used - Config variable found in main.nf: params.intervals
params_used - Config variable found in main.nf: params.known_indels
params_used - Config variable found in main.nf: params.known_indels_index
params_used - Config variable found in main.nf: params.mappability
params_used - Config variable found in main.nf: params.snpeff_db
params_used - Config variable found in main.nf: params.vep_cache_version
params_used - Config variable found in main.nf: params.vep_genome
params_used - Config variable found in main.nf: params.vep_species
files_unchanged - .gitattributes matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/markdownlint.yml matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - bin/markdown_to_html.py matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - assets/multiqc_config.yaml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (267 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_params - Schema matched params returned from nextflow config
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: local_modules.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: branch.yml
merge_markers - No merge markers found in pipeline files

Run details

nf-core/tools version 1.14
Run at 2021-07-02 08:20:47

…scratch space' q

maxulysse · 2021-06-18T07:41:06Z

will close #63

maxulysse · 2021-06-21T09:18:11Z

@FriederikeHanssen your branch is out-of-date with the base branch due to #390
You might want to sync

FriederikeHanssen · 2021-06-21T12:18:26Z

ups, thanks for the heads up

FriederikeHanssen · 2021-07-15T14:42:47Z

maxulysse · 2021-07-15T15:01:05Z

+// Import generic module functions
+include { initOptions; saveFiles; getSoftwareName } from './functions'
+
+params.options = [:]
+options        = initOptions(params.options)
+
+process INDEX_TARGET_BED {
+    tag "$target_bed"
+    label 'process_medium'
+    publishDir "${params.outdir}",
+        mode: params.publish_dir_mode,
+        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) }
+
+    conda (params.enable_conda ? "bioconda::htslib=1.12" : null)
+    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
+        //TODO: No singularity container at the moment, use docker container for the moment
+        container "quay.io/biocontainers/htslib:1.12--h9093b5e_1"
+    } else {
+        container "quay.io/biocontainers/htslib:1.12--hd3b49d5_0"
+    }
+
+    input:
+    path target_bed
+
+    output:
+    tuple path("${target_bed}.gz"), path("${target_bed}.gz.tbi")
+
+    script:
+    """
+    bgzip --threads ${task.cpus} -c ${target_bed} > ${target_bed}.gz
+    tabix ${target_bed}.gz
+    """
+}


I added that in nf-core/modules

maxulysse · 2021-07-15T15:05:26Z

+try {
+  includeConfig 'conf/base.config'
+} catch (Exception e) {
+  System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/base.config")


Note sure about that

ok, yeah I had problems were the error messsage for failing to load configs was very confusing. But maybe this is also something to deal with upstream and not in sarek

no worry, we'll fix that

maxulysse · 2021-07-15T15:05:45Z

+try {
+  includeConfig 'conf/modules.config'
+} catch (Exception e) {
+  System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/modules.config")


maxulysse

Love it

Add cram support, read splitting

6ce647d

FriederikeHanssen added 9 commits June 16, 2021 10:15

Add estimate library complexity if spark is used

21a695a

Fixes resume problem, but is losing the file name...

a634a79

Add tmp dir to gatk processes so tmp files are written to the proper …

1632d4b

…scratch space' q

Fix filename display + resume for TABIX

ab0c1c8

Try to get spark to work

e8290a1

Add MDSpark back in

809d321

try with runoptions

6ba8720

The newest gatk container is not working for me with spark, 4.1.9.0 is

46dcdc2

Add docker.userEmulation back in

41649cc

FriederikeHanssen added 2 commits June 21, 2021 09:35

Add more spark things

b4dd4ca

Fix module params

147d6b4

FriederikeHanssen added 2 commits June 21, 2021 12:38

Publish ref

53d69fa

try with path instead of file

5b4fc53

FriederikeHanssen added 12 commits June 21, 2021 17:29

try with path instead of file

1158010

try with fromFile instead

9b1768f

file

6323fbe

whole fixownership seems to work

838932f

Merge remote-tracking branch 'upstream/dsl2' into dsl2

dd4f783

Add numLanes to meta sheet to deal with blocked mapping output channels

39403ec

Add channel dumping to check for missing id

a62acb1

use groupKey instead

cd51cbc

not sure if this works, but run a bigger test for this

21c168e

Simplify mapping epression

d4354d6

remove unused sw & add ref to samtools stats

26448a6

Add skip_fastqc in again

87b6831

FriederikeHanssen added 17 commits June 27, 2021 22:58

hardcode number of intervals for tests

3fd8078

fix docker image tag, can't find singularity one

493a147

add haplotypecalelr back in

dd88aa3

count num intervals with map oprator

ec7a2d5

collect dbsnp tbi to avoid consumption of channel

1be7416

add gvcf back in

206db89

Add bamqc after bqsr with crams

9ce70a0

Use docker image for htslib + singularity

047e5cf

add dbsnp back in

19c911a

add dbsnp back in

2342971

Resolve merge conflicts

5e8f2cc

add step/tools to indices wf

4f974f6

Resolve remaining merge conflicts/fix problems

d766d44

add try/catch to figure out why module conf is not loaded

3a37778

Split by num reads instead of parts to generate similar sized files

22d55ab

Code clean up

3f26535

Fix merge conflicts

069a4f1

FriederikeHanssen marked this pull request as ready for review July 15, 2021 14:43

FriederikeHanssen requested a review from maxulysse as a code owner July 15, 2021 14:43

maxulysse reviewed Jul 15, 2021

View reviewed changes

Comment thread conf/modules.config Outdated

apply suggestions from code review

1866022

maxulysse reviewed Jul 15, 2021

View reviewed changes

maxulysse merged commit 26a1e23 into nf-core:dsl2 Jul 15, 2021

This was referenced Jul 15, 2021

[FEATURE] Have non blocking channel out of bwamem #362

Closed

Collect MarkDuplicatesSpark metrics with the standalone Picard tool EstimateLibraryComplexity #81

Closed

FriederikeHanssen deleted the dsl2 branch July 10, 2023 18:59

Conversation

FriederikeHanssen commented Jun 15, 2021

PR checklist

Uh oh!

github-actions Bot commented Jun 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

nf-core lint overall result: Passed ✅ ⚠️

❗ Test warnings:

❔ Tests ignored:

✅ Tests passed:

Run details

Uh oh!

maxulysse commented Jun 18, 2021

Uh oh!

maxulysse commented Jun 21, 2021

Uh oh!

FriederikeHanssen commented Jun 21, 2021

Uh oh!

FriederikeHanssen commented Jul 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

maxulysse Jul 15, 2021

Choose a reason for hiding this comment

Uh oh!

maxulysse Jul 15, 2021

Choose a reason for hiding this comment

Uh oh!

FriederikeHanssen Jul 15, 2021

Choose a reason for hiding this comment

Uh oh!

maxulysse Jul 15, 2021

Choose a reason for hiding this comment

Uh oh!

maxulysse Jul 15, 2021

Choose a reason for hiding this comment

Uh oh!

maxulysse left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 15, 2021 •

edited

Loading

`nf-core lint` overall result: Passed ✅ ⚠️

FriederikeHanssen commented Jul 15, 2021 •

edited

Loading