Split PREPARE_GENOME into REFERENCES + INDICES subworkflows #1851

Merged: pinin4fjords merged 9 commits into dev from refactor/split-prepare-genome on May 8, 2026

@pinin4fjords
Member

Summary

Closes #1721.

Splits the 33-input PREPARE_GENOME subworkflow into two focused subworkflows:

  • PREPARE_GENOME_REFERENCES (~20 inputs): FASTA / GTF / BED / transcript FASTA / chrom.sizes / rRNA FASTAs / Kraken DB
  • PREPARE_GENOME_INDICES (~24 inputs): per-aligner index build/load (STAR, RSEM, HISAT2, Bowtie2, Salmon, Kallisto, BBSplit, SortMeRNA)

Logic moved verbatim, only restructured. No user-facing parameter, output, or behaviour change.

main.nf now invokes both in sequence and feeds reference channels from _REFERENCES into _INDICES. Downstream RNASEQ workflow's take block is unchanged.
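
The sequencing described above can be sketched in Nextflow DSL2 roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `REFERENCES_SKETCH` and `INDICES_SKETCH` are hypothetical stand-ins for the two real subworkflows (which take roughly 20 and 24 inputs), and the file names, the `id` meta key, and the emit names are invented for the example.

```groovy
// Hypothetical stand-in for PREPARE_GENOME_REFERENCES (the real take block has ~20 inputs).
workflow REFERENCES_SKETCH {
    take:
    ch_fasta // channel: path to genome FASTA
    ch_gtf   // channel: path to annotation GTF

    main:
    // The real subworkflow would uncompress, filter, or derive files here;
    // this sketch only attaches an assumed meta map to the FASTA.
    ch_fasta_meta = ch_fasta.map { fasta -> [[id: fasta.baseName], fasta] }

    emit:
    fasta = ch_fasta_meta // channel: [ meta, fasta ]
    gtf   = ch_gtf        // channel: path
}

// Hypothetical stand-in for PREPARE_GENOME_INDICES (the real take block has ~24 inputs).
workflow INDICES_SKETCH {
    take:
    ch_fasta // channel: [ meta, fasta ]
    ch_gtf   // channel: path

    main:
    // No index is built here; the sketch only shows that reference
    // channels arrive from the first subworkflow.
    ch_index_inputs = ch_fasta
        .combine(ch_gtf)
        .map { meta, fasta, gtf -> [meta, "${fasta.name} + ${gtf.name}"] }

    emit:
    index_inputs = ch_index_inputs
}

workflow {
    ch_fasta = Channel.of(file('genome.fa'))  // assumed file name
    ch_gtf   = Channel.of(file('genes.gtf'))  // assumed file name

    // Invoke both in sequence, feeding outputs of the first into the second.
    REFERENCES_SKETCH(ch_fasta, ch_gtf)
    INDICES_SKETCH(REFERENCES_SKETCH.out.fasta, REFERENCES_SKETCH.out.gtf)

    INDICES_SKETCH.out.index_inputs.view()
}
```

The point of the shape is that the second subworkflow never resolves reference files itself; it only consumes channels emitted by the first.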

The 43 + 2 existing tests are split across the two new subworkflow test directories. Snapshot assertions for bare-path channels were rewritten as [file(p).name, path(p).md5] closures to work around an nf-test 0.9.5 / Nextflow 25.04.3 issue where bare-path serialization is non-deterministic when many tests share an invocation.
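
For readers unfamiliar with the pattern, a snapshot assertion of this kind might look roughly like the nf-test skeleton below. It is a sketch under assumptions rather than a test from the PR: the output channel name `gtf` is hypothetical, the closure assumes the channel emits bare paths (a tuple channel would need a different closure), and the `when` inputs are deliberately left out and would have to be supplied before the test could run.

```groovy
nextflow_workflow {

    name "Test Subworkflow PREPARE_GENOME_REFERENCES"
    script "../main.nf"
    workflow "PREPARE_GENOME_REFERENCES"

    test("sketch: snapshot a bare-path channel as [name, md5] pairs") {

        when {
            workflow {
                """
                // Inputs omitted in this sketch; the real take block has ~20 entries.
                """
            }
        }

        then {
            assertAll(
                { assert workflow.success },
                // Snapshot the file name and content checksum of each emitted
                // path instead of the bare path itself.
                // 'gtf' is an assumed output channel name.
                { assert snapshot(
                    workflow.out.gtf.collect { p -> [file(p).name, path(p).md5] }
                ).match() }
            )
        }
    }
}
```

Mapping each path to its name and checksum means the snapshot depends only on what the file is called and what it contains, not on how the path itself happens to be serialized.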

Test plan

  • CI green on nf-test for both new subworkflows
  • CPU and ARM CI green
  • GPU CI green for the parabricks INDICES test
  • Pipeline-level tests unchanged

🤖 Generated with Claude Code

Closes #1721. PREPARE_GENOME's 33-input take block split into two
focused subworkflows: PREPARE_GENOME_REFERENCES (FASTA/GTF/BED/transcript
fasta/chrom.sizes/rRNA/Kraken DB) and PREPARE_GENOME_INDICES (per-aligner
index build/load). No user-facing parameter, output, or behaviour change.

[skip ci]
@github-actions

github-actions Bot commented May 8, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 189ac63

  • ✅ 215 tests passed
  • ❔ 19 tests were ignored
  • ❗ 7 tests had warnings
Details

❗ Test warnings:

  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

❔ Tests ignored:

  • files_exist - File is ignored: conf/modules.config
  • files_exist - File is ignored: conf/containers_conda_lock_files_amd64.config
  • files_exist - File is ignored: conf/containers_conda_lock_files_arm64.config
  • files_exist - File is ignored: conf/containers_docker_amd64.config
  • files_exist - File is ignored: conf/containers_docker_arm64.config
  • files_exist - File is ignored: conf/containers_singularity_https_amd64.config
  • files_exist - File is ignored: conf/containers_singularity_https_arm64.config
  • files_exist - File is ignored: conf/containers_singularity_oras_amd64.config
  • files_exist - File is ignored: conf/containers_singularity_oras_arm64.config
  • nextflow_config - Config default ignored: params.ribo_database_manifest
  • nf_test_content - nf_test_content
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/nf-core-rnaseq_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-rnaseq_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • actions_nf_test - actions_nf_test
  • modules_config - modules_config
  • container_configs - container_configs

✅ Tests passed:

Run details

  • nf-core/tools version 4.0.2
  • Run at 2026-05-08 17:19:48

Contributor

@adamrtalbot left a comment

Overall good. I'm adding a comment on whether this is the best strategy or whether we should organise it differently, but that's up for debate.

Comment thread subworkflows/local/prepare_genome_references/main.nf
Comment thread subworkflows/local/prepare_genome_references/main.nf Outdated
Comment thread subworkflows/local/prepare_genome_indices/main.nf
Per review feedback (#1851 r3209803147), emit fasta+fai as
[meta, fasta, fai] from PREPARE_GENOME_REFERENCES. INDICES, main.nf,
and the RNASEQ workflow take block updated to consume the tuple
directly; the previous ad-hoc combine in workflows/rnaseq/main.nf
that rebuilt this triple is dropped.

Snapshots regenerated.
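
As a rough illustration of the `[meta, fasta, fai]` shape, the sketch below builds such a triple from two keyed channels. It is an assumption-laden example, not the PR's implementation: the file names and the `id` meta key are invented, and `join` is used here only as one common way to pair channels on a shared meta map.

```groovy
workflow {
    // Hypothetical inputs: each channel is keyed by the same meta map.
    ch_fasta = Channel.of([[id: 'genome'], file('genome.fa')])
    ch_fai   = Channel.of([[id: 'genome'], file('genome.fa.fai')])

    // join matches on the first element (the meta map) and yields
    // a single [meta, fasta, fai] tuple per genome.
    ch_fasta_fai = ch_fasta.join(ch_fai)

    // Consumers can destructure the triple directly.
    ch_fasta_fai.view { meta, fasta, fai ->
        "${meta.id}: ${fasta.name} + ${fai.name}"
    }
}
```

Emitting the triple once at the source means downstream consumers do not each need to rebuild it from separate FASTA and index channels.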
@pinin4fjords pinin4fjords enabled auto-merge May 8, 2026 17:21
@pinin4fjords pinin4fjords merged commit c8a688b into dev May 8, 2026
124 of 126 checks passed
@pinin4fjords pinin4fjords deleted the refactor/split-prepare-genome branch May 11, 2026 08:18
@pinin4fjords pinin4fjords linked an issue May 11, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Split prepare_genome subworkflow into smaller subworkflows

2 participants