demo: Record types for workflow outputs #1703

Closed
pinin4fjords wants to merge 1 commit into nf-core:workflow-outputs from pinin4fjords:demo/record-types-workflow-outputs

@pinin4fjords
Member

Summary

Proof-of-concept showing how Nextflow record types would simplify the workflow outputs pattern introduced in the workflow-outputs branch.

This branch converts 38 files to use record types, replacing groups of related tuple-based channels with single record-typed channels. The code requires nextflow.preview.types = true and a future Nextflow version that implements PR #6679; it won't parse on any current release.

How record types address workflow outputs limitations

The workflow-outputs branch requires every output file to have its own named channel threaded from process to subworkflow to workflow to publish block. For a pipeline like rnaseq with ~130 published outputs, this creates significant verbosity at every layer.

Record types help in three main ways:

1. Fewer channels to declare and thread

The biggest win. Instead of declaring, assigning, and emitting one channel per file, related outputs travel together as a single record. For example, the 6 MarkDuplicates outputs (bam, bai, metrics, stats, flagstat, idxstats) become one MarkDupResult channel. Similarly, the 22 RSeQC outputs collapse into one RSeQCResult using nested records for junction_annotation, read_duplication, and inner_distance sub-groups.
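
As a rough sketch, the MarkDupResult record could be declared as below, using the same `record` declaration syntax that appears in this PR's diff. The six file fields are the ones listed above and `meta` is the field the publish block reads; every field type here is an assumption for illustration, and the declaration needs the preview feature from PR #6679 to parse.

```nextflow
// Sketch only: requires nextflow.preview.types = true and record type support.
// Field names come from the PR description; all field types are assumptions.
record MarkDupResult {
    meta: Map
    bam: Path
    bai: Path
    metrics: Path
    stats: Path
    flagstat: Path
    idxstats: Path
}
```

One channel of MarkDupResult values then takes the place of six separately declared and emitted channels.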

Concrete impact:

  • workflows/rnaseq/main.nf: 321 fewer lines (1326 -> 1005), mostly from eliminating channel.empty() declarations and per-field assignments in the emit block (190 lines -> 92 lines)
  • main.nf: 50 fewer lines (637 -> 587), despite the publish block now needing .map calls to extract fields

2. Process outputs are self-documenting

The record() output function groups related files with named fields and types, replacing opaque tuple val(meta), path(...) declarations. Each process defines its own record type at the top of the file, making the output contract explicit. Nullable types (Path?) replace optional: true on individual path qualifiers, and nested records (e.g. JunctionAnnotationResult inside RSeQCResult) express hierarchical relationships that flat tuples can't.
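
A hedged sketch of the nesting idea, assuming nested records are declared by using one record type as a field type of another. The type names RSeQCResult and JunctionAnnotationResult and the junction_annotation sub-group are taken from this description; the fields inside the nested record are hypothetical placeholders, not the ones used on the branch.

```nextflow
// Sketch only: requires nextflow.preview.types = true and record type support.
record JunctionAnnotationResult {
    bed: Path?   // hypothetical field; nullable because the file may be absent
    log: Path?   // hypothetical field
}

record RSeQCResult {
    meta: Map                                      // assumed type
    junction_annotation: JunctionAnnotationResult  // nested record
    // read_duplication and inner_distance would be nested the same way
}
```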

3. Subworkflows can pass through records without destructuring

Subworkflows that don't need to inspect individual fields can accept and emit whole records, avoiding the boilerplate of splitting a process output into N channels just to re-combine them later. The align_star subworkflow is a good example: it now emits a single StarAlignResult instead of threading 5 separate STAR log/tab channels.

Limitations and open questions

Record types don't eliminate all verbosity; some of it is structural:

  • The publish block still needs per-file granularity. Each output goes to a specific directory with its own enabled condition. Records group the files that a process produces together, but the publish block must route each file separately. This means .map { r -> [r.meta, r.field] } calls in the publish section, which are more verbose than the direct dot-access (channel.field) originally attempted here. Channel-level field projection (e.g. ch.field returning a channel of that field's values) would eliminate this, but is explicitly out of scope for the initial record types PR.

  • The stage: syntax for input staging is part of PR #6679 but hasn't been widely tested. The 3 STAR modules and tximeta/tximport need stageAs qualifiers on their inputs, and this branch expresses them with the new stage: block syntax.

  • Publishing whole records to a single directory (used here for consolidated outputs like pseudo, rsem, star_salmon, deseq2) assumes Nextflow will auto-publish all Path fields. This behavior isn't confirmed yet.

  • Module portability. Adding record FooResult { ... } to the top of each module file means record type definitions are local to each module. Cross-module sharing (e.g. the 3 STAR variants sharing StarAlignResult) requires defining the record in each file. A future module-level type registry or shared type imports would improve this.
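
The per-field projection from the first limitation can be tried without record types. The sketch below is an illustration rather than code from this branch: a plain map stands in for a MarkDupResult record, and the file names are placeholder strings rather than real paths.

```nextflow
// Runs without record types: a plain Map stands in for a record value.
// Sample id and file names are placeholders for illustration only.
workflow {
    markdup = channel.of(
        [meta: [id: 'sample1'], bam: 'sample1.bam', bai: 'sample1.bam.bai']
    )

    // One projection per published file, mirroring the publish-section pattern
    markdup_bam = markdup.map { r -> [r.meta, r.bam] }
    markdup_bai = markdup.map { r -> [r.meta, r.bai] }

    markdup_bam.view()
    markdup_bai.view()
}
```

Each published file costs one `.map` line, which is the verbosity that channel-level field projection would remove.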

Net assessment

Record types would meaningfully reduce the boilerplate of the workflow outputs pattern: roughly a 25% reduction in the main workflow file (1326 -> 1005 lines), concentrated in the areas (channel declarations, assignments, emit blocks) where the verbosity is most painful. The biggest remaining source of verbosity is the publish block, which inherently needs per-file routing regardless of how channels are structured. A future channel-level field projection feature would address this.

Test plan

  • Visual inspection only (the code requires unmerged Nextflow features)
  • Verify no remaining TODO comments about stageAs
  • Verify publish block uses .map (not direct dot-access) for record field extraction

🤖 Generated with Claude Code

Demonstrates how record types (nextflow-io/nextflow#6679) would
simplify the workflow outputs pattern by replacing groups of related
tuple-based channels with single record-typed channels.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

Comment thread main.nf
Comment on lines +348 to +353
markdup_bam = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.bam] }.ifEmpty([])
markdup_bai = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.bai] }.ifEmpty([])
markdup_metrics = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.metrics] }.ifEmpty([])
markdup_stats = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.stats] }.ifEmpty([])
markdup_flagstat = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.flagstat] }.ifEmpty([])
markdup_idxstats = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.idxstats] }.ifEmpty([])
Contributor

What's the benefit in breaking up a wide record like this?

Comment on lines +3 to +12
record DeSeq2Result {
    pdf: Path?
    rdata: Path?
    pca_txt: Path?
    pca_multiqc: Path?
    dists_txt: Path?
    dists_multiqc: Path?
    log: Path?
    size_factors: Path?
}
Contributor

When do you use this record?

@pinin4fjords
Member Author

Closing for now. The record types feature landed upstream and thinking on the workflow outputs pattern has moved on since this was drafted. Will revisit with a fresh branch when we're ready to adopt record types in earnest.
