demo: Record types for workflow outputs by pinin4fjords · Pull Request #1703 · nf-core/rnaseq

pinin4fjords · 2026-02-13T14:30:50Z

Summary

Proof-of-concept showing how Nextflow record types would simplify the workflow outputs pattern introduced in the workflow-outputs branch.

This branch converts 38 files to use record types, replacing groups of related tuple-based channels with single record-typed channels. The code requires nextflow.preview.types = true and a future Nextflow version that implements the record types PR (nextflow-io/nextflow#6679); it won't parse on any current release.

How record types address workflow outputs limitations

The workflow-outputs branch requires every output file to have its own named channel threaded from process to subworkflow to workflow to publish block. For a pipeline like rnaseq with ~130 published outputs, this creates significant verbosity at every layer.

Record types help in three main ways:

1. Fewer channels to declare and thread

The biggest win. Instead of declaring, assigning, and emitting one channel per file, related outputs travel together as a single record. For example, the 6 MarkDuplicates outputs (bam, bai, metrics, stats, flagstat, idxstats) become one MarkDupResult channel. Similarly, the 22 RSeQC outputs collapse into one RSeQCResult using nested records for junction_annotation, read_duplication, and inner_distance sub-groups.
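To make the channel-count difference concrete, here is a small Python model rather than Nextflow code. A channel is modelled as a plain list, and a frozen dataclass stands in for the MarkDupResult record. Only the six field names come from the PR; the sample ids, file names and the markdup() stub are hypothetical.

```python
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class MarkDupResult:
    """Python stand-in for the MarkDupResult record (one item per sample)."""
    meta: dict
    bam: Path
    bai: Path
    metrics: Path
    stats: Path
    flagstat: Path
    idxstats: Path


def markdup(meta: dict) -> MarkDupResult:
    """Hypothetical process stub: derives all six outputs from the sample id."""
    sid = meta["id"]
    return MarkDupResult(
        meta=meta,
        bam=Path(f"{sid}.markdup.bam"),
        bai=Path(f"{sid}.markdup.bam.bai"),
        metrics=Path(f"{sid}.markdup.metrics.txt"),
        stats=Path(f"{sid}.stats"),
        flagstat=Path(f"{sid}.flagstat"),
        idxstats=Path(f"{sid}.idxstats"),
    )


samples = [{"id": "WT_REP1"}, {"id": "KO_REP1"}]

# Record style: ONE channel, each item carrying all six files together.
markdup_ch = [markdup(m) for m in samples]

# Tuple style: one (meta, path) channel per file, each of which would have to
# be declared, assigned and emitted separately at every layer.
file_fields = [f.name for f in fields(MarkDupResult) if f.name != "meta"]
tuple_channels = {
    name: [(r.meta, getattr(r, name)) for r in markdup_ch] for name in file_fields
}

print(len(tuple_channels), "tuple channels vs 1 record channel")
```

The record channel carries exactly the same information as the six tuple channels; the difference is that only one name has to be declared, assigned and emitted at each layer.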

Concrete impact:

- workflows/rnaseq/main.nf: 321 fewer lines (1326 -> 1005), mostly from eliminating channel.empty() declarations and the per-field assignments in the emit block, which shrinks from 190 to 92 lines
- main.nf: 50 fewer lines (637 -> 587), despite the publish block now needing .map calls to extract fields
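As relative reductions, the line counts above work out as follows (plain arithmetic on the reported numbers; the first ratio is the basis for the roughly 25% figure in the net assessment):

```latex
\begin{aligned}
\text{workflows/rnaseq/main.nf:}\quad & \frac{1326 - 1005}{1326} = \frac{321}{1326} \approx 24.2\% \\
\text{emit block:}\quad & \frac{190 - 92}{190} = \frac{98}{190} \approx 51.6\% \\
\text{main.nf:}\quad & \frac{637 - 587}{637} = \frac{50}{637} \approx 7.8\%
\end{aligned}
```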

2. Process outputs are self-documenting

The record() output function groups related files with named fields and types, replacing opaque tuple val(meta), path(...) declarations. Each process defines its own record type at the top of the file, making the output contract explicit. Nullable types (Path?) replace optional: true on individual path qualifiers, and nested records (e.g. JunctionAnnotationResult inside RSeQCResult) express hierarchical relationships that flat tuples can't.
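Nullable and nested fields can be sketched in the same Python terms: Optional[Path] plays the role of Path?, and a dataclass field holding another dataclass plays the role of a nested record. The JunctionAnnotationResult fields (bed, xls, log) and the bam_stat field are hypothetical; the PR does not list the actual fields of these records.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class JunctionAnnotationResult:
    """Nested sub-record; these field names are hypothetical."""
    bed: Optional[Path] = None
    xls: Optional[Path] = None
    log: Optional[Path] = None


@dataclass(frozen=True)
class RSeQCResult:
    """Parent record; a None value plays the role of an unset Path? output."""
    meta: dict
    bam_stat: Optional[Path] = None
    junction_annotation: Optional[JunctionAnnotationResult] = None


# A sample where junction annotation ran: the nested record is populated.
full = RSeQCResult(
    meta={"id": "WT_REP1"},
    bam_stat=Path("WT_REP1.bam_stat.txt"),
    junction_annotation=JunctionAnnotationResult(
        bed=Path("WT_REP1.junction.bed"),
        log=Path("WT_REP1.junction_annotation.log"),
    ),
)

# A sample where that module was skipped: the optional output is simply None,
# rather than a gap in a positional tuple.
skipped = RSeQCResult(meta={"id": "KO_REP1"}, bam_stat=Path("KO_REP1.bam_stat.txt"))

for r in (full, skipped):
    ja = r.junction_annotation
    print(r.meta["id"], ja.bed if ja else "no junction annotation")
```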

3. Subworkflows can pass through records without destructuring

Subworkflows that don't need to inspect individual fields can accept and emit whole records, avoiding the boilerplate of splitting a process output into N channels just to re-combine them later. The align_star subworkflow is a good example: it now emits a single StarAlignResult instead of threading 5 separate STAR log/tab channels.
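In this model, a pass-through subworkflow is just a function that returns the records it receives from the process without unpacking them. The StarAlignResult fields below are hypothetical (named after standard STAR output files), as are the star_align() and align_star() stubs.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StarAlignResult:
    """Stand-in for StarAlignResult; field names are hypothetical."""
    meta: dict
    bam: Path
    log_final: Path
    log_out: Path
    log_progress: Path
    junctions: Path


def star_align(meta: dict) -> StarAlignResult:
    """Hypothetical STAR process stub producing one record per sample."""
    sid = meta["id"]
    return StarAlignResult(
        meta=meta,
        bam=Path(f"{sid}.Aligned.out.bam"),
        log_final=Path(f"{sid}.Log.final.out"),
        log_out=Path(f"{sid}.Log.out"),
        log_progress=Path(f"{sid}.Log.progress.out"),
        junctions=Path(f"{sid}.SJ.out.tab"),
    )


def align_star(samples: list[dict]) -> list[StarAlignResult]:
    """Subworkflow sketch: runs the process and emits its records whole.

    It never unpacks the individual log/tab fields, so it does not need to
    change when StarAlignResult gains or loses a field.
    """
    return [star_align(meta) for meta in samples]


# Only the consumer that needs a specific file reaches into the record.
aligned = align_star([{"id": "WT_REP1"}, {"id": "KO_REP1"}])
final_logs = [(r.meta, r.log_final) for r in aligned]
```

Because align_star() never names a field, adding or removing a STAR output only touches the record, the process stub and whichever consumer reads that field.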

Limitations and open questions

Record types don't eliminate all verbosity; some of it is structural:

- The publish block still needs per-file granularity. Each output goes to its own directory with its own enabled condition. Records group the files a process produces together, but the publish block has to route each file separately. This means .map { r -> [r.meta, r.field] } calls in the publish section, which are more verbose than the direct dot-access (channel.field) originally attempted here. Channel-level field projection (e.g. ch.field returning a channel of that field) would eliminate this, but it is explicitly out of scope for the initial record types PR.
- The stage: syntax for input staging is part of PR #6679 but hasn't been widely tested. The 3 STAR modules and tximeta/tximport need stageAs qualifiers on their inputs, which are expressed with the new stage: block syntax.
- Publishing whole records to a single directory (used here for consolidated outputs like pseudo, rsem, star_salmon, deseq2) assumes Nextflow will auto-publish all Path fields. This behavior isn't confirmed yet.
- Module portability. Adding record FooResult { ... } to the top of each module file means record type definitions are local to each module. Cross-module sharing (e.g. the 3 STAR variants sharing StarAlignResult) requires defining the record in each file. A future module-level type registry or shared type imports would improve this.
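The publish-block limitation and the whole-record publishing assumption above can be modelled the same way (again Python, not Nextflow). project() is the list equivalent of .map { r -> [r.meta, r.field] }, and path_fields() encodes the unconfirmed assumption that publishing a record publishes every non-null Path field, including those in nested records. MarkDupResult is trimmed to two files here, DeSeq2Result mirrors the fields shown in the review thread, and all file names are hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass
from pathlib import Path
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class MarkDupResult:
    """Subset of the MarkDupResult record (only two of its six files)."""
    meta: dict
    bam: Path
    bai: Path


@dataclass(frozen=True)
class DeSeq2Result:
    """Mirror of the DeSeq2Result record; every field is nullable (Path?)."""
    pdf: Optional[Path] = None
    rdata: Optional[Path] = None
    pca_txt: Optional[Path] = None
    pca_multiqc: Optional[Path] = None
    dists_txt: Optional[Path] = None
    dists_multiqc: Optional[Path] = None
    log: Optional[Path] = None
    size_factors: Optional[Path] = None


def project(channel: list, field: str) -> list[tuple[dict, Any]]:
    """Equivalent of .map { r -> [r.meta, r.<field>] } on a record channel."""
    return [(r.meta, getattr(r, field)) for r in channel]


def path_fields(record: Any) -> Iterator[Path]:
    """Yield every non-null Path in a record, descending into nested records.

    Models the *assumed* whole-record publishing behaviour: every Path field
    would end up in the target directory, and unset (None) fields are skipped.
    """
    for f in fields(record):
        value = getattr(record, f.name)
        if isinstance(value, Path):
            yield value
        elif is_dataclass(value):
            yield from path_fields(value)


markdup_ch = [
    MarkDupResult({"id": "WT_REP1"}, Path("WT_REP1.bam"), Path("WT_REP1.bam.bai")),
    MarkDupResult({"id": "KO_REP1"}, Path("KO_REP1.bam"), Path("KO_REP1.bam.bai")),
]

# Per-file routing in the publish block: one projection per published output.
publish_targets = {
    "markdup_bam": project(markdup_ch, "bam"),
    "markdup_bai": project(markdup_ch, "bai"),
}

# Whole-record publishing: only the outputs that were actually produced.
deseq2 = DeSeq2Result(pdf=Path("deseq2.plots.pdf"), rdata=Path("deseq2.dds.RData"))
deseq2_files = list(path_fields(deseq2))
```

Each published output still needs its own project() call, which is the per-file routing cost; a channel-level projection such as ch.field would in effect remove the need to write these calls by hand.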

Net assessment

Record types would meaningfully reduce the boilerplate of the workflow outputs pattern: roughly a 25% reduction in the main workflow file (321 of 1326 lines), concentrated in the areas where the verbosity is most painful (channel declarations, assignments and emit blocks). The biggest remaining source of verbosity is the publish block, which inherently needs per-file routing regardless of how channels are structured. A future channel-level field projection feature would address this.

Test plan

- Visual inspection only; requires unmerged Nextflow features
- Verify no remaining TODO comments about stageAs
- Verify publish block uses .map (not direct dot-access) for record field extraction

🤖 Generated with Claude Code

Demonstrates how record types (nextflow-io/nextflow#6679) would simplify the workflow outputs pattern by replacing groups of related tuple-based channels with single record-typed channels. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

nf-core-bot · 2026-02-13T14:31:27Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.5.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

adamrtalbot · 2026-02-13T16:55:26Z

+    markdup_bam        = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.bam] }.ifEmpty([])
+    markdup_bai        = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.bai] }.ifEmpty([])
+    markdup_metrics    = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.metrics] }.ifEmpty([])
+    markdup_stats      = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.stats] }.ifEmpty([])
+    markdup_flagstat   = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.flagstat] }.ifEmpty([])
+    markdup_idxstats   = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.idxstats] }.ifEmpty([])


What's the benefit in breaking up a wide record like this?

adamrtalbot · 2026-02-26T14:43:56Z

+record DeSeq2Result {
+    pdf:           Path?
+    rdata:         Path?
+    pca_txt:       Path?
+    pca_multiqc:   Path?
+    dists_txt:     Path?
+    dists_multiqc: Path?
+    log:           Path?
+    size_factors:  Path?
+}


When do you use this record?

pinin4fjords · 2026-04-17T16:47:41Z

Closing for now. The record types feature landed upstream and thinking on the workflow outputs pattern has moved on since this was drafted. Will revisit with a fresh branch when we're ready to adopt record types in earnest.

adamrtalbot reviewed Feb 13, 2026

adamrtalbot reviewed Feb 26, 2026

pinin4fjords closed this Apr 17, 2026

pinin4fjords wants to merge 1 commit into nf-core:workflow-outputs from pinin4fjords:demo/record-types-workflow-outputs