demo: Record types for workflow outputs #1703
Closed
pinin4fjords wants to merge 1 commit into
Demonstrates how record types (nextflow-io/nextflow#6679) would simplify the workflow outputs pattern by replacing groups of related tuple-based channels with single record-typed channels.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member
Warning: Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.5.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
adamrtalbot
reviewed
Feb 13, 2026
Comment on lines +348 to +353
    markdup_bam = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.bam] }.ifEmpty([])
    markdup_bai = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.bai] }.ifEmpty([])
    markdup_metrics = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.metrics] }.ifEmpty([])
    markdup_stats = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.stats] }.ifEmpty([])
    markdup_flagstat = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.flagstat] }.ifEmpty([])
    markdup_idxstats = NFCORE_RNASEQ.out.markdup.map { r -> [r.meta, r.idxstats] }.ifEmpty([])
Contributor
What's the benefit in breaking up a wide record like this?
adamrtalbot
reviewed
Feb 26, 2026
Comment on lines +3 to +12
    record DeSeq2Result {
        pdf: Path?
        rdata: Path?
        pca_txt: Path?
        pca_multiqc: Path?
        dists_txt: Path?
        dists_multiqc: Path?
        log: Path?
        size_factors: Path?
    }
Contributor
When do you use this record?
Member
Author
Closing for now. The record types feature landed upstream and thinking on the workflow outputs pattern has moved on since this was drafted. Will revisit with a fresh branch when we're ready to adopt record types in earnest.
Summary
Proof-of-concept showing how Nextflow record types would simplify the workflow outputs pattern introduced in the workflow-outputs branch.
This branch converts 38 files to use record types, replacing groups of related tuple-based channels with single record-typed channels. The code requires nextflow.preview.types = true and a future Nextflow version that implements PR #6679 - it won't parse on any current release.
How record types address workflow outputs limitations
The workflow-outputs branch requires every output file to have its own named channel threaded from process to subworkflow to workflow to publish block. For a pipeline like rnaseq with ~130 published outputs, this creates significant verbosity at every layer.
Record types help in three main ways:
1. Fewer channels to declare and thread
The biggest win. Instead of declaring, assigning, and emitting one channel per file, related outputs travel together as a single record. For example, the 6 MarkDuplicates outputs (bam, bai, metrics, stats, flagstat, idxstats) become one MarkDupResult channel. Similarly, the 22 RSeQC outputs collapse into one RSeQCResult using nested records for junction_annotation, read_duplication, and inner_distance sub-groups.
Concrete impact:
workflows/rnaseq/main.nf: 321 fewer lines (1326 -> 1005), mostly from eliminating channel.empty() declarations and per-field assignments in the emit block (190 lines -> 92 lines)
main.nf: 50 fewer lines (637 -> 587), despite the publish block now needing .map calls to extract fields
2. Process outputs are self-documenting
The record() output function groups related files with named fields and types, replacing opaque tuple val(meta), path(...) declarations. Each process defines its own record type at the top of the file, making the output contract explicit. Nullable types (Path?) replace optional: true on individual path qualifiers, and nested records (e.g. JunctionAnnotationResult inside RSeQCResult) express hierarchical relationships that flat tuples can't.
3. Subworkflows can pass through records without destructuring
Subworkflows that don't need to inspect individual fields can accept and emit whole records, avoiding the boilerplate of splitting a process output into N channels just to re-combine them later. The align_star subworkflow is a good example: it now emits a single StarAlignResult instead of threading 5 separate STAR log/tab channels.
Limitations and open questions
Record types don't eliminate all verbosity - some remains structural:
The publish block still needs per-file granularity. Each output goes to a specific directory with specific enabled conditions. Records group files that a process produces together, but the publish block needs to route each file separately. This means .map { r -> [r.meta, r.field] } calls in the publish section, which are more verbose than the direct dot-access (channel.field) originally attempted here. Channel-level field projection (e.g. ch.field returning a channel of that field) would eliminate this, but is explicitly out of scope for the initial record types PR.
stage: syntax for input staging is part of PR #6679 but hasn't been widely tested. The 3 STAR modules and tximeta/tximport need stageAs qualifiers on their inputs, which use the new stage: block syntax.
Publishing whole records to a single directory (used here for consolidated outputs like pseudo, rsem, star_salmon, deseq2) assumes Nextflow will auto-publish all Path fields. This behavior isn't confirmed yet.
Module portability. Adding record FooResult { ... } to the top of each module file means record type definitions are local to each module. Cross-module sharing (e.g. the 3 STAR variants sharing StarAlignResult) requires defining the record in each file. A future module-level type registry or shared type imports would improve this.
Net assessment
Record types would meaningfully reduce the boilerplate of the workflow outputs pattern - roughly a 25% reduction in the main workflow file, concentrated in the areas (channel declarations, assignments, emit blocks) where the verbosity is most painful. The biggest remaining source of verbosity is the publish block, which inherently needs per-file routing regardless of how channels are structured. A future channel-level field projection feature would address this.
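As a rough illustration of the per-file routing discussed above, the sketch below pairs a record definition with the .map projections the publish section needs. It is a sketch under stated assumptions, not code from this branch: the field types and nullability in MarkDupResult, the ch_markdup stand-in, and the output directory names are all hypothetical. Because it relies on the record syntax from PR #6679, it is not expected to parse on current Nextflow releases.

```nextflow
// Sketch only: depends on nextflow.preview.types and the record syntax from
// nextflow-io/nextflow#6679, so it is not expected to parse on current releases.
nextflow.preview.types = true

// Hypothetical record for the six MarkDuplicates outputs named in the summary.
// Field types and nullability are assumptions, not taken from the branch.
record MarkDupResult {
    meta: Map
    bam: Path
    bai: Path?
    metrics: Path?
    stats: Path?
    flagstat: Path?
    idxstats: Path?
}

workflow {
    main:
    // ch_markdup stands in for NFCORE_RNASEQ.out.markdup. It is left empty
    // so the sketch does not have to invent a record constructor call.
    ch_markdup = channel.empty()

    publish:
    // The publish section routes each file separately, so each field is
    // projected into its own [meta, file] channel with .map.
    markdup_bam     = ch_markdup.map { r -> [r.meta, r.bam] }
    markdup_metrics = ch_markdup.map { r -> [r.meta, r.metrics] }
}

output {
    // Target directories are placeholders, not the pipeline's real layout.
    markdup_bam {
        path 'markdup/bam'
    }
    markdup_metrics {
        path 'markdup/metrics'
    }
}
```

With channel-level field projection, which the summary notes is out of scope for the initial record types PR, each .map line could in principle shrink to a plain field access on the channel.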
Test plan
.map (not direct dot-access) for record field extraction
🤖 Generated with Claude Code