
Commit 028e4cc

Use destructuring syntax for process record/tuple inputs (#6912)
Co-authored-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
1 parent a99fb5d commit 028e4cc

18 files changed

Lines changed: 658 additions & 307 deletions

File tree

adr/20251017-typed-processes.md

Lines changed: 404 additions & 0 deletions
Large diffs are not rendered by default.

adr/20260306-record-types.md

Lines changed: 81 additions & 32 deletions
````diff
@@ -6,6 +6,12 @@
 - Date: 2026-03-06
 - Tags: lang, static-types
 
+## Updates
+
+### Version 1.1 (2026-03-23)
+
+- Replaced inline record type syntax (`Record { ... }`) with destructuring syntax (`record(...)`) for better continuity with legacy syntax and record output syntax.
+
 ## Summary
 
 Provide a way to model composite data types in the Nextflow language.
````
````diff
@@ -155,54 +161,57 @@ When a record is supplied as input to a process, the process needs to know how t
 
 Typed processes can stage inputs using the `stage:` section, but ideally the files in a record should be automatically detected and staged.
 
-A typed process can declare a record using an *inline record type*:
+A typed process can declare a record input using a record type:
 
 ```groovy
 process FASTQC {
     input:
-    sample: Record {
-        id: String
-        fastq_1: Path
-        fastq_2: Path
-    }
+    sample: FastqPair
 
     // ...
 }
+
+record FastqPair {
+    id: String
+    fastq_1: Path
+    fastq_2: Path
+}
 ```
 
 All record fields that are a `Path` or `Path` collection (e.g. `Set<Path>`) are automatically staged. The record itself is declared in the process body as `sample`, like any other input, and record fields are accessed as `sample.id`, `sample.fastq_1`, and so on.
 
-A typed process can also use an explicit record type to achieve the same behavior:
+Alternatively, a typed process can declare a *destructured* record input:
 
 ```groovy
 process FASTQC {
     input:
-    sample: FastqPair
+    record(
+        id: String,
+        fastq_1: Path,
+        fastq_2: Path
+    )
 
     // ...
 }
-
-record FastqPair {
-    id: String
-    fastq_1: Path
-    fastq_2: Path
-}
 ```
 
-The only difference between these two aprooaches is that the `FastqPair` type can be used elsewhere in pipeline code because it is declared externally.
+This approach allows record inputs to be declared without the need for external record types. Each record field is accessed directly as `id`, `fastq_1`, and so on.
 
 ### Process outputs
 
 Typed processes can declare outputs with arbitrary expressions, so no new syntax is required to support record outputs. Simply use the `record()` function to create a record:
 
 ```groovy
 process FASTQC {
-  // ...
+    // ...
 
-  output:
-  record(id: id, fastqc: file('fastqc_logs'))
+    output:
+    record(
+        id: id,
+        fastqc: file('fastqc_logs')
+    )
 
-  // ...
+    // ...
 }
 ```
 
````
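
For orientation, a destructured record input like the one above might be driven from a channel of records roughly as follows. This is a sketch only: the workflow body, channel contents, and file names are invented, and it assumes a record input is supplied as a single process argument, with `record()` doubling as the value constructor described elsewhere in this document.

```groovy
workflow {
    // hypothetical channel of per-sample records
    samples = channel.of(
        record(id: 'sample1', fastq_1: file('s1_R1.fastq.gz'), fastq_2: file('s1_R2.fastq.gz'))
    )

    // each incoming record is matched against the declared input fields;
    // fastq_1 and fastq_2 are staged automatically because they are Path fields
    FASTQC(samples)
}
```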

````diff
@@ -258,6 +267,46 @@ println sample.id // -> '1'
 println sample2.id // -> '2'
 ```
 
+### Inline record input type
+
+A process can declare a destructured record input as shown above:
+
+```groovy
+process FASTQC {
+    input:
+    record(
+        id: String,
+        fastq_1: Path,
+        fastq_2: Path
+    )
+
+    // ...
+}
+```
+
+One alternative is to declare an *inline record type*:
+
+```groovy
+process FASTQC {
+    input:
+    sample: Record {
+        id: String
+        fastq_1: Path
+        fastq_2: Path
+    }
+
+    // ...
+}
+```
+
+This approach was considered because it uses the same syntax as a `record` definition, making it easy to switch between inline and external record types. The block syntax is also slightly better suited for a type definition since it doesn't require commas.
+
+However, this approach creates an asymmetry between record inputs and outputs (`Record { ... }` vs `record(...)`). It also removes the ability to destructure a record input.
+
+Declaring a record input with `record()` can be understood as a reverse constructor, mirroring the `record()` function used to construct a record output in the `output:` section.
+
+While both approaches have pros and cons, the `record()` approach was ultimately chosen for its continuity with the existing tuple syntax and its similarity with the record output syntax.
+
 ### Implicit process record output
 
 A process record output can be defined using the `record()` function as shown above:
````
````diff
@@ -348,16 +397,16 @@ process PROKKA {
     // ...
 
     input:
-    sample: Record {
-        meta: Map
+    record(
+        meta: Map,
         fasta: Path
-    }
+    )
     proteins: Path
     prodigal_tf: Path
 
     output:
     record(
-        meta: sample.meta,
+        meta: meta,
         gff: file("${prefix}/*.gff"),
         gbk: file("${prefix}/*.gbk"),
         fna: file("${prefix}/*.fna"),
@@ -376,7 +425,7 @@ process PROKKA {
     file("versions.yml") >> 'versions'
 
     script:
-    prefix = sample.meta.id
+    prefix = meta.id
     // ...
 }
 ```
````
````diff
@@ -396,23 +445,23 @@ These processes would be defined as follows:
 process FOO {
 
     input:
-    sample: Record {
-        meta: Map
+    record(
+        meta: Map,
         gff: Path
-    }
+    )
 
     // ...
 }
 
 process BAR {
 
     input:
-    sample: Record {
-        meta: Map
-        fna: Path
-        faa: Path
+    record(
+        meta: Map,
+        fna: Path,
+        faa: Path,
         tbl: Path
-    }
+    )
 
     // ...
 }
````
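
To make the subset behavior concrete, here is a sketch of how `FOO` and `BAR` might consume the same record stream. The workflow, channel, and values are hypothetical; it relies on the rule stated above that record fields not declared by the process input are ignored.

```groovy
workflow {
    // hypothetical channel whose records carry more fields than either process declares
    assemblies = channel.of(
        record(
            meta: [id: 'sample1'],
            gff: file('sample1.gff'),
            fna: file('sample1.fna'),
            faa: file('sample1.faa'),
            tbl: file('sample1.tbl')
        )
    )

    FOO(assemblies) // stages only meta and gff; other fields are ignored
    BAR(assemblies) // stages only meta, fna, faa, and tbl
}
```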

docs/process-typed.md

Lines changed: 31 additions & 32 deletions
````diff
@@ -47,7 +47,8 @@ The `input:` section declares process inputs. In typed processes, each input dec
 ```nextflow
 process fastqc {
     input:
-    (meta, fastq): Tuple<Map,Path>
+    meta: Map
+    fastq: Path
     extra_args: String
 
     script:
````
````diff
@@ -89,62 +90,62 @@ process cat_opt {
 
 ### Record inputs
 
-Inputs with type `Record` can declare the name and type of each record field:
+Record inputs can be declared using a record type:
 
 ```nextflow
 process fastqc {
     input:
-    sample: Record {
-        id: String
-        fastq: Path
-    }
+    sample: Sample
 
     script:
     """
     echo 'id: ${sample.id}'
     echo 'fastq: ${sample.fastq}'
     """
 }
-```
 
-In this example, the record is staged into the task as `sample`, and `sample.fastq` is staged as an input file since the `fastq` field is declared with type `Path`.
+record Sample {
+    id: String
+    fastq: Path
+}
+```
 
-When the process is invoked, the incoming record should contain the specified fields, or else the run will fail. If the record has additional fields not declared by the process input, they are ignored.
+In this example, the record input is staged as `sample`, and `sample.fastq` is staged as an input file since it is declared with type `Path` in the `Sample` record type. Each field in the record type is staged into the task the same way as an individual input.
 
-:::{tip}
-Record inputs are a useful way to select a subset of fields from a larger record. This way, the process only stages what it needs, allowing you to keep related data together in your workflow logic.
-:::
+When the process is invoked, the incoming record should contain the specified fields, or else the run will fail. If the incoming record has additional fields not declared by the process input, they are ignored.
 
-You can achieve the same behavior using an external record type:
+Record inputs can also be declared as a *destructured* input:
 
 ```nextflow
 process fastqc {
     input:
-    sample: Sample
+    record(
+        id: String,
+        fastq: Path
+    )
 
     script:
     """
-    echo 'id: ${sample.id}'
-    echo 'fastq: ${sample.fastq}'
+    echo 'id: ${id}'
+    echo 'fastq: ${fastq}'
     """
 }
-
-record Sample {
-    id: String
-    fastq: Path
-}
 ```
 
-This approach is useful when the record type can be re-used elsewhere in the pipeline.
+This pattern mirrors the standard `record()` function used to construct records. In this example, `fastq` is staged as an input file since the `fastq` field is declared with type `Path`.
+
+:::{tip}
+Record inputs are a useful way to select a subset of fields from a larger record. This way, the process stages only what it needs, keeping related data together in your workflow logic.
+:::
 
 ### Tuple inputs
 
-Inputs with type `Tuple` can declare the name of each tuple component:
+Tuple inputs can be declared as a *destructured* input:
 
 ```nextflow
 process fastqc {
     input:
-    (id, fastq): Tuple<String,Path>
+    tuple(id: String, fastq: Path)
 
     script:
     """
````
````diff
@@ -154,9 +155,7 @@ process fastqc {
 }
 ```
 
-This pattern is called *tuple destructuring*. Each tuple component is staged into the task the same way as an individual input.
-
-The generic types inside the `Tuple<...>` annotation specify the type of each tuple compomnent and should match the component names. In the above example, `id` has type `String` and `fastq` has type `Path`.
+This pattern mirrors the standard `tuple()` function used to construct tuples. Each tuple component is staged into the task the same way as an individual input.
 
 ## Stage directives
 
````
````diff
@@ -314,14 +313,14 @@ The `record()` standard library function can be used to create a record:
 ```nextflow
 process fastqc {
     input:
-    sample: Record {
-        id: String
+    record(
+        id: String,
         fastq: Path
-    }
+    )
 
     output:
     record(
-        id: sample.id,
+        id: id,
         fastqc: file('fastqc_logs')
     )
 
````
````diff
@@ -335,7 +334,7 @@ The `tuple()` standard library function can be used to create a tuple:
 ```nextflow
 process fastqc {
     input:
-    (id, fastq): Tuple<String,Path>
+    tuple(id: String, fastq: Path)
 
     output:
     tuple(id, file('fastqc_logs'))
````
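
For completeness, a sketch of how this tuple-based `fastqc` might be driven end to end. The channel and file name are invented, and it assumes one call argument per declared input.

```nextflow
workflow {
    // hypothetical channel of (id, fastq) pairs built with the standard tuple() function
    inputs = channel.of(
        tuple('sample1', file('sample1.fastq.gz'))
    )

    // id and fastq are destructured from each incoming tuple, and the
    // process emits tuple(id, fastqc_logs) as declared in its output: section
    fastqc(inputs)
}
```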
