Commit d90e7fe

Record types
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
1 parent 6711a57 commit d90e7fe

49 files changed

Lines changed: 1721 additions & 165 deletions

docs/process-typed.md

Lines changed: 117 additions & 1 deletion
@@ -65,6 +65,8 @@ All {ref}`standard types <stdlib-types>` except for the dataflow types (`Channel
6565
6666
Nextflow automatically stages `Path` inputs and `Path` collections (such as `Set<Path>`) into the task directory.
6767
68+
### Nullable inputs
69+
6870
By default, tasks fail if any input receives a `null` value. To allow `null` values, add `?` to the type annotation:
6971
7072
```nextflow
@@ -85,10 +87,83 @@ process cat_opt {
8587
}
8688
```
8789
88-
### Stage directives
90+
### Record inputs
91+
92+
Inputs with type `Record` can declare the name and type of each record field:
93+
94+
```nextflow
95+
process fastqc {
96+
input:
97+
sample: Record {
98+
id: String
99+
fastq: Path
100+
}
101+
102+
script:
103+
"""
104+
echo 'id: ${sample.id}'
105+
echo 'fastq: ${sample.fastq}'
106+
"""
107+
}
108+
```
109+
110+
In this example, the record is staged into the task as `sample`, and `sample.fastq` is staged as an input file since the `fastq` field is declared with type `Path`.
111+
112+
When the process is invoked, the incoming record must contain every declared field; otherwise, the run fails. Any additional fields that the process input does not declare are ignored.
113+
114+
:::{tip}
115+
Record inputs are a useful way to select a subset of fields from a larger record. This way, the process only stages what it needs, allowing you to keep related data together in your workflow logic.
116+
:::
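As a minimal sketch of this behavior, the workflow below passes a record with one extra field to the `fastqc` process defined above. The field name `strandedness`, the sample values, and the file path are hypothetical placeholders, and the sketch assumes that a channel of records can be passed directly to a typed process:

```nextflow
workflow {
    // Hypothetical sample record: `strandedness` is not declared
    // by the fastqc input, so the process should ignore it.
    samples = channel.of(
        record(id: 'a', fastq: file('a.fastq'), strandedness: 'forward')
    )

    // Only `id` and `fastq` are declared by the process input;
    // `fastq` is staged as a file because it has type Path.
    fastqc(samples)
}
```

Under these assumptions, removing `fastq` from the record would cause the run to fail, since the process declares that field.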
117+
118+
You can achieve the same behavior using an external record type:
119+
120+
```nextflow
121+
process fastqc {
122+
input:
123+
sample: Sample
124+
125+
script:
126+
"""
127+
echo 'id: ${sample.id}'
128+
echo 'fastq: ${sample.fastq}'
129+
"""
130+
}
131+
132+
record Sample {
133+
id: String
134+
fastq: Path
135+
}
136+
```
137+
138+
This approach is useful when the record type can be re-used elsewhere in the pipeline.
139+
140+
### Tuple inputs
141+
142+
Inputs with type `Tuple` can declare the name of each tuple component:
143+
144+
```nextflow
145+
process fastqc {
146+
input:
147+
(id, fastq): Tuple<String,Path>
148+
149+
script:
150+
"""
151+
echo 'id: ${id}'
152+
echo 'fastq: ${fastq}'
153+
"""
154+
}
155+
```
156+
157+
This pattern is called *tuple destructuring*. Each tuple component is staged into the task the same way as an individual input.
158+
159+
The generic types inside the `Tuple<...>` annotation specify the type of each tuple component and should match the component names in order. In the above example, `id` has type `String` and `fastq` has type `Path`.
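A caller might supply matching tuples as in the sketch below. The sample values and file path are hypothetical placeholders, and the sketch assumes that a channel of tuples can be passed directly to the `fastqc` process defined above:

```nextflow
workflow {
    // Each tuple lists its components in the order declared by the
    // process input: first a String (id), then a Path (fastq).
    inputs = channel.of(
        tuple('a', file('a.fastq'))
    )

    fastqc(inputs)
}
```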
160+
161+
## Stage directives
89162
90163
The `stage:` section defines custom staging behavior using *stage directives*. It should be specified after the `input:` section. These directives serve the same purpose as input qualifiers such as `env` and `stdin` in the legacy syntax.
91164
165+
### Environment variables
166+
92167
The `env` directive declares an environment variable in terms of task inputs:
93168
94169
```nextflow
@@ -106,6 +181,8 @@ process echo_env {
106181
}
107182
```
108183
184+
### Standard input (stdin)
185+
109186
The `stdin` directive defines the standard input of the task script:
110187
111188
```nextflow
@@ -123,6 +200,8 @@ process cat {
123200
}
124201
```
125202
203+
### Custom file staging
204+
126205
The `stageAs` directive stages an input file (or files) under a custom file pattern:
127206
128207
```nextflow
@@ -222,6 +301,43 @@ process foo {
222301
}
223302
```
224303
304+
### Structured outputs
305+
306+
Whereas legacy process outputs could only be structured using specific qualifiers like `val` and `tuple`, typed process outputs are regular values.
307+
308+
The `record()` standard library function can be used to create a record:
309+
310+
```nextflow
311+
process fastqc {
312+
input:
313+
sample: Record {
314+
id: String
315+
fastq: Path
316+
}
317+
318+
output:
319+
record(id: sample.id, fastqc: file('fastqc_logs'))
320+
321+
script:
322+
// ...
323+
}
324+
```
325+
326+
The `tuple()` standard library function can be used to create a tuple:
327+
328+
```nextflow
329+
process fastqc {
330+
input:
331+
(id, fastq): Tuple<String,Path>
332+
333+
output:
334+
tuple(id, file('fastqc_logs'))
335+
336+
script:
337+
// ...
338+
}
339+
```
340+
225341
## Topics
226342
227343
The `topic:` section emits values to {ref}`topic channels <channel-topic>`. A topic emission consists of an output value and a topic name:

docs/reference/stdlib-namespaces.md

Lines changed: 5 additions & 2 deletions
@@ -43,7 +43,7 @@ The global namespace contains globally available constants and functions.
4343
: Create a branch criteria to use with the {ref}`operator-branch` operator.
4444

4545
`env( name: String ) -> String`
46-
: :::{versionadded} 24.11.0-edge
46+
: :::{versionadded} 25.04.0
4747
:::
4848
: Get the value of the environment variable with the specified name in the Nextflow launch environment.
4949

@@ -108,8 +108,11 @@ The global namespace contains globally available constants and functions.
108108
`sleep( milliseconds: long )`
109109
: Sleep for the given number of milliseconds.
110110

111+
`record( [options] ) -> Record`
112+
: Create a record from the given named arguments.
113+
111114
`tuple( args... ) -> Tuple`
112-
: Create a tuple object from the given arguments.
115+
: Create a tuple from the given arguments.
113116

114117
(stdlib-namespaces-channel)=
115118

docs/reference/stdlib-types.md

Lines changed: 32 additions & 0 deletions
@@ -768,6 +768,38 @@ The following methods are available for splitting and counting the records in fi
768768
`splitText() -> List<String>`
769769
: Splits a text file into a list of lines. See the {ref}`operator-splittext` operator for available options.
770770

771+
(stdlib-types-record)=
772+
773+
## Record
774+
775+
A record is an immutable map of fields to values (i.e., `Map<String,?>`). Each value can have its own type.
776+
777+
A record can be created using the `record` function:
778+
779+
```nextflow
780+
sample = record(id: '1', fastq_1: file('1_1.fastq'), fastq_2: file('1_2.fastq'))
781+
```
782+
783+
Record fields can be accessed as properties:
784+
785+
```nextflow
786+
sample.id
787+
// -> '1'
788+
```
789+
790+
The following operations are supported for records:
791+
792+
`+ : (Record, Record) -> Record`
793+
: Given two records, returns a new record containing the fields and values of both records. When a field is present in both records, the value of the right-hand record takes precedence.
794+
795+
`- : (Record, Iterable<String>) -> Record`
796+
: Given a record and a collection of strings, returns a copy of the record with the given fields removed.
797+
798+
The following methods are available for a record:
799+
800+
`subMap( keys: Iterable<String> ) -> Record`
801+
: Returns a new record containing only the given fields.
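The following sketch illustrates the operations and method listed above, based only on their stated signatures. The field names and values are hypothetical, and the comments restate the documented behavior rather than verified output:

```nextflow
sample = record(id: '1', fastq_1: file('1_1.fastq'), fastq_2: file('1_2.fastq'))

// `+` merges two records; when a field is present in both,
// the right-hand value takes precedence (here, `id`).
updated = sample + record(id: '2', single_end: false)

// `-` returns a copy without the named fields.
unpaired = sample - ['fastq_2']

// `subMap` returns a new record with only the named fields.
id_only = sample.subMap(['id'])
```

Because records are immutable, each expression produces a new record and leaves `sample` unchanged.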
802+
771803
(stdlib-types-set)=
772804

773805
## Set\<E\>

docs/reference/syntax.md

Lines changed: 14 additions & 3 deletions
@@ -34,6 +34,7 @@ A Nextflow script may contain the following top-level declarations:
3434
- Process definitions
3535
- Function definitions
3636
- Enum types
37+
- Record types
3738
- Output block
3839

3940
Script declarations are in turn composed of statements and expressions.
@@ -107,6 +108,8 @@ The following definitions can be included:
107108
- Functions
108109
- Processes
109110
- Named workflows
111+
- *New in 26.04:* Enum types
112+
- *New in 26.04:* Record types
110113

111114
### Params block
112115

@@ -360,9 +363,17 @@ enum Day {
360363

361364
Enum values in the above example can be accessed as `Day.MONDAY`, `Day.TUESDAY`, and so on.
362365

363-
:::{note}
364-
Enum types cannot be included across modules at this time.
365-
:::
366+
### Record type
367+
368+
A record type declaration consists of a name and a body. The body consists of one or more fields, where each field has a name and a type:
369+
370+
```nextflow
371+
record FastqPair {
372+
id: String
373+
fastq_1: Path
374+
fastq_2: Path
375+
}
376+
```
366377

367378
### Output block
368379

docs/script.md

Lines changed: 30 additions & 0 deletions
@@ -111,6 +111,36 @@ Copying a map with the `+` operator is a safer way to modify maps in Nextflow, s
111111

112112
See {ref}`stdlib-types-map` for the set of available map operations.
113113

114+
(script-records)=
115+
116+
## Records
117+
118+
Records are used to store a set of related fields, where each field can have its own type. They are created using the `record` function:
119+
120+
```nextflow
121+
person = record(name: 'Alice', age: 42, is_male: false)
122+
```
123+
124+
Record fields are accessed by name:
125+
126+
```nextflow
127+
name = person.name
128+
age = person.age
129+
is_male = person.is_male
130+
```
131+
132+
Records are immutable -- once a record is created, it cannot be modified. Use record operations to create new records instead.
133+
134+
For example:
135+
136+
```nextflow
137+
person + record(age: 43) - ['is_male']
138+
139+
// record(name: 'Alice', age: 43)
140+
```
141+
142+
See {ref}`stdlib-types-record` for the set of available record operations.
143+
114144
(script-tuples)=
115145

116146
## Tuples

docs/strict-syntax.md

Lines changed: 24 additions & 8 deletions
@@ -50,14 +50,29 @@ def json = new groovy.json.JsonSlurper().parseText(json_file.text)
5050

5151
Some users use classes in Nextflow to define helper functions or custom types. Helper functions should be defined as standalone functions in Nextflow. Custom types should be moved to the `lib` directory.
5252

53-
:::{note}
54-
Enums, a special type of class, are supported, but they cannot be included across modules at this time.
55-
:::
53+
You can use an enum type to model a choice between a fixed set of categories:
5654

57-
:::{note}
58-
Record types will be addressed in a future version of the Nextflow language specification.
55+
```nextflow
56+
enum Color {
57+
RED,
58+
GREEN,
59+
BLUE
60+
}
61+
```
62+
63+
:::{versionadded} 26.04.0
5964
:::
6065

66+
You can use a record type to model a composition of multiple values:
67+
68+
```nextflow
69+
record FastqPair {
70+
id: String
71+
fastq_1: Path
72+
fastq_2: Path
73+
}
74+
```
75+
6176
### Mixing script declarations and statements
6277

6378
In the strict syntax, a script may contain any of the following top-level declarations:
@@ -230,13 +245,14 @@ In the strict syntax, use `System.getenv()` instead:
230245
println "PWD = ${System.getenv('PWD')}"
231246
```
232247

233-
:::{versionadded} 24.04.0
234-
The `env()` function should be used instead of `System.getenv()`:
248+
:::{versionadded} 25.04.0
249+
:::
250+
251+
Use the `env()` function instead of `System.getenv()`:
235252

236253
```nextflow
237254
println "PWD = ${env('PWD')}"
238255
```
239-
:::
240256

241257
## Restricted syntax
242258
