404 changes: 404 additions & 0 deletions adr/20251017-typed-processes.md

Large diffs are not rendered by default.

113 changes: 81 additions & 32 deletions adr/20260306-record-types.md
@@ -6,6 +6,12 @@
- Date: 2026-03-06
- Tags: lang, static-types

## Updates

### Version 1.1 (2026-03-23)

- Replaced inline record type syntax (`Record { ... }`) with destructuring syntax (`record(...)`) for better continuity with legacy syntax and record output syntax.

## Summary

Provide a way to model composite data types in the Nextflow language.
@@ -155,54 +161,57 @@ When a record is supplied as input to a process, the process needs to know how t

Typed processes can stage inputs using the `stage:` section, but ideally the files in a record should be automatically detected and staged.

A typed process can declare a record using an *inline record type*:
A typed process can declare a record input using a record type:

```groovy
process FASTQC {
input:
sample: Record {
id: String
fastq_1: Path
fastq_2: Path
}
sample: FastqPair

// ...
}

record FastqPair {
id: String
fastq_1: Path
fastq_2: Path
}
```

All record fields typed as `Path` or a collection of `Path` (e.g. `Set<Path>`) are automatically staged. The record itself is available in the process body as `sample`, like any other input, and its fields are accessed as `sample.id`, `sample.fastq_1`, and so on.
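For illustration, a workflow might feed `FastqPair` records into this process. The following is a sketch under the assumption that records created with the standard `record()` function are accepted wherever a matching record type is declared (the sample names and file paths are hypothetical):

```groovy
workflow {
    // Each element is a record whose fields match the FastqPair
    // type declared by the FASTQC process input.
    ch_samples = Channel.of(
        record(id: 'sample1', fastq_1: file('s1_R1.fastq.gz'), fastq_2: file('s1_R2.fastq.gz')),
        record(id: 'sample2', fastq_1: file('s2_R1.fastq.gz'), fastq_2: file('s2_R2.fastq.gz'))
    )

    // fastq_1 and fastq_2 are staged automatically because they are Paths
    FASTQC(ch_samples)
}
```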

A typed process can also use an explicit record type to achieve the same behavior:
Alternatively, a typed process can declare a *destructured* record input:

```groovy
process FASTQC {
input:
sample: FastqPair
record(
id: String,
fastq_1: Path,
fastq_2: Path
)

// ...
}

record FastqPair {
id: String
fastq_1: Path
fastq_2: Path
}
```

The only difference between these two approaches is that the `FastqPair` type can be used elsewhere in the pipeline because it is declared externally.
This approach allows record inputs to be declared without the need for external record types. Each record field is accessed directly as `id`, `fastq_1`, and so on.

### Process outputs

Typed processes can declare outputs with arbitrary expressions, so no new syntax is required to support record outputs. Simply use the `record()` function to create a record:

```groovy
process FASTQC {
// ...
// ...

output:
record(id: id, fastqc: file('fastqc_logs'))
output:
record(
id: id,
fastqc: file('fastqc_logs')
)

// ...
// ...
}
```

@@ -258,6 +267,46 @@ println sample.id // -> '1'
println sample2.id // -> '2'
```

### Inline record input type

A process can declare a destructured record input as shown above:

```groovy
process FASTQC {
input:
record(
id: String,
fastq_1: Path,
fastq_2: Path
)

// ...
}
```

One alternative is to declare an *inline record type*:

```groovy
process FASTQC {
input:
sample: Record {
id: String
fastq_1: Path
fastq_2: Path
}

// ...
}
```

This approach was considered because it uses the same syntax as a `record` definition, making it easy to switch between inline and external record types. The block syntax is also slightly better suited for a type definition since it doesn't require commas.

However, this approach creates an asymmetry between record inputs and outputs (`Record { ... }` vs `record(...)`). It also removes the ability to destructure a record input.

Declaring a record input with `record()` can be understood as a reverse constructor, mirroring the `record()` function used to construct a record output in the `output:` section.

While both approaches have pros and cons, the `record()` approach was ultimately chosen for its continuity with the existing tuple syntax and its similarity to the record output syntax.

### Implicit process record output

A process record output can be defined using the `record()` function as shown above:
@@ -348,16 +397,16 @@ process PROKKA {
// ...

input:
sample: Record {
meta: Map
record(
meta: Map,
fasta: Path
}
)
proteins: Path
prodigal_tf: Path

output:
record(
meta: sample.meta,
meta: meta,
gff: file("${prefix}/*.gff"),
gbk: file("${prefix}/*.gbk"),
fna: file("${prefix}/*.fna"),
@@ -376,7 +425,7 @@ process PROKKA {
file("versions.yml") >> 'versions'

script:
prefix = sample.meta.id
prefix = meta.id
// ...
}
```
@@ -396,23 +445,23 @@ These processes would be defined as follows:
process FOO {

input:
sample: Record {
meta: Map
record(
meta: Map,
gff: Path
}
)

// ...
}

process BAR {

input:
sample: Record {
meta: Map
fna: Path
faa: Path
record(
meta: Map,
fna: Path,
faa: Path,
tbl: Path
}
)

// ...
}
63 changes: 31 additions & 32 deletions docs/process-typed.md
@@ -47,7 +47,8 @@ The `input:` section declares process inputs. In typed processes, each input dec
```nextflow
process fastqc {
input:
(meta, fastq): Tuple<Map,Path>
meta: Map
fastq: Path
extra_args: String

script:
@@ -89,62 +90,62 @@ process cat_opt {

### Record inputs

Inputs with type `Record` can declare the name and type of each record field:
Record inputs can be declared using a record type:

```nextflow
process fastqc {
input:
sample: Record {
id: String
fastq: Path
}
sample: Sample

script:
"""
echo 'id: ${sample.id}'
echo 'fastq: ${sample.fastq}'
"""
}
```

In this example, the record is staged into the task as `sample`, and `sample.fastq` is staged as an input file since the `fastq` field is declared with type `Path`.
record Sample {
id: String
fastq: Path
}
```

When the process is invoked, the incoming record should contain the specified fields, or else the run will fail. If the record has additional fields not declared by the process input, they are ignored.
In this example, the record input is staged as `sample`, and `sample.fastq` is staged as an input file since it is declared with type `Path` in the `Sample` record type. Each field in the record type is staged into the task the same way as an individual input.

:::{tip}
Record inputs are a useful way to select a subset of fields from a larger record. This way, the process only stages what it needs, allowing you to keep related data together in your workflow logic.
:::
When the process is invoked, the incoming record should contain the specified fields, or else the run will fail. If the incoming record has additional fields not declared by the process input, they are ignored.

You can achieve the same behavior using an external record type:
Record inputs can also be declared as a *destructured* input:

```nextflow
process fastqc {
input:
sample: Sample
record(
id: String,
fastq: Path
)

script:
"""
echo 'id: ${sample.id}'
echo 'fastq: ${sample.fastq}'
echo 'id: ${id}'
echo 'fastq: ${fastq}'
"""
}

record Sample {
id: String
fastq: Path
}
```

This approach is useful when the record type can be re-used elsewhere in the pipeline.
This pattern mirrors the standard `record()` function used to construct records. In this example, `fastq` is staged as an input file since the `fastq` field is declared with type `Path`.

:::{tip}
Record inputs are a useful way to select a subset of fields from a larger record. This way, the process stages only what it needs, keeping related data together in your workflow logic.
:::
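As a sketch of that behavior, a record carrying more fields than the process declares can be passed directly; undeclared fields are simply not staged. The `strandedness` field below is a hypothetical extra field used only for illustration:

```nextflow
workflow {
    // These records carry a strandedness field that fastqc does not declare
    ch_samples = Channel.of(
        record(id: 'sample1', fastq: file('s1.fastq.gz'), strandedness: 'reverse'),
        record(id: 'sample2', fastq: file('s2.fastq.gz'), strandedness: 'forward')
    )

    // Only id and fastq are staged into each task; strandedness is ignored
    fastqc(ch_samples)
}
```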

### Tuple inputs

Inputs with type `Tuple` can declare the name of each tuple component:
Tuple inputs can be declared as a *destructured* input:

```nextflow
process fastqc {
input:
(id, fastq): Tuple<String,Path>
tuple(id: String, fastq: Path)

script:
"""
@@ -154,9 +155,7 @@ process fastqc {
}
```

This pattern is called *tuple destructuring*. Each tuple component is staged into the task the same way as an individual input.

The generic types inside the `Tuple<...>` annotation specify the type of each tuple component, in the same order as the component names. In the above example, `id` has type `String` and `fastq` has type `Path`.
This pattern mirrors the standard `tuple()` function used to construct tuples. Each tuple component is staged into the task the same way as an individual input.
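For example, assuming a channel of tuples built with the standard `tuple()` function (the sample names and file paths are hypothetical), the process above could be invoked as:

```nextflow
workflow {
    // Each tuple matches the (id, fastq) components declared by the process
    ch_samples = Channel.of(
        tuple('sample1', file('s1.fastq.gz')),
        tuple('sample2', file('s2.fastq.gz'))
    )

    // Each tuple is destructured into id and fastq when the task is staged
    fastqc(ch_samples)
}
```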

## Stage directives

@@ -314,14 +313,14 @@ The `record()` standard library function can be used to create a record:
```nextflow
process fastqc {
input:
sample: Record {
id: String
record(
id: String,
fastq: Path
}
)

output:
record(
id: sample.id,
id: id,
fastqc: file('fastqc_logs')
)

@@ -335,7 +334,7 @@ The `tuple()` standard library function can be used to create a tuple:
```nextflow
process fastqc {
input:
(id, fastq): Tuple<String,Path>
tuple(id: String, fastq: Path)

output:
tuple(id, file('fastqc_logs'))