diff --git a/adr/20251017-typed-processes.md b/adr/20251017-typed-processes.md new file mode 100644 index 0000000000..24fe99ec91 --- /dev/null +++ b/adr/20251017-typed-processes.md @@ -0,0 +1,404 @@ +# Typed processes + +- Authors: Ben Sherman +- Status: accepted +- Deciders: Ben Sherman, Paolo Di Tommaso +- Date: 2025-10-17 +- Tags: lang, static-types, processes + +## Updates + +### Version 1.1 (2026-03-23) + +- Changed the method signature for `stageAs` from `(filePattern, value)` to `(value, filePattern)` to mirror commands like `cp`, `mv`, etc. + +- Replaced annotation-based tuple syntax (`(...): Tuple<...>`) with destructuring syntax (`tuple(...)`) for better continuity with legacy syntax and record input syntax. + +## Summary + +Support static typing for process inputs and outputs. + +## Problem Statement + +The legacy process syntax uses *qualifiers* to describe both the type and staging behavior of each input and output: + +```groovy +process FASTQC { + input: + tuple val(id), path(fastq_1), path(fastq_2) + path index + + output: + tuple val(id), path("fastqc_${id}_logs") + + script: + // ... +} +``` + +This syntax has several limitations: + +- **No static typing**: The `val` qualifier cannot specify a type, so there is no way to validate input values. The `path` qualifier cannot distinguish between a file and a file collection. The `arity` option was introduced to address this ambiguity, but it is cumbersome and rarely used. + +- **Type and staging behavior are coupled**: Qualifiers like `path` describe both the type *and* the staging behavior (link into task directory). There is no way to specify staging behavior separately, such as staging a tuple element or record field as an environment variable. + +- **No nullability**: There is no way to declare that an input may be `null`. The `path` qualifier raises a runtime error if a null value is received.
Outputs can be marked optional, but optional outputs are handled by emitting nothing rather than emitting `null`. A tuple output can be optional, but a tuple element cannot. + +- **Limited output expressiveness**: Outputs must be expressed in terms of qualifiers that mirror the input qualifiers. It is difficult to express many kinds of output values, and it is unclear to the user whether a given expression is valid or not. + +## Goals + +- Provide a way to model process inputs and outputs with types from the Nextflow standard library. + +- Separate the *type* of an input from its *staging behavior*. + +- Provide first-class support for nullable inputs and outputs. + +- Allow outputs to be arbitrary expressions, ensuring consistency with the rest of the language. + +- Enable compile-time type checking for processes. + +## Non-goals + +- Removing the legacy qualifier syntax -- legacy processes must continue to work without modification. + +- Enforcing type checking -- static type checking will be introduced progressively as an opt-in feature. + +## Decision + +Introduce **typed processes**, which use a new syntax for inputs and outputs based on type annotations instead of qualifiers. + +All other process sections (directives, script, stub, etc.) are supported by typed processes without changes. Only the `input:` and `output:` sections are changed. + +## Core Capabilities + +### Typed inputs + +Each input is declared as `name: Type`: + +```groovy +process FASTQC { + input: + id: String + fastq_1: Path + fastq_2: Path + + script: + """ + fastqc -o fastqc_${id}_logs ${fastq_1} ${fastq_2} + """ +} +``` + +All standard library types except `Channel` and `Value` are valid input types. Inputs of type `Path` (or `Path` collections such as `Set`) are automatically staged into the task directory. + +### Nullable inputs + +Appending `?` to a type annotation allows the input to be `null`: + +```groovy +process CAT_OPT { + input: + input: Path?
+ + stage: + stageAs input, 'input.txt' + + output: + stdout() + + script: + ''' + [[ -f input.txt ]] && cat input.txt || echo 'empty input' + ''' +} +``` + +By default, a task fails if any input receives `null`. + +### Stage directives + +Staging behavior is moved to a dedicated `stage:` section that appears after `input:`. This replaces the staging aspects of legacy qualifiers: + +| Legacy qualifier | Stage directive | +|-------------------|--------------------| +| `env('NAME')` | `env 'NAME', value` | +| `stdin` | `stdin value` | +| `path('name.fa')` | `stageAs file, 'name.fa'` | + +For example: + +```groovy +process BLAST { + input: + fasta: Path + + stage: + stageAs fasta, 'query.fa' + + script: + """ + blastp -query query.fa -db nr + """ +} +``` + +Separating staging from type declaration keeps the inputs clean and makes it easier to specify staging behavior independently of the input type. + +### Tuple inputs + +Tuples are declared inline using `tuple(name: Type, ...)`: + +```groovy +process FASTQC { + input: + tuple(id: String, fastq_1: Path, fastq_2: Path) + + script: + // ... +} +``` + +Each component is destructured into a local variable. This mirrors the `tuple()` constructor used in the output section and in workflow logic, making the syntax consistent. + +### Typed outputs + +Each output declaration consists of an optional name and type, and a value expression: + +```groovy +process ECHO { + input: + message: String + + output: + out_file: Path = file('message.txt') + out_std: String = stdout() + + script: + """ + echo '${message}' | tee message.txt + """ +} +``` + +When there is only one output, the name and type can be omitted: + +```groovy +process ECHO { + input: + message: String + + output: + file('message.txt') + + script: + """ + echo '${message}' > message.txt + """ +} +``` + +Outputs can be arbitrary expressions, rather than being restricted to specific qualifiers such as `tuple` and `val`.
Special functions such as `file()`, `files()`, `env()`, and `stdout()` can be composed into the desired output structure. + +### Nullable outputs + +By default, the `file()` and `files()` functions raise an error if the given file is missing. These functions can be called with `optional: true` to allow missing files. This way, it is possible to declare a tuple output that contains nullable values: + +```groovy +process MAYBE { + input: + id: String + + output: + tuple(id, file('result.txt', optional: true)) + + script: + """ + [[ '$id' == 42 ]] && touch result.txt + """ +} +``` + +### Topic emissions + +A `topic:` section emits values to topic channels using the `>>` operator: + +```groovy +process CAT { + input: + message: Path + + output: + stdout() + + topic: + tuple('cat', eval('cat --version')) >> 'versions' + + script: + """ + cat ${message} + """ +} +``` + +Moving topic emissions to a dedicated section allows them to be defined without having to include them in the process outputs. + +## Distinguishing between typed and legacy processes + +Typed processes are gated behind the `nextflow.preview.types` feature flag. This flag will be replaced by `nextflow.enable.types` when the feature becomes stable, which will be used to distinguish between typed and legacy processes in the language. + +When a script enables this feature flag, its processes are treated as typed processes; otherwise, its processes are treated as legacy processes. This way, typed and legacy processes cannot be mixed in the same script, but they can be used together as long as they are declared in different scripts. + +While typed and legacy processes are syntactically distinct and could theoretically be allowed in the same script, the feature flag helps distinguish typed vs legacy to the reader (whether human or agent).
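+As a minimal sketch of how a script opts in (assuming, as with other Nextflow preview flags, that the flag is assigned at the top of the script file): + +```groovy +// Hypothetical script: enabling the flag makes every process +// in this file a typed process. +nextflow.preview.types = true + +process HELLO { + input: + name: String + + output: + stdout() + + script: + """ + echo Hello, ${name}! + """ +} +```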
+ +## Alternatives + +### Implicit tuple input + +The syntax for typed process inputs aims to be consistent with typed inputs throughout the rest of the language, such as the `params` block and workflow inputs, which use the pattern of `name: Type`. The `tuple` input qualifier does not fit neatly into this pattern, since it specifies multiple tuple *components*: + +```groovy +process QUANT { + input: + tuple(id: String, fastq_1: Path, fastq_2: Path) + index: Path + + // ... +} + +workflow { + ch_samples = channel.of( tuple('1', file('1_1.fq'), file('1_2.fq')) ) + index = file('index.fa') + QUANT(ch_samples, index) +} +``` + +One alternative is to remove tuple inputs altogether and treat the entire `input:` section as an implicit tuple input: + +```groovy +process QUANT { + input: + id: String + fastq_1: Path + fastq_2: Path + index: Path + + // ... +} + +workflow { + ch_samples = channel.of( tuple('1', file('1_1.fq'), file('1_2.fq')) ) + index = file('index.fa') + QUANT( ch_samples.combine(index) ) +} +``` + +With this approach, a process would always be called with a single input, and multiple sources (e.g. `ch_samples` and `index`) would need to be combined into a single input. This could be done explicitly with the `combine` operator or implicitly by the runtime. + +However, this approach would be a significant change to process call semantics, even if only applied to typed processes. It would likely be difficult to validate for processes with many inputs. + +The tuple destructuring syntax makes it possible to migrate legacy processes to typed processes without changing workflow logic or call semantics. While the `tuple(...)` syntax is a deviation from the typed input syntax used by the rest of the language, such deviations can be appropriate and even advantageous when used judiciously in a custom language.
+ +### Type annotation syntax for tuple inputs + +Another alternative for tuple inputs is to use a type annotation: + +```groovy +process FASTQC { + input: + (id, fastq_1, fastq_2): Tuple<String, Path, Path> + index: Path + + // ... +} +``` + +This approach attempts to bring the syntax closer to the `name: Type` pattern while maintaining support for tuple destructuring. This syntax was used in the first preview of typed processes in Nextflow 25.10. + +However, this syntax needlessly separates the component name from its corresponding type, making it harder to read and validate. Although it is semantically equivalent to the legacy syntax, it looks and feels very different, which can be jarring for users. + +With the introduction of records, the `tuple(...)` destructuring syntax emerged as a clear pattern to follow for both records and tuples: + +**Legacy process:** +```groovy +process FASTQC { + input: + tuple val(id), path(fastq_1), path(fastq_2) + + output: + tuple val(id), path("fastqc_${id}_logs") + + // ... +} +``` + +**Typed process (tuple):** +```groovy +process FASTQC { + input: + tuple(id: String, fastq_1: Path, fastq_2: Path) + + output: + tuple(id, file("fastqc_${id}_logs")) + + // ... +} +``` + +**Typed process (record):** +```groovy +process FASTQC { + input: + record( + id: String, + fastq_1: Path, + fastq_2: Path + ) + + output: + record( + id: id, + fastqc: file("fastqc_${id}_logs") + ) + + // ... +} +``` + +This pattern provides the best balance of continuity with the old way and consistency with static typing: + +- A legacy process can be migrated to a typed process by replacing the `tuple` input/output qualifier with the `tuple` destructor/constructor. +- A typed process can be migrated from tuples to records by replacing `tuple` with `record` and adding fields to the record output. +- The `tuple` and `record` destructors use the same `name: Type` pattern used by the rest of the language.
+- At each stage, the inputs and outputs mirror each other without creating syntactic confusion. + +## Consequences + +**Positive:** + +- Type annotations make processes self-documenting and provide the information needed to perform static type checking. + +- Separating type from staging behavior (the `stage:` section) makes each concern independently clear. + +- Nullable types (`?`) provide first-class support for nullable input files. + +- Outputs can be structured arbitrarily and can contain nullable files. + +**Negative:** + +- The `each` qualifier is not supported; pipelines using it must be refactored to use the `combine` operator before migrating to typed processes. + +- The typed syntax must be maintained alongside the legacy syntax, which makes the codebase more complex and may cause confusion as the community transitions to the new syntax. + +**Neutral:** + +- Typed processes use the same standard types as the rest of the language, so no additional type vocabulary is introduced. + +- Typed processes are enabled by a feature flag, which introduces new functionality without breaking existing code and helps distinguish between typed and legacy code. + +## Links + +- [Nextflow standard types](https://nextflow.io/docs/latest/reference/stdlib-types.html) +- Community issues: #1694, #2678 +- Related nf-core discussion: https://github.com/nf-core/modules/issues/4311 +- Original implementation: #4553 diff --git a/adr/20260306-record-types.md b/adr/20260306-record-types.md index b04a07a6a4..74530b0f6e 100644 --- a/adr/20260306-record-types.md +++ b/adr/20260306-record-types.md @@ -6,6 +6,12 @@ - Date: 2026-03-06 - Tags: lang, static-types +## Updates + +### Version 1.1 (2026-03-23) + +- Replaced inline record type syntax (`Record { ... }`) with destructuring syntax (`record(...)`) for better continuity with legacy syntax and record output syntax. + ## Summary Provide a way to model composite data types in the Nextflow language. 
@@ -155,41 +161,41 @@ When a record is supplied as input to a process, the process needs to know how t Typed processes can stage inputs using the `stage:` section, but ideally the files in a record should be automatically detected and staged. -A typed process can declare a record using an *inline record type*: +A typed process can declare a record input using a record type: ```groovy process FASTQC { input: - sample: Record { - id: String - fastq_1: Path - fastq_2: Path - } + sample: FastqPair // ... } + +record FastqPair { + id: String + fastq_1: Path + fastq_2: Path +} ``` All record fields that are a `Path` or `Path` collection (e.g. `Set`) are automatically staged. The record itself is declared in the process body as `sample`, like any other input, and record fields are accessed as `sample.id`, `sample.fastq_1`, and so on. -A typed process can also use an explicit record type to achieve the same behavior: +Alternatively, a typed process can declare a *destructured* record input: ```groovy process FASTQC { input: - sample: FastqPair + record( + id: String, + fastq_1: Path, + fastq_2: Path + ) // ... } - -record FastqPair { - id: String - fastq_1: Path - fastq_2: Path -} ``` -The only difference between these two aprooaches is that the `FastqPair` type can be used elsewhere in pipeline code because it is declared externally. +This approach allows record inputs to be declared without the need for external record types. Each record field is accessed directly as `id`, `fastq_1`, and so on. ### Process outputs @@ -197,12 +203,15 @@ Typed processes can declare outputs with arbitrary expressions, so no new syntax ```groovy process FASTQC { - // ... + // ... - output: - record(id: id, fastqc: file('fastqc_logs')) + output: + record( + id: id, + fastqc: file('fastqc_logs') + ) - // ... + // ...
} ``` @@ -258,6 +267,46 @@ println sample.id // -> '1' println sample2.id // -> '2' ``` +### Inline record input type + +A process can declare a destructured record input as shown above: + +```groovy +process FASTQC { + input: + record( + id: String, + fastq_1: Path, + fastq_2: Path + ) + + // ... +} +``` + +One alternative is to declare an *inline record type*: + +```groovy +process FASTQC { + input: + sample: Record { + id: String + fastq_1: Path + fastq_2: Path + } + + // ... +} +``` + +This approach was considered because it uses the same syntax as a `record` definition, making it easy to switch between inline and external record types. The block syntax is also slightly better suited for a type definition since it doesn't require commas. + +However, this approach creates an asymmetry between record inputs and outputs (`Record { ... }` vs `record(...)`). It also removes the ability to destructure a record input. + +Declaring a record input with `record()` can be understood as a reverse constructor, mirroring the `record()` function used to construct a record output in the `output:` section. + +While both approaches have pros and cons, the `record()` approach was ultimately chosen for its continuity with the existing tuple syntax and its similarity with the record output syntax. + ### Implicit process record output A process record output can be defined using the `record()` function as shown above: @@ -348,16 +397,16 @@ process PROKKA { // ... input: - sample: Record { - meta: Map + record( + meta: Map, fasta: Path - } + ) proteins: Path prodigal_tf: Path output: record( - meta: sample.meta, + meta: meta, gff: file("${prefix}/*.gff"), gbk: file("${prefix}/*.gbk"), fna: file("${prefix}/*.fna"), @@ -376,7 +425,7 @@ process PROKKA { file("versions.yml") >> 'versions' script: - prefix = sample.meta.id + prefix = meta.id // ... 
} ``` @@ -396,10 +445,10 @@ These processes would be defined as follows: process FOO { input: - sample: Record { - meta: Map + record( + meta: Map, gff: Path - } + ) // ... } @@ -407,12 +456,12 @@ process FOO { process BAR { input: - sample: Record { - meta: Map - fna: Path - faa: Path + record( + meta: Map, + fna: Path, + faa: Path, tbl: Path - } + ) // ... } diff --git a/docs/process-typed.md b/docs/process-typed.md index b6d0660ea1..aa560d84b8 100644 --- a/docs/process-typed.md +++ b/docs/process-typed.md @@ -47,7 +47,8 @@ The `input:` section declares process inputs. In typed processes, each input dec ```nextflow process fastqc { input: - (meta, fastq): Tuple + meta: Map + fastq: Path extra_args: String script: @@ -89,15 +90,12 @@ process cat_opt { ### Record inputs -Inputs with type `Record` can declare the name and type of each record field: +Record inputs can be declared using a record type: ```nextflow process fastqc { input: - sample: Record { - id: String - fastq: Path - } + sample: Sample script: """ @@ -105,46 +103,49 @@ process fastqc { echo 'fastq: ${sample.fastq}' """ } -``` -In this example, the record is staged into the task as `sample`, and `sample.fastq` is staged as an input file since the `fastq` field is declared with type `Path`. +record Sample { + id: String + fastq: Path +} +``` -When the process is invoked, the incoming record should contain the specified fields, or else the run will fail. If the record has additional fields not declared by the process input, they are ignored. +In this example, the record input is staged as `sample`, and `sample.fastq` is staged as an input file since it is declared with type `Path` in the `Sample` record type. Each field in the record type is staged into the task the same way as an individual input. -:::{tip} -Record inputs are a useful way to select a subset of fields from a larger record. This way, the process only stages what it needs, allowing you to keep related data together in your workflow logic. 
-::: +When the process is invoked, the incoming record should contain the specified fields, or else the run will fail. If the incoming record has additional fields not declared by the process input, they are ignored. -You can achieve the same behavior using an external record type: +Record inputs can also be declared as a *destructured* input: ```nextflow process fastqc { input: - sample: Sample + record( + id: String, + fastq: Path + ) script: """ - echo 'id: ${sample.id}' - echo 'fastq: ${sample.fastq}' + echo 'id: ${id}' + echo 'fastq: ${fastq}' """ } - -record Sample { - id: String - fastq: Path -} ``` -This approach is useful when the record type can be re-used elsewhere in the pipeline. +This pattern mirrors the standard `record()` function used to construct records. In this example, `fastq` is staged as an input file since the `fastq` field is declared with type `Path`. + +:::{tip} +Record inputs are a useful way to select a subset of fields from a larger record. This way, the process stages only what it needs, keeping related data together in your workflow logic. +::: ### Tuple inputs -Inputs with type `Tuple` can declare the name of each tuple component: +Tuple inputs can be declared as a *destructured* input: ```nextflow process fastqc { input: - (id, fastq): Tuple + tuple(id: String, fastq: Path) script: """ @@ -154,9 +155,7 @@ process fastqc { } ``` -This pattern is called *tuple destructuring*. Each tuple component is staged into the task the same way as an individual input. - -The generic types inside the `Tuple<...>` annotation specify the type of each tuple compomnent and should match the component names. In the above example, `id` has type `String` and `fastq` has type `Path`. +This pattern mirrors the standard `tuple()` function used to construct tuples. Each tuple component is staged into the task the same way as an individual input. 
## Stage directives @@ -314,14 +313,14 @@ The `record()` standard library function can be used to create a record: ```nextflow process fastqc { input: - sample: Record { - id: String + record( + id: String, fastq: Path - } + ) output: record( - id: sample.id, + id: id, fastqc: file('fastqc_logs') ) @@ -335,7 +334,7 @@ The `tuple()` standard library function can be used to create a tuple: ```nextflow process fastqc { input: - (id, fastq): Tuple + tuple(id: String, fastq: Path) output: tuple(id, file('fastqc_logs')) diff --git a/docs/tutorials/records.md b/docs/tutorials/records.md index b1f6eb369d..48ecdee5ba 100644 --- a/docs/tutorials/records.md +++ b/docs/tutorials/records.md @@ -112,7 +112,7 @@ The `reads_ch` input is used by `FASTQC` and `QUANT`, which both have the follow ```nextflow input: - (id, fastq_1, fastq_2): Tuple + tuple(id: String, fastq_1: Path, fastq_2: Path) ``` Therefore, you can construct a record type that models these requirements. Update the `reads_ch` input as follows: @@ -151,7 +151,7 @@ process FASTQC { conda 'bioconda::fastqc=0.12.1' input: - (id, fastq_1, fastq_2): Tuple + tuple(id: String, fastq_1: Path, fastq_2: Path) output: tuple(id, file("fastqc_${id}_logs")) @@ -167,34 +167,32 @@ To migrate this process, rewrite the inputs and outputs as follows: ```nextflow process FASTQC { - tag sample.id + tag id conda 'bioconda::fastqc=0.12.1' input: - sample: Record { - id: String - fastq_1: Path + record( + id: String, + fastq_1: Path, fastq_2: Path - } + ) output: record( - id: sample.id, - fastqc: file("fastqc_${sample.id}_logs") + id: id, + fastqc: file("fastqc_${id}_logs") ) script: """ - fastqc.sh "${sample.id}" "${sample.fastq_1} ${sample.fastq_2}" + fastqc.sh "${id}" "${fastq_1} ${fastq_2}" """ } ``` In the above: -- The tuple input is converted to a record input using the type `Record`. The field types are specified alongside the field names. 
- -- Since the record input cannot be destructured like a tuple, you must define a name for the record itself (e.g., `sample`), and you must update all references to tuple inputs (e.g., replace `id` with `sample.id`). +- The tuple input is converted to a record input by simply replacing `tuple` with `record`. - The tuple output is converted to a record by using the `record()` function and specifying a name for each record field. @@ -210,7 +208,7 @@ process QUANT { conda 'bioconda::salmon=1.10.3' input: - (id, fastq_1, fastq_2): Tuple + tuple(id: String, fastq_1: Path, fastq_2: Path) index: Path output: @@ -233,21 +231,21 @@ To migrate this process, rewrite the inputs and outputs as follows: ```nextflow process QUANT { - tag sample.id + tag id conda 'bioconda::salmon=1.10.3' input: - sample: Record { - id: String - fastq_1: Path + record( + id: String, + fastq_1: Path, fastq_2: Path - } + ) index: Path output: record( - id: sample.id, - quant: file("quant_${sample.id}") + id: id, + quant: file("quant_${id}") ) script: @@ -256,9 +254,9 @@ process QUANT { --threads ${task.cpus} \ --libType=U \ -i ${index} \ - -1 ${sample.fastq_1} \ - -2 ${sample.fastq_2} \ - -o quant_${sample.id} + -1 ${fastq_1} \ + -2 ${fastq_2} \ + -o quant_${id} """ } ``` diff --git a/docs/tutorials/static-types.md b/docs/tutorials/static-types.md index 9b326037aa..05cfeb894b 100644 --- a/docs/tutorials/static-types.md +++ b/docs/tutorials/static-types.md @@ -240,7 +240,7 @@ process FASTQC { // ... input: - (id, fastq_1, fastq_2): Tuple + tuple(id: String, fastq_1: Path, fastq_2: Path) output: file("fastqc_${id}_logs") @@ -287,7 +287,7 @@ process QUANT { // ... 
input: - (id, fastq_1, fastq_2): Tuple + tuple(id: String, fastq_1: Path, fastq_2: Path) index: Path output: diff --git a/modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy b/modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy index f0ff2152f0..3c46f95287 100644 --- a/modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy +++ b/modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy @@ -96,11 +96,13 @@ import nextflow.script.params.ValueOutParam import nextflow.script.params.v2.ProcessInput import nextflow.script.params.v2.ProcessTupleInput import nextflow.script.types.Record +import nextflow.script.types.Tuple import nextflow.script.types.Types import nextflow.trace.TraceRecord import nextflow.util.Escape import nextflow.util.HashBuilder import nextflow.util.LockManager +import nextflow.util.RecordMap import nextflow.util.TestOnly import org.codehaus.groovy.control.CompilerConfiguration import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer @@ -1767,7 +1769,9 @@ class TaskProcessor { for( int i = 0; i < declaredInputs.getParams().size(); i++ ) { final param = declaredInputs.getParams()[i] final value = values[i] - if( param instanceof ProcessTupleInput ) + if( param instanceof ProcessTupleInput && param.getType() == Record.class ) + assignTaskRecordInput(task, param, value, i) + else if( param instanceof ProcessTupleInput && param.getType() == Tuple.class ) assignTaskTupleInput(task, param, value, i) else assignTaskInput(task, param, value, i) @@ -1824,10 +1828,25 @@ class TaskProcessor { task.config.context = ctx } + @CompileStatic + private void assignTaskRecordInput(TaskRun task, ProcessTupleInput param, Object value, int index) { + if( value == null && !param.optional ) { + throw new ProcessUnrecoverableException("[${safeTaskName(task)}] input at index ${index} cannot be null") + } + if( value !instanceof RecordMap ) { + throw new 
ProcessUnrecoverableException("[${safeTaskName(task)}] input at index ${index} expected a record but received: ${value} [${value.class.simpleName}]") + } + final recordParams = param.getComponents() + final record = value as Map + for( final recordParam : recordParams ) { + assignTaskInput(task, recordParam, record[recordParam.getName()], index) + } + } + @CompileStatic private void assignTaskTupleInput(TaskRun task, ProcessTupleInput param, Object value, int index) { if( value == null && !param.optional ) { - throw new ProcessUnrecoverableException("[${safeTaskName(task)}] input at index ${index} cannot be null -- append `?` to the type annotation to mark it as nullable") + throw new ProcessUnrecoverableException("[${safeTaskName(task)}] input at index ${index} cannot be null") } if( value !instanceof List ) { throw new ProcessUnrecoverableException("[${safeTaskName(task)}] input at index ${index} expected a tuple but received: ${value} [${value.class.simpleName}]") diff --git a/modules/nf-lang/src/main/antlr/ScriptLexer.g4 b/modules/nf-lang/src/main/antlr/ScriptLexer.g4 index 268c7c2beb..8b151b1359 100644 --- a/modules/nf-lang/src/main/antlr/ScriptLexer.g4 +++ b/modules/nf-lang/src/main/antlr/ScriptLexer.g4 @@ -351,6 +351,7 @@ SHELL : 'shell'; STAGE : 'stage'; STUB : 'stub'; TOPIC : 'topic'; +TUPLE : 'tuple'; WHEN : 'when'; // -- workflow definition diff --git a/modules/nf-lang/src/main/antlr/ScriptParser.g4 b/modules/nf-lang/src/main/antlr/ScriptParser.g4 index a3b1ac9083..18bf76c75c 100644 --- a/modules/nf-lang/src/main/antlr/ScriptParser.g4 +++ b/modules/nf-lang/src/main/antlr/ScriptParser.g4 @@ -246,13 +246,11 @@ processInput ; processRecordInput - : identifier (COLON type)? nls LBRACE - nls recordBody? - nls RBRACE + : RECORD LPAREN nls nameTypePair (COMMA nls nameTypePair)* COMMA? nls rparen ; processTupleInput - : LPAREN identifier (COMMA identifier)* rparen (COLON type)? + : TUPLE LPAREN nls nameTypePair (COMMA nls nameTypePair)* COMMA? 
         nls rparen
     ;

 processStage
@@ -609,6 +607,7 @@ identifier
     | STAGE
     | STUB
     | TOPIC
+    | TUPLE
     | WHEN
     | WORKFLOW
     | EMIT
@@ -812,6 +811,7 @@ keywords
     | STAGE
     | STUB
     | TOPIC
+    | TUPLE
     | WHEN
     | WORKFLOW
     | EMIT
diff --git a/modules/nf-lang/src/main/java/nextflow/script/control/ProcessToGroovyVisitorV2.java b/modules/nf-lang/src/main/java/nextflow/script/control/ProcessToGroovyVisitorV2.java
index 627735ab81..6d30b179be 100644
--- a/modules/nf-lang/src/main/java/nextflow/script/control/ProcessToGroovyVisitorV2.java
+++ b/modules/nf-lang/src/main/java/nextflow/script/control/ProcessToGroovyVisitorV2.java
@@ -153,8 +153,6 @@ private void visitProcessInputType(Variable param, Expression target, BlockState
             stagers.addStatement(stager);
         }
         else if( isRecordType(cn) ) {
-            if( cn.getNameWithoutPackage().startsWith("__Record") )
-                moduleNode.addClass(cn);
             for( var fn : cn.getFields() )
                 visitProcessInputType(fn, propX(target, fn.getName()), stagers);
         }
diff --git a/modules/nf-lang/src/main/java/nextflow/script/control/ScriptResolveVisitor.java b/modules/nf-lang/src/main/java/nextflow/script/control/ScriptResolveVisitor.java
index dff6c09f90..fc9e0ac735 100644
--- a/modules/nf-lang/src/main/java/nextflow/script/control/ScriptResolveVisitor.java
+++ b/modules/nf-lang/src/main/java/nextflow/script/control/ScriptResolveVisitor.java
@@ -148,11 +148,8 @@ private void resolveTypedOutputs(Statement block) {

     @Override
     public void visitProcessV2(ProcessNodeV2 node) {
-        for( var input : node.inputs ) {
-            var type = input.getType();
-            resolver.resolveOrFail(type, input);
-            if( type.getNameWithoutPackage().startsWith("__Record") )
-                visitRecord((RecordNode) type.redirect());
+        for( var input : asFlatParams(node.inputs) ) {
+            resolver.resolveOrFail(input.getType(), input);
         }
         resolver.visit(node.directives);
         resolver.visit(node.stagers);
diff --git a/modules/nf-lang/src/main/java/nextflow/script/control/VariableScopeVisitor.java b/modules/nf-lang/src/main/java/nextflow/script/control/VariableScopeVisitor.java
index 08063e9ce5..a336506420 100644
--- a/modules/nf-lang/src/main/java/nextflow/script/control/VariableScopeVisitor.java
+++ b/modules/nf-lang/src/main/java/nextflow/script/control/VariableScopeVisitor.java
@@ -321,8 +321,7 @@ public void visitProcessV2(ProcessNodeV2 node) {
         visitDirectives(node.stagers, "stage directive", false);
         vsc.popScope();

-        if( !(node.when instanceof EmptyExpression) )
-            vsc.addParanoidWarning("Process `when` section will not be supported in a future version", node.when);
+        // deprecation warning reported during ast construction
         visit(node.when);
         visit(node.exec);
diff --git a/modules/nf-lang/src/main/java/nextflow/script/formatter/ScriptFormattingVisitor.java b/modules/nf-lang/src/main/java/nextflow/script/formatter/ScriptFormattingVisitor.java
index 7441245d3a..0a6c80814d 100644
--- a/modules/nf-lang/src/main/java/nextflow/script/formatter/ScriptFormattingVisitor.java
+++ b/modules/nf-lang/src/main/java/nextflow/script/formatter/ScriptFormattingVisitor.java
@@ -37,9 +37,7 @@
 import nextflow.script.ast.ScriptVisitorSupport;
 import nextflow.script.ast.TupleParameter;
 import nextflow.script.ast.WorkflowNode;
-import nextflow.script.types.Types;
 import org.codehaus.groovy.ast.ClassNode;
-import org.codehaus.groovy.ast.FieldNode;
 import org.codehaus.groovy.ast.Parameter;
 import org.codehaus.groovy.ast.expr.EmptyExpression;
 import org.codehaus.groovy.ast.expr.Expression;
@@ -235,7 +233,7 @@ public void visitParams(ParamBlockNode node) {

     private static int maxParameterWidth(Parameter[] parameters) {
         return Arrays.stream(parameters)
-            .map(param -> parameterWidth(param))
+            .map(param -> param.getName().length())
            .max(Integer::compare).orElse(0);
     }

@@ -312,49 +310,55 @@ public void visitWorkflow(WorkflowNode node) {
     }

     private void visitTypedInputs(Parameter[] inputs) {
-        var alignmentWidth = options.harshilAlignment()
-            ? maxParameterWidth(inputs)
-            : 0;
-
         for( var input : inputs ) {
             fmt.appendIndent();
             if( input instanceof TupleParameter tp ) {
-                var components = Arrays.stream(tp.components)
-                    .map(p -> p.getName())
-                    .collect(Collectors.joining(", "));
-                fmt.append('(');
-                fmt.append(components);
-                fmt.append(')');
+                visitStructuredInput(tp);
             }
             else {
-                fmt.append(input.getName());
-            }
-            if( fmt.hasType(input) ) {
-                if( alignmentWidth > 0 ) {
-                    var padding = alignmentWidth - parameterWidth(input) + 1;
-                    fmt.append(" ".repeat(padding));
-                }
-                fmt.append(": ");
-                var type = input.getType();
-                if( type.getNameWithoutPackage().startsWith("__Record") )
-                    visitProcessInputRecordType((RecordNode) type.redirect());
-                else
-                    fmt.visitTypeAnnotation(type);
+                visitTypedInput(input);
             }
             fmt.appendTrailingComment(input);
             fmt.appendNewLine();
         }
     }

-    private static int parameterWidth(Parameter param) {
-        return param instanceof TupleParameter tp
-            ? Arrays.stream(tp.components).mapToInt(p -> 2 + p.getName().length()).sum()
-            : param.getName().length();
+    private void visitStructuredInput(TupleParameter tp) {
+        var isRecord = "Record".equals(tp.getType().getNameWithoutPackage());
+        var wrap = isRecord;
+
+        fmt.append(isRecord ? "record" : "tuple");
+        fmt.append('(');
+        if( wrap )
+            fmt.incIndent();
+        for( int i = 0; i < tp.components.length; i++ ) {
+            var p = tp.components[i];
+            if( wrap ) {
+                fmt.appendNewLine();
+                fmt.appendIndent();
+            }
+            fmt.append(p.getName());
+            if( fmt.hasType(p) ) {
+                fmt.append(": ");
+                fmt.visitTypeAnnotation(p.getType());
+            }
+            if( i < tp.components.length - 1 )
+                fmt.append(wrap ? "," : ", ");
+        }
+        if( wrap ) {
+            fmt.appendNewLine();
+            fmt.decIndent();
+            fmt.appendIndent();
+        }
+        fmt.append(')');
     }

-    private void visitProcessInputRecordType(RecordNode type) {
-        fmt.append("Record");
-        visitRecordBody(type);
+    private void visitTypedInput(Parameter node) {
+        fmt.append(node.getName());
+        if( fmt.hasType(node) ) {
+            fmt.append(": ");
+            fmt.visitTypeAnnotation(node.getType());
+        }
     }

     private void visitTypedOutputs(List outputs) {
@@ -599,20 +603,12 @@ public void visitRecord(RecordNode node) {
     }

     private void visitRecordBody(RecordNode node) {
-        var alignmentWidth = options.harshilAlignment()
-            ? maxFieldWidth(node.getFields())
-            : 0;
-
         fmt.append(" {\n");
         fmt.incIndent();
         for( var fn : node.getFields() ) {
             fmt.appendIndent();
             fmt.append(fn.getName());
             if( fmt.hasType(fn) ) {
-                if( alignmentWidth > 0 ) {
-                    var padding = alignmentWidth - fn.getName().length() + 1;
-                    fmt.append(" ".repeat(padding));
-                }
                 fmt.append(": ");
                 fmt.visitTypeAnnotation(fn.getType());
             }
@@ -623,12 +619,6 @@ private void visitRecordBody(RecordNode node) {
         fmt.append("}");
     }

-    private int maxFieldWidth(List<FieldNode> fields) {
-        return fields.stream()
-            .map(fn -> fn.getName().length())
-            .max(Integer::compare).orElse(0);
-    }
-
     @Override
     public void visitEnum(ClassNode node) {
         fmt.appendLeadingComments(node);
diff --git a/modules/nf-lang/src/main/java/nextflow/script/parser/ScriptAstBuilder.java b/modules/nf-lang/src/main/java/nextflow/script/parser/ScriptAstBuilder.java
index 13cc7509ae..085a85957f 100644
--- a/modules/nf-lang/src/main/java/nextflow/script/parser/ScriptAstBuilder.java
+++ b/modules/nf-lang/src/main/java/nextflow/script/parser/ScriptAstBuilder.java
@@ -46,6 +46,8 @@
 import nextflow.script.ast.ScriptNode;
 import nextflow.script.ast.TupleParameter;
 import nextflow.script.ast.WorkflowNode;
+import nextflow.script.types.Record;
+import nextflow.script.types.Tuple;
 import org.antlr.v4.runtime.ANTLRErrorListener;
 import org.antlr.v4.runtime.CharStream;
 import org.antlr.v4.runtime.CharStreams;
@@ -556,60 +558,38 @@ else if( ctx.processTupleInput() != null ) {
     }

     private Parameter processRecordInput(ProcessRecordInputContext ctx) {
-        var name = identifier(ctx.identifier());
-        var type = type(ctx.type());
-        if( !"Record".equals(type.getUnresolvedName()) )
-            collectSyntaxError(new SyntaxException("Process record input must have type `Record`", ast( new EmptyStatement(), ctx )));
-
-        var recordNode = ast( new RecordNode(nextRecordName()), ctx );
-        if( ctx.recordBody() != null )
-            recordBody(ctx.recordBody(), recordNode);
-        else
-            collectSyntaxError(new SyntaxException("Missing record body", recordNode));
-
-        return ast( new Parameter(recordNode.getPlainNodeReference(), name), ctx );
-    }
-
-    private static AtomicInteger nextRecordId = new AtomicInteger(1);
-
-    private static String nextRecordName() {
-        var id = nextRecordId.getAndIncrement();
-        return "__Record_" + id;
-    }
-
-    private TupleParameter processTupleInput(ProcessTupleInputContext ctx) {
-        var type = type(ctx.type());
-        var numComponents = ctx.identifier().size();
-        var componentTypes = tupleComponentTypes(type, numComponents);
-        var components = new Parameter[numComponents];
-        for( int i = 0; i < numComponents; i++ ) {
-            var ident = ctx.identifier().get(i);
-            var name = identifier(ident);
-            var componentType = componentTypes != null ? componentTypes.get(i) : ClassHelper.dynamicType();
-            var component = ast( param(componentType, name), ident );
-            checkInvalidVarName(component.getName(), component);
-            components[i] = component;
-        }
-        var result = ast( new TupleParameter(type, components), ctx );
-        if( !"Tuple".equals(type.getUnresolvedName()) )
-            collectSyntaxError(new SyntaxException("Process tuple input must have type `Tuple<...>`", result));
-        else if( numComponents == 1 )
+        var components = ctx.nameTypePair().stream()
+            .map((ntp) -> {
+                var name = identifier(ntp.identifier());
+                var fieldType = type(ntp.type());
+                var field = ast( param(fieldType, name), ntp );
+                checkInvalidVarName(field.getName(), field);
+                if( ntp.type() == null )
+                    collectWarning("Record field should have a type annotation", name, field);
+                return field;
+            })
+            .toArray(Parameter[]::new);
+        return ast( new TupleParameter(new ClassNode(Record.class), components), ctx );
+    }
+
+    private Parameter processTupleInput(ProcessTupleInputContext ctx) {
+        var components = ctx.nameTypePair().stream()
+            .map((ntp) -> {
+                var name = identifier(ntp.identifier());
+                var componentType = type(ntp.type());
+                var component = ast( param(componentType, name), ntp );
+                checkInvalidVarName(component.getName(), component);
+                if( ntp.type() == null )
+                    collectWarning("Tuple component should have a type annotation", name, component);
+                return component;
+            })
+            .toArray(Parameter[]::new);
+        var result = ast( new TupleParameter(new ClassNode(Tuple.class), components), ctx );
+        if( ctx.nameTypePair().size() == 1 )
             collectSyntaxError(new SyntaxException("Process tuple input must have more than one component", result));
-        else if( !type.isUsingGenerics() || type.getGenericsTypes().length != numComponents )
-            collectSyntaxError(new SyntaxException("Process tuple input type must have " + numComponents + " type arguments (one for each tuple component)", result));
         return result;
     }

-    private List<ClassNode> tupleComponentTypes(ClassNode type, int n) {
-        if( !"Tuple".equals(type.getUnresolvedName()) )
-            return null;
-        if( !type.isUsingGenerics() || type.getGenericsTypes().length != n )
-            return null;
-        return Arrays.stream(type.getGenericsTypes())
-            .map(gt -> gt.getType())
-            .toList();
-    }
-
     private Statement processInputsV1(ProcessInputsContext ctx) {
         if( ctx == null || previewTypes )
             return EmptyStatement.INSTANCE;
diff --git a/modules/nf-lang/src/test/groovy/nextflow/script/formatter/ScriptFormatterTest.groovy b/modules/nf-lang/src/test/groovy/nextflow/script/formatter/ScriptFormatterTest.groovy
index 3b7c96e3c0..ecbecbff98 100644
--- a/modules/nf-lang/src/test/groovy/nextflow/script/formatter/ScriptFormatterTest.groovy
+++ b/modules/nf-lang/src/test/groovy/nextflow/script/formatter/ScriptFormatterTest.groovy
@@ -207,7 +207,7 @@ class ScriptFormatterTest extends Specification {
            nextflow.preview.types=true

            process hello{
-           debug(true) ; input: (id,infile):Tuple<String,Path> ; index:Path ; stage: stageAs(infile,'input.txt') ; output: result=tuple(id,file('output.txt')) ; script: 'cat input.txt > output.txt'
+           debug(true) ; input: tuple(id:String,infile:Path) ; index:Path ; stage: stageAs(infile,'input.txt') ; output: result=tuple(id,file('output.txt')) ; script: 'cat input.txt > output.txt'
            }
            ''',
            '''\
@@ -217,7 +217,7 @@ class ScriptFormatterTest extends Specification {
                debug true

                input:
-               (id, infile): Tuple<String, Path>
+               tuple(id: String, infile: Path)
                index: Path

                stage:
@@ -237,7 +237,7 @@ class ScriptFormatterTest extends Specification {
            nextflow.preview.types=true

            process hello{
-           input: sample:Record{id:String;infile:Path} ; script: 'cat input.txt > output.txt'
+           input: record(id:String,infile:Path) ; script: 'cat input.txt > output.txt'
            }
            ''',
            '''\
@@ -245,10 +245,10 @@ class ScriptFormatterTest extends Specification {
            process hello {

                input:
-               sample: Record {
-                   id: String
+               record(
+                   id: String,
                    infile: Path
-               }
+               )

                script:
                'cat input.txt > output.txt'
diff --git a/modules/nf-lang/src/test/groovy/nextflow/script/parser/ScriptAstBuilderTest.groovy b/modules/nf-lang/src/test/groovy/nextflow/script/parser/ScriptAstBuilderTest.groovy
index ba74730d50..a31918c9bf 100644
--- a/modules/nf-lang/src/test/groovy/nextflow/script/parser/ScriptAstBuilderTest.groovy
+++ b/modules/nf-lang/src/test/groovy/nextflow/script/parser/ScriptAstBuilderTest.groovy
@@ -345,27 +345,7 @@ class ScriptAstBuilderTest extends Specification {
            process hello {
                input:
-               (id): List
-
-               script:
-               ""
-           }
-           '''
-        )
-        then:
-        errors.size() == 1
-        errors[0].getStartLine() == 5
-        errors[0].getStartColumn() == 5
-        errors[0].getOriginalMessage() == "Process tuple input must have type `Tuple<...>`"
-
-        when:
-        errors = check(
-            '''\
-            nextflow.preview.types = true
-
-            process hello {
-                input:
-                (id): Tuple
+               tuple(id: String)

                script:
                ""
@@ -385,27 +365,7 @@ class ScriptAstBuilderTest extends Specification {
            process hello {
                input:
-               (id, fastq): Tuple
-
-               script:
-               ""
-           }
-           '''
-        )
-        then:
-        errors.size() == 1
-        errors[0].getStartLine() == 5
-        errors[0].getStartColumn() == 5
-        errors[0].getOriginalMessage() == "Process tuple input type must have 2 type arguments (one for each tuple component)"
-
-        when:
-        errors = check(
-            '''\
-            nextflow.preview.types = true
-
-            process hello {
-                input:
-                (id, fastq): Tuple
+               tuple(id: String, fastq: Path)

                script:
                ""
@@ -424,53 +384,10 @@ class ScriptAstBuilderTest extends Specification {
            process hello {
                input:
-               sample: Map {
-                   id: String
-                   fastq: Path
-               }
-
-               script:
-               ""
-           }
-           '''
-        )
-        then:
-        errors.size() == 1
-        errors[0].getStartLine() == 5
-        errors[0].getStartColumn() == 5
-        errors[0].getOriginalMessage() == "Process record input must have type `Record`"
-
-        when:
-        errors = check(
-            '''\
-            nextflow.preview.types = true
-
-            process hello {
-                input:
-                sample: Record {}
-
-                script:
-                ""
-            }
-            '''
-        )
-        then:
-        errors.size() == 1
-        errors[0].getStartLine() == 5
-        errors[0].getStartColumn() == 5
-        errors[0].getOriginalMessage() == "Missing record body"
-
-        when:
-        errors = check(
-            '''\
-            nextflow.preview.types = true
-
-            process hello {
-                input:
-                sample: Record {
-                    id: String
-                    fastq: Path
-                }
+               record(
+                   id: String,
+                   fastq: Path,
+               )

                script:
                ""
diff --git a/tests/collect-tuple-typed.nf b/tests/collect-tuple-typed.nf
index 64fd862d98..dddbed108f 100644
--- a/tests/collect-tuple-typed.nf
+++ b/tests/collect-tuple-typed.nf
@@ -9,7 +9,7 @@ process align {
     debug true

     input:
-    (barcode, seq_id): Tuple<String, String>
+    tuple(barcode: String, seq_id: String)

     output:
     tuple(barcode, seq_id, file('bam'), file('bai'))
@@ -29,7 +29,7 @@ process merge {
     debug true

     input:
-    (barcode, seq_ids, bam, bai): Tuple<String, Bag<String>, Bag<Path>, Bag<Path>>
+    tuple(barcode: String, seq_ids: Bag<String>, bam: Bag<Path>, bai: Bag<Path>)

     stage:
     stageAs bam, 'bam?'
diff --git a/tests/dynamic-filename-typed.nf b/tests/dynamic-filename-typed.nf
index 9148d18b80..7e2d0698e1 100644
--- a/tests/dynamic-filename-typed.nf
+++ b/tests/dynamic-filename-typed.nf
@@ -24,7 +24,7 @@ process foo {
     stageInMode 'copy'

     input:
-    (name, txt): Tuple<String, Path>
+    tuple(name: String, txt: Path)

     stage:
     stageAs txt, "${params.prefix}_${name}.txt"
diff --git a/tests/records.nf b/tests/records.nf
index 49e032dcf1..5d2c7dbe57 100644
--- a/tests/records.nf
+++ b/tests/records.nf
@@ -22,23 +22,23 @@ process TOUCH {

 process FASTQC {
     input:
-    sample: Record {
-        id: String
-        fastq_1: Path
+    record(
+        id: String,
+        fastq_1: Path,
         fastq_2: Path
-    }
+    )

     output:
     record(
-        id: sample.id,
+        id: id,
         html: file('*.html'),
         zip: file('*.zip')
     )

     script:
     """
-    touch ${sample.id}.html
-    touch ${sample.id}.zip
+    touch ${id}.html
+    touch ${id}.zip
     """
 }
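
Taken together, the test-fixture changes in this patch amount to the following syntax migration for structured process inputs. This is a sketch assembled from the `tests/*.nf` updates above; the process names and bodies are illustrative only:

```groovy
// Before (annotation-based preview syntax, removed by this patch):
// the structure and component types lived in the type annotation.
process align {
    input:
    (barcode, seq_id): Tuple<String, String>

    // ...
}

// After (destructuring syntax): the structure is declared with
// tuple(...)/record(...), with one type annotation per component.
process align {
    input:
    tuple(barcode: String, seq_id: String)
    record(
        id: String,
        fastq_1: Path,
    )

    // ...
}
```

Note that with the new syntax, record components are bound directly as variables (`${id}`) rather than through a named record (`${sample.id}`), as seen in the `tests/records.nf` hunk.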