adr/20260310-typed-dataflow.md (113 additions, 0 deletions)
@@ -320,6 +320,119 @@ Named arguments can be used with a process under the following conditions:
This approach resolves the aforementioned trade-off, allowing the process definition to be maximally flexible (one big record input) without making the process call more verbose in the most common use case.

## How to distinguish between typed and legacy dataflow?

Static typing has been introduced as multiple independent features:

- Type annotations
- Typed parameters (`params` block)
- Typed outputs (`output` block)
- Typed processes
- Record types
- Typed dataflow (this proposal)

This approach was chosen in contrast to DSL2, which was a monolithic change that required an entire pipeline to be updated at once. With static typing, each new feature can be adopted independently of the others, rather than requiring all of them to be adopted together (e.g. as a "DSL3").

The challenge with this approach, however, is making it easy for users (and agents) to distinguish between new and old syntax.

Several alternative approaches are considered below:

### Option 1: Use `nextflow.enable.types` to enable typed processes and typed dataflow

Most of the features for static typing are *purely additive* -- they are new concepts that can be used alongside existing code. However, typed processes and typed dataflow modify existing concepts (`process` and `workflow` definitions), so they require the `nextflow.preview.types` feature flag.

This preview flag will be replaced by `nextflow.enable.types` once the feature set is stable, and that flag would likely be used indefinitely to distinguish between typed and legacy code. It would only be removed if support for legacy syntax were removed, which is unlikely since DSL2 has been the standard Nextflow syntax for many years.

The syntax for typed processes is different enough that a feature flag seems appropriate. Typed dataflow, however, looks similar to legacy dataflow but has slightly different semantics. A feature flag may not be enough to signal the difference to users and agents, even if it is sufficient for the compiler and language server.

### Option 2: Use `nextflow.enable.types` to enable all static typing

Now that the entire language has been updated to support static typing, it could make sense to provide it as a single feature controlled by a single feature flag:

```groovy
// "dynamically typed" code
// nextflow.enable.types = true

// "statically typed" code
nextflow.enable.types = true
```

Even though these features can be adopted independently in principle, they are designed to work together, and in practice it is difficult to adopt one feature without the others:

- Migrating a large pipeline to workflow outputs is very difficult without also migrating to typed processes and record types.
- Adopting type annotations (e.g. for workflow takes and emits) can provide some basic documentation and validation, but most workflow logic still cannot be effectively validated by the type checker without typed processes.

Enabling all static typing features via `nextflow.enable.types` would establish a clear boundary between *statically typed* code and *dynamically typed* code. This way, the weak distinction between typed and legacy dataflow is compensated for by the clear distinction that type annotations, record types, etc. provide in the same context.

Since type annotations and typed parameters were introduced in Nextflow 25.10 as stable, requiring a feature flag for them in 26.04 would be a breaking change. However, this break might be acceptable for now since these features are still new and many users are waiting for full static typing anyway. These features could be allowed with a warning in 26.04 to ease the transition.

See also: static compilation in Groovy via `@CompileStatic`

### Option 3: Enable new operators via `include` declaration

Since operators are methods of the `Channel` type, new operators can be understood as a new `Channel` type / `channel` namespace. Therefore, the new operators could be introduced simply by including a different version of `Channel` or `channel`:

```groovy
// legacy operators (default)
// include { channel } from 'dataflow/v1'

// typed operators
include { channel } from 'dataflow/v2'
```

This approach is similar to using a feature flag, but it more clearly expresses the intent of using the new operators. The other aspects of typed dataflow -- removal of certain syntax patterns, process named arguments -- could also be enabled by this include or by the `nextflow.enable.types` feature flag.

Either way, the feature flag will still be needed to enable typed processes, so users will end up using both the feature flag and the include across their scripts. This might be more complicated than just using a feature flag.

### Option 4: Use new operator names in typed dataflow

Typed dataflow could simply rename all operators that were changed. This would clearly distinguish typed dataflow from legacy dataflow.

The problem of semantic changes essentially comes down to `cross` and `join`:

- `groupBy` was renamed from `groupTuple`
- all other operators are effectively identical, with minor differences that amount to bug fixes

Even `join` will be distinct in typed dataflow because it will join on record fields instead of tuple indices:

Ironically, the new operators are more true to their names than the old ones:

- `cross` now performs a true cross product (the legacy `cross` implicitly joined on matching keys)
- `join` now performs a true relational join (the legacy `join` did not handle duplicates correctly)

Ultimately, new operator names alone might not be enough to signal the other aspects of typed dataflow, such as the removal of many other operators and syntax patterns.
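To make the `cross` and `join` semantics listed above more concrete, here is a rough sketch in plain Groovy. It uses ordinary lists of maps as stand-ins for channels of records. The names `samples`, `refs`, `crossed` and `joined` are hypothetical, and none of this is the Nextflow operator API. It only illustrates what a "true cross product" and a "true relational join" on a record field mean for ordinary collections.

```groovy
// Hypothetical data: lists of maps stand in for channels of records.
// This is plain Groovy, not the proposed Nextflow operators.
def samples = [
    [id: 'a', fastq: 'a_1.fq'],
    [id: 'a', fastq: 'a_2.fq'],
    [id: 'b', fastq: 'b_1.fq'],
]
def refs = [
    [id: 'a', index: 'a.idx'],
    [id: 'c', index: 'c.idx'],
]

// Cross product: every left item is paired with every right item,
// with no key matching involved.
def crossed = samples.collectMany { l -> refs.collect { r -> [l, r] } }

// Relational inner join on the `id` field: one merged record per matching
// pair, so a key that occurs several times yields every matching combination.
def joined = samples.collectMany { l ->
    refs.findAll { r -> r.id == l.id }.collect { r -> l + r }
}

println "cross product pairs: ${crossed.size()}"
println "joined records: ${joined}"
```

In this sketch the join matches on a named field (`id`) rather than on a position in a tuple, which is the distinction the option describes. Unmatched items (`b` and `c`) are dropped, as in an inner join.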

### Option 5: Replace `process` and `workflow` with `task` and `flow`

The key issue is that typed processes and typed dataflow modify existing concepts (processes and workflows) rather than adding to them. Instead, we could make these features purely additive by introducing them as new top-level concepts:

```groovy
// legacy semantics
process FASTQC { /* ... */ }
workflow RNASEQ { /* ... */ }

// typed semantics
task FASTQC { /* ... */ }
flow RNASEQ { /* ... */ }
```

This approach would make the distinction very clear. However, this change would be a significant break, since *processes* and *workflows* are long-standing and fundamental ideas in Nextflow. Users think about Nextflow pipelines in terms of *processes* and *workflows*, so introducing new terminology would be confusing.

See also: [Prefect](https://www.prefect.io/prefect/open-source) (uses `@task` and `@flow` in its DSL)