Commit 4718b9e

Merge branch 'master' into fix-aborted-submitted-count
2 parents: 2ea2e38 + ababdda (commit 4718b9e)

37 files changed

Lines changed: 318 additions & 151 deletions

build.gradle

Lines changed: 2 additions & 2 deletions
@@ -102,8 +102,8 @@ allprojects {

 // Documentation required libraries
 groovyDoc 'org.fusesource.jansi:jansi:2.4.0'
-groovyDoc "org.apache.groovy:groovy-groovydoc:4.0.27"
-groovyDoc "org.apache.groovy:groovy-ant:4.0.27"
+groovyDoc "org.apache.groovy:groovy-groovydoc:4.0.28"
+groovyDoc "org.apache.groovy:groovy-ant:4.0.28"
 }

 test {

docs/aws.md

Lines changed: 1 addition & 1 deletion
@@ -281,7 +281,7 @@ There are several reasons why you might need to create your own [AMI (Amazon Mac
 ### Create your custom AMI

 From the EC2 Dashboard, select **Launch Instance**, then select **Browse more AMIs**. In the new page, select
-**AWS Marketplace AMIs**, and then search for **Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI**. Select the AMI and continue as usual to configure and launch the instance.
+**AWS Marketplace AMIs**, and then search for `Amazon ECS-Optimized Amazon Linux 2 (AL2) x86_64 AMI`. Select the AMI and continue as usual to configure and launch the instance.

 :::{note}
 The selected instance has a root volume of 30GB. Make sure to increase its size or add a second EBS volume with enough storage for real genomic workloads.

docs/cache-and-resume.md

Lines changed: 49 additions & 38 deletions
@@ -72,75 +72,86 @@ For this reason, it is important to preserve both the task cache (`.nextflow/cac

 ## Troubleshooting

-Cache failures happen when either (1) a task that was supposed to be cached was re-executed, or (2) a task that was supposed to be re-executed was cached.
+Cache failures occur when a task that was supposed to be cached was re-executed or a task that was supposed to be re-executed was cached.

-When this happens, consider the following questions:
+Common causes of cache failures include:

-- Is resume enabled via `-resume`?
-- Is the {ref}`process-cache` directive set to a non-default value?
-- Is the task still present in the task cache and work directory?
-- Were any of the task inputs changed?
+- [Resume not enabled](#resume-not-enabled)
+- [Cache directive disabled](#cache-directive-disabled)
+- [Modified inputs](#modified-inputs)
+- [Inconsistent file attributes](#inconsistent-file-attributes)
+- [Race condition on a global variable](#race-condition-on-a-global-variable)
+- [Non-deterministic process inputs](#non-deterministic-process-inputs)

-Changing any of the inputs included in the [task hash](#task-hash) will invalidate the cache, for example:
+### Resume not enabled

-- Resuming from a different session ID
-- Changing the process name
-- Changing the task container image or Conda environment
-- Changing the task script
-- Changing an input file or bundled script used by the task
+The `-resume` option is required to resume a pipeline. Ensure you enable `-resume` in your run command or your Nextflow configuration file.
+
+### Cache directive disabled

-While the following examples would not invalidate the cache:
+The `cache` directive is enabled by default. However, you can disable or modify its behavior for a specific process. For example:

-- Changing the value of a directive (other than {ref}`process-ext`), even if that directive is used in the task script
+```nextflow
+process FOO {
+cache false
+// ...
+}
+```

-In many cases, cache failures happen because of a change to the pipeline script or configuration, or because the pipeline itself has some non-deterministic behavior.
+Ensure that the `cache` directive has not been disabled. See {ref}`process-cache` for more information.

-Here are some common reasons for cache failures:
+### Modified inputs

-### Modified input files
+Modifying inputs that are used in the task hash invalidates the cache. Common causes of modified inputs include:

-Make sure that your input files have not been changed. Keep in mind that the default caching mode uses the complete file path, the last modified timestamp, and the file size. If any of these attributes change, the task will be re-executed, even if the file content is unchanged.
+- Changing input files
+- Resuming from a different session ID
+- Changing the process name
+- Changing the calling workflow name
+- Changing the task container image or Conda environment
+- Changing the task script
+- Changing a bundled script used by the task

-### Process that modifies its inputs
+Nextflow calculates a hash for an input file using its full path, last modified timestamp, and file size. If any of these attributes change, Nextflow re-executes the task.

-If a process modifies its own input files, it cannot be resumed for the reasons described in the previous point. As a result, processes that modify their own input files are considered an anti-pattern and should be avoided.
+:::{warning}
+If a process modifies its input files, it cannot be resumed. Avoid processes that modify their own input files as this is considered an anti-pattern.
+:::

 ### Inconsistent file attributes

-Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache. If you encounter this problem, you can avoid it by using the `'lenient'` {ref}`caching mode <process-cache>`, which ignores the last modified timestamp and uses only the file path and size.
+Some shared file systems, such as NFS, may report inconsistent file timestamps, which can invalidate the cache when using the standard caching mode.
+
+To resolve this issue, use the `'lenient'` {ref}`caching mode <process-cache>` to ignore the last modified timestamp and use only the file path and size.

 (cache-global-var-race-condition)=

 ### Race condition on a global variable

-While Nextflow tries to make it easy to write safe concurrent code, it is still possible to create race conditions, which can in turn impact the caching behavior of your pipeline.
-
-Consider the following example:
+Race conditions can disrupt the caching behavior of your pipeline. For example:

 ```nextflow
-channel.of(1,2,3) | map { v -> X=v; X+=2 } | view { v -> "ch1 = $v" }
-channel.of(1,2,3) | map { v -> X=v; X*=2 } | view { v -> "ch2 = $v" }
+channel.of(1,2,3).map { v -> X=v; X+=2 }.view { v -> "ch1 = $v" }
+channel.of(1,2,3).map { v -> X=v; X*=2 }.view { v -> "ch2 = $v" }
 ```

-The problem here is that `X` is declared in each `map` closure without the `def` keyword (or other type qualifier). Using the `def` keyword makes the variable local to the enclosing scope; omitting the `def` keyword makes the variable global to the entire script.
+In the above example, `X` is declared in each `map` closure. Without the `def` keyword, the variable `X` is global to the entire script. Because operators are executed concurrently and `X` is global, there is a *race condition* that causes the emitted values to vary depending on the order of the concurrent operations. If these values were passed to a process as inputs, the process would execute different tasks during each run due to the race condition.

-Because `X` is global, and operators are executed concurrently, there is a *race condition* on `X`, which means that the emitted values will vary depending on the particular order of the concurrent operations. If the values were passed as inputs into a process, the process would execute different tasks on each run due to the race condition.
-
-The solution is to not use a global variable where a local variable is enough (or in this simple example, avoid the variable altogether):
+To resolve this issue, avoid declaring global variables in closures:

 ```nextflow
-// local variable
-channel.of(1,2,3) | map { v -> def X=v; X+=2 } | view { v -> "ch1 = $v" }
-
-// no variable
-channel.of(1,2,3) | map { v -> v * 2 } | view { v -> "ch2 = $v" }
+channel.of(1,2,3).map { v -> def X=v; X+=2 }.view { v -> "ch1 = $v" }
 ```

+:::{versionadded} 25.04.0
+The {ref}`strict syntax <strict-syntax-page>` does not allow global variables to be declared in closures.
+:::
+
 (cache-nondeterministic-inputs)=

 ### Non-deterministic process inputs

-Sometimes a process needs to merge inputs from different sources. Consider the following example:
+A process that merges inputs from different sources non-deterministically may invalidate the cache. For example:

 ```nextflow
 workflow {
@@ -161,9 +172,9 @@ process check_bam_bai {
 }
 ```

-It is tempting to assume that the process inputs will be matched by `id` like the {ref}`operator-join` operator. But in reality, they are simply merged like the {ref}`operator-merge` operator. As a result, not only will the process inputs be incorrect, they will also be non-deterministic, thus invalidating the cache.
+In the above example, the inputs will be merged without matching on `id`, in a similar manner as the {ref}`operator-merge` operator. As a result, the inputs are incorrect and non-deterministic.

-The solution is to explicitly join the two channels before the process invocation:
+To resolve this issue, use the `join` operator to join the channels into a single input channel before invoking the process:

 ```nextflow
 workflow {

docs/conf.py

Lines changed: 5 additions & 1 deletion
@@ -51,7 +51,11 @@
 'operator.md': 'reference/operator.md',
 'dsl1.md': 'migrations/dsl1.md',
 'updating-syntax.md': 'strict-syntax.md',
-'updating-spot-retries.md': 'guides/updating-spot-retries.md'
+'updating-spot-retries.md': 'guides/updating-spot-retries.md',
+'metrics.md': 'tutorials/metrics.md',
+'data-lineage.md' : 'tutorials/data-lineage.md',
+'workflow-outputs.md': 'tutorials/workflow-outputs.md',
+'flux.md': 'tutorials/flux.md'
 }

 # Add any paths that contain templates here, relative to this directory.

docs/fusion.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Fusion is a virtual, lightweight, distributed file system that bridges the gap b
 See [Fusion file system](https://docs.seqera.io/fusion) for more information about Fusion features.

 :::{note}
-Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. Fusion can be trialed at no cost. [Contact Seqera](https://seqera.io/contact-us/) for more details.
+Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. Fusion can be trialed at no cost. See the [Fusion licensing documentation](https://docs.seqera.io/fusion/licensing) for more information.
 :::

 ## Get started

docs/migrations/24-04.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ The `nf-ga4gh` plugin has since been moved into its own repository, [nextflow-io
 ```groovy
 conda.channels = ['seqera', 'conda-forge', 'bioconda', 'defaults']

-## Miscellanous
+## Miscellaneous

 - New config option: `azure.batch.pools.<name>.lowPriority`
 - New config option: `azure.batch.pools.<name>.startTask.script`

docs/migrations/24-10.md

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ Nextflow now supports managed identities for the Azure Batch executor. See {ref}

 <h3>Task previous execution trace</h3>

-The `task` variable in the process definition has two new proprties, `task.previousTrace` and `task.previousException`, which allows a task to access the runtime metadata of the previous attempt. See {ref}`task-previous-execution-trace` for details.
+The `task` variable in the process definition has two new properties, `task.previousTrace` and `task.previousException`, which allows a task to access the runtime metadata of the previous attempt. See {ref}`task-previous-execution-trace` for details.

 ## Breaking changes

@@ -53,7 +53,7 @@ The `task` variable in the process definition has two new proprties, `task.previ

 - The use of `addParams` and `params` clauses in include declarations is deprecated. See {ref}`module-params` for details.

-## Miscellanous
+## Miscellaneous

 - New config option: `aws.client.requesterPays`
 - New config option: `google.batch.autoRetryExitCodes`

docs/migrations/25-04.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ The third preview of workflow outputs introduces the following breaking changes

 - The syntax for dynamic publish paths has changed. Instead of defining a closure that returns a closure with the `path` directive, the outer closure should use the `>>` operator to publish individual files. See {ref}`workflow-publishing-files` for details.

-- The `mapper` index directive has been removed. Use a `map` operator in the workflwo body instead.
+- The `mapper` index directive has been removed. Use a `map` operator in the workflow body instead.

 See {ref}`migrating-workflow-outputs` to get started.

docs/reports.md

Lines changed: 1 addition & 1 deletion
@@ -302,7 +302,7 @@ The following table shows the fields that can be included in the execution repor
 : The value of the process `scratch` directive.

 `error_action`
-: The action applied on errof task failure.
+: The action applied on error for task failure.

 `hostname`
 : :::{versionadded} 22.05.0-edge

modules/nextflow/build.gradle

Lines changed: 7 additions & 7 deletions
@@ -25,12 +25,12 @@ dependencies {
 api(project(':nf-commons'))
 api(project(':nf-httpfs'))
 api(project(':nf-lang'))
-api "org.apache.groovy:groovy:4.0.27"
-api "org.apache.groovy:groovy-nio:4.0.27"
-api "org.apache.groovy:groovy-xml:4.0.27"
-api "org.apache.groovy:groovy-json:4.0.27"
-api "org.apache.groovy:groovy-templates:4.0.27"
-api "org.apache.groovy:groovy-yaml:4.0.27"
+api "org.apache.groovy:groovy:4.0.28"
+api "org.apache.groovy:groovy-nio:4.0.28"
+api "org.apache.groovy:groovy-xml:4.0.28"
+api "org.apache.groovy:groovy-json:4.0.28"
+api "org.apache.groovy:groovy-templates:4.0.28"
+api "org.apache.groovy:groovy-yaml:4.0.28"
 api "org.slf4j:jcl-over-slf4j:2.0.17"
 api "org.slf4j:jul-to-slf4j:2.0.17"
 api "org.slf4j:log4j-over-slf4j:2.0.17"
@@ -57,7 +57,7 @@ dependencies {
 testImplementation 'org.subethamail:subethasmtp:3.1.7'
 testImplementation (project(':nf-lineage'))
 // test configuration
-testFixturesApi ("org.apache.groovy:groovy-test:4.0.27") { exclude group: 'org.apache.groovy' }
+testFixturesApi ("org.apache.groovy:groovy-test:4.0.28") { exclude group: 'org.apache.groovy' }
 testFixturesApi ("org.objenesis:objenesis:3.4")
 testFixturesApi ("net.bytebuddy:byte-buddy:1.14.17")
 testFixturesApi ("org.spockframework:spock-core:2.3-groovy-4.0") { exclude group: 'org.apache.groovy' }
