
Commit c956ca7

Merge branch 'master' into 251117-module-system
2 parents 8e681e9 + bbbb2b3 commit c956ca7

103 files changed

Lines changed: 5592 additions & 1003 deletions


VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
25.11.0-edge
1+
25.12.0-edge
Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
1+
# Git Multi-Revision Asset Management with Strategy Pattern
2+
3+
- Authors: Jorge Ejarque
4+
- Status: Approved
5+
- Deciders: Jorge Ejarque, Ben Sherman, Paolo Di Tommaso
6+
- Date: 2025-12-05
7+
- Tags: scm, asset-management, multi-revision
8+
9+
## Summary
10+
11+
Nextflow's asset management system has been refactored so that multiple revisions of the same pipeline can be used concurrently. It stores Git objects once in a bare repository and shares them with per-commit clones, while the Strategy design pattern keeps legacy direct-clone repositories working unchanged.
12+
13+
## Problem Statement
14+
15+
The original asset management system (`AssetManager`) cloned each pipeline directly to `~/.nextflow/assets/<org>/<project>/.git`, creating several limitations:
16+
17+
1. **No concurrent Git multi-revision support**: Only one revision of a pipeline could be checked out at a time, preventing concurrent execution of different versions
18+
2. **Update conflicts**: Pulling updates while a pipeline was running could cause conflicts or corruption
19+
3. **Testing limitations**: Users couldn't easily test different versions of a pipeline side-by-side
20+
21+
The goal was to enable running multiple revisions of the same pipeline concurrently (e.g., production on v1.0, testing on v2.0-dev) while maintaining efficient disk usage through object sharing.
22+
23+
## Goals or Decision Drivers
24+
25+
- **Concurrent multi-revision execution**: Must support running different revisions of the same pipeline simultaneously
26+
- **Efficient disk usage**: Share Git objects between revisions to minimize storage overhead
27+
- **Backward compatibility**: Must not break existing pipelines using the legacy direct-clone approach
28+
- **API stability**: Maintain the existing `AssetManager` API for external consumers (K8s plugin, CLI commands, etc.)
29+
- **Minimal migration impact**: Existing repositories should continue to work without user intervention
30+
- **JGit compatibility**: Solution must work within JGit's capabilities to avoid relying on Git client installations
31+
- **Atomic updates**: Downloading new revisions should not interfere with running pipelines
32+
33+
## Non-goals
34+
35+
- **Migration of existing legacy repositories**: Legacy repos continue to work as-is; no forced migration
36+
- **Native Git worktree support**: Due to JGit limitations, not using Git's worktree feature
37+
- **Revision garbage collection**: No automatic cleanup of old revisions (users can drop them manually)
38+
- **Multi-hub support**: Still tied to a single repository provider per pipeline
39+
40+
## Considered Options
41+
42+
### Option 1: Bare Repository with Git Worktrees
43+
44+
Use Git's worktree feature to create multiple working directories from a single bare repository.
45+
46+
**Implementation**:
47+
- One bare repository at `~/.nextflow/assets/<org>/<project>/.git`
48+
- Multiple worktrees at `~/.nextflow/assets/<org>/<project>/<revision>/`
49+
50+
- Good, because it's the native Git solution for multiple checkouts
51+
- Good, because worktrees are space-efficient
52+
- Good, because Git handles all the complexity
53+
- **Bad, because JGit doesn't support worktrees** (deal-breaker)
54+
- Bad, because requires native Git installation
55+
56+
**Decision**: Rejected due to JGit incompatibility
57+
58+
### Option 2: Bare Repository + Clones per Commit + Revision Map File
59+
60+
Use a bare repository for storage and create clones for each commit, tracking them in a separate file.
61+
62+
**Implementation**:
63+
- Bare repository at `~/.nextflow/assets/<org>/<project>/.nextflow/bare_repo/`
64+
- Clones at `~/.nextflow/assets/<org>/<project>/.nextflow/commits/<commit-sha>/`
65+
- Revision map file at `~/.nextflow/assets/<org>/<project>/.nextflow/revisions.json` mapping revision names to commit SHAs
66+
67+
- Good, because it works with JGit
68+
- Good, because the bare repo reduces interactions with the remote repository when checking out commits
69+
- Good, because explicit revision tracking
70+
- Bad, because Git objects are replicated in every clone, which wastes disk space
71+
- Bad, because revision map file can become stale
72+
- Bad, because requires file I/O for every revision lookup
73+
- Bad, because potential race conditions on map file updates
74+
- Bad, because adds complexity of maintaining external state
75+
76+
**Decision**: Initially implemented but later refined
77+
78+
### Option 3: Bare Repository + Shared Clones with Strategy Pattern
79+
80+
Similar to Option 2 but eliminate the separate revision map file by using the bare repository itself as the source of truth. Additionally, use the Strategy pattern to maintain backward compatibility with existing legacy repositories without requiring migration.
81+
82+
**Implementation**:
83+
- Bare repository at `~/.nextflow/assets/.repos/<org>/<project>/bare/`
84+
- Shared clones at `~/.nextflow/assets/.repos/<org>/<project>/clones/<commit-sha>/`
85+
- Use bare repository refs to resolve revisions to commit SHAs dynamically
86+
- JGit alternates mechanism for object sharing
87+
- `AssetManager` as facade with unchanged public API
88+
- `RepositoryStrategy` interface defining repository operations
89+
- `LegacyRepositoryStrategy` for existing direct-clone behavior
90+
- `MultiRevisionRepositoryStrategy` for new bare-repo approach
91+
- Strategy selection based on environment variable or repository state detection
92+
93+
- Good, because no external state file to maintain
94+
- Good, because bare repository is always in sync (fetched on updates)
95+
- Good, because simpler and more reliable
96+
- Good, because atomic updates (Git operations are atomic)
97+
- Good, because works entirely within JGit
98+
- Good, because zero migration needed for existing repositories
99+
- Good, because maintains API compatibility
100+
- Good, because allows gradual adoption
101+
- Good, because isolates legacy code
102+
- Good, because makes future strategies easy to add
103+
- Neutral, because adds abstraction layer
104+
- Bad, because requires resolution on every access (minimal overhead)
105+
- Bad, because increases codebase size initially
106+
107+
**Decision**: Selected
108+
109+
## Solution or decision outcome
110+
111+
Implemented **Option 3 (Bare Repository + Shared Clones with Strategy Pattern)** for multi-revision support with backward compatibility. Multi-revision is the default for new repositories, while legacy mode remains available via the `NXF_SCM_LEGACY` environment variable.
112+
113+
## Rationale & discussion
114+
115+
### Git Multi-Revision Implementation
116+
117+
The bare repository approach provides efficient multi-revision support:
118+
119+
```
120+
~/.nextflow/assets/.repos/nextflow-io/hello/
121+
├── bare/                        # Bare repository (shared objects)
122+
│   ├── objects/                 # All Git objects stored here
123+
│   ├── refs/
124+
│   │   ├── heads/
125+
│   │   └── tags/
126+
│   └── config
127+
128+
└── clones/                      # Revision-specific clones
129+
    ├── abc123.../               # Clone for commit abc123
130+
    │   └── .git/
131+
    │       └── objects/         # (uses alternates → bare/objects)
132+
    │           └── info/
133+
    │               └── alternates   # Points to bare/objects
134+
135+
    └── def456.../               # Clone for commit def456
136+
        └── .git/
137+
138+
~/.nextflow/assets/nextflow-io/hello/
139+
└── .git/                        # Legacy repo location (HYBRID state)
140+
```
141+
142+
**Key mechanisms:**
143+
144+
1. **Bare repository as source of truth**: The bare repo is fetched/updated from the remote, keeping refs current
145+
2. **Dynamic resolution**: Revisions (branch/tag names) are resolved to commit SHAs using the bare repo's refs
146+
3. **Object sharing**: Clones use Git alternates to reference the bare repo's objects, avoiding duplication
147+
4. **Atomic operations**: Each clone is independent; downloading a new revision doesn't affect existing ones
148+
5. **Lazy creation**: Clones are created on-demand when a specific revision is requested
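
The object-sharing and lazy-creation mechanisms can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Nextflow (Groovy/JGit) implementation; the helper names `link_clone_to_bare` and `ensure_clone` are hypothetical. It only builds the directory layout and the alternates file, which in standard Git lives at `objects/info/alternates` inside the clone's Git directory and lists other object directories to borrow from. No real Git data is created.

```python
import tempfile
from pathlib import Path


def link_clone_to_bare(clone_git_dir: Path, bare_dir: Path) -> Path:
    """Write an alternates file so the clone borrows objects from the bare repo."""
    info_dir = clone_git_dir / "objects" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    alternates = info_dir / "alternates"
    # Each line of the alternates file is a path to another object directory.
    alternates.write_text(str((bare_dir / "objects").resolve()) + "\n")
    return alternates


def ensure_clone(project_root: Path, commit_sha: str) -> Path:
    """Create the per-commit clone directory only if it does not exist yet."""
    clone_dir = project_root / "clones" / commit_sha
    git_dir = clone_dir / ".git"
    if not git_dir.exists():
        link_clone_to_bare(git_dir, project_root / "bare")
    return clone_dir


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical project root mirroring the layout shown above.
    root = Path(tmp) / ".repos" / "nextflow-io" / "hello"
    (root / "bare" / "objects").mkdir(parents=True)
    clone = ensure_clone(root, "abc123")
    alternates_text = (clone / ".git" / "objects" / "info" / "alternates").read_text()
    # A second request for the same commit reuses the existing clone.
    same_clone = ensure_clone(root, "abc123")
```

Because each clone lives in its own directory keyed by commit SHA, creating a clone for a new revision never touches the directories used by pipelines that are already running.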
149+
150+
**Advantages over revision map file:**
151+
- No external state to maintain or keep in sync
152+
- Bare repo fetch automatically updates all refs
153+
- Resolution is simple: `bareRepo.resolve(revision)` returns commit SHA
154+
- No race conditions on file updates
155+
- Simpler code with fewer failure modes
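
To show why no map file is needed, here is a minimal Python sketch of revision resolution against the bare repository's refs. It is an illustration only: the real implementation calls JGit on the bare repository, whereas this sketch uses a plain dictionary of made-up refs, and the names `BARE_REFS`, `resolve`, and `clone_dir_for` are hypothetical. It assumes tags are looked up before branches, which matches Git's usual name lookup order, and it does not handle abbreviated SHAs.

```python
from typing import Optional

# Hypothetical refs as they might look in the bare repository after a fetch.
BARE_REFS = {
    "refs/heads/master": "1" * 40,
    "refs/heads/dev": "2" * 40,
    "refs/tags/v1.1": "3" * 40,
}


def resolve(refs: dict, revision: str) -> Optional[str]:
    """Resolve a tag name, branch name, or full commit SHA to a commit SHA."""
    for prefix in ("refs/tags/", "refs/heads/"):
        sha = refs.get(prefix + revision)
        if sha is not None:
            return sha
    # A full SHA that one of the refs points to resolves to itself.
    if revision in refs.values():
        return revision
    return None


def clone_dir_for(revision: str) -> Optional[str]:
    """Map a revision to its clone directory, relative to the project root."""
    sha = resolve(BARE_REFS, revision)
    return None if sha is None else f"clones/{sha}"
```

Since the lookup reads the refs directly, a fetch that moves `dev` to a new commit changes the result of `resolve` immediately, with no second file to update.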
156+
157+
### Strategy Pattern for Backward Compatibility
158+
159+
The Strategy pattern provides clean separation and backward compatibility:
160+
161+
```
162+
┌─────────────────────────┐
163+
│      AssetManager       │  ← Public API (unchanged)
164+
│        (Facade)         │
165+
└───────────┬─────────────┘
166+
167+
            │ delegates to
168+
169+
┌─────────────────────────┐
170+
│   RepositoryStrategy    │  ← Interface
171+
└───────────┬─────────────┘
172+
173+
            │ implements
174+
    ┌───────┴────────┐
175+
    │                │
176+
┌───────────┐  ┌─────────────────┐
177+
│  Legacy   │  │  MultiRevision  │  ← Concrete strategies
178+
│ Strategy  │  │    Strategy     │
179+
└───────────┘  └─────────────────┘
180+
```
181+
182+
**Strategy selection logic:**
183+
184+
1. Check `NXF_SCM_LEGACY` environment variable → Use legacy if set
185+
2. Check if there is only the legacy asset of the repository (`isOnlyLegacy` method) → Use legacy (preserve existing)
186+
3. Otherwise → Use multi-revision
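
The three selection rules can be sketched as a single function. This is a Python illustration, not the actual Groovy code: `StrategyType` and `select_strategy` are hypothetical names, and treating the variable as set only when its value is `true` is an assumption, since the rules above only say "if set".

```python
from enum import Enum


class StrategyType(Enum):
    LEGACY = "legacy"
    MULTI_REVISION = "multi-revision"


def select_strategy(env: dict, is_only_legacy: bool) -> StrategyType:
    """Apply the selection rules in order; the first match wins."""
    # 1. The environment variable forces legacy mode (assumed: value "true").
    if env.get("NXF_SCM_LEGACY", "").strip().lower() == "true":
        return StrategyType.LEGACY
    # 2. Only a legacy clone exists on disk: keep using it.
    if is_only_legacy:
        return StrategyType.LEGACY
    # 3. New repositories and hybrid states use multi-revision.
    return StrategyType.MULTI_REVISION
```

Passing the environment as a dictionary, for example `select_strategy(dict(os.environ), is_only_legacy=False)`, keeps the function easy to test without changing the process environment.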
187+
188+
189+
**Backward compatibility guarantees:**
190+
191+
- Existing repositories continue to work without changes
192+
- `AssetManager` API remains identical
193+
- CLI commands work with both strategies transparently
194+
- Tests pass with minimal modifications
195+
- No forced migration; users opt-in naturally when creating new repos
196+
197+
### Hybrid State Handling
198+
199+
The system gracefully handles hybrid states where both legacy and multi-revision repositories coexist:
200+
201+
- **Detection**: In hybrid states, the multi-revision strategy is selected by default.
202+
- **Fallback logic**: Multi-revision strategy can fall back to legacy repo for operations if needed
203+
- **No conflicts**: Strategies are designed to coexist; operations target different directories
204+
- **Explicit control**: Users can force a specific strategy via `setStrategyType()` or `NXF_SCM_LEGACY` environment variable
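
A minimal sketch of how the repository state could be classified from the directory layout described in this document. It is written in Python for illustration; the function name `detect_state` and the state labels are hypothetical, and the real detection logic may check more than directory existence.

```python
import tempfile
from pathlib import Path


def detect_state(assets_root: Path, project: str) -> str:
    """Classify a project by which repository layouts exist on disk."""
    legacy = (assets_root / project / ".git").exists()
    multi = (assets_root / ".repos" / project / "bare").exists()
    if legacy and multi:
        return "hybrid"
    if legacy:
        return "legacy-only"
    if multi:
        return "multi-revision-only"
    return "absent"


with tempfile.TemporaryDirectory() as tmp:
    assets = Path(tmp)
    project = "nextflow-io/hello"
    state_before = detect_state(assets, project)
    (assets / project / ".git").mkdir(parents=True)
    state_legacy = detect_state(assets, project)
    (assets / ".repos" / project / "bare").mkdir(parents=True)
    state_hybrid = detect_state(assets, project)
```

The two layouts use different directories under the assets root, which is why they can coexist without overwriting each other.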
205+
206+
### Migration Path
207+
208+
Users naturally migrate as they pull new revisions:
209+
210+
1. **Existing users**: Can continue with legacy repos (the legacy-only state is detected automatically)
211+
2. **New users**: Get multi-revision by default
212+
3. **Opt-in migration**: Delete the project directory, or pull with `--migrate`, to switch to multi-revision
213+
4. **Opt-out**: Set `NXF_SCM_LEGACY=true` to force legacy mode
214+
215+
### Implementation Details
216+
217+
**Key classes:**
218+
219+
- `RepositoryStrategy`: Interface defining repository operations
220+
- `AbstractRepositoryStrategy`: Base class with shared helper methods
221+
- `LegacyRepositoryStrategy`: Direct clone implementation (original behavior)
222+
- `MultiRevisionRepositoryStrategy`: Bare repo + shared clones implementation
223+
224+
**Critical methods:**
225+
226+
- `download()`: Same entry point in both strategies (legacy pulls into the single clone; multi-revision creates a shared clone for the requested commit)
227+
- `getLocalPath()`: Returns appropriate working directory based on strategy
228+
- `getGit()`: Returns appropriate Git instance (legacy git, bare git, or commit git)
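
The relationship between the facade and the strategies can be sketched as follows. This is a simplified Python illustration of the Strategy pattern, not the Groovy classes themselves: method names are adapted to Python style, the bodies only record state instead of running Git, and the `refs` data is made up.

```python
from abc import ABC, abstractmethod


class RepositoryStrategy(ABC):
    """Operations every repository strategy must provide."""

    @abstractmethod
    def download(self, revision: str) -> None: ...

    @abstractmethod
    def get_local_path(self, revision: str) -> str: ...


class LegacyRepositoryStrategy(RepositoryStrategy):
    """One working directory; downloading a revision replaces the checkout."""

    def __init__(self, project: str):
        self.project = project
        self.checked_out = None

    def download(self, revision: str) -> None:
        self.checked_out = revision  # stands in for pull + checkout

    def get_local_path(self, revision: str) -> str:
        return f"assets/{self.project}"


class MultiRevisionRepositoryStrategy(RepositoryStrategy):
    """One directory per commit; revisions coexist side by side."""

    def __init__(self, project: str, refs: dict):
        self.project = project
        self.refs = refs  # revision name -> commit SHA (hypothetical data)
        self.clones = set()

    def download(self, revision: str) -> None:
        self.clones.add(self.refs[revision])  # stands in for creating a shared clone

    def get_local_path(self, revision: str) -> str:
        return f"assets/.repos/{self.project}/clones/{self.refs[revision]}"


class AssetManager:
    """Facade: callers use the same methods whichever strategy is active."""

    def __init__(self, strategy: RepositoryStrategy):
        self._strategy = strategy

    def download(self, revision: str) -> None:
        self._strategy.download(revision)

    def get_local_path(self, revision: str) -> str:
        return self._strategy.get_local_path(revision)


refs = {"v1.0": "aaa111", "v2.0-dev": "bbb222"}
legacy = AssetManager(LegacyRepositoryStrategy("nextflow-io/hello"))
multi = AssetManager(MultiRevisionRepositoryStrategy("nextflow-io/hello", refs))
for manager in (legacy, multi):
    manager.download("v1.0")
    manager.download("v2.0-dev")

legacy_paths = {legacy.get_local_path(r) for r in refs}
multi_paths = {multi.get_local_path(r) for r in refs}
```

After both downloads, the legacy manager still exposes a single path, while the multi-revision manager exposes one path per commit. That difference is what allows concurrent runs of different revisions.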
229+
230+
### Performance Characteristics
231+
232+
**Disk usage:**
233+
- Legacy: one full clone per repository (all Git objects) plus one worktree
234+
- Multi-revision: one full set of Git objects in the bare repository, plus roughly 100 KB (a `.git` directory with alternates) and one worktree per revision
235+
236+
**Operation speed:**
237+
- First download: Similar (both clone from remote)
238+
- Additional revisions: Multi-revision faster (only fetches new objects once, creates cheap clones)
239+
- Switching revisions: Multi-revision instant (different directories), legacy requires checkout
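
The disk-usage comparison can be made concrete with a small estimate. This Python sketch uses hypothetical repository sizes and assumes "~100K" means about 100 KiB per clone and that all revisions share the same object store. It also includes the Option 2 layout (full clones per commit) for comparison.

```python
KIB = 1024
MIB = 1024 * KIB


def legacy_bytes(objects: int, worktree: int) -> int:
    """One full clone: all Git objects plus a single worktree."""
    return objects + worktree


def multi_revision_bytes(objects: int, worktree: int, revisions: int,
                         clone_overhead: int = 100 * KIB) -> int:
    """Objects stored once in the bare repo, plus a small .git and a worktree per revision."""
    return objects + revisions * (clone_overhead + worktree)


def full_clones_bytes(objects: int, worktree: int, revisions: int) -> int:
    """Option 2 style: every revision carries its own copy of the objects."""
    return revisions * (objects + worktree)


# Hypothetical repository: 50 MiB of Git objects, 5 MiB worktree, 3 revisions.
objects, worktree, revisions = 50 * MIB, 5 * MIB, 3
shared = multi_revision_bytes(objects, worktree, revisions)
replicated = full_clones_bytes(objects, worktree, revisions)
single = multi_revision_bytes(objects, worktree, 1)
```

With a single revision the multi-revision layout costs slightly more than a legacy clone because of the extra `.git` directory. The saving grows with each additional revision, since the objects are stored only once.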
240+
241+
### Known Limitations
242+
243+
- No automatic migration of legacy repositories
244+
- Bare repository overhead even for users who only need one revision
245+
- The alternates mechanism is slightly more complex to manage than worktrees
246+
- Manual cleanup required for old revision clones
247+
248+
## Links
249+
- [GitHub Issue #2870 - Multiple revisions of the same pipeline for concurrent execution](https://github.com/nextflow-io/nextflow/issues/2870)
250+
- [PR #6620 - Implementation of multiple revisions without revisions map](https://github.com/nextflow-io/nextflow/pull/6620)
251+
- Related PRs implementing the multi-revision approach (linked in #6620)
252+

changelog.txt

Lines changed: 29 additions & 0 deletions
@@ -1,5 +1,34 @@
11
NEXTFLOW CHANGE-LOG
22
===================
3+
25.12.0-edge - 19 Dec 2025
4+
- Add `listDirectory()` to Path type and deprecate `listFiles()` (#6581) [56f0f007]
5+
- Add default maxSpotAttempts for fusion snapshots in Google Batch (#6652) [458ef97a]
6+
- Add onlyJobState option for SLURM executor (#6659) [3c3e9f52]
7+
- Add README files for all plugins (#6660) [bee8cff6]
8+
- Add runtimeClassName to the pod options (#6633) [ddcef4f4]
9+
- Add spot interruption tracking to trace records (#6606) [eecd8167]
10+
- Add URL encoding when revision name is used as HTTP query parameter (#6598) [7894e097]
11+
- Add warnings to JSON output in lint command (#6625) [bb066969]
12+
- Add wave.build.template config option (#6639) [d08a8952]
13+
- Check Nextflow version before loading plugins (#6591) [03da64eb]
14+
- Fix GitHub repository provider when providing token with auth property (#6662) [d01cbde1]
15+
- Fix optional param in params block (#6657) [bd8de5ca]
16+
- Fix String.format error when plugin URL contains percent chars (#6651) [59c4f4e1]
17+
- Fix validation of numeric types in params block (#6656) [664a26eb]
18+
- Fix WaveClient sending Bearer token to public S3 URLs (#6672) [ffaef0b6]
19+
- Fix: tolerate spaces in `$NXF_TASK_WORKDIR` (#6421) [7b386025]
20+
- Implementation of Git multiple revisions (#6620) [ce9d7b59]
21+
- Refactor Google Batch getExitCode to imperative style (#6649) [addd59e9]
22+
- Set local task exit status when time limit is exceeded (#6592) [d3f8e135]
23+
- Add Nextflow Development Constitution (#6578) [7047e6be]
24+
- docs: Add extra warnings as 25.10 is added to platform (#6655) [ae0e844f]
25+
- docs: Add longer NXF_SYNTAX_PARSER descriptions (#6637) [23c277ad]
26+
- docs: Document best practices for script and config params (#6631) [3421734d]
27+
- docs: Fix typos (#6641) [20f4631e]
28+
- docs: Improve preview feature warnings in documentation (#6663) [cdc7a586]
29+
- docs: Update note about AWS CLI (#6626) [bb7aecf8]
30+
- docs: Update NXF_SYNTAX_PARSER callouts (#6640) [1b284a19]
31+
332
25.11.0-edge - 28 Nov 2025
433
- Add Google Batch LogsPolicy PATH option for logging to GCS (#6431) [5b61afe0]
534
- Add default value to Apptainer pull timeout config parameter (#6534) [f4548bd1]
-134 KB
-199 KB

docs/_static/report-tasks-min.png

-242 KB

docs/_static/timeline-min.png

11.5 KB

docs/cache-and-resume.md

Lines changed: 4 additions & 7 deletions
@@ -25,21 +25,18 @@ The task hash is computed from the following metadata:
2525
- Task {ref}`Conda environment <process-conda>` (if applicable)
2626
- Task {ref}`Spack environment <process-spack>` and {ref}`CPU architecture <process-arch>` (if applicable)
2727
- Task {ref}`inputs <process-input>`
28+
- *New in 25.10:* Task {ref}`eval commands <process-out-eval>`
2829
- Task {ref}`script <process-script>`
29-
- Any global variables referenced in the task script
30-
- Any task {ref}`process-ext` properties referenced in the task script
30+
- Global variables referenced in the task script
31+
- *New in 23.10:* Task {ref}`process-ext` properties referenced in the task script
3132
- Any {ref}`bundled scripts <bundling-executables>` used in the task script
3233
- Whether the task is a {ref}`stub run <process-stub>`
3334

3435
:::{note}
3536
Nextflow also includes an incrementing component in the hash generation process, which allows it to iterate through multiple hash values until it finds one that does not match an existing execution directory. This mechanism typically aligns with task retries (i.e., task attempts), however this is not guaranteed.
3637
:::
3738

38-
:::{versionchanged} 23.09.2-edge
39-
The {ref}`process-ext` directive was added to the task hash.
40-
:::
41-
42-
Nextflow computes this hash for every task when it is created but before it is executed. If resumability is enabled and there is an entry in the task cache with the same hash, Nextflow tries to recover the previous task execution. A cache hit does not guarantee that the task will be resumed, because it must also recover the task outputs from the [work directory](#work-directory).
39+
Nextflow computes this hash for every task before it is executed. If resumability is enabled, Nextflow checks whether the task cache contains a matching hash and whether the task outputs are still present in the [work directory](#work-directory). If both conditions are met, the task is resumed; otherwise, it is re-executed.
4340

4441
Files are hashed differently depending on the caching mode. See the {ref}`process-cache` directive for more details.
4542

docs/cli.md

Lines changed: 11 additions & 4 deletions
@@ -79,6 +79,13 @@ $ nextflow run nextflow-io/hello -r v1.1
7979
$ nextflow run nextflow-io/hello -r dev-branch
8080
$ nextflow run nextflow-io/hello -r a3f5c8e
8181
```
82+
:::{versionadded} 25.12.0-edge
83+
:::
84+
Nextflow downloads and stores each explicitly requested Git branch, tag, or commit ID in a separate directory path, enabling you to run multiple revisions of the same pipeline simultaneously. Downloaded revisions are stored in a subdirectory of the local project: `$NXF_ASSETS/.repos/<org>/<repo>/clones/<commitId>`.
85+
86+
:::{tip}
87+
Use tags or commit IDs instead of branches for reproducible pipeline runs. Branch references change as development progresses over time.
88+
:::
8289

8390
(cli-params)=
8491

@@ -171,12 +178,12 @@ Use this to understand a project's structure, see available versions, or verify
171178
$ nextflow info hello
172179
project name: nextflow-io/hello
173180
repository : https://github.com/nextflow-io/hello
174-
local path : $HOME/.nextflow/assets/nextflow-io/hello
181+
local path : $HOME/.nextflow/assets/.repos/nextflow-io/hello
175182
main script : main.nf
176183
revisions :
177-
* master (default)
184+
> master (default)
178185
mybranch
179-
v1.1 [t]
186+
> v1.1 [t]
180187
v1.2 [t]
181188
```
182189

@@ -186,7 +193,7 @@ This shows:
186193
- Where it's cached locally
187194
- Which script runs by default
188195
- Available revisions (branches and tags marked with `[t]`)
189-
- Which revision is currently checked out (marked with `*`)
196+
- Which revisions are currently checked out (marked with `>`)
190197

191198
### Pulling or updating projects
192199
