Commit 650a7f4

Merge branch 'master' into docs-modules-guide
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
2 parents e9f9551 + da06e9a commit 650a7f4

133 files changed

Lines changed: 10436 additions & 3978 deletions

VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
26.03.2-edge
1+
26.03.3-edge
Lines changed: 136 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,136 @@
1+
# NIO Filesystem for Seqera Platform Datasets
2+
3+
- Authors: Jorge Ejarque
4+
- Status: draft
5+
- Date: 2026-03-10
6+
- Tags: nio, filesystem, seqera, datasets, nf-tower
7+
8+
Technical Story: Enable Nextflow pipelines to read Seqera Platform datasets as ordinary file paths using `seqera://` URIs.
9+
10+
## Summary
11+
12+
Add a Java NIO `FileSystemProvider` to the `nf-tower` plugin that registers the `seqera://` scheme, allowing pipelines to reference Seqera Platform datasets (CSV/TSV) as standard file paths without manual download steps. The implementation reuses the existing `TowerClient` for all HTTP communication, inheriting authentication and retry behaviour.
13+
14+
## Problem Statement
15+
16+
Nextflow users managing datasets on the Seqera Platform must currently download dataset files manually or through custom scripts before referencing them in pipelines. There is no native integration between Nextflow's file abstraction and the Seqera Platform dataset API. This creates friction in workflows where datasets are the primary input and forces users to handle authentication, versioning, and file staging outside the pipeline definition.
17+
18+
## Goals or Decision Drivers
19+
20+
- Transparent access to Seqera Platform datasets using standard Nextflow file path syntax
21+
- Reuse of existing nf-tower plugin infrastructure (authentication, HTTP client, retry/backoff)
22+
- Hierarchical path browsing matching the platform's org/workspace/dataset structure
23+
- Extensible architecture that can support future Seqera-managed resource types (e.g. data-links)
24+
- No new plugin or module — feature lives within nf-tower
25+
26+
## Non-goals
27+
28+
- Streaming large datasets — the Platform API does not support streaming; content is fully buffered on download
29+
- Implementing resource types beyond `datasets` — only the extensible architecture is required
30+
- Local caching across pipeline runs — Nextflow's standard task staging handles caching
31+
- Dataset management operations (delete, rename) — the filesystem is read-only in the initial implementation
32+
33+
## Considered Options
34+
35+
### Option 1: Standalone plugin with dedicated HTTP client
36+
37+
A new `nf-seqera-fs` plugin with its own HTTP client configuration and authentication setup.
38+
39+
- Good, because it isolates the filesystem code from the nf-tower plugin
40+
- Bad, because it duplicates authentication configuration and HTTP client setup
41+
- Bad, because two separate HTTP clients sharing a refresh token would corrupt each other's auth state
42+
43+
### Option 2: NIO filesystem within nf-tower using TowerClient delegation
44+
45+
Add the filesystem to nf-tower, delegating all HTTP through the existing `TowerClient` singleton via a typed `SeqeraDatasetClient` wrapper.
46+
47+
- Good, because it shares authentication and token refresh with TowerClient
48+
- Good, because it reuses existing retry/backoff configuration
49+
- Good, because no new dependencies are needed
50+
51+
### Option 3: Direct HxClient usage within nf-tower
52+
53+
Add the filesystem to nf-tower but use `HxClient` directly rather than going through TowerClient.
54+
55+
- Good, because it gives full control over request construction
56+
- Bad, because exposing HxClient internals couples the filesystem to implementation details
57+
- Bad, because token refresh coordination with TowerClient becomes manual
58+
59+
## Solution or decision outcome
60+
61+
Option 2 — NIO filesystem within nf-tower using TowerClient delegation. All HTTP calls go through `TowerClient.sendApiRequest()`, ensuring a single point of authentication and retry logic.
62+
63+
## Rationale & discussion
64+
65+
### Path Hierarchy
66+
67+
The `seqera://` path encodes the Platform's organizational structure directly:
68+
69+
```
seqera://                               → ROOT (directory, depth 0)
└── <org>/                              → ORGANIZATION (directory, depth 1)
    └── <workspace>/                    → WORKSPACE (directory, depth 2)
        └── datasets/                   → RESOURCE TYPE (directory, depth 3)
            └── <name>[@<version>]      → DATASET (file, depth 4)
```
76+
77+
Each level is a directory except the leaf dataset, which is a file. Version pinning uses an `@version` suffix on the dataset name segment (e.g. `seqera://acme/research/datasets/samples@2`). Without it, the latest non-disabled version is resolved.
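
The hierarchy and the `@version` suffix described above can be illustrated with a small parser. This is a minimal sketch only: `SeqeraUriSketch` and its members are hypothetical names, not the plugin's `SeqeraPath`, and the validation rules (at most four segments, `datasets` as the only resource type) are assumptions drawn from the description in this section.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser for seqera:// paths; not the plugin's SeqeraPath. */
final class SeqeraUriSketch {
    static final String PREFIX = "seqera://";

    final List<String> segments; // org, workspace, resource type, dataset name
    final String version;        // null means "resolve the latest non-disabled version"

    private SeqeraUriSketch(List<String> segments, String version) {
        this.segments = Collections.unmodifiableList(segments);
        this.version = version;
    }

    static SeqeraUriSketch parse(String uri) {
        if (!uri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a seqera:// URI: " + uri);
        }
        // Split on '/' and drop empty segments (trailing or doubled slashes).
        List<String> parts = new ArrayList<>();
        for (String s : uri.substring(PREFIX.length()).split("/")) {
            if (!s.isEmpty()) parts.add(s);
        }
        if (parts.size() > 4) {
            throw new IllegalArgumentException("Path deeper than 4 segments: " + uri);
        }
        // Depth 3 is the resource type; only "datasets" is assumed to exist.
        if (parts.size() >= 3 && !"datasets".equals(parts.get(2))) {
            throw new IllegalArgumentException("Unsupported resource type: " + parts.get(2));
        }
        // A version pin is only meaningful on the leaf (depth 4) segment.
        String version = null;
        if (parts.size() == 4) {
            String leaf = parts.get(3);
            int at = leaf.lastIndexOf('@');
            if (at > 0 && at < leaf.length() - 1) {
                version = leaf.substring(at + 1);
                parts.set(3, leaf.substring(0, at));
            }
        }
        return new SeqeraUriSketch(parts, version);
    }

    int depth() { return segments.size(); }

    /** Depths 0 to 3 are directories; only depth 4 is a file. */
    boolean isDirectory() { return depth() < 4; }
}
```

For example, `SeqeraUriSketch.parse("seqera://acme/research/datasets/samples@2")` yields a depth-4 file path with dataset name `samples` and version `2`, while `seqera://acme/research` parses as a depth-2 directory.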
78+
79+
### Name-to-ID Resolution
80+
81+
The path uses human-readable names but the Platform API requires numeric IDs. Resolution is built from two API calls at filesystem initialization:
82+
83+
1. `GET /user-info` → obtain `userId`
84+
2. `GET /user/{userId}/workspaces` → returns all accessible org/workspace pairs
85+
86+
This single source provides both directory listing content and name→ID mapping. Results are cached in `SeqeraFileSystem` with invalidation on write operations. `GET /orgs` is intentionally not used as it returns all platform orgs, not scoped to user membership.
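
As an illustration of how one workspace listing can serve both directory listing and name→ID lookup, here is a minimal sketch. The `Entry` fields (`orgName`, `orgId`, `workspaceName`, `workspaceId`) are assumed for the example and may not match the real `OrgAndWorkspaceDto`; the HTTP calls and cache invalidation are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical in-memory index built from the workspace listing response. */
final class WorkspaceIndexSketch {

    /** Assumed shape of one org/workspace pair; field names are illustrative. */
    static final class Entry {
        final String orgName;
        final long orgId;
        final String workspaceName;
        final long workspaceId;

        Entry(String orgName, long orgId, String workspaceName, long workspaceId) {
            this.orgName = orgName;
            this.orgId = orgId;
            this.workspaceName = workspaceName;
            this.workspaceId = workspaceId;
        }
    }

    private final Map<String, Long> orgIds = new LinkedHashMap<>();
    private final Map<String, Map<String, Long>> workspaceIds = new LinkedHashMap<>();

    WorkspaceIndexSketch(List<Entry> entries) {
        for (Entry e : entries) {
            orgIds.put(e.orgName, e.orgId);
            workspaceIds
                .computeIfAbsent(e.orgName, k -> new LinkedHashMap<>())
                .put(e.workspaceName, e.workspaceId);
        }
    }

    /** Directory listing at depth 0. */
    Set<String> listOrgs() { return orgIds.keySet(); }

    /** Directory listing at depth 1. */
    Set<String> listWorkspaces(String org) { return lookup(org).keySet(); }

    /** Name-to-ID resolution for a workspace. */
    long workspaceId(String org, String workspace) {
        Long id = lookup(org).get(workspace);
        if (id == null) {
            throw new NoSuchElementException("Unknown workspace: " + org + "/" + workspace);
        }
        return id;
    }

    private Map<String, Long> lookup(String org) {
        Map<String, Long> workspaces = workspaceIds.get(org);
        if (workspaces == null) {
            throw new NoSuchElementException("Unknown organization: " + org);
        }
        return workspaces;
    }
}
```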
87+
88+
### Component Structure
89+
90+
```
plugins/nf-tower/src/main/io/seqera/tower/plugin/
├── fs/                                ← NIO layer
│   ├── SeqeraFileSystemProvider       ← FileSystemProvider (scheme: "seqera")
│   ├── SeqeraFileSystem               ← FileSystem with org/workspace/dataset caches
│   ├── SeqeraPath                     ← Path implementation (depth 0–4)
│   ├── SeqeraFileAttributes           ← BasicFileAttributes
│   ├── SeqeraPathFactory              ← PF4J FileSystemPathFactory extension
│   └── DatasetInputStream             ← SeekableByteChannel over InputStream
├── dataset/                           ← API client layer
│   ├── SeqeraDatasetClient            ← Typed HTTP client wrapping TowerClient
│   ├── DatasetDto                     ← Dataset API response model
│   ├── DatasetVersionDto              ← Version API response model
│   ├── OrgAndWorkspaceDto             ← Org/workspace list model
│   └── WorkspaceOrgDto                ← Workspace/org mapping model
└── resources/META-INF/services/
    └── java.nio.file.spi.FileSystemProvider
```
108+
109+
### Key Design Decisions
110+
111+
1. **TowerClient delegation**: `SeqeraDatasetClient` delegates all HTTP through `TowerFactory.client()` → `TowerClient.sendApiRequest()`. This ensures shared authentication state and avoids the token refresh corruption that would occur with separate HTTP client instances.
112+
113+
2. **One filesystem per JVM**: `SeqeraFileSystemProvider` maintains a single `SeqeraFileSystem` keyed by scheme. This matches the `TowerClient` singleton-per-session pattern.
114+
115+
3. **Read-only initial scope**: The filesystem reports `isReadOnly()=true`. Write support (dataset upload via multipart POST) is deferred to a future iteration.
116+
117+
4. **Download filename constraint**: The Platform API's download endpoint (`GET /datasets/{id}/v/{version}/n/{fileName}`) requires the exact filename from upload time. The implementation always resolves `DatasetVersionDto.fileName` from `GET /datasets/{id}/versions` before constructing the download URL.
118+
119+
5. **Extensible resource types**: The path hierarchy reserves depth 3 for a resource type segment (currently only `datasets`). Adding support for data-links or other resource types requires only a new handler at the directory listing and I/O layers, with no changes to path resolution or authentication.
120+
121+
6. **Thread safety**: `SeqeraFileSystem` cache methods and `SeqeraFileSystemProvider` lifecycle methods are `synchronized`. The filesystem map uses `LinkedHashMap` with external synchronization rather than `ConcurrentHashMap`, matching the low-contention access pattern.
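
The download filename constraint in decision 4 can be sketched as a two-step resolution: pick a version, then build the download path from its stored file name. This is an illustrative sketch only. The `Version` fields, the `select` rule for pinned versions, the string-typed dataset ID, and the use of `java.net.URI` for path escaping are assumptions, not the plugin's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

/** Hypothetical helper that resolves a version and builds the download path. */
final class DatasetDownloadSketch {

    /** Assumed subset of the version metadata returned by the versions endpoint. */
    static final class Version {
        final long version;
        final String fileName;
        final boolean disabled;

        Version(long version, String fileName, boolean disabled) {
            this.version = version;
            this.fileName = fileName;
            this.disabled = disabled;
        }
    }

    /**
     * Returns the pinned version when one is given (even if disabled, an
     * assumption of this sketch), otherwise the highest non-disabled version.
     */
    static Version select(List<Version> versions, Long pinned) {
        Version best = null;
        for (Version v : versions) {
            if (pinned != null) {
                if (v.version == pinned) return v;
            } else if (!v.disabled && (best == null || v.version > best.version)) {
                best = v;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No matching dataset version");
        }
        return best;
    }

    /** Builds /datasets/{id}/v/{version}/n/{fileName}, escaping illegal path characters. */
    static String downloadPath(String datasetId, Version v) {
        String raw = "/datasets/" + datasetId + "/v/" + v.version + "/n/" + v.fileName;
        try {
            // The multi-argument URI constructor quotes characters that are
            // illegal in a path, such as spaces; '/' is left untouched.
            return new URI(null, null, raw, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid download path: " + raw, e);
        }
    }
}
```

With versions 1 (`samples.csv`), 2 (`samples v2.csv`), and a disabled version 3, an unpinned path resolves to version 2 and the download path `/datasets/{id}/v/2/n/samples%20v2.csv`.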
122+
123+
### Limitations
124+
125+
- **No size metadata**: `SeqeraFileAttributes.size()` returns 0 for all paths because the Platform API does not expose content length in dataset metadata.
126+
- **Single endpoint per JVM**: The filesystem key is scheme-only; concurrent access to different Platform endpoints in the same JVM is not supported.
127+
128+
### Streaming Downloads
129+
130+
Dataset downloads use `TowerClient.sendStreamingRequest()` which calls `HxClient.sendAsStream()` — the response body is returned as an `InputStream` streamed directly from the HTTP connection. This avoids the triple-buffering problem (`String` → `getBytes()` → `ByteArrayInputStream`) that would otherwise consume ~40 MB heap per 10 MB dataset. The `HxClient.sendAsStream()` method goes through the same `sendWithRetry()` path as `sendAsString()`, so retry logic and token refresh are preserved.
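
The difference between the two approaches can be sketched in plain Java, independent of `HxClient`. `StreamingCopySketch` and its method names are hypothetical. The first method mimics the `String` → `getBytes()` → `ByteArrayInputStream` chain, which holds several full copies of the body at once. The second copies through a fixed-size buffer, so heap use stays constant regardless of dataset size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical comparison of buffered and streamed handling of a response body. */
final class StreamingCopySketch {

    /** Buffered path: the whole body is materialised more than once. */
    static ByteArrayInputStream viaString(InputStream body) {
        try {
            byte[] raw = body.readAllBytes();                      // full copy as bytes
            String text = new String(raw, StandardCharsets.UTF_8); // full copy as String
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);  // full copy as bytes again
            return new ByteArrayInputStream(bytes);                // wraps the array, no copy
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streamed path: only one 8 KiB buffer is live at any time. */
    static long stream(InputStream body, OutputStream target) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = body.read(buffer)) != -1) {
                target.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```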
131+
132+
## Links
133+
134+
- [Spec](../specs/260310-seqera-dataset-fs/spec.md)
135+
- [Implementation plan](../specs/260310-seqera-dataset-fs/plan.md)
136+
- [Data model](../specs/260310-seqera-dataset-fs/data-model.md)

changelog.txt

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,37 @@
11
NEXTFLOW CHANGE-LOG
22
===================
3+
26.03.3-edge - 20 Apr 2026
4+
- Add -files-from option to lint command to avoid ARG_MAX limit (#6858) [5a3cd830c]
5+
- Add 26.04 migration docs (#7000) [89ec31bbf]
6+
- Add option to disable printing workflow outputs (#7018) [791bb449c]
7+
- Allow cloning from local Git repositories when `--offline` (#7035) [0fa6b5dbd]
8+
- Allow running pipeline from URL and main script path (#6602) [83196d4be]
9+
- Apply socket timeout to S3 CRT connections (#7024) [6f4a21764]
10+
- Filter autoLabels to selected workflow-metadata fields (#7049) [ddc974fe6]
11+
- Fix S3FileSystemProvider.newInputStream() draining full object on close (#7046) [cf3867604]
12+
- Fix formatting issues with complex expressions (#7027) [ce661d1d8]
13+
- Fix generated process name in `module create` command (#7008) [f3d8de796]
14+
- Fix inconsistent indentation in nf-amazon (#7047) [df6855d7d]
15+
- Fix module info formatting separator (#7033) [44dff8fcc]
16+
- Fix nextflowVersion for nf-tower and nf-seqera plugins [cbc0a2d8e]
17+
- Fix resolution of `-with-tower` with `TOWER_API_ENDPOINT` (#7045) [ce962e882]
18+
- Fix saveCacheFiles early return skipping log file uploads (#7015) [6fb704838]
19+
- Fusion GPU metrics collection (#7022) [6289635b8]
20+
- Honour process.resourceLabels in nf-seqera executor (#7048) [979f684ff]
21+
- Manage AWS SDK exceptions to convert to the appropriate IO exceptions (#6707) [39c755663]
22+
- Rename `module info` subcommand to `module view` (#7052) [7fa1109aa]
23+
- Resolve structured process input types (#7014) [583935d88]
24+
- Simplify demo module README template (#7051) [6d04c9ebc]
25+
- Suppress lint progress logging with `-q` flag (#6880) [61793bb6e]
26+
- Update missing pf4j updates (#7016) [f38f0067d]
27+
- Use Fusion trace metrics to replace bash command-trace wrapper (#7041) [de4376649]
28+
- Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 (#7042) [59d847d52]
29+
- Bump nf-amazon@3.8.3
30+
- Bump nf-k8s@1.5.2
31+
- Bump nf-seqera@0.18.0
32+
- Bump nf-tower@1.25.0
33+
- Bump nf-wave@1.19.1
34+
335
26.03.2-edge - 7 Apr 2026
436
- Add `module create` subcommand (#6992) [d6639a5e0]
537
- Add `module spec` command (#6859) [049e2a40e]

docs/cli.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -323,19 +323,19 @@ See {ref}`cli-module-list` for more information.
323323

324324
### Viewing module information
325325

326-
The `module info` command displays detailed metadata and usage information for a specific module from the registry.
326+
The `module view` command displays detailed metadata and usage information for a specific module from the registry.
327327

328328
Use this to understand module requirements, view input/output specifications, see available tools, or generate usage templates before installing or running a module.
329329

330330
```console
331-
$ nextflow module info nf-core/fastqc
332-
$ nextflow module info nf-core/fastqc -version 1.0.0
333-
$ nextflow module info nf-core/fastqc -output json
331+
$ nextflow module view nf-core/fastqc
332+
$ nextflow module view nf-core/fastqc -version 1.0.0
333+
$ nextflow module view nf-core/fastqc -output json
334334
```
335335

336336
The output includes the module's version, description, authors, keywords, tools, input/output channels, and a generated usage template showing how to run the module. Use `-output json` for machine-readable output suitable for programmatic access.
337337

338-
See {ref}`cli-module-info` for more information.
338+
See {ref}`cli-module-view` for more information.
339339

340340
### Running modules directly
341341

@@ -358,7 +358,7 @@ $ nextflow module run nf-core/salmon \
358358
-resume
359359
```
360360

361-
Process inputs can be specified like params on the command line. For example, `--reads reads.fq` corresponds to the `reads` input in the `nf-core/salmon` module. Run `nextflow module info nf-core/salmon` to see the available params for the module.
361+
Process inputs can be specified like params on the command line. For example, `--reads reads.fq` corresponds to the `reads` input in the `nf-core/salmon` module. Run `nextflow module view nf-core/salmon` to see the available params for the module.
362362

363363
See {ref}`cli-module-run` for more information.
364364

docs/migrations/26-04.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,8 @@ nextflow module run nf-core/fastqc --meta.id 1 --reads sample1.fastq
2323
# Search for modules in registry
2424
nextflow module search bwa
2525

26-
# Get info about a module
27-
nextflow module info nf-core/bwa/mem
26+
# View info about a module
27+
nextflow module view nf-core/bwa/mem
2828

2929
# Install a module
3030
nextflow module install nf-core/bwa/mem

docs/modules/using-modules.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -71,17 +71,17 @@ See {ref}`cli-module-list` for the full command reference.
7171

7272
## Viewing module information
7373

74-
Use the `module info` command to view metadata and a usage template for a module:
74+
Use the `module view` command to view metadata and a usage template for a module:
7575

7676
```console
77-
$ nextflow module info nf-core/fastqc
78-
$ nextflow module info nf-core/fastqc -version 0.0.0-0c7146d
77+
$ nextflow module view nf-core/fastqc
78+
$ nextflow module view nf-core/fastqc -version 0.0.0-0c7146d
7979
```
8080

8181
The output includes the module's version, URL, description, authors, maintainers, keywords, tools, input/output channels, and a generated usage template.
8282
Use `-output json` for machine-readable output.
8383

84-
See {ref}`cli-module-info` for the full command reference.
84+
See {ref}`cli-module-view` for the full command reference.
8585

8686
## Running modules directly
8787

@@ -92,7 +92,7 @@ $ nextflow module run nf-core/fastqc --meta.id=test_sample --reads sample1_R1.fa
9292
```
9393

9494
:::{tip}
95-
Run `nextflow module info` to see the available inputs for a module.
95+
Run `nextflow module view` to see the available inputs for a module.
9696
:::
9797

9898
The command automatically downloads the module if it is not already installed.
