Commit 96be0b5

Merge branch 'master' into fix/nextflow-launch-workspace-secrets
2 parents 56470be + 916f029

185 files changed: 12152 additions & 5015 deletions
VERSION

Lines changed: 1 addition & 1 deletion

- 26.03.2-edge
+ 26.03.4-edge
Lines changed: 136 additions & 0 deletions

# NIO Filesystem for Seqera Platform Datasets

- Authors: Jorge Ejarque
- Status: draft
- Date: 2026-03-10
- Tags: nio, filesystem, seqera, datasets, nf-tower

Technical Story: Enable Nextflow pipelines to read Seqera Platform datasets as ordinary file paths using `seqera://` URIs.

## Summary

Add a Java NIO `FileSystemProvider` to the `nf-tower` plugin that registers the `seqera://` scheme, allowing pipelines to reference Seqera Platform datasets (CSV/TSV) as standard file paths without manual download steps. The implementation reuses the existing `TowerClient` for all HTTP communication, inheriting authentication and retry behaviour.

## Problem Statement

Nextflow users managing datasets on the Seqera Platform must currently download dataset files manually or through custom scripts before referencing them in pipelines. There is no native integration between Nextflow's file abstraction and the Seqera Platform dataset API. This creates friction in workflows where datasets are the primary input and forces users to handle authentication, versioning, and file staging outside the pipeline definition.

## Goals or Decision Drivers

- Transparent access to Seqera Platform datasets using standard Nextflow file path syntax
- Reuse of existing nf-tower plugin infrastructure (authentication, HTTP client, retry/backoff)
- Hierarchical path browsing matching the platform's org/workspace/dataset structure
- Extensible architecture that can support future Seqera-managed resource types (e.g. data-links)
- No new plugin or module: the feature lives within nf-tower

## Non-goals

- Streaming large datasets: the Platform API does not support streaming; content is fully buffered on download
- Implementing resource types beyond `datasets`: only the extensible architecture is required
- Local caching across pipeline runs: Nextflow's standard task staging handles caching
- Dataset management operations (delete, rename): the filesystem is read-only in the initial implementation
## Considered Options

### Option 1: Standalone plugin with dedicated HTTP client

A new `nf-seqera-fs` plugin with its own HTTP client configuration and authentication setup.

- Good, because it isolates the filesystem code from the nf-tower plugin
- Bad, because it duplicates authentication configuration and HTTP client setup
- Bad, because two separate HTTP clients sharing a refresh token would corrupt each other's auth state

### Option 2: NIO filesystem within nf-tower using TowerClient delegation

Add the filesystem to nf-tower, delegating all HTTP through the existing `TowerClient` singleton via a typed `SeqeraDatasetClient` wrapper.

- Good, because it shares authentication and token refresh with TowerClient
- Good, because it reuses existing retry/backoff configuration
- Good, because no new dependencies are needed

### Option 3: Direct HxClient usage within nf-tower

Add the filesystem to nf-tower but use `HxClient` directly rather than going through TowerClient.

- Good, because it gives full control over request construction
- Bad, because exposing HxClient internals couples the filesystem to implementation details
- Bad, because token refresh coordination with TowerClient becomes manual

## Solution or decision outcome

Option 2: NIO filesystem within nf-tower using TowerClient delegation. All HTTP calls go through `TowerClient.sendApiRequest()`, ensuring a single point of authentication and retry logic.
## Rationale & discussion

### Path Hierarchy

The `seqera://` path encodes the Platform's organizational structure directly:

```
seqera://                          → ROOT (directory, depth 0)
└── <org>/                         → ORGANIZATION (directory, depth 1)
    └── <workspace>/               → WORKSPACE (directory, depth 2)
        └── datasets/              → RESOURCE TYPE (directory, depth 3)
            └── <name>[@<version>] → DATASET (file, depth 4)
```

Each level is a directory except the leaf dataset, which is a file. Version pinning uses an `@version` suffix on the dataset name segment (e.g. `seqera://acme/research/datasets/samples@2`). Without it, the latest non-disabled version is resolved.
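As an illustrative sketch of the convention (the class and field names below are hypothetical and do not reflect the actual `SeqeraPath` implementation), the hierarchy and `@version` suffix can be parsed like this:

```java
// Hypothetical sketch of seqera:// path parsing; names are illustrative only.
public class SeqeraPathSketch {
    public final String org, workspace, resourceType, datasetName, version;

    public SeqeraPathSketch(String uri) {
        if (!uri.startsWith("seqera://"))
            throw new IllegalArgumentException("Not a seqera:// URI: " + uri);
        String[] parts = uri.substring("seqera://".length()).split("/");
        org          = parts.length > 0 ? parts[0] : null;  // depth 1
        workspace    = parts.length > 1 ? parts[1] : null;  // depth 2
        resourceType = parts.length > 2 ? parts[2] : null;  // depth 3
        // Depth 4 leaf: dataset name with optional '@version' suffix
        if (parts.length > 3) {
            String leaf = parts[3];
            int at = leaf.lastIndexOf('@');
            datasetName = at >= 0 ? leaf.substring(0, at) : leaf;
            version     = at >= 0 ? leaf.substring(at + 1) : null; // null => latest
        } else {
            datasetName = null;
            version = null;
        }
    }
}
```

A `null` version signals that the latest non-disabled version should be resolved.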
### Name-to-ID Resolution

The path uses human-readable names but the Platform API requires numeric IDs. Resolution is built from two API calls at filesystem initialization:

1. `GET /user-info` → obtain `userId`
2. `GET /user/{userId}/workspaces` → returns all accessible org/workspace pairs

This single source provides both directory listing content and name→ID mapping. Results are cached in `SeqeraFileSystem` with invalidation on write operations. `GET /orgs` is intentionally not used because it returns all platform orgs rather than only those the user is a member of.
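A minimal sketch of the resulting cache, using a simplified stand-in for `OrgAndWorkspaceDto` (all names here are illustrative, not the actual plugin classes):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the name->ID cache built from the single
// GET /user/{userId}/workspaces response; field names are illustrative.
public class WorkspaceCacheSketch {
    // Simplified stand-in for OrgAndWorkspaceDto
    public record OrgWorkspace(String orgName, long orgId,
                               String workspaceName, long workspaceId) {}

    private final Map<String, Long> idsByPath = new HashMap<>();

    public WorkspaceCacheSketch(List<OrgWorkspace> entries) {
        for (OrgWorkspace e : entries) {
            // Both directory levels are derivable from the one response
            idsByPath.put(e.orgName(), e.orgId());
            idsByPath.put(e.orgName() + "/" + e.workspaceName(), e.workspaceId());
        }
    }

    // Resolve "org" or "org/workspace" to its numeric Platform ID
    public Long resolve(String path) {
        return idsByPath.get(path);
    }
}
```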
### Component Structure

```
plugins/nf-tower/src/main/io/seqera/tower/plugin/
├── fs/                            ← NIO layer
│   ├── SeqeraFileSystemProvider   ← FileSystemProvider (scheme: "seqera")
│   ├── SeqeraFileSystem           ← FileSystem with org/workspace/dataset caches
│   ├── SeqeraPath                 ← Path implementation (depth 0–4)
│   ├── SeqeraFileAttributes       ← BasicFileAttributes
│   ├── SeqeraPathFactory          ← PF4J FileSystemPathFactory extension
│   └── DatasetInputStream         ← SeekableByteChannel over InputStream
├── dataset/                       ← API client layer
│   ├── SeqeraDatasetClient        ← Typed HTTP client wrapping TowerClient
│   ├── DatasetDto                 ← Dataset API response model
│   ├── DatasetVersionDto          ← Version API response model
│   ├── OrgAndWorkspaceDto         ← Org/workspace list model
│   └── WorkspaceOrgDto            ← Workspace/org mapping model
└── resources/META-INF/services/
    └── java.nio.file.spi.FileSystemProvider
```
### Key Design Decisions

1. **TowerClient delegation**: `SeqeraDatasetClient` delegates all HTTP through `TowerFactory.client()` → `TowerClient.sendApiRequest()`. This ensures shared authentication state and avoids the token refresh corruption that would occur with separate HTTP client instances.

2. **One filesystem per JVM**: `SeqeraFileSystemProvider` maintains a single `SeqeraFileSystem` keyed by scheme. This matches the `TowerClient` singleton-per-session pattern.

3. **Read-only initial scope**: The filesystem reports `isReadOnly()=true`. Write support (dataset upload via multipart POST) is deferred to a future iteration.

4. **Download filename constraint**: The Platform API's download endpoint (`GET /datasets/{id}/v/{version}/n/{fileName}`) requires the exact filename from upload time. The implementation always resolves `DatasetVersionDto.fileName` from `GET /datasets/{id}/versions` before constructing the download URL.

5. **Extensible resource types**: The path hierarchy reserves depth 3 for a resource type segment (currently only `datasets`). Adding support for data-links or other resource types requires only a new handler at the directory listing and I/O layers, with no changes to path resolution or authentication.

6. **Thread safety**: `SeqeraFileSystem` cache methods and `SeqeraFileSystemProvider` lifecycle methods are `synchronized`. The filesystem map uses `LinkedHashMap` with external synchronization rather than `ConcurrentHashMap`, matching the low-contention access pattern.
### Limitations

- **No size metadata**: `SeqeraFileAttributes.size()` returns 0 for all paths because the Platform API does not expose content length in dataset metadata.
- **Single endpoint per JVM**: The filesystem key is scheme-only; concurrent access to different Platform endpoints in the same JVM is not supported.
### Streaming Downloads

Dataset downloads use `TowerClient.sendStreamingRequest()`, which calls `HxClient.sendAsStream()`: the response body is returned as an `InputStream` streamed directly from the HTTP connection. This avoids the triple-buffering problem (`String` → `getBytes()` → `ByteArrayInputStream`) that would otherwise consume roughly 40 MB of heap per 10 MB dataset. The `HxClient.sendAsStream()` method goes through the same `sendWithRetry()` path as `sendAsString()`, so retry logic and token refresh are preserved.
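The heap saving comes down to bounded-buffer copying of the streamed body instead of materializing it. A minimal illustration (this is not the actual `DatasetInputStream` code, just the general technique):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch only: copying a streamed HTTP body in fixed-size
// chunks, so heap use stays bounded regardless of dataset size.
public class StreamCopySketch {
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];   // bounded buffer, never the full body
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```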
## Links

- [Spec](../specs/260310-seqera-dataset-fs/spec.md)
- [Implementation plan](../specs/260310-seqera-dataset-fs/plan.md)
- [Data model](../specs/260310-seqera-dataset-fs/data-model.md)
Lines changed: 180 additions & 0 deletions

# `hints` process directive for executor-specific scheduling hints

- Authors: Rob Syme
- Status: accepted
- Deciders: Paolo Di Tommaso, Ben Sherman, Rob Syme
- Date: 2026-03-23
- Tags: directive, executor, scheduling

## Summary

Introduce a `hints` process directive for executor-specific scheduling hints that don't map to existing directives.
## Problem Statement

Many executors can be configured in various ways on a per-task basis. For example:

- AWS Batch jobs can use *consumable resources* to limit concurrent job execution based on non-standard resources such as software license seats.
- Google Batch jobs can specify a *provisioning model* to control the use of spot vs on-demand VMs on a per-task basis.
- Seqera Scheduler supports a variety of resource and scheduling settings, including spot/on-demand provisioning.

These settings can be exposed by Nextflow as executor-specific config options, such as `google.batch.spot`, but config options are applied globally. In order to apply a setting to specific processes or tasks, it must be exposed as a process directive.

Process directives in Nextflow aim to provide a common vocabulary for executing tasks in many different environments. Directives such as `cpus`, `memory`, and `time` have broadly the same meaning across most executors, making it easier for users to write portable pipelines.

At the same time, many executors have custom settings not shared by other executors, and it is not practical to create a new process directive for every new setting. There are over 40 [process directives](https://docs.seqera.io/nextflow/reference/process#directives) at the time of writing, and every new directive adds cognitive load when a user is trying to find the right directive for a given situation.

A few generic process directives already exist:

- The `clusterOptions` directive can be used to specify command-line arguments, primarily for HPC schedulers
- The `ext` directive supports arbitrary key-values, but is designed primarily to customize the task script (e.g. tool arguments), not executor behavior
- The `resourceLabels` directive also supports arbitrary key-values, but is intended for tagging and tracking resources, not controlling them

A new directive is needed to support executor-specific settings at a per-task level in a structured manner, without bloating the set of process directives for every new custom setting.
## Goals

- Provide a way to apply executor-specific settings to individual processes or tasks
- Avoid the proliferation of narrow, executor-specific directives (e.g. `consumableResources`, `schedulingPolicy`, etc.)
- Provide a single extension point that executors can consume selectively
- Allow settings to be specified as key-values, providing validation where possible

## Non-goals

- Replacing existing directives (`cpus`, `memory`, `accelerator`, `queue`): those remain the right place for standard resources

## Decision

Introduce a `hints` process directive with namespaced keys. Executors consume the hints they understand and silently ignore the rest.
## Core Capabilities

### Syntax

The `hints` directive accepts a map of key-value pairs:

```groovy
// process definition
process runDragen {
    cpus 4
    memory '16 GB'
    hints consumableResources: ['my-dragen-license': 1, 'other-license': 2]

    script:
    """
    dragen --ref-dir /ref ...
    """
}
```

```groovy
// process config
process {
    withName: 'runDragen' {
        hints = [
            consumableResources: ['my-dragen-license': 1, 'other-license': 2]
        ]
    }
}
```

Keys are strings. Values may be any raw data type: strings, numbers, booleans, lists, or maps. Executors are responsible for defining which hints they recognize and what value type each hint expects.

In the above example, the `consumableResources` hint is given as a map of resource name to quantity. The AWS Batch executor supplies it to each job request using `ConsumableResourceProperties`.
### Namespacing

Keys can use dot-separated scopes to namespace settings as needed:

```groovy
hints consumableResources: ['my-dragen-license': 1]
hints 'scheduling.priority': 10
hints 'scheduling.provisioningModel': 'spot'
```

Keys can be routed to specific executors by prefixing them with the executor name and a slash (`/`):

```groovy
hints 'awsbatch/consumableResources': ['my-dragen-license': 1]
hints 'seqera/scheduling.provisioningModel': 'spot'
hints 'k8s/nodeSelector': 'gpu=true'
```

The executor prefix lets pipeline developers target a specific executor with assurance that the hint won't accidentally apply to other executors (e.g. if another executor adds support for the same hint in the future).
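The routing rule can be sketched as follows (the helper name is hypothetical; this is not the actual Nextflow implementation, just the selection logic the proposal describes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of executor-prefixed hint resolution.
public class HintRouterSketch {
    /**
     * Returns the hints visible to the given executor: unprefixed keys, plus
     * keys prefixed with "<executor>/" (with the prefix stripped). Keys
     * prefixed for other executors are silently ignored.
     */
    public static Map<String, Object> forExecutor(Map<String, Object> hints, String executor) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, Object> e : hints.entrySet()) {
            String key = e.getKey();
            int slash = key.indexOf('/');
            if (slash < 0)
                result.put(key, e.getValue());                      // unprefixed: visible to all
            else if (key.substring(0, slash).equals(executor))
                result.put(key.substring(slash + 1), e.getValue()); // targeted at this executor
        }
        return result;
    }
}
```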
### Validation

Nextflow should validate hints to the best of its ability, to catch errors such as typos:

- **Prefixed hints** can be validated against the set of hints declared by the corresponding executor. Unrecognized hints should be reported as errors.
- **Unprefixed hints** can be validated against the union of hints declared by all executors. Since unprefixed hints might be supported by executors that aren't currently loaded, unrecognized hints should be reported as warnings.
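The error-versus-warning policy above can be sketched like this (types and names are hypothetical; the actual validation hook would live wherever Nextflow resolves process directives):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed validation policy: errors for
// unknown prefixed hints, warnings for unknown unprefixed hints.
public class HintValidatorSketch {
    public enum Severity { OK, WARNING, ERROR }

    /**
     * @param declared map of executor name -> hint names that executor declares
     */
    public static Severity check(String key, Map<String, Set<String>> declared) {
        int slash = key.indexOf('/');
        if (slash >= 0) {
            // Prefixed: must be declared by that specific executor
            Set<String> known = declared.get(key.substring(0, slash));
            boolean ok = known != null && known.contains(key.substring(slash + 1));
            return ok ? Severity.OK : Severity.ERROR;
        }
        // Unprefixed: checked against the union of all declared hints;
        // a miss is only a warning, since an unloaded executor may support it
        for (Set<String> hints : declared.values())
            if (hints.contains(key))
                return Severity.OK;
        return Severity.WARNING;
    }
}
```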
### Multiple hint resolution

In the configuration, the `hints` directive uses *replacement semantics* when specified multiple times, meaning that each `hints` setting completely replaces any previous settings:

```groovy
process {
    // generic hint
    hints = [provisioningModel: 'spot']

    // specific hint replaces generic hint
    withLabel: 'dragen' {
        hints = [consumableResources: ['my-dragen-license': 1]]
    }
}
```

Within a process definition, the `hints` directive uses *accumulation semantics*, meaning that subsequent `hints` directives are accumulated:

```groovy
process runDragen {
    // multiple separate hints
    hints provisioningModel: 'spot'
    hints consumableResources: ['my-dragen-license': 1, 'other-license': 2]

    // equivalent to...
    hints (
        provisioningModel: 'spot',
        consumableResources: ['my-dragen-license': 1, 'other-license': 2]
    )

    // ...
}
```

This behavior is consistent with other directives such as `pod` and `resourceLabels`. In practice, it means that a given `hints` setting should specify all relevant hints for the given context.

For example, the `withLabel` selector above should also specify the `provisioningModel` hint if the intention is to preserve that hint for the selected processes:

```groovy
process {
    hints = [provisioningModel: 'spot']

    withLabel: 'dragen' {
        hints = [provisioningModel: 'spot', consumableResources: ['my-dragen-license': 1]]
    }
}
```

While this approach may lead to duplication, it gives users and developers more control over which hints are applied in a given context.
### Initial hint catalog

The following hints should be supported initially:

| Hint name | Value type | Executors | Use case |
|--|--|--|--|
| `consumableResources` | `Map<String, Integer>` | AWS Batch | License-aware scheduling ([#5917](https://github.com/nextflow-io/nextflow/issues/5917)) |
| `scheduling.priority` | `Integer` | AWS Batch | Job scheduling priority ([#6998](https://github.com/nextflow-io/nextflow/issues/6998)) |
| `scheduling.provisioningModel` | `String` | Google Batch | Spot VM scheduling ([#3530](https://github.com/nextflow-io/nextflow/issues/3530)) |
## Links

- [Community issue](https://github.com/nextflow-io/nextflow/issues/5917)

changelog.txt

Lines changed: 56 additions & 0 deletions

NEXTFLOW CHANGE-LOG
===================
26.03.4-edge - 25 Apr 2026
- Abort execution when platform telemetry error (#6827) [b1ad3f720]
- Add $schema ref to generated module spec (#7056) [c40d742f3]
- Add Apple container engine support (#7073) [2f7a3c455]
- Add hints process directive for executor-specific scheduling hints (#7034) [406358e03]
- Add Seqera NIO filesystem for datasets and refactor TowerClient/TowerObserver (#6946) [433b10a1f]
- Add workspaceId/computeEnvId to nf-seqera auto labels (#7059) [5e8276c00]
- Allow `-with-docker` to be used without a default container image (#7054) [41759d36e]
- Allow module run to run modules with local path (#7057) [e2c77c6b7]
- Default NXF_FUSION_TRACE to false (#7071) [5b4c8f0c1]
- Fix IllegalArgumentException when process.resourceLabels is a closure (#7068) [944977e3f]
- Fix resolution of params in resolved config text (#7072) [cb7133def]
- Propagate task.containerPlatform through Fusion container command (#7074) [b58a590bd]
- Remove arch config option from Seqera MachineRequirement (#7063) [da06e9a9d]
- Replace current cloud info URL call with cloudInfo client (#7065) [629184251]
- Restructure modules docs as a section and add registry steps (#7030) [29370f4bc]
- Update workflow outputs tutorial (#7060) [68d144b9c]
- Use toUriString for paths in work-dir and FilesEx error messages (#7075) [b535377cc]
- Bump nf-amazon@3.9.0
- Bump nf-google@1.27.2
- Bump nf-seqera@0.19.0
- Bump nf-tower@1.26.0
- Bump nf-wave@1.20.0

26.03.3-edge - 20 Apr 2026
- Add -files-from option to lint command to avoid ARG_MAX limit (#6858) [5a3cd830c]
- Add 26.04 migration docs (#7000) [89ec31bbf]
- Add option to disable printing workflow outputs (#7018) [791bb449c]
- Allow cloning from local Git repositories when `--offline` (#7035) [0fa6b5dbd]
- Allow running pipeline from URL and main script path (#6602) [83196d4be]
- Apply socket timeout to S3 CRT connections (#7024) [6f4a21764]
- Filter autoLabels to selected workflow-metadata fields (#7049) [ddc974fe6]
- Fix S3FileSystemProvider.newInputStream() draining full object on close (#7046) [cf3867604]
- Fix formatting issues with complex expressions (#7027) [ce661d1d8]
- Fix generated process name in `module create` command (#7008) [f3d8de796]
- Fix inconsistent indentation in nf-amazon (#7047) [df6855d7d]
- Fix module info formatting separator (#7033) [44dff8fcc]
- Fix nextflowVersion for nf-tower and nf-seqera plugins [cbc0a2d8e]
- Fix resolution of `-with-tower` with `TOWER_API_ENDPOINT` (#7045) [ce962e882]
- Fix saveCacheFiles early return skipping log file uploads (#7015) [6fb704838]
- Fusion GPU metrics collection (#7022) [6289635b8]
- Honour process.resourceLabels in nf-seqera executor (#7048) [979f684ff]
- Manage AWS SDK exceptions to convert to the appropriate IO exceptions (#6707) [39c755663]
- Rename `module info` subcommand to `module view` (#7052) [7fa1109aa]
- Resolve structured process input types (#7014) [583935d88]
- Simplify demo module README template (#7051) [6d04c9ebc]
- Suppress lint progress logging with `-q` flag (#6880) [61793bb6e]
- Update missing pf4j updates (#7016) [f38f0067d]
- Use Fusion trace metrics to replace bash command-trace wrapper (#7041) [de4376649]
- Bump org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.84 (#7042) [59d847d52]
- Bump nf-amazon@3.8.3
- Bump nf-k8s@1.5.2
- Bump nf-seqera@0.18.0
- Bump nf-tower@1.25.0
- Bump nf-wave@1.19.1

26.03.2-edge - 7 Apr 2026
- Add `module create` subcommand (#6992) [d6639a5e0]
- Add `module spec` command (#6859) [049e2a40e]
