180 changes: 180 additions & 0 deletions adr/20260323-hints-process-directive.md
@@ -0,0 +1,180 @@
# `hints` process directive for executor-specific scheduling hints

- Authors: Rob Syme
- Status: accepted
- Deciders: Paolo Di Tommaso, Ben Sherman, Rob Syme
- Date: 2026-03-23
- Tags: directive, executor, scheduling

## Summary

Introduce a `hints` process directive for executor-specific scheduling hints that don't map to existing directives.

## Problem Statement

Many executors can be configured in various ways on a per-task basis. For example:

- AWS Batch jobs can use *consumable resources* to limit concurrent job execution based on non-standard resources such as software license seats.

- Google Batch jobs can specify a *provisioning model* to control the use of spot vs on-demand VMs on a per-task basis.

- Seqera Scheduler supports a variety of resource and scheduling settings, including spot/on-demand provisioning.

These settings can be exposed by Nextflow as executor-specific config options, such as `google.batch.spot`, but config options apply globally. To apply a setting to specific processes or tasks, it must be exposed as a process directive.

Process directives in Nextflow aim to provide a common vocabulary for executing tasks in many different environments. Directives such as `cpus`, `memory`, and `time` have broadly the same meaning across most executors, making it easier for users to write portable pipelines.

At the same time, many executors have custom settings not shared by other executors, and it is not practical to create a new process directive for every new setting. There are over 40 [process directives](https://docs.seqera.io/nextflow/reference/process#directives) at the time of writing, and every new directive adds cognitive load when a user is trying to find the right directive for a given situation.

There exist a few generic process directives already:

- The `clusterOptions` directive can be used to specify command-line arguments, primarily for HPC schedulers
- The `ext` directive supports arbitrary key-values, but is designed primarily to customize the task script (e.g. tool arguments), not executor behavior
- The `resourceLabels` directive also supports arbitrary key-values, but is intended for tagging and tracking resources, not controlling them

A new directive is needed to support executor-specific settings at a per-task level in a structured manner, without bloating the process directives for every new custom setting.

## Goals

- Provide a way to apply executor-specific settings to individual processes or tasks

- Avoid the proliferation of narrow, executor-specific directives (e.g. `consumableResources`, `schedulingPolicy`, etc.)

- Provide a single extension point that executors can consume selectively

- Allow settings to be specified as key-values, providing validation where possible

## Non-goals

- Replacing existing directives (`cpus`, `memory`, `accelerator`, `queue`), which remain the right place for standard resources

## Decision

Introduce a `hints` process directive with namespaced keys. Executors consume the hints they understand and silently ignore the rest.

## Core Capabilities

### Syntax

The `hints` directive accepts a map of key-value pairs:

```groovy
// process definition
process runDragen {
    cpus 4
    memory '16 GB'
    hints consumableResources: ['my-dragen-license': 1, 'other-license': 2]

    script:
    """
    dragen --ref-dir /ref ...
    """
}
```

```groovy
// process config
process {
    withName: 'runDragen' {
        hints = [
            consumableResources: ['my-dragen-license': 1, 'other-license': 2]
        ]
    }
}
```

Keys are strings. Values may be any raw data type: strings, numbers, booleans, lists, or maps. Executors are responsible for defining which hints they recognize and what value type each hint expects.

In the above example, the `consumableResources` hint is given as a map of resource name to quantity. The AWS Batch executor supplies it to each job request using `ConsumableResourceProperties`.

### Namespacing

Keys can use dot-separated scopes to namespace settings as needed:

```groovy
hints consumableResources: ['my-dragen-license': 1]
hints 'scheduling.priority': 10
hints 'scheduling.provisioningModel': 'spot'
```

Keys can be routed to specific executors by prefixing with the executor name and a slash (`/`):

```groovy
hints 'awsbatch/consumableResources': ['my-dragen-license': 1]
hints 'seqera/scheduling.provisioningModel': 'spot'
hints 'k8s/nodeSelector': 'gpu=true'
```

The executor prefix lets pipeline developers target a specific executor and be sure that the hint won't accidentally apply to other executors (e.g. if another executor adds support for the same hint in the future).
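The prefix convention can be pictured as a small key parser. The following is a minimal sketch in plain Groovy, not Nextflow's implementation: `parseHintKey` is a hypothetical helper, and it assumes the executor prefix is everything before the first slash.

```groovy
// Hypothetical helper (not part of Nextflow): split a hint key into
// its optional executor prefix and the hint name.
def parseHintKey(String key) {
    int slash = key.indexOf('/')
    if (slash < 0) {
        // unprefixed key: no executor restriction
        return [executor: null, name: key]
    }
    return [executor: key.substring(0, slash), name: key.substring(slash + 1)]
}

def parsed = parseHintKey('seqera/scheduling.provisioningModel')
println "${parsed.executor} -> ${parsed.name}"
```

Dots inside the hint name are left untouched, so scopes such as `scheduling.priority` stay part of the name rather than the prefix.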

### Validation

Nextflow should validate hints to the best of its ability, to catch errors such as typos:

- **Prefixed hints** can be validated against the set of hints declared by the corresponding executor. Unrecognized hints should be reported as errors.

- **Unprefixed hints** can be validated against the union of hints declared by all executors. Since unprefixed hints might be supported by executors that aren't currently loaded, unrecognized hints should be reported as warnings.
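The two validation rules above can be sketched as follows. This is an illustrative plain-Groovy outline, not the actual validator: the `declared` registry, its contents, and the `validateHints` helper are assumptions made for the example, as is treating an unknown executor prefix as an error.

```groovy
// Hypothetical registry of hints declared by each executor (illustrative values only).
def declared = [
    awsbatch: ['consumableResources'] as Set,
    seqera  : ['scheduling.provisioningModel'] as Set
]

// Hypothetical helper: returns a map with 'errors' and 'warnings' lists for the given hint keys.
def validateHints(Collection<String> keys, Map<String, Set<String>> declared) {
    def result = [errors: [], warnings: []]
    def allKnown = declared.values().flatten() as Set
    for (String key : keys) {
        int slash = key.indexOf('/')
        if (slash >= 0) {
            def executor = key.substring(0, slash)
            def name = key.substring(slash + 1)
            // prefixed: check against that executor's declared hints only
            // (assumption: an unknown executor name is also reported as an error)
            if (!(declared[executor]?.contains(name)))
                result.errors << key
        }
        else if (!allKnown.contains(key)) {
            // unprefixed: an executor that isn't loaded might support it, so only warn
            result.warnings << key
        }
    }
    return result
}
```

With this registry, a misspelled `awsbatch/consumableResource` would land in `errors`, while a misspelled unprefixed `scheduling.priorty` would only land in `warnings`.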

### Multiple hint resolution

In the process configuration, the `hints` setting uses *replacement semantics*: each `hints` setting that applies to a process completely replaces any previous setting:

```groovy
process {
    // generic hint
    hints = [provisioningModel: 'spot']

    // specific hint replaces generic hint
    withLabel: 'dragen' {
        hints = [consumableResources: ['my-dragen-license': 1]]
    }
}
```

Within a process definition, the `hints` directive uses *accumulation semantics*, meaning that subsequent `hints` directives are accumulated:

```groovy
process runDragen {
    // multiple separate hints
    hints provisioningModel: 'spot'
    hints consumableResources: ['my-dragen-license': 1, 'other-license': 2]

    // equivalent to...
    hints (
        provisioningModel: 'spot',
        consumableResources: ['my-dragen-license': 1, 'other-license': 2]
    )

    // ...
}
```

This behavior is consistent with other directives such as `pod` and `resourceLabels`. In practice, because configuration settings replace one another, a given `hints` setting in the configuration should specify all relevant hints for its context.

For example, the `withLabel` selector above should also specify the `provisioningModel` hint if the intention is to preserve that hint for the selected processes:

```groovy
process {
    hints = [provisioningModel: 'spot']

    withLabel: 'dragen' {
        hints = [provisioningModel: 'spot', consumableResources: ['my-dragen-license': 1]]
    }
}
```

While this approach may lead to duplication, it gives users and developers more control over which hints are applied in a given context.
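The difference between the two semantics can be modeled with ordinary Groovy maps. This is a rough sketch only: `accumulate` and `replace` are hypothetical helpers, not Nextflow APIs, and the sketch assumes that an accumulated entry with a repeated key overwrites the earlier one.

```groovy
// Accumulation (process definition): each `hints` call merges into the map so far.
// Assumption: a later call overwrites an earlier entry with the same key.
def accumulate(List<Map> calls) {
    calls.inject([:]) { acc, m -> acc + m }
}

// Replacement (configuration): the last applicable setting wins outright.
def replace(List<Map> settings) {
    settings ? settings.last() : [:]
}

def generic  = [provisioningModel: 'spot']
def specific = [consumableResources: ['my-dragen-license': 1]]

println accumulate([generic, specific])   // both hints are kept
println replace([generic, specific])      // only the specific hint remains
```

Under replacement, the `provisioningModel` hint is lost unless the more specific setting repeats it, which is why the `withLabel` block above restates it.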

### Initial hint catalog

The following hints should be supported initially:

| Hint name | Value type | Executors | Use case |
|--|--|--|--|
| `consumableResources` | `Map<String, Integer>` | AWS Batch | License-aware scheduling ([#5917](https://github.com/nextflow-io/nextflow/issues/5917)) |
| `scheduling.priority` | `Integer` | AWS Batch | Job scheduling priority ([#6998](https://github.com/nextflow-io/nextflow/issues/6998)) |
| `scheduling.provisioningModel` | `String` | Google Batch | Spot VM scheduling ([#3530](https://github.com/nextflow-io/nextflow/issues/3530)) |

## Links

- [Community issue](https://github.com/nextflow-io/nextflow/issues/5917)
39 changes: 39 additions & 0 deletions docs/executor.md
@@ -33,6 +33,14 @@ Resource requests and other job characteristics can be controlled via the follow
- {ref}`process-resourcelabels`
- {ref}`process-time`

The following {ref}`hints <process-hints>` are supported:

- `consumableResources`: Specify [AWS Batch consumable resources](https://docs.aws.amazon.com/batch/latest/userguide/resource-aware-scheduling.html) as a map of resource names to quantities. For example:

  ```nextflow
  hints consumableResources: ['my-license-a': 1, 'my-license-b': 2]
  ```

See {ref}`aws-batch` for more information.

(azurebatch-executor)=
@@ -441,6 +449,37 @@ Resource requests and other job characteristics can be controlled via the follow
- {ref}`process-memory`
- {ref}`process-time`

The following {ref}`hints <process-hints>` are supported:

- `machineRequirement.capacityMode`
- `machineRequirement.diskAllocation`
- `machineRequirement.diskEncrypted`
- `machineRequirement.diskIops`
- `machineRequirement.diskMountPath`
- `machineRequirement.diskSize`
- `machineRequirement.diskThroughputMiBps`
- `machineRequirement.diskType`
- `machineRequirement.machineTypes`
- `machineRequirement.maxSpotAttempts`
- `machineRequirement.provisioning`

Each hint overrides the corresponding field of the `seqera.executor.machineRequirement` config scope on a per-process basis. Keys may be used as-is or with the `seqera/` prefix to restrict them to this executor.

For example, to override the provisioning mode for a single process:

```nextflow
process hello {
    hints 'seqera/machineRequirement.provisioning': 'spotFirst'

    script:
    """
    your_command --here
    """
}
```
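For comparison, the same field can be set through the config scope, where it applies to every task run by the executor; a per-process hint then overrides it for that process only. A minimal sketch of the global setting, with `'ondemand'` chosen arbitrarily for illustration:

```groovy
// nextflow.config
// Default for all tasks submitted by the Seqera executor; a process-level
// `machineRequirement.provisioning` hint overrides it for that process.
seqera.executor.machineRequirement.provisioning = 'ondemand'
```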

See {ref}`config-seqera` for the full config reference.

### Disk support

When the {ref}`process-disk` directive is specified, the Seqera executor provisions storage for the task container. There are two disk allocation strategies:
48 changes: 24 additions & 24 deletions docs/reference/config.md
@@ -1403,6 +1403,12 @@ The `seqera.executor` scope configures the Seqera scheduler service for the {ref

The following settings are available:

`seqera.executor.autoLabels`
: When `true`, automatically adds workflow metadata labels to the session with the `nextflow.io/` prefix (default: `false`). The following labels are added: `projectName`, `userName`, `runName`, `sessionId`, `resume`, `revision`, `commitId`, `repository`, `manifestName`, `runtimeVersion`. A `seqera.io/runId` label is also added, computed as a SipHash of the session ID and run name.

`seqera.executor.computeEnvId`
: The Seqera Platform compute environment ID. When specified, the scheduler resolves the compute environment directly by this ID instead of inferring a suitable compute environment. Used as a fallback when the workflow launch does not include a compute environment reference.

`seqera.executor.endpoint`
: The Seqera scheduler service endpoint URL (required).

@@ -1412,41 +1418,35 @@ The following settings are available:
`seqera.executor.region`
: The cloud region for task execution.

`seqera.executor.computeEnvId`
: The Seqera Platform compute environment ID. When specified, the scheduler resolves the compute environment directly by this ID instead of inferring a suitable compute environment. Used as a fallback when the workflow launch does not include a compute environment reference.

`seqera.executor.autoLabels`
: When `true`, automatically adds workflow metadata labels to the session with the `nextflow.io/` prefix (default: `false`). The following labels are added: `projectName`, `userName`, `runName`, `sessionId`, `resume`, `revision`, `commitId`, `repository`, `manifestName`, `runtimeVersion`. A `seqera.io/runId` label is also added, computed as a SipHash of the session ID and run name.

`seqera.executor.machineRequirement.provisioning`
: The instance provisioning mode. Can be `'spot'`, `'ondemand'`, or `'spotFirst'`.

`seqera.executor.machineRequirement.maxSpotAttempts`
: The maximum number of spot retry attempts before falling back to on-demand. Only used when `provisioning` is `'spot'` or `'spotFirst'`.

`seqera.executor.machineRequirement.machineFamilies`
: List of acceptable EC2 instance families, e.g. `['m5', 'c5', 'r5']`.
`seqera.executor.taskEnvironment`
: Custom environment variables to apply to all tasks submitted by the Seqera executor. These are merged with the Fusion environment variables, with Fusion variables taking precedence. For example: `taskEnvironment = [MY_VAR: 'value']`.

`seqera.executor.machineRequirement.diskAllocation`
: The disk allocation strategy. Can be `'task'` (default) for per-task EBS volumes, or `'node'` for per-node instance storage. When using `'node'` allocation, EBS-specific options (`diskType`, `diskIops`, `diskThroughputMiBps`, `diskEncrypted`) are not applicable.

`seqera.executor.machineRequirement.diskType`
: The EBS volume type for task scratch disk. Supported types: `'ebs/gp3'` (default), `'ebs/gp2'`, `'ebs/io1'`, `'ebs/io2'`, `'ebs/st1'`, `'ebs/sc1'`. Only applicable when `diskAllocation` is `'task'`.

`seqera.executor.machineRequirement.diskThroughputMiBps`
: The throughput in MiB/s for gp3 volumes (125-1000). Default: `325` (Fusion recommended). Only applicable when `diskAllocation` is `'task'`.
`seqera.executor.machineRequirement.diskEncrypted`
: Enable KMS encryption for the EBS volume (default: `false`). Only applicable when `diskAllocation` is `'task'`.

`seqera.executor.machineRequirement.diskIops`
: The IOPS for io1/io2/gp3 volumes. Required for io1/io2 volume types. Only applicable when `diskAllocation` is `'task'`.

`seqera.executor.machineRequirement.diskEncrypted`
: Enable KMS encryption for the EBS volume (default: `false`). Only applicable when `diskAllocation` is `'task'`.

`seqera.executor.machineRequirement.diskMountPath`
: The container path where the disk is mounted (default: `'/tmp'`). Applicable to all disk allocation strategies.

`seqera.executor.taskEnvironment`
: Custom environment variables to apply to all tasks submitted by the Seqera executor. These are merged with the Fusion environment variables, with Fusion variables taking precedence. For example: `taskEnvironment = [MY_VAR: 'value']`.
`seqera.executor.machineRequirement.diskThroughputMiBps`
: The throughput in MiB/s for gp3 volumes (125-1000). Default: `325` (Fusion recommended). Only applicable when `diskAllocation` is `'task'`.

`seqera.executor.machineRequirement.diskType`
: The EBS volume type for task scratch disk. Supported types: `'ebs/gp3'` (default), `'ebs/gp2'`, `'ebs/io1'`, `'ebs/io2'`, `'ebs/st1'`, `'ebs/sc1'`. Only applicable when `diskAllocation` is `'task'`.

`seqera.executor.machineRequirement.machineTypes`
: List of acceptable EC2 instance families. For example, `['m5', 'c5', 'r5']`.

`seqera.executor.machineRequirement.maxSpotAttempts`
: The maximum number of spot retry attempts before falling back to on-demand. Only used when `provisioning` is `'spot'` or `'spotFirst'`.

`seqera.executor.machineRequirement.provisioning`
: The instance provisioning mode. Can be `'spot'`, `'ondemand'`, or `'spotFirst'`.

`seqera.executor.retryPolicy.delay`
: The initial delay when a failing HTTP request is retried (default: `'450ms'`).
31 changes: 31 additions & 0 deletions docs/reference/process.md
@@ -840,6 +840,37 @@ The above example produces:
[4, D]
```

(process-hints)=

### hints

The `hints` directive specifies executor-specific hints as key-value pairs. Each executor uses the hints it recognizes and ignores the rest. Hint values can be any raw value (i.e., numbers, strings, booleans, lists, and maps).

Unprefixed keys are available to **every** executor: any executor that recognizes the key consumes it. For example:

```nextflow
process hello {
    hints consumableResources: ['my-license': 1]

    script:
    """
    your_command --here
    """
}
```

To restrict a hint to a single executor, prefix the key with the executor name:

```nextflow
hints 'awsbatch/consumableResources': ['my-license': 1]
```

When the same hint is provided both unprefixed and with a matching executor prefix, the prefixed form takes precedence for that executor.
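The precedence rule can be illustrated with a small plain-Groovy sketch. `resolveHints` is a hypothetical helper rather than a Nextflow function; it only models which entries a given executor would see and does not check whether the executor actually recognizes each hint.

```groovy
// Hypothetical helper: compute the hints visible to one executor.
def resolveHints(Map<String, Object> hints, String executor) {
    def result = [:]
    // first pass: unprefixed hints are available to every executor
    hints.each { key, value ->
        if (!key.contains('/')) result[key] = value
    }
    // second pass: hints prefixed for this executor override the unprefixed form
    def prefix = executor + '/'
    hints.each { key, value ->
        if (key.startsWith(prefix)) result[key.substring(prefix.length())] = value
    }
    return result
}

def hints = [
    'consumableResources'                   : ['my-license': 1],
    'awsbatch/consumableResources'          : ['my-license': 2],
    'seqera/machineRequirement.provisioning': 'spot'
]

println resolveHints(hints, 'awsbatch')
println resolveHints(hints, 'seqera')
```

In this model the `awsbatch` executor sees the prefixed value for `consumableResources`, while any other executor sees the unprefixed one and never sees hints prefixed for someone else.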

Calling `hints` multiple times in a process definition accumulates entries, with later calls overwriting entries for the same key. Setting `hints` via configuration (e.g., in `nextflow.config`) replaces the entire map.

See {ref}`executor-page` for the hints recognized by each executor.

(process-label)=

### label