
Ability to dedicate accelerator directive (GPU) over range #5570

@adamrtalbot

Description


New feature

When submitting to nodes with more than one GPU, Nextflow has very limited support for splitting work across the separate GPUs.

Usage scenario

Let's imagine running on AWS Batch. We submit multiple GPU-enabled tasks to the Batch service. AWS allocates them to a single, large instance with multiple GPUs, as per its allocation strategy, which prioritises the lowest per-CPU price.

In this case, every task can see and use all GPUs at the same time, leading to collisions and GPU memory issues.

We have some strategies to deal with this:

  • Use a specific machine size that can only fit a single GPU-enabled task
  • Use environment variables such as NVIDIA_VISIBLE_DEVICES to point each task at a specific GPU
  • Set maxForks to 1 so that only a single task executes at a time (see the sketch after this list)
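For illustration, a minimal sketch combining the second and third workarounds in a process definition; the process name and run_model.py script are hypothetical. Note that NVIDIA_VISIBLE_DEVICES is read by the NVIDIA container runtime at container start, so inside the task script the CUDA-level equivalent, CUDA_VISIBLE_DEVICES, is exported instead:

```nextflow
process inference {
    accelerator 1   // request a single GPU from the executor
    maxForks 1      // serialise execution: one task at a time

    input:
    path sample

    script:
    """
    # Every task is pinned to the same device, which is safe only
    # because maxForks prevents concurrent tasks
    export CUDA_VISIBLE_DEVICES=0
    run_model.py --input ${sample}
    """
}
```

This works, but it throws away all GPU parallelism on the node.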

However, we lack a way of saying "for a node with x GPUs, assign each task to one available GPU."

What I want is for each task to know which GPU it may use, and then use only that GPU.

Suggested implementation

I don't actually have a good fix here. Perhaps a process array with an index would help? Perhaps the solution is specific to each executor? Either way, I feel like Nextflow could expose a variable to help us here.
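As a rough sketch of the index idea using what Nextflow already exposes: task.index (the per-process task counter) could be mapped onto a hard-coded GPU count. This assumes all tasks land on the same 4-GPU instance, which AWS Batch does not guarantee, and it cannot promise that a freed GPU matches the computed index:

```nextflow
process inference {
    accelerator 1
    maxForks 4      // at most one concurrent task per GPU on a 4-GPU node

    input:
    path sample

    script:
    // Round-robin over 4 GPUs; task.index is 1-based.
    // Caveat: task 5 is assigned GPU 0 even if GPU 0 is still busy.
    def gpu = (task.index - 1) % 4
    """
    export CUDA_VISIBLE_DEVICES=${gpu}
    run_model.py --input ${sample}
    """
}
```

A built-in variable along these lines, but aware of which devices were actually allocated to the task, is essentially what this issue is asking for.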
