Merged

Changes from 11 commits
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -7,6 +7,7 @@
**/unsw_katana** @jscgh
**/seadragon** @jiawku
**/fred_hutch** @derrik-gratz
**/nci_gadi** @georgiesamaha
**/roslin** @sguizard @donalddunbar
**/lrz_cm4** @nschan
**/crg** @joseespinosa
22 changes: 5 additions & 17 deletions conf/nci_gadi.config
@@ -1,43 +1,31 @@
// NCI Gadi nf-core configuration profile
params {
    config_profile_description = 'NCI Gadi HPC profile provided by nf-core/configs'
    config_profile_contact = 'Georgie Samaha (@georgiesamaha), Kisaru Liyanage (@kisarur), Matthew Downton (@mattdton)'
    config_profile_url = 'https://opus.nci.org.au/display/Help/Gadi+User+Guide'
    project = System.getenv("PROJECT")
    storage = "gdata/${params.project}+scratch/${params.project}"
}

// Enable use of Singularity to run containers
singularity {
    enabled = true
    autoMounts = true
    cacheDir = "/scratch/${params.project}/${System.getenv('USER')}/nxf_singularity_cache"
}

// Submit up to 300 concurrent jobs (Gadi exec max),
// poll job and queue status every 5 minutes,
// and limit submissions to 20 per minute
executor {
    queueSize = 300
    pollInterval = '5 min'
    queueStatInterval = '5 min'
    submitRateLimit = '20 min'
}

// Define process resource limits
process {
    executor = 'pbspro'
    // NOTE: 'project' and 'storage' are not standard Nextflow process
    // directives; they are supported only by the modified Nextflow module
    // installed on Gadi (see the comment thread below and docs/nci_gadi.md)
    project = "${params.project}"
    storage = "${params.storage}"

Comment thread:

Collaborator: Process doesn't have any of these directives?

Contributor (author): The version of Nextflow installed on Gadi has been modified to introduce these two new directives (see the docs). A note has been added to nci_gadi.md about this too.

Collaborator: That's certainly... creative.

I'd strongly recommend using `clusterOptions` instead, both because it allows users to bring their own versions of Nextflow and use them with the profile (e.g. if there's a lag in providing updated versions required for running some pipeline), and because it would help if linting ever starts complaining about unknown directives (I don't know of anything that currently does this, but it seems useful for catching typos).

Contributor (author, @kisarur, Apr 24, 2026): Thanks @pontus.

The version of Nextflow installed on Gadi includes more changes than just the introduction of these two new directives (e.g. support for job submission and monitoring with the route/exec queue structure available on Gadi), so users may not be able to bring their own unmodified version of Nextflow and expect it to work on Gadi.

Also, with Gadi's PBS Pro, storage locations must be specified using the `-l storage` resource directive in the PBS script. If we just add `-l storage=${params.storage}` to `clusterOptions`, Nextflow will override the resources specified via directives like `cpus`, `memory`, etc. A workaround would be to reconstruct the entire `-l` resource directive within `clusterOptions` (e.g. `-l storage=${params.storage},ncpus=${task.cpus},mem=${task.memory},...`), but I feel our current approach is cleaner.
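
For illustration only, here is a minimal sketch of the `clusterOptions` workaround described in the comment above. It is not what this profile does, and the closure form and the `toMega()` conversion are assumptions about how the full `-l` list might be rebuilt for Gadi's PBS Pro:

```nextflow
// Hypothetical alternative (NOT used by this config): rebuild the whole PBS
// `-l` resource list inside clusterOptions so that stock Nextflow could be used.
process {
    executor       = 'pbspro'
    clusterOptions = { "-l storage=${params.storage},ncpus=${task.cpus},mem=${task.memory.toMega()}MB" }
}
```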

Member: @kisarur I sort of feel uncomfortable with this (was it not possible to make a plugin rather than patch Nextflow?), but if that's what you need then we have to accept it.

However, please could you comment very clearly in the config wherever non-standard Nextflow is being used?

We would not want other users copying it into their configs when they are not using your special patched version of Nextflow.

Collaborator: Thanks for the info. I would also feel rather uncomfortable with that setup, but as there's no linter complaint I guess it's fine for now. I see Maxime has suggested comments to mark this as special (hopefully meaning people won't try to copy it); it would be great if you implemented those.

Reviewer: I would agree with James that this seems much better suited to a plugin that provides a custom executor than simply monkey-patching Nextflow. Custom process directives could still exist in that case, but I think it would be far clearer for users, and the plugin + executor would be embedded in the config too, which gives an additional indication that things are non-standard.

Reviewer: If it helps, we have been supporting Nextflow since before plugins were available. This is how it has been done for a while, and the initial focus was on getting Nextflow working at all.

    module = 'singularity'
    cache = 'lenient'
    stageInMode = 'symlink'
    // Route each task to a queue based on its memory request (see docs/nci_gadi.md)
    queue = { task.memory < 128.GB ? 'normalbw' : (task.memory >= 128.GB && task.memory <= 190.GB ? 'normal' : (task.memory > 190.GB && task.memory <= 1020.GB ? 'hugemembw' : '')) }
    beforeScript = 'module load singularity'
}

// Write custom trace file with outputs required for SU calculation
def trace_timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')
trace {
    enabled = true
    overwrite = false
    file = "./gadi-nf-core-trace-${trace_timestamp}.txt"
    fields = 'name,status,exit,duration,realtime,cpus,%cpu,memory,%mem,rss'
}
74 changes: 45 additions & 29 deletions docs/nci_gadi.md
@@ -2,71 +2,87 @@

nf-core pipelines have been successfully configured for use on the [Gadi HPC](https://opus.nci.org.au/display/Help/Gadi+User+Guide) at the National Computational Infrastructure (NCI), Canberra, Australia.

To run an nf-core pipeline at NCI Gadi, run the pipeline with `-profile singularity,nci_gadi`. This will download and launch the [`nci_gadi.config`](https://github.com/nf-core/configs/blob/master/conf/nci_gadi.config) which has been pre-configured with a setup suitable for the NCI Gadi HPC cluster.

## Access to NCI Gadi

Please be aware that you will need to have a user account, be a member of a Gadi project, and have a service unit allocation for your project in order to use this infrastructure. See the [NCI user guide](https://opus.nci.org.au/display/Help/Getting+Started+at+NCI) for details on getting access to Gadi.

## Launch an nf-core pipeline on Gadi

### Prerequisites

Before running the pipeline, you will need to load Nextflow and Singularity, both of which are globally installed modules on Gadi (under `/apps`). You can do this by running the commands below:

```bash
module purge
module load nextflow
module load singularity
```

### Execution command
You can then run the pipeline using:

```bash
module load nextflow
module load singularity

nextflow run <nf-core_pipeline>/main.nf \
    -profile singularity,nci_gadi \
    <additional flags>
```

### Cluster considerations

#### External network access

Please be aware that NCI Gadi HPC compute nodes **do not** have external network access. This means you will not be able to pull the workflow codebase or containers if you submit your `nextflow run` command as a job on any of the standard job queues (see the [nf-core documentation](https://nf-co.re/docs/usage/offline) for instructions on running pipelines offline). NCI currently recommends you run your Nextflow head job either in a GNU screen or tmux session within a [persistent session](https://opus.nci.org.au/spaces/Help/pages/241926895/Persistent+Sessions), or submit it as a job to the [copyq](https://opus.nci.org.au/display/Help/Queue+Structure).

For example, to run Nextflow in a GNU screen session within a persistent session:

```bash
persistent-sessions start -p <project> <ps_name>
ssh <ps_name>.<user>.<project>.ps.gadi.nci.org.au
screen -S <screen_name>
nextflow run ...
```

You can detach from the screen session using Ctrl+A, then D, and log out of the persistent session while the pipeline continues to run. Later, you can reconnect to the persistent session using the same `ssh` command and reattach to the screen session with: `screen -r <screen_name>`.
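
For example, to reconnect and reattach later (using the same placeholder names as above):

```bash
# Reconnect to the persistent session, then reattach to the screen session
ssh <ps_name>.<user>.<project>.ps.gadi.nci.org.au
screen -r <screen_name>
```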

#### Downloading containers

This config requires Nextflow to use [Singularity](https://www.nextflow.io/docs/latest/container.html#singularity) to execute processes. Before any process can be executed, the nf-core pipeline will first download the required container image to a local cache. This cache location can be specified using either the `$NXF_SINGULARITY_CACHEDIR` environment variable or the `singularity.cacheDir` setting in the Nextflow config file. `nci_gadi.config` specifies the download and storage location with:

```nextflow
singularity.cacheDir = "/scratch/${params.project}/${System.getenv('USER')}/nxf_singularity_cache"
```

See the [project accounting](#project-accounting) section below for details on `params.project`.

Furthermore, Singularity uses the `$SINGULARITY_CACHEDIR` directory to store intermediate image layers and files during pulls (note that this cache is only used when the required container is not already available in Nextflow's own Singularity cache, specified by `$NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`). By default, `$SINGULARITY_CACHEDIR` is set to `$HOME/.singularity/cache`. For pipelines involving many or large first-time container downloads, we recommend pointing this environment variable at a scratch location to avoid exceeding your home filesystem quota. For example, before running your `nextflow run` command, set the environment variable to a location on the scratch filesystem with:

```bash
export SINGULARITY_CACHEDIR=/scratch/$PROJECT/$USER/singularity_cache
```

#### Gadi queues and job submission

This config determines which Gadi queue your task jobs are submitted to based on the amount of memory requested. For the sake of resource and cost (service unit) efficiency, the following rules are applied:

- Tasks requesting **less than 128 GB** will be submitted to the `normalbw` queue
- Tasks requesting **128 GB to 190 GB** will be submitted to the `normal` queue
- Tasks requesting **more than 190 GB and up to 1020 GB** will be submitted to the `hugemembw` queue

Note that these are only baseline queue settings and may be adjusted depending on the goals of your pipeline run and the most efficient use of the HPC. You can make a local copy of the `nci_gadi.config` and modify the queue assignments as needed for specific processes or process groups. See the NCI Gadi [queue limit documentation](https://opus.nci.org.au/display/Help/Queue+Limits) for more information on the available queues and their associated charge rates.
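
A minimal sketch of such a local override, assuming a hypothetical process name; the selector and queue choice are illustrative only:

```nextflow
// Hypothetical override in a local copy of nci_gadi.config: pin one
// memory-hungry process to the hugemembw queue regardless of its request.
process {
    withName: 'BIG_ASSEMBLY_TASK' {
        queue = 'hugemembw'
    }
}
```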

### Project accounting

This config uses `params.project` to assign a project code to all task job submissions for billing purposes. By default, this is set to the environment variable `$PROJECT`. If you are a member of multiple Gadi projects, you can choose which project will be charged for your pipeline execution by setting `params.project` (`--project` on the command line) to the desired project code.

Similarly, `params.storage` (`--storage` on the command line) is used to specify the storage locations that the pipeline needs to access. By default, this is set to `gdata/${params.project}+scratch/${params.project}`.
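
For example, assuming hypothetical project codes `ab12` and `xy00` (the latter holding reference data the pipeline reads):

```bash
nextflow run <nf-core_pipeline>/main.nf \
    -profile singularity,nci_gadi \
    --project ab12 \
    --storage "gdata/ab12+scratch/ab12+gdata/xy00"
```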

Note: The version of Nextflow installed on Gadi has been modified to make it easier to specify resource options for jobs submitted to the cluster through the Nextflow process block (see NCI's [Gadi user guide](https://opus.nci.org.au/display/DAE/Nextflow) for more details). The values specified through the parameters above are passed into the process block in the `nci_gadi.config`.
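
For reference, the relevant (abridged) section of `nci_gadi.config` where these parameters are consumed:

```nextflow
process {
    executor = 'pbspro'
    // Non-standard directives supported only by Gadi's modified Nextflow
    project  = "${params.project}"
    storage  = "${params.storage}"
}
```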

## Resource usage

To help monitor the service unit (SU) cost of running workflows on Gadi, a plugin has been developed to generate a report in CSV or JSON format upon workflow completion. The `nf-gadi` plugin is available via the Nextflow plugin registry and can be enabled by adding `-plugins nf-gadi` to your Nextflow run command. See the [plugin project repository](https://github.com/AustralianBioCommons/nf-gadi) for more details.
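
For example:

```bash
nextflow run <nf-core_pipeline>/main.nf \
    -profile singularity,nci_gadi \
    -plugins nf-gadi
```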

Additionally, the Sydney Informatics Hub provides a script to collect per-task SU costs. Upon workflow completion, you can run the [gadi_nfcore_report.sh](https://github.com/Sydney-Informatics-Hub/HPC_usage_reports/blob/master/Scripts/gadi_nfcore_report.sh) script in your workflow execution directory. It collects resource information from the PBS log files printed to each task's `.command.log` and summarises resource requests and usage for each process in the output `gadi-nf-core-joblogs.tsv` file, which is useful for resource benchmarking and SU accounting. To run it:

```bash
bash gadi_nfcore_report.sh
```
