
Add Purdue RCAC institutional profiles (Bell, Gautschi, Negishi)#1085

Merged
pontus merged 11 commits into nf-core:master from aseetharam:purdue-rcac-profiles
Apr 15, 2026

Conversation

@aseetharam
Contributor

@aseetharam aseetharam commented Apr 13, 2026


name: New Config
about: A new cluster config

Please follow these steps before submitting your PR:

  • If your PR is a work in progress, include [WIP] in its title
  • Your PR targets the master branch
  • You've included links to relevant issues, if any

Steps for adding a new config profile:

  • Add your custom config file to the conf/ directory
  • Add your documentation file to the docs/ directory
  • Add your custom profile to the nfcore_custom.config file in the top-level directory
  • Add your profile name to the profile: scope in .github/workflows/main.yml
  • Add your custom profile path and GitHub user name to .github/CODEOWNERS

Summary

Adds three institutional profiles for Purdue University Rosen Center for Advanced Computing (RCAC) HPC clusters:

  • purdue_bell — Bell (AMD EPYC 7662 Rome, 128c/256GB CPU)
  • purdue_gautschi — Gautschi (AMD EPYC 9654 Genoa, 192c/384GB CPU + NVIDIA L40/H100 GPU)
  • purdue_negishi — Negishi (AMD EPYC 7763 Milan, 128c/256GB CPU)

Design notes

  • Separate profiles per cluster; shared structure, cluster-specific partition and resource mappings.
  • Container runtime: Apptainer (system default on all three; /usr/bin/singularity is a symlink).
  • Required user param: --cluster_account (hard-fails if unset; RCAC mandates --account on all jobs). Added to validation.ignoreParams to suppress schema warnings.
  • Opt-in --use_standby true routes eligible jobs through the 4 h standby QoS.
  • Gautschi exposes process_gpu label routing to smallgpu (L40, default) or ai (H100, via --gpu_partition=ai).
  • Bell and Negishi intentionally do not expose GPU labels: their GPU partitions are AMD ROCm hardware, incompatible with CUDA-only nf-core GPU modules.
  • Shared iGenomes mirror at /depot/itap/datasets/igenomes.
  • Container cache and work dir use $RCAC_SCRATCH with $SCRATCH and $HOME fallbacks (works in CI without RCAC env).
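As a rough illustration of how these design notes fit together, a minimal, hypothetical sketch of the profile pattern (error messages and structure are illustrative, not the merged config):

```nextflow
// Hypothetical sketch of the pattern described above; not the merged config.
params {
    cluster_account = null   // required: RCAC mandates --account on all jobs
    use_standby     = false  // opt-in routing through the 4 h standby QoS
}

process {
    executor = 'slurm'
    clusterOptions = {
        // Hard-fail with a clean message if no account is given
        if (!params.cluster_account) {
            System.err.println("ERROR: this profile requires --cluster_account <slurm_account>.")
            System.exit(1)
        }
        def opts = "--account=${params.cluster_account}"
        if (params.use_standby && task.time <= 4.h) {
            opts += ' --qos=standby'
        }
        opts
    }
}
```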

Testing

Each profile validated on its target cluster with:

```bash
nextflow run nf-core/demo -profile test,purdue_<cluster> --cluster_account <acct> --outdir $RCAC_SCRATCH/...
```

Live runs on Bell, Gautschi, and Negishi produced sacct records with the expected Partition, Account, and QOS values. Gautschi additionally validated with --use_standby true to confirm QoS propagation.

Not included

  • purdue_gilbreth (GPU-only cluster; CPU-heavy nf-core steps would waste GPU nodes). Can be added later if a GPU-pipeline use case emerges.
  • Anvil (ACCESS resource); separate effort.

Contact

Arun Seetharam, @aseetharam, aseethar@purdue.edu

@aseetharam
Contributor Author

The three Purdue profile tests (purdue_bell, purdue_gautschi, purdue_negishi) all pass. The 15 failing checks are for other institutions' profiles (alliance_canada, bi, incliva) and repo-wide lint/config jobs. These don't appear related to my changes and look like they may be pre-existing failures on master. Could a maintainer confirm whether I need to address anything here, or if these are upstream issues?


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.



Comment thread docs/purdue_negishi.md Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread docs/purdue_bell.md Outdated
Comment thread docs/purdue_gautschi.md Outdated
@aseetharam
Contributor Author

@copilot apply changes based on the comments in this thread

3 similar comments

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.



Comment thread conf/purdue_negishi.config Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread conf/purdue_negishi.config Outdated
Comment thread conf/purdue_gautschi.config Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread conf/purdue_gautschi.config Outdated
Comment thread conf/purdue_gautschi.config Outdated
Member

@jfy133 jfy133 left a comment


Comments on the first config apply to all subsequent configs :)

Comment thread .github/CODEOWNERS Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread conf/purdue_bell.config Outdated
Comment thread conf/purdue_bell.config Outdated
@aseetharam
Contributor Author

Thanks for the thorough review. I've pushed a commit addressing all points:

  • CODEOWNERS: switched to the **/purdue_** wildcard, matching the pattern of **/crg**, **/iris**, etc.
  • Removed nextflowVersion pin from all three configs; letting pipelines decide.
  • Added use_standby (and gpu_partition on Gautschi) to validation.ignoreParams.
  • Updated error message to "profile requires..." wording.
  • Switched from throw new IllegalArgumentException to System.err.println + System.exit(1) across all validation closures so users get a clean error instead of a Java stack trace. Kept this inside the clusterOptions closures because top-level validation blocks conflicted with Nextflow 26 strict-config syntax earlier in this PR.
  • Biggest change per your comment on process_high_memory: removed all withLabel resource overrides and replaced with a dynamic top-level queue = { task.memory > 256.GB ? 'highmem' : 'cpu' } closure. The old process_long override (which stripped --qos=standby) is also gone; the standby flag is now gated by task.memory <= 256.GB && task.time <= 4.h inside clusterOptions, so it's inactive for long or high-memory jobs automatically. This means pipelines fully own their resource requests now; the profile just picks the right partition and applies the right account/QoS flags.
  • Removed trace/report/timeline/dag blocks; relying on pipeline defaults.
  • Kept withLabel: process_gpu on Gautschi only (GPU routing is label-based, not memory-based).

Docs updated to document the dynamic routing and the Slurm >= 65 / >= 49 core floors on highmem. Re-reviewed the >=48 vs >=49 convention per your Gautschi-specific suggestion and used the latter.
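The dynamic routing described above could be sketched roughly like this (partition names and thresholds are taken from this thread; everything else is illustrative, not the merged config):

```nextflow
process {
    // Pipelines own their resource requests; the profile only routes them.
    queue = { task.memory > 256.GB ? 'highmem' : 'cpu' }
    clusterOptions = {
        def opts = "--account=${params.cluster_account}"
        // Standby is gated so long or high-memory jobs never request it
        if (params.use_standby && task.memory <= 256.GB && task.time <= 4.h) {
            opts += ' --qos=standby'
        }
        opts
    }
}
```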

Thanks also @pontus for the cleaner error pattern.

@aseetharam aseetharam requested review from jfy133 and pontus April 14, 2026 15:53
```nextflow
System.err.println("ERROR: purdue_gautschi params.gpu_partition must be 'smallgpu' or 'ai' (got '${params.gpu_partition}').")
System.exit(1)
}
params.gpu_partition
```
Member


How many GPU partitions do you have? Not a blocker, but I want to check whether there is a way to have Nextflow automatically pick a partition based on other task attributes (e.g. task.memory for a largegpu partition).

Contributor Author


Two GPU partitions on Gautschi: smallgpu (2x L40, 24 h) and ai (8x H100, 14 d). I deliberately kept this as a user-selectable param rather than dynamic routing because access to GPU partitions on Gautschi is allocation-bound. Users are entitled to the partition tied to their lab's purchase. A user with a smallgpu allocation auto-routed to ai (or vice versa) would just hit a Slurm submission rejection, no fallback. Letting them set --gpu_partition matches the access model and avoids surprises. Happy to add a comment in the config explaining this.
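A sketch of the label-based routing this access model implies (the `--gres` string and overall structure are assumptions for illustration, not the merged config):

```nextflow
process {
    withLabel: process_gpu {
        // User-selected partition: 'smallgpu' (2x L40) or 'ai' (8x H100).
        // Deliberately no automatic fallback: partition access is
        // allocation-bound, so a wrong guess would just be rejected by Slurm.
        queue = { params.gpu_partition }
        clusterOptions = { "--account=${params.cluster_account} --gres=gpu:1" }
    }
}
```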

Comment thread docs/purdue_bell.md Outdated
Comment thread docs/purdue_gautschi.md Outdated
Comment thread docs/purdue_negishi.md Outdated
@jfy133
Member

jfy133 commented Apr 15, 2026

@pontus any last thoughts? If not, please merge if you are happy with this now :) (given we still have other configs blocking)

@pontus
Collaborator

pontus commented Apr 15, 2026

Just to check before merging: there are checks for some environment variables (RCAC_SCRATCH, SCRATCH), and if either of those is set, the Apptainer cache dir is set to that. If those are job-dependent, it seems there will be no persistent cache (which seems bad); if they are global, it seems there will be conflicts with ownership etc. If they are set to some place that's user-unique and repeatable, it seems good, though.

aseetharam and others added 3 commits April 15, 2026 07:05
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
@aseetharam
Contributor Author

@pontus Good catch. $RCAC_SCRATCH on Bell/Gautschi/Negishi is set centrally by RCAC and resolves to /scratch/<cluster>/<username> for every user, persistent across sessions and inherited by Slurm jobs. So it's user-unique and stable, satisfying both your concerns. RCAC also provides /usr/local/bin/findscratch which returns the same path (e.g. /scratch/gautschi/aseethar), so the convention is documented and stable on their end.
The fallback to $SCRATCH is defensive, in case someone runs from an environment that overrides RCAC_SCRATCH (unusual). The final fallback to $HOME is a last resort and not recommended (RCAC home quotas are tight), but it prevents Nextflow from failing outright if both env vars are unset.
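The fallback chain could be expressed along these lines (a hedged sketch; the cache path and option names here are illustrative, and the merged config may differ):

```nextflow
// Resolve a user-unique, persistent scratch base:
// RCAC_SCRATCH (set centrally by RCAC) -> SCRATCH -> HOME (last resort)
def scratchBase = System.getenv('RCAC_SCRATCH')
    ?: System.getenv('SCRATCH')
    ?: System.getenv('HOME')

apptainer {
    enabled    = true
    autoMounts = true
    cacheDir   = "${scratchBase}/.apptainer/nf-core/cache"
}
```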

@pontus pontus merged commit 53b74a0 into nf-core:master Apr 15, 2026
148 of 161 checks passed
@aseetharam aseetharam deleted the purdue-rcac-profiles branch April 15, 2026 12:38