Add Purdue RCAC institutional profiles (Bell, Gautschi, Negishi) #1085
pontus merged 11 commits into nf-core:master
The three Purdue profile tests (purdue_bell, purdue_gautschi, purdue_negishi) all pass. The 15 failing checks are for other institutions' profiles (alliance_canada, bi, incliva) and repo-wide lint/config jobs. These don't appear related to my changes and look like they may be pre-existing failures on master. Could a maintainer confirm whether I need to address anything here, or if these are upstream issues?
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
@copilot apply changes based on the comments in this thread
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
jfy133 left a comment
Comments on the first config apply to all subsequent configs :)
…remove nextflowVersion pin
Thanks for the thorough review. I've pushed a commit addressing all points:
Docs updated to describe the dynamic routing and the Slurm …
Thanks also @pontus for the cleaner error pattern.

            System.err.println("ERROR: purdue_gautschi params.gpu_partition must be 'smallgpu' or 'ai' (got '${params.gpu_partition}').")
            System.exit(1)
        }
        params.gpu_partition

How many GPU partitions do you have? Not a blocker, but I want to check whether there is a way to have Nextflow automatically pick a partition based on other task attributes (e.g. `task.memory` for a `largegpu` partition).
Two GPU partitions on Gautschi: `smallgpu` (2x L40, 24 h) and `ai` (8x H100, 14 d). I deliberately kept this as a user-selectable param rather than dynamic routing because access to GPU partitions on Gautschi is allocation-bound: users are entitled to the partition tied to their lab's purchase. A user with a `smallgpu` allocation auto-routed to `ai` (or vice versa) would simply get a Slurm submission rejection, with no fallback. Letting them set `--gpu_partition` matches the access model and avoids surprises. Happy to add a comment in the config explaining this.
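To make the trade-off concrete, here is a minimal sketch of how a Gautschi profile could combine the user-selected GPU partition with the required account and opt-in standby QoS described in the PR summary. It is not the merged `purdue_gautschi.config`: the parameter names (`cluster_account`, `use_standby`, `gpu_partition`) and the validation message come from this PR, but the `--qos=standby` flag, the 4 h cutoff test and the `validation` block layout are assumptions. GPU request flags (e.g. `--gres`) are omitted.

```groovy
// Hypothetical sketch, NOT the merged purdue_gautschi.config.
// Param names come from the PR; QoS name and 4 h cutoff are assumptions.
params {
    cluster_account = null        // required: RCAC mandates --account on all jobs
    use_standby     = false       // opt in with --use_standby true
    gpu_partition   = 'smallgpu'  // 'smallgpu' (2x L40, 24 h) or 'ai' (8x H100, 14 d)
}

// nf-schema scope: keeps the profile-specific params from triggering schema
// warnings (check how this interacts with any list the pipeline already sets).
validation {
    ignoreParams = ['cluster_account', 'use_standby', 'gpu_partition']
}

process {
    executor = 'slurm'

    // Evaluated per task, so the account check only fires when a job is submitted.
    clusterOptions = {
        if (!params.cluster_account) {
            System.err.println("ERROR: purdue_gautschi requires --cluster_account <account>.")
            System.exit(1)
        }
        def opts = "--account=${params.cluster_account}"
        // Assumed: only tasks short enough for the 4 h standby limit use the standby QoS.
        if (params.use_standby && task.time && task.time <= 4.h) {
            opts += ' --qos=standby'
        }
        return opts
    }

    withLabel: 'process_gpu' {
        // User-selected rather than derived from task.memory/task.time: GPU access
        // is tied to each lab's allocation, so an automatic pick could be rejected.
        queue = {
            if (!(params.gpu_partition in ['smallgpu', 'ai'])) {
                System.err.println("ERROR: purdue_gautschi params.gpu_partition must be 'smallgpu' or 'ai' (got '${params.gpu_partition}').")
                System.exit(1)
            }
            params.gpu_partition
        }
    }
}
```

A GPU run on an H100 allocation would then look like `nextflow run <pipeline> -profile purdue_gautschi --cluster_account <acct> --gpu_partition ai`, where `<pipeline>` and `<acct>` are placeholders.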
@pontus any last thoughts? If not, please merge if you are happy with this now :) (given other configs are still blocked)
Just to check before merging: there are checks for some environment variables (…
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
@pontus Good catch.
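The environment-variable handling raised above, listed in the design notes as `$RCAC_SCRATCH` with `$SCRATCH` and `$HOME` fallbacks, can be sketched with Groovy's Elvis operator so the config still evaluates on CI runners that lack the RCAC environment. The PR does not say which setting this chain feeds; applying it to `workDir`, and the `nextflow-work` subdirectory, are hypothetical choices for illustration.

```groovy
// Hypothetical sketch: first non-empty of RCAC_SCRATCH, SCRATCH, HOME.
// Using it for workDir (and the 'nextflow-work' name) is an assumption.
workDir = "${System.getenv('RCAC_SCRATCH') ?: System.getenv('SCRATCH') ?: System.getenv('HOME')}/nextflow-work"
```

Because the Elvis operator also skips empty strings, a variable that is set but blank falls through to the next candidate rather than producing a path rooted at `/`.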
name: New Config
about: A new cluster config

Please follow these steps before submitting your PR:
- … `[WIP]` in its title
- … `master` branch

Steps for adding a new config profile:
- … `conf/` directory
- … `docs/` directory
- … `nfcore_custom.config` file in the top-level directory
- … `profile:` scope in `.github/workflows/main.yml`
- … `.github/CODEOWNERS`

Summary
Adds three institutional profiles for Purdue University Rosen Center for Advanced Computing (RCAC) HPC clusters:
- `purdue_bell`: Bell (AMD EPYC 7662 Rome, 128c/256GB CPU)
- `purdue_gautschi`: Gautschi (AMD EPYC 9654 Genoa, 192c/384GB CPU + NVIDIA L40/H100 GPU)
- `purdue_negishi`: Negishi (AMD EPYC 7763 Milan, 128c/256GB CPU)

Design notes
- … `/usr/bin/singularity` is a symlink).
- `--cluster_account` (hard-fails if unset; RCAC mandates `--account` on all jobs). Added to `validation.ignoreParams` to suppress schema warnings.
- `--use_standby true` routes eligible jobs through the 4 h standby QoS.
- `process_gpu` label routing to `smallgpu` (L40, default) or `ai` (H100, via `--gpu_partition=ai`).
- … `/depot/itap/datasets/igenomes`.
- … `$RCAC_SCRATCH` with `$SCRATCH` and `$HOME` fallbacks (works in CI without RCAC env).

Testing
Each profile was validated on its target cluster with:

`nextflow run nf-core/demo -profile test,purdue_<cluster> --cluster_account <acct> --outdir $RCAC_SCRATCH/...`

Live runs on Bell, Gautschi, and Negishi produced `sacct` records with the expected `Partition`, `Account`, and `QOS` values. Gautschi was additionally validated with `--use_standby true` to confirm QoS propagation.

Not included
- `purdue_gilbreth` (GPU-only cluster; CPU-heavy nf-core steps would waste GPU nodes). Can be added later if a GPU-pipeline use case emerges.

Contact
Arun Seetharam, @aseetharam, aseethar@purdue.edu