
feat: UTD Juno cluster config #895

Merged
eternal-flame-AD merged 6 commits into nf-core:master from eternal-flame-AD:master on Jun 22, 2025

Conversation

@eternal-flame-AD
Contributor


name: New Config
about: A new cluster config

Please follow these steps before submitting your PR:

  • If your PR is a work in progress, include [WIP] in its title
  • Your PR targets the master branch
  • You've included links to relevant issues, if any

Steps for adding a new config profile:

  • Add your custom config file to the conf/ directory
  • Add your documentation file to the docs/ directory
  • Add your custom profile to the nfcore_custom.config file in the top-level directory
  • Add your custom profile to the README.md file in the top-level directory
  • Add your profile name to the profile: scope in .github/workflows/main.yml
  • OPTIONAL: Add your custom profile path and GitHub user name to .github/CODEOWNERS (**/<custom-profile>** @<github-username>)
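For orientation, a minimal sketch of what such a custom config typically contains (illustrative values only, not this PR's actual conf/utd_juno.config):

```groovy
// Minimal sketch of an institutional config; all values are illustrative,
// not the actual conf/utd_juno.config added by this PR.
params {
    config_profile_description = 'UT Dallas Juno cluster profile'
    config_profile_contact     = 'Your Name (@your-github-handle)'   // hypothetical contact
    config_profile_url         = 'https://example.edu/hpc'           // hypothetical URL
}

process {
    executor = 'slurm'
    queue    = 'normal'   // hypothetical partition name
}

singularity {
    enabled    = true
    autoMounts = true
}
```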

Signed-off-by: eternal-flame-AD <yume@yumechi.jp>
@eternal-flame-AD eternal-flame-AD changed the title from [WIP] feat: UTD Juno cluster config to feat: UTD Juno cluster config on Apr 26, 2025
@eternal-flame-AD
Contributor Author

Requesting a review on this (the request-a-review feature isn't active for me), thanks.

I would also appreciate some guidance on how to systematically test against existing pipelines; I have only tested toy CPU and GPU jobs so far.

@pontus
Collaborator

pontus commented Apr 27, 2025

@nf-core-bot fix linting please

Comment thread conf/utd_juno.config Outdated
Comment thread .github/CODEOWNERS
Comment thread docs/utd_juno.md
All of the intermediate files required to run the pipeline will be stored in the `work/` directory. It is recommended to delete this directory after the pipeline has finished successfully because it can get quite large, and all of the main output files will be saved in the `results/` directory anyway.

> [!NOTE]
> You will need an account to use the Juno HPC cluster in order to run the pipeline.
Collaborator

Personal opinion: maybe it would make sense to have a single profile for UTD systems, to make life easier for users, if the systems are aligned enough?

Contributor Author

I am not sure about the entire situation, but we would certainly need an escape hatch; it doesn't seem like access to all the clusters will be unified in the near future.

Or are you suggesting we try to detect which system we are running on, and then select the queue?

Collaborator

There may be different opinions, and I'm not withholding approval because of this, but to me it typically makes a lot more sense to have a profile for the institution/site/department/provider rather than any number of different profiles for different clusters.

Member

@pontus due to profile/config inheritance issues with DSL2, it was recommended to us (somewhere, I can't find it now) that it's better to have singular config files per cluster rather than 'sub profiles'.

So indeed utd_juno, and in the future utd_ganymede etc., is valid.

Note it also makes it easier to deprecate older clusters etc.

Collaborator

I'm not sure what issues with DSL2 those would have been, but there are definitely some things related to the upcoming strict syntax that make it harder (though to me, not to a degree where the added work of configuration maintenance outweighs the burden on users).

But I can say I agree this is something one can have different opinions about :)

To me, it seems having separate profiles makes it harder to deprecate old clusters: rather than just changing a config file and possibly the docs, there's more that needs to be pulled down.

Comment thread conf/utd_juno.config Outdated
@pontus
Collaborator

pontus commented Apr 27, 2025

There's a lot of lifting to pick the right Slurm options for GPU choices, but I don't see anything setting singularity.runOptions properly to provide those GPUs inside the container.
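For reference, a minimal sketch (not the PR's actual config) of what that could look like; Singularity's `--nv` flag binds the host's NVIDIA driver libraries and device files into the container:

```groovy
// Minimal sketch, assuming Singularity with NVIDIA GPUs: `--nv` binds the
// host NVIDIA driver libraries and /dev/nvidia* devices into the container.
singularity {
    enabled    = true
    runOptions = '--nv'
}
```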

@pontus
Collaborator

pontus commented Apr 27, 2025

GPU handling for nf-core pipelines isn't really standardised yet, and certainly not at the level queues are defined here, but the process_gpu label is already used by many modules, so it might be nice to try to provide a GPU for those.
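As a hedged illustration of what such routing could look like (the queue name and Slurm options are assumptions, not Juno's actual values):

```groovy
// Hedged sketch: route processes carrying the nf-core process_gpu label to
// a GPU partition. 'gpu' and the gres string are hypothetical values.
process {
    withLabel: 'process_gpu' {
        queue          = 'gpu'             // hypothetical GPU partition
        clusterOptions = '--gres=gpu:1'    // ask Slurm for one GPU
    }
}
```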

@pontus
Collaborator

pontus commented Apr 27, 2025

For testing, simply running through the test profiles for some popular pipelines seems sensible (the nf-core template defines at least test and test_full profiles that can be expected to be available).
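For example (an illustrative invocation; any nf-core pipeline follows the same pattern), something like `nextflow run nf-core/rnaseq -profile test,utd_juno --outdir <outdir>` would exercise the new config against a pipeline's bundled test data.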

@eternal-flame-AD
Contributor Author

eternal-flame-AD commented Apr 27, 2025

I had a couple of discussions with Edmund too about the GPU situation, especially with regard to container environments. This cluster has only one node that truly has a single H100 and one A30 GPU you can use as a whole; on the other nodes you have to submit 2 or 4 runs to one machine to make use of them. It's highly related to nextflow-io/nextflow#3909.

Currently my personal workaround for Nextflow is a global semaphore program that assigns GPUs on the fly, so I don't explicitly ask for a GPU through Nextflow as of now.

See https://docs.sylabs.io/guides/3.5/user-guide/gpu.html#multiple-gpus: you need a GPU ID, but I don't see a portable way to get one out of Nextflow yet.

@pontus
Collaborator

pontus commented Apr 28, 2025

As for GPU support in Singularity: I'm not sure where you see that an ID is required, or why you think you'd need to get that out of Nextflow.

Essentially, there are two main ways your cluster can provide the devices you have available. One is to have a job-specific /dev mount that only contains the devices for the cards the job has been granted (this works right off).

The other is for all the libraries to be respectful and just set CUDA_VISIBLE_DEVICES. I think that should work as well, since Singularity should inherit it into the container; if not, you can prefix it with SINGULARITYENV_ to have Singularity set it explicitly inside. The same goes for NVIDIA_VISIBLE_DEVICES, and I suppose you can cross-use them if needed. Anything that needs to be done there should be possible with a beforeScript.
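A minimal sketch of that beforeScript approach, assuming Slurm has already set CUDA_VISIBLE_DEVICES for the job (nothing here is from the PR itself):

```groovy
// Hedged sketch: forward the GPU list Slurm granted the job into the
// container. Singularity sets any SINGULARITYENV_-prefixed variable
// (minus the prefix) inside the container environment.
process {
    withLabel: 'process_gpu' {
        beforeScript = 'export SINGULARITYENV_CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES"'
    }
}
```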

(I don't see this as required, but given the other work with GPUs, it seems to me to make sense to actually help users.)

And I hope you're not running Singularity 3.5; that's quite old nowadays.

Signed-off-by: eternal-flame-AD <yume@yumechi.jp>
Signed-off-by: eternal-flame-AD <yume@yumechi.jp>
@eternal-flame-AD
Contributor Author

Thanks for the suggestions. I think I addressed everything except the discussion regarding unification; I will test some workflows out this week.

Comment thread docs/utd_juno.md
## Heterogeneous/GPU jobs

Juno is a heterogeneous compute cluster, which means it can accommodate pipelines that require GPUs.
The config file has a dispatch rule that will automatically assign a queue based on the accelerator directive. You can always override this by specifying a queue directly in the `queue` directive.
Collaborator

Since the automatic selection will also assign clusterOptions that will impede scheduling, one should override that as well?
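A hedged sketch of what such a dispatch rule plus matching clusterOptions could look like (queue names and options are assumptions, not the PR's actual values), which also shows why overriding `queue` alone may not be enough:

```groovy
// Hedged sketch of accelerator-based dispatch; 'gpu'/'normal' and the
// gres string are illustrative. Overriding only `queue` would leave the
// automatically assigned clusterOptions in place, so both may need overriding.
process {
    queue          = { task.accelerator ? 'gpu' : 'normal' }
    clusterOptions = { task.accelerator ? "--gres=gpu:${task.accelerator.request}" : '' }
}
```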

Comment thread docs/utd_juno.md

The supported accelerators considered by the profile are NVIDIA H100 and A30 GPUs; you can request them like this:
Collaborator

Not sure how helpful this is in the context of an nf-core config.
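The code example that followed this sentence in the docs is not reproduced in the thread; for illustration only, a request via Nextflow's accelerator directive generally looks like this (the type string is hypothetical):

```groovy
// Hypothetical illustration, not the PR's actual docs example: request one
// GPU of a specific type with the accelerator directive.
process NEEDS_GPU {
    accelerator 1, type: 'nvidia-h100'   // type string is platform-specific

    script:
    """
    nvidia-smi
    """
}
```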

@pontus
Collaborator

pontus commented Apr 30, 2025

Approved, with general opinions stated before still being my opinions :)

I'm not sure if the cuda module load went away? If it was unclear: I think it shouldn't be done if it isn't a GPU job, but it should be done in those cases. I also suspect it may be required for Singularity to pick up the GPU libraries to bind them in correctly.
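A minimal sketch of that conditional load, assuming an environment-modules setup (the module name is illustrative, not the PR's actual config):

```groovy
// Hedged sketch: load the CUDA module only for GPU-labelled processes,
// leaving CPU-only jobs untouched. 'cuda' is an illustrative module name.
process {
    withLabel: 'process_gpu' {
        beforeScript = 'module load cuda'
    }
}
```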

@jfy133
Member

jfy133 commented Jun 18, 2025

Do you still plan to update/merge this PR @eternal-flame-AD ?

@eternal-flame-AD
Contributor Author

Oops, I forgot about this. I will check it and update or merge it this week. Thanks for the reminder!

@eternal-flame-AD eternal-flame-AD self-assigned this Jun 18, 2025
@eternal-flame-AD eternal-flame-AD merged commit 72503da into nf-core:master Jun 22, 2025
151 checks passed
@eternal-flame-AD
Contributor Author

Thanks, good. I think we should merge this for now.

Sorry for the delay; the new clusters are not well moderated, and wait times can be multiple days because of some less considerate users.
