
[WIP] Update Cambridge config for current CSD3 partitions #1102

Merged
RaqManzano merged 12 commits into nf-core:master from RaqManzano:update-cambridge-config
May 5, 2026

Conversation

@RaqManzano
Contributor

@RaqManzano RaqManzano commented Apr 29, 2026


name: Update Cambridge CSD3 Config
about: Updating Cambridge CSD3 cluster config

Summary

This PR updates the Cambridge CSD3 institutional profile to reflect the current partitions and simplifies the configuration logic based on feedback from the nf-core team.

Changes

  • update conf/cambridge.config with current partition limits:
    • icelake
    • icelake-himem
    • sapphire
  • set icelake as the default partition
  • keep --partition as a user override
  • infer walltime limits from the SLURM account name (see the sketch after this list):
    • accounts containing -SL3- get a 12.h cap
    • otherwise default to 36.h for SL1 / SL2
  • add schema validation ignores for config-specific parameters
  • refresh docs/cambridge.md:
    • update install instructions
    • document partition selection
    • explain screen / tmux usage
    • add a note recommending srun / sbatch for large runs where the Nextflow manager can become memory-heavy
  • add Cambridge to .github/CODEOWNERS
  • add Cambridge-specific module initialization and SLURM executor tuning to improve job submission stability on CSD3
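
A minimal sketch of the partition default, user override, and account-based walltime logic described above. The SLURM_ACCOUNT environment variable and the exact spellings here are assumptions for illustration; the merged conf/cambridge.config may differ in detail.

// Sketch only, not the merged config verbatim; SLURM_ACCOUNT is an assumed env var.
def slurmAccount = System.getenv('SLURM_ACCOUNT') ?: ''

// icelake is the default partition; --partition on the command line overrides it.
params.partition = 'icelake'
// Accounts containing -SL3- are capped at 12.h; SL1 / SL2 default to 36.h.
params.max_time  = slurmAccount.contains('-SL3-') ? 12.h : 36.h

process {
    executor = 'slurm'
    queue    = { params.partition }
}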

Acknowledgements

Many thanks to @pontus, @jfy133, @tdanhorn and @maxulysse for their initial feedback in the nf-core channel.

@RaqManzano RaqManzano self-assigned this Apr 29, 2026
Comment thread docs/cambridge.md Outdated
Member

@jfy133 jfy133 left a comment


Sorry for any duplicate comments!

Don't forget to add the config in: https://github.com/nf-core/configs/blob/master/.github/workflows/main.yml

Comment thread conf/cambridge.config
Comment on lines +20 to +22
// Compatibility with nf-core schema validation across pipeline versions.
schema_ignore_params = 'partition,project,max_memory,max_cpus,max_time,csd_time,csd_parts,csd_selected,validationSchemaIgnoreParams'
validationSchemaIgnoreParams = 'partition,project,max_memory,max_cpus,max_time,csd_time,csd_parts,csd_selected,schema_ignore_params,validationSchemaIgnoreParams'
Member


Are these two really needed? Line 26 seems to cover these already.

Contributor Author


We could, but these are kept for compatibility with older pipelines that still look for schema_ignore_params and validationSchemaIgnoreParams.
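
For illustration, one way to keep the two compatibility keys from drifting would be to define the ignore list once; this is a sketch of the idea, not necessarily how the merged config is written:

// Illustrative only: define the shared ignore list once, reuse it for both keys.
def csdIgnore = 'partition,project,max_memory,max_cpus,max_time,csd_time,csd_parts,csd_selected'
params.schema_ignore_params         = "${csdIgnore},validationSchemaIgnoreParams"
params.validationSchemaIgnoreParams = "${csdIgnore},schema_ignore_params,validationSchemaIgnoreParams"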

Comment thread conf/cambridge.config
}

// Description is overwritten with user specific flags
params.csd_time = {
Member


Can this not go in the main params block?

Contributor Author


I prefer to keep it separate from the main block, as it is not really a user-facing param.
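
As a pattern illustration only (the assignment style follows the diff above, but the closure body here is invented):

// Internal helper kept outside the user-facing params block; the body is hypothetical.
params.csd_time = { account ->
    account.contains('-SL3-') ? 12.h : 36.h
}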

@tavareshugo
Contributor

Thanks for this Raquel! A few comments and suggestions from my side:


cambridge.md

In the guidelines, I would give a specific recommendation for where to store the singularity cache.

On our HPC, it is definitely not recommended to store the cache directly in $HOME/ -- perhaps an explicit warning about this would be good to include.

You could also give a specific recommendation, for example: $HOME/rds/hpc-work/nxf-singularity-cache is a good place.
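
For instance, pointing Nextflow at that location from a config file could look something like this (singularity.cacheDir is the standard option; setting the NXF_SINGULARITY_CACHEDIR environment variable works too):

// Keep the Singularity image cache on RDS rather than under the $HOME quota.
singularity {
    enabled  = true
    cacheDir = "${System.getenv('HOME')}/rds/hpc-work/nxf-singularity-cache"
}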


I like your approach of defining the max parameters from the partition. At the risk of over-complicating, I wonder if there could be an "automatic" selection if the user doesn't specify anything. For example, in my config I have the following:

process {
  executor = 'slurm'
  clusterOptions = '--partition icelake'

  // Settings below are for CSD3 nodes detailed at
  //   https://docs.hpc.cam.ac.uk/hpc/index.html
  // Current resources (Jun 2023):
  //   icelake: 76 CPUs; 3380 MiB per cpu; 6760 MiB per cpu (himem)
  //   cclake: 56 CPUs; 3420 MiB per cpu; 6840 MiB per cpu (himem)
  // The values used below were chosen to be multiples of these resources
  // assuming a maximum of 2 retries

  // Using himem partition to ensure enough memory for single-CPU jobs
  withLabel:process_single {
      cpus   = { check_max( 1                  , 'cpus'    ) }
      memory = { check_max( 6800.MB * task.attempt, 'memory'  ) }
      time   = { check_max( 4.h  * task.attempt, 'time'    ) }
      clusterOptions = "--partition icelake-himem"
  }
  // 4 CPUs + 13GB RAM
  withLabel:process_low {
      cpus   = { check_max( 4     * task.attempt, 'cpus'    ) }
      memory = { check_max( 13.GB * task.attempt, 'memory'  ) }
      time   = { check_max( 4.h   * task.attempt, 'time'    ) }
      clusterOptions = "--partition icelake"
  }
  // 8 CPUs + 27GB RAM
  withLabel:process_medium {
      cpus   = { check_max( 8     * task.attempt, 'cpus'    ) }
      memory = { check_max( 27.GB * task.attempt, 'memory'  ) }
      time   = { check_max( 8.h   * task.attempt, 'time'    ) }
      clusterOptions = "--partition icelake"
  }
  // 12 CPUs + 40GB RAM
  withLabel:process_high {
      cpus   = { check_max( 12    * task.attempt, 'cpus'    ) }
      memory = { check_max( 40.GB * task.attempt, 'memory'  ) }
      time   = { check_max( 8.h   * task.attempt, 'time'    ) }
      clusterOptions = "--partition icelake"
  }
  // Going by chunks of 12h (2 retries should bring it to max of 36h)
  withLabel:process_long {
      time   = { check_max( 12.h  * task.attempt, 'time'    ) }
  }
  // A multiple of 3 should bring it to max resources on icelake-himem
  withLabel:process_high_memory {
      cpus   = { check_max( 25     * task.attempt, 'cpus'    ) }
      memory = { check_max( 170.GB * task.attempt, 'memory' ) }
      clusterOptions = "--partition icelake-himem"
  }
  withLabel:error_ignore {
      errorStrategy = 'ignore'
  }
  withLabel:error_retry {
      errorStrategy = 'retry'
      maxRetries    = 2
  }
}

So, I allow for 2 retries and increase the resources accordingly, to roughly reach the maximum resources of each partition.
The behaviour could then be something like:

  • If the user specifies --partition, then all jobs are submitted to that partition only
  • If the user doesn't specify anything, something like the above would kick in, where jobs are submitted either to icelake or icelake-himem depending on the process labels (see the sketch after this list).
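
One way to sketch that fallback, assuming params.partition is a user-facing param that is unset by default:

process {
    executor = 'slurm'
    // An explicit --partition wins; otherwise each label picks its own default.
    withLabel:process_single {
        clusterOptions = { "--partition ${params.partition ?: 'icelake-himem'}" }
    }
    withLabel:process_low {
        clusterOptions = { "--partition ${params.partition ?: 'icelake'}" }
    }
    // ...and so on for the remaining labels.
}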

But, totally fine if you prefer to leave these extra additions out - the config revision you did is already a great update to the previous config, thanks so much! 🙂

@RaqManzano
Contributor Author

Thanks for the comments @tavareshugo, I just changed $HOME to $HOME/rds; you were right, it was lazy writing on my side, thanks for calling that out! Regarding the "automatic" selection scenario, I already tried an implementation of it, and after discussing with the nf-core team we went for a simpler approach. I refined the docs with info about the partitions. Let me know if you agree; feel free to make changes.

Comment thread docs/cambridge.md
Contributor

@tavareshugo tavareshugo left a comment


Keeping the config "simpler" is probably a good idea indeed.

I've made a couple of minor suggestions to the markdown.

@RaqManzano RaqManzano merged commit 3a9a247 into nf-core:master May 5, 2026
152 of 161 checks passed
