[WIP] Update Cambridge config for current CSD3 partitions #1102

RaqManzano merged 12 commits into nf-core:master from
Conversation
jfy133
left a comment
Sorry for any duplicate comments!
Don't forget to add the config in: https://github.com/nf-core/configs/blob/master/.github/workflows/main.yml
```nextflow
// Compatibility with nf-core schema validation across pipeline versions.
schema_ignore_params = 'partition,project,max_memory,max_cpus,max_time,csd_time,csd_parts,csd_selected,validationSchemaIgnoreParams'
validationSchemaIgnoreParams = 'partition,project,max_memory,max_cpus,max_time,csd_time,csd_parts,csd_selected,schema_ignore_params,validationSchemaIgnoreParams'
```
Are these two really needed? Line 26 seems to cover these already.
We could, but this is to be more compatible with older pipelines that still look for `schema_ignore_params` and `validationSchemaIgnoreParams`.
| } | ||
|
|
||
| // Description is overwritten with user specific flags | ||
| params.csd_time = { |
Can this not go in the main params block?
I prefer to keep it separate from the main block, as it is not really a user-facing param.
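For context, a minimal sketch of what such a derived, non-user-facing param could look like (the `-SL3-` check is an assumption inferred from the PR description; `12.h`/`36.h` are the service-level caps it mentions, not the PR's exact implementation):

```nextflow
// Hypothetical sketch: derive the walltime cap from the service level
// encoded in the project name instead of asking the user for it.
// Assumes params.project is set; -SL3- projects are capped at 12h on CSD3,
// SL1/SL2 projects at 36h.
params.csd_time = params.project.toUpperCase().contains('-SL3-') ? 12.h : 36.h
```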
Thanks for this Raquel! A few comments and suggestions from my side:
In the guidelines, I would give a specific recommendation for where to store the Singularity cache. On our HPC it is definitely not recommended to store it directly in `$HOME`. You could also give a specific recommendation, for example a directory under `$HOME/rds`. A minimal sketch of what that could look like (the exact cache path below is illustrative, not prescribed by CSD3):
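```nextflow
// Minimal sketch: keep the Singularity cache on RDS storage rather than in
// $HOME itself, which has a small quota on CSD3. The exact directory below
// is an illustrative assumption, not a path mandated by the cluster.
singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = "${System.getenv('HOME')}/rds/hpc-work/singularity-cache"
}
```

I like your approach of defining the max parameters from the partition. At the risk of over-complicating, I wonder if there could be an "automatic" selection if the user doesn't specify anything. For example, in my config I have the following:

```nextflow
process {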
    executor = 'slurm'
    clusterOptions = '--partition icelake'

    // Settings below are for CSD3 nodes detailed at
    // https://docs.hpc.cam.ac.uk/hpc/index.html
    // Current resources (Jun 2023):
    //   icelake: 76 CPUs; 3380 MiB per cpu; 6760 MiB per cpu (himem)
    //   cclake:  56 CPUs; 3420 MiB per cpu; 6840 MiB per cpu (himem)
    // The values used below were chosen to be multiples of these resources,
    // assuming a maximum of 2 retries.

    // Using himem partition to ensure enough memory for single-CPU jobs
    withLabel:process_single {
        cpus   = { check_max( 1, 'cpus' ) }
        memory = { check_max( 6800.MB * task.attempt, 'memory' ) }
        time   = { check_max( 4.h * task.attempt, 'time' ) }
        clusterOptions = "--partition icelake-himem"
    }
    // 4 CPUs + 13GB RAM
    withLabel:process_low {
        cpus   = { check_max( 4 * task.attempt, 'cpus' ) }
        memory = { check_max( 13.GB * task.attempt, 'memory' ) }
        time   = { check_max( 4.h * task.attempt, 'time' ) }
        clusterOptions = "--partition icelake"
    }
    // 8 CPUs + 27GB RAM
    withLabel:process_medium {
        cpus   = { check_max( 8 * task.attempt, 'cpus' ) }
        memory = { check_max( 27.GB * task.attempt, 'memory' ) }
        time   = { check_max( 8.h * task.attempt, 'time' ) }
        clusterOptions = "--partition icelake"
    }
    // 12 CPUs + 40GB RAM
    withLabel:process_high {
        cpus   = { check_max( 12 * task.attempt, 'cpus' ) }
        memory = { check_max( 40.GB * task.attempt, 'memory' ) }
        time   = { check_max( 8.h * task.attempt, 'time' ) }
        clusterOptions = "--partition icelake"
    }
    // Going by chunks of 12h (2 retries should bring it to max of 36h)
    withLabel:process_long {
        time = { check_max( 12.h * task.attempt, 'time' ) }
    }
    // A multiple of 3 should bring it to max resources on icelake-himem
    withLabel:process_high_memory {
        cpus   = { check_max( 25 * task.attempt, 'cpus' ) }
        memory = { check_max( 170.GB * task.attempt, 'memory' ) }
        clusterOptions = "--partition icelake-himem"
    }
    withLabel:error_ignore {
        errorStrategy = 'ignore'
    }
    withLabel:error_retry {
        errorStrategy = 'retry'
        maxRetries = 2
    }
}
```

So, I allow for 2 retries and increase the resources accordingly, to roughly reach the maximum resources of each partition.
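For reference, `check_max` is the standard helper bundled with nf-core pipelines: it caps each requested resource at the matching `params.max_*` ceiling. A simplified sketch (the bundled version additionally wraps each branch in error handling):

```nextflow
// Cap a requested resource at the params.max_* ceiling for its type.
def check_max(obj, type) {
    if (type == 'memory') {
        def ceiling = params.max_memory as nextflow.util.MemoryUnit
        return obj.compareTo(ceiling) == 1 ? ceiling : obj
    }
    if (type == 'time') {
        def ceiling = params.max_time as nextflow.util.Duration
        return obj.compareTo(ceiling) == 1 ? ceiling : obj
    }
    if (type == 'cpus') {
        return Math.min(obj as int, params.max_cpus as int)
    }
    return obj
}
```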
But totally fine if you prefer to leave these extra additions out - the config revision you did is already a great update to the previous config, thanks so much! 🙂
Thanks for the comments @tavareshugo, I just changed `$HOME` to `$HOME/rds`; you were right, it was lazy writing on my side, thanks for calling that out! Regarding the "automatic" selection scenario, I already tried an implementation of it, and after discussing with the nf-core guys we went for a simpler approach. I refined the docs with info about the partitions. Let me know if you agree, and feel free to make changes.
tavareshugo
left a comment
Keeping the config "simpler" is probably a good idea indeed.
I've made a couple of minor suggestions to the markdown.
Co-authored-by: Hugo Tavares <tavareshugo@users.noreply.github.com>
name: Update Cambridge CSD3 Config
about: Updating Cambridge CSD3 cluster config
Summary
This PR updates the Cambridge CSD3 institutional profile to reflect the current partitions and simplifies the configuration logic based on feedback from the nf-core team.
Changes

- Updated `conf/cambridge.config` with current partition limits: `icelake`, `icelake-himem`, `sapphire`
  - `icelake` as the default partition, with `--partition` as a user override (see the sketch after this list)
  - projects on `-SL3-` accounts get a `12.h` cap, `36.h` for SL1 / SL2
- Updated `docs/cambridge.md`: documented `screen`/`tmux` usage and `srun`/`sbatch` for large runs where the Nextflow manager can become memory-heavy
- Updated `.github/CODEOWNERS`
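As an illustration of the default-plus-override behaviour, a minimal sketch (the `partition` param name comes from this PR; the closure idiom mirrors other nf-core institutional configs):

```nextflow
// icelake is the default partition; users can override it at runtime,
// e.g. with `--partition sapphire`. Using a closure defers evaluation of
// clusterOptions so the command-line value is the one picked up.
params.partition = 'icelake'
process.clusterOptions = { "--partition ${params.partition}" }
```

Acknowledgements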
Many thanks to @pontus, @jfy133, @tdanhorn and @maxulysse for their initial feedback in the nf-core channel.