job_submit/lua - add "core_spec" to job_desc HPC-12614#79

Draft
iddecker wants to merge 76 commits into hpcugent:25.11.ug from iddecker:core_spec_lua

Conversation

@iddecker

No description provided.

naterini and others added 30 commits February 9, 2026 14:54
Regression from 2586a90.

Changelog: slurmctld - Avoid persistent connection hangs when
 enable_async_reply is configured.
Issue: 50737
Ticket: 24550, 24516, 24581
Cherry-picked: e248aec
When validating a node config, if there is some problem with the
configuration, the node could end up with topo_cnt greater than the
actual gres configured.

This is a problem because it could lead to the controller writing a
corrupt state save: a topo_cnt-sized array with fewer than topo_cnt
values initialized, which can cause a SIGSEGV when accessing some
individual values.

The fix properly updates topo_cnt when there are config errors,
preventing that potential mismatch.

Ticket: 24025
Changelog: Prevent potential controller segfault when reconfiguring
 after gres file updates.
Cherry-picked: 0cb29ac
See merge request SchedMD/dev/slurm!2925
See merge request SchedMD/dev/slurm!2924
Ticket: 23467
Cherry-picked: 440cc02
When starting slurmd as a service, move the process to a subcgroup of
the slurmd systemd unit cgroup.

According to systemd docs the top cgroup is owned by systemd and it will
eventually try to reset slurmd limits, e.g. on a systemctl daemon-reload.

This can cause systemd errors in the system log, so we are only allowed to
touch the limits in a sub-cgroup. This emulates DelegateSubgroup=slurmd,
which is only available after systemd 255.

Ticket: 23467
Cherry-picked: 212bbf0
Ticket: 23467
Changelog: Reparent slurmd to a subcgroup to avoid conflicting with systemd.
Cherry-picked: cb62c2f
See merge request SchedMD/dev/slurm!2929
sprio was unable to parse a comma-separated list of jobs despite
documentation stating that this was possible. The entire job list was
passed to the unfmt_job_id_string() function, but this function is
unable to parse job lists. Switching to only pass the single job to
unfmt_job_id_string() fixes this issue.

Regression introduced in 4f21612.

Changelog: Fix sprio regression not handling comma separated list of
 jobids.
Ticket: 24625
Cherry-picked: 80beef6
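
The fix described above amounts to splitting the list on commas and handing each entry to the single-job parser. A minimal Python sketch of that idea follows; it is not the sprio code. `parse_single_job_id` is a hypothetical stand-in for unfmt_job_id_string(), and the accepted formats ("123" and "123_4") are an assumption made for illustration.

```python
from typing import List, Optional, Tuple

JobId = Tuple[int, Optional[int]]  # (job id, array task id or None)


def parse_single_job_id(text: str) -> JobId:
    """Hypothetical single-job parser: accepts "123" or "123_4" only."""
    job, sep, task = text.partition("_")
    if not job.isdigit() or (sep and not task.isdigit()):
        raise ValueError(f"not a single job id: {text!r}")
    return int(job), (int(task) if sep else None)


def parse_job_list(text: str) -> List[JobId]:
    """Split on commas first, then parse each job id on its own."""
    return [parse_single_job_id(part.strip()) for part in text.split(",")]


print(parse_job_list("10,11_2,12"))  # [(10, None), (11, 2), (12, None)]
```

Passing the whole string "10,11_2,12" to the single-job parser raises ValueError here, which mirrors the failure mode the commit describes.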
See merge request SchedMD/dev/slurm!2950
Changelog: slurmctld,slurmd - Fix memory leak when container ID is
 populated.
Issue: 50190
Cherry-picked: 13aba98
Enhanced _remove_ecores() to use a two-pass approach that collects all
P-core frequencies instead of just the first one found. This fixes P-core
detection on processors where different P-cores report different
FrequencyMaxMHz values.

The previous implementation (commit a80d6ba) used single-frequency
matching which incorrectly excluded P-cores with different frequencies.
On Intel Core Ultra 7 268V, some P-cores report 5000 MHz while others
report 4900 MHz, causing only 2 of 4 P-cores to be detected.

Implementation:
- First pass: Find all CPU Kinds with CoreType=IntelCore and store all
  their distinct FrequencyMaxMHz values
- Second pass: Include any CPU Kinds matching any collected frequency,
  even without CoreType=IntelCore attribute

The second pass is necessary for hwloc < 2.10 where P-cores restricted
by cpuset lose their CoreType attribute but retain FrequencyMaxMHz.
This was fixed in hwloc 2.10 (commit 971ea80f9) which added PMU-based
CoreType detection. The two-pass approach maintains compatibility with
older hwloc versions while fixing the multiple-frequency limitation that
broke detection on newer Intel processors.

Changelog: slurmd - Fix P-core detection on processors with varying
 P-core frequencies and in cpuset-restricted environments.
Ticket: 24590
Cherry-picked: c5bd248
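
The two-pass approach described above can be sketched in Python as follows. This is an illustration, not the _remove_ecores() implementation: CPU kinds are modeled as plain dicts, the kind names are hypothetical, and the 3700 MHz E-core frequency is a made-up value (only 4900 and 5000 MHz come from the commit message).

```python
def select_pcore_kinds(cpu_kinds):
    """Return the CPU kinds treated as P-cores using two passes."""
    # First pass: collect every distinct max frequency reported by a
    # kind that is explicitly tagged CoreType=IntelCore.
    pcore_freqs = set()
    for kind in cpu_kinds:
        if kind.get("CoreType") == "IntelCore" and "FrequencyMaxMHz" in kind:
            pcore_freqs.add(kind["FrequencyMaxMHz"])

    # Second pass: keep any kind whose max frequency matches one of the
    # collected values, even if it has lost its CoreType attribute.
    return [k for k in cpu_kinds if k.get("FrequencyMaxMHz") in pcore_freqs]


kinds = [
    {"name": "kind0", "CoreType": "IntelAtom", "FrequencyMaxMHz": 3700},
    {"name": "kind1", "CoreType": "IntelCore", "FrequencyMaxMHz": 4900},
    {"name": "kind2", "CoreType": "IntelCore", "FrequencyMaxMHz": 5000},
    {"name": "kind3", "FrequencyMaxMHz": 5000},  # CoreType lost under cpuset
]

print([k["name"] for k in select_pcore_kinds(kinds)])
# ['kind1', 'kind2', 'kind3']
```

With single-frequency matching, only the kinds sharing the first collected frequency would be kept, which is the limitation the commit removes.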
See merge request SchedMD/dev/slurm!2957
See merge request SchedMD/dev/slurm!2955
This option allows namespace/linux to continue to operate even if bpf
tokens are not supported on the system, user namespaces are enabled, and
the cgroup constrain devices option is enabled.

In this mode of operation, any devices that would be constrained will
only be constrained to the job and not individual steps.

Ticket: 21718
Changelog: namespace/linux - add disable_bpf_token option.
Cherry-picked: a70b2d8
Changelog: slurmctld - Avoid expedited requeue triggering a job to requeue
 when job exit code was zero.
Ticket: 24564
Cherry-picked: c255c7d
batch_requeue_fini() should only be called after the epilog scripts have
completed. Add logic to catch and log if batch_requeue_fini() is ever
called before the epilog is complete.

Changelog: slurmctld - Avoid expedited requeue of jobs while waiting for
 job epilog script to complete.
Ticket: 24564
Cherry-picked: 1511a97
Fair-Share factor reported only for users.

Ticket: 24618
Cherry-picked: f0ce621
…nfig

Prevent cloud nodes that are configured in topology.[conf|yaml] from being
removed from the topology when the node is powered down if it does not use
Topology=... in its node configuration line in slurm.conf or in the --conf
slurmd option.

If node_ptr->config_ptr->topology_str, node_ptr->topology_str, and
node_ptr->topology_orig_str are NULL, then the slurm.conf and --conf
option did not specify a topology for the node and thus the topology of
the node does not need to be reset by node_mgr_set_node_topology().

Changelog: slurmctld - Prevent removing cloud nodes from the topology when
 putting them in the POWERED_DOWN state if they are present in
 topology.conf or topology.yaml and their node configuration did not
 specify the Topology option.
Ticket: 24405
Cherry-picked: b36d9b4
Previously, topology_g_add_rm_node would only clear a node from a topology
if topology_str was NULL. When it wasn't, it would only modify the
topologies listed in topology_str while potentially leaving the node in
another topology.

This change makes it remove the node from all topologies not listed in
topology_str.

Changelog: interfaces/topology - When modifying a node's topology with the
 Topology option in slurm.conf or the slurmd --conf Topology, change the
 topology to fully match the new topology.
Ticket: 24405
Cherry-picked: fafc23e
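
The behavior change above can be illustrated with a small Python sketch. It is not the topology plugin code: topologies are modeled as a dict of node-name sets, `set_node_topologies` is a hypothetical helper, and treating topology_str as a comma-separated list of topology names is a simplifying assumption.

```python
from typing import Dict, Optional, Set


def set_node_topologies(
    topologies: Dict[str, Set[str]], node: str, topology_str: Optional[str]
) -> None:
    """Make the node's membership match topology_str exactly."""
    wanted = set()
    if topology_str:
        wanted = {n.strip() for n in topology_str.split(",") if n.strip()}

    for name, members in topologies.items():
        if name in wanted:
            members.add(node)      # listed: ensure the node is present
        else:
            members.discard(node)  # not listed: remove any stale entry


topos = {"topo_a": {"node1"}, "topo_b": {"node1"}, "topo_c": set()}
set_node_topologies(topos, "node1", "topo_b,topo_c")
print(topos)
```

After the call, node1 is no longer in topo_a. Under the earlier behavior described in the commit, only the listed topologies would have been touched, so the stale topo_a membership would have remained.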
If the node configuration line in slurm.conf specifies the Topology option,
this change allows modifications to that option to take effect on a
reconfig/restart. The same applies to changes to topology.[conf|yaml].

Changelog: slurmctld - Allow changes to topology.conf or topology.yaml, and
 slurm.conf node configuration Topology option to take effect on a
 reconfigure or restart when power saving is enabled.
Ticket: 24405
Cherry-picked: 8f1b2d2
See merge request SchedMD/dev/slurm!2963
See merge request SchedMD/dev/slurm!2937
See merge request SchedMD/dev/slurm!2961
This fixes the logic in slurm_bf_licenses_equal() to validate that both
license lists are equal. Previously, only licenses in the first license
list were checked to see if they were also the same in the second license
list. It did not verify that all licenses from the second license list
were in the first license list.

Changelog: slurmctld - Prevent backfill from combining future timeslots if
 they have different license reservations.
Cherry-picked: fd0ef0d
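
The difference between the one-directional check and the corrected comparison can be sketched in Python. This is an illustration only, not slurm_bf_licenses_equal(): license lists are modeled as dicts mapping a license name to a count, and the license names are made up.

```python
def one_way_equal(first, second):
    """True if every license in `first` has the same count in `second`."""
    return all(second.get(name) == count for name, count in first.items())


def licenses_equal(first, second):
    """Check both directions so extra licenses on either side are caught."""
    return one_way_equal(first, second) and one_way_equal(second, first)


slot_a = {"matlab": 2}
slot_b = {"matlab": 2, "ansys": 1}

print(one_way_equal(slot_a, slot_b))   # True: the extra license is missed
print(licenses_equal(slot_a, slot_b))  # False: the slots really differ
```

The one-way result is what would let two future timeslots with different license reservations be treated as identical and combined.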
If a slurmctld reconfig/restart happens while a cloud node is POWER_DOWN,
the node's addr is reset, and, because it is CLOUD, it can end up FUTURE.
Add POWER_DOWN to the other POWER states to exclude the node from addr
reset.

Ticket: 24358
Changelog: Fix CLOUD nodes infrequently becoming FUTURE on slurmctld
 restart.
Cherry-picked: 49e3ba7
See merge request SchedMD/dev/slurm!2968
Nathan Bulloch and others added 29 commits February 18, 2026 06:03
When rem_nodes is 0 in _get_block_level(), log2 resulted in -inf.
Previously, this caused a crash.

Check for negative before accessing the block_levels bitstring.

Changelog: Fix handling of 0 node test allocations in topology/block.
Ticket: 24552
Cherry-picked: 0b5f64f
See merge request SchedMD/dev/slurm!2973
See merge request SchedMD/dev/slurm!2962
The function returns the count of licenses in cluster_license_list. This
is in preparation for the following commit.

Ticket: 24594
Cherry-picked: 95659c0
Sort the appended licenses from the advanced license reservations such that
they come after the non-resv licenses, sorted by resv_id and lic_id.

This is in preparation for the following commit.

Ticket: 24594
Cherry-picked: aaa02a4
If the remaining count of an HRes license that the job requests changes in
the next node_space table entry, set later_start. Skip doing so in all
other license-related cases. This is because HRes licenses mask out nodes
in the available bitmap depending on remaining counts.

This prevents setting later_start unnecessarily, which led to many
unneeded calls to _try_sched().

Changelog: slurmctld - In backfill, prevent unnecessarily testing jobs
 at future times using the select plugin if it is guaranteed to fail.
Ticket: 24594
Cherry-picked: a23a0a2
See merge request SchedMD/dev/slurm!2981
See merge request SchedMD/dev/slurm!2977
This fixes a regression added by commit a23a0a2. When the node_space
table entry's licenses are NULL, don't return true to indicate an
increase in requested licenses in the next time slot.

Ticket: 24594
Cherry-picked: 7321137
See merge request SchedMD/dev/slurm!2984
Update slurm.spec and debian/changelog as well.
slurmrestd does not handle SIGHUP; sending HUP terminates the daemon
instead of reconfiguring. Drop ExecReload so systemctl reload is not
advertised and does not kill the process.

Changelog: slurmrestd - Remove ExecReload from unit file since the
 daemon does not handle SIGHUP (reload would terminate the process).
Ticket: 24667
Cherry-picked: cd319ef
See merge request SchedMD/dev/slurm!2996
…es it

Call _archive_table() for the main purge_type before archiving PURGE_JOB
associated data (job env, script, etc.). Commit 4dcca88 had reversed
this order; period_start is only set by the main archive path, while
archive_write_file at the end of _archive_table uses it. The job env/
script codepaths (_pack_archive_job_env, _pack_archive_job_script) do
not set it, so the main archive must run first.

Changelog: Prevent "period_start should already be set" errors when purging
 slurmdbd data and fix file names for archives of purged slurmdbd data.
Ticket: 24523
Cherry-picked: b263ac6
See merge request SchedMD/dev/slurm!3002
Ticket: 24439
Changelog: Skip x11 shutdown when x11 functionality was not requested.
Cherry-picked: 165fe40
See merge request SchedMD/dev/slurm!3011
Changelog: Fix build errors with recent versions of libcurl (8.16+).
Cherry-picked: 0ee19c3
See merge request SchedMD/dev/slurm!3018
See merge request SchedMD/dev/slurm!3021
After swapping alloc->environment with state.job_env in _alloc_job(),
alloc->env_size was left with its original value while alloc->environment
might be smaller, causing a segfault when trying to access it out-of-bounds
in the later env_array_for_job() call.

This failed with step_mgr and in 25.11 after 87e4a70. Both of these
inject environment variables.

Changelog: Fix scrun segfault with step_mgr and if environment is set.
Ticket: 24662
Cherry-picked: 3ded4ee
See merge request SchedMD/dev/slurm!3028
Located in the job info struct.

Ticket: 24674
Changelog: Fix two memory leaks located in the job info struct.
Cherry-picked: 3012ffe
See merge request SchedMD/dev/slurm!3039
@iddecker iddecker marked this pull request as draft March 27, 2026 17:43