Ludicrous Mode: Lockable resources queue contention reduction and script caching #966
Conversation
What is ludicrous speed?
------------------------

[Obligatory video of ludicrous speed](https://youtu.be/oApAdwuqtn8?si=uJHPP5OXz9GNllY0)

<details><summary>Ludicrous speed is faster than light speed</summary>

---

</details>

---

Background
----------

I am patching all plugins that affect Jenkins queue maintenance performance as part of an overall initiative to make Jenkins faster for everyone. Your plugin has been identified as adversely affecting queue performance, so I am opening a proposal pull request to fix the performance issue.

Please keep the `Ludicrous Mode:` prefix in the pull request title so that patches can be tracked across all plugins.

Patch details
-------------

`Queue.maintain()` is a critical path in Jenkins core. Any performance delay there can significantly slow how quickly Jenkins schedules work on agents. `Queue.maintain()` should execute in only a few milliseconds, even when agent counts reach into the thousands.

This patch provides three key features:

* Lightweight API calls invoked by `Queue.maintain()` return instantly.
* Heavyweight API calls (e.g. resource evaluation) run in the background, outside of the `Queue.maintain()` path.
* Because `Queue.maintain()` becomes cheap, `scheduleMaintenance()` can be called more aggressively as locked resources become available.

---

The lockable-resources-plugin hooks into the Queue via a `QueueTaskDispatcher` — its `canRun(Queue.Item)` method is called **under the Queue lock** for every item on every `Queue.maintain()` cycle. Unlike cloud plugins that use `RetentionStrategy` or `Cloud`, a `QueueTaskDispatcher` sits in the absolute innermost loop of Queue maintenance.

Three issues were found:

1. **No `scheduleMaintenance()` triggers anywhere.** When resources are freed (a build completes, a pipeline lock body finishes, or a user unreserves via the UI), items waiting in the Jenkins Queue for those resources wait up to 5 seconds for the next timer tick. The patch adds immediate `scheduleMaintenance()` calls at 6 resource-freeing events: `unlockResources()`, `unreserve()`, `reset()`, `LockRunListener.onCompleted()`, `LockRunListener.onDeleted()`, and `FreeDeadJobs.freePostMortemResources()`.
2. **`syncResources` lock contention under the Queue lock.** The plugin uses a `synchronized(syncResources)` monitor in `tryQueue()` that is also held by threads doing disk I/O (`save()`). When `onCompleted()` holds `syncResources` during a config write, `canRun()` under the Queue lock blocks waiting — transitively extending Queue lock hold time by disk write latency. The patch makes `save()` asynchronous with coalescing and narrows the `syncResources` scope in `tryQueue()` so that candidate resolution (label matching, Groovy script evaluation) runs outside the critical section.
3. **Groovy script and label expression evaluation under the Queue lock.** When jobs use `resourceMatchScript`, `canRun()` evaluates a Groovy script for every lockable resource on every cache miss. The existing `cachedCandidates` Guava cache mitigates repeated evaluations, but cache misses (the first call, or any call after a resource state change) are heavyweight. Similarly, `isValidLabel()` parses label expressions and allocates `LabelAtom` sets on every call. The patch adds per-resource TTL caches for both script evaluation results and label expression results.

Original Patch series
---------------------

Some background: this patch is part of a series (the ludicrous patch series) that enables my production instance to scale up to ludicrous numbers of agents.

- [ec2 plugin]
- [job-restrictions plugin]
- [leastload plugin]

AI Analysis
-----------

See the detailed [AI analysis for lockable-resources plugin][1]. You're welcome to copy any documentation from the AI analysis that you want included. I purposefully separated out the analysis in order to limit how much I am changing in your repository.

Testing done
------------

I rely on this plugin in production. I have not tested this patch yet. I plan to deploy it to my production instance after non-prod testing completes. I will follow up in this pull request when I'm successfully running this in production.

Advisory
--------

I plan to update Jenkins core with warnings when plugins adversely affect Jenkins queue performance. Please keep in mind that declining to merge changes along these lines will result in this plugin being called out for degrading Jenkins performance. This patch solves a critical performance issue in your plugin so that it does not drag down how fast Jenkins schedules work on agents.

Submitter checklist
-------------------

- [x] Make sure you are opening from a **topic/feature/bugfix branch** (right side) and not your main branch!
- [x] Ensure that the pull request title represents the desired changelog entry
- [x] Please describe what you did
- [ ] Link to relevant issues in GitHub or Jira
- [x] Link to relevant pull requests, esp. upstream and downstream changes
- [ ] Ensure you have provided tests that demonstrate the feature works or the issue is fixed

[1]: https://github.com/samrocketman/jenkins-ai-analysis/tree/main/ludicrous-mode-analysis/lockable-resources-plugin
[ec2 plugin]: jenkinsci/ec2-plugin#2000
[job-restrictions plugin]: jenkinsci/job-restrictions-plugin#210
[leastload plugin]: jenkinsci/leastload-plugin#22

Co-authored-by: sgleske@integralads.com
Pull request overview
This PR proposes performance-focused changes to reduce Jenkins Queue maintenance contention caused by the lockable-resources plugin, primarily by triggering immediate queue maintenance on resource-freeing events, reducing synchronous disk I/O impact, and caching expensive evaluations.
Changes:
- Added `Queue.scheduleMaintenance()` triggers on multiple resource-freeing code paths.
- Introduced asynchronous/coalesced `save()` to avoid disk I/O extending `syncResources` lock hold time.
- Added per-resource TTL caches for Groovy script matching and label expression evaluation.
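The asynchronous/coalesced `save()` idea can be sketched with plain `java.util.concurrent` primitives. Everything below (`CoalescedSaver`, `requestSave`, `diskWrites`) is an illustrative stand-in, not the plugin's actual API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: many rapid requestSave() calls within the coalescing
// window collapse into a single doSave() execution.
class CoalescedSaver {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "coalesced-saver");
                t.setDaemon(true);
                return t;
            });
    private final AtomicBoolean savePending = new AtomicBoolean(false);
    final AtomicInteger diskWrites = new AtomicInteger();
    private final long coalesceMs;

    CoalescedSaver(long coalesceMs) {
        this.coalesceMs = coalesceMs;
    }

    void requestSave() {
        // Only the first caller in a window schedules the write; subsequent
        // callers see savePending == true and return immediately.
        if (savePending.compareAndSet(false, true)) {
            executor.schedule(this::doSave, coalesceMs, TimeUnit.MILLISECONDS);
        }
    }

    private void doSave() {
        savePending.set(false);       // open the next coalescing window
        diskWrites.incrementAndGet(); // stand-in for the real config write
    }

    // Runs a burst of save requests and reports how many writes happened.
    static int demo() {
        try {
            CoalescedSaver saver = new CoalescedSaver(50);
            for (int i = 0; i < 1000; i++) {
                saver.requestSave();
            }
            Thread.sleep(300); // wait out the coalescing window
            saver.executor.shutdown();
            saver.executor.awaitTermination(5, TimeUnit.SECONDS);
            return saver.diskWrites.get();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off is that `save()` returning no longer means the state is on disk, which is exactly the durability concern raised in the review comments below.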
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/java/org/jenkins/plugins/lockableresources/util/Constants.java | Adds system properties controlling async save and cache TTL behavior. |
| src/main/java/org/jenkins/plugins/lockableresources/queue/LockRunListener.java | Schedules queue maintenance after build completion/deletion unlocks. |
| src/main/java/org/jenkins/plugins/lockableresources/LockableResourcesManager.java | Refactors tryQueue, adds queue-maintenance scheduling helper, and implements async/coalesced saving. |
| src/main/java/org/jenkins/plugins/lockableresources/LockableResource.java | Adds per-resource TTL caches for script and label evaluation. |
| src/main/java/org/jenkins/plugins/lockableresources/FreeDeadJobs.java | Schedules queue maintenance after freeing post-mortem resources. |
```java
public boolean scriptMatches(@NonNull SecureGroovyScript script, @CheckForNull Map<String, Object> params)
    throws ExecutionException {
  if (SCRIPT_CACHE_TTL_MS > 0) {
    String cacheKey = script.getScript();
    ConcurrentHashMap<String, CachedResult> cache = getScriptCache();
    CachedResult cached = cache.get(cacheKey);
    if (cached != null && !cached.isExpired(SCRIPT_CACHE_TTL_MS)) {
      return cached.value;
    }
    boolean result = evaluateScript(script, params);
    cache.put(cacheKey, new CachedResult(result));
    return result;
  }
  return evaluateScript(script, params);
}
```
There are no tests covering the new script/label TTL caching behavior. In particular, caching script results without considering params can change matching semantics across builds; adding unit/integration tests that queue multiple items with different parameters and a resourceMatchScript would help prevent regressions.
```java
public void unlockResources(List<LockableResource> resourcesToUnLock, Run<?, ?> build) {
  if (resourcesToUnLock == null || resourcesToUnLock.isEmpty()) {
    return;
  }
  synchronized (syncResources) {
    this.freeResources(resourcesToUnLock, build);

    while (proceedNextContext()) {
      // process as many contexts as possible
    }

    save();
  }
  scheduleQueueMaintenance();
}
```
unlockResources(..) now calls scheduleQueueMaintenance() unconditionally. However, this method is invoked from call sites that already hold synchronized(syncResources) (e.g., steal(..) and recycle(..)), so the queue maintenance trigger can still run while the plugin lock is held—contradicting scheduleQueueMaintenance()'s own Javadoc and risking lock-order inversion with the Jenkins Queue lock. Consider removing the scheduling side effect from unlockResources(..) (or splitting into an internal helper) and only calling scheduleQueueMaintenance() from outermost methods after all syncResources monitors have been released.
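The split the review suggests can be illustrated with a small, hypothetical sketch; the method names mirror the comment above, not the plugin's actual code, and `Thread.holdsLock` is used here only to make the lock-order property observable:

```java
// Hypothetical sketch of the suggested refactoring: the monitor-holding work
// lives in an internal helper, and the queue-maintenance trigger fires only
// after syncResources has been released.
class UnlockFlow {
    private final Object syncResources = new Object();
    boolean scheduledWhileLocked = false;
    int scheduleCalls = 0;

    void unlockResources() {
        synchronized (syncResources) {
            unlockResourcesInternal(); // free resources, proceed contexts, save
        }
        scheduleQueueMaintenance(); // side effect outside the monitor
    }

    private void unlockResourcesInternal() {
        // Callers that already hold syncResources (e.g. steal, recycle) would
        // invoke this helper directly and schedule maintenance themselves
        // once their own monitors are released.
    }

    private void scheduleQueueMaintenance() {
        // Record whether we would be touching the Jenkins Queue lock while
        // still holding syncResources (the inversion the review warns about).
        scheduledWhileLocked |= Thread.holdsLock(syncResources);
        scheduleCalls++;
    }
}
```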
```java
if (asyncSaveEnabled && saveCoalesceMs > 0) {
  if (getSavePending().compareAndSet(false, true)) {
    getSaveExecutor().schedule(this::doSave, saveCoalesceMs, TimeUnit.MILLISECONDS);
  }
} else {
  doSave();
}
```
With async-save enabled, save() may return before the configuration has been persisted. If Jenkins is shutting down or crashes within the coalescing window, the latest lock state changes may never be written. Consider flushing pending saves on shutdown (or when disabling async mode), or ensuring critical state changes force an immediate doSave() when the controller is terminating.
```diff
@@ -1401,9 +1468,20 @@
     if (enableSave == 0) return; // saving is disabled

     synchronized (syncResources) {
       if (BulkChange.contains(this)) return;

+      if (asyncSaveEnabled && saveCoalesceMs > 0) {
+        if (getSavePending().compareAndSet(false, true)) {
+          getSaveExecutor().schedule(this::doSave, saveCoalesceMs, TimeUnit.MILLISECONDS);
+        }
+      } else {
+        doSave();
+      }
     }
```
The async-save/coalescing behavior is new and not covered by tests (e.g., verifying that multiple rapid state changes result in exactly one persisted write after the coalesce delay, and that disabling async mode still saves immediately). Adding tests around save() would help ensure future changes don’t reintroduce Queue-lock contention or cause persistence regressions.
```java
public boolean scriptMatches(@NonNull SecureGroovyScript script, @CheckForNull Map<String, Object> params)
    throws ExecutionException {
  if (SCRIPT_CACHE_TTL_MS > 0) {
    String cacheKey = script.getScript();
    ConcurrentHashMap<String, CachedResult> cache = getScriptCache();
    CachedResult cached = cache.get(cacheKey);
    if (cached != null && !cached.isExpired(SCRIPT_CACHE_TTL_MS)) {
      return cached.value;
    }
    boolean result = evaluateScript(script, params);
    cache.put(cacheKey, new CachedResult(result));
    return result;
```
scriptMatches caches results by script.getScript() only, but the evaluation also depends on the provided params (build parameters/matrix combinations) and on bound resource fields (description/labels/note). This can return incorrect matches for different queue items within the TTL window. Include the relevant params (e.g., a stable hash of the map entries) and/or the resource fields in the cache key, or only cache when params is null/empty, and ensure caches are invalidated when bound fields change.
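One way to act on this suggestion is a composite cache key built from the script text plus an order-independent rendering of the parameter map. This is a sketch under that assumption, not the plugin's API:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a params-aware cache key: combining the script text with a
// stable, sorted rendering of the parameter map prevents different parameter
// sets from colliding within the TTL window.
class ScriptCacheKey {
    static String of(String scriptText, Map<String, Object> params) {
        // TreeMap orders entries by key, so two equal maps always render
        // identically regardless of insertion order.
        String stableParams = (params == null || params.isEmpty())
                ? "" : new TreeMap<>(params).toString();
        // NUL separator keeps the script text and params from blending.
        return scriptText + '\u0000' + stableParams;
    }
}
```

Resource fields that the script reads (description, labels, note) would additionally need to either join the key or invalidate the cache on change, as the review notes.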
```java
/** Per-resource cache: Groovy script text -> (result, timestamp). */
private transient volatile ConcurrentHashMap<String, CachedResult> scriptCache;

/** Per-resource cache: label expression -> (result, timestamp). */
private transient volatile ConcurrentHashMap<String, CachedResult> labelCache;
```
scriptCache/labelCache are unbounded ConcurrentHashMaps and TTL is only checked on reads; expired entries are never evicted. In long-lived controllers with many distinct label expressions or scripts, these maps can grow without bound (per resource). Consider using a bounded cache with time-based eviction (e.g., Caffeine/Guava) or removing entries when they are detected as expired.
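A minimal fix within the current design is read-time eviction, sketched below with an injected clock so the behavior is deterministic; the class and method names are illustrative, and a bounded Caffeine/Guava cache with time-based eviction would be the fuller remedy:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of read-time eviction for a TTL cache: expired entries are removed
// when observed rather than lingering in the map forever.
class TtlBooleanCache {
    static final class Entry {
        final boolean value;
        final long timestampMs;

        Entry(boolean value, long timestampMs) {
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final long ttlMs;

    TtlBooleanCache(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Returns the cached value, or null on a miss; evicts expired entries. */
    Boolean get(String key, long nowMs) {
        Entry e = map.get(key);
        if (e == null) {
            return null;
        }
        if (nowMs - e.timestampMs > ttlMs) {
            map.remove(key, e); // evict instead of leaving the stale entry
            return null;
        }
        return e.value;
    }

    void put(String key, boolean value, long nowMs) {
        map.put(key, new Entry(value, nowMs));
    }

    int size() {
        return map.size();
    }
}
```

`remove(key, e)` only evicts the exact expired entry, so a concurrent `put` of a fresh value is never lost.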
```diff
@@ -47,7 +48,11 @@ public static void freePostMortemResources() {
             + " due post mortem job: "
             + resource.getBuildName());
         resource.recycle();
+        freedAny = true;
       }
     }
+    if (freedAny) {
+      LockableResourcesManager.scheduleQueueMaintenance();
+    }
```
freePostMortemResources() holds lrm.syncResources while calling resource.recycle(). Since recycle() calls back into LockableResourcesManager.recycle(..), which calls unlockResources(..) / unreserve(..) that now trigger scheduleQueueMaintenance(), this can end up attempting to acquire the Jenkins Queue lock while still holding syncResources (lock-order inversion / potential deadlock). Consider collecting the orphan resources under syncResources, releasing the lock, then recycling them and scheduling queue maintenance once afterwards.
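The collect-then-recycle shape the review suggests looks roughly like the following hypothetical sketch (all names are illustrative, and `Thread.holdsLock` is used only to make the property checkable):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggested shape: snapshot the orphaned resources under the
// monitor, release it, then recycle outside the lock and schedule queue
// maintenance once for the whole batch.
class PostMortemSweep {
    private final Object syncResources = new Object();
    private final List<String> resources = new ArrayList<>(List.of("r1", "r2", "r3"));
    final List<String> recycled = new ArrayList<>();
    boolean recycledWhileLocked = false;

    void sweep() {
        List<String> orphans;
        synchronized (syncResources) {
            orphans = new ArrayList<>(resources); // snapshot only, no callbacks
        }
        for (String r : orphans) {
            // recycle() can re-enter the manager and touch the Queue lock,
            // so it must not run while syncResources is held.
            recycledWhileLocked |= Thread.holdsLock(syncResources);
            recycled.add(r);
        }
        // one scheduleQueueMaintenance() here, after the whole batch
    }
}
```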
```java
private ScheduledExecutorService getSaveExecutor() {
  ScheduledExecutorService se = saveExecutor;
  if (se == null) {
    synchronized (this) {
      se = saveExecutor;
      if (se == null) {
        saveExecutor = se = Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "lockable-resources-async-save");
          t.setDaemon(true);
          return t;
        });
      }
    }
  }
  return se;
}
```
The async-save implementation creates a dedicated ScheduledExecutorService but there is no lifecycle management (shutdown) when Jenkins reloads configuration / the plugin is reloaded, which can leave daemon threads around. Prefer using Jenkins' shared timer (jenkins.util.Timer) or add explicit cleanup (e.g., via an @Initializer/@Terminator hook) so executors do not leak across reloads.
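Explicit lifecycle management could look like the sketch below: the same lazy double-checked creation, plus a `cleanup()` method that the real plugin would wire to a `@Terminator` (or reload) hook. Class and method names are assumptions for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Sketch of executor lifecycle management: lazy double-checked creation and
// an explicit cleanup() so the daemon thread does not leak across reloads.
class SaveExecutorHolder {
    private volatile ScheduledExecutorService saveExecutor;

    ScheduledExecutorService getSaveExecutor() {
        ScheduledExecutorService se = saveExecutor;
        if (se == null) {
            synchronized (this) {
                se = saveExecutor;
                if (se == null) {
                    saveExecutor = se = Executors.newSingleThreadScheduledExecutor(r -> {
                        Thread t = new Thread(r, "lockable-resources-async-save");
                        t.setDaemon(true);
                        return t;
                    });
                }
            }
        }
        return se;
    }

    // Would be invoked from a @Terminator hook in the real plugin.
    void cleanup() {
        synchronized (this) {
            if (saveExecutor != null) {
                // shutdown() still lets already-scheduled delayed tasks run
                // by default, so a pending coalesced save is not dropped.
                saveExecutor.shutdown();
                saveExecutor = null;
            }
        }
    }
}
```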
```java
// Resolve candidates outside syncResources when possible — label matching
// and Groovy script evaluation are heavyweight and should not extend the
// critical section.
List<LockableResource> candidates = null;
if (candidatesByScript || (requiredResources.label != null && !requiredResources.label.isEmpty())) {
  candidates = cachedCandidates.getIfPresent(queueItemId);
  if (candidates == null) {
    candidates = (systemGroovyScript == null)
        ? getResourcesWithLabel(requiredResources.label)
        : getResourcesMatchingScript(systemGroovyScript, params);
```
The comment says candidates are resolved “outside syncResources”, but both getResourcesWithLabel(..) and getResourcesMatchingScript(..) acquire synchronized(syncResources) internally, so label/script evaluation is still performed under the plugin lock. Either update the comment to reflect the actual behavior (lock is released and reacquired) or refactor to evaluate against a snapshot of resources so the heavyweight work can truly happen without holding syncResources.
I have deployed this to my staging environment and it seems performant.
Default `asyncSaveEnabled` to true and add a `@Terminator` shutdown hook that flushes any pending coalesced save synchronously before Jenkins exits. Also add a null guard in `uncacheIfFreeing` for defensive safety.
I verified that the added cache fields are marked `transient`, so the change is safe for XStream deserialization on Jenkins boot.
Finished staging testing, so we're rolling this out to our production. I consider the patches OK.
We're operating with this in production.
@samrocketman I am happy that someone else cares about this plugin. Please check the comments from Copilot (they seem to be correct) and add tests to improve code coverage. Thx
✅ Auto-approved: No review received within 3 days. Merging now.
@mPokornyETM I haven't forgotten, I just haven't had a chance to address it yet. A lot of my open source work is done in spare time, which sometimes has competing priorities.
@samrocketman it is approved and merged. I hope it was not a failure ;-) Please take some time and check the open items later to make this plugin awesome.