Introduced an opt-in concurrency backend that uses Kubernetes Lease
objects and PipelineRun annotations for queue coordination. This
addresses potential state drift and race conditions during watcher
restarts or API delays by storing the queue state in the cluster
instead of only in memory. Added a global configuration setting to
choose between the legacy memory backend and the new lease-based
logic. Updated the queue manager interface to support context-aware
operations and ensured that stale claims are automatically
reclaimed via TTL-based expiration.

When a slot is released, the backend keeps existing Lease objects
instead of deleting them, reusing the same records across multiple
acquisition cycles and reducing API traffic to the cluster.

AI-assisted-by: Cursor (Codex)
Signed-off-by: Chmouel Boudjnah <chmouel@redhat.com>
This page illustrates how Pipelines-as-Code manages concurrent PipelineRun execution. When you set a concurrency limit on a Repository CR, Pipelines-as-Code queues incoming PipelineRuns and starts them only when capacity allows.
The watcher supports two queue backends controlled by the global `concurrency-backend` setting in the `pipelines-as-code` ConfigMap:

- `memory` keeps queue state in the watcher process. This is the historical behavior and remains the default.
- `lease` stores queue coordination in Kubernetes using `Lease` objects and short-lived PipelineRun claims. This mode is more resilient when the watcher restarts or the cluster is slow to reconcile updates.
C -->|No| D[Create PipelineRun with state=started]
C -->|Yes| E[Create PipelineRun with state=queued and spec.status=pending]

D --> F[Watcher reconciles started PipelineRun]
E --> G[Watcher reconciles queued PipelineRun]

G --> H{Queue backend}
H -->|memory| I[Use in-process semaphore]
H -->|lease| J[Acquire per-repository Lease and inspect live PipelineRuns]

I --> K{Capacity available?}
J --> K
K -->|No| L[Keep PipelineRun queued]
K -->|Yes| M[Claim candidate and patch state=started]

M --> F
F --> N{PipelineRun done?}
N -->|No| F
N -->|Yes| O[Report final status]
O --> P[Release slot and try next queued run]
P --> G
```
## Backend selection

To enable the Kubernetes-backed queue coordination, set the global config to:

```yaml
data:
  concurrency-backend: "lease"
```

Restart the watcher after changing `concurrency-backend`; the backend is selected at startup.

When `lease` mode is enabled, Pipelines-as-Code still uses the existing `queued`, `started`, and `completed` PipelineRun states. The difference is that promotion of the next queued PipelineRun is serialized with a per-repository `Lease`, which reduces queue drift during cluster/API instability.
## How lease promotion works

When the watcher reconciles a queued PipelineRun under the `lease` backend, it follows this sequence:

1. Acquire the per-repository Kubernetes Lease (retry up to 20 times with 100 ms delay).
2. List live PipelineRuns for that repository.
3. Separate them into running, claimed, and claimable queued runs.
4. Compute available capacity: `concurrency_limit - running - claimed`.
5. Patch one or more queued runs with short-lived claim annotations (`queue-claimed-by`, `queue-claimed-at`).
6. Release the repository Lease.
7. Re-fetch the claimed run and patch it to `started`.

If promotion fails at step 7, the watcher records the failure on the PipelineRun, clears the claim, and another reconcile retries later.

Claims expire after **30 seconds**. If a watcher crashes or stalls before completing promotion, another instance can pick up the run once the claim expires.
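The capacity computation and claim bookkeeping in steps 3–5 can be sketched as follows. This is a minimal Python sketch, not the actual Go implementation: the `Run` type, `select_for_promotion` helper, and `watcher_id` parameter are illustrative names, while the annotation keys and 30-second TTL come from this page.

```python
from dataclasses import dataclass, field
import time

CLAIM_TTL = 30  # seconds, matching the claim expiry described above


@dataclass
class Run:
    name: str
    state: str  # "queued" or "started"
    annotations: dict = field(default_factory=dict)


def claim_active(run, now):
    """A queued run counts as claimed while its claim annotation is fresh."""
    claimed_at = run.annotations.get("queue-claimed-at")
    return claimed_at is not None and now - claimed_at < CLAIM_TTL


def select_for_promotion(runs, concurrency_limit, watcher_id, now):
    """Claim as many queued runs as free capacity allows (steps 3-5)."""
    running = [r for r in runs if r.state == "started"]
    claimed = [r for r in runs if r.state == "queued" and claim_active(r, now)]
    claimable = [r for r in runs if r.state == "queued" and not claim_active(r, now)]
    # Step 4: capacity = concurrency_limit - running - claimed
    capacity = concurrency_limit - len(running) - len(claimed)
    promoted = []
    for run in claimable[: max(capacity, 0)]:
        # Step 5: mark the run with short-lived claim annotations
        run.annotations["queue-claimed-by"] = watcher_id
        run.annotations["queue-claimed-at"] = now
        promoted.append(run)
    return promoted
```

For example, with `concurrency_limit: 2` and one run already started, only one queued run would be claimed in a cycle; the rest wait for a later reconcile.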
## Recovery loop

When the `lease` backend is active, the watcher starts a background recovery loop that runs every **31 seconds** (claim TTL + 1 s buffer). It looks for repositories where:

- there is no started PipelineRun
- there is no queued PipelineRun with an active (unexpired) claim
- there is still at least one recoverable queued PipelineRun

A queued PipelineRun is recoverable when it has `state=queued`, `spec.status=Pending`, is not done or cancelled, and has a valid `execution-order` annotation.

When a candidate is found, the recovery loop clears stale debug annotations and re-enqueues the oldest recoverable run so normal promotion logic runs again.
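The recoverability rules above can be expressed as a small predicate. This is an illustrative Python sketch under simplified assumptions: runs are plain dictionaries and the helper names are hypothetical, but the conditions mirror the rules just described.

```python
def is_recoverable(run):
    """A queued run qualifies for recovery per the rules above."""
    return (
        run.get("state") == "queued"
        and run.get("spec_status") == "Pending"
        and not run.get("done", False)
        and not run.get("cancelled", False)
        # a valid execution-order annotation must be present
        and str(run.get("execution-order", "")).isdigit()
    )


def pick_recovery_candidate(runs):
    """Return the oldest recoverable queued run (lowest execution-order), or None."""
    candidates = [r for r in runs if is_recoverable(r)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: int(r["execution-order"]))
```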
## Debugging the Lease Backend

When `concurrency-backend: "lease"` is enabled, queued `PipelineRun`s expose queue debugging state directly in annotations:

| Symptom | Queue decision | Likely cause | Suggested action |
|---|---|---|---|
| Run stuck queued, nothing running | `waiting_for_slot` | Completed run was not cleaned up or finalizer is stuck | Check if a `started` PipelineRun still exists for the repo. If it is done but state was not updated, delete it or patch its state to `completed`. |
| Run stuck queued, another run is running | `waiting_for_slot` | Normal — the run is waiting for the active run to finish. | No action needed unless the running PipelineRun is itself stuck. |
| Run keeps cycling between queued and claimed | `claim_active` | Two watcher replicas are contending for the same run. | Wait for the claim to expire (30 s). If it persists, check watcher logs for lease acquisition errors. |
| Run shows promotion failures | `promotion_failed` | The watcher failed to patch the run to `started` (API error, webhook, or admission rejection). | Check `queue-promotion-last-error` and `queue-promotion-retries` annotations. Resolve the underlying API or admission error. |
| Run was recovered but is stuck again | `recovery_requeued` | The recovery loop re-enqueued the run but promotion failed again on the next attempt. | Check for repeated `QueuePromotionFailed` events on the repository. The underlying issue (permissions, resource quota, webhook) must be fixed. |
| Run is queued but marked not recoverable | `not_recoverable` | The run was cancelled, completed, or lost its `execution-order` annotation. | Inspect the PipelineRun — if it should still run, re-apply the `execution-order` annotation manually. |

If the queue decision and events do not explain the behavior, switch the watcher to debug logging and grep for the repository key and PipelineRun key. The lease backend logs include lease acquisition attempts, active claim evaluation, queue-state snapshots, and recovery loop selections.
Selects the queue coordination backend used by the watcher. Supported values:

- `memory`: in-process queue tracking. This is the default and matches the historical behavior.
- `lease`: Kubernetes-backed coordination using `Lease` objects and short-lived PipelineRun claims for improved recovery during watcher restarts or API instability. This backend is Technology Preview.
docs/content/docs/guides/repository-crd/concurrency.md:

---
title: Concurrency
weight: 2
---
Use `spec.concurrency_limit` on a Repository CR to cap how many `PipelineRun`s may run at once for that repository.
This is useful when you need to control cluster usage, preserve ordering for related runs, or avoid a burst of webhook events starting too many `PipelineRun`s at once.

## Repository setting

Set the `concurrency_limit` field on the Repository CR:

```yaml
spec:
  concurrency_limit: <number>
```

When a webhook event produces multiple `PipelineRun`s for the same repository:

- the controller creates them with an `execution-order` annotation
- runs that cannot start immediately are created as `state=queued` with Tekton `spec.status=pending`
- the watcher promotes queued runs to `state=started` only when repository capacity is available

If `concurrency_limit: 1`, only one run for that repository is active at a time and the rest stay queued until the watcher promotes them.
## End-to-end flow

1. The controller decides whether the repository is concurrency-limited.
2. If there is no limit, it creates `PipelineRun`s directly in `started`.
3. If there is a limit, it creates `PipelineRun`s in `queued` and records `execution-order`.
4. The watcher reconciles every `PipelineRun` that has a Pipelines-as-Code state annotation.
5. For queued runs, the watcher asks the selected queue backend whether a slot is available.
6. If a run is selected, the watcher patches it to `started`.
7. When a started run finishes, the watcher reports status and asks the backend for the next queued candidate.
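The seven steps above can be simulated with a toy state machine. This is a simplified Python sketch for intuition only: real promotion happens through Kubernetes API patches, and the function names here are illustrative.

```python
def handle_event(runs, limited):
    """Controller side (steps 1-3): create runs queued or started."""
    state = "queued" if limited else "started"
    for order, r in enumerate(runs):
        r["state"] = state
        if limited:
            # record the order runs should be promoted in
            r["execution-order"] = order
    return runs


def reconcile(runs, limit):
    """Watcher side (steps 4-6): promote queued runs while capacity allows."""
    runs = sorted(runs, key=lambda r: r.get("execution-order", 0))
    for r in runs:
        started = sum(1 for x in runs if x["state"] == "started")
        if r["state"] == "queued" and started < limit:
            r["state"] = "started"
    return runs
```

With a limit of 1, each reconcile promotes at most one queued run, matching the behavior described for `concurrency_limit: 1`.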
- Each repository uses a Kubernetes `Lease` as a short critical section.
- The watcher recomputes queue state from live `PipelineRun`s rather than trusting only process memory.
- A queued run is considered temporarily reserved when it carries short-lived claim annotations (`queue-claimed-by` and `queue-claimed-at`). If the watcher crashes or stalls, another instance can recover after the claim expires.
- The watcher sorts candidates using the recorded `execution-order`, then falls back to creation time.
- A background recovery loop re-enqueues the oldest recoverable queued run when a repository has no active started run and no active claim.

This backend is designed for environments where the watcher may restart, the API server is slow, or promotion to `started` can fail transiently.

For debugging annotations, queue decisions, events, and the full promotion flow see [Advanced Concurrency]({{< relref "/docs/advanced/concurrency" >}}).
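The candidate ordering described above (`execution-order` first, creation time as a fallback) can be sketched as a sort key. This Python snippet is illustrative: the dictionary field names are assumptions, not the actual Go types.

```python
def sort_candidates(runs):
    """Order queued runs by execution-order, then by creation timestamp."""
    def key(run):
        order = run.get("execution-order")
        # Runs with an explicit order come first, in numeric order;
        # runs without one fall back to creation time.
        if order is not None:
            return (0, int(order), 0)
        return (1, 0, run["created"])
    return sorted(runs, key=key)
```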
## Choosing the backend

Select the global backend in the Pipelines-as-Code ConfigMap:

```yaml
data:
  concurrency-backend: "memory"
```

or:

```yaml
data:
  concurrency-backend: "lease"
```

Changing this setting requires restarting the watcher so it can recreate the queue manager with the new backend.

For the global `concurrency-backend` setting itself, see [ConfigMap Reference]({{< relref "/docs/api/configmap" >}}).