Commit 3b368ca
[SPARK-54197][K8S] Improve ExecutorsPodsLifecycleManager not to request to delete if deletionTimestamp exists
### What changes were proposed in this pull request?
The current code handling the deletion of Failed or Succeeded driver Pods keeps calling the Kubernetes API to delete the objects until the Kubelet has started terminating the Pod (that is, until the status of the object is terminating).
However, depending on configuration, the ExecutorPodsLifecycleManager loop might run multiple times before the Kubelet starts deleting the Pod object, resulting in unnecessary DELETE calls to the Kubernetes API, which are particularly expensive since they are served from etcd.
Following the Kubernetes API specification at https://kubernetes.io/docs/reference/using-api/api-concepts/:
> When a client first sends a delete to request the removal of a resource, the .metadata.deletionTimestamp is set to the current time. Once the .metadata.deletionTimestamp is set, external controllers that act on finalizers may start performing their cleanup work at any time, in any order.
we can assume that once the deletionTimestamp is set on a Pod, the Pod will eventually be terminated without any additional DELETE calls.
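The rule above can be sketched as a small filter. This is a minimal illustration and not the code in this patch: `PodMeta`, `PodSnapshot`, and `DeleteFilter` are hypothetical stand-ins for the Kubernetes client's Pod model, and the phase strings are assumed to be the standard `Succeeded` / `Failed` values.

```scala
// Hypothetical, simplified stand-ins for the Kubernetes Pod model (not the real client types).
final case class PodMeta(name: String, deletionTimestamp: Option[String])
final case class PodSnapshot(meta: PodMeta, phase: String)

object DeleteFilter {
  // A pod is finished once it has reached a terminal phase.
  def isFinished(pod: PodSnapshot): Boolean =
    pod.phase == "Succeeded" || pod.phase == "Failed"

  // A set deletionTimestamp means the API server has already accepted a delete request.
  def isAlreadyBeingDeleted(pod: PodSnapshot): Boolean =
    pod.meta.deletionTimestamp.isDefined

  // Names of the pods that still need a DELETE request in this pass of the loop.
  def podsToDelete(pods: Seq[PodSnapshot]): Seq[String] =
    pods.filter(p => isFinished(p) && !isAlreadyBeingDeleted(p)).map(_.meta.name)
}
```

Under these assumptions, a Failed pod without a deletionTimestamp is selected for deletion, while a Succeeded pod whose deletionTimestamp is already set is skipped on every later pass, so no repeated DELETE call is issued for it.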
### Why are the changes needed?
This change is required to remove redundant calls to the Kubernetes API that, at scale, might lead to excessive load on etcd.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This patch includes unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52898
Closes #52902 from dongjoon-hyun/driver-do-not-call-delete-for-terminating-pods-master.
Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Andrea Tosatto <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent bc58b6e commit 3b368ca
File tree
2 files changed, +39 -4 lines changed
- resource-managers/kubernetes/core/src
  - main/scala/org/apache/spark/scheduler/cluster/k8s
  - test/scala/org/apache/spark/scheduler/cluster/k8s
Lines changed: 13 additions & 3 deletions
Lines changed: 26 additions & 1 deletion