[SPARK-54197][K8S] Improve ExecutorPodsLifecycleManager not to request deletion if deletionTimestamp exists (#52902)
Closed
dongjoon-hyun wants to merge 3 commits into apache:master from
Conversation
…etionTimestamp is set on the Pod
Member (Author):
To the reviewers, this aims to help the following community PR as a quick workaround. He will set up his GitHub Actions later. Could you review this PR, @peter-toth and @attilapiros?
Member (Author):
I fixed the main code compilation and the unit test. Let's see the K8s integration test result, @atosatto.
HyukjinKwon approved these changes on Nov 5, 2025.
attilapiros reviewed on Nov 5, 2025.
...ore/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
Outdated
…ark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Member (Author):
Thank you, @HyukjinKwon and @attilapiros.
dongjoon-hyun added a commit that referenced this pull request on Nov 6, 2025:
…uest to delete if `deletionTimestamp` exists

### What changes were proposed in this pull request?

The current code handling deletion of Failed or Succeeded executor Pods calls the Kubernetes API to delete objects until the Kubelet has started terminating the Pod (the status of the object is terminating). However, depending on configuration, the ExecutorPodsLifecycleManager loop might run multiple times before the Kubelet starts the deletion of the Pod object, resulting in unnecessary DELETE calls to the Kubernetes API, which are particularly expensive since they are served from Etcd.

Following the Kubernetes API specification in https://kubernetes.io/docs/reference/using-api/api-concepts/

> When a client first sends a delete to request the removal of a resource, the .metadata.deletionTimestamp is set to the current time. Once the .metadata.deletionTimestamp is set, external controllers that act on finalizers may start performing their cleanup work at any time, in any order.

we can assume that whenever the deletionTimestamp is set on a Pod, it will eventually be terminated without additional DELETE calls.

### Why are the changes needed?

This change removes redundant calls against the Kubernetes API that at scale might lead to excessive load on Etcd.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This patch includes unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52898
Closes #52902 from dongjoon-hyun/driver-do-not-call-delete-for-terminating-pods-master.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Andrea Tosatto <atosatto@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 3b368ca)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Member (Author):
Merged to master/4.1 for Apache Spark 4.1.0. Welcome to the Apache Spark community, @atosatto! I added you to the Apache Spark contributor group and assigned SPARK-54197 to you.
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request on Nov 22, 2025 (same commit message as above).
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request on Nov 25, 2025 (same commit message as above).
What changes were proposed in this pull request?
The current code handling deletion of Failed or Succeeded executor Pods calls the Kubernetes API to delete objects until the Kubelet has started terminating the Pod (the status of the object is terminating).
However, depending on configuration, the ExecutorPodsLifecycleManager loop might run multiple times before the Kubelet starts the deletion of the Pod object, resulting in unnecessary DELETE calls to the Kubernetes API, which are particularly expensive since they are served from Etcd.
Following the Kubernetes API specification in https://kubernetes.io/docs/reference/using-api/api-concepts/

> When a client first sends a delete to request the removal of a resource, the .metadata.deletionTimestamp is set to the current time. Once the .metadata.deletionTimestamp is set, external controllers that act on finalizers may start performing their cleanup work at any time, in any order.

we can assume that whenever the deletionTimestamp is set on a Pod, it will eventually be terminated without additional DELETE calls.
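The guard described above can be sketched in plain Scala. This is a minimal model, not the actual patch: `PodMetadata`, `Pod`, and `shouldRequestDelete` are hypothetical stand-ins for the fabric8 Kubernetes client types and the real ExecutorPodsLifecycleManager logic.

```scala
// Simplified model of a Kubernetes Pod: just the fields the check needs.
case class PodMetadata(name: String, deletionTimestamp: Option[String])
case class Pod(metadata: PodMetadata, phase: String)

object DeletionGuard {
  // Send a DELETE only when the pod has finished (Failed/Succeeded) and
  // no deletion has been requested yet. Once deletionTimestamp is set,
  // the API server has accepted the delete and the kubelet will finish
  // termination, so further DELETE calls are redundant.
  def shouldRequestDelete(pod: Pod): Boolean =
    (pod.phase == "Failed" || pod.phase == "Succeeded") &&
      pod.metadata.deletionTimestamp.isEmpty
}
```

For example, a Failed pod with no `deletionTimestamp` still needs one DELETE call, while a Failed pod whose `deletionTimestamp` is already set (e.g. `Some("2025-11-05T10:00:00Z")`) is skipped on subsequent loop iterations.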
Why are the changes needed?
This change removes redundant calls against the Kubernetes API that at scale might lead to excessive load on Etcd.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This patch includes unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52898