Fix K8s token refresh by caching K8sClient at executor level#6925
Merged
bentsherman merged 1 commit intomasterfrom Mar 16, 2026
Merged
Fix K8s token refresh by caching K8sClient at executor level#6925bentsherman merged 1 commit intomasterfrom
bentsherman merged 1 commit intomasterfrom
Conversation
Move the Guava cache from K8sConfig to K8sExecutor so that the K8sClient object itself is cached (not just the ClientConfig). This avoids re-creating K8sClient (including SSL setup) on every invocation while still refreshing the service account token when the configured interval expires. Fixes #6918 Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
adamrtalbot
reviewed
Mar 16, 2026
| */ | ||
| protected K8sClient getClient() { | ||
| client | ||
| clientCache.get('client', () -> new K8sClient(k8sConfig.getClient())) |
Collaborator
There was a problem hiding this comment.
Took me a sec to get this, so I'll leave a comment here for future readers. The client config (including token) is cached, but the k8s client itself is recreated on every submit task. This means on a new client is created every time but it's very lightweight. While building a client, it will use the config which is refreshed every 50 minutes (default value).
This is all handled by getClient() reaches into the Guava cache instead of getting a fresh client. Guava cache handles use-or-get logic.
bentsherman
approved these changes
Mar 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #6918 — supersedes #6920
Problem
PR #6742 added a Guava cache to
K8sConfig.getClient()with a 50-minute expiry to refresh the service account token. However, the cache was never consulted after startup because:K8sExecutor.register()calledk8sConfig.getClient()once and stored the result in a privateK8sClientfieldK8sExecutor.getClient()returned that stored field directlyK8sTaskHandlerstoredexecutor.clientin its constructor, bypassinggetClient()entirelyThis caused 401 Unauthorized errors after ~60 minutes on clusters with short-lived projected service account tokens (e.g. AKS, RKE2).
Why not #6920
PR #6920 correctly identified the root cause but had two issues:
K8sClienton everygetClient()call — the Guava cache inK8sConfigonly cachedClientConfig, so every poll/submit/delete call triggerednew K8sClient(...)which re-parses SSL certificates and initializes the trust managerK8sConfigis a configuration object; the client lifecycle belongs in the executorFix
K8sConfigtoK8sExecutor— cache theK8sClientitself (not justClientConfig), avoiding redundant SSL setup on every API callK8sConfig.getClient()becomes a plain factory method that creates a freshClientConfig(re-reading the token from disk)K8sTaskHandlerusesexecutor.getClient()instead of direct field accessK8sExecutorTestcovering cache hit and expiration behavior