fix: prevent connection termination from load balancers #18353
Open
Contributor
2 issues found across 2 files
Confidence score: 3/5
- There is concrete operational risk in charts/budibase/values.yaml: enabling a default PodDisruptionBudget with single-replica defaults can block voluntary evictions, so stock installs may fail node drains.
- The preStop change in charts/budibase/values.yaml is also user-impacting: a 45s sleep within a 60s termination grace period leaves only ~15s for SIGTERM handling before SIGKILL, which can undermine graceful shutdown during drains.
- Given the high-confidence, medium/high-severity runtime behaviors (sev 7/10 and 6/10), this sits in moderate merge-risk territory rather than a safe-to-merge state.
- Pay close attention to charts/budibase/values.yaml: default PDB/replica interplay and termination timing need to be aligned to avoid drain and shutdown regressions.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="charts/budibase/values.yaml">
<violation number="1" location="charts/budibase/values.yaml:7">
P1: Enabling the default PDB blocks voluntary eviction of the chart's default single-replica services. A stock install will fail node drains unless users also raise the replica counts.</violation>
<violation number="2" location="charts/budibase/values.yaml:185">
P2: This 45s preStop sleep consumes most of the 60s termination grace period, leaving only ~15s for the process to handle SIGTERM before SIGKILL. Increase the grace period with the new delay, or the longer drain can still end in forced termination.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
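The timing concern in violation 2 comes down to one subtraction: Kubernetes starts the termination grace period countdown before the preStop hook runs, so the hook's sleep is taken out of the same budget the process needs for SIGTERM handling. A minimal sketch of that arithmetic follows; the function name `sigterm_budget_seconds` is hypothetical and not part of the chart.

```python
def sigterm_budget_seconds(termination_grace_period: int, pre_stop_delay: int) -> int:
    """Seconds left for SIGTERM handling before SIGKILL.

    The preStop hook runs inside the termination grace period,
    so its sleep is subtracted from the total. Never negative.
    """
    return max(termination_grace_period - pre_stop_delay, 0)


# Values flagged in the review: 45s preStop sleep, 60s grace period.
flagged = sigterm_budget_seconds(60, 45)

# Values listed in the PR summary: 45s preStop sleep, 90s grace period.
revised = sigterm_budget_seconds(90, 45)

print(f"flagged: {flagged}s, revised: {revised}s")
```

With the flagged values the process keeps about 15s; raising the grace period to 90s while keeping the 45s delay leaves 45s.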
Contributor
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="charts/budibase/values.yaml">
<violation number="1" location="charts/budibase/values.yaml:171">
P2: Autoscaling still allows proxy/apps/worker to scale down to 1 replica, so the new default PDBs can still block node drains.</violation>
</file>
Contributor
P2: Autoscaling still allows proxy/apps/worker to scale down to 1 replica, so the new default PDBs can still block node drains.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At charts/budibase/values.yaml, line 171:
<comment>Autoscaling still allows proxy/apps/worker to scale down to 1 replica, so the new default PDBs can still block node drains.</comment>
<file context>
@@ -168,8 +168,9 @@ services:
port: 10000
- # -- The number of proxy replicas to run.
- replicaCount: 1
+ # -- The number of proxy replicas to run. Must be >= 2 when
+ # podDisruptionBudget is enabled, otherwise node drains will be blocked.
+ replicaCount: 2
</file context>
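The interaction between the PDB and the autoscaling floor can be illustrated with a small calculation. For a PDB using minAvailable, the number of voluntary evictions permitted is the count of healthy pods minus minAvailable, floored at zero. The sketch below assumes minAvailable is 1 (as stated in the PR summary); the function names and the idea of checking the autoscaler's minimum replica count are illustrative, not taken from the chart.

```python
def allowed_disruptions(healthy_replicas: int, min_available: int) -> int:
    """Voluntary evictions a minAvailable-style PDB permits right now."""
    return max(healthy_replicas - min_available, 0)


def drain_can_be_blocked(min_replicas: int, min_available: int = 1) -> bool:
    """True if, at the lowest replica count the workload may reach,
    the PDB permits no eviction, so a node drain would stall."""
    return allowed_disruptions(min_replicas, min_available) == 0


# Autoscaler floor of 1 replica with minAvailable: 1 -> drains can stall.
print(drain_can_be_blocked(min_replicas=1))

# Floor of 2 replicas with minAvailable: 1 -> one pod may be evicted.
print(drain_can_be_blocked(min_replicas=2))
```

This is why raising the static replicaCount alone may not be enough: if autoscaling can still scale a service down to one replica, the same zero-disruption condition can recur.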
Description
exceed ALB health check detection time (interval×threshold = 30s)
Summary by cubic
Prevents 503s and dropped connections during rolling updates by aligning pod shutdown with AWS ALB draining. Increases default replicas to keep capacity during PDB-enforced drains.
Bug Fixes
- preStopDelaySeconds=45s and terminationGracePeriodSeconds=90s for proxy, apps, and worker.
- deregistrationDelay=60s, add slow_start.duration_seconds=30 to target group attributes.
- PodDisruptionBudgets enabled by default (minAvailable: 1).
- Default replicaCount raised to 2 for proxy, apps, and worker to avoid blocked node drains.
Migration
- preStopDelaySeconds > ALB health check detection time, and deregistrationDelay > preStopDelaySeconds.
- replicaCount >= 2 for each service.
Written for commit cd3aae3. Summary will update on new commits.
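The two timing rules in the migration notes can be checked mechanically before overriding the defaults. The sketch below is one possible way to do that, not something shipped with the chart: `check_drain_timings` is a hypothetical helper, and the 15s × 2 split of the health check is an assumption chosen only because its product matches the 30s detection time quoted in the description.

```python
def check_drain_timings(
    health_check_interval: int,
    unhealthy_threshold: int,
    pre_stop_delay: int,
    deregistration_delay: int,
) -> list:
    """Return a list of violated timing rules (empty means all rules hold)."""
    problems = []
    # Time the ALB needs to mark a target unhealthy: interval x threshold.
    detection_time = health_check_interval * unhealthy_threshold
    if pre_stop_delay <= detection_time:
        problems.append(
            f"preStopDelaySeconds ({pre_stop_delay}) must exceed "
            f"ALB detection time ({detection_time})"
        )
    if deregistration_delay <= pre_stop_delay:
        problems.append(
            f"deregistrationDelay ({deregistration_delay}) must exceed "
            f"preStopDelaySeconds ({pre_stop_delay})"
        )
    return problems


# PR defaults: 45s preStop delay, 60s deregistration delay.
# Assumed health check: 15s interval x 2 failures = 30s detection time.
print(check_drain_timings(15, 2, 45, 60))

# A custom override that shortens the preStop delay to 20s breaks the first rule.
print(check_drain_timings(15, 2, 20, 60))
```

An empty list means both inequalities hold; each string in a non-empty list names the rule that a custom value would break.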