fix: prevent connection termination from load balancers #18353

Open
calexiou wants to merge 3 commits into master from fix/helm-chart-rollout-reliability

@calexiou calexiou commented Mar 19, 2026

Description

  • Increase preStop delays (proxy 10→45s, apps 5→45s, worker 10→45s) to
    exceed ALB health check detection time (interval×threshold = 30s)
  • Increase ALB deregistration delay (30→60s) to cover extended preStop
  • Add slow_start.duration_seconds=30 to ALB target group attributes
  • Enable PodDisruptionBudgets by default (minAvailable: 1)
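
For illustration, the ALB settings above could be expressed as a target group attributes annotation, assuming the ingress is managed by the AWS Load Balancer Controller. This is a metadata fragment only (the `spec` is omitted), the resource name is hypothetical, and the chart may render these values differently:

```yaml
# Sketch only: Ingress metadata fragment, not a complete manifest.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: budibase-ingress  # hypothetical name
  annotations:
    # Keep draining targets registered for 60s (previously 30s) and ramp
    # traffic to newly registered targets over 30s.
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=60,slow_start.duration_seconds=30
```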

Summary by cubic

Prevents 503s and dropped connections during rolling updates by aligning pod shutdown with AWS ALB draining. Increases default replicas to keep capacity during PDB-enforced drains.

  • Bug Fixes

    • Set preStopDelaySeconds=45s and terminationGracePeriodSeconds=90s for proxy, apps, and worker.
    • Update ALB: deregistrationDelay=60s, add slow_start.duration_seconds=30 to target group attributes.
    • Enable PodDisruptionBudgets by default (minAvailable: 1).
    • Set default replicaCount to 2 for proxy, apps, and worker to avoid blocked node drains.
  • Migration

    • If you override timings, ensure: preStopDelaySeconds > ALB health check detection time, and deregistrationDelay > preStopDelaySeconds.
    • When PDBs are enabled, set replicaCount >= 2 for each service.
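
As a rough sketch of how these timings fit together in a rendered pod template, assuming the preStop delay is implemented as a `sleep` in an exec hook and that the image provides `sh` (the container name and image are hypothetical):

```yaml
# Sketch only: fragment of a Deployment, not the chart's actual template.
spec:
  template:
    spec:
      # Total shutdown budget. The preStop hook runs inside this window,
      # so 90s - 45s leaves about 45s for SIGTERM handling before SIGKILL.
      terminationGracePeriodSeconds: 90
      containers:
        - name: proxy                    # hypothetical container name
          image: example/proxy:latest    # hypothetical image
          lifecycle:
            preStop:
              exec:
                # Keep serving for 45s: longer than the 30s ALB health check
                # detection time, shorter than the 60s deregistration delay.
                command: ["sh", "-c", "sleep 45"]
```

With these values the constraints hold: 45s > 30s detection time, and 60s deregistration delay > 45s preStop delay.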

Written for commit cd3aae3. Summary will update on new commits.

@calexiou calexiou self-assigned this Mar 19, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 2 files

Confidence score: 3/5

  • There is concrete operational risk in charts/budibase/values.yaml: enabling a default PodDisruptionBudget with single-replica defaults can block voluntary evictions, so stock installs may fail node drains.
  • The preStop change in charts/budibase/values.yaml is also user-impacting: a 45s sleep within a 60s termination grace period leaves only ~15s for SIGTERM handling before SIGKILL, which can undermine graceful shutdown during drains.
  • Given the high-confidence, medium/high-severity runtime behaviors (sev 7/10 and 6/10), this sits in moderate merge-risk territory rather than a safe-to-merge state.
  • Pay close attention to charts/budibase/values.yaml - default PDB/replica interplay and termination timing need to be aligned to avoid drain and shutdown regressions.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="charts/budibase/values.yaml">

<violation number="1" location="charts/budibase/values.yaml:7">
P1: Enabling the default PDB blocks voluntary eviction of the chart's default single-replica services. A stock install will fail node drains unless users also raise the replica counts.</violation>

<violation number="2" location="charts/budibase/values.yaml:185">
P2: This 45s preStop sleep consumes most of the 60s termination grace period, leaving only ~15s for the process to handle SIGTERM before SIGKILL. Increase the grace period with the new delay, or the longer drain can still end in forced termination.</violation>
</file>
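
To make violation 1 concrete, here is a minimal sketch of a PodDisruptionBudget with `minAvailable: 1`. The resource name and label selector are hypothetical and not taken from the chart:

```yaml
# Sketch only: names and labels are illustrative assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: proxy-pdb        # hypothetical name
spec:
  # At least one matching pod must stay available during voluntary evictions.
  minAvailable: 1
  selector:
    matchLabels:
      app: proxy         # hypothetical label
```

With a single healthy replica, the allowed disruptions are 1 - 1 = 0, so an eviction during a node drain is refused. With two healthy replicas, one pod can be evicted at a time.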

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="charts/budibase/values.yaml">

<violation number="1" location="charts/budibase/values.yaml:171">
P2: Autoscaling still allows proxy/apps/worker to scale down to 1 replica, so the new default PDBs can still block node drains.</violation>
</file>
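
One possible way to address this, sketched under the assumption that autoscaling is implemented with a HorizontalPodAutoscaler, is to keep the autoscaler floor at 2. The names, `maxReplicas`, and the CPU target below are hypothetical, and the chart's actual autoscaling values may be structured differently:

```yaml
# Sketch only: illustrative HPA, not the chart's rendered output.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: proxy-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: proxy            # hypothetical Deployment name
  # Never scale below 2, so a PDB with minAvailable: 1 still permits
  # one eviction during a node drain.
  minReplicas: 2
  maxReplicas: 4           # hypothetical ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # hypothetical target
```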

     port: 10000
-    # -- The number of proxy replicas to run.
-    replicaCount: 1
+    # -- The number of proxy replicas to run. Must be >= 2 when
@cubic-dev-ai cubic-dev-ai bot Mar 19, 2026

P2: Autoscaling still allows proxy/apps/worker to scale down to 1 replica, so the new default PDBs can still block node drains.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At charts/budibase/values.yaml, line 171:

<comment>Autoscaling still allows proxy/apps/worker to scale down to 1 replica, so the new default PDBs can still block node drains.</comment>

<file context>
@@ -168,8 +168,9 @@ services:
     port: 10000
-    # -- The number of proxy replicas to run.
-    replicaCount: 1
+    # -- The number of proxy replicas to run. Must be >= 2 when
+    # podDisruptionBudget is enabled, otherwise node drains will be blocked.
+    replicaCount: 2
</file context>

@github-actions github-actions bot added the stale label Mar 28, 2026