fix: drain accept queues on gthread worker shutdown to prevent connection resets #3612
Open
wckao wants to merge 1 commit into benoitc:master
When self.alive becomes False (via max_requests, post_request hook, or
SIGTERM), the main loop exits. With SO_REUSEPORT each worker owns its
listener socket — connections that completed the TCP handshake but were
not yet accept()-ed sit in the kernel backlog and receive a RST when
the socket is later closed.
Add a two-pass drain-then-close sequence per listener:
1. Unregister listeners from the poller (set_accept_enabled(False))
2. For each listener:
a. First pass: accept() all pending connections
b. Brief pause (10 ms) for in-flight TCP handshakes to land
c. Second pass: accept() stragglers
d. Close the listener socket immediately
Accepted connections are submitted to the thread pool and waited on
during the existing graceful-timeout drain loop. After the close, new
clients get a clean ECONNREFUSED instead of a RST.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
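
The drain-then-close sequence in the commit message above can be sketched with plain sockets. This is a minimal illustration and not the PR's implementation: the function names drain_listener and drain_then_close are hypothetical, the listeners are assumed to be non-blocking, and the thread-pool hand-off is left out.

```python
import errno
import socket
import time


def drain_listener(listener):
    """Accept every connection currently queued on a non-blocking listener."""
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            # The accept queue is empty.
            break
        except OSError as exc:
            if exc.errno == errno.ECONNABORTED:
                # The client went away before we accepted; keep draining.
                continue
            raise
        accepted.append(conn)
    return accepted


def drain_then_close(listeners, pause=0.01):
    """Two-pass drain per listener, closing each socket right after its drain."""
    accepted = []
    for listener in listeners:
        accepted.extend(drain_listener(listener))  # first pass
        time.sleep(pause)  # give in-flight handshakes a moment to complete
        accepted.extend(drain_listener(listener))  # second pass: stragglers
        listener.close()
    return accepted
```

The returned connections would then be handed to whatever serves requests (in the PR, the thread pool) and waited on during the graceful-timeout period.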
Summary
The gthread worker's shutdown sequence has a race condition that causes "Connection reset by peer" errors for clients when reuse_port=True. When a worker exits its main loop (via max_requests, a post_request hook setting worker.alive = False, or SIGTERM), connections that completed the TCP handshake but were not yet accept()-ed by the application sit in the kernel's accept queue. With SO_REUSEPORT each worker owns its listener socket, so these orphaned connections receive a TCP RST when the socket is closed.

Root cause
In ThreadWorker.run(), the event loop's accept() callback dequeues one connection per selector event. When self.alive becomes False, the loop exits immediately; any remaining connections in the kernel backlog are never application-accepted. The listener socket is closed at the end of run(), which sends RST to those connections.
Without SO_REUSEPORT the bug is masked because the arbiter holds the shared listener open and the replacement worker picks up the backlogged connections. With reuse_port=True each worker's socket is independent, so closing it destroys its backlog.

Fix
Add a two-pass drain-then-close sequence executed once the main loop exits:
1. Unregister listeners from the poller (set_accept_enabled(False)).
2. First pass: accept() all pending connections from the kernel backlog.
3. Brief pause (10 ms) for in-flight TCP handshakes to land.
4. Second pass: accept() stragglers.
5. close() the listener socket immediately after draining.
Accepted connections are submitted to the thread pool and waited on during the existing graceful-timeout drain loop. After the close, new clients get a clean ECONNREFUSED instead of a RST.

Changes
gunicorn/workers/gthread.py
- _drain_listener(listener): accepts all pending connections from a single listener.
- _drain_accept_queues(close_listeners=False): iterates listeners, performs two-pass drain, optionally closes each socket immediately after draining.
- run() shutdown sequence: calls _drain_accept_queues(close_listeners=True) after disabling accept, and removes the now-redundant late s.close() loop.
tests/test_gthread.py
- TestDrainAcceptQueues covering: drain all pending, empty queue no-op, multiple listeners, ECONNABORTED handling, close_listeners flag, integration with run(), and grace-period processing.

How to reproduce
Set max_requests=5, reuse_port=True, a single gthread worker, and fire bursts of 30 concurrent requests:
Integration test script (standalone, not part of the committed change)
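
The script itself does not appear in this text. Independently of it, the server-side settings named above could be expressed as a Gunicorn config file along these lines. The file name, bind address, and thread count are assumptions for illustration and not values taken from the PR.

```python
# gunicorn.conf.py (hypothetical file for reproducing the setup described above)

bind = "127.0.0.1:8000"   # assumed address; any free port works
workers = 1               # single worker, as stated above
worker_class = "gthread"  # the worker type affected by the bug
threads = 4               # assumed thread count; not given in the text
max_requests = 5          # recycle the worker after 5 requests
reuse_port = True         # set SO_REUSEPORT on the listener socket
```

With a low max_requests the worker is recycled often, so bursts of concurrent requests are likely to arrive while a worker is shutting down.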
Test results
Unit tests — 84 passed (8 new + 76 existing), 0 failures.
Integration test (script above, 10 cycles x 30 concurrent requests per cycle):
Notes
- The sync worker is not affected: it has no poller/thread pool and leaves the shared listener open for other workers.
- Without reuse_port, the arbiter holds the shared listener open across restarts, so backlogged connections survive and are picked up by the replacement worker. The bug only manifests with reuse_port=True.