[grid] Close pre-handshake race in WebSocket proxy #17435
Frames received from the upstream WebSocket before the client-side handshake completes used to traverse MessageOutboundConverter via the fallback consumer. After onUpgradeComplete strips the converter, an in-flight frame on the upstream listener thread could land in a pipeline that no longer had its Message-layer handlers and silently drop. Move the channel reference and a pre-handshake buffer into the listener: pre-handshake frames are queued, the buffer drains under a lock when the channel is handed over, and post-handshake calls take the lock-free fast path.
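The buffer-then-publish handover described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `BufferingForwarder` is a hypothetical name, the frame type is generic, and a `Consumer<F>` stands in for the client channel. It also assumes frames arrive from a single upstream listener thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical stand-in for the listener; the Consumer<F> plays the role of
// the client channel and F the role of a WebSocket frame.
final class BufferingForwarder<F> {
  private final Object lock = new Object();
  private final Deque<F> pending = new ArrayDeque<>();
  // Published only after the pre-handshake buffer has been drained.
  private volatile Consumer<F> channel;

  // Called for every frame coming from the upstream side.
  void onFrame(F frame) {
    Consumer<F> ch = channel; // one volatile read
    if (ch != null) {
      ch.accept(frame); // post-handshake fast path, no lock taken
      return;
    }
    synchronized (lock) {
      ch = channel; // re-check: the handover may have finished meanwhile
      if (ch == null) {
        pending.addLast(frame); // still pre-handshake: queue in arrival order
        return;
      }
    }
    ch.accept(frame);
  }

  // Called once when the client-side handshake completes.
  void onUpgrade(Consumer<F> ch) {
    synchronized (lock) {
      F frame;
      while ((frame = pending.pollFirst()) != null) {
        ch.accept(frame); // drain oldest first
      }
      channel = ch; // publish last so drained frames stay ahead of new ones
    }
  }
}
```

Publishing the channel only after the drain, and while still holding the lock, is what keeps a late pre-handshake frame from overtaking the buffered ones.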
Review Summary by Qodo
Buffer pre-handshake WebSocket frames to prevent race condition
Walkthroughs
Description
• Buffer pre-handshake WebSocket frames in DirectForwardingListener to prevent silent drops
• Drain buffered frames in order when client channel becomes available post-handshake
• Replace fallback Message consumer with lock-protected frame queue for deterministic ordering
• Add unit test validating pre-handshake frame buffering and post-handshake fast path
Diagram
flowchart LR
A["Pre-handshake frames arrive"] -->|enqueue| B["DirectForwardingListener buffer"]
C["onUpgradeComplete fires"] -->|drain in order| B
B -->|write to channel| D["Client receives frames"]
E["Post-handshake frames"] -->|volatile read| F["Fast path direct write"]
F --> D
File Changes
1. java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java
Address two issues raised on SeleniumHQ#17435:
- Pre-handshake onClose/onError used to discard the close details and publish the channel for normal forwarding when the handshake later landed, leaving the client open indefinitely. Record the code/reason and, on onUpgrade, drain any frames the upstream had queued and then write the close frame followed by closing the channel.
- The pending deque had no cap, so a stalled handshake while the upstream keeps producing could grow it without bound. Cap at 128 frames; on overflow drop the buffer, latch a 1009 terminal state, and close the upstream so the channel sees a clean close once the upgrade fires.
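A rough sketch of the two fixes in this commit message, under simplifying assumptions: frames are plain strings rather than Netty frames, `CappedPendingFrames` and its method names are hypothetical, and the overflow reason text is illustrative only. Closing the upstream is left to the caller, signalled by `offer` returning false.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical sketch; frames are Strings here instead of WebSocketFrames.
final class CappedPendingFrames {
  static final int MAX_PENDING = 128;      // cap named in the commit message
  static final int MESSAGE_TOO_BIG = 1009; // RFC 6455 "message too big"
  static final int NONE = -1;              // no terminal state recorded yet

  private final Deque<String> pending = new ArrayDeque<>();
  private int closeCode = NONE;
  private String closeReason = "";

  // Queues a pre-handshake frame. Returns false when the frame was not
  // queued, which is the caller's cue to close the upstream connection.
  synchronized boolean offer(String frame) {
    if (closeCode != NONE) {
      return false; // already terminal: nothing more is buffered
    }
    if (pending.size() >= MAX_PENDING) {
      pending.clear(); // drop the buffer rather than grow without bound
      closeCode = MESSAGE_TOO_BIG;
      closeReason = "pre-handshake buffer overflow"; // illustrative text
      return false;
    }
    pending.addLast(frame);
    return true;
  }

  // Records an upstream close or error seen before the handshake.
  synchronized void recordClose(int code, String reason) {
    if (closeCode == NONE) {
      closeCode = code;
      closeReason = reason;
    }
  }

  // Handshake landed: flush what is buffered, then report the latched close
  // code (or NONE) so the caller can write a close frame and close the channel.
  synchronized int handOver(Consumer<String> client) {
    String frame;
    while ((frame = pending.pollFirst()) != null) {
      client.accept(frame);
    }
    return closeCode;
  }

  synchronized String closeReason() { return closeReason; }

  synchronized int size() { return pending.size(); }
}
```

A caller that gets anything other than `NONE` back from `handOver` would write a close frame with that code and reason and then close the client channel instead of publishing it for normal forwarding.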
Pushed
The Ruby Windows test failure is unrelated.
Persistent review updated to latest commit bf52afc
Pre-handshake onClose kept the buffered ref-counted WebSocketFrames "for a future onUpgrade drain". If the client-side handshake never landed (client disconnected mid-handshake, or the upgrade itself failed and triggered the upstream close), those frames were never released, which is a Netty buffer leak. Make onClose symmetric with onError: drop the buffer on pre-handshake close, while still recording the code/reason so onUpgrade can surface a clean close to the client if the handshake does fire later. The trade-off is small: a client that has just received the 101 response cannot usefully consume frames if we are about to close the connection immediately afterward.
Good catch — pushed
Made Existing test updated to reflect the new behaviour (only the close frame on the wire after pre-handshake close + upgrade).
Persistent review updated to latest commit 0cd57dd
The Router has had a direct frame-forwarding path between the Netty pipeline and the upstream JDK WebSocket since db9b07a (2026-03-11, "[grid] Router WebSocket handle dropped close frames, idle disconnects, high-latency proxying", SeleniumHQ#17197). Once the client-side handshake completes, an inbound WebSocketFrameProxy forwards each Netty WebSocketFrame straight to the upstream WebSocket, and the outbound DirectForwardingListener writes upstream replies directly to the client channel. Together those removed the per-frame Message allocation and the executor hop in WebSocketMessageHandler on the Router side.

The Node still did the full round-trip through MessageInboundConverter, WebSocketMessageHandler, the registered Consumer<Message>, and MessageOutboundConverter in both directions for every frame. Each frame allocated a TextMessage or BinaryMessage and hopped onto the channel executor on delivery. For a busy CDP or VNC session that is measurable allocation and executor-queue pressure on the Node.

Apply the same PostUpgradeHook pattern on the Node side: the consumer returned from ProxyNodeWebsockets installs a WebSocketFrameProxy after the handshake so inbound frames forward straight to the browser-side WebSocket, and a DirectForwardingListener writes outbound frames directly to the client channel. Frames received before the handshake are buffered in arrival order and drained on handover, so a frame cannot land in a pipeline that has already had its Message-layer handlers removed.

The hardening that the Router-side listener picked up in 8d8cf64 (2026-05-14, "[grid] Close pre-handshake race in WebSocket proxy", SeleniumHQ#17435) is mirrored on the Node listener: the pre-handshake buffer is capped at 128 frames with a 1009 close recorded on overflow; the close code and reason are recorded on pre-handshake close or error so a late onUpgrade can write a clean close frame to the client and tear the channel down rather than leaving it open; and the buffer is released on close so ref-counted frames cannot leak if the handshake never completes.

The Node-specific behaviour is preserved:
- Session-activity heartbeats (sessionConsumer.accept(sessionId)) fire per frame, both pre- and post-handshake.
- The connectionReleased CAS still guards a single node.releaseConnection call across the close and error paths, including the overflow path introduced here.
- VNC sessions still install a no-op heartbeat consumer so VNC traffic does not mark the session as recently active.

The existing ProxyNodeWebsocketsTest continues to exercise the slot accounting, including the regression from SeleniumHQ#17197 where onError without a follow-on onClose used to leak the slot. New unit tests in NodeDirectForwardingListenerTest pin the per-frame heartbeat, the buffer-then-drain ordering, the surface-and-teardown behaviour on a pre-handshake close, and the overflow path's clean release of the session slot.
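The single-release guard mentioned in the list above can be illustrated with a compare-and-set on an AtomicBoolean. This is a sketch under assumptions: `ReleaseOnce` is a hypothetical class, and the `Runnable` stands in for the node.releaseConnection call, whose real signature is not shown in this conversation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a release-exactly-once guard shared by the close,
// error and overflow paths.
final class ReleaseOnce {
  private final AtomicBoolean connectionReleased = new AtomicBoolean(false);
  private final Runnable release; // stand-in for the slot release call

  ReleaseOnce(Runnable release) {
    this.release = release;
  }

  void onClose() {
    releaseIfNeeded();
  }

  void onError() {
    releaseIfNeeded();
  }

  void onOverflow() {
    releaseIfNeeded();
  }

  private void releaseIfNeeded() {
    // Only the first caller flips false -> true; every later caller,
    // on any path or thread, sees the CAS fail and skips the release.
    if (connectionReleased.compareAndSet(false, true)) {
      release.run();
    }
  }
}
```

Because the guard does not depend on which callback arrives first, an onError without a follow-on onClose still releases the slot, and an onError followed by onClose does not release it twice.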
The Router has had a direct frame-forwarding path between the Netty pipeline and the upstream JDK WebSocket since db9b07a (2026-03-11, "[grid] Router WebSocket handle dropped close frames, idle disconnects, high-latency proxying", SeleniumHQ#17197). Once the client-side handshake completes, an inbound WebSocketFrameProxy forwards each Netty WebSocketFrame straight to the upstream WebSocket, and the outbound DirectForwardingListener writes upstream replies directly to the client channel. Together those removed the per-frame Message allocation and the executor hop in WebSocketMessageHandler on the Router side.

The Node still did the full round-trip through MessageInboundConverter, WebSocketMessageHandler, the registered Consumer<Message>, and MessageOutboundConverter in both directions for every frame. Each frame allocated a TextMessage or BinaryMessage and hopped onto the channel executor on delivery. For a busy CDP or VNC session that is measurable allocation and executor-queue pressure on the Node.

Apply the same PostUpgradeHook pattern on the Node side: the consumer returned from ProxyNodeWebsockets installs a WebSocketFrameProxy after the handshake so inbound frames forward straight to the browser-side WebSocket, and a DirectForwardingListener writes outbound frames directly to the client channel. Frames received before the handshake are buffered in arrival order and drained on handover, so a frame cannot land in a pipeline that has already had its Message-layer handlers removed.

The hardening that the Router-side listener picked up in 8d8cf64 (2026-05-14, "[grid] Close pre-handshake race in WebSocket proxy", SeleniumHQ#17435) is mirrored on the Node listener: the pre-handshake buffer is capped at 128 frames with a 1009 close recorded on overflow; the close code and reason are recorded on pre-handshake close or error so a late onUpgrade can write a clean close frame to the client and tear the channel down rather than leaving it open; and the buffer is released on close so ref-counted frames cannot leak if the handshake never completes.

Close-frame reasons coming from the upstream are now truncated to the 123-byte UTF-8 cap that RFC 6455 §5.5.1 imposes. The truncation uses a CharsetEncoder writing into a 120-byte buffer so it stops at a clean character boundary on overflow. A naive byte-truncate-then-decode could split a multi-byte sequence, produce a U+FFFD replacement on decode, and re-encode back over 123 bytes, breaking the close frame. The helper lives as a public static on WebSocketFrameProxy because both DirectForwardingListener classes already depend on that class. The Router-side listener that landed in SeleniumHQ#17435 had the same unchecked path; apply the helper there too so both proxies share the same safe behaviour.

The Node-specific behaviour is preserved:
- Session-activity heartbeats (sessionConsumer.accept(sessionId)) fire per frame, both pre- and post-handshake.
- The connectionReleased CAS still guards a single node.releaseConnection call across the close and error paths, including the overflow path introduced here.
- VNC sessions still install a no-op heartbeat consumer so VNC traffic does not mark the session as recently active.

The existing ProxyNodeWebsocketsTest continues to exercise the slot accounting, including the regression from SeleniumHQ#17197 where onError without a follow-on onClose used to leak the slot.

New unit tests in NodeDirectForwardingListenerTest pin the per-frame heartbeat, the buffer-then-drain ordering, the surface-and-teardown behaviour on a pre-handshake close, the overflow path's clean release of the session slot, and the safe truncation of an overlong upstream close reason that contains multi-byte UTF-8 characters. The shared helper has a focused unit test alongside it in WebSocketFrameProxyTest.
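The character-boundary truncation described in this commit message might look roughly like the sketch below. The class and method names (`CloseReasons`, `truncateUtf8`) are hypothetical and are not the helper's real signature on WebSocketFrameProxy. The byte limit is a parameter here, whereas the commit message describes a 120-byte buffer under the 123-byte cap.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: trims a close reason so its UTF-8 encoding fits in
// maxBytes without cutting a multi-byte character in half.
final class CloseReasons {
  static String truncateUtf8(String reason, int maxBytes) {
    if (reason == null) {
      return "";
    }
    CharsetEncoder encoder =
        StandardCharsets.UTF_8
            .newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    CharBuffer in = CharBuffer.wrap(reason);
    ByteBuffer out = ByteBuffer.allocate(maxBytes);
    // The encoder never writes part of a character: when the next character
    // does not fit it reports overflow and leaves the input position on it.
    encoder.encode(in, out, true);
    // Everything before the input position was encoded completely.
    return reason.substring(0, in.position());
  }
}
```

Taking the substring at the encoder's input position, rather than decoding the truncated bytes, means a U+FFFD replacement character is never produced and the result cannot re-encode to more than `maxBytes`.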
Summary
Frames received from the upstream WebSocket before the client-side handshake completes used to travel through MessageOutboundConverter via a fallback Consumer<Message>. After onUpgradeComplete rewires the pipeline (removing the Message-layer handlers), a frame in flight on the upstream listener thread could land in a pipeline that no longer had its converter and silently drop.

Move the client Channel reference and a pre-handshake buffer into DirectForwardingListener itself:
- onText/onBinary calls queue a WebSocketFrame onto a per-listener deque under a lock.
- FrameProxyConsumer.onUpgradeComplete invokes listener.onUpgrade(channel), which drains the buffer in arrival order and then publishes the channel as volatile.
- onClose/onError pre-handshake release buffered frames and latch a closed flag so any in-flight call doesn't leak ref-counted frames.

A focused EmbeddedChannel-based test pins the buffer-then-drain order; the existing medium ProxyWebsocketTest continues to cover the end-to-end Router→Node path.

Test plan
🤖 Generated with Claude Code