WebSocket auth handshake: tokens off URLs, in-band refresh#1502
…l only

Removes the historical query-string (?token=) and Authorization-header token extraction paths. Tokens are now read exclusively from the Sec-WebSocket-Protocol handshake header (format: "opencontracts.jwt.v1, <jwt>"), which is the only handshake header a browser WebSocket constructor can populate, since it cannot set custom headers.

The subprotocol marker is always echoed on scope["accepted_subprotocol"] when present, even on auth failure, so consumers can close cleanly with a 4xxx code rather than failing at the transport layer. Auth errors are surfaced on scope["auth_error"] with a typed close code (4001 expired, 4002 invalid).

Exports: WS_AUTH_SUBPROTOCOL, WS_CLOSE_UNAUTHENTICATED, WS_CLOSE_TOKEN_EXPIRED, WS_CLOSE_TOKEN_INVALID, WS_CLOSE_RATE_LIMITED. Backwards-compat alias GraphQLJWTTokenAuthMiddleware kept until consumers are updated.

Adds JWTAuthMiddlewareSubprotocolTests (6 tests) covering: no header → anonymous, marker-only → anonymous + echoed subprotocol, valid token → authenticated, invalid token → 4002 error + echoed subprotocol, ?token= ignored, Authorization header ignored.
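The header parsing described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_parse_subprotocol_token()`: the function name `parse_subprotocol_header`, its return shape, and the choice to treat the entry immediately after the marker as the token are assumptions. Only the marker value and the `"opencontracts.jwt.v1, <jwt>"` format come from the commit message.

```python
from typing import Optional, Tuple

# Marker value taken from the commit message.
WS_AUTH_SUBPROTOCOL = "opencontracts.jwt.v1"


def parse_subprotocol_header(
    header_value: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (accepted_subprotocol, token) from a Sec-WebSocket-Protocol value.

    Hypothetical helper: the marker is reported whenever it is present so the
    caller can echo it, even if no token follows it.
    """
    if not header_value:
        return None, None

    # The header is a comma-separated list of subprotocol names.
    parts = [p.strip() for p in header_value.split(",") if p.strip()]
    if WS_AUTH_SUBPROTOCOL not in parts:
        return None, None

    # Assumption: the token is the entry right after the marker.
    idx = parts.index(WS_AUTH_SUBPROTOCOL)
    token = parts[idx + 1] if idx + 1 < len(parts) else None
    return WS_AUTH_SUBPROTOCOL, token
```

Returning the marker separately from the token mirrors the behaviour described above, where the subprotocol is echoed even when authentication fails. Token validation itself is out of scope for this sketch.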
…perm checks

- UnifiedAgentConsumer now inherits AuthHandshakeMixin (before AsyncWebsocketConsumer in MRO) and calls accept_with_auth() / cleanup_auth_handshake() in connect / disconnect.
- Extracted corpus/document permission checks into _validate_resource_permissions() override so the mixin can re-validate on in-band token refresh.
- receive() fast-paths AUTH frames to handle_auth_message() before rate-limit gates or agent dispatch.
- JWTAuthMiddleware._parse_subprotocol_token() now also checks scope["subprotocols"] (ASGI spec field set by WebsocketCommunicator in tests) in addition to the Sec-WebSocket-Protocol header; production browser behaviour is unchanged.
- Updated GraphQLJWTTokenAuthMiddlewareTestCase to use subprotocol transport and drain the initial AUTH_OK frame before sending queries.
- Added UnifiedAgentHandshakeTests covering in-band refresh success and user-mismatch rejection.
- Added type: ignore comments to suppress pre-existing mypy mixin attr errors in auth_handshake.py and the method-assign error in the test fixture.
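The re-validation that runs on in-band refresh can be pictured as a small decision function. This is a sketch under stated assumptions, not the mixin's code: `evaluate_refresh`, `RefreshResult`, and the constant name `WS_CLOSE_PERMISSION_DENIED` are hypothetical, and the `has_access` callback stands in for a consumer's `_validate_resource_permissions()` override. The close codes (4002 for a user mismatch, 4003 for revoked permissions) are the ones listed in the PR description.

```python
from dataclasses import dataclass
from typing import Callable, Optional

WS_CLOSE_TOKEN_INVALID = 4002  # name from the middleware's exports
WS_CLOSE_PERMISSION_DENIED = 4003  # hypothetical name; code from the PR description


@dataclass
class RefreshResult:
    ok: bool
    close_code: Optional[int] = None


def evaluate_refresh(
    current_user_pk: int,
    new_user_pk: Optional[int],
    has_access: Callable[[int], bool],
) -> RefreshResult:
    """Decide whether a refreshed token may replace the current one."""
    # A refresh may never switch the connection to a different user.
    if new_user_pk is None or new_user_pk != current_user_pk:
        return RefreshResult(ok=False, close_code=WS_CLOSE_TOKEN_INVALID)

    # Re-run the resource check so revoked permissions take effect on refresh.
    if not has_access(new_user_pk):
        return RefreshResult(ok=False, close_code=WS_CLOSE_PERMISSION_DENIED)

    return RefreshResult(ok=True)
```

Checking the user identity before the permission callback keeps the cheaper comparison first and avoids running resource queries for a token that would be rejected anyway.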
- Add AuthHandshakeMixin to MRO before AsyncWebsocketConsumer
- Replace accept() with accept_with_auth() to emit initial AUTH_OK frame
- Add AUTH fast-path at top of receive() for in-band token refresh
- Add _validate_resource_permissions() override re-checking conversation access (owner, superuser, corpus/document visibility) on refresh
- Call cleanup_auth_handshake() in disconnect()
- Add ThreadUpdatesHandshakeTests (3 tests: valid token, no token 4000, in-band refresh)
- Add AuthHandshakeMixin to MRO before AsyncWebsocketConsumer
- Replace accept() with accept_with_auth() to emit initial AUTH_OK frame before the existing CONNECTED frame
- Add AUTH fast-path at top of receive() for in-band token refresh
- Call cleanup_auth_handshake() in disconnect()
- No _validate_resource_permissions() override needed; consumer is bound to the authenticated user themselves (user-pk swap already forbidden by the mixin)
- Add NotificationUpdatesHandshakeTests (3 tests: AUTH_OK+CONNECTED pair, no token 4001, in-band refresh)
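The AUTH fast-path that each consumer adds at the top of receive() amounts to parsing the frame once and routing on its type. A minimal sketch, assuming JSON text frames and an AUTH frame shaped like `{"type": "AUTH", "token": ...}` as given in the PR summary; the function name `route_frame` and the `"invalid"` route are illustrative and do not appear in the PR.

```python
import json
from typing import Any, Dict, Optional, Tuple


def route_frame(text_data: str) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Parse a text frame once and say where it should be dispatched.

    Returns ("auth", payload) for in-band refresh frames, ("app", payload)
    for everything else, and ("invalid", None) for non-JSON or non-object
    frames.
    """
    try:
        payload = json.loads(text_data)
    except json.JSONDecodeError:
        return "invalid", None

    if not isinstance(payload, dict):
        return "invalid", None

    # AUTH frames are routed before any rate-limit gate or agent dispatch.
    if payload.get("type") == "AUTH":
        return "auth", payload
    return "app", payload
```

A consumer would call this first in receive(), hand `"auth"` payloads to handle_auth_message(), and pass the already-parsed `"app"` payload to its existing logic.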
…pusQuery wrappers Tokens are now carried exclusively via the Sec-WebSocket-Protocol subprotocol header (useWebSocketAuth). The three remaining URL builders — getUnifiedAgentWebSocket, getThreadUpdatesWebSocket, getNotificationUpdatesWebSocket — no longer accept or embed token parameters. getDocumentQueryWebSocket and getCorpusQueryWebSocket are deleted.
Replaces bespoke WebSocket lifecycle with the shared useWebSocketAuth hook. Token rotation is now in-band (no socket churn). The autoReconnect option is removed from the public API since reconnect behavior is owned by useWebSocketAuth. Callers that passed autoReconnect: true have the option removed (it remains the default).
Deletes local getEnvVar / resolveWsBaseUrl / getUnifiedAgentWebSocketUrl helpers; delegates to getUnifiedAgentWebSocket from get_websockets.ts. The WebSocket lifecycle (open, close, reconnect, in-band token refresh) is now owned by useWebSocketAuth. reconnectTrigger state and auth_token reactive-var read are removed. Test file updated to import getUnifiedAgentWebSocket from get_websockets and assert no token in URL.
CorpusChat: import getUnifiedAgentWebSocket instead of getCorpusQueryWebSocket; drop auth_token from URL builder call and effect dep array. document/utils.ts: drop token parameter from getWebSocketUrl; remove @deprecated JSDoc block. ChatTray: update getWebSocketUrl call to drop token arg; remove now-unused auth_token reactive-var read and authToken import.
Code Review: WebSocket auth handshake (tokens off URLs, in-band refresh)

This is a well-motivated security improvement with solid architecture. The core approach is correct and the test coverage is genuinely impressive. There are a few issues worth addressing before merge.

Overview
Moves JWT transport from URL query strings (logged everywhere) to …

Critical Bug: …
Critical fixes from claude-review feedback:

- CorpusChat.tsx and ChatTray.tsx now compose useWebSocketAuth instead of bare `new WebSocket(url)`. Without this both chat surfaces would have authenticated as anonymous against the new middleware (no subprotocol → no token). Sub-tool unwrapping for ask_document approvals stays in CorpusChat; ChatTray keeps its message-state-derived processing indicator. pendingApproval is mirrored into a ref so the once-installed onmessage closure reads the latest value without retriggering connect.
- AuthHandshakeMixin.handle_auth_message now enforces a 1-second per-connection cooldown before any DB work (token validation + resource permission check). This stops a malicious client from spamming AUTH frames to burn queries; legitimate Auth0 refreshes are ~50 min apart, so they are unaffected.
- useWebSocketAuth now treats close codes 4000 (UNAUTHENTICATED) and 4001 (TOKEN_EXPIRED) as auth-invalid signals (previously they silently fell through to exponential-backoff reconnect, hammering a server that would just reject them the same way). The hook also no longer returns the underlying ws reference; it was stale on first render and a footgun for callers. useAgentChat's socketRef mirror is dropped. useNotificationWebSocket drops the no-op disconnect()/connect aliases for the same reason.
- Each consumer's receive() parses the incoming text frame once and dispatches AUTH or non-AUTH from the single parsed payload (was double-parsing). thread_updates.py consolidates Corpus/Document/Conversation imports to the module level.

Test wrappers (CorpusChat, ChatTray) needed `static OPEN = 1` on their StubSocket so the hook's `ws.readyState !== WebSocket.OPEN` send guard works after `window.WebSocket = StubSocket`.

useAgentChat unit tests also adjust three stale assertions left over from the URL-token removal: the `token=test-token` in URL assertion now expects the URL not to contain `token=`, and the two _fail()-driven error tests now drive errors via close 4002 + onAuthInvalid (the post-migration path that actually surfaces errors to consumers).

CHANGELOG updated. New CT screenshot: knowledge-base--chat-tray--conversation-list.png.
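The 1-second per-connection cooldown on AUTH frames mentioned above could look something like the sketch below. It is an illustration under assumptions, not the mixin's implementation: the class name `AuthCooldown`, the injectable `clock` parameter, and the choice not to restart the window on rejected frames are all hypothetical.

```python
import time
from typing import Callable, Optional

AUTH_COOLDOWN_SECONDS = 1.0  # value from the commit message


class AuthCooldown:
    """Per-connection gate that admits at most one AUTH frame per window."""

    def __init__(
        self,
        cooldown: float = AUTH_COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown
        self._clock = clock  # injectable so tests can control time
        self._last_accepted: Optional[float] = None

    def allow(self) -> bool:
        """Return True if an AUTH frame may proceed to token validation."""
        now = self._clock()
        if (
            self._last_accepted is not None
            and now - self._last_accepted < self._cooldown
        ):
            # Too soon: reject before any DB work happens.
            return False
        self._last_accepted = now
        return True
```

One instance would live on each consumer, with `allow()` checked at the very start of AUTH handling. A monotonic clock is used so wall-clock adjustments cannot shorten or extend the window.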
Summary
Move WebSocket JWTs off URL query strings (where they leak into nginx logs, browser history, `Referer` headers, Sentry breadcrumbs, CDN/WAF logs) onto the standard `Sec-WebSocket-Protocol` handshake header, with an in-band `{type:"AUTH",token:...}` refresh protocol so token rotation no longer reconnects the socket.

Hard cutover: `?token=` URL parameters are stripped by the new middleware and ignored. Stale browser tabs from before the deploy must reload to recover.

What changed
Backend (`config/websocket/`)
- `middleware.py`: rewritten. Reads tokens only from `Sec-WebSocket-Protocol` (`["opencontracts.jwt.v1", "<jwt>"]`). Query string + `Authorization` header no longer consulted. Sets `scope["accepted_subprotocol"]` so consumers can echo it on accept.
- `auth_handshake.py`: new `AuthHandshakeMixin`. Provides `accept_with_auth()` (echoes subprotocol + emits initial `AUTH_OK`), `handle_auth_message()` (in-band refresh: refuses user-pk swap → 4002, refuses revoked permissions → 4003, swaps `scope["user"]` on success), `request_token_refresh()` (server-nudge with grace-timer that closes 4001 on timeout), `cleanup_auth_handshake()`.
- `consumers/{unified_agent_conversation,thread_updates,notification_updates}.py`: all three integrate the mixin. UnifiedAgent + ThreadUpdates override `_validate_resource_permissions` so refresh re-runs corpus/document/conversation access checks.

Frontend (`frontend/src/`)
- `utils/websocketAuth.ts`: new. Constants (`WS_AUTH_SUBPROTOCOL`, close codes), tiny pure helpers (`buildAuthProtocols`, `buildAuthMessage`, `parseAuthMessage`).
- `hooks/useWebSocketAuth.ts`: new shared hook. Owns one `WebSocket`, intercepts AUTH frames, fires in-band refresh on `authToken` reactive-var change, close-code-aware reconnect policy.
- `components/chat/get_websockets.ts`: token parameter stripped from all builders. Deprecated `getDocumentQueryWebSocket` and `getCorpusQueryWebSocket` deleted.
- `hooks/useAgentChat.ts`, `hooks/useNotificationWebSocket.ts`, `components/corpuses/CorpusChat.tsx`, `components/knowledge_base/document/utils.ts`, `components/knowledge_base/document/right_tray/ChatTray.tsx`, plus `useBadgeNotifications` / `useExtractCompletionNotification` / `useJobNotifications`: all updated to compose the shared hook and stop reading `authToken` directly.

Wire protocol
Close codes: `4000` unauthenticated, `4001` token expired, `4002` token invalid (or pk-mismatch on refresh), `4003` permission denied (or revoked on refresh), `4029` rate-limited.

Test plan
- `docker compose -f test.yml run --rm django pytest opencontractserver/tests/test_websocket_auth.py -v`: middleware unit tests + mixin unit tests + per-consumer integration tests + `?token=` regression tests + user-pk-swap refusal
- `cd frontend && yarn test:unit run src/utils/__tests__/websocketAuth.test.ts src/hooks/__tests__/useWebSocketAuth.test.ts src/hooks/__tests__/useNotificationWebSocket.auth.test.ts src/hooks/__tests__/useAgentChat.reconnection.test.ts`
- `cd frontend && yarn lint && yarn tsc --noEmit`
- `docs/test_scripts/websocket-auth-handshake.md`: confirm browser DevTools shows no `?token=` in WS request URL, `Sec-WebSocket-Protocol: opencontracts.jwt.v1, <jwt>` header on the handshake, `AUTH_OK` as first frame, single `AUTH` frame on Auth0 silent renewal (no socket churn)

Risk
- … `?token=`.
- … `Sec-WebSocket-Protocol`, all auth fails. nginx forwards by default; deployment-time config check covered in `docs/test_scripts/websocket-auth-handshake.md`.