17 commits
ad8fb88
Rewrite WS JWTAuthMiddleware to read token from Sec-WebSocket-Protoco…
JSv4 May 3, 2026
48a0244
Add AuthHandshakeMixin for in-band WS token refresh + security guards
JSv4 May 3, 2026
850507b
Wire AuthHandshakeMixin into UnifiedAgentConsumer; refactor resource …
JSv4 May 3, 2026
b72f2f2
Wire AuthHandshakeMixin into ThreadUpdatesConsumer
JSv4 May 3, 2026
d40e46d
Wire AuthHandshakeMixin into NotificationUpdatesConsumer
JSv4 May 3, 2026
b7b326b
Add frontend websocketAuth helpers (subprotocol + AUTH frame builders)
JSv4 May 3, 2026
603d3f1
Add useWebSocketAuth shared hook (subprotocol + in-band refresh)
JSv4 May 3, 2026
32a4799
Strip token from WS URL builders; delete deprecated DocumentQuery/Cor…
JSv4 May 3, 2026
ca9ba15
Refactor useNotificationWebSocket to compose useWebSocketAuth
JSv4 May 3, 2026
368238e
Refactor useAgentChat to compose useWebSocketAuth (no token in URL)
JSv4 May 3, 2026
08bb116
Migrate CorpusChat and document utils to tokenless WS URLs
JSv4 May 3, 2026
8012cb3
Add WS auth handshake manual test script + CHANGELOG entry
JSv4 May 3, 2026
7c0a1c6
Merge remote-tracking branch 'origin/main' into feature/websocket-aut…
JSv4 May 3, 2026
1d0a2f4
Ratchet any-baseline after WS auth refactor (460→454)
JSv4 May 3, 2026
3886d9f
Address PR #1502 review: migrate CorpusChat/ChatTray, harden refresh
JSv4 May 4, 2026
dbfbc95
Merge remote-tracking branch 'origin/main' into feature/websocket-aut…
JSv4 May 4, 2026
7a5178d
WS auth: address codecov gaps + silence React 18 act warnings
JSv4 May 4, 2026
77 changes: 77 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,83 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Security

- **WebSocket authentication tokens no longer travel in URL query strings**
(issue: JWT leakage via nginx logs, browser history, `Referer` headers,
Sentry breadcrumbs, and CDN/WAF logs). All WS connections now authenticate
via the standard `Sec-WebSocket-Protocol` handshake header
(`opencontracts.jwt.v1, <jwt>`). Token rotation is in-band via
`{"type":"AUTH","token":"..."}` frames — no socket churn on refresh. Hard
cutover: `?token=` URL parameters are stripped by the new middleware
(`config/websocket/middleware.py`) and ignored. `Authorization: Bearer`
headers are also no longer consulted for WS auth (browsers cannot set them
on the WebSocket constructor anyway). Stale browser tabs from before the
deploy must reload to recover.

- **`AuthHandshakeMixin`** (`config/websocket/auth_handshake.py`) — added
to all three consumers. Refuses user-pk swap mid-connection
(`USER_MISMATCH` → close 4002), re-runs resource permission checks on
refresh (`PERMISSION_REVOKED` → close 4003), supports server-nudged
refresh via `AUTH_REFRESH_REQUIRED` with a grace timer that closes 4001
on timeout.

- **Frontend `useWebSocketAuth` hook**
(`frontend/src/hooks/useWebSocketAuth.ts`) — single shared lifecycle
owner used by `useAgentChat`, `useNotificationWebSocket`, and
`CorpusChat`. Token rotation no longer reconnects the socket.

- **Removed**: deprecated `getDocumentQueryWebSocket` and
`getCorpusQueryWebSocket` URL helpers (forwarded to the unified
endpoint and now gone per the no-dead-code rule). Removed the
`autoReconnect` / `reconnectDelay` options on `useNotificationWebSocket`
— reconnect is owned by the shared hook.

- **Tests**: `opencontractserver/tests/test_websocket_auth.py` gains five
new test classes (`JWTAuthMiddlewareSubprotocolTests`,
`AuthHandshakeMixinTests`, `UnifiedAgentHandshakeTests`,
`ThreadUpdatesHandshakeTests`, `NotificationUpdatesHandshakeTests`) —
~30 cases covering middleware, mixin, per-consumer integration,
user-mismatch refusal, and `?token=` regression. Frontend adds
`useWebSocketAuth.test.ts` and `websocketAuth.test.ts`; the existing
`useNotificationWebSocket.auth.test.ts` is rewritten as a no-token-in-URL
regression suite.

- **Manual test script**:
`docs/test_scripts/websocket-auth-handshake.md` — verifies subprotocol
transport, in-band refresh, user-pk-swap refusal, and a DevTools sanity
check that the JWT never appears in the request URL.

- **Per-connection AUTH-frame cooldown** in `AuthHandshakeMixin`: a
1-second floor between accepted AUTH frames per socket prevents a
malicious client from spamming refresh frames to burn DB queries on
`_get_user_from_token` + `_validate_resource_permissions`. Auth0 silent
renewal happens roughly every 50 minutes, so legitimate refreshes are
unaffected.

- **Auth-failure close codes are terminal in the frontend**:
`useWebSocketAuth` treats close codes 4000 (UNAUTHENTICATED), 4001
(TOKEN_EXPIRED), and 4002 (TOKEN_INVALID) as auth-invalid signals —
`onAuthInvalid` fires once and the hook stops spawning new sockets.
Previously 4000/4001 fell through to exponential-backoff reconnect,
which would just be rejected again in a loop until the user
re-authenticated. The hook also no longer returns the underlying
`WebSocket` reference (it was stale on the first render), to remove a
footgun for downstream consumers.

- **`CorpusChat` and `ChatTray` migrated to `useWebSocketAuth`**: both
chat surfaces previously instantiated raw `new WebSocket(url)` without
subprotocols, so authenticated users would have hit the new middleware
as anonymous after this deploy. Both now compose the shared hook
(subprotocol auth on connect, in-band AUTH refresh on token rotation,
close-code-aware reconnect policy).

- **Backend `receive()` cleanup**: each consumer now parses the incoming
text frame once and dispatches AUTH/non-AUTH from the same payload
(was double-parsing). `thread_updates.py` consolidates redundant
nested imports of `Conversation` / `Corpus` / `Document` to module
level.
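
The subprotocol transport described in the first entry can be illustrated with a minimal sketch. It assumes the server sees the offered subprotocols as a list of strings (as in the ASGI `scope["subprotocols"]`) and that the JWT is the entry immediately after the `opencontracts.jwt.v1` marker. `extract_token` and `parse_header` are hypothetical names for illustration; the actual parsing in `config/websocket/middleware.py` may differ.

```python
from __future__ import annotations

# Marker subprotocol named in the changelog entry.
AUTH_SUBPROTOCOL = "opencontracts.jwt.v1"


def parse_header(value: str) -> list[str]:
    """Split a raw Sec-WebSocket-Protocol header value into its entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def extract_token(subprotocols: list[str]) -> tuple[str | None, str | None]:
    """Return (token, subprotocol_to_echo) from the offered subprotocols.

    Hypothetical helper: assumes the JWT directly follows the marker.
    """
    offered = [p.strip() for p in subprotocols if p and p.strip()]
    if AUTH_SUBPROTOCOL not in offered:
        # Client did not opt into subprotocol auth; treat as anonymous.
        return None, None
    idx = offered.index(AUTH_SUBPROTOCOL)
    token = offered[idx + 1] if idx + 1 < len(offered) else None
    # Only the marker is echoed back, never the token itself.
    return token, AUTH_SUBPROTOCOL


if __name__ == "__main__":
    print(extract_token(parse_header("opencontracts.jwt.v1, abc.def.ghi")))
```

Echoing only the marker keeps the token out of the handshake response while still satisfying clients that expect the server to select one of the offered subprotocols.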

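The close-code policy from the "Auth-failure close codes are terminal" entry can be sketched as a small decision function. This is written in Python for consistency with the backend even though the real logic lives in the TypeScript hook. `should_reconnect`, `backoff_delay`, and the base/cap values are assumptions for illustration only; how the hook treats 4003 is not stated in the entry, so the sketch leaves it out of the terminal set.

```python
# Close codes the changelog lists as auth-invalid (terminal) signals.
AUTH_TERMINAL_CLOSE_CODES = {
    4000: "UNAUTHENTICATED",
    4001: "TOKEN_EXPIRED",
    4002: "TOKEN_INVALID",
}


def should_reconnect(close_code: int) -> bool:
    """Reconnect only when the close was not an auth failure."""
    return close_code not in AUTH_TERMINAL_CLOSE_CODES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds; base and cap are assumed values."""
    return min(cap, base * 2**attempt)


if __name__ == "__main__":
    for code in (1006, 4000, 4001, 4002):
        print(code, should_reconnect(code))
```

Treating these codes as terminal avoids the loop in which a rejected token is retried with backoff and rejected again until the user re-authenticates.
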
### Added

- **Cross-content Discover search** at `/discover/search`. The Discover hero
253 changes: 253 additions & 0 deletions config/websocket/auth_handshake.py
@@ -0,0 +1,253 @@
"""
AuthHandshakeMixin: adds in-band token-refresh and re-validation behavior to
any AsyncWebsocketConsumer.

Wire protocol (frames in addition to whatever the consumer already speaks):

    Client -> Server:
        {"type": "AUTH", "token": "<jwt>"}

    Server -> Client:
        {"type": "AUTH_OK", "user_id": int|null, "username": str|null,
         "anonymous": bool, "refreshed": bool}
        {"type": "AUTH_FAILED", "reason":
            "EXPIRED" | "INVALID" | "USER_MISMATCH" | "PERMISSION_REVOKED"}
        {"type": "AUTH_REFRESH_REQUIRED", "grace_seconds": float}

Security guarantees enforced by handle_auth_message():
    1. A live socket bound to user A cannot be re-bound to user B (USER_MISMATCH).
    2. If the user has lost access to a bound resource since connect, the next
       AUTH frame closes 4003 (PERMISSION_REVOKED).
    3. An expired/invalid AUTH frame closes the socket (4001/4002) and never
       leaves the consumer in an inconsistent state.
"""

from __future__ import annotations

import asyncio
import json
import logging
import time
from typing import Any

from channels.db import database_sync_to_async
from django.contrib.auth.models import AnonymousUser
from graphql_jwt.exceptions import JSONWebTokenError, JSONWebTokenExpired

from config.jwt_utils import get_user_from_jwt_token
from config.websocket.middleware import (
    WS_CLOSE_TOKEN_EXPIRED,
    WS_CLOSE_TOKEN_INVALID,
)

logger = logging.getLogger(__name__)

# Permission-denied close code (consistent with existing consumer usage).
WS_CLOSE_PERMISSION_DENIED = 4003

# Minimum interval between accepted AUTH frames on a single connection.
# Auth0 silent renewal happens on the order of every 50 minutes, so a 1-second
# floor cannot interfere with legitimate refreshes but stops a malicious client
# from spamming AUTH frames to burn DB queries (issue raised in PR #1502 review).
_MIN_AUTH_FRAME_INTERVAL_SEC = 1.0


@database_sync_to_async
def _get_user_from_token(token: str):
    return get_user_from_jwt_token(token)


class AuthHandshakeMixin:
    """
    Mix this into an AsyncWebsocketConsumer to opt into in-band auth refresh.

    Consumers using this mixin should:
      1. Replace ``await self.accept()`` with ``await self.accept_with_auth()``.
      2. In ``receive()``, dispatch frames whose top-level "type" == "AUTH" to
         ``await self.handle_auth_message(payload)`` BEFORE any other handling.
      3. Optionally override ``_validate_resource_permissions(user)`` to re-run
         resource-level access checks on refresh; default is permissive.
      4. Optionally call ``await self.request_token_refresh()`` from streaming
         code that catches a JSONWebTokenExpired mid-flight.
    """

    # Populated by accept_with_auth() and updated by handle_auth_message().
    _refresh_grace_task: asyncio.Task | None = None
    _initial_auth_sent: bool = False
    # Monotonic timestamp of the last AUTH frame we accepted; used to throttle
    # spam at the per-connection level before any DB work runs.
    _last_auth_frame_at: float = 0.0

    @property
    def current_user(self):
        return self.scope.get("user")  # type: ignore[attr-defined]

    # ------------------------------------------------------------------ #
    # Connection accept
    # ------------------------------------------------------------------ #

    async def accept_with_auth(self) -> None:
        """Accept the connection echoing the negotiated subprotocol."""
        subprotocol = self.scope.get("accepted_subprotocol")  # type: ignore[attr-defined]
        await self.accept(subprotocol=subprotocol)  # type: ignore[attr-defined]
        await self._send_initial_auth_ok()

    async def _send_initial_auth_ok(self) -> None:
        if self._initial_auth_sent:
            return
        user = self.current_user
        is_anon = (
            isinstance(user, AnonymousUser)
            or user is None
            or not getattr(user, "is_authenticated", False)
        )
        await self.send(  # type: ignore[attr-defined]
            text_data=json.dumps(
                {
                    "type": "AUTH_OK",
                    "user_id": None if is_anon else user.pk,
                    "username": None if is_anon else user.username,
                    "anonymous": is_anon,
                    "refreshed": False,
                }
            )
        )
        self._initial_auth_sent = True

    # ------------------------------------------------------------------ #
    # Refresh: client-driven
    # ------------------------------------------------------------------ #

    async def handle_auth_message(self, payload: dict[str, Any]) -> None:
        """
        Process a ``{"type":"AUTH","token":...}`` frame from the client.

        Validates the token, refuses user-pk swap, re-validates resource
        permissions, swaps scope["user"] on success, and cancels any pending
        server-nudge grace timer.

        Enforces a per-connection cooldown so a malicious client cannot spam
        AUTH frames to burn DB queries on token validation + permission checks.
        Frames arriving inside the cooldown window are silently dropped without
        touching the database.
        """
        now = time.monotonic()
        if now - self._last_auth_frame_at < _MIN_AUTH_FRAME_INTERVAL_SEC:
            logger.debug("Dropping AUTH frame: per-connection cooldown active")
            return
        self._last_auth_frame_at = now

        token = payload.get("token")
        if not token or not isinstance(token, str):
            await self._fail_auth("INVALID", WS_CLOSE_TOKEN_INVALID)
            return

        try:
            new_user = await _get_user_from_token(token)
        except JSONWebTokenExpired:
            await self._fail_auth("EXPIRED", WS_CLOSE_TOKEN_EXPIRED)
            return
        except JSONWebTokenError:
            await self._fail_auth("INVALID", WS_CLOSE_TOKEN_INVALID)
            return
        except Exception:
            logger.exception("Unexpected error validating refresh token")
            await self._fail_auth("INVALID", WS_CLOSE_TOKEN_INVALID)
            return

        # User-pk swap is forbidden (defense in depth).
        current = self.current_user
        current_is_anon = (
            isinstance(current, AnonymousUser)
            or current is None
            or not getattr(current, "is_authenticated", False)
        )
        if not current_is_anon and current.pk != new_user.pk:
            await self._fail_auth("USER_MISMATCH", WS_CLOSE_TOKEN_INVALID)
            return

        # Re-validate resource permissions.
        if not await self._validate_resource_permissions(new_user):
            await self._fail_auth("PERMISSION_REVOKED", WS_CLOSE_PERMISSION_DENIED)
            return

        # Success: swap, ack, cancel any pending grace timer.
        self.scope["user"] = new_user  # type: ignore[attr-defined]
        self._cancel_refresh_grace_timer()
        await self.send(  # type: ignore[attr-defined]
            text_data=json.dumps(
                {
                    "type": "AUTH_OK",
                    "user_id": new_user.pk,
                    "username": new_user.username,
                    "anonymous": False,
                    "refreshed": True,
                }
            )
        )

    async def _validate_resource_permissions(self, user) -> bool:
        """
        Override in consumers that have resource-level access requirements
        (e.g., document/corpus/conversation membership). Default permits.
        """
        return True

    async def _fail_auth(self, reason: str, close_code: int) -> None:
        try:
            await self.send(  # type: ignore[attr-defined]
                text_data=json.dumps(
                    {
                        "type": "AUTH_FAILED",
                        "reason": reason,
                    }
                )
            )
        except Exception:
            # Best-effort notification: the send may fail (e.g. the socket is
            # already gone), but we still proceed to close below.
            pass
        await self.close(code=close_code)  # type: ignore[attr-defined]

    # ------------------------------------------------------------------ #
    # Refresh: server-nudged
    # ------------------------------------------------------------------ #

    async def request_token_refresh(self, grace_seconds: float = 30.0) -> None:
        """
        Ask the client to send a fresh token. If the client doesn't respond
        with a successful AUTH frame within ``grace_seconds``, close 4001.
        """
        await self.send(  # type: ignore[attr-defined]
            text_data=json.dumps(
                {
                    "type": "AUTH_REFRESH_REQUIRED",
                    "grace_seconds": grace_seconds,
                }
            )
        )
        self._cancel_refresh_grace_timer()
        self._refresh_grace_task = asyncio.create_task(
            self._refresh_grace_timeout(grace_seconds)
        )

    async def _refresh_grace_timeout(self, grace_seconds: float) -> None:
        try:
            await asyncio.sleep(grace_seconds)
        except asyncio.CancelledError:
            return
        if getattr(self, "_is_connected", True):
            logger.info("Refresh grace timer expired; closing 4001")
            await self.close(code=WS_CLOSE_TOKEN_EXPIRED)  # type: ignore[attr-defined]

    def _cancel_refresh_grace_timer(self) -> None:
        task = self._refresh_grace_task
        if task is not None and not task.done():
            task.cancel()
        self._refresh_grace_task = None

    # ------------------------------------------------------------------ #
    # Cleanup
    # ------------------------------------------------------------------ #

    async def cleanup_auth_handshake(self) -> None:
        """Consumers should call this from their ``disconnect()``."""
        self._cancel_refresh_grace_timer()
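
The per-connection cooldown in `handle_auth_message()` can be exercised in isolation from Channels and the database. The following is a minimal sketch under stated assumptions, not code from this PR. `AuthFrameThrottle` is a hypothetical class with an injectable clock so the timing is testable. Unlike the mixin, which initialises `_last_auth_frame_at` to `0.0`, the sketch uses `None` for "no frame seen yet" so the first frame is always accepted regardless of the clock's origin.

```python
from __future__ import annotations

import time
from typing import Callable


class AuthFrameThrottle:
    """Hypothetical stand-alone version of the per-connection AUTH cooldown."""

    def __init__(
        self,
        min_interval: float = 1.0,  # mirrors _MIN_AUTH_FRAME_INTERVAL_SEC
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._last_accepted: float | None = None

    def allow(self) -> bool:
        """Return True if a frame may be processed, recording its timestamp."""
        now = self._clock()
        if (
            self._last_accepted is not None
            and now - self._last_accepted < self._min_interval
        ):
            # Inside the cooldown window: drop before any expensive work.
            return False
        self._last_accepted = now
        return True


if __name__ == "__main__":
    # Fake clock: frames arrive at t=100.0, 100.4 and 101.2 seconds.
    ticks = iter([100.0, 100.4, 101.2])
    throttle = AuthFrameThrottle(clock=lambda: next(ticks))
    print([throttle.allow() for _ in range(3)])
```

As in the mixin, the timestamp is recorded before token validation runs, so a frame that later fails validation still starts a new cooldown window.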