
websocket optimization and benchmarking #2399

Open
stephenberry wants to merge 6 commits into main from websocket-optimization

@stephenberry (Owner) commented Mar 24, 2026

WebSocket Optimization and Benchmarking

Performance Optimizations

Shared receive buffers — All WebSocket connections on a given thread now share a single 512KB receive buffer (one allocation per thread instead of per-connection). Unconsumed partial-frame bytes spill to a small per-connection buffer. Deferred reclamation avoids thrashing. Enabled by default; disable with ws_recv_buffer_size(0).
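A minimal sketch of the shared-buffer idea, not the code in this PR. The names `shared_recv_buffer`, `connection`, `begin_read`, and `end_read` are hypothetical, and the sketch assumes each connection fills and parses the shared buffer within a single event-loop callback, so two connections on the same thread never use it at the same time. Deferred reclamation of the spill buffer is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one receive buffer per thread, created on first use.
inline std::vector<std::uint8_t>& shared_recv_buffer(std::size_t size = 512 * 1024)
{
    thread_local std::vector<std::uint8_t> buffer;
    if (buffer.size() < size) buffer.resize(size);
    return buffer;
}

// Hypothetical per-connection state: only leftover partial-frame bytes live here.
struct connection
{
    std::vector<std::uint8_t> spill;

    // Copies leftover bytes to the front of the shared buffer and returns how
    // many there were, so the next socket read can append right after them.
    std::size_t begin_read(std::vector<std::uint8_t>& shared)
    {
        const std::size_t n = spill.size();
        assert(n <= shared.size());
        std::copy(spill.begin(), spill.end(), shared.begin());
        spill.clear(); // keeps capacity for the next partial frame
        return n;
    }

    // `filled` bytes of the shared buffer are valid; the first `consumed` of
    // them formed complete frames. Whatever remains is saved for later.
    void end_read(const std::vector<std::uint8_t>& shared, std::size_t filled, std::size_t consumed)
    {
        assert(consumed <= filled && filled <= shared.size());
        spill.assign(shared.begin() + static_cast<std::ptrdiff_t>(consumed),
                     shared.begin() + static_cast<std::ptrdiff_t>(filled));
    }
};
```

In this arrangement the large allocation is paid once per thread, while an idle connection holds only an empty (or very small) spill vector.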

Fused unmask + ASCII detection — XOR unmasking now processes 8 bytes at a time and simultaneously checks whether all bytes are ASCII. For ASCII-only text frames, the separate UTF-8 validation pass is skipped entirely.
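The fused pass can be sketched roughly as follows. This is an illustration under assumptions, not the function from this PR: the name `unmask_and_check_ascii` is hypothetical, and the sketch assumes the payload starts at mask offset 0. It widens the 4-byte mask to 8 bytes, XORs one 64-bit word at a time, and ORs every unmasked word into an accumulator whose high bits reveal any non-ASCII byte.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: unmasks `data` in place and returns true when every
// unmasked byte is ASCII (high bit clear).
inline bool unmask_and_check_ascii(std::uint8_t* data, std::size_t len,
                                   const std::array<std::uint8_t, 4>& mask)
{
    // Repeat the 4-byte mask twice to build an 8-byte XOR word. Using memcpy
    // on both the mask and the data keeps the byte order consistent on any
    // endianness.
    std::uint8_t mask_bytes[8];
    for (std::size_t i = 0; i < 8; ++i) mask_bytes[i] = mask[i % 4];
    std::uint64_t mask64;
    std::memcpy(&mask64, mask_bytes, 8);

    std::uint64_t seen_bits = 0; // OR of all unmasked bytes
    std::size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, 8);
        word ^= mask64;
        seen_bits |= word;
        std::memcpy(data + i, &word, 8);
    }
    // Tail: i is a multiple of 8 here, so i % 4 is still the right mask index.
    for (; i < len; ++i) {
        data[i] = static_cast<std::uint8_t>(data[i] ^ mask[i % 4]);
        seen_bits |= data[i];
    }
    // Any byte with its top bit set means the payload is not pure ASCII.
    return (seen_bits & 0x8080808080808080ULL) == 0;
}
```

When the function returns true for a text frame, the payload is already known to be valid UTF-8, since ASCII is a subset of UTF-8. A false result only means a full UTF-8 validation pass is still needed.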

Zero-allocation write fast path — When no write is in flight, outgoing frames are built directly in a persistent per-connection buffer (capacity reused across messages). Frames are only heap-allocated and queued when a concurrent write is already in progress.
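The control flow of the fast path might look like the sketch below. `connection_writer`, `encode_frame`, `send`, and `on_write_complete` are hypothetical names, the actual socket write is left out, and the frame encoder covers only unmasked server-to-client frames with FIN set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical type for illustration; not Glaze's actual class.
struct connection_writer
{
    std::vector<std::uint8_t> write_buffer;            // persistent; capacity is reused
    std::deque<std::vector<std::uint8_t>> write_queue; // used only while a write is in flight
    bool write_in_flight = false;

    // Appends one unmasked frame (FIN set) with the given opcode to `out`.
    static void encode_frame(std::vector<std::uint8_t>& out, std::uint8_t opcode,
                             const std::uint8_t* payload, std::size_t n)
    {
        out.push_back(static_cast<std::uint8_t>(0x80 | (opcode & 0x0F)));
        if (n < 126) {
            out.push_back(static_cast<std::uint8_t>(n));
        }
        else if (n <= 0xFFFF) {
            out.push_back(126);
            out.push_back(static_cast<std::uint8_t>(n >> 8));
            out.push_back(static_cast<std::uint8_t>(n & 0xFF));
        }
        else {
            out.push_back(127);
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.push_back(static_cast<std::uint8_t>((static_cast<std::uint64_t>(n) >> shift) & 0xFF));
            }
        }
        out.insert(out.end(), payload, payload + n);
    }

    // Returns true when the fast path was taken.
    bool send(std::uint8_t opcode, const std::uint8_t* payload, std::size_t n)
    {
        if (!write_in_flight) {
            write_buffer.clear(); // drops contents, keeps capacity
            encode_frame(write_buffer, opcode, payload, n);
            write_in_flight = true; // a real implementation would start the async write here
            return true;
        }
        // Slow path: a write is already running, so this frame needs its own storage.
        std::vector<std::uint8_t> frame;
        frame.reserve(n + 10); // 10 bytes is the largest unmasked header
        encode_frame(frame, opcode, payload, n);
        write_queue.push_back(std::move(frame));
        return false;
    }

    // Called when the in-flight write finishes.
    void on_write_complete()
    {
        if (write_queue.empty()) {
            write_in_flight = false;
            return;
        }
        // The next queued frame becomes the in-flight buffer.
        write_buffer.swap(write_queue.front());
        write_queue.pop_front();
    }
};
```

In steady state, where each message is sent after the previous write has completed, `send` only touches `write_buffer`, so no allocation occurs once the buffer has grown to the largest frame size seen.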

Write queue simplification — Replaced std::deque<std::unique_ptr<std::vector<uint8_t>>> with std::deque<std::vector<uint8_t>>, removing a level of indirection.
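As a small illustration of why the extra `std::unique_ptr` layer is not needed: moving a `std::vector` into the deque transfers ownership of its heap storage rather than copying the payload. The `frame` alias and `enqueue` helper below are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using frame = std::vector<std::uint8_t>;

// Hypothetical helper: queues a frame by move and returns a pointer to the
// queued frame's payload storage.
inline const std::uint8_t* enqueue(std::deque<frame>& queue, frame&& f)
{
    queue.push_back(std::move(f)); // move-constructs the element; no payload copy
    return queue.back().data();
}
```

With the default allocator, the pointer returned by `enqueue` is the same one the frame held before the move, and `std::deque::push_back` never relocates existing elements, so queued frames stay in place while earlier writes complete.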

Benchmark Suite

Added benchmarks/ws_benchmark/, which compares Glaze against uWebSockets, using Boost.Beast as a neutral client. Tests cover:

  • Single-connection echo at 64B, 1KB, 64KB, and JSON payloads
  • Connection upgrade (new WebSocket per message)
  • Concurrent echo with N clients (single-threaded and multi-threaded server)
