Architecture•February 8, 2026
Real-Time WebSocket Architecture at Scale
How to design WebSocket systems that handle millions of concurrent connections — from connection management and message routing to horizontal scaling with Redis pub/sub.
Real-time features — live chat, collaborative editing, multiplayer games, live dashboards — have become expected in modern applications. While HTTP is great for request-response patterns, WebSockets provide the persistent, bidirectional communication channel needed for instant updates. But scaling WebSockets to millions of connections introduces unique architectural challenges.
🔌 WebSocket vs. Alternatives
Before choosing WebSockets, understand when each technology fits:
- WebSockets: Best for bidirectional, low-latency communication. Chat, gaming, collaborative tools.
- Server-Sent Events (SSE): Simpler, one-way server-to-client streaming. Great for live feeds, notifications, and dashboards where the client doesn't need to send data back.
- Long Polling: Fallback for environments that don't support WebSockets. Higher latency but universally compatible.
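The trade-offs above can be condensed into a small decision helper. This is only a sketch: `chooseTransport` and its two flags are hypothetical names introduced for illustration, and a real decision would also weigh proxies, infrastructure, and client libraries.

```javascript
// Hypothetical helper: maps two requirement flags to one of the transports listed above.
function chooseTransport({ clientSendsData, webSocketsSupported }) {
  // One-way server-to-client streaming does not need a bidirectional channel.
  if (!clientSendsData) return 'sse';
  // Bidirectional, low-latency traffic is the WebSocket case.
  if (webSocketsSupported) return 'websocket';
  // Fallback for environments where WebSockets are unavailable.
  return 'long-polling';
}

console.log(chooseTransport({ clientSendsData: true, webSocketsSupported: true }));
console.log(chooseTransport({ clientSendsData: false, webSocketsSupported: true }));
```

A live dashboard would land on SSE here, while a chat client behind a proxy that blocks upgrades would fall back to long polling.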
🏗️ Connection Management
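Each WebSocket connection is a long-lived TCP socket that consumes server memory. A single Node.js process can typically handle 50,000–100,000 concurrent connections, but only with careful resource management:

    // Connection registry with heartbeat monitoring (uses the `ws` package)
    import { WebSocketServer } from 'ws';

    const wss = new WebSocketServer({ port: 8080 }); // port is illustrative
    const connections = new Map();

    wss.on('connection', (ws, req) => {
      // authenticate() is an application-defined helper (not shown); this sketch
      // assumes it returns a user ID, or a falsy value on failure
      const userId = authenticate(req);
      if (!userId) {
        ws.close(1008, 'Unauthorized');
        return;
      }
      // One connection per user: close the older socket so it is not orphaned
      const previous = connections.get(userId);
      if (previous) previous.socket.close(1000, 'Replaced by a newer connection');
      const conn = { socket: ws, lastPing: Date.now(), rooms: new Set() };
      connections.set(userId, conn);
      // Update the captured entry; a registry lookup could be undefined after pruning
      ws.on('pong', () => {
        conn.lastPing = Date.now();
      });
      ws.on('close', () => {
        // Only remove the entry if it still belongs to this socket
        if (connections.get(userId) === conn) connections.delete(userId);
      });
    });

    // Every 30 seconds, ping live connections and prune those silent for over 60 seconds
    setInterval(() => {
      const staleThreshold = Date.now() - 60000;
      connections.forEach((conn, userId) => {
        if (conn.lastPing < staleThreshold) {
          conn.socket.terminate();
          connections.delete(userId);
        } else {
          conn.socket.ping();
        }
      });
    }, 30000);

📡 Horizontal Scaling with Redis Pub/Sub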
The biggest challenge: when you run multiple WebSocket server instances behind a load balancer, a message sent to Server A needs to reach clients connected to Server B. The solution is a shared message bus:
- Redis Pub/Sub: Each server subscribes to the channels its clients care about. When a client message arrives at any server, that server publishes it to Redis, which delivers it to every subscribed server.
- Redis Streams: For guaranteed delivery and message persistence. Unlike Pub/Sub, messages aren't lost if a subscriber is temporarily disconnected.
- NATS / Kafka: For extremely high-throughput systems (millions of messages/second), dedicated message brokers provide better guarantees and partitioning.
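To make the fan-out pattern concrete, here is a minimal sketch that simulates two server instances sharing one bus. `InMemoryBus` and `ServerInstance` are hypothetical classes written for this illustration; the bus is an in-process stand-in, not Redis, NATS, or Kafka, and clients are plain objects rather than sockets.

```javascript
// In-memory stand-in for the shared message bus (an assumption for illustration only).
class InMemoryBus {
  constructor() {
    this.subscribers = new Map(); // channel -> Set of handler functions
  }
  subscribe(channel, handler) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, new Set());
    this.subscribers.get(channel).add(handler);
  }
  publish(channel, message) {
    const handlers = this.subscribers.get(channel);
    if (!handlers) return 0;
    for (const handler of handlers) handler(message);
    return handlers.size; // number of subscribers that received the message
  }
}

// One WebSocket server process; it only knows about its own local clients.
class ServerInstance {
  constructor(bus) {
    this.bus = bus;
    this.rooms = new Map(); // room -> Set of local client objects
  }
  join(room, client) {
    if (!this.rooms.has(room)) {
      this.rooms.set(room, new Set());
      // Subscribe once per room, when the first local client joins.
      this.bus.subscribe(room, (message) => this.broadcastLocal(room, message));
    }
    this.rooms.get(room).add(client);
  }
  handleClientMessage(room, message) {
    // Publish only; local delivery happens when the bus echoes the message back.
    this.bus.publish(room, message);
  }
  broadcastLocal(room, message) {
    for (const client of this.rooms.get(room) ?? []) client.received.push(message);
  }
}

const bus = new InMemoryBus();
const serverA = new ServerInstance(bus);
const serverB = new ServerInstance(bus);
const alice = { received: [] };
const bob = { received: [] };
serverA.join('room:42', alice);
serverB.join('room:42', bob);
serverA.handleClientMessage('room:42', 'hello');
console.log(bob.received); // [ 'hello' ]
```

Because every server, including the sender's, receives the message from the bus, the sending server never delivers locally on its own. In this sketch that means alice also receives her own message, which a real system may or may not want to filter out.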
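
    // Server-side: relay room messages through Redis (uses the `ioredis` package)
    import Redis from 'ioredis';

    // A connection in subscriber mode cannot issue regular commands such as
    // PUBLISH, so publishing and subscribing use separate clients
    const pub = new Redis();
    const sub = new Redis();

    sub.subscribe('chat:room:42');
    sub.on('message', (channel, message) => {
      // Broadcast to all local WebSocket clients in this room.
      // broadcastToRoom() is an application-defined helper (not shown).
      broadcastToRoom('room:42', JSON.parse(message));
    });

    // When a client sends a message (inside the wss 'connection' handler)
    ws.on('message', (data) => {
      // Drop malformed JSON here so every subscriber can parse safely
      let payload;
      try {
        payload = JSON.parse(data.toString());
      } catch {
        return;
      }
      pub.publish('chat:room:42', JSON.stringify(payload));
    });

🛡️ Security Considerations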
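Once the upgrade handshake completes, a WebSocket connection no longer passes through per-request HTTP protections, and browsers do not apply CORS to WebSocket connections, so these checks must be explicit:
- Authentication: Authenticate during the HTTP upgrade handshake using a JWT in the query string or a cookie. Tokens in query strings can end up in access logs, so keep them short-lived. Never allow unauthenticated WebSocket connections.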
- Rate limiting: Implement per-connection message rate limits to prevent abuse. A single malicious client shouldn't be able to flood your message bus.
- Input validation: Every incoming WebSocket message must be validated and sanitized. Never trust client input — treat it exactly like an API request body.
- Origin checking: Validate the Origin header during the upgrade handshake to prevent cross-site WebSocket hijacking.
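Three of the checks above (origin checking, rate limiting, and input validation) can be sketched as small pure functions. All names here are hypothetical, as are the allowlisted origin, the message shape (`room` and `text`), and the limits; the rate limiter is a token bucket, which is one common choice rather than the only one.

```javascript
// Hypothetical allowlist; replace with your own front-end origins.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

// Call with the Origin header value during the upgrade handshake.
function isAllowedOrigin(originHeader) {
  return typeof originHeader === 'string' && ALLOWED_ORIGINS.has(originHeader);
}

// Token bucket: allows bursts of up to `capacity` messages, refilled at
// `refillPerSecond`. Create one limiter per connection.
// `now` is injectable so the limiter can be tested without real timers.
function createRateLimiter({ capacity, refillPerSecond }, now = Date.now) {
  let tokens = capacity;
  let last = now();
  return function tryConsume() {
    const current = now();
    const refill = ((current - last) / 1000) * refillPerSecond;
    tokens = Math.min(capacity, tokens + refill);
    last = current;
    if (tokens < 1) return false; // over the limit: drop or disconnect
    tokens -= 1;
    return true;
  };
}

// Validate an incoming message; returns a clean object or null.
// The { room, text } shape is an assumed schema for a chat message.
function parseChatMessage(raw, maxLength = 2000) {
  let parsed;
  try {
    parsed = JSON.parse(String(raw));
  } catch {
    return null;
  }
  if (parsed === null || typeof parsed !== 'object') return null;
  const { room, text } = parsed;
  if (typeof room !== 'string' || typeof text !== 'string') return null;
  if (text.length === 0 || text.length > maxLength) return null;
  return { room, text }; // unexpected fields are dropped
}
```

Note that the Origin check protects browser users: non-browser clients can send any Origin value, so it complements authentication rather than replacing it.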
📊 Monitoring & Observability
WebSocket systems need specialized monitoring:
- Connection count: Track total active connections per server and globally. Alert when approaching capacity limits.
- Message throughput: Messages sent/received per second. Sudden spikes can indicate abuse or a cascade failure.
- Latency: Measure end-to-end message delivery time (client A sends → client B receives). Target sub-100ms for chat-like experiences.
- Reconnection rate: A high reconnection rate signals network issues or server instability.
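As a rough illustration, the four signals above can be tracked per server with a small in-process collector. `createMetrics` and its method names are hypothetical, the `isReconnect` flag is assumed to be supplied by the client when it reconnects, and the latency values are assumed to be measured elsewhere; a production system would more likely export these to a dedicated metrics backend.

```javascript
// Hypothetical per-server metrics collector (a sketch, not a metrics library).
function createMetrics(maxSamples = 1000) {
  let activeConnections = 0;
  let reconnects = 0;
  let messages = 0;
  const latencies = []; // most recent end-to-end latency samples, in ms

  return {
    connectionOpened({ isReconnect = false } = {}) {
      activeConnections += 1;
      if (isReconnect) reconnects += 1;
    },
    connectionClosed() {
      activeConnections = Math.max(0, activeConnections - 1);
    },
    messageHandled() {
      messages += 1;
    },
    recordLatency(ms) {
      latencies.push(ms);
      if (latencies.length > maxSamples) latencies.shift(); // keep a bounded window
    },
    // Returns rates over the elapsed window and resets the window counters.
    snapshot(elapsedSeconds) {
      const sorted = [...latencies].sort((a, b) => a - b);
      // Nearest-rank 95th percentile; null when there are no samples.
      const p95 = sorted.length
        ? sorted[Math.ceil(sorted.length * 0.95) - 1]
        : null;
      const result = {
        activeConnections,
        messagesPerSecond: messages / elapsedSeconds,
        reconnectsPerSecond: reconnects / elapsedSeconds,
        p95LatencyMs: p95,
      };
      messages = 0;
      reconnects = 0;
      return result;
    },
  };
}
```

Calling `snapshot(10)` every ten seconds yields values that can be compared against alert thresholds, for example the sub-100ms latency target for chat-like experiences.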
