Jonas Weber
Stateless WebSocket design for 100k connections - Redis fan-out, presence TTLs, SSE fallback included
Designs real-time communication infrastructure for live features with WebSocket management, fallback strategies, and horizontal scaling patterns.
Real-Time System Design (WebSockets/Polling)
You are a real-time systems architect who built live features at Slack and Discord. Design a real-time communication architecture.
**Use Case**: {{use_case}} (e.g., live chat, real-time collaboration, live notifications, gaming, financial tickers)
**Concurrent Connection Target**: {{concurrent_connection_target}} (number of simultaneous WebSocket connections)
**Message Pattern**: {{message_pattern}} (broadcast, room/channel-based, 1:1, presence/typing indicators)
**Client Types**: {{client_types}} (web browsers, mobile apps, IoT devices, third-party integrations)
Provide:
1. **Connection Architecture** - WebSocket server design, connection acceptance flow, handshake optimization
2. **Horizontal Scaling** - How to distribute connections across nodes with sticky sessions or shared state
3. **Pub/Sub Backbone** - Redis Pub/Sub, Redis Streams, NATS, or Kafka for cross-node message delivery
4. **Room/Channel Management** - How clients join/leave channels, channel state distribution, channel sharding
5. **Presence System** - Online/offline tracking, last-seen timestamps, presence fan-out optimization
6. **Message History** - Persistence strategy, pagination, message retention policies, archive storage
7. **Fallback Strategy** - Long-polling, SSE, periodic polling for clients that cannot use WebSockets
8. **Heartbeat & Reconnection** - Ping/pong intervals, exponential backoff reconnection, missed message recovery
9. **Backpressure Handling** - Rate limiting per connection, message dropping policies, slow consumer handling
10. **Load Testing Plan** - How to simulate 100K+ concurrent connections, message throughput testing
11. **Infrastructure Design** - Load balancer configuration (Layer 4 vs Layer 7), proxy settings (nginx/HAProxy), K8s service type
12. **Security** - Connection authentication (JWT token in query param), payload encryption, origin validation
13. **Monitoring** - Connection count metrics, message latency histograms, disconnection reason tracking
Include connection state diagrams and pseudocode for the WebSocket server connection handler.
Results
Real-time architecture for **live chat** targeting 100k concurrent WebSocket connections, room-based with presence. The key is keeping nodes effectively stateless: each node holds only its own sockets, and message fan-out goes through Redis.
**Connection handler (pseudocode)**
```
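on connection(socket, url):
    token  = url.query.token
    roomId = url.query.room                          # assumed: room is named in the connect URL
    user   = verifyJwt(token)
    if user is null: close(4401); return             # auth in the handshake; stop on failure
    registerLocal(roomId, user.id, socket)           # in-memory map on this node
    if isFirstLocalMember(roomId):
        redis.subscribe("room:" + roomId)            # one subscription per node per room
    redis.zadd("presence:" + roomId, now(), user.id) # score = last heartbeat, refreshed on ping
    redis.publish("room:" + roomId, presenceEvent(user, "online"))

on message(msg):
    redis.publish("room:" + msg.roomId, msg)         # every node with members delivers

on redis message(roomId, msg):
    broadcastLocal(roomId, msg)                      # deliver to this node's sockets only

on disconnect:
    unregisterLocal(roomId, user.id)
    if noLocalMembers(roomId): redis.unsubscribe("room:" + roomId)
    redis.zrem("presence:" + roomId, user.id)
    redis.publish("room:" + roomId, presenceEvent(user, "offline"))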
```
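The node-local half of that handler can be sketched in Python. This is a minimal illustration, not the design's actual implementation: `LocalRooms`, its method names, and the injected `subscribe`/`unsubscribe` callbacks are hypothetical stand-ins for whatever Redis client the node uses. The point it shows is that a node needs only one subscription per room, held while at least one local socket is in that room.

```python
from collections import defaultdict
from typing import Callable, Dict

Send = Callable[[str], None]  # hypothetical: writes one payload to one socket


class LocalRooms:
    """Node-local socket map that subscribes to a room only while it has local members."""

    def __init__(self, subscribe: Callable[[str], None], unsubscribe: Callable[[str], None]) -> None:
        self._members: Dict[str, Dict[str, Send]] = defaultdict(dict)
        self._subscribe = subscribe      # assumed to wrap the pub/sub client's subscribe
        self._unsubscribe = unsubscribe  # assumed to wrap the pub/sub client's unsubscribe

    def join(self, room_id: str, user_id: str, send: Send) -> None:
        first_local_member = not self._members[room_id]
        self._members[room_id][user_id] = send
        if first_local_member:
            self._subscribe("room:" + room_id)

    def leave(self, room_id: str, user_id: str) -> None:
        members = self._members.get(room_id)
        if not members or user_id not in members:
            return
        del members[user_id]
        if not members:
            # Last local member left: stop receiving this room's traffic.
            del self._members[room_id]
            self._unsubscribe("room:" + room_id)

    def deliver(self, room_id: str, payload: str) -> int:
        """Fan a message received from the backbone out to local sockets; returns the count."""
        members = self._members.get(room_id, {})
        for send in list(members.values()):
            send(payload)
        return len(members)
```

Keying the map by room first keeps `deliver` proportional to the room's local membership rather than to every socket on the node.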
**Scaling:** connections spread across nodes via a Layer-4 LB. No sticky sessions are needed, because the only node-local state is the socket map. **Redis Pub/Sub** is the backbone; because Pub/Sub does not retain messages, each message is also appended to a per-room Redis Stream for replayable history. **Presence** is refreshed by heartbeat and expires after ~30s of silence, so a crashed client ages out on its own. Redis set members cannot expire individually, so in practice this is a sorted set scored by last-heartbeat time or one expiring key per user.
**Resilience:** ping/pong every 25s; the client reconnects with exponential backoff and replays missed messages by sending its `last_seen_id`, which the server resolves against the room's stream. **Backpressure:** each connection has a capped send queue; a slow consumer is disconnected rather than allowed to balloon node memory. **Fallback:** clients behind hostile proxies degrade to SSE, then long-polling. Infra is K8s with a `LoadBalancer` service and `nginx` tuned to `worker_connections 65535`.
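To make the presence expiry concrete, here is a small in-memory sketch of heartbeat-refreshed presence. It assumes one way of realising the idea: store each user's last heartbeat time and treat anyone older than the TTL as offline. The class `PresenceTracker` and its methods are hypothetical; the comments name the Redis sorted-set commands that would play the same role in a shared deployment.

```python
import time
from typing import Callable, Dict, List


class PresenceTracker:
    """Tracks last-heartbeat times per room; users silent for longer than the TTL age out."""

    def __init__(self, ttl_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def heartbeat(self, room_id: str, user_id: str) -> None:
        # Redis analogue: ZADD presence:<room> <now> <user>
        self._last_seen.setdefault(room_id, {})[user_id] = self._clock()

    def leave(self, room_id: str, user_id: str) -> None:
        # Redis analogue: ZREM presence:<room> <user>
        self._last_seen.get(room_id, {}).pop(user_id, None)

    def online(self, room_id: str) -> List[str]:
        cutoff = self._clock() - self._ttl
        members = self._last_seen.get(room_id, {})
        # Redis analogue: ZREMRANGEBYSCORE to prune, then ZRANGEBYSCORE to read.
        for user_id in [u for u, seen in members.items() if seen < cutoff]:
            del members[user_id]
        return sorted(members)
```

With a 25s ping interval and a 30s TTL, a single delayed heartbeat is enough to mark a healthy client offline, so a TTL of two or more ping intervals may be a safer choice.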
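The reconnection and backpressure policies can be sketched as follows. Both pieces rest on assumptions that the text does not specify: the backoff uses "full jitter" with a 0.5s base and 30s cap, and the send queue reads "dropped" as closing the slow connection once its queue reaches a cap of 256 messages. `backoff_delay` and `SendQueue` are hypothetical names.

```python
import random
from collections import deque
from typing import Deque, List, Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


class SendQueue:
    """Per-connection outbound buffer with a hard cap; overflow closes the connection."""

    def __init__(self, max_pending: int = 256) -> None:
        self._pending: Deque[str] = deque()
        self._max = max_pending
        self.closed = False

    def enqueue(self, payload: str) -> bool:
        """Returns False if the message was not queued (connection closed or just overflowed)."""
        if self.closed:
            return False
        if len(self._pending) >= self._max:
            # Slow consumer: free the buffer and mark the connection for closing.
            self.closed = True
            self._pending.clear()
            return False
        self._pending.append(payload)
        return True

    def drain(self, limit: int) -> List[str]:
        """Pops up to `limit` messages in FIFO order for writing to the socket."""
        out: List[str] = []
        while self._pending and len(out) < limit:
            out.append(self._pending.popleft())
        return out
```

Jitter matters at this scale: if a node holding tens of thousands of sockets restarts, unjittered clients would all retry on the same schedule and hit the load balancer in synchronized waves.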
Model: Claude Opus 4
3 Comments
Luca Brunner
The fact that it flagged the latent bug instead of just rewriting is the real win.
Ahmed Hassan
Our junior devs are going to live on this one.
Priya Nair
Bookmarked: exactly the system design approach I was missing.