Traefik sticky sessions for stateful MCP connections

ADR-0066 ACCEPTED · 2026-04-11

Context

MCP uses Streamable HTTP with Server-Sent Events (SSE) for real-time notifications. Each MCP session holds in-process state (channels, SSE streams, worker tasks) that cannot be serialized or moved between nodes. In our 3-node cluster (ADR-0055), Traefik distributes requests across all healthy instances. Without session affinity, a client's initialize request can land on node 1 while its next tool call lands on node 2, which has no channel for that session.
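The failure mode can be reproduced in a few lines. This is only an illustrative sketch: the `Node` class and its methods are hypothetical stand-ins for a backend instance, and plain round-robin is assumed as the distribution strategy.

```python
from itertools import cycle


class Node:
    """Hypothetical backend node holding per-session in-process state."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Stands in for the channels, SSE streams, and worker tasks.
        self.sessions: set[str] = set()

    def initialize(self, session_id: str) -> None:
        self.sessions.add(session_id)

    def call_tool(self, session_id: str) -> int:
        # 200 if this node owns the session, 404 otherwise.
        return 200 if session_id in self.sessions else 404


nodes = [Node("node-1"), Node("node-2"), Node("node-3")]
picker = cycle(nodes)  # rotation without any affinity

first = next(picker)   # the initialize request lands on node-1
first.initialize("abc")

second = next(picker)  # the next request lands on node-2
status_without_affinity = second.call_tool("abc")

# With affinity, the follow-up request would go back to the first node.
status_with_affinity = first.call_tool("abc")

print(second.name, status_without_affinity)
print(first.name, status_with_affinity)
```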

Decision

A dedicated Traefik router matches PathPrefix(`/mcp`) at higher priority than the default router and applies a sticky cookie (_mcp_backend, httpOnly, secure, sameSite=strict). After the first request, Traefik pins that client to the same backend node for the duration of the session.
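As a rough sketch of what that router and service might look like, the snippet below builds the equivalent Traefik dynamic configuration as a Python dict. The key names follow Traefik's dynamic configuration structure for routers and load-balancer sticky cookies; the router name `mcp`, the service name `mcp-sticky`, the priority value, and the backend URLs are assumptions made for illustration, and the real deployment may define the same settings through a different provider (for example container labels).

```python
import json


def mcp_dynamic_config(backend_urls: list[str]) -> dict:
    """Build a Traefik dynamic-configuration dict for the /mcp router."""
    return {
        "http": {
            "routers": {
                "mcp": {
                    "rule": "PathPrefix(`/mcp`)",
                    # Assumed value: it only needs to exceed the default router.
                    "priority": 100,
                    "service": "mcp-sticky",
                }
            },
            "services": {
                "mcp-sticky": {
                    "loadBalancer": {
                        "sticky": {
                            "cookie": {
                                "name": "_mcp_backend",
                                "httpOnly": True,
                                "secure": True,
                                "sameSite": "strict",
                            }
                        },
                        "servers": [{"url": url} for url in backend_urls],
                    }
                }
            },
        }
    }


# Placeholder addresses for the three nodes.
config = mcp_dynamic_config(
    ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
)
print(json.dumps(config, indent=2))
```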

Redis tracks session existence (mcp:session:{id}, 30-day TTL) but does not hold transport state. This gives us two things: we can see which sessions are active, and we can return a useful 404 when a client hits the wrong node (e.g. after a deploy) rather than silently failing. The has_session check returns false for sessions that exist only in Redis — transport state is in-process and unrecoverable — but logs the mismatch so we can tell a misroute from a genuinely unknown session.
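The lookup order described above could be sketched as follows. This is not the actual implementation: `SessionRegistry` is a hypothetical class, and a shared in-memory set stands in for Redis so the example runs on its own. A real version would write the key with a TTL and check it with an existence query.

```python
import logging

logger = logging.getLogger("mcp.sessions")

SESSION_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, applied by the real Redis write


class SessionRegistry:
    """Per-node registry: local transport state plus a shared existence marker."""

    def __init__(self, redis_keys: set[str]) -> None:
        self.local: dict[str, object] = {}  # in-process transport state
        self.redis_keys = redis_keys        # stand-in for Redis, shared by all nodes

    @staticmethod
    def key(session_id: str) -> str:
        return f"mcp:session:{session_id}"

    def register(self, session_id: str, transport: object) -> None:
        self.local[session_id] = transport
        # A real implementation would set the key with SESSION_TTL_SECONDS here.
        self.redis_keys.add(self.key(session_id))

    def has_session(self, session_id: str) -> bool:
        # Happy path: the local check comes first, so Redis is not read.
        if session_id in self.local:
            return True
        # Known to Redis but not to this node: the transport state lives
        # elsewhere and cannot be recovered, so log the misroute and report False.
        if self.key(session_id) in self.redis_keys:
            logger.warning("session %s exists in Redis but not on this node", session_id)
        return False


shared = set()
node_1 = SessionRegistry(shared)
node_2 = SessionRegistry(shared)
node_1.register("abc", object())
```

Here `node_1.has_session("abc")` is true, while `node_2.has_session("abc")` is false and logs a warning; a session ID that was never registered is false on both nodes and logs nothing.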

On close, Redis is cleaned up regardless of whether the local close succeeds, because the session may belong to a different node or a previous process.
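One way to get the "regardless" behaviour is a `finally` clause around the local close. The sketch below assumes a hypothetical `SessionStore` class and again uses an in-memory set in place of Redis.

```python
class SessionStore:
    """Hypothetical per-node store; redis_keys stands in for Redis."""

    def __init__(self, redis_keys: set[str]) -> None:
        self.local: dict[str, object] = {}
        self.redis_keys = redis_keys

    def close(self, session_id: str) -> bool:
        """Close the local transport if present; always clear the Redis marker."""
        closed_locally = False
        try:
            # KeyError here means another node or a previous process owns it.
            transport = self.local.pop(session_id)
            transport.close()
            closed_locally = True
        except KeyError:
            pass
        finally:
            # Runs on success, on a missing local session, and when
            # transport.close() raises.
            self.redis_keys.discard(f"mcp:session:{session_id}")
        return closed_locally


class FailingTransport:
    """Transport whose close() fails, to exercise the cleanup path."""

    def close(self) -> None:
        raise RuntimeError("stream already gone")


keys = {"mcp:session:a", "mcp:session:b"}
store = SessionStore(keys)

remote_result = store.close("a")  # not held locally: returns False, marker removed

store.local["b"] = FailingTransport()
try:
    store.close("b")
    close_raised = False
except RuntimeError:
    close_raised = True           # the error propagates, marker is still removed
```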

Consequences

Clients stay pinned to one node for the life of their session. If that node goes down or redeploys, the SSE stream drops and the client must re-initialize on whichever node Traefik picks next. Well-behaved MCP clients already handle this (the spec says sessions are ephemeral), but it means deploys cause a brief reconnection storm.
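From the client side, recovery amounts to treating a 404 for a known session ID as a signal to initialize again. The helper below is a minimal sketch under that assumption; `call_with_reinit`, `fake_send`, and `fake_initialize` are hypothetical names, and the transport is reduced to two callables so the example stays self-contained.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, body)


def call_with_reinit(
    send: Callable[[str, str], Response],
    initialize: Callable[[], str],
    session_id: str,
    payload: str,
) -> Tuple[str, str]:
    """Send a request; on 404, start a new session and retry once."""
    status, body = send(session_id, payload)
    if status != 404:
        return session_id, body

    # The node that held the session is gone: re-initialize on whichever
    # node the load balancer picks, then retry with the new session ID.
    fresh_id = initialize()
    status, body = send(fresh_id, payload)
    if status == 404:
        raise RuntimeError("session rejected immediately after re-initialize")
    return fresh_id, body


# Simulated server side: empty because the node was just redeployed.
live_sessions: set[str] = set()


def fake_initialize() -> str:
    live_sessions.add("new")
    return "new"


def fake_send(session_id: str, payload: str) -> Response:
    if session_id in live_sessions:
        return 200, f"ok:{payload}"
    return 404, "unknown session"


result = call_with_reinit(fake_send, fake_initialize, "old", "ping")
```

The single retry is a deliberate limit in this sketch: if the new session is also rejected, the error surfaces instead of looping.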

The sticky cookie is scoped to /mcp only — normal GraphQL traffic continues to round-robin across all nodes. Redis adds a small per-session write but no read latency in the happy path (local check is tried first).