Make MCP server stateless for horizontal scaling

Supersedes ADR-0066
ADR-0067 ACCEPTED · 2026-04-12

Context

ADR-0066 pinned MCP sessions to a backend node via a Traefik sticky cookie. rmcp's StreamableHttpService in stateful mode holds transport state (channels, tokio tasks) in process memory, so a session initiated on one replica cannot be served by another.

The sticky cookie assumed browser-style cookie jars. Claude Code and Claude Desktop use Node's fetch, which does not persist cookies, so in production (ADR-0055, count = 2 Nomad allocations) Traefik distributed follow-up requests across both replicas. Any request that reached the replica not holding the session failed with 404 Not Found: Session not found. The preview environment ran count = 1, so every request reached the same replica and the failure never appeared there.
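The failure mode can be reproduced in a few lines. The following is a minimal sketch, not the production code: Replica and RoundRobin are hypothetical names, the session store is a plain in-memory set, and the balancer is assumed to rotate strictly round-robin with no cookie affinity, which is how a client without a cookie jar experiences the sticky router.

```rust
use std::collections::HashSet;

/// Hypothetical replica that keeps session ids in process memory,
/// as a stateful transport does.
#[derive(Default)]
struct Replica {
    sessions: HashSet<String>,
}

impl Replica {
    /// Create a session that exists only on this replica.
    fn initialize(&mut self, session_id: &str) -> u16 {
        self.sessions.insert(session_id.to_string());
        200
    }

    /// Serve a follow-up request; 404 if this replica never saw the session.
    fn follow_up(&self, session_id: &str) -> u16 {
        if self.sessions.contains(session_id) {
            200
        } else {
            404
        }
    }
}

/// Hypothetical balancer: strict round-robin, no affinity
/// (assumption: the client never sends the sticky cookie back).
struct RoundRobin {
    replicas: Vec<Replica>,
    next: usize,
}

impl RoundRobin {
    fn new(count: usize) -> Self {
        Self {
            replicas: (0..count).map(|_| Replica::default()).collect(),
            next: 0,
        }
    }

    /// Return the index of the replica that receives the next request.
    fn pick(&mut self) -> usize {
        let i = self.next;
        self.next = (self.next + 1) % self.replicas.len();
        i
    }

    fn initialize(&mut self, session_id: &str) -> u16 {
        let i = self.pick();
        self.replicas[i].initialize(session_id)
    }

    fn follow_up(&mut self, session_id: &str) -> u16 {
        let i = self.pick();
        self.replicas[i].follow_up(session_id)
    }
}
```

With RoundRobin::new(2), the request after initialize lands on the second replica and returns 404. With RoundRobin::new(1), every request returns 200, which matches the preview environment's behaviour.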

Stateful mode provides one capability over stateless: server-initiated push from the MCP server to the MCP client. No client currently consumes those notifications. Real-time updates to iOS go via GraphQL subscriptions over Redis pub/sub (ADR-0041), which is independent of MCP transport.

Decision

Switch StreamableHttpService to stateless mode with json_response = true. Each POST is an independent JSON request and response. No session id, no in-process transport state, no sticky cookie.

Remove the RedisSessionManager wrapper and the session-lifecycle code: peer and user_id mutexes, the ensure_*_subscription tasks, the on_initialized handler. User id is resolved per request from the AuthContext stashed in HTTP request extensions by mcp_token_auth_middleware.
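Per-request identity resolution can be illustrated without any framework. This is a sketch under stated assumptions: Extensions below is a hand-rolled stand-in for the HTTP request extensions type map, the AuthContext shape (a single user_id field) is hypothetical, and the "user-<id>" token format exists only to keep the example self-contained. Only the names AuthContext and mcp_token_auth_middleware come from this ADR.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Stand-in for the HTTP request extensions: one value per type.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    /// Store a value, replacing any previous value of the same type.
    fn insert<T: Any>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    /// Look up the value stored for type T, if any.
    fn get<T: Any>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|v| v.downcast_ref::<T>())
    }
}

/// Hypothetical shape; the real AuthContext may carry more fields.
#[derive(Debug, Clone, PartialEq)]
struct AuthContext {
    user_id: u64,
}

/// Stand-in for mcp_token_auth_middleware: validate the token and stash
/// the resulting context on the request.
/// Assumption: tokens look like "user-<id>"; real validation differs.
fn auth_middleware(token: &str, extensions: &mut Extensions) -> Result<(), &'static str> {
    let user_id = token
        .strip_prefix("user-")
        .and_then(|id| id.parse::<u64>().ok())
        .ok_or("invalid token")?;
    extensions.insert(AuthContext { user_id });
    Ok(())
}

/// What a tool handler does in stateless mode: read the user id from this
/// request only, with no mutex and no session-scoped state.
fn resolve_user_id(extensions: &Extensions) -> Option<u64> {
    extensions.get::<AuthContext>().map(|ctx| ctx.user_id)
}
```

Because the handler reads only what the middleware attached to the current request, nothing has to be initialised or locked per session, and a request that failed authentication simply has no AuthContext to read.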

Consequences

Any replica serves any request. Deploys no longer cause a reconnection storm. The Traefik sticky router and Redis session tracking introduced in ADR-0066 are removed.

MCP server-to-client push notifications are gone. MCP mutations still call publish_trip_updated, so any iOS client subscribed over GraphQL still receives the update from whichever replica it is connected to (ADR-0041 is untouched).
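The reason iOS updates survive the change is that fan-out happens in the shared broker, not in the MCP transport. A minimal sketch of that shape follows, with loud assumptions: Broker is an in-process stand-in for the Redis pub/sub channel, the publish_trip_updated signature and the "trip_updated:<id>" message format are hypothetical, and each Receiver represents a replica's GraphQL subscription listener.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// In-process stand-in for the Redis pub/sub channel shared by all replicas.
#[derive(Default)]
struct Broker {
    subscribers: Vec<Sender<String>>,
}

impl Broker {
    /// Register a listener, e.g. one replica's GraphQL subscription task.
    fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Deliver the message to every live subscriber and return how many
    /// remain. Subscribers whose receiver was dropped are pruned.
    fn publish(&mut self, message: &str) -> usize {
        self.subscribers
            .retain(|tx| tx.send(message.to_string()).is_ok());
        self.subscribers.len()
    }
}

/// Hypothetical signature: called by an MCP mutation on whichever replica
/// handled the POST. The message format is an assumption for illustration.
fn publish_trip_updated(broker: &mut Broker, trip_id: u64) -> usize {
    broker.publish(&format!("trip_updated:{}", trip_id))
}
```

The publisher never needs to know which replica a subscriber is connected to, so removing MCP session state does not change who receives the update.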