Adopt signal-not-payload for real-time subscriptions

ADR-0041 ACCEPTED · 2026-02-15

Context

Issue #111 introduces GraphQL subscriptions for real-time trip updates across a user's devices. The fundamental design question is how much data the subscription carries.

Three approaches exist for real-time updates in GraphQL:

  1. Full state: Subscription delivers the complete updated object on every change. Client replaces its local state entirely. Simple but wasteful — a trip with 10 destinations sends the full object when a single field changes.

  2. Delta/diff: Subscription delivers only the changed fields. Client merges diffs into local state. Efficient over the wire but requires diff computation on the server and merge logic on the client, creating a parallel data flow alongside regular queries.

  3. Signal: Subscription delivers only an identifier (trip UUID). Client refetches via its existing query infrastructure. Minimal subscription payload, but trades an extra round-trip for simplicity.
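To make the contrast concrete, here is a minimal sketch of what each payload shape asks of the client. The `LocalTrip`, `TripEvent`, and `apply` names are hypothetical and exist only for illustration; the real client is Apollo iOS, and a trip is modeled here as a flat map of field names to values, which is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical client-side view of a trip: field name -> value.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct LocalTrip {
    pub fields: HashMap<String, String>,
    /// Set when the client should refetch through its normal query path.
    pub stale: bool,
}

/// The three payload shapes a subscription event could carry.
#[derive(Debug, Clone)]
pub enum TripEvent {
    /// 1. Full state: the complete updated object.
    Full(HashMap<String, String>),
    /// 2. Delta: only the fields that changed.
    Delta(HashMap<String, String>),
    /// 3. Signal: only the trip identifier.
    Signal { trip_id: String },
}

/// Applies an event to local state, showing the client-side work each shape implies.
pub fn apply(local: &mut LocalTrip, event: TripEvent) {
    match event {
        // Full state: replace everything, no merge logic needed.
        TripEvent::Full(fields) => {
            local.fields = fields;
            local.stale = false;
        }
        // Delta: merge changed fields into existing state (a second data path).
        TripEvent::Delta(changed) => {
            local.fields.extend(changed);
        }
        // Signal: touch no data, only mark the trip for a refetch.
        TripEvent::Signal { trip_id: _ } => {
            local.stale = true;
        }
    }
}
```

Only the `Delta` arm requires merge logic that the regular query path does not already provide; the `Signal` arm leaves all data handling to the existing fetch.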

The app already has a query layer (Apollo iOS with normalized cache, GetTripQuery) that handles fetching, caching, and UI updates. Mutations return updated data through this same path. Building a parallel delta-sync path would duplicate this infrastructure.

The trip domain uses an append-only modification log (ADR-0033). State is derived by replaying effects. Computing diffs between effect applications is non-trivial and couples the subscription layer to the effect system.

Decision

We adopt a signal-not-payload model for real-time subscriptions. The tripUpdated subscription emits only the trip UUID. Clients use this as a trigger to refetch via their existing query path.

The subscription is a notification layer, not a state synchronization engine. This is comparable to how Notion or Asana handle real-time updates: signal that something changed, then let the client refetch.

Redis pub/sub provides cross-node fan-out for the 3-node cluster. Each server maintains local broadcast channels per trip, with Redis distributing signals between nodes. The publish is fire-and-forget outside the database transaction — a failed notification doesn't roll back a successful write. The database is the source of truth; Redis is best-effort.

Transport

HTTP multipart is the v1 transport. Apollo iOS v2 does not support WebSocket subscriptions (tracked at apollographql/apollo-ios#3624). A WebSocket endpoint exists for future use but is currently unused.

Both transports share the same GraphQL subscription resolvers and Redis pub/sub infrastructure. The transport is a concern of the HTTP layer, not the subscription domain.

Centralized publishing

All trip content mutations flow through ModificationService::store_modification, which publishes the Redis signal after commit. This means new domains that use store_modification get real-time signaling automatically. Two exceptions publish directly: create_trip/delete_trip (which don't create modifications) and the document domain (which manages its own transactions).
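The publish-after-commit ordering can be sketched as follows. This is a simplified illustration, not the actual `ModificationService` code: the `SignalPublisher` trait, the in-memory `Store`, the `store_and_signal` function, and `FailingPublisher` are all hypothetical stand-ins for the database and the Redis client.

```rust
/// Hypothetical stand-in for the Redis publish call.
pub trait SignalPublisher {
    fn publish(&self, trip_id: &str) -> Result<(), String>;
}

/// Hypothetical in-memory stand-in for the database.
#[derive(Default)]
pub struct Store {
    /// Committed rows as (trip_id, modification).
    pub committed: Vec<(String, String)>,
}

impl Store {
    fn commit(&mut self, trip_id: &str, modification: &str) -> Result<(), String> {
        self.committed
            .push((trip_id.to_string(), modification.to_string()));
        Ok(())
    }
}

/// Commits the write first, then publishes the signal on a best-effort basis.
pub fn store_and_signal(
    store: &mut Store,
    publisher: &dyn SignalPublisher,
    trip_id: &str,
    modification: &str,
) -> Result<(), String> {
    // 1. Durable write. An error here returns early, so no signal is sent.
    store.commit(trip_id, modification)?;

    // 2. Fire-and-forget signal after commit. A failure is logged and
    //    swallowed; it never turns a successful write into an error.
    if let Err(err) = publisher.publish(trip_id) {
        eprintln!("signal publish failed for trip {}: {}", trip_id, err);
    }
    Ok(())
}

/// Publisher that always fails, to exercise the best-effort path.
pub struct FailingPublisher;

impl SignalPublisher for FailingPublisher {
    fn publish(&self, _trip_id: &str) -> Result<(), String> {
        Err("redis unavailable".to_string())
    }
}
```

Calling `store_and_signal` with `FailingPublisher` still returns `Ok(())` and leaves the row committed, which is the behavior described above: the database is the source of truth and the signal is best-effort.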

Subscription lifecycle

Each application node maintains a TripSubscriptionManager with in-memory broadcast channels keyed by trip/user ID. When the first subscriber for a trip connects, the manager subscribes to the corresponding Redis channel. When the last subscriber disconnects, it unsubscribes. A dedicated Redis SubscriberClient (separate from the store/cache pools) maintains these subscriptions across the node's lifetime. The health check endpoint verifies this connection is alive.
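The first-subscriber and last-subscriber rule amounts to reference counting per trip. The sketch below shows only that bookkeeping, under the assumption that channels are keyed by a trip ID string; `SubscriberCounts` and `RedisAction` are hypothetical names, and the real `TripSubscriptionManager` also owns the broadcast channels and the Redis connection.

```rust
use std::collections::HashMap;

/// What the node should do on its Redis connection after a change.
#[derive(Debug, PartialEq, Eq)]
pub enum RedisAction {
    Subscribe,
    Unsubscribe,
    NoChange,
}

/// Hypothetical per-node count of local subscribers for each trip.
#[derive(Default)]
pub struct SubscriberCounts {
    counts: HashMap<String, usize>,
}

impl SubscriberCounts {
    /// Registers a local subscriber; only the first one triggers a Redis subscribe.
    pub fn connect(&mut self, trip_id: &str) -> RedisAction {
        let count = self.counts.entry(trip_id.to_string()).or_insert(0);
        *count += 1;
        if *count == 1 {
            RedisAction::Subscribe
        } else {
            RedisAction::NoChange
        }
    }

    /// Removes a local subscriber; only the last one triggers a Redis unsubscribe.
    pub fn disconnect(&mut self, trip_id: &str) -> RedisAction {
        let remaining = match self.counts.get_mut(trip_id) {
            Some(count) => {
                // Entries are removed at zero, so a stored count is always >= 1.
                *count -= 1;
                *count
            }
            // Unknown trip: nothing to unsubscribe from.
            None => return RedisAction::NoChange,
        };
        if remaining == 0 {
            self.counts.remove(trip_id);
            RedisAction::Unsubscribe
        } else {
            RedisAction::NoChange
        }
    }
}
```

Returning an action instead of calling Redis directly keeps the counting logic testable without a live connection; whether the real manager is structured this way is not stated in this record.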

Consequences

Benefits

  • No parallel data path: Subscription reuses the existing query/cache infrastructure on iOS. No diff merging, no partial state updates, no cache consistency concerns.
  • Transport-agnostic: Works identically over HTTP multipart (Apollo iOS v2 default) and WebSocket. The signal is just a UUID regardless of transport.
  • Decoupled from domain model: The subscription layer doesn't need to understand effects, modifications, or trip structure. It only knows trip IDs.
  • Simple failure semantics: Missed signals cause a delayed UI update, not data loss or inconsistency. The client catches up on next refetch.

Trade-offs

  • Extra round-trip per update: Each signal triggers a GetTripQuery fetch. For a travel app with infrequent edits (minutes between modifications), this is negligible.
  • No partial updates: Even a small change triggers a full trip refetch. Apollo's normalized cache minimizes the UI impact, but the network cost is a full query response.
  • No gap detection: If signals are lost (Redis failure, network partition), the client has no way to know it missed something until the next refetch trigger (app foreground, navigation, or next signal).

Known limitations for team collaboration (#69)

  • Missed signal recovery: For multi-user scenarios, a missed signal means another person's update is invisible until manual refetch. A transactional outbox with sequence numbers would provide at-least-once delivery.
  • Concurrent edit handling: The append-only modification log detects position conflicts via a unique constraint but doesn't retry. Advisory locks or retry logic would be needed for team edits.
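
As a rough illustration of what sequence numbers would enable on the receiving side, the sketch below tracks the last sequence seen for one trip. It assumes each signal carried a per-trip, monotonically increasing sequence number and that the client could ask the server for the latest one; the current design does neither, and `SequenceTracker`, `SignalOutcome`, and `is_behind` are hypothetical names.

```rust
/// Result of comparing an incoming sequence number with the last one seen.
#[derive(Debug, PartialEq, Eq)]
pub enum SignalOutcome {
    /// First signal, or exactly one past the last seen.
    InOrder,
    /// Already seen; at-least-once delivery may redeliver.
    Duplicate,
    /// One or more signals were skipped.
    Gap { missed: u64 },
}

/// Hypothetical per-trip tracker kept by a subscriber.
#[derive(Default)]
pub struct SequenceTracker {
    last_seen: Option<u64>,
}

impl SequenceTracker {
    /// Records an incoming sequence number and classifies it.
    pub fn observe(&mut self, seq: u64) -> SignalOutcome {
        let outcome = match self.last_seen {
            Some(last) if seq <= last => return SignalOutcome::Duplicate,
            // seq > last is guaranteed here, so the subtraction cannot underflow.
            Some(last) if seq - last > 1 => SignalOutcome::Gap {
                missed: seq - last - 1,
            },
            _ => SignalOutcome::InOrder,
        };
        self.last_seen = Some(seq);
        outcome
    }

    /// True when the server's latest sequence is ahead of what was seen,
    /// for example when checked after a reconnect.
    pub fn is_behind(&self, server_latest: u64) -> bool {
        match self.last_seen {
            Some(last) => last < server_latest,
            None => true,
        }
    }
}
```

Because the signal carries no data, a detected gap needs no replay: a single refetch brings the client up to date, and the sequence number only tells it that a refetch is due.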