
Apollo Subscriptions at Scale — System Walkthrough

This document is a guided tour of every component in the system: what it does, why it exists, and how it connects to everything else. Read it top-to-bottom for a complete mental model, or jump to any section as a reference.


Table of Contents

  1. Big Picture
  2. Infrastructure — Kafka and Redis
  3. The HTTP Callback Protocol
  4. Notifications Subgraph — Plugin Approach
  5. Orders Subgraph — Manual Redis Approach
  6. Users Subgraph — Query/Mutation Only
  7. Subscription Manager
  8. Apollo Router
  9. Nginx Load Balancer
  10. React Web App
  11. Observability Stack
  12. Horizontal Scaling: How It All Fits Together
  13. Key Design Decisions Explained
  14. Full Request Lifecycle Walkthrough
  15. Failure Modes and Recovery

1. Big Picture

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              CLIENT LAYER                                       │
│                                                                                 │
│   ┌──────────────────────────────────────────────────────────────────────────┐  │
│   │  React Web App  (localhost:3000)                                         │  │
│   │  Apollo Client 4 · HTTP multipart subscriptions · no WebSocket           │  │
│   └────────────────────────────┬─────────────────────────────────────────────┘  │
└────────────────────────────────│────────────────────────────────────────────────┘
                                 │ HTTP multipart (multipart/mixed;boundary=...)
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                            ROUTING LAYER                                        │
│                                                                                 │
│   ┌──────────────────────────────────────────────────────────────────────────┐  │
│   │  nginx Load Balancer  (localhost:4000)                                   │  │
│   │  proxy_buffering off · proxy_read_timeout 3600s · resolver 127.0.0.11   │  │
│   └───────────────────────┬─────────────────────┬────────────────────────────┘  │
│                           │ round-robin          │ round-robin                  │
│               ┌───────────▼──────────┐  ┌───────▼──────────────┐              │
│               │  Apollo Router (pod 0)│  │ Apollo Router (pod 1) │              │
│               │  router-0:4000        │  │ router-1:4000         │              │
│               │  callback URL prefix: │  │ callback URL prefix:  │              │
│               │  /router-0:4000/      │  │ /router-1:4000/       │              │
│               │   callback            │  │  callback             │              │
│               └───────────┬──────────┘  └───────┬──────────────┘              │
└───────────────────────────│─────────────────────│─────────────────────────────┘
                            │ HTTP Callback Protocol
                            ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           SUBGRAPH LAYER                                        │
│                                                                                 │
│  ┌───────────────────────────┐  ┌────────────────────────┐  ┌────────────────┐ │
│  │   Notifications (4001)    │  │    Orders (4003 ×2)     │  │  Users (4002)  │ │
│  │                           │  │                         │  │                │ │
│  │  ApolloServerPlugin       │  │  Manual HTTP Callback   │  │  Query/Mutation│ │
│  │  SubscriptionCallback     │  │  Protocol               │  │  only          │ │
│  │                           │  │                         │  │                │ │
│  │  State: in-process        │  │  State: Redis (shared)  │  │  No subscriptions│ │
│  │  Scale: single-pod only   │  │  Scale: any pod can     │  │  Stateless     │ │
│  │                           │  │  service any event      │  │                │ │
│  └─────────────┬─────────────┘  └────────────┬────────────┘  └────────────────┘ │
└────────────────│────────────────────────────│─────────────────────────────────┘
                 │ KafkaJS                     │ KafkaJS consumer group
                 ▼                             ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                          EVENT STREAMING LAYER                                  │
│                                                                                 │
│  ┌──────────────────────────────────────────────────────────────────────────┐  │
│  │  Apache Kafka 4.x (KRaft mode, no ZooKeeper)                             │  │
│  │                                                                           │  │
│  │  notification-events   (3 partitions)   ◄── notificationCreated subs    │  │
│  │  system-alerts         (1 partition)    ◄── systemAlert subs            │  │
│  │  order-status-changed  (3 partitions)   ◄── orderStatusChanged subs     │  │
│  │                        keyed by orderId (same order → same partition)   │  │
│  └──────────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────────┘
                                              │
                                              ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                          STATE MANAGEMENT LAYER                                 │
│                                                                                 │
│  ┌─────────────────────────────────────┐  ┌──────────────────────────────────┐ │
│  │  Redis 7.4                          │  │  Subscription Manager (×2)       │ │
│  │                                     │  │                                  │ │
│  │  substate:{id}   Hash  TTL:60s      │  │  Redis leader election (SET NX)  │ │
│  │    callbackUrl                      │  │  1 leader active, 1 standby      │ │
│  │    verifier                         │  │                                  │ │
│  │    orderId                          │  │  Every 20s:                      │ │
│  │    indexKey                         │  │  SCAN substate:* →               │ │
│  │    subgraph                         │  │    POST check to callbackUrl     │ │
│  │                                     │  │    200 → EXPIRE 60s              │ │
│  │  subindex:{orderId}  Set  TTL:1h    │  │    404 → DEL + SREM              │ │
│  │    {subId1, subId2, ...}            │  │                                  │ │
│  │                                     │  │  Lock TTL: 35s (> 20s interval)  │ │
│  │  sub:manager:lock    String TTL:35s │  │                                  │ │
│  └─────────────────────────────────────┘  └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
                                              │
                                              ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                         OBSERVABILITY LAYER                                     │
│                                                                                 │
│   Subgraphs + Router                                                            │
│   ──────────────── OTLP/HTTP (port 4318) ──▶  OTel Collector                  │
│                                                    │ traces           │ metrics │
│                                                    ▼                  ▼         │
│                                                 Zipkin           Prometheus     │
│                                               :9411               :9090         │
│                                                                    │             │
│                                                                    ▼             │
│                                                                 Grafana          │
│                                                                 :3001            │
└─────────────────────────────────────────────────────────────────────────────────┘

Why so many moving parts? Each layer solves a specific problem:

  • Kafka — decouples event producers from consumers; survives subgraph restarts; enables fan-out to N subscriptions per order
  • Redis — shared state so any orders pod can deliver to any Router callback URL
  • Subscription Manager — externalized heartbeat so stateless orders pods don't need timers
  • nginx — single entry point; hides Router pod count from clients; enables zero-downtime Router restarts
  • Named Router pods — each pod advertises its own callback URL so events go to exactly the right pod

2. Infrastructure — Kafka and Redis

Kafka (apache/kafka:latest, KRaft mode)

Kafka is the event backbone. Every order status change, notification, or system alert goes through Kafka — not from the mutation resolver directly to waiting subscriptions.

Why Kafka instead of direct in-memory pub/sub?

With in-memory pub/sub, the pod that handles the mutation must also hold the subscription state. If you're running 2 orders pods and a mutation hits pod A but the subscription was registered on pod B, the client never sees the event. Kafka eliminates this: any pod produces an event, and Kafka distributes it to the consumer group, which any pod can join.

Topics and partition strategy:

Topic                  Partitions  Partition key  Consumer group
notification-events    3           userId         notifications-subgraph-group
system-alerts          1           userId         notifications-subgraph-group
order-status-changed   3           orderId        orders-subgraph-group

Partitioning by orderId means all events for the same order go to the same partition, which is consumed by the same pod. This provides ordering guarantees per order while still allowing parallel consumption across different orders.

CRITICAL severity dual-publish: When a notification has severity CRITICAL, it is published to both system-alerts and notification-events. This means it appears in both the systemAlert subscription (broadcast to all users on Dashboard) and the notificationCreated subscription (filtered to specific users on Notifications page).
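The dual-publish rule reduces to a small routing decision. A sketch (the helper name is ours, not from the repo):

```typescript
// Sketch of the dual-publish rule: CRITICAL notifications fan out to both
// topics; everything else goes only to notification-events.
// (topicsForSeverity is a hypothetical helper name.)
type Severity = 'INFO' | 'WARN' | 'CRITICAL';

function topicsForSeverity(severity: Severity): string[] {
  return severity === 'CRITICAL'
    ? ['notification-events', 'system-alerts'] // dual-publish
    : ['notification-events'];
}
```

A producer would then send the same payload to each returned topic, keyed by userId as in the topic table above.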

Redis (redis:7.4-alpine)

Redis serves two purposes: subscription state storage and distributed leader election.

Subscription state (orders subgraph):

substate:{subscriptionId}  → Hash, TTL 60s
  callbackUrl              → http://router-0:4000/callback/{subscriptionId}
  verifier                 → <random token set by Router at init>
  orderId                  → ord-abc123
  indexKey                 → subindex:ord-abc123
  subgraph                 → orders
  variables                → {"orderId":"ord-abc123"}
  createdAt                → 2024-01-01T00:00:00.000Z

subindex:{orderId}         → Set, TTL 1h
  {sub-id-1, sub-id-2}     → all subscription IDs watching this orderId

The indexKey field deserves special attention. Rather than storing just the orderId and reconstructing the key, the full Redis key (subindex:ord-abc123) is stored directly. This makes the subscription manager completely domain-agnostic — it can clean up state for orders, notifications, or any future subgraph without knowing what "orderId" means.

Leader election (subscription manager):

sub:manager:lock  → String, TTL 35s
  value: manager-<hostname>-<pid>

3. The HTTP Callback Protocol

This is the mechanism Apollo Router uses to push subscription events to clients. Understanding it is central to understanding the whole system.

Standard subscription flow (WebSocket)

In a traditional setup, the client opens a WebSocket to the server. The server holds the connection open and writes frames as events arrive. This is stateful — the server process that accepted the connection must stay alive to deliver events.

HTTP Callback Protocol

Instead of a persistent connection from the subgraph to the Router, the Router registers a callback URL with the subgraph at subscription init time. The subgraph then POSTs to that URL whenever it has an event.

Step 1 — Init (Router → Subgraph):
POST /graphql
Accept: multipart/mixed; subscriptionSpec=1.0
{
  "query": "subscription { orderStatusChanged(orderId: $orderId) { ... } }",
  "extensions": {
    "subscription": {
      "callbackUrl": "http://router-0:4000/callback/sub-abc",
      "subscriptionId": "sub-abc",
      "verifier": "tok-xyz"
    }
  }
}

→ Subgraph responds: 200 {"data": null}   (acknowledges, stores state)

Step 2 — Event delivery (Subgraph → Router):
POST http://router-0:4000/callback/sub-abc
subscription-protocol: callback/1.0
{
  "kind": "subscription",
  "action": "next",
  "id": "sub-abc",
  "verifier": "tok-xyz",
  "payload": { "data": { "orderStatusChanged": { ... } } }
}

→ Router responds: 200   (pushes multipart chunk to client)
→ Router responds: 404   (subscription was dropped — clean up)

Step 3 — Heartbeat check (Subgraph → Router):
POST http://router-0:4000/callback/sub-abc
{
  "kind": "subscription",
  "action": "check",
  "id": "sub-abc",
  "verifier": "tok-xyz"
}

→ Router responds: 204   (still alive, renew TTL)
→ Router responds: 404   (subscription dropped — clean up)

Step 4 — Completion (Subgraph → Router):
POST http://router-0:4000/callback/sub-abc
{
  "kind": "subscription",
  "action": "complete",
  "id": "sub-abc",
  "verifier": "tok-xyz"
}

The verifier field is a random token that the Router sets at init and the subgraph echoes back on every request. This prevents spoofed events from external systems that might learn the callback URL.
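All four message kinds share the same envelope, which a tiny builder makes explicit. A sketch (the helper names are ours; the field names come from the protocol messages above):

```typescript
// Sketch: envelope builders for the callback-protocol messages shown above.
// (envelope/nextMessage are hypothetical helper names.)
const envelope = (action: 'next' | 'check' | 'complete', id: string, verifier: string) =>
  ({ kind: 'subscription' as const, action, id, verifier });

// "next" adds the GraphQL execution result under payload.data.
const nextMessage = (id: string, verifier: string, data: unknown) =>
  ({ ...envelope('next', id, verifier), payload: { data } });
```

Every message echoes the verifier, so a POST without the Router-issued token can be rejected outright.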

Why is the callback URL pod-specific?

The callback URL must point to exactly the Router pod that accepted the client's subscription. When the client opens a streaming HTTP response, it's physically connected to one pod. If another pod receives the callback POST, it would have no open HTTP response to write to. With nginx round-robining to router-0 and router-1, the subgraph must store which pod registered the subscription and always POST to that same pod.

This is why router-0 and router-1 are named services rather than using --scale router=2 — each pod needs a unique, stable hostname to put in its CALLBACK_PUBLIC_URL.


4. Notifications Subgraph — Plugin Approach

Entry point: subgraphs/notifications/src/index.ts

This subgraph uses ApolloServerPluginSubscriptionCallback — a plugin included with Apollo Server that implements the HTTP Callback Protocol automatically. You don't write any middleware; you just use regular GraphQL AsyncIterator-based subscription resolvers.

How it works internally

Subscription request arrives
         ↓
ApolloServerPluginSubscriptionCallback intercepts
  - Stores subscription state in-process (Map<subscriptionId, ...>)
  - Returns {"data": null} to Router
  - Starts heartbeat loop (internal to plugin)
         ↓
Kafka event arrives (via pubsub.ts)
  - emitter.emit('NOTIFICATION_CREATED', payload)
         ↓
asyncIterator yields the payload
  - Subscription resolver's AsyncIterator returns the next value
         ↓
Plugin POSTs to callbackUrl with action:"next"

The pubsub bridge (subgraphs/notifications/src/pubsub.ts):

The bridge connects Kafka (which delivers one message at a time to a callback function) to GraphQL subscriptions (which need an AsyncIterableIterator). It uses a pull/push queue pattern:

  • Push queue: Kafka message arrives, but no subscription is await-ing yet → buffer the event in pushQueue[]
  • Pull queue: Subscription calls .next() but no Kafka message has arrived yet → store the resolve callback in pullQueue[]
  • When both sides are active, resolve immediately
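A minimal version of that pattern looks like this — a sketch, not the repo's actual pubsub.ts (it assumes values are never `undefined`):

```typescript
// Sketch of the pull/push queue bridging a callback-style source (Kafka)
// to an AsyncIterableIterator (GraphQL subscriptions).
type Resolver<T> = (result: IteratorResult<T>) => void;

function createAsyncQueue<T>() {
  const pushQueue: T[] = [];        // events that arrived before anyone awaited
  const pullQueue: Resolver<T>[] = []; // consumers waiting for the next event

  return {
    push(value: T) {
      const resolve = pullQueue.shift();
      if (resolve) resolve({ value, done: false }); // a consumer is waiting → hand over now
      else pushQueue.push(value);                   // nobody waiting → buffer
    },
    iterator(): AsyncIterableIterator<T> {
      return {
        next(): Promise<IteratorResult<T>> {
          const value = pushQueue.shift();
          if (value !== undefined) return Promise.resolve({ value, done: false });
          return new Promise((resolve) => pullQueue.push(resolve)); // park until push()
        },
        return(): Promise<IteratorResult<T>> {
          return Promise.resolve({ value: undefined, done: true });
        },
        [Symbol.asyncIterator]() { return this; },
      };
    },
  };
}
```

Whichever side arrives first parks in its queue; the other side drains it immediately.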

Why this subgraph cannot be horizontally scaled:

The plugin stores subscription state in-process memory. If you run 2 replicas, the Router might init a subscription on pod A (stored in pod A's memory), but a Kafka event might be consumed by pod B (which has no record of that subscription). Events would be silently dropped for subscriptions registered on the other pod.

When to use this approach

  • Simpler to implement (less custom code)
  • Fine for low-to-moderate traffic on a single pod
  • Good starting point for a subscription that doesn't need HA

5. Orders Subgraph — Manual Redis Approach

Entry point: subgraphs/orders/src/index.ts

This subgraph implements the HTTP Callback Protocol entirely by hand. There is no plugin. Instead, Express middleware intercepts subscription init requests before Apollo Server ever sees them.

Subscription init middleware

// In index.ts — simplified
app.post('/graphql', async (req, res, next) => {
  const isSubscriptionInit = req.headers.accept?.includes('subscriptionSpec=1.0');
  if (!isSubscriptionInit) return next(); // pass queries/mutations to Apollo Server

  // Extract orderId from variables (NOT inline literal — middleware reads body.variables)
  const orderId = req.body.variables?.orderId;
  const { callbackUrl, subscriptionId, verifier } = req.body.extensions.subscription;

  // Store in Redis with 60s TTL
  await storeSubscription(subscriptionId, {
    callbackUrl,
    verifier,
    orderId,
    indexKey: `subindex:${orderId}`,
    subgraph: 'orders',
  });

  // Acknowledge to Router
  res.json({ data: null });
});

The orderId must come from body.variables, not an inline literal in the query string. The middleware parses body.variables.orderId — if the client writes orderStatusChanged(orderId: "abc") as a literal, the middleware can't extract it and the subscription silently fails.
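Concretely, here is what that lookup can and cannot see (the sample payloads are ours):

```typescript
// The middleware's lookup, as in the snippet above: it only reads body.variables.
const extractOrderId = (body: { variables?: Record<string, string> }) =>
  body.variables?.orderId;

// Works — orderId travels in variables:
const good = {
  query: 'subscription($orderId: ID!) { orderStatusChanged(orderId: $orderId) { id } }',
  variables: { orderId: 'ord-abc123' },
};

// Silently fails — orderId is an inline literal, so variables is empty:
const bad = {
  query: 'subscription { orderStatusChanged(orderId: "ord-abc123") { id } }',
  variables: {} as Record<string, string>,
};
```

`extractOrderId(good)` yields `'ord-abc123'`; `extractOrderId(bad)` yields `undefined`, and the subscription silently fails.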

Event delivery loop

When a Kafka message arrives on order-status-changed:

// In index.ts — simplified
consumer.run({
  eachMessage: async ({ message }) => {
    const event = JSON.parse(message.value.toString());
    const { orderId } = event;

    // Find all subscriptions watching this orderId
    const subIds = await redis.sMembers(`subindex:${orderId}`);

    for (const subId of subIds) {
      const sub = await redis.hGetAll(`substate:${subId}`);

      // POST to the specific Router pod that owns this subscription
      const res = await fetch(sub.callbackUrl, {
        method: 'POST',
        headers: { 'subscription-protocol': 'callback/1.0' },
        body: JSON.stringify({
          kind: 'subscription', action: 'next',
          id: subId, verifier: sub.verifier,
          payload: { data: { orderStatusChanged: event } }
        })
      });

      if (res.status === 404) {
        // Router dropped the subscription — clean up immediately
        await deleteSubscription(subId, `subindex:${orderId}`);
        continue; // skip the terminal-state check for a dead subscription
      }

      if (event.newStatus === 'DELIVERED' || event.newStatus === 'CANCELLED') {
        // Terminal state — send complete action and clean up
        await fetch(sub.callbackUrl, {
          method: 'POST',
          headers: { 'subscription-protocol': 'callback/1.0' },
          body: JSON.stringify({
            kind: 'subscription', action: 'complete',
            id: subId, verifier: sub.verifier
          })
        });
        await deleteSubscription(subId, `subindex:${orderId}`);
      }
    }
  }
});

Auto-progression

When placeOrder is called, a fire-and-forget timer automatically advances the order through CONFIRMED → PREPARING → SHIPPED → DELIVERED every 10 seconds. Each step publishes a Kafka event which the consumer loop delivers to active subscriptions. This makes the demo self-animating without needing manual updateOrderStatus calls.

The progression logic skips steps if the order was manually advanced past them (e.g., if you manually set SHIPPED, the auto-progression won't go back to CONFIRMED).
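The skip rule reduces to slicing the status pipeline at the order's current position. A sketch (PIPELINE and remainingSteps are our names, not from the repo):

```typescript
// Sketch of the skip rule: auto-progression only ever moves forward from
// wherever the order currently is — it never revisits earlier statuses.
const PIPELINE = ['PLACED', 'CONFIRMED', 'PREPARING', 'SHIPPED', 'DELIVERED'] as const;
type Status = (typeof PIPELINE)[number];

function remainingSteps(current: Status): Status[] {
  return PIPELINE.slice(PIPELINE.indexOf(current) + 1) as Status[];
}
```

`remainingSteps('SHIPPED')` is just `['DELIVERED']` — a manual jump to SHIPPED means the timer never goes back through CONFIRMED or PREPARING.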

Why this approach scales

Because all state lives in Redis (not in-process), any pod can service any delivery:

  • Pod A publishes the Kafka event
  • Pod B consumes it, reads subindex:ord-123 from Redis, fetches callbackUrl, delivers it

Neither pod needs to know which pod registered the subscription.


6. Users Subgraph — Query/Mutation Only

Entry point: subgraphs/users/src/index.ts

The Users subgraph exists to demonstrate federation — specifically the @key directive for entity resolution. It provides User as a federated type that the Notifications subgraph can reference via @requires.

It has no subscriptions, no Kafka consumers, and no Redis state. It's a plain Apollo Server instance with an in-memory user store.


7. Subscription Manager

Entry point: subscription-manager/src/index.ts

The subscription manager is a standalone Node.js process — no Apollo Server, no HTTP server, no web framework. It does exactly one thing: send heartbeat check requests to the Router for every active subscription in Redis.

Why it's a separate service

The orders subgraph is stateless: it stores subscription state in Redis, consumes Kafka events, and delivers them. It has no internal timer loop for heartbeats. This separation of concerns is intentional:

  • The orders subgraph can restart or scale without worrying about heartbeat timers
  • The manager can be scaled and HA'd independently
  • The manager works for all subgraphs (notifications, orders, any future ones)

Leader election

Two instances run simultaneously. Only one sends heartbeats. They race using Redis's atomic SET NX EX command:

SET sub:manager:lock "manager-<hostname>-<pid>" NX EX 35
  • NX — only set if the key doesn't exist (atomic set-if-absent)
  • EX 35 — expires after 35 seconds

The leader renews its lock every 20 seconds using a Lua script:

if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("EXPIRE", KEYS[1], ARGV[2])
else
  return 0
end

The Lua script runs atomically — there's no race between checking the current value and setting the TTL. This prevents a scenario where a slow pod renews a lock it actually lost.
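The semantics can be modeled in memory to see why compare-and-renew matters. A sketch (the real code runs SET NX EX and the Lua script against Redis; all names below are ours):

```typescript
// In-memory model of SET NX EX plus compare-and-renew — not the node-redis calls.
interface Lock { value: string; expiresAt: number }

function tryAcquire(store: Map<string, Lock>, key: string, id: string, ttlMs: number, now: number): boolean {
  const held = store.get(key);
  if (held && held.expiresAt > now) return false; // NX: key still exists → acquisition fails
  store.set(key, { value: id, expiresAt: now + ttlMs });
  return true;
}

function renew(store: Map<string, Lock>, key: string, id: string, ttlMs: number, now: number): boolean {
  const held = store.get(key);
  // Like the Lua script: only the current holder may extend the TTL.
  if (!held || held.expiresAt <= now || held.value !== id) return false;
  held.expiresAt = now + ttlMs;
  return true;
}
```

A slow pod that lost the lock gets `false` back from renew instead of silently extending a lock that now belongs to the standby.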

Timing guarantees:

  • Leader renews at 20s → lock TTL 35s → 15s safety margin before expiry
  • If leader crashes, standby acquires lock within 35s
  • Router expects a heartbeat within 30s (heartbeat_interval in router.yaml)
  • Manager heartbeats every 20s → 10s margin before Router drops the subscription

Generic across all subgraphs

The ManagedSubState interface only reads callbackUrl, verifier, indexKey, and subgraph. The indexKey field is the full Redis set key stored by the subgraph at init time (e.g., subindex:ord-123). The manager never needs to know what orderId means — it just reads the key and calls SREM indexKey {subId} on cleanup.
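The sweep's per-subscription decision is simple enough to isolate. A sketch (afterCheck is our name; the doc's state-layer diagram shows 200 and the protocol walkthrough shows 204, so both count as "alive" here):

```typescript
// Sketch: what the manager does with each heartbeat-check response.
type CheckOutcome = 'renew-ttl' | 'cleanup' | 'retry-next-sweep';

function afterCheck(status: number): CheckOutcome {
  if (status === 200 || status === 204) return 'renew-ttl'; // alive → EXPIRE substate 60s
  if (status === 404) return 'cleanup';                     // dropped → DEL + SREM indexKey
  return 'retry-next-sweep';                                // transient error → leave state alone
}
```

Anything other than a definitive 404 leaves the Redis state in place for the next 20-second sweep.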


8. Apollo Router

Config: router/router.yaml

The Router is the single entry point for all GraphQL operations. It composes the supergraph schema from the three subgraphs and routes operations to the appropriate services.

Critical configuration: public_url

subscription:
  mode:
    callback:
      public_url: "${env.CALLBACK_PUBLIC_URL:-http://router:4000/callback}"

This URL must include the /callback path prefix. The Router constructs per-subscription URLs as {public_url}/{subscriptionId}. If /callback is omitted, the subgraph stores http://router:4000/sub-abc instead of http://router:4000/callback/sub-abc, the callback POST hits the wrong path, and events are silently dropped.
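A sketch of that construction (the helper name is ours) shows why the missing prefix is fatal:

```typescript
// The Router builds per-subscription URLs as {public_url}/{subscriptionId}.
// (callbackUrlFor is a hypothetical helper mirroring that rule.)
function callbackUrlFor(publicUrl: string, subscriptionId: string): string {
  return `${publicUrl.replace(/\/+$/, '')}/${subscriptionId}`;
}
```

With public_url set to http://router-0:4000/callback the subgraph stores .../callback/sub-abc; drop the prefix and every delivery POST hits /sub-abc instead, which nothing serves.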

APQ (Automatic Persisted Queries)

apq:
  router:
    cache:
      redis:
        urls: ["redis://redis:6379"]

APQ caches query/subscription documents in Redis by their hash. On subsequent requests, the client sends only the hash instead of the full query string. This reduces payload size for frequent operations like subscription registrations. Requires the same GraphOS Enterprise license as federated subscriptions.

Telemetry

The Router exports traces and metrics via OTLP to the collector:

telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: "http://otel-collector:4318"
    metrics:
      otlp:
        enabled: true
        endpoint: "http://otel-collector:4318"
  instrumentation:
    instruments:
      router:
        http.server.request.duration:
          attributes:
            graphql.operation.name:
              operation_name: string

Traces appear in Zipkin tagged by service name (router-0 or router-1), letting you trace a subscription init through Router → Subgraph → Kafka → delivery → callback.

experimental_diagnostics

experimental_diagnostics:
  enabled: true
  listen: 0.0.0.0:8089

An internal-only HTTP endpoint (not exposed outside the Docker network) that exposes Router debug information — useful for diagnosing why subscriptions aren't receiving events or why heartbeats are failing.


9. Nginx Load Balancer

Config: nginx/nginx.conf

Nginx is the only service exposed on port 4000 to the host. Clients connect here; nginx forwards to router-0 or router-1.

Why these settings matter

proxy_buffering    off;
proxy_cache        off;

Apollo Router uses HTTP multipart (chunked transfer encoding) for subscription streaming. If nginx buffers the response, it holds all chunks and flushes them at once — the client would see nothing until the subscription ends (or times out). proxy_buffering off makes each chunk flush immediately to the client.

proxy_read_timeout  3600s;
proxy_send_timeout  3600s;

Subscriptions are long-lived connections. Without extended timeouts, nginx would terminate the connection after the default 60 seconds.

resolver 127.0.0.11 valid=5s ipv6=off;

Docker's internal DNS resolver. Without this, nginx resolves router-0 and router-1 once at startup and caches the IPs indefinitely. If a Router pod restarts with a new IP (which happens when you docker compose restart), nginx continues sending to the old (now dead) IP and returns 502. valid=5s forces re-resolution every 5 seconds.

Callback bypass

Nginx only handles client traffic. The subgraph callback POSTs go directly from the subgraph to the specific Router pod (router-0:4000 or router-1:4000), bypassing nginx entirely. This is intentional — callbacks must reach the exact pod that owns the client connection.


10. React Web App

Entry point: web-app/src/App.tsx

A Vite + React 18 + Apollo Client 4 application demonstrating all three subscription types.

HTTP multipart (no WebSocket)

Apollo Client 4 uses HTTP multipart for subscriptions by default when configured with HttpLink. There are no WebSockets in this architecture — subscriptions are long-lived HTTP responses with Content-Type: multipart/mixed. You can verify this in DevTools Network tab: look for a request to /graphql that stays open and returns chunked data. No ws:// connections.

Three pages

Dashboard — Subscribes to systemAlert. Shows any CRITICAL-severity notification sent via the Notifications subgraph. Demonstrates broadcast subscriptions (all connected clients see the same events).

Notifications — Shows notificationCreated filtered by userId. Includes 4 preset quick-send buttons (Order Shipped/INFO, Payment Failed/WARN, Security Alert/CRITICAL, Flash Sale/INFO) plus a custom form. The CRITICAL button also appears on Dashboard (dual-publish).

Orders — Full end-to-end pipeline:

  1. placeOrder mutation → returns an orderId
  2. orderStatusChanged subscription using that orderId
  3. Auto-progression fires every 10s (no action needed)
  4. UI shows a status stepper updating in real time
  5. Optional: updateOrderStatus to manually skip ahead

State preservation

Pages use CSS display: none/block rather than conditional rendering (&&). This keeps components mounted across tab switches, preserving the orderId from placeOrder and the subscription state without re-initiating.


11. Observability Stack

Configs: observability/

Four services added in Phase 8, inspired by the Apollo Solutions reference architecture.

OTel Collector (otel/opentelemetry-collector-contrib)

The collector is the central aggregation point. Subgraphs and the Router send OTLP (OpenTelemetry Protocol) data to it. The collector batches and forwards:

  • Traces → Zipkin (distributed request tracing)
  • Metrics → Prometheus scrape endpoint on :9091

Subgraph / Router
    │
    │  OTLP/HTTP  POST /v1/traces   (port 4318)
    │  OTLP/HTTP  POST /v1/metrics  (port 4318)
    ▼
OTel Collector
    │
    ├──▶ Zipkin:9411/api/v2/spans   (traces)
    └──▶ :9091 (Prometheus scrape)  (metrics)

How tracing works in the subgraphs

Each subgraph loads tracing.ts via Node's --import flag before any application code:

CMD ["node", "--import", "./dist/tracing.js", "dist/index.js"]

--import (not --require) is used because both subgraphs use "type": "module" (ESM). --require only works for CommonJS modules. The --import flag instructs Node to load and execute the module before the main entry point, ensuring OTel hooks are in place before Express, KafkaJS, and Redis clients are imported.

The getNodeAutoInstrumentations() call automatically instruments HTTP requests, Kafka producer/consumer, Redis operations, and DNS — without any manual span creation in application code.

Zipkin (openzipkin/zipkin)

Visual trace explorer at http://localhost:9411. Filter by service name (notifications, orders, router-0, router-1) to see distributed traces for a complete subscription lifecycle.

Prometheus (prom/prometheus)

Metrics store at http://localhost:9090. Scrapes otel-collector:9091 every 15 seconds. The scrape config drops high-cardinality _bucket metrics to keep storage lean.

Grafana (grafana/grafana)

Dashboard UI at http://localhost:3001 (admin/admin). The Prometheus datasource is auto-provisioned via grafana-datasources.yaml — no manual setup needed.


12. Horizontal Scaling: How It All Fits Together

The scaled stack (docker-compose.scale.yml) runs:

nginx (port 4000)
  ├── router-0 (CALLBACK_PUBLIC_URL=http://router-0:4000/callback)
  └── router-1 (CALLBACK_PUBLIC_URL=http://router-1:4000/callback)

orders (2 replicas)
  ├── apollo-subs-at-scale-orders-2
  └── apollo-subs-at-scale-orders-3

subscription-manager (2 replicas)
  ├── apollo-subs-at-scale-subscription-manager-1  ← leader (holds Redis lock)
  └── apollo-subs-at-scale-subscription-manager-2  ← standby

Cross-pod event delivery (the key scenario)

Client ──HTTP──▶ nginx ──round-robin──▶ router-0
                                            │ subscription init POST
                                            ▼
                                      orders-pod-2
                                      storeSubscription → Redis
                                      callbackUrl = http://router-0:4000/callback/sub-abc

Later:
  mutation placeOrder (hits orders-pod-3 via nginx → router)
       ↓
  orders-pod-3 publishes to Kafka (topic: order-status-changed, key: ord-123)
       ↓
  Kafka delivers to orders-subgraph-group consumer
  (could be pod-2 OR pod-3 — Kafka decides based on partition assignment)
       ↓
  whichever pod consumes it: SMEMBERS subindex:ord-123 → [sub-abc]
                              HGETALL substate:sub-abc → callbackUrl=http://router-0:...
                              POST http://router-0:4000/callback/sub-abc
       ↓
  router-0 pushes multipart chunk to client  ✓

The critical invariant: the Redis lookup always returns the exact Router pod URL. The delivering pod never needs to know which pod accepted the client connection — Redis tells it.


13. Key Design Decisions Explained

Why indexKey is stored in the hash itself

The subindex:{orderId} set maps an orderId to all subscriptionIds watching it. When cleaning up a subscription (on 404 or terminal state), you need to SREM the subscriptionId from this set. But to know which set to remove it from, you need the orderId.

Storing indexKey directly in the substate hash (as the full Redis key string, not just the orderId) makes the subscription-manager completely domain-agnostic. It never needs to know what an orderId is, or how to reconstruct the key. It just reads indexKey and calls SREM indexKey {subId}.

Why deleteSubscription must read indexKey BEFORE calling redis.del

This was a real bug that was fixed:

// WRONG — reads indexKey after deleting the hash (returns null)
await redis.del(`substate:${subId}`);
const indexKey = await redis.hGet(`substate:${subId}`, 'indexKey'); // null!

// CORRECT — read indexKey before deleting
const indexKey = await redis.hGet(`substate:${subId}`, 'indexKey');
await redis.del(`substate:${subId}`);
await redis.sRem(indexKey, subId);
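The fix is easy to verify in isolation. Below is a runnable sketch using a Map-based stand-in for the Redis client; `FakeRedis` and its method shapes are assumptions modeled on node-redis naming:

```typescript
// Map-based stand-in exposing just the commands deleteSubscription uses.
class FakeRedis {
  hashes = new Map<string, Record<string, string>>();
  sets = new Map<string, Set<string>>();
  async hGet(key: string, field: string): Promise<string | null> {
    return this.hashes.get(key)?.[field] ?? null;
  }
  async del(key: string): Promise<void> {
    this.hashes.delete(key);
  }
  async sRem(key: string, member: string): Promise<void> {
    this.sets.get(key)?.delete(member);
  }
}

// Correct order: read indexKey, then delete the hash, then prune the index.
async function deleteSubscription(redis: FakeRedis, subId: string): Promise<void> {
  const indexKey = await redis.hGet(`substate:${subId}`, 'indexKey');
  await redis.del(`substate:${subId}`);
  if (indexKey) await redis.sRem(indexKey, subId);
}
```

Swapping the first two lines of `deleteSubscription` reproduces the bug: `hGet` returns null and the `subindex` set keeps a dangling subscriptionId forever.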

Why named Router pods instead of --scale router=2

--scale router=2 creates two replicas of a single service definition, and replicas cannot be given different CALLBACK_PUBLIC_URL values: every replica shares the same environment block. Named services (router-0, router-1) each get their own environment section with a unique callback URL.
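A hedged docker-compose sketch of the named-service pattern; the image tag and port are assumptions, and only the per-service environment blocks are the point:

```yaml
# Two named Router services, each with its own callback URL.
# A --scale'd service could not do this: all replicas would share
# one environment block.
services:
  router-0:
    image: ghcr.io/apollographql/router:latest
    environment:
      CALLBACK_PUBLIC_URL: http://router-0:4000/callback
  router-1:
    image: ghcr.io/apollographql/router:latest
    environment:
      CALLBACK_PUBLIC_URL: http://router-1:4000/callback
```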

Why --import not --require for OTel

--require only works for CommonJS modules. Both subgraphs use "type": "module" in package.json (ESM). --import is the ESM-compatible equivalent introduced in Node.js 18+.
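As a sketch, the start script for an ESM subgraph might look like this (`instrumentation.mjs` and `dist/server.js` are hypothetical paths, not the repo's actual files):

```json
{
  "type": "module",
  "scripts": {
    "start": "node --import ./instrumentation.mjs dist/server.js"
  }
}
```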

Why bookworm-slim not Alpine

Some OTel auto-instrumentation modules use native add-ons that require glibc. Alpine uses musl libc (a different implementation). The native modules either fail to install or crash at runtime on Alpine. bookworm-slim (Debian) provides glibc.

At-most-once Kafka delivery (no retry on 5xx)

The Kafka consumer does not retry deliveries that fail with a 5xx. Retrying the whole message would re-run the fan-out and re-POST the event to subscribers that had already received it. One missed delivery is preferable to N duplicate deliveries; true at-least-once semantics would require per-subscription retry state.
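The at-most-once stance reduces to one rule in the delivery path: attempt the POST exactly once and drop on server error. A minimal sketch with an injected `post` function (`deliverOnce` is an illustrative name, not the repo's):

```typescript
// Deliver one callback POST with no retry: a 5xx is logged and dropped
// rather than re-attempted, so a transient Router error can never cause
// the same event to reach a subscriber twice.
async function deliverOnce(
  url: string,
  body: unknown,
  post: (url: string, body: unknown) => Promise<{ status: number }>,
): Promise<'delivered' | 'dropped'> {
  const res = await post(url, body);
  if (res.status >= 500) {
    console.warn(`callback ${url} returned ${res.status}; dropping (at-most-once)`);
    return 'dropped';
  }
  return 'delivered';
}
```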


14. Full Request Lifecycle Walkthrough

Let's trace a complete order subscription from client click to event delivery.

Step 1: Place an order

Browser → POST http://localhost:4000/graphql
mutation { placeOrder(customerId: "cust-1", total: 99.0) { id status } }

nginx → router-0 (round-robin)
router-0 → orders:4003/graphql (resolvers.ts placeOrder)
  - generates UUID: "ord-abc"
  - stores in in-memory array
  - fires scheduleAutoProgression("ord-abc") [async, not awaited]
  - returns { id: "ord-abc", status: "PLACED" }

Step 2: Open the subscription

Browser → GET http://localhost:4000/graphql
Accept: multipart/mixed; subscriptionSpec=1.0
{
  "query": "subscription($orderId: ID!) { orderStatusChanged(orderId: $orderId) { ... } }",
  "variables": { "orderId": "ord-abc" }
}

nginx → router-0 (same connection stays open)
router-0 creates subscription:
  - generates subscriptionId: "sub-xyz"
  - generates verifier: "ver-123"
  - POSTs init to orders:4003/graphql (middleware intercepts):
    { "extensions": { "subscription": {
        "callbackUrl": "http://router-0:4000/callback/sub-xyz",
        "subscriptionId": "sub-xyz",
        "verifier": "ver-123"
    }}}

orders-pod-2 receives init:
  - extracts orderId from body.variables.orderId = "ord-abc"
  - storeSubscription("sub-xyz", {
      callbackUrl: "http://router-0:4000/callback/sub-xyz",
      verifier: "ver-123",
      orderId: "ord-abc",
      indexKey: "subindex:ord-abc",
      subgraph: "orders"
    })
  - responds: { "data": null }

router-0 keeps the multipart HTTP response open, waiting for callbacks

Step 3: Auto-progression fires (10 seconds later)

scheduleAutoProgression fires for "ord-abc":
  - updates store: status = "CONFIRMED"
  - publishes to Kafka:
    topic: order-status-changed, key: "ord-abc"
    value: { orderId: "ord-abc", previousStatus: "PLACED", newStatus: "CONFIRMED", ... }

Kafka assigns to partition (hash("ord-abc") % 3) = partition 1
orders-subgraph-group consumer on orders-pod-3 holds partition 1

orders-pod-3 receives Kafka message:
  - SMEMBERS subindex:ord-abc → ["sub-xyz"]
  - HGETALL substate:sub-xyz → { callbackUrl: "http://router-0:4000/callback/sub-xyz", verifier: "ver-123", ... }
  - POST http://router-0:4000/callback/sub-xyz
    { "kind": "subscription", "action": "next", "id": "sub-xyz", "verifier": "ver-123",
      "payload": { "data": { "orderStatusChanged": { orderId: "ord-abc", newStatus: "CONFIRMED", ... } } } }

router-0 receives callback:
  - verifies subscriptionId + verifier
  - writes multipart chunk to open HTTP response:
    --graphql
    Content-Type: application/json
    { "data": { "orderStatusChanged": { orderId: "ord-abc", newStatus: "CONFIRMED" } } }
    --graphql

Browser receives chunk → Apollo Client parses → React re-renders with "CONFIRMED"
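Step 3's partition assignment relies on one property only: a fixed key always maps to the same partition, so every event for a given order is consumed, in order, by a single pod. The sketch below illustrates that property with a simple stand-in hash; Kafka's default partitioner actually uses murmur2, so the partition numbers here are not the ones Kafka would compute:

```typescript
// Stand-in hash (NOT Kafka's murmur2). Only the key→partition stickiness
// property matters for the design: same key, same partition, every time.
function partitionFor(key: string, numPartitions: number): number {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % numPartitions;
}
```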

Step 4: Terminal state (DELIVERED)

Auto-progression fires: newStatus = "DELIVERED"
(same delivery path as Step 3)

orders-pod delivers event, then checks terminal state:
  - status === "DELIVERED" → send complete action:
    POST http://router-0:4000/callback/sub-xyz
    { "kind": "subscription", "action": "complete", ... }
  - deleteSubscription("sub-xyz", "subindex:ord-abc")
    → DEL substate:sub-xyz
    → SREM subindex:ord-abc sub-xyz

router-0 receives complete action:
  - writes final multipart boundary:
    --graphql--
  - closes the HTTP response

Browser: Apollo Client receives stream end → onComplete callback fires
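On the wire, Steps 3 and 4 arrive as multipart parts separated by --graphql lines and terminated by --graphql--. Apollo Client parses this internally; the parser below is purely illustrative of the mechanics (names and the simplified header handling are assumptions):

```typescript
// Parse a multipart/mixed buffer into JSON payloads and detect stream end.
// Each part is: boundary line, headers, blank line, JSON body.
function parseMultipart(
  buffer: string,
  boundary = 'graphql',
): { parts: unknown[]; done: boolean } {
  const done = buffer.includes(`--${boundary}--`); // closing delimiter
  const parts: unknown[] = [];
  for (const raw of buffer.split(`--${boundary}`)) {
    // Drop headers (everything before the first blank line), keep the body.
    const body = raw.split(/\r?\n\r?\n/).slice(1).join('\n\n').trim();
    if (body.startsWith('{')) parts.push(JSON.parse(body));
  }
  return { parts, done };
}
```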

Step 5: Heartbeat (every 20 seconds, concurrently)

subscription-manager-1 (leader):
  - SCAN substate:* → ["sub-xyz"] (while subscription is active)
  - HGETALL substate:sub-xyz → { callbackUrl, verifier, ... }
  - POST http://router-0:4000/callback/sub-xyz
    { "kind": "subscription", "action": "check", "id": "sub-xyz", "verifier": "ver-123" }
  - router-0 responds: 204
  - EXPIRE substate:sub-xyz 60s  (renew TTL)
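A single heartbeat pass can be sketched like this; `heartbeatOne` and the injected `post`/`redis` shapes are illustrative, and the 404 branch matches the cleanup behavior described in section 15:

```typescript
type CheckResult = 'alive' | 'gone' | 'error';

// One heartbeat pass for one subscription: POST a `check` action,
// renew the Redis TTL on success, clean up Redis state on 404.
async function heartbeatOne(
  sub: { id: string; callbackUrl: string; verifier: string; indexKey: string },
  post: (url: string, body: unknown) => Promise<{ status: number }>,
  redis: {
    expire(key: string, ttlSeconds: number): Promise<void>;
    del(key: string): Promise<void>;
    sRem(key: string, member: string): Promise<void>;
  },
): Promise<CheckResult> {
  const res = await post(sub.callbackUrl, {
    kind: 'subscription',
    action: 'check',
    id: sub.id,
    verifier: sub.verifier,
  });
  if (res.status === 404) {
    // Router no longer knows this subscription: remove the Redis state.
    await redis.del(`substate:${sub.id}`);
    await redis.sRem(sub.indexKey, sub.id);
    return 'gone';
  }
  if (res.status >= 400) return 'error'; // transient; leave state for next pass
  await redis.expire(`substate:${sub.id}`, 60); // renew the 60s TTL
  return 'alive';
}
```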

15. Failure Modes and Recovery

Router pod crashes

Scenario: router-0 crashes while client has an active subscription

What happens:
  - Client loses the HTTP connection
  - subscription-manager heartbeats router-0:4000/callback/sub-xyz → TCP error / timeout
  - subscription-manager logs: "Heartbeat network error for sub sub-xyz"
  - (within 60s) substate:sub-xyz TTL expires → Redis cleans up automatically

Client recovery:
  - Apollo Client detects connection drop
  - Reconnects via nginx → router-1 (or new router-0)
  - Router-1 creates a new subscription → new callback URL stored in Redis

Orders pod crashes

Scenario: orders-pod-2 crashes mid-delivery

What happens:
  - Kafka consumer group rebalances
  - orders-pod-3 takes over all 3 partitions
  - Unacknowledged messages are redelivered to pod-3
  - Delivery continues (Redis state is intact)

Note: if pod-2 crashed after consuming a message but before POSTing it to
the Router, that event is redelivered on rebalance. This does not violate
the at-most-once guarantee, because the first POST never happened.

Subscription-manager leader crashes

Scenario: subscription-manager-1 (leader) crashes

What happens:
  - sub:manager:lock TTL expires after 35s
  - subscription-manager-2 (standby) acquires lock: SET sub:manager:lock "mgr-2" NX EX 35
  - Standby logs: "🔒 Leader acquired: manager-<hostname>-<pid>"
  - Heartbeats resume from standby

If heartbeats pause for > 30s (Router heartbeat_interval):
  - Router drops the subscription → 404 on next heartbeat
  - subscription-manager-2 receives 404 → cleans up Redis
  - Client must re-subscribe
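The lock acquisition above amounts to a compare-and-set with expiry. Below is a minimal in-memory model; `FakeLock` is illustrative, and the real code issues SET sub:manager:lock <id> NX EX 35 against Redis, with expiry handled by Redis rather than the caller:

```typescript
// Minimal SET NX EX stand-in: the lock holder is whoever wrote the key
// first; everyone else fails until the TTL lapses (expiry is simulated
// by the caller here, whereas Redis expires the key itself).
class FakeLock {
  private holder: string | null = null;
  setNxEx(value: string): boolean {
    if (this.holder !== null) return false; // NX: key exists, acquisition fails
    this.holder = value;
    return true;
  }
  expire(): void {
    this.holder = null; // what Redis does when the 35s TTL lapses
  }
  get current(): string | null {
    return this.holder;
  }
}

function tryAcquireLeadership(lock: FakeLock, managerId: string): boolean {
  // Equivalent of: SET sub:manager:lock <managerId> NX EX 35
  return lock.setNxEx(managerId);
}
```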

Subgraph crashes (Redis state survives)

Scenario: all orders pods crash and restart

What happens:
  - Kafka consumer group has no members → no rebalance, no consumption
  - subscription-manager heartbeats continue → the Router keeps client subscriptions open
  - Orders pods restart → Kafka consumer group rebalances → consumption resumes
  - Any in-flight events (produced but not consumed) are delivered after restart

Redis state (substate:*) is unaffected by subgraph restarts.

Quick Reference: Port Map

| Port | Service | Protocol |
|------|---------|----------|
| 3000 | React web app | HTTP |
| 3001 | Grafana | HTTP |
| 4000 | nginx → Apollo Router (client-facing) | HTTP multipart |
| 4001 | Notifications subgraph | HTTP / GraphQL |
| 4002 | Users subgraph | HTTP / GraphQL |
| 4003 | Orders subgraph | HTTP / GraphQL |
| 4317 | OTel Collector gRPC | gRPC |
| 4318 | OTel Collector HTTP | HTTP |
| 6379 | Redis | Redis protocol |
| 8080 | Kafka UI | HTTP |
| 9090 | Prometheus | HTTP |
| 9092 | Kafka | Kafka protocol |
| 9411 | Zipkin | HTTP |

Quick Reference: Redis Key Space

| Key pattern | Type | TTL | Owner | Purpose |
|-------------|------|-----|-------|---------|
| substate:{subscriptionId} | Hash | 60s | orders subgraph | Subscription metadata |
| subindex:{orderId} | Set | 1h | orders subgraph | Reverse index: orderId → subscriptionIds |
| sub:manager:lock | String | 35s | subscription-manager | Leader election lock |
| apq:* | String | varies | Apollo Router (APQ) | Cached query documents |
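For debugging, this key space can be inspected directly with redis-cli. The commands below are standard Redis commands; the IDs are placeholders, and KEYS is shown only for convenience (prefer SCAN against a production instance, as the heartbeat loop itself does):

```shell
redis-cli KEYS 'substate:*'            # active subscriptions
redis-cli HGETALL substate:sub-xyz     # callbackUrl, verifier, orderId, indexKey
redis-cli SMEMBERS subindex:ord-abc    # subscriptionIds watching this order
redis-cli TTL substate:sub-xyz         # seconds until a heartbeat must renew
redis-cli GET sub:manager:lock         # current leader's manager id
```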