Emit lifecycle webhooks for mesh events (generalize the alerts webhook) + document them in the OpenAPI 3.1 webhooks block #256

@juev

Summary

We already have most of a webhook delivery mechanism: internal/alerts defines a Sink interface and a WebhookSink that POSTs JSON with an HMAC-SHA256 signature, SSRF-guarded at request time (#243). But it is narrow: it only ever fires for one event, cert.expiring, emitted by the cert-expiry scanner, and the payload is a flat cert-expiry shape. There is also an in-process web.EventBus (host-seen events fan out to SSE browser tabs), which is a second, separate event path.

This issue proposes generalizing those into a single lifecycle-event webhook bus: handlers publish typed events; webhook delivery is one subscriber (SSE is another). The OpenAPI 3.1 webhooks block (now available after #254) documents each event as a machine-readable contract for subscribers, and the contract tests validate emitted payloads against it.
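
To make the "one bus, many subscribers" idea concrete, here is a minimal sketch of the fan-out. The Event, Sink, and Bus shapes below are assumptions for illustration and are not the current internal/alerts or web.EventBus types; the recorder sink stands in for both the webhook and SSE subscribers.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical generalization of alerts.Alert.
type Event struct {
	Type      string
	CreatedAt time.Time
	Data      map[string]any
}

// Sink is anything that consumes events (webhook, SSE, audit).
// The real alerts.Sink signature may differ; this one is assumed.
type Sink interface {
	Send(Event) error
}

// Bus fans each published event out to every subscribed sink.
// It is not safe for concurrent use; a real bus would need locking.
type Bus struct {
	sinks []Sink
}

// Subscribe registers a sink for all future events.
func (b *Bus) Subscribe(s Sink) { b.sinks = append(b.sinks, s) }

// Publish sends ev to every sink and returns how many sinks failed.
func (b *Bus) Publish(ev Event) int {
	failed := 0
	for _, s := range b.sinks {
		if err := s.Send(ev); err != nil {
			failed++
		}
	}
	return failed
}

// recorder is a stand-in sink that only remembers event types.
type recorder struct {
	seen []string
}

func (r *recorder) Send(ev Event) error {
	r.seen = append(r.seen, ev.Type)
	return nil
}

func main() {
	webhook, sse := &recorder{}, &recorder{}
	bus := &Bus{}
	bus.Subscribe(webhook)
	bus.Subscribe(sse)
	bus.Publish(Event{Type: "host.enrolled", CreatedAt: time.Now().UTC()})
	fmt.Println(webhook.seen, sse.seen)
}
```

This version delivers synchronously to keep the shape obvious; the async, retrying delivery path is discussed under "Delivery semantics".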

Why

For a CA/control-plane, mesh state changes are exactly what operators want to react to without polling: inventory/CMDB sync on enrollment, SOC alerting on revocation/block, automation on cert/CA rotation, paging on CA expiry. Today none of that is possible except the single cert-expiry alert.

Event model

A typed envelope (Stripe/GitHub-style), signed and idempotent:

{
  "id": "evt_<uuid>",
  "type": "host.enrolled",
  "created_at": "2026-06-13T12:00:00Z",
  "data": { "...": "type-specific" }
}
  • Reuse the existing HMAC-SHA256 signature over the raw body.
  • Headers: X-Nebula-Event: <type>, X-Nebula-Delivery: <id> (idempotency key).
  • Keep the SSRF guard and the private-network opt-out.
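
A rough sketch of how the envelope and headers could fit together, assuming a hex-encoded HMAC-SHA256 over the raw body. The X-Nebula-Signature header name and its "sha256=" prefix are placeholders I made up for the example (the existing WebhookSink may already use something else), and NewRequest, Sign, and Verify are hypothetical helpers. The SSRF guard is left out here.

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Envelope mirrors the JSON shape proposed above.
type Envelope struct {
	ID        string         `json:"id"`
	Type      string         `json:"type"`
	CreatedAt time.Time      `json:"created_at"`
	Data      map[string]any `json:"data"`
}

// Sign returns the hex-encoded HMAC-SHA256 of the raw body.
func Sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify is the receiver side: recompute and compare in constant time.
func Verify(secret, body []byte, sig string) bool {
	got, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}

// NewRequest builds the signed POST for one delivery.
func NewRequest(url string, secret []byte, ev Envelope) (*http.Request, error) {
	body, err := json.Marshal(ev)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-Nebula-Event", ev.Type)
	req.Header.Set("X-Nebula-Delivery", ev.ID) // idempotency key
	// Hypothetical header name; align with whatever WebhookSink sends today.
	req.Header.Set("X-Nebula-Signature", "sha256="+Sign(secret, body))
	return req, nil
}

func main() {
	ev := Envelope{
		ID:        "evt_example",
		Type:      "host.enrolled",
		CreatedAt: time.Date(2026, 6, 13, 12, 0, 0, 0, time.UTC),
		Data:      map[string]any{},
	}
	req, err := NewRequest("https://receiver.example/hooks", []byte("shared-secret"), ev)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("X-Nebula-Event"), req.Header.Get("X-Nebula-Delivery"))
}
```

Signing the exact bytes that go on the wire (rather than re-marshaling on the receiver) is what lets a subscriber verify before parsing, and the delivery ID lets it drop duplicates from retries.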

Candidate events (grounded in existing handlers)

Start with the high-value, low-noise set:

  • host.enrolled — first cert issued (CMDB/inventory, downstream provisioning)
  • host.blocked / host.unblocked — security automation / SIEM
  • host.revoked — durable revocation (SOC alerting)
  • cert.rotated — host cert rotation
  • cert.expiring — fold the existing alert into the bus
  • ca.expiring — mesh-wide trust-anchor expiry (paging-worthy)

Later: host.deleted, host.rekey_required, ca.created/ca.rotated, operator.created, operator.api_key.created/revoked.

Delivery semantics (gaps vs today)

  • At-least-once with bounded retry + backoff. The current WebhookSink fires once with no retry, so a flaky receiver drops the event. Add retries, and write an audit/dead-letter record on permanent failure.
  • Async / non-blocking. Publish from handlers into the bus; deliver in the background (mirror the scanner goroutine / EventBus pattern) so the request path is never blocked on a slow receiver.
  • Per-subscription event-type filter so a subscriber gets only what it asked for.

Management — two phases

  • Phase 1 (config-driven): a global webhook subscription in server.yml (URL, secret, event-type allowlist), reusing the alerts config shape. Generalize alerts.Alert → an Event{Type, CreatedAt, Data}, broaden the Sink/bus, emit from the relevant handlers, document the events in api/openapi.yaml webhooks:, and extend the contract tests to validate emitted payloads against those schemas.
  • Phase 2 (managed subscriptions): a webhook_subscriptions table + REST CRUD + Web UI (URL, secret, events, active, last-delivery status), and delivery observability.
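
For Phase 1, the config-side piece might be as small as the struct below. The field names and yaml keys are guesses (the real alerts config shape should win), and treating an empty allowlist as "deliver everything" is an assumption that would need to be decided explicitly.

```go
package main

import "fmt"

// WebhookConfig is a hypothetical shape for the global subscription in
// server.yml. The yaml tags are illustrative and not tied to a parser here.
type WebhookConfig struct {
	URL    string   `yaml:"url"`
	Secret string   `yaml:"secret"`
	Events []string `yaml:"events"` // event-type allowlist
}

// Allows reports whether eventType should be delivered to this subscriber.
// Assumption: an empty allowlist means "all events".
func (c WebhookConfig) Allows(eventType string) bool {
	if len(c.Events) == 0 {
		return true
	}
	for _, e := range c.Events {
		if e == eventType {
			return true
		}
	}
	return false
}

func main() {
	cfg := WebhookConfig{
		URL:    "https://receiver.example/hooks",
		Secret: "shared-secret",
		Events: []string{"host.enrolled", "host.revoked", "ca.expiring"},
	}
	for _, t := range []string{"host.enrolled", "cert.rotated"} {
		fmt.Println(t, cfg.Allows(t))
	}
}
```

The same Allows check would carry over to Phase 2 unchanged, with the allowlist coming from a webhook_subscriptions row instead of the config file.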

Acceptance (phase 1)

  • A typed event bus that handlers publish to; webhook + audit are sinks.
  • At least the six core events emit and deliver to a configured endpoint, signed + idempotent, with retry/backoff.
  • api/openapi.yaml documents the events under webhooks:; contract tests validate real emitted payloads against the schemas.
  • Existing cert.expiring behavior preserved.
  • go test -race, make gosec, make govulncheck, make lint green.

Depends on

Labels: enhancement
