Summary
We already have most of a webhook delivery mechanism: internal/alerts defines a Sink interface and a WebhookSink that POSTs JSON with an HMAC-SHA256 signature, SSRF-guarded at request time (#243). But it is narrow — it only ever fires for one event, cert.expiring, from the cert-expiry scanner, and the payload is a flat cert-expiry shape. There is also an in-process web.EventBus (host-seen events fan out to SSE browser tabs), i.e. a second, separate event path.
This issue proposes generalizing those into a single lifecycle-event webhook bus: handlers publish typed events; webhook delivery is one subscriber (SSE is another). The OpenAPI 3.1 webhooks block (now available after #254) documents each event as a machine-readable contract for subscribers, and the contract tests validate emitted payloads against it.
Why
For a CA/control-plane, mesh state changes are exactly what operators want to react to without polling: inventory/CMDB sync on enrollment, SOC alerting on revocation/block, automation on cert/CA rotation, paging on CA expiry. Today none of that is possible except the single cert-expiry alert.
Event model
A typed envelope (Stripe/GitHub-style), signed and idempotent:
{
  "id": "evt_<uuid>",
  "type": "host.enrolled",
  "created_at": "2026-06-13T12:00:00Z",
  "data": { "...": "type-specific" }
}
- Reuse the existing HMAC-SHA256 signature over the raw body.
- Headers: X-Nebula-Event: <type>, X-Nebula-Delivery: <id> (idempotency key).
- Keep the SSRF guard and the private-network opt-out.
Candidate events (grounded in existing handlers)
Start with the high-value, low-noise set:
- host.enrolled: first cert issued (CMDB/inventory, downstream provisioning)
- host.blocked / host.unblocked: security automation / SIEM
- host.revoked: durable revocation (SOC alerting)
- cert.rotated: host cert rotation
- cert.expiring: fold the existing alert into the bus
- ca.expiring: mesh-wide trust-anchor expiry (paging-worthy)
Later: host.deleted, host.rekey_required, ca.created/ca.rotated, operator.created, operator.api_key.created/revoked.
Delivery semantics (gaps vs today)
- At-least-once with bounded retry + backoff. The current WebhookSink fires once with no retry; a flaky receiver drops the event. Add retries and an audit/dead-letter on permanent failure.
- Async / non-blocking. Publish from handlers into the bus; deliver in the background (mirror the scanner goroutine / EventBus pattern) so the request path is never blocked on a slow receiver.
- Per-subscription event-type filter so a subscriber gets only what it asked for.
Management — two phases
- Phase 1 (config-driven): a global webhook subscription in server.yml (URL, secret, event-type allowlist), reusing the alerts config shape. Generalize alerts.Alert into an Event{Type, CreatedAt, Data}, broaden the Sink/bus, emit from the relevant handlers, document the events in api/openapi.yaml webhooks:, and extend the contract tests to validate emitted payloads against those schemas.
- Phase 2 (managed subscriptions): a webhook_subscriptions table + REST CRUD + Web UI (URL, secret, events, active, last-delivery status), and delivery observability.
Acceptance (phase 1)
- A typed event bus that handlers publish to; webhook + audit are sinks.
- At least the six core events emit and deliver to a configured endpoint, signed + idempotent, with retry/backoff.
- api/openapi.yaml documents the events under webhooks:; contract tests validate real emitted payloads against the schemas.
- Existing cert.expiring behavior preserved.
- go test -race, make gosec, make govulncheck, make lint green.
Depends on
webhooks block. Done/in-flight.