Commit 3bfd1b3

feat(webhooks): emit lifecycle events to a configured endpoint (phase 1) (#257)
Generalizes outbound webhook delivery into a unified lifecycle-event bus (#256 phase 1). Handlers publish typed events; a dispatcher signs and delivers them asynchronously, off the request path, with bounded retry and an SSRF guard — the same posture as the existing cert-expiry alerter, now shared.

- internal/webhook: Event envelope {id,type,created_at,data} + Dispatcher (async queue, linear-backoff retry, HMAC-SHA256 signing, X-Nebula-Event/Delivery/Signature headers, request-time SSRF guard, event-type allowlist, graceful Close). Emit is non-blocking and nil-safe.
- config: a webhooks block (enabled/url/hmac_secret/allow_private/events) with the URL SSRF-validated at load; validateWebhookURL generalized to name the offending key so alerts and webhooks share it.
- api.Server: a narrow EventEmitter seam (no webhook import) + nil-safe emit. Handlers emit host.enrolled (enroll), host.blocked/unblocked (block/unblock), host.deleted (delete), and cert.rotated (in-place re-sign).
- serve.go: build the dispatcher when enabled, inject it into the API server, drain it on shutdown, and fold cert.expiring into the bus via an adapter sink on the cert-expiry scanner.
- api/openapi.yaml: a 3.1 webhooks block documenting the six events; a contract test drives the real handlers and validates every emitted payload against the schemas (drift breaks the build).
- docs/webhooks.md: configuration, signature verification, delivery semantics.

Phase 2 (managed subscriptions, ca.expiring, durable dead-letter queue) stays tracked in #256.

Refs #256
1 parent d041317 commit 3bfd1b3

14 files changed

Lines changed: 1222 additions & 12 deletions

api/openapi.yaml

Lines changed: 190 additions & 0 deletions
@@ -198,6 +198,90 @@ paths:
                 items:
                   $ref: "#/components/schemas/AuditEntry"
 
+# Outbound webhooks the server POSTs to an operator-configured endpoint (#256).
+# Each delivery carries X-Nebula-Event (type), X-Nebula-Delivery (id), and
+# X-Nebula-Signature (sha256=<hmac> over the raw body). The handler/scanner emit
+# shapes are validated against these schemas by the contract tests.
+webhooks:
+  host.enrolled:
+    post:
+      operationId: webhookHostEnrolled
+      summary: A host completed enrollment and received its first certificate.
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/HostEnrolledEvent"
+      responses:
+        "2XX":
+          description: Acknowledged.
+  host.blocked:
+    post:
+      operationId: webhookHostBlocked
+      summary: A host was blocked and its certificate revoked.
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/HostEvent"
+      responses:
+        "2XX":
+          description: Acknowledged.
+  host.unblocked:
+    post:
+      operationId: webhookHostUnblocked
+      summary: A previously blocked host was unblocked.
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/HostEvent"
+      responses:
+        "2XX":
+          description: Acknowledged.
+  host.deleted:
+    post:
+      operationId: webhookHostDeleted
+      summary: A host was deleted and its certificate revoked.
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/HostEvent"
+      responses:
+        "2XX":
+          description: Acknowledged.
+  cert.rotated:
+    post:
+      operationId: webhookCertRotated
+      summary: A host certificate was re-signed (in-place rotation).
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/CertRotatedEvent"
+      responses:
+        "2XX":
+          description: Acknowledged.
+  cert.expiring:
+    post:
+      operationId: webhookCertExpiring
+      summary: A host certificate is approaching expiry without renewal.
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/CertExpiringEvent"
+      responses:
+        "2XX":
+          description: Acknowledged.
+
 components:
   securitySchemes:
     bearerAuth:
@@ -414,3 +498,109 @@ components:
           type: string
         details:
           type: string
+
+    # --- Webhook event envelopes (#256) ---
+
+    HostEventData:
+      type: object
+      required: [host_id, host_name, network_id, ca_id]
+      properties:
+        host_id:
+          type: string
+        host_name:
+          type: string
+        network_id:
+          type: string
+        ca_id:
+          type: string
+
+    HostEnrolledData:
+      type: object
+      required: [host_id, host_name, network_id, ca_id, fingerprint]
+      properties:
+        host_id:
+          type: string
+        host_name:
+          type: string
+        network_id:
+          type: string
+        ca_id:
+          type: string
+        fingerprint:
+          type: string
+
+    CertExpiringData:
+      type: object
+      required: [host_id, host_name, network_id, ca_id, fingerprint, not_after, seconds_until_expiry]
+      properties:
+        host_id:
+          type: string
+        host_name:
+          type: string
+        network_id:
+          type: string
+        ca_id:
+          type: string
+        fingerprint:
+          type: string
+        not_after:
+          type: string
+          format: date-time
+        seconds_until_expiry:
+          type: number
+
+    HostEvent:
+      type: object
+      required: [id, type, created_at, data]
+      properties:
+        id:
+          type: string
+        type:
+          type: string
+        created_at:
+          type: string
+          format: date-time
+        data:
+          $ref: "#/components/schemas/HostEventData"
+
+    HostEnrolledEvent:
+      type: object
+      required: [id, type, created_at, data]
+      properties:
+        id:
+          type: string
+        type:
+          type: string
+        created_at:
+          type: string
+          format: date-time
+        data:
+          $ref: "#/components/schemas/HostEnrolledData"
+
+    CertRotatedEvent:
+      type: object
+      required: [id, type, created_at, data]
+      properties:
+        id:
+          type: string
+        type:
+          type: string
+        created_at:
+          type: string
+          format: date-time
+        data:
+          $ref: "#/components/schemas/HostEnrolledData"
+
+    CertExpiringEvent:
+      type: object
+      required: [id, type, created_at, data]
+      properties:
+        id:
+          type: string
+        type:
+          type: string
+        created_at:
+          type: string
+          format: date-time
+        data:
+          $ref: "#/components/schemas/CertExpiringData"
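
On the receiving side, the envelope schemas in this file could be decoded roughly as follows. This is a minimal sketch, not code from the commit: the Go type and function names (`Event`, `HostData`, `decodeEvent`) are hypothetical, and only the JSON field names come from the schemas.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Event mirrors the required envelope fields (id, type, created_at, data).
// The Go names are illustrative; only the JSON keys come from the schema.
type Event struct {
	ID        string          `json:"id"`
	Type      string          `json:"type"`
	CreatedAt time.Time       `json:"created_at"` // RFC 3339, per format: date-time
	Data      json.RawMessage `json:"data"`       // decoded separately, by event type
}

// HostData covers the shared host fields. Fingerprint is only sent by the
// events whose data schema lists it, so it is left empty otherwise.
type HostData struct {
	HostID      string `json:"host_id"`
	HostName    string `json:"host_name"`
	NetworkID   string `json:"network_id"`
	CAID        string `json:"ca_id"`
	Fingerprint string `json:"fingerprint,omitempty"`
}

// decodeEvent parses the envelope first and the data object second.
// A body without a data object is reported as an error.
func decodeEvent(body []byte) (Event, HostData, error) {
	var ev Event
	if err := json.Unmarshal(body, &ev); err != nil {
		return Event{}, HostData{}, fmt.Errorf("decode envelope: %w", err)
	}
	var data HostData
	if err := json.Unmarshal(ev.Data, &data); err != nil {
		return Event{}, HostData{}, fmt.Errorf("decode data: %w", err)
	}
	return ev, data, nil
}

func main() {
	body := []byte(`{"id":"evt_example","type":"host.blocked","created_at":"2026-06-13T12:00:00Z","data":{"host_id":"h1","host_name":"web-1","network_id":"n1","ca_id":"ca1"}}`)
	ev, data, err := decodeEvent(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Type, data.HostName)
}
```

Keeping `data` as `json.RawMessage` lets a receiver switch on `type` before choosing a concrete payload struct, which is one way to handle the extra `cert.expiring` fields.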

docs/webhooks.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+# Webhooks
+
+The server can POST lifecycle events to an operator-configured HTTP endpoint so
+external systems react to mesh changes without polling — inventory/CMDB sync on
+enrollment, SOC alerting on block/revoke, automation on cert rotation. Delivery
+is asynchronous (off the request path), signed, and SSRF-guarded.
+
+The machine-readable contract for every event lives in
+[`api/openapi.yaml`](../api/openapi.yaml) under `webhooks:`; the contract tests
+validate the real emitted payloads against it.
+
+## Enabling
+
+```yaml
+# server.yml
+webhooks:
+  enabled: true
+  url: https://hooks.example.com/nebula
+  hmac_secret: "<random secret>" # optional; signs each delivery
+  events: [host.enrolled, host.blocked] # optional; empty = all events
+  allow_private: false # set true only for an intentional internal sink
+```
+
+`url` is validated at startup (must be http/https; loopback/private/link-local
+targets are rejected unless `allow_private: true`).
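
The startup check described above might look something like the sketch below. It is an illustration under assumptions, not the commit's `validateWebhookURL`: the function name `validateURL` and its signature are hypothetical, and it only inspects IP literals and the name `localhost`. Hostnames that resolve to private addresses would still need the request-time guard the commit message mentions.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"net/url"
)

// validateURL is a hypothetical stand-in for the server's load-time check.
// It accepts only http/https and, unless allowPrivate is set, rejects
// loopback, private, link-local, and unspecified IP literals.
func validateURL(raw string, allowPrivate bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("missing host")
	}
	if allowPrivate {
		return nil
	}
	if host == "localhost" {
		return errors.New("loopback host rejected")
	}
	addr, err := netip.ParseAddr(host)
	if err != nil {
		// Not an IP literal; DNS names are not resolved in this sketch.
		return nil
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	if addr.IsLoopback() || addr.IsPrivate() || addr.IsLinkLocalUnicast() || addr.IsUnspecified() {
		return fmt.Errorf("address %s rejected", addr)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://hooks.example.com/nebula",
		"http://127.0.0.1:9000/hook",
		"http://169.254.169.254/latest",
	} {
		fmt.Printf("%s -> %v\n", raw, validateURL(raw, false))
	}
}
```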
+
+## Events (phase 1)
+
+| Event | Fires when | `data` fields |
+|---|---|---|
+| `host.enrolled` | a host completes enrollment | `host_id`, `host_name`, `network_id`, `ca_id`, `fingerprint` |
+| `host.blocked` | a host is blocked (cert revoked) | `host_id`, `host_name`, `network_id`, `ca_id` |
+| `host.unblocked` | a blocked host is unblocked | same as `host.blocked` |
+| `host.deleted` | a host is deleted (cert revoked) | same as `host.blocked` |
+| `cert.rotated` | a host cert is re-signed in place | host fields + new `fingerprint` |
+| `cert.expiring` | a cert approaches expiry without renewal | host fields + `fingerprint`, `not_after`, `seconds_until_expiry` |
+
+`cert.expiring` is produced by the cert-expiry scanner, so it requires
+`alerts.enabled: true` (the scanner) in addition to `webhooks.enabled`. The
+other events come from the API handlers and need only the webhooks block.
+
+## Delivery format
+
+Every delivery is `POST <url>` with `Content-Type: application/json` and a body:
+
+```json
+{
+  "id": "evt_2f1c…",
+  "type": "host.blocked",
+  "created_at": "2026-06-13T12:00:00Z",
+  "data": { "host_id": "…", "host_name": "…", "network_id": "…", "ca_id": "…" }
+}
+```
+
+Headers:
+
+- `X-Nebula-Event` — the event type (also in the body).
+- `X-Nebula-Delivery` — the unique event id; use it to deduplicate (delivery is at-least-once).
+- `X-Nebula-Signature` — `sha256=<hex>`, the HMAC-SHA256 of the raw body under `hmac_secret` (present only when a secret is set).
+
+## Verifying the signature
+
+Compute `HMAC-SHA256(hmac_secret, raw_request_body)` and compare, in constant
+time, against the hex in `X-Nebula-Signature` (after the `sha256=` prefix).
+Reject deliveries that do not match.
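
A receiver could implement the verification steps above along these lines. The sketch uses only the Go standard library; the function names `verifySignature` and `sign` are illustrative, and `sign` exists only so the example can produce a header to check.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySignature recomputes HMAC-SHA256 over the raw body and compares it,
// in constant time, with the hex digest carried after the "sha256=" prefix.
func verifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

// sign builds a header value in the documented format, for the demo only.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"id":"evt_example","type":"host.blocked"}`)
	header := sign(secret, body)

	fmt.Println(verifySignature(secret, body, header))
	fmt.Println(verifySignature(secret, []byte("tampered"), header))
}
```

The digest must be computed over the exact bytes received, before any JSON parsing or re-serialization, because re-encoding can change whitespace or key order.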
+
+## Reliability
+
+- **Asynchronous**: events are queued and delivered by a background worker; a
+  slow or down receiver never blocks an API request.
+- **Retried**: a failed delivery (connection error or HTTP ≥ 400) is retried a
+  few times with linear backoff. After the retries are exhausted the event is
+  logged and dropped (a durable dead-letter queue is a later phase).
+- **Best-effort under load**: if the in-memory queue is full the event is
+  dropped with a logged warning rather than stalling the server. Treat webhooks
+  as a notification stream, not a system of record — the audit log remains the
+  authoritative history.
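
The two behaviours listed above, dropping on a full queue and retrying with linear backoff, can be illustrated with a small sketch. It is not the commit's Dispatcher: the names `queue`, `tryEnqueue`, `deliverWithRetry`, and `errTemporary` are hypothetical, and the attempt count and base delay are placeholder values because the document does not state them.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary stands in for a connection error or an HTTP status >= 400.
var errTemporary = errors.New("temporary delivery failure")

// queue is a bounded in-memory buffer of event ids (illustrative only).
type queue struct {
	ch chan string
}

// tryEnqueue never blocks: it reports false when the buffer is full, so the
// caller can log a warning and drop the event instead of stalling.
func (q *queue) tryEnqueue(ev string) bool {
	select {
	case q.ch <- ev:
		return true
	default:
		return false
	}
}

// deliverWithRetry calls send up to attempts times. After the n-th failure it
// waits n*base before trying again, so the delay grows linearly.
func deliverWithRetry(send func() error, attempts int, base time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for n := 1; n <= attempts; n++ {
		if err = send(); err == nil {
			return nil
		}
		if n < attempts {
			time.Sleep(time.Duration(n) * base)
		}
	}
	return fmt.Errorf("dropping event after %d attempts: %w", attempts, err)
}

func main() {
	q := &queue{ch: make(chan string, 1)}
	fmt.Println(q.tryEnqueue("evt_a")) // buffer has room
	fmt.Println(q.tryEnqueue("evt_b")) // buffer is full, event is dropped

	calls := 0
	err := deliverWithRetry(func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	}, 3, 10*time.Millisecond)
	fmt.Println(calls, err)
}
```

The `select` with a `default` branch is what keeps the producer off the slow path: the send either succeeds immediately or is abandoned.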
+
+## Not in phase 1
+
+- Managed subscriptions (multiple endpoints, CRUD, per-subscription status) —
+  phase 1 is a single config-driven subscription.
+- `ca.expiring` and other CA-lifecycle events.
+- A persistent dead-letter queue and delivery dashboards.

internal/api/enroll.go

Lines changed: 7 additions & 0 deletions
@@ -236,6 +236,13 @@ func (s *Server) handleEnroll(w http.ResponseWriter, r *http.Request) {
 	}
 
 	s.metrics.recordEnrollment(resultOK)
+	s.emit("host.enrolled", map[string]any{
+		"host_id":     host.ID,
+		"host_name":   host.Name,
+		"network_id":  host.NetworkID,
+		"ca_id":       host.CAID,
+		"fingerprint": fp,
+	})
 	writeJSON(w, http.StatusOK, enrollResponse{
 		CertificatePEM:   string(certPEM),
 		CACertificatePEM: string(caCertPEM),
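
The `s.emit` call in this hunk relies on the nil-safe EventEmitter seam the commit message describes. A minimal sketch of such a seam is shown below, with assumptions labelled: the `emit(string, map[string]any)` shape matches the call above, but the interface method name `Emit`, the `events` field, and the `recorder` test double are hypothetical.

```go
package main

import "fmt"

// EventEmitter is a narrow interface so the API package need not import the
// webhook package. The method name and signature are assumptions.
type EventEmitter interface {
	Emit(eventType string, data map[string]any)
}

// Server holds an optional emitter; the field name is illustrative.
type Server struct {
	events EventEmitter
}

// emit is a no-op when webhooks are disabled (no emitter injected), so
// handlers can call it unconditionally.
// Caveat: an interface holding a typed nil pointer is not == nil, so callers
// should leave the field unset rather than assign a nil *Dispatcher.
func (s *Server) emit(eventType string, data map[string]any) {
	if s == nil || s.events == nil {
		return
	}
	s.events.Emit(eventType, data)
}

// recorder is a test double that remembers the emitted event types.
type recorder struct {
	types []string
}

func (r *recorder) Emit(eventType string, _ map[string]any) {
	r.types = append(r.types, eventType)
}

func main() {
	(&Server{}).emit("host.enrolled", nil) // webhooks disabled: nothing happens

	rec := &recorder{}
	s := &Server{events: rec}
	s.emit("host.enrolled", map[string]any{"host_id": "h1"})
	fmt.Println(rec.types)
}
```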
