
# Application Architecture

Webhook Debugger & Logger is a high-performance, stateless-first application designed to run on the Apify Platform. It leverages an in-memory SQL analytics engine (DuckDB) for query performance while using Apify's Dataset and Key-Value Store for durability and persistence.


## High-Level Overview

The application follows a Modular Monolith architecture with distinct layers for ingestion, processing, storage, and presentation.

| Component | Technology | Purpose | Resilience Strategy |
| --- | --- | --- | --- |
| Runtime | Node.js 20+ (ESM) | Core execution environment | Graceful shutdown handling |
| Web Server | Express.js | HTTP routing, middleware pipeline | Rate limiting, timeouts |
| Read Model | DuckDB | OLAP queries, filtering, aggregation | Disposable: rebuilds from Dataset on start |
| Write Model | Apify Dataset | Append-only log of all events | Persistent source of truth |
| Sync Layer | SyncService + EventEmitter | Event-driven real-time + batch catch-up | Auto-recovery on restart |
| State Store | Apify KVS | Webhook lifecycle, config, large payloads | Graceful degradation |

The DuckDB read model is intentionally disposable. If it fails to initialize, the application still starts — ingestion works via the Apify Dataset. The read model is rebuilt from the Dataset on restart.


## System Overview

Webhook Debugger & Logger is an Apify Actor that generates temporary webhook endpoints for testing, debugging, and mocking webhook integrations. It uses a CQRS (Command Query Responsibility Segregation) architecture with an event-driven sync layer.

```mermaid
graph TB
    subgraph Clients
        WH[Webhook Senders<br/>Stripe, GitHub, Shopify]
        DASH[Dashboard UI]
        API[API Consumers]
    end

    subgraph "Express Server (main.js)"
        direction TB

        subgraph "Middleware Chain"
            REQ_ID[RequestID Middleware]
            CSP[CSP / Security Headers]
            CORS[CORS]
            COMP[Compression<br/>SSE excluded]
            BODY[Dynamic BodyParser<br/>managed by AppState]
        end

        subgraph "Ingestion Path (configurable auth/IP controls)"
            INGEST["LoggerMiddleware.ingestMiddleware<br/>• Recursion detection<br/>• Per-webhook rate limiting<br/>• Streaming KVS offload<br/>• Signature verification"]
            MW["LoggerMiddleware.middleware<br/>• Validation (webhook, auth, IP allowlist)<br/>• Data preparation<br/>• Custom script execution<br/>• Response generation"]
        end

        subgraph "Dashboard + Management Path (auth + rate limited when authKey is set)"
            AUTH[Auth Middleware]
            RL[RateLimiter]
            LOGS[GET /logs]
            DETAIL[GET /logs/:logId]
            REPLAY[POST /replay/:webhookId/:itemId?url=https://...]
            STREAM[GET /log-stream]
            INFO[GET /info]
            METRICS[GET /system/metrics]
            DASHBOARD_R["GET / Dashboard"]
        end

        subgraph "Probe Path (rate limited, no auth)"
            HEALTH[GET /health]
            READY[GET /ready]
        end
    end

    subgraph "Background Services"
        FWD["ForwardingService<br/>• SSRF validation<br/>• Circuit breaker<br/>• Exponential backoff<br/>• Connection pooling"]
        CB[CircuitBreaker<br/>hostname-level]
        ALERT["Alerting<br/>• SSRF validation<br/>• Slack webhooks<br/>• Discord webhooks"]
        SYNC["SyncService<br/>event-driven + batch"]
    end

    subgraph "State Management"
        APPSTATE["AppState<br/>• Auth key<br/>• Body parser<br/>• Rate limiter<br/>• Replay config"]
        HOT["HotReloadManager<br/>• KVS polling (platform)<br/>• fs.watch (local dev)"]
        WM["WebhookManager<br/>• Lifecycle<br/>• Persistence<br/>• Cleanup"]
    end

    subgraph "Data Layer"
        DS[("Apify Dataset<br/>(Write Model)<br/>Source of Truth")]
        DUCK[("DuckDB<br/>(Read Model)<br/>Disposable")]
        KVS[("Apify KVS<br/>• Config state<br/>• Large payloads<br/>• Webhook state")]
    end

    WH -->|"POST /webhook/:id"| INGEST
    INGEST --> MW
    MW -->|background| FWD
    MW -->|background| ALERT
    MW -->|"Actor.pushData"| DS
    MW -->|"appEvents.emit"| SYNC

    DASH --> DASHBOARD_R
    API --> AUTH

    FWD --> CB
    HOT -->|config change| APPSTATE
    APPSTATE -->|propagate| BODY
    APPSTATE -->|propagate| RL

    SYNC -->|"batch insert"| DUCK
    DS -->|"catch-up sync"| SYNC

    LOGS -->|query| DUCK
    DETAIL -->|query| DUCK
    REPLAY -->|query + forward| DUCK
    WM -->|persist| KVS
```

## Module Dependency Graph

```mermaid
graph LR
    subgraph "Entry Point"
        MAIN[main.js]
    end

    subgraph "Core"
        LM[logger_middleware.js]
        WMG[webhook_manager.js]
    end

    subgraph "Services"
        FWD_S[ForwardingService]
        SYNC_S[SyncService]
        CB_S[CircuitBreaker]
    end

    subgraph "Data Access"
        DB[db/duckdb.js]
        REPO[LogRepository]
    end

    subgraph "Utilities"
        CONFIG[config.js]
        SSRF[ssrf.js]
        SIG[signature.js]
        SCRIPT_EXEC[custom_script_executor.js<br/>custom_script_worker.js]
        AUTH_U[auth.js]
        BOOT[bootstrap.js]
        STORE[storage_helper.js]
        EVENTS[events.js]
        COMMON[common.js]
        LOGGER[logger.js]
        ENV[env.js]
        ALERTING[alerting.js]
    end

    subgraph "Constants (13 files)"
        CONSTS[app, http, database,<br/>security, errors, messages,<br/>logging, storage, network,<br/>auth, ui, alerting]
    end

    MAIN --> LM
    MAIN --> WMG
    MAIN --> CONFIG
    MAIN --> BOOT

    LM --> FWD_S
    LM --> SIG
    LM --> SCRIPT_EXEC
    LM --> STORE
    LM --> ALERTING
    LM --> EVENTS

    FWD_S --> CB_S
    FWD_S --> SSRF
    ALERTING --> SSRF

    SYNC_S --> REPO
    SYNC_S --> EVENTS

    REPO --> DB

    CONFIG --> ENV
    SSRF --> CONSTS
    LOGGER --> CONSTS
```

## Data Flow

### Webhook Ingestion (Write Path)

```
Incoming Request
    │
    ▼
LoggerMiddleware.ingestMiddleware
    ├── Recursion check (header loop detection)
    ├── Per-webhook rate limiting (Token Bucket)
    ├── Content-Length > limit? → 413
    ├── Content-Length > KVS threshold? → Stream to KVS
    └── next()
    │
    ▼
LoggerMiddleware.middleware
    ├── Validate webhook ID, IP, auth
    ├── Prepare data (parse, redact, encode)
    ├── Signature verification (if configured)
    ├── Custom script execution (worker-isolated vm context)
    ├── Measure processingTime before any simulated response delay
    ├── Send HTTP response to caller
    └── Background tasks (fire-and-forget with timeout):
        ├── Actor.pushData(event)  → Dataset (Write Model)
        ├── appEvents.emit('log:received')  → SyncService
        ├── ForwardingService.forwardWebhook()  → Target URL
        └── triggerAlertIfNeeded()  → Slack/Discord
```
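A minimal sketch of the fire-and-forget step, under stated assumptions: the timeout constant is illustrative, and the service handles are stand-ins for the actor's real `ForwardingService` and alerting helpers.

```js
import { Actor, log } from 'apify';

// Stand-ins for the actor's real background services (illustrative).
const forwardingService = { forwardWebhook: async (event) => {} };
const triggerAlertIfNeeded = async (event) => {};

const BACKGROUND_TASK_TIMEOUT_MS = 10_000; // hypothetical bound

// Race a task against a timeout so a slow sink cannot pin resources.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Called after the HTTP response has been sent: failures are logged,
// never propagated back into the (already completed) request.
function runBackgroundTasks(event) {
  for (const [label, task] of [
    ['pushData', () => Actor.pushData(event)],
    ['forward', () => forwardingService.forwardWebhook(event)],
    ['alert', () => triggerAlertIfNeeded(event)],
  ]) {
    withTimeout(task(), BACKGROUND_TASK_TIMEOUT_MS, label)
      .catch((err) => log.warning(`Background task ${label} failed`, { err }));
  }
}
```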

### Log Query (Read Path)

```
API Request (GET /logs, /logs/:logId)
    │
    ▼
Auth + Rate Limit middleware
    │
    ▼
Route handler
    │
    ▼
LogRepository (parameterized SQL)
    │
    ▼
DuckDB (in-process, connection pooled)
    │
    ▼
JSON response
```
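The repository layer's parameterized-SQL rule can be sketched like this, assuming the classic `duckdb` Node bindings and an illustrative `logs` table; the actor's real schema and repository methods may differ.

```js
import duckdb from 'duckdb';

const db = new duckdb.Database(':memory:');
const conn = db.connect();
conn.run('CREATE TABLE logs (webhook_id VARCHAR, received_at TIMESTAMP, body VARCHAR)');

// User input is only ever bound as a parameter, never interpolated
// into the SQL string, which rules out SQL injection on the read path.
function queryLogs(webhookId, limit) {
  return new Promise((resolve, reject) => {
    conn.all(
      'SELECT * FROM logs WHERE webhook_id = ? ORDER BY received_at DESC LIMIT ?',
      webhookId,
      limit,
      (err, rows) => (err ? reject(err) : resolve(rows)),
    );
  });
}
```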

## Key Architectural Decisions

### 1. Disposable Read Model

DuckDB is treated as ephemeral. On startup, SyncService catches up from the Apify Dataset. This means the system tolerates DuckDB failures without data loss.
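In code, that tolerance is roughly a guarded startup sequence; the initializer and service names below are illustrative, not the actor's actual identifiers.

```js
import { log } from 'apify';

let readModelReady = false;

// Sketch: read-model startup failure is non-fatal.
async function startReadModel(initializeDuckDb, syncService) {
  try {
    await initializeDuckDb();     // open DB, create tables, warm the pool
    await syncService.catchUp();  // rebuild rows from the Apify Dataset
    readModelReady = true;
  } catch (err) {
    // Ingestion keeps appending to the Dataset; query routes can
    // report unavailability until a later restart rebuilds the model.
    log.warning('Read model unavailable; continuing in ingest-only mode', { err });
  }
}
```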

Read-model teardown is also restart-safe. Reset logic waits for active DuckDB operations to drain before closing pooled and in-use connections, and it invalidates any in-flight singleton initializer so a late-arriving DuckDB instance cannot be republished after teardown; repeated initialize/shutdown cycles therefore neither leave nor revive stale handles. Production shutdown follows the same ordering: the HTTP listener drains before SyncService and DuckDB teardown begin, so in-flight handlers and readiness probes never race read-model disposal.

### 2. Event-Driven Sync

SyncService listens to appEvents for real-time inserts and uses batch catch-up for gap recovery. This provides near-real-time query availability without coupling the write path to the read path.
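A condensed sketch of the two modes, assuming hypothetical `LogRepository` methods (`insert`, `insertBatch`) and the Apify `Dataset.getData()` paging API:

```js
import { EventEmitter } from 'node:events';

export const appEvents = new EventEmitter();

class SyncService {
  constructor(repo) {
    this.repo = repo; // LogRepository wrapping DuckDB (methods assumed)
    this.offset = 0;  // Dataset items already mirrored into DuckDB
  }

  start() {
    // Real-time path: one insert per ingested webhook event.
    appEvents.on('log:received', async (event) => {
      await this.repo.insert(event);
      this.offset += 1;
    });
  }

  // Batch path: page through Dataset items not yet synced, e.g. those
  // written while the read model was down or rebuilding.
  async catchUp(dataset) {
    for (;;) {
      const { items } = await dataset.getData({ offset: this.offset, limit: 1000 });
      if (items.length === 0) break;
      await this.repo.insertBatch(items);
      this.offset += items.length;
    }
  }
}
```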

SyncService shutdown is retry-safe: if the limiter stops cleanly but disconnect cleanup fails, the service retains the stale limiter so a later `stop()` or `start()` call can finish the cleanup, rather than issuing a second, incompatible limiter stop.

### 3. Connection Pooling + Write Serialization

DuckDB connections are pooled (configurable size). All write operations go through a Bottleneck queue (`maxConcurrent: 1`) to prevent "Database Locked" errors. Reads are parallel.
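A sketch of the write-serialization rule with Bottleneck; `execute` stands in for running SQL on a pooled connection.

```js
import Bottleneck from 'bottleneck';

// One global write queue: at most one write statement runs at a time,
// in FIFO order, so concurrent writers queue up instead of colliding
// with "Database Locked" errors.
const writeQueue = new Bottleneck({ maxConcurrent: 1 });

function serializedWrite(execute, sql, params) {
  // schedule() resolves with the job's result once its turn comes.
  return writeQueue.schedule(() => execute(sql, params));
}

// Reads bypass the queue entirely and run on pooled connections in
// parallel, which is what keeps query latency low under write load.
```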

When the singleton is reset, new DB callers are held behind a reset gate. Connection shutdown is coordinated with the active-operation tracker, so teardown closes pooled and in-use handles only after in-flight reads and currently executing DB work complete. Jobs still waiting on the serialized write queue, however, are not guaranteed to drain during teardown and may be dropped as part of the reset. This keeps test restarts and hot lifecycle rebuilds from racing the pool.

### 4. Circuit Breaker for Forwarding

ForwardingService uses a hostname-level circuit breaker. After consecutive failures, requests to the same host are blocked for a cooldown period. This prevents cascading failures to dead downstream services.
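The state machine is small enough to sketch in full; the threshold and cooldown values here are illustrative defaults, not the actor's configuration.

```js
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30_000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.hosts = new Map(); // hostname -> { failures, openedAt }
  }

  // Open = short-circuit requests; after the cooldown, allow a probe.
  isOpen(hostname) {
    const state = this.hosts.get(hostname);
    if (!state || state.failures < this.failureThreshold) return false;
    if (Date.now() - state.openedAt >= this.cooldownMs) {
      this.hosts.delete(hostname); // half-open: let one retry through
      return false;
    }
    return true;
  }

  recordSuccess(hostname) { this.hosts.delete(hostname); }

  recordFailure(hostname) {
    const state = this.hosts.get(hostname) ?? { failures: 0, openedAt: 0 };
    state.failures += 1;
    if (state.failures >= this.failureThreshold) state.openedAt = Date.now();
    this.hosts.set(hostname, state);
  }
}

// Usage: check isOpen(new URL(target).hostname) before forwarding,
// then call recordSuccess/recordFailure based on the outcome.
```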

### 5. Hot-Reload Configuration

`HotReloadManager` watches for config changes via:

- Platform: KVS polling at configurable intervals
- Local dev: `fs.watch` on the `INPUT.json` file

Config changes propagate through `AppState.applyConfigUpdate()`, which updates body parser limits, rate limiters, auth keys, retention, replay settings, and more, all without a restart.

When JSON Schema validation is enabled, compiled validators are cached and reused while the effective schema stays unchanged. Stable object schemas also reuse memoized cache keys, which keeps hot-reload flexibility without paying repeated compilation cost on the webhook request path.
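The document does not name the validator library, so this sketch assumes an Ajv-style `compile()` API; the two-level cache (memoized key, then compiled function) mirrors the behavior described above.

```js
import Ajv from 'ajv';

const ajv = new Ajv();
const validatorCache = new Map(); // schema key -> compiled validate fn
const keyCache = new WeakMap();   // schema object -> memoized key string

function getValidator(schema) {
  // A stable schema object reuses its serialized key across reloads.
  let key = keyCache.get(schema);
  if (!key) {
    key = JSON.stringify(schema);
    keyCache.set(schema, key);
  }
  // An unchanged effective schema reuses the compiled validator, so
  // the request path never pays compilation cost twice per schema.
  let validate = validatorCache.get(key);
  if (!validate) {
    validate = ajv.compile(schema);
    validatorCache.set(key, validate);
  }
  return validate;
}
```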

Retention updates are intentionally non-destructive for active webhooks. The current implementation extends existing expiry timestamps when retention increases instead of shortening live webhook lifetimes.

`responseDelayMs` remains a simulation layer for downstream timeout testing. The runtime records `processingTime` before that delay is applied, so the stored metric reflects server-side work rather than synthetic waiting time.
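The ordering is easiest to see in a sketch; the config shape and `processEvent` step are illustrative.

```js
async function handleWebhook(req, res, config, processEvent) {
  const startedAt = process.hrtime.bigint();
  const event = await processEvent(req); // parse, redact, scripts, etc.

  // Captured before any synthetic delay: the stored metric reflects
  // server-side work only.
  event.processingTime = Number(process.hrtime.bigint() - startedAt) / 1e6; // ms

  if (config.responseDelayMs > 0) {
    // Simulated latency for downstream timeout testing; excluded above.
    await new Promise((resolve) => setTimeout(resolve, config.responseDelayMs));
  }
  res.status(200).json({ ok: true });
}
```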

### 6. Streaming Large Payload Offload

Payloads exceeding the KVS offload threshold are streamed directly to Apify KVS before the body parser runs. The log entry stores a reference body with a public URL to the original payload.
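A simplified sketch of the offload decision: the threshold constant is illustrative, and this version buffers the body for brevity where the actor streams it before any body parser runs.

```js
import { Actor } from 'apify';

const KVS_OFFLOAD_THRESHOLD = 1024 * 1024; // illustrative: 1 MiB

async function maybeOffload(req, res, next) {
  const length = Number(req.headers['content-length'] ?? 0);
  if (length <= KVS_OFFLOAD_THRESHOLD) return next();

  // Drain the raw request (the real path pipes it as a stream).
  const chunks = [];
  for await (const chunk of req) chunks.push(chunk);

  const key = `payload-${req.params.webhookId}-${Date.now()}`;
  const store = await Actor.openKeyValueStore();
  await store.setValue(key, Buffer.concat(chunks), {
    contentType: req.headers['content-type'] ?? 'application/octet-stream',
  });

  // The log entry later stores a reference body pointing at this
  // key's public URL instead of the raw payload.
  req.offloadedBody = { kvsKey: key, size: length };
  next();
}
```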

### 7. Worker-Isolated Custom Scripts

`customScript` is syntax-validated up front, then executed inside a disposable worker thread with a dedicated `vm` context. The isolate receives only the mutable `event`, a safe copy of `req`, `console`, and `HTTP_STATUS`.

This keeps the main request handler isolated from untrusted webhook transformation code while still allowing useful response shaping.
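A simplified sketch of the worker side of that boundary (the real custom_script_worker.js differs in detail; the `HTTP_STATUS` subset here is illustrative):

```js
// custom_script_worker.js (condensed sketch)
import { parentPort, workerData } from 'node:worker_threads';
import vm from 'node:vm';

const { code, event, safeReq } = workerData;
const HTTP_STATUS = { OK: 200, BAD_REQUEST: 400 }; // illustrative subset

// The sandbox sees only plain-data copies plus console; dynamic code
// generation (eval, new Function, WebAssembly compile) is disabled.
const context = vm.createContext(
  { event, req: safeReq, console, HTTP_STATUS },
  { codeGeneration: { strings: false, wasm: false } },
);

try {
  new vm.Script(code).runInContext(context, { timeout: 1000 });
  parentPort.postMessage({ ok: true, event: context.event });
} catch (err) {
  // Serialized back to the main process; the request pipeline then
  // continues with the actor's normal fallback behavior.
  parentPort.postMessage({ ok: false, error: String(err?.stack ?? err) });
}
```

The spawning side would additionally pass `resourceLimits` to `new Worker()` to enforce the memory cap noted in the trust-boundary diagram below.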


## Security Architecture

### Trust Boundaries

```
┌─────────────────────────────────────────────────┐
│  Internet (untrusted)                           │
│  ┌───────────────────────────────────────────┐  │
│  │  Ingestion Endpoint /webhook/:id          │  │
│  │  • IP whitelist (optional)                │  │
│  │  • Per-webhook rate limiting              │  │
│  │  • Payload size limits                    │  │
│  │  • Recursion detection                    │  │
│  │  • Auth key (optional)                    │  │
│  └───────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────┐  │
│  │  Dashboard + Management API               │  │
│  │  • API key authentication (when enabled)  │  │
│  │  • Per-IP rate limiting                   │  │
│  │  • CSP headers on dashboard               │  │
│  │  • Security headers (HSTS, etc.)          │  │
│  └───────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────┐  │
│  │  Outbound (forwarding, alerts, replay)    │  │
│  │  • SSRF validation (DNS + CIDR check)     │  │
│  │  • Circuit breaker                        │  │
│  │  • AbortController timeouts               │  │
│  └───────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────┐  │
│  │  Custom scripting isolate                 │  │
│  │  • Worker-thread execution boundary       │  │
│  │  • Safe copied req/event inputs           │  │
│  │  • eval / Function disabled               │  │
│  │  • Timeout + memory resource limits       │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
```

### Custom Script Trust Boundary

- Custom scripts never receive the live Express request object.
- The isolate does not inject `process`, `require`, filesystem, or outbound networking primitives.
- `codeGeneration.strings = false` and `codeGeneration.wasm = false` prevent dynamic code generation from strings or WebAssembly.
- Runtime failures are serialized back to the main process for logging, and the request pipeline continues with the actor's normal fallback behavior.

### Defense-in-Depth Layers

1. Network: SSRF prevention on all outbound URLs, IP whitelisting on ingestion.
2. Transport: HSTS enforcement, X-Frame-Options, Permissions-Policy.
3. Application: Auth middleware, rate limiting, payload size limits, recursion detection.
4. Data: Parameterized SQL, JSON key sanitization, body redaction, header masking.
5. Output: XSS prevention via CSP + `escapeHtml()`, generic error messages to callers.

## Scalability Characteristics

| Dimension | Approach | Limits |
| --- | --- | --- |
| Concurrent webhooks | Dynamic URL generation (up to `MAX_BULK_CREATE`) | Configurable |
| Ingestion throughput | Node.js event loop + streaming offload for large payloads | Single-process |
| Query performance | DuckDB in-process analytics with indexes and connection pooling | Memory-bound |
| Write throughput | Bottleneck queue (serial writes) | ~1,000s of ops/sec |
| Forwarding | Circuit breaker + connection pooling + retries | Per-host breakers |
| Memory | Configurable via `useFixedMemory` + `fixedMemoryMbytes` | Platform-limited |