AiSOC Hosted Demo (Fly.io)

This directory contains the infrastructure-as-code for the public demo at tryaisoc.com, deployed on Fly.io.

Three public hostnames front the stack:

| Hostname | Fly app | Purpose |
| --- | --- | --- |
| tryaisoc.com | aisoc-demo-web | Next.js UI (apex/root domain) |
| api.tryaisoc.com | aisoc-demo-api | FastAPI: /health, /api/v1/* |
| ws.tryaisoc.com | aisoc-demo-realtime | WebSocket fanout (wss://) |

Why three hostnames instead of routing everything through tryaisoc.com? The realtime service speaks raw WebSocket, which Next.js rewrites can't proxy in production, and sending all /api/v1/* traffic through the web app's machine would add an extra proxy hop to every API call. Splitting api and ws onto their own Fly certs is a standard pattern and keeps the browser's CORS/CSP boundary explicit.
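
The hostname split can be sketched as a small origin-selection helper of the kind a client might use. This is an illustration only, not code from the repository: the name `origin_for` and the `/ws` path prefix are assumptions, while the hostnames and the /health and /api/v1/* routes come from the table above.

```python
# Hypothetical helper: choose the public origin for a request path.
# Hostnames are taken from the table above; the "/ws" prefix is an assumption.
WEB_ORIGIN = "https://tryaisoc.com"
API_ORIGIN = "https://api.tryaisoc.com"
WS_ORIGIN = "wss://ws.tryaisoc.com"


def origin_for(path: str) -> str:
    """Return the origin that should serve `path`."""
    if path == "/health" or path.startswith("/api/v1/"):
        return API_ORIGIN  # FastAPI app, reached directly rather than via the web machine
    if path.startswith("/ws"):
        return WS_ORIGIN   # raw WebSocket traffic goes to the realtime app
    return WEB_ORIGIN      # everything else is the Next.js UI


print(origin_for("/api/v1/cases"))
print(origin_for("/cases/INC-RT-001"))
```

Keeping the mapping in one place makes the CORS/CSP allowlist easy to derive from the same three constants.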

Goal

A visitor clicks the README's "Live Demo" button and sees an AiSOC agent mid-investigation in under 60 seconds, with the full agent decision ledger streaming live — no signup, no install.

That sub-60s time-to-first-investigation (TTFI) is the headline number this stack is engineered for.

Architecture

    tryaisoc.com         api.tryaisoc.com      ws.tryaisoc.com
        │                       │                     │
        ▼                       ▼                     ▼
  ┌──────────────┐     ┌──────────────────┐   ┌──────────────────────┐
  │ aisoc-demo-  │     │ aisoc-demo-api   │   │ aisoc-demo-realtime  │
  │ web (Next.js)│     │ (FastAPI)        │   │ (WebSocket)          │
  │ shared-cpu-1x│     │ shared-cpu-1x    │   │ shared-cpu-1x · 0.5GB│
  │ 1GB · min=1  │     │ 1GB · min=1      │   │ auto_stop=off (WS)   │
  └──────────────┘     └──────────────────┘   └──────────────────────┘
        │                       │                     │
        └───── 6PN internal ────┴─────────────────────┘
                                │
                                ▼
              ┌──────────────────────────────────┐
              │  aisoc-demo-agents (LangGraph)   │
              │  shared-cpu-2x · 2GB · min=1     │
              │  AISOC_AGENT_MODE=deterministic  │
              └──────────────────────────────────┘
                               │
                ┌──────────────┴──────────────┐
                ▼                             ▼
   ┌────────────────────────┐   ┌──────────────────────┐
   │ Fly Postgres            │   │ Upstash Redis        │
   │ aisoc-demo-postgres     │   │ aisoc-demo-redis     │
   │ dev plan, 3GB volume    │   │ Free plan            │
   └────────────────────────┘   └──────────────────────┘

   ┌──────────────────────────────────────────────┐
   │  aisoc-demo-seed-cron (scheduled machine)    │  no public traffic
   │  Lives on the aisoc-demo-api app, runs       │
   │  daily at 00:00 UTC using the api image:     │
   │   1. python -m app.scripts.seed_demo         │
   │   2. seeder is idempotent — refreshes        │
   │      INC-RT-001 + 14 other canonical cases   │
   │   3. visitors get a hot demo at all times    │
   └──────────────────────────────────────────────┘

What's intentionally not here, to keep the demo lean:

| Component | Status | Why |
| --- | --- | --- |
| Kafka | disabled | Realtime uses Redis pub/sub on the demo path |
| ClickHouse | disabled | No analytics queries in the demo flow |
| OpenSearch | disabled | Detection rules ship with synthetic match payloads |
| Neo4j | disabled | Attack graph isn't on the canonical demo path |
| Qdrant | disabled | KB lookup uses the in-image Postgres + pg_trgm path |

These get re-enabled the moment a self-hoster wants the full stack — see the root docker-compose.yml and services/*/Dockerfile.

Files

infra/fly/
├── README.md           — this file
├── fly-demo-deploy.sh  — orchestrator: provisions DB, deploys 4 apps, runs seed
├── api/fly.toml        — FastAPI core API (also hosts the seeder process)
├── agents/fly.toml     — LangGraph orchestrator + investigator agents
├── web/fly.toml        — Next.js console (public)
└── realtime/fly.toml   — WebSocket fanout

The seeder is not a separate app. It ships inside the api image as python -m app.scripts.seed_demo, which lets us run it four ways without maintaining a fifth Dockerfile or Fly app:

| When | How |
| --- | --- |
| Every deploy | [deploy].release_command in infra/fly/api/fly.toml runs alembic upgrade head && python -m app.scripts.seed_demo on every flyctl deploy. Idempotent: a no-op against an already-seeded volume. |
| Post-deploy (bootstrap) | flyctl ssh console -a aisoc-demo-api -C "python -m app.scripts.seed_demo" runs once on a live api machine. Belt-and-suspenders for first-time deploys. |
| Daily refresh (00:00 UTC) | A scheduled Fly machine on the aisoc-demo-api app, named aisoc-demo-seed-cron, boots from the same api image, runs the same command, and exits. |
| Local recovery | python -m app.scripts.seed_demo inside the api container of a docker-compose -f docker-compose.demo.yml stack; same module, same idempotency. |

The canonical implementation lives in services/api/app/scripts/seed_demo.py. The seeder mints 15 incidents in total: 14 canonical cases (ransomware/phishing/credential-access/lateral/exfil/cloud) plus the in-flight INC-RT-001 LockBit 3.0 investigation that the onboarding deeplink targets.
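
The idempotency claim is what makes it safe to run the same command on every deploy, over SSH, and on a schedule. A minimal sketch of the idea, assuming an upsert keyed on the case ID, looks like this. It uses an in-memory SQLite database so it runs anywhere, whereas the real seeder targets Postgres. The table layout, the `seed` function, and the second case ID are hypothetical and are not taken from seed_demo.py.

```python
import sqlite3

# Hypothetical seed rows; only INC-RT-001 is a real ID from this document.
SEED_CASES = [
    ("INC-RT-001", "LockBit 3.0 investigation", "RUNNING"),
    ("INC-0002", "Phishing campaign", "COMPLETED"),
]


def seed(conn: sqlite3.Connection) -> int:
    """Upsert the seed cases and return the resulting row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "id TEXT PRIMARY KEY, title TEXT NOT NULL, status TEXT NOT NULL)"
    )
    # ON CONFLICT turns a repeat insert into a refresh of the existing row,
    # so running the seeder twice never duplicates a case.
    conn.executemany(
        "INSERT INTO cases (id, title, status) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET "
        "title = excluded.title, status = excluded.status",
        SEED_CASES,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM cases").fetchone()[0]


conn = sqlite3.connect(":memory:")
first = seed(conn)
second = seed(conn)  # second run refreshes rows instead of adding new ones
print(first, second)
```

The same upsert shape also covers the daily refresh: a case that a visitor changed is reset to its seeded state the next time the seeder runs.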

The seed flow is the secret sauce for the TTFI budget:

On every deploy  ┌────────────────────────────────────────────────────┐
                 │ 1. flyctl deploy ships api/agents/realtime/web     │
                 │ 2. release_command runs alembic + seed_demo        │
                 │ 3. Postgres now contains INC-RT-001 + 14 others    │
                 │ 4. Visitors land at /cases/INC-RT-001?tab=ledger   │
                 │    with the agent already mid-stream.              │
                 └────────────────────────────────────────────────────┘

00:00 UTC daily  ┌────────────────────────────────────────────────────┐
                 │ 1. scheduled machine boots from api image          │
                 │ 2. runs `python -m app.scripts.seed_demo`          │
                 │ 3. Refreshes the showcase case for the next 24h    │
                 │    of visitors. All writes happen under the demo   │
                 │    tenant's RLS scope.                             │
                 └────────────────────────────────────────────────────┘

T+anytime        ┌────────────────────────────────────────────────────┐
                 │ Visitor lands at /cases/INC-RT-001?tab=ledger      │
                 │   - case is already CREATED                        │
                 │   - investigation_run is RUNNING or COMPLETED      │
                 │   - ledger has 20-50 events ready to stream        │
                 │   - playbook DAG mid-execution                     │
                 │ Time-to-first-investigation: 0s (already running). │
                 └────────────────────────────────────────────────────┘

First-time setup

# 1. Install flyctl + auth
brew install flyctl
flyctl auth login

# 2. Pick the org. The deploy script defaults to `personal` (each Fly user's
#    default org). Override with FLY_ORG=… if you're deploying under a team org.
export FLY_ORG=personal

# 3. Reserve app names (one-time, idempotent)
for app in aisoc-demo-api aisoc-demo-agents aisoc-demo-web \
           aisoc-demo-realtime; do
  flyctl apps create "$app" --org "$FLY_ORG" 2>/dev/null || true
done

# 4. Provision Postgres + Upstash + deploy everything + request TLS certs
./infra/fly/fly-demo-deploy.sh --provision

# 5. Add DNS at your provider (the deploy script prints the exact records):
#    tryaisoc.com.       CNAME  aisoc-demo-web.fly.dev.
#    api.tryaisoc.com.   CNAME  aisoc-demo-api.fly.dev.
#    ws.tryaisoc.com.    CNAME  aisoc-demo-realtime.fly.dev.
#
#    tryaisoc.com is an apex/root record. If your DNS provider doesn't support
#    CNAME at apex, use ALIAS/ANAME, or run
#      flyctl certs show tryaisoc.com --app aisoc-demo-web
#    to get the A/AAAA records to use instead.

Routine deploy

# Push your branch, then:
./infra/fly/fly-demo-deploy.sh

Re-running the script is idempotent: the calls that provision Postgres, provision Redis, or add certificates fail soft when the resource already exists.
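
One way to get fail-soft behavior without hiding real errors is to treat only "already exists" failures as success. The sketch below is a hypothetical Python equivalent, not the logic of fly-demo-deploy.sh. The marker strings are assumptions about the CLI's error text and should be checked against what your flyctl version actually prints.

```python
import subprocess
from typing import Callable, Sequence

# Assumed substrings of an "already exists" error; verify against real output.
ALREADY_EXISTS_MARKERS = ("already exists", "already been taken")


def run_cli(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and capture stdout/stderr as text."""
    return subprocess.run(list(cmd), capture_output=True, text=True)


def ensure(cmd: Sequence[str], run: Callable = run_cli) -> str:
    """Return 'created' or 'exists'; raise on any other failure."""
    result = run(cmd)
    if result.returncode == 0:
        return "created"
    message = f"{result.stdout or ''}{result.stderr or ''}".lower()
    if any(marker in message for marker in ALREADY_EXISTS_MARKERS):
        return "exists"  # fail soft: the resource is already provisioned
    raise RuntimeError(f"{' '.join(cmd)} failed: {message.strip()}")
```

For example, `ensure(["flyctl", "apps", "create", "aisoc-demo-api", "--org", "personal"])` would return "exists" on a re-run if the error text matches one of the markers, while an authentication or network failure would still raise.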

Demo mode at runtime

The AISOC_DEMO_MODE flag is set in every Fly app's [env] block and drives two pieces of behavior:

  1. API middleware (services/api/app/middleware/demo_mode.py): returns 403 for non-allowlisted writes (POST/PUT/PATCH/DELETE) and stamps X-AiSOC-Demo: true plus X-AiSOC-Demo-Banner headers on every response. Allowlisted writes: auth flows, /cases/INC-RT-001/investigate, alert ack.

  2. Web banner (apps/web/src/components/demo/DemoBanner.tsx): renders a fixed amber strip at the top of every authenticated page with the daily-reset notice and a "Self-host AiSOC →" link.

Both layers read environment variables surfaced through the fly.toml [env] blocks, so switching any self-hosted AiSOC deployment into demo mode (e.g., for a customer presentation) is a one-flag operation.
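
The middleware's decision can be sketched as a pure function, independent of any web framework. This is an illustration under stated assumptions, not the contents of demo_mode.py: the route patterns for auth and alert ack are guesses (the list above names the flows but not their exact paths), and the banner text is a placeholder.

```python
import re

WRITE_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})

# Assumed route shapes: the text above names the allowlisted flows,
# not their exact paths.
ALLOWLISTED_WRITES = (
    re.compile(r"^/api/v1/auth/"),
    re.compile(r"^/api/v1/cases/INC-RT-001/investigate$"),
    re.compile(r"^/api/v1/alerts/[^/]+/ack$"),
)


def is_blocked(method: str, path: str, demo_mode: bool = True) -> bool:
    """True when demo mode should answer this request with a 403."""
    if not demo_mode or method.upper() not in WRITE_METHODS:
        return False  # reads always pass; so does everything outside demo mode
    return not any(pattern.match(path) for pattern in ALLOWLISTED_WRITES)


def demo_headers(banner: str = "Demo data resets daily") -> dict:
    """Headers stamped on every response in demo mode (banner text is a placeholder)."""
    return {"X-AiSOC-Demo": "true", "X-AiSOC-Demo-Banner": banner}


print(is_blocked("POST", "/api/v1/cases"))
print(is_blocked("POST", "/api/v1/cases/INC-RT-001/investigate"))
```

Keeping the check as a pure function makes the allowlist easy to unit-test before wiring it into the request pipeline.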

Smoke checks

# API liveness
curl -sf https://aisoc-demo-api.fly.dev/health

# Demo headers visible
curl -sI https://aisoc-demo-api.fly.dev/api/v1/cases | grep -i x-aisoc

# Mutating writes blocked
curl -si -X POST https://aisoc-demo-api.fly.dev/api/v1/cases | head -3
# expect: HTTP/2 403 …

# Public domain (after DNS propagates)
curl -sf https://api.tryaisoc.com/health

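# Visitor flow (macOS; use xdg-open on Linux). Quote the URL so the shell
# does not treat "?" as a glob character.
open "https://tryaisoc.com/cases/INC-RT-001?tab=ledger"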
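
The first three checks can also be scripted so they run after every deploy. The sketch below mirrors the curl commands above using only the standard library. The function names are hypothetical, and the fetcher is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request

BASE = "https://aisoc-demo-api.fly.dev"  # same host as the curl checks above


def fetch(url: str, method: str = "GET"):
    """Return (status, headers) for a request, including 4xx/5xx responses."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        return err.code, dict(err.headers.items())


def has_demo_headers(headers: dict) -> bool:
    """True when an X-AiSOC-Demo header is present (case-insensitive)."""
    return "x-aisoc-demo" in {name.lower() for name in headers}


def run_checks(fetcher=fetch) -> dict:
    """Run the liveness, demo-header, and blocked-write checks."""
    health_status, _ = fetcher(f"{BASE}/health")
    _, headers = fetcher(f"{BASE}/api/v1/cases", "HEAD")
    write_status, _ = fetcher(f"{BASE}/api/v1/cases", "POST")
    return {
        "api_live": health_status == 200,
        "demo_headers": has_demo_headers(headers),
        "writes_blocked": write_status == 403,
    }
```

Calling `run_checks()` with no arguments hits the live demo; passing a stub fetcher keeps the checks testable offline.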

Troubleshooting

| Symptom | Likely cause / fix |
| --- | --- |
| flyctl deploy hangs on builder | Nuke remote builder: flyctl builders destroy |
| API 503 on first hit | Cold start; check that min_machines_running = 1 is set in the api fly.toml |
| Web shows "demo data resets" but writes work | AISOC_DEMO_MODE is not set on the api app; set it and redeploy api |
| INC-RT-001 case missing | Re-run seed: flyctl ssh console -a aisoc-demo-api -C "python -m app.scripts.seed_demo" (idempotent) |
| Daily seed cron not firing | Verify the scheduled machine: flyctl machine list -a aisoc-demo-api (look for aisoc-demo-seed-cron) |
| WS disconnects in 30s | Realtime needs auto_stop_machines = "off"; verify realtime/fly.toml |
| Cert pending | flyctl certs show tryaisoc.com --app aisoc-demo-web (and the same for the api. and ws. subdomains on their apps) |

Cost envelope

Target: <$30/mo for the running demo so it's sustainable on a single maintainer's budget.

| Resource | Monthly cost (est.) |
| --- | --- |
| 3 × shared-cpu-1x machines (api, web, rt) | ~$6 (with auto_stop=stop) |
| 1 × shared-cpu-2x agents | ~$5 |
| 1 × scheduled seed machine (~1min/day) | <$0.10 |
| Fly Postgres (dev, 3GB) | ~$2 |
| Upstash Redis (Free) | $0 |
| Outbound bandwidth (~50GB) | ~$1 |
| Total | ~$14/mo |
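
The total can be checked by summing the line items. The figures below are the estimates from the table, with the seed machine counted at its $0.10 upper bound; the variable names are illustrative.

```python
# Estimated monthly cost per line item, in USD, copied from the table above.
LINE_ITEMS = {
    "3 x shared-cpu-1x (api, web, rt)": 6.00,
    "1 x shared-cpu-2x agents": 5.00,
    "scheduled seed machine (upper bound)": 0.10,
    "Fly Postgres (dev, 3GB)": 2.00,
    "Upstash Redis (Free)": 0.00,
    "Outbound bandwidth (~50GB)": 1.00,
}
BUDGET = 30.00  # the <$30/mo target

total = sum(LINE_ITEMS.values())
headroom = BUDGET - total
print(f"total = ${total:.2f}/mo, headroom = ${headroom:.2f}/mo")
```

The estimate comes to about $14.10, a little under half the target, which leaves room for a larger Postgres volume or extra bandwidth.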

If demo traffic exceeds 50GB/mo we'll cache the seed snapshot on Cloudflare R2.