Simulations by sandboxws · Pull Request #2 · sandboxws/flink-reactor-console

sandboxws · 2026-03-21T16:08:15Z

No description provided.

- Bump flink-reactor to 0.1.8-rc.1 with schema introspection fixes
- Add pnpm store prune to refresh-dsl.sh to clear stale integrity checksums

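Flink's `timeSinceLastHeartbeat` field is an epoch timestamp, not a duration. The dashboard was computing `Date.now() - timestamp`, which produced a near-zero value (the few milliseconds since the last heartbeat); rendered as a date, that value showed up as 1970. Use the value directly: `new Date(value)`.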
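
A minimal TypeScript sketch of the bug and the fix. It assumes, as the commit states, that `timeSinceLastHeartbeat` carries epoch milliseconds; the `TaskManagerInfo` type and both function names are hypothetical, not the dashboard's actual code.

```typescript
// Hypothetical slice of a task manager entry; per the commit message,
// timeSinceLastHeartbeat holds epoch milliseconds, not an elapsed duration.
interface TaskManagerInfo {
  id: string;
  timeSinceLastHeartbeat: number;
}

// Before: treats the timestamp as "ms ago". Date.now() - timestamp is only the
// few milliseconds since the heartbeat, so the resulting Date lands in 1970.
function lastHeartbeatBuggy(tm: TaskManagerInfo): Date {
  return new Date(Date.now() - tm.timeSinceLastHeartbeat);
}

// After: the field already is the heartbeat instant, so use it directly.
function lastHeartbeat(tm: TaskManagerInfo): Date {
  return new Date(tm.timeSinceLastHeartbeat);
}

const tm: TaskManagerInfo = { id: "tm-1", timeSinceLastHeartbeat: Date.now() - 2000 };
console.log(lastHeartbeatBuggy(tm).getUTCFullYear()); // 1970
console.log(lastHeartbeat(tm).toISOString()); // roughly two seconds ago
```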

…penSpec skills
- EXPLAIN statement support: GraphQL mutation, resolver, dashboard sandbox integration with explain tab in synthesis output
- Plan analyzer: parser (JSON + text), 7 analyzers (bottleneck, changelog, join, skew, state, watermark, window), DAG visualization components, 31 test fixtures, Zustand store
- Sandbox UI: streamlined editor toolbar, removed redundant output header, added explain button
- Sandbox bug fix report: structured analysis of 14 DSL codegen bugs
- OpenSpec skills and prompts for Claude Code
- IDE project files

Replace flat column list in catalog browser with a proper table featuring search, sortable columns, and pagination. Style JM config tag filter pills with their corresponding tag background colors.
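
The searchable, sortable, paginated columns table reduces to a filter, a sort, and a slice. A minimal TypeScript sketch, assuming each row has just a `name` and a `type`; `ColumnRow` and `queryColumns` are hypothetical names, not the console's code.

```typescript
// Hypothetical row shape for a table column in the catalog browser.
interface ColumnRow {
  name: string;
  type: string;
}

interface TableQuery {
  search: string;
  sortKey: keyof ColumnRow;
  sortDir: "asc" | "desc";
  page: number; // zero-based page index
  pageSize: number;
}

// Case-insensitive substring search on name or type, then sort, then slice one page.
// Returns the page plus the filtered total so the UI can render page controls.
function queryColumns(rows: readonly ColumnRow[], q: TableQuery): { page: ColumnRow[]; total: number } {
  const needle = q.search.trim().toLowerCase();
  const filtered = rows.filter(
    (r) => needle === "" || r.name.toLowerCase().includes(needle) || r.type.toLowerCase().includes(needle),
  );
  const sign = q.sortDir === "asc" ? 1 : -1;
  // filter() already returned a new array, so sorting it in place leaves `rows` untouched.
  filtered.sort((a, b) => sign * a[q.sortKey].localeCompare(b[q.sortKey]));
  const start = q.page * q.pageSize;
  return { page: filtered.slice(start, start + q.pageSize), total: filtered.length };
}

const rows: ColumnRow[] = [
  { name: "order_id", type: "BIGINT" },
  { name: "customer_id", type: "BIGINT" },
  { name: "amount", type: "DECIMAL(10, 2)" },
  { name: "order_time", type: "TIMESTAMP(3)" },
];
const result = queryColumns(rows, { search: "id", sortKey: "name", sortDir: "asc", page: 0, pageSize: 1 });
console.log(result.total, result.page.map((r) => r.name).join(", ")); // 2 customer_id
```

Returning the filtered total alongside the page keeps pagination controls correct while a search is active.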

Update all references after the DSL package rename:
- dashboard dependency and dynamic imports
- completions generator node_modules path
- refresh-dsl.sh cache paths (using pnpm @scope+name encoding)
- release.yml: remove `if: false`, drop DSL-repo packages (create-app, ts-plugin), keep only UI + dashboard builds

…visibility
Server:
- Add tap_manifests table (migration 007) with pipeline_name PK and JSONB manifest
- Add TapManifestStore with Upsert/GetByPipeline/List/Delete operations
- Replace filesystem tap.Loader with DB-backed tap.Store
- Add POST /api/tap-manifests and DELETE /api/tap-manifests/:pipeline endpoints
- Wire TapStore in main.go when storage is enabled
- Fix detail_snapshot not being captured for jobs first seen in a terminal state

Dashboard:
- Fix tap manifest fetch to use the backend base URL (it was hitting the Vite dev server)
- Show Tap tab only for running jobs with an existing manifest
- Shorten tap job prefix from flink-reactor-tap- to fr-tap-
- Update @flink-reactor/dsl to 0.1.8-rc.3
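
The upsert semantics of a store keyed by `pipeline_name` can be sketched in memory. This TypeScript stand-in only mirrors the operation names from the commit (Upsert/GetByPipeline/List/Delete); the real store is Go code backed by PostgreSQL, where an upsert on a primary key is typically written as `INSERT ... ON CONFLICT (pipeline_name) DO UPDATE`. The `TapManifest` shape and the class name are assumptions.

```typescript
// Assumed manifest shape; the server stores it as a JSONB column.
type TapManifest = Record<string, unknown>;

interface TapManifestStore {
  upsert(pipelineName: string, manifest: TapManifest): void;
  getByPipeline(pipelineName: string): TapManifest | undefined;
  list(): string[];
  delete(pipelineName: string): boolean;
}

// Hypothetical in-memory stand-in: the Map key plays the role of the
// pipeline_name primary key, so a second upsert for the same pipeline
// replaces the existing entry instead of adding a new one.
class InMemoryTapManifestStore implements TapManifestStore {
  private readonly rows = new Map<string, TapManifest>();

  upsert(pipelineName: string, manifest: TapManifest): void {
    this.rows.set(pipelineName, manifest);
  }

  getByPipeline(pipelineName: string): TapManifest | undefined {
    return this.rows.get(pipelineName);
  }

  // Pipeline names in sorted order.
  list(): string[] {
    return Array.from(this.rows.keys()).sort();
  }

  // Returns true if a manifest was removed.
  delete(pipelineName: string): boolean {
    return this.rows.delete(pipelineName);
  }
}
```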

- Add IsNotFound() helper and FlexFloat64.MarshalJSON() for safe JSON roundtrip
- Add GetJobByID() and UpsertJobSnapshot() for DB-backed job detail fallback
- Add job_db_fallback.go with two-tier recovery (JSONB snapshot → normalized tables)
- Add 006_job_detail_snapshot migration for detail_snapshot JSONB column
- Extract mapJobDetailAggregate() shared mapper for live and DB paths
- Add connector/ package with detector, vertex name patterns, and manifest parsing
- Add sources_sinks.graphqls schema extending JobDetail with sourcesAndSinks field
- Add Sources & Sinks tab to job detail with connector type cards and I/O metrics
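
The two-tier recovery can be sketched as a chain of lookups that only kicks in when the live Flink API reports the job as not found. This TypeScript sketch is illustrative only: the server implements it in Go (`GetJobByID`, `job_db_fallback.go`), and every name below, including `NotFoundError` and the loader signatures, is hypothetical.

```typescript
interface JobDetail {
  jid: string;
  state: string;
  source: "live" | "snapshot" | "tables";
}

type Lookup = (jobId: string) => JobDetail | undefined;

// Hypothetical not-found error; isNotFound() checks the name rather than relying
// on the subclass prototype, so it also works when compiled down to ES5.
class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotFoundError";
  }
}

function isNotFound(err: unknown): boolean {
  return err instanceof Error && err.name === "NotFoundError";
}

// Live API first; on not-found, try the JSONB snapshot, then the normalized tables.
// Any other live error (timeouts, 5xx, ...) is rethrown rather than masked by stored data.
function getJobDetail(jobId: string, live: (id: string) => JobDetail, snapshot: Lookup, tables: Lookup): JobDetail {
  try {
    return live(jobId);
  } catch (err) {
    if (!isNotFound(err)) throw err;
  }
  const fromDb = snapshot(jobId) ?? tables(jobId);
  if (fromDb === undefined) {
    throw new NotFoundError(`job ${jobId} not found live or in storage`);
  }
  return fromDb;
}
```

Keeping the fallback gated on a not-found check means a transient outage of the Flink REST API surfaces as an error instead of quietly serving stale stored data.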

sim-infra-01: K8s manifests for minikube simulation stack
- SeaweedFS (S3-compatible checkpoint storage), Kafka KRaft, PostgreSQL, SQL Gateway, reactor-server with ConfigMap
- Custom Flink image Dockerfile with S3 plugin
- README with quick start guide

sim-console-01: Job lifecycle mutations (savepoint, stop, rescale)
- Go: TriggerSavepoint, StopWithSavepoint, RescaleJob service methods
- GraphQL: triggerSavepoint, stopJobWithSavepoint, rescaleJob mutations
- Dashboard: wire savepoint button to real API, add Stop button, add Stop All Jobs to cluster overview page

sim-console-02: Chaos engineering simulation system
- New simulation package: engine orchestrator, 11 scenario presets (resource stress, checkpoint, load, failure), PostgreSQL store
- DB migration 008: simulation_runs + simulation_observations tables
- GraphQL schema: SimulationRun, SimulationPreset, SimulationObservation types with queries (list/get runs, presets) and mutations (run/stop)
- Resolvers map between domain and GraphQL model types
- Engine wired into main.go (conditional on storage enabled)
- One simulation at a time (mutex-guarded), background goroutine execution
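
The one-at-a-time rule can be sketched as an engine that holds a single active-run slot and rejects a second start until the background work settles. This is a hypothetical TypeScript analogue of the Go engine with made-up names: in Go the slot needs the mutex the commit mentions because goroutines run concurrently, whereas in single-threaded JavaScript a plain field check suffices as long as there is no `await` between the check and the assignment.

```typescript
interface SimulationRun {
  id: number;
  preset: string;
  status: "running" | "completed" | "failed";
}

// Hypothetical analogue of the Go engine: one active slot, background execution.
class SimulationEngine {
  private active: SimulationRun | null = null;
  private nextId = 1;

  // Starts a run in the background, or throws if another run is still active.
  start(preset: string, work: () => Promise<void>): SimulationRun {
    if (this.active !== null) {
      throw new Error(`simulation ${this.active.id} is already running`);
    }
    const run: SimulationRun = { id: this.nextId++, preset, status: "running" };
    this.active = run;
    const finish = (status: "completed" | "failed") => {
      run.status = status;
      this.active = null; // free the slot for the next run
    };
    // Not awaited, mirroring the background goroutine; wrapping work() in a
    // Promise executor also turns a synchronous throw into a "failed" run.
    new Promise<void>((resolve) => resolve(work())).then(
      () => finish("completed"),
      () => finish("failed"),
    );
    return run;
  }

  get activeRun(): SimulationRun | null {
    return this.active;
  }
}
```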

sim-console-03: Simulation dashboard UI
- Admin sidebar group with Simulations and Benchmarks items
- Simulation store (Zustand) with presets, runs, active polling
- Preset grid organized by category (resource/checkpoint/load/failure)
- Inline parameter configuration per preset card
- Active simulation panel with live observation polling (3s)
- History table with status badges and run detail links
- Run detail page with observation timeline

sim-console-04: Benchmark collection page
- Run selector table with multi-select checkboxes (max 5)
- Comparison cards showing metric averages per run
- Empty state with link to Simulations page
- GraphQL client functions for simulation queries/mutations
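
The run selector's cap of 5 can be enforced with a small pure helper, so the checkbox handler never has to special-case the sixth click. A hypothetical TypeScript sketch; the function names and the choice to ignore (rather than replace) selections past the cap are assumptions.

```typescript
const MAX_SELECTED_RUNS = 5;

// Hypothetical helper. Returns a new selection: clicking a selected run removes it;
// clicking an unselected run adds it unless the cap is reached, in which case nothing changes.
function toggleRun(selected: readonly string[], runId: string, max = MAX_SELECTED_RUNS): string[] {
  if (selected.includes(runId)) {
    return selected.filter((id) => id !== runId);
  }
  if (selected.length >= max) {
    return selected.slice();
  }
  return selected.concat(runId);
}

// Unchecked boxes can be disabled once the cap is reached.
function isCheckboxDisabled(selected: readonly string[], runId: string, max = MAX_SELECTED_RUNS): boolean {
  return selected.length >= max && !selected.includes(runId);
}
```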

- Guard against undefined observations in timeline component
- Include observations in runSimulation mutation response
- Add error banner to simulations page (shows API errors like missing storage or infrastructure)

Clicking Run now opens a modal that checks:
- Flink cluster reachable (required)
- PostgreSQL storage connected (required)
- Running jobs available (optional, with deploy instructions)
- No other simulation active (required)

Each check shows a pass/fail/warn status with fix instructions. The Launch button is enabled only when all required checks pass, and a Re-check button retries the checks. This applies to both the grid and list views.
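
The launch gate reduces to "every required check passes"; optional checks such as running jobs may warn without blocking. A minimal TypeScript sketch, assuming the check shape below (field and function names are hypothetical, not the console's GraphQL types).

```typescript
type CheckStatus = "pass" | "fail" | "warn";

// Assumed check shape for illustration.
interface PreflightCheck {
  name: string;
  required: boolean;
  status: CheckStatus;
  fix?: string; // instructions shown when the check does not pass
}

// Required checks that are not passing; these block the Launch button.
function blockingChecks(checks: readonly PreflightCheck[]): PreflightCheck[] {
  return checks.filter((c) => c.required && c.status !== "pass");
}

function canLaunch(checks: readonly PreflightCheck[]): boolean {
  return blockingChecks(checks).length === 0;
}

const checks: PreflightCheck[] = [
  { name: "Flink cluster reachable", required: true, status: "pass" },
  { name: "PostgreSQL storage connected", required: true, status: "pass" },
  { name: "Running jobs available", required: false, status: "warn", fix: "Deploy a job first" },
  { name: "No other simulation active", required: true, status: "pass" },
];
console.log(canLaunch(checks)); // true: only an optional check is warning
```

Exposing `blockingChecks` separately lets the modal list exactly which fix instructions stand between the user and the Launch button.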

- Each run row has a View link that navigates to the full run detail
- Checkboxes allow multi-select; a Compare button appears when 2+ runs are selected
- Compare fetches full observation data and shows a side-by-side metric table
- Metrics show avg, min–max range, and sample count per run
- A Clear button dismisses the comparison
- Help text explains the View vs Compare workflow
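
Each cell of the comparison table (avg, min–max range, sample count) is a single pass over one metric's observations for one run. A hedged TypeScript sketch; `MetricSummary`, `summarize`, and `canCompare` are hypothetical names, and treating a run with no samples as "no data" rather than zero is an assumption.

```typescript
interface MetricSummary {
  avg: number;
  min: number;
  max: number;
  count: number;
}

// Hypothetical helper: one pass over a run's samples for a single metric.
// Returns undefined when the run has no samples, so the UI can render "n/a".
function summarize(values: readonly number[]): MetricSummary | undefined {
  if (values.length === 0) return undefined;
  let sum = 0;
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    sum += v;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return { avg: sum / values.length, min, max, count: values.length };
}

// Compare is offered only once at least two runs are selected.
const canCompare = (selectedRuns: number): boolean => selectedRuns >= 2;

const s = summarize([120, 80, 100]);
if (s) console.log(`${s.avg} (${s.min}–${s.max}), n=${s.count}`); // 100 (80–120), n=3
```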

Preflight now validates the full infrastructure chain:
- Kubernetes cluster reachable (kubectl cluster-info)
- flink-demo namespace exists
- Flink Operator running in flink-system
- Kafka, PostgreSQL, SeaweedFS pods running in flink-demo
- PostgreSQL storage connected (server config)
- Flink cluster reachable (REST API)
- FlinkDeployments exist (optional)
- Kafka instrument healthy (optional)
- No active simulation running

Checks run server-side via the new simulationPreflight GraphQL query. Docker-only clusters correctly show failures for the K8s checks.

…arallel
- Check kubectl existence upfront via exec.LookPath; if it is not found, all K8s checks instantly return "fail" with install instructions
- Run all 11 checks in parallel via goroutines (was sequential)
- Reduce kubectl timeout from 5s to 3s per check
- Eliminates the 35s+ hang when kubectl is missing or can't connect
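
The speed-up comes from running checks concurrently, each under its own deadline, so total latency is bounded by the slowest check (at most the timeout) rather than the sum of all of them. The server does this in Go with goroutines and `exec.LookPath`; the TypeScript sketch below shows the same pattern with `Promise.all` and a timer. The check shape and `runPreflight` are hypothetical. One difference: a JavaScript timeout only stops waiting and does not cancel the underlying work, whereas in Go a deadline from `context.WithTimeout` passed to `exec.CommandContext` also kills the child process.

```typescript
type Status = "pass" | "fail";

interface CheckResult {
  name: string;
  status: Status;
  detail?: string;
}

// Hypothetical check spec: run() resolves on success and rejects on failure.
interface CheckSpec {
  name: string;
  needsKubectl: boolean;
  run: () => Promise<void>;
}

// Rejects if the promise does not settle within ms; clears the timer either way.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// All checks start at once; kubectl-dependent checks fail fast when kubectl is
// absent, and every other check gets its own deadline.
function runPreflight(checks: readonly CheckSpec[], kubectlFound: boolean, timeoutMs = 3000): Promise<CheckResult[]> {
  return Promise.all(
    checks.map(async (c): Promise<CheckResult> => {
      if (c.needsKubectl && !kubectlFound) {
        return { name: c.name, status: "fail", detail: "kubectl not found on PATH; install it first" };
      }
      try {
        await withTimeout(c.run(), timeoutMs);
        return { name: c.name, status: "pass" };
      } catch (err) {
        return { name: c.name, status: "fail", detail: String(err) };
      }
    }),
  );
}
```

`Promise.all` preserves input order, so results line up with the check list regardless of which check finishes first.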

Checks now short-circuit: if a required check fails, downstream checks are not shown. The dependency chain is:

kubectl → K8s cluster → namespace → [pods] → FlinkDeployments

Pod checks verify actual K8s pods by label selector and phase, tied to specific services (Kafka:9092, PostgreSQL:5432, SeaweedFS:8333, SQL Gateway:8083, reactor-server:8080, Operator). This removes false-positive green checks for local Docker services that aren't part of the minikube infrastructure.
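
The short-circuit display can be sketched as walking the dependency chain in order and stopping after the first required check that fails. This hypothetical TypeScript helper treats the chain as linear (the `[pods]` group flattened into individual steps), which simplifies the real check graph.

```typescript
type StepStatus = "pass" | "fail" | "warn";

interface ChainStep {
  name: string;
  required: boolean;
  status: StepStatus;
}

// Hypothetical helper. Returns the steps to display: everything up to and including
// the first failed required step. Later steps depend on it, so showing them adds only noise.
function visibleSteps(chain: readonly ChainStep[]): ChainStep[] {
  const shown: ChainStep[] = [];
  for (const step of chain) {
    shown.push(step);
    if (step.required && step.status === "fail") break;
  }
  return shown;
}

const chain: ChainStep[] = [
  { name: "kubectl", required: true, status: "pass" },
  { name: "K8s cluster", required: true, status: "pass" },
  { name: "namespace flink-demo", required: true, status: "fail" },
  { name: "Kafka pod", required: true, status: "fail" },
  { name: "FlinkDeployments", required: false, status: "warn" },
];
console.log(visibleSteps(chain).map((s) => s.name).join(", ")); // kubectl, K8s cluster, namespace flink-demo
```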

Iceberg REST deployment manifest for minikube and an optional preflight check that warns (not fails) when the catalog is missing.

Infrastructure manifests belong with the CLI that manages them, not the console. Preflight fix hints now reference `flink-reactor sim up` instead of raw manifest paths.

sandboxws added 24 commits March 15, 2026 21:41

Update flink-reactor to 0.1.8-rc.1 and fix refresh script

69862ac

Fix task manager heartbeat showing 1970 (epoch zero)

cc5473f

Add searchable columns table to catalog tree and color tag filter pills

c48dc2c

Add multi-platform build instructions to Flink S3 Dockerfile

01aa312

Align simulation preset card actions to the left

4d1849a

Use flat grid layout for simulation presets to avoid uneven rows

025e86d

Add list view toggle to simulations page

747c96e

Add Actions column with Configure and Run to simulation list view

dfd1ad5

Fix simulation crash on undefined observations, add error banner

163a394

Show actual error message when preflight query fails instead of fake …

5a619b0

…server-unreachable

Add optional Iceberg REST catalog to simulation stack

c292771

Move minikube manifests to flink-reactor-dsl sim command

8d0346d

sandboxws self-assigned this Mar 21, 2026

sandboxws closed this Mar 21, 2026

github-actions bot locked and limited conversation to collaborators Mar 21, 2026

Simulations #2
sandboxws wants to merge 24 commits into main from sim

sandboxws commented Mar 21, 2026

1 participant
