Add @flink-reactor/ui package, simulation engine, and EXPLAIN analyzer #1

Merged
sandboxws merged 38 commits into main from sim-ui-package on Mar 21, 2026


@sandboxws (Owner) commented on Mar 21, 2026

Summary

📦 @flink-reactor/ui package extraction (Phases 1–7)

  • Migrate all 19 local UI primitives to @flink-reactor/ui, swap default theme to Gruvpuccin
  • Extract domain types (cluster, logs, deployments, tap, metrics, etc.) and format utilities
  • Add fixture factories with 4 scenario presets (healthy, degraded, failing, empty cluster)
  • Extract 45 domain components across 10 domains as prop-driven, store-free components
  • Add page-level templates across 12 domains with companion .demo.tsx files
  • ✨ Scaffold showcase app (apps/showcase/) with full component coverage and secondary sidebar nav
  • Migrate 27 dashboard files to @flink-reactor/ui imports

🧪 Simulation engine (chaos engineering)

  • Backend: engine orchestrator, 11 scenario presets (resource stress, checkpoint, load, failure), PostgreSQL persistence (migration 008), mutex-guarded single-run execution
  • GraphQL API: SimulationRun, SimulationPreset, SimulationObservation types with queries and mutations
  • Dashboard: preset grid/list views, preflight checklist modal with cascading K8s dependency checks via kubectl, live observation polling, run timeline, benchmarks comparison page
  • Admin sidebar group with Simulations and Benchmarks pages

🔍 EXPLAIN integration & plan analyzer

  • EXPLAIN statement support: GraphQL mutation, resolver, sandbox Explain tab
  • Plan parser (JSON + text formats), 7 analyzers (bottleneck, changelog, join, skew, state, watermark, window), DAG visualization, 31 test fixtures

🚀 Job lifecycle & storage

  • Savepoint trigger, stop-with-savepoint, and rescale mutations (Go → GraphQL → dashboard)
  • Stop All Jobs button on cluster overview
  • 🗃️ Replace filesystem tap loader with DB-backed store (migration 007), REST endpoints for manifest CRUD
  • Job detail DB fallback: two-tier recovery (JSONB snapshot → normalized tables), migration 006
  • Connector detection package, Sources & Sinks tab with I/O metrics

🔧 Other

  • ⬆️ Rename to @flink-reactor/dsl (0.1.8-rc.3), enable release workflow
  • 🐛 Fix task manager heartbeat epoch display
  • Searchable/sortable columns table in catalog browser, colored tag filter pills
  • OpenSpec skill prompts for Claude Code
  • CLA workflow: switch signature branch to clasignatures, fix permissions

Test plan

  • Verify showcase app renders all component demos (pnpm --filter showcase dev)
  • Confirm dashboard builds with @flink-reactor/ui imports (pnpm build)
  • Run plan analyzer tests (plan-analyzer.test.ts)
  • Test EXPLAIN tab in sandbox with a valid Flink SQL statement
  • Validate simulation preflight modal (with and without kubectl)
  • Test savepoint, stop, and rescale mutations against a running cluster
  • Verify tap manifests persist and load from PostgreSQL
  • Confirm Sources & Sinks tab renders connector cards
  • Check catalog columns table search, sort, and pagination

- Bump flink-reactor to 0.1.8-rc.1 with schema introspection fixes
- Add pnpm store prune to refresh-dsl.sh to clear stale integrity checksums
Flink's timeSinceLastHeartbeat field is an epoch timestamp, not a
duration. The dashboard was computing Date.now() - timestamp, which
produced near-zero values. Use the value directly as new Date(value).
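
A minimal TypeScript sketch of the heartbeat fix described above. The function names `heartbeatBefore` and `heartbeatAfter` and the `TaskManagerLike` shape are hypothetical, not the dashboard's actual helpers, and the sketch assumes the epoch value is in milliseconds.

```typescript
// Assumption: timeSinceLastHeartbeat holds an epoch timestamp in milliseconds.
interface TaskManagerLike {
  timeSinceLastHeartbeat: number;
}

// Old behavior as described in the commit: subtracting the timestamp from
// "now" yields a small elapsed value instead of the heartbeat time.
function heartbeatBefore(tm: TaskManagerLike, now: number = Date.now()): number {
  return now - tm.timeSinceLastHeartbeat;
}

// Fixed behavior: the field already is the timestamp, so wrap it directly.
function heartbeatAfter(tm: TaskManagerLike): Date {
  return new Date(tm.timeSinceLastHeartbeat);
}

const now = Date.UTC(2026, 2, 21, 12, 0, 5);
const tm: TaskManagerLike = { timeSinceLastHeartbeat: Date.UTC(2026, 2, 21, 12, 0, 0) };

console.log(heartbeatBefore(tm, now)); // 5000
console.log(heartbeatAfter(tm).toISOString()); // 2026-03-21T12:00:00.000Z
```
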
…penSpec skills

- EXPLAIN statement support: GraphQL mutation, resolver, dashboard
  sandbox integration with explain tab in synthesis output
- Plan analyzer: parser (JSON + text), 7 analyzers (bottleneck,
  changelog, join, skew, state, watermark, window), DAG visualization
  components, 31 test fixtures, Zustand store
- Sandbox UI: streamlined editor toolbar, removed redundant output
  header, added explain button
- Sandbox bug fix report: structured analysis of 14 DSL codegen bugs
- OpenSpec skills and prompts for Claude Code
- IDE project files
Replace flat column list in catalog browser with a proper table featuring
search, sortable columns, and pagination. Style JM config tag filter pills
with their corresponding tag background colors.
Update all references after the DSL package rename:
- dashboard dependency and dynamic imports
- completions generator node_modules path
- refresh-dsl.sh cache paths (using pnpm @scope+name encoding)
- release.yml: remove `if: false`, drop DSL-repo packages (create-app,
  ts-plugin), keep only UI + dashboard builds
…visibility

Server:
- Add tap_manifests table (migration 007) with pipeline_name PK and JSONB manifest
- Add TapManifestStore with Upsert/GetByPipeline/List/Delete operations
- Replace filesystem tap.Loader with DB-backed tap.Store
- Add POST /api/tap-manifests and DELETE /api/tap-manifests/:pipeline endpoints
- Wire TapStore in main.go when storage is enabled
- Fix detail_snapshot not being captured for jobs first seen in a terminal state

Dashboard:
- Fix tap manifest fetch to use the backend base URL (was hitting the Vite dev server)
- Show Tap tab only for running jobs with an existing manifest
- Shorten tap job prefix from flink-reactor-tap- to fr-tap-
- Update @flink-reactor/dsl to 0.1.8-rc.3
- Add IsNotFound() helper and FlexFloat64.MarshalJSON() for safe JSON roundtrip
- Add GetJobByID() and UpsertJobSnapshot() for DB-backed job detail fallback
- Add job_db_fallback.go with two-tier recovery (JSONB snapshot → normalized tables)
- Add 006_job_detail_snapshot migration for detail_snapshot JSONB column
- Extract mapJobDetailAggregate() shared mapper for live and DB paths
- Add connector/ package with detector, vertex name patterns, and manifest parsing
- Add sources_sinks.graphqls schema extending JobDetail with sourcesAndSinks field
- Add Sources & Sinks tab to job detail with connector type cards and I/O metrics
sim-infra-01: K8s manifests for minikube simulation stack
- SeaweedFS (S3-compatible checkpoint storage), Kafka KRaft,
  PostgreSQL, SQL Gateway, reactor-server with ConfigMap
- Custom Flink image Dockerfile with S3 plugin
- README with quick start guide

sim-console-01: Job lifecycle mutations (savepoint, stop, rescale)
- Go: TriggerSavepoint, StopWithSavepoint, RescaleJob service methods
- GraphQL: triggerSavepoint, stopJobWithSavepoint, rescaleJob mutations
- Dashboard: wire savepoint button to real API, add Stop button,
  add Stop All Jobs to cluster overview page
sim-console-02: Chaos engineering simulation system
- New simulation package: engine orchestrator, 11 scenario presets
  (resource stress, checkpoint, load, failure), PostgreSQL store
- DB migration 008: simulation_runs + simulation_observations tables
- GraphQL schema: SimulationRun, SimulationPreset, SimulationObservation
  types with queries (list/get runs, presets) and mutations (run/stop)
- Resolvers map between domain and GraphQL model types
- Engine wired into main.go (conditional on storage enabled)
- One simulation at a time (mutex-guarded), background goroutine execution
sim-console-03: Simulation dashboard UI
- Admin sidebar group with Simulations and Benchmarks items
- Simulation store (Zustand) with presets, runs, active polling
- Preset grid organized by category (resource/checkpoint/load/failure)
- Inline parameter configuration per preset card
- Active simulation panel with live observation polling (3s)
- History table with status badges and run detail links
- Run detail page with observation timeline

sim-console-04: Benchmark collection page
- Run selector table with multi-select checkboxes (max 5)
- Comparison cards showing metric averages per run
- Empty state with link to Simulations page
- GraphQL client functions for simulation queries/mutations
- Guard against undefined observations in timeline component
- Include observations in runSimulation mutation response
- Add error banner to simulations page (shows API errors like
  missing storage or infrastructure)
Clicking Run now opens a modal that checks:
- Flink cluster reachable (required)
- PostgreSQL storage connected (required)
- Running jobs available (optional, with deploy instructions)
- No other simulation active (required)

Each check shows pass/fail/warn status with fix instructions.
Launch button only enabled when all required checks pass.
Re-check button to retry. Applied to both grid and list views.
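
The launch gating described above could be sketched as follows. This is an illustration only: `PreflightCheck` and `canLaunch` are hypothetical names, and the real modal may track more state than this.

```typescript
type CheckStatus = "pass" | "fail" | "warn";

// Hypothetical shape for one row of the preflight checklist.
interface PreflightCheck {
  name: string;
  required: boolean;
  status: CheckStatus;
}

// Launch is enabled only when every required check has passed.
// Optional checks may warn or fail without blocking the run.
function canLaunch(checks: PreflightCheck[]): boolean {
  return checks
    .filter((check) => check.required)
    .every((check) => check.status === "pass");
}

const preflight: PreflightCheck[] = [
  { name: "Flink cluster reachable", required: true, status: "pass" },
  { name: "PostgreSQL storage connected", required: true, status: "pass" },
  { name: "Running jobs available", required: false, status: "warn" },
  { name: "No other simulation active", required: true, status: "pass" },
];

console.log(canLaunch(preflight)); // true
```
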
- Each run row has a View link → navigates to full run detail
- Checkboxes for multi-select, Compare button appears when 2+ selected
- Compare fetches full observation data and shows side-by-side metric table
- Metrics show avg, min–max range, and sample count per run
- Clear button to dismiss comparison
- Help text explaining View vs Compare workflow
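
One way the per-run comparison values (avg, min–max range, sample count) could be computed is sketched below. The `Observation` shape, the `summarizeByMetric` helper, and the metric names are assumptions for illustration, not the dashboard's actual code.

```typescript
// Hypothetical observation shape: one numeric sample per metric name.
interface Observation {
  metric: string;
  value: number;
}

interface MetricSummary {
  avg: number;
  min: number;
  max: number;
  samples: number;
}

// Groups samples by metric name, then reduces each group to avg, min, max, and count.
function summarizeByMetric(observations: Observation[]): Record<string, MetricSummary> {
  const grouped: Record<string, number[]> = {};
  for (const o of observations) {
    if (!grouped[o.metric]) grouped[o.metric] = [];
    grouped[o.metric].push(o.value);
  }
  const summary: Record<string, MetricSummary> = {};
  for (const metric of Object.keys(grouped)) {
    const values = grouped[metric];
    const sum = values.reduce((a, b) => a + b, 0);
    summary[metric] = {
      avg: sum / values.length,
      min: Math.min(...values),
      max: Math.max(...values),
      samples: values.length,
    };
  }
  return summary;
}

const runSummary = summarizeByMetric([
  { metric: "checkpointDurationMs", value: 100 },
  { metric: "checkpointDurationMs", value: 200 },
  { metric: "checkpointDurationMs", value: 300 },
  { metric: "backpressureRatio", value: 0.5 },
]);

console.log(runSummary["checkpointDurationMs"]); // { avg: 200, min: 100, max: 300, samples: 3 }
```
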
Preflight now validates the full infrastructure chain:
- Kubernetes cluster reachable (kubectl cluster-info)
- flink-demo namespace exists
- Flink Operator running in flink-system
- Kafka, PostgreSQL, SeaweedFS pods running in flink-demo
- PostgreSQL storage connected (server config)
- Flink cluster reachable (REST API)
- FlinkDeployments exist (optional)
- Kafka instrument healthy (optional)
- No active simulation running

Checks run server-side via new simulationPreflight GraphQL query.
Docker-only clusters correctly show failures for K8s checks.
…arallel

- Check kubectl existence upfront via exec.LookPath — if not found,
  all K8s checks instantly return "fail" with install instructions
- Run all 11 checks in parallel via goroutines (was sequential)
- Reduce kubectl timeout from 5s to 3s per check
- Eliminates 35s+ hang when kubectl is missing or can't connect
Checks now short-circuit — if a required check fails, downstream
checks are not shown:

  kubectl → K8s cluster → namespace → [pods] → FlinkDeployments

Pod checks verify actual K8s pods by label selector and phase,
tied to specific services (Kafka:9092, PostgreSQL:5432,
SeaweedFS:8333, SQL Gateway:8083, reactor-server:8080, Operator).

Removes false-positive green checks for local Docker services
that aren't the minikube infrastructure.
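
The short-circuit behavior can be sketched in TypeScript as below. The server-side checks are written in Go and run pod checks as a group, so this is a simplified, sequential illustration; `ChainStep` and `runChain` are hypothetical names.

```typescript
type StepStatus = "pass" | "fail" | "warn";

// Hypothetical shape for one step in the dependency chain.
interface ChainStep {
  name: string;
  required: boolean;
  run: () => StepStatus;
}

interface StepResult {
  name: string;
  status: StepStatus;
}

// Runs steps in dependency order and stops after the first required failure,
// so downstream steps never appear in the result.
function runChain(steps: ChainStep[]): StepResult[] {
  const results: StepResult[] = [];
  for (const step of steps) {
    const status = step.run();
    results.push({ name: step.name, status });
    if (step.required && status === "fail") break;
  }
  return results;
}

const chainResults = runChain([
  { name: "kubectl", required: true, run: () => "pass" },
  { name: "K8s cluster", required: true, run: () => "fail" },
  { name: "namespace", required: true, run: () => "pass" },
]);

console.log(chainResults.map((r) => r.name)); // [ 'kubectl', 'K8s cluster' ]
```
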
Add an Iceberg REST deployment manifest for minikube and an optional
preflight check that warns (rather than fails) when the catalog is missing.
Infrastructure manifests belong with the CLI that manages them,
not the console. Preflight fix hints now reference `flink-reactor sim up`
instead of raw manifest paths.
…theme

Phase 1 of the Tailwind Plus model transformation:

- Add Alert and Switch components to packages/ui
- Port HoverCard Arrow export + Portal wrapping
- Fix Select bg from bg-dash-elevated to bg-dash-panel
- Swap theme defaults: Gruvpuccin is now the base (no selector),
  Tokyo Night is the override via [data-palette="tokyo-night"]
- Migrate all 45 dashboard files from @/components/ui/* to @flink-reactor/ui
- Delete dashboard/src/components/ui/ (19 local duplicates removed)
- Copy CodeMirror editor themes to packages/ui/src/themes/
- Update dashboard theme switcher, index.html, and ui-store defaults
…link-reactor/ui

Phase 2 of the Tailwind Plus model:

Types (packages/ui/src/types/):
- cluster.ts: FlinkJob, TaskManager, ClusterOverview, JobVertex, JobEdge, etc.
- logs.ts: LogEntry, LogLevel, LogSource, ErrorGroup
- deployments.ts: BlueGreenDeployment, BlueGreenState
- tap.ts: TapConfig, TapMetadata, TapManifest
- materialized.ts: MaterializedTable, MaterializedTableRefreshStatus
- insights.ts: HealthSnapshot, HealthSubScore, HealthIssue, BottleneckScore
- metrics.ts: MetricDataPoint, MetricType, MetricUnit, MetricMeta
- monitoring.ts: JobCheckpointSummary, CheckpointTimelineEntry

Shared components (packages/ui/src/shared/):
- StackTrace, JobStatusBadge, MemoryBar, DurationCell
- TaskCountsBar, HealthScoreGauge, MetricChart

Utilities:
- formatBytes(), formatDuration(), formatTimestamp() in lib/format.ts
- formatMetricValue(), getChartColor(), getUnitBadgeLabel() from MetricChart

Build config: recharts added as optional peer dep, externalized in tsup
Phase 3 of the Tailwind Plus model:

Factory files (packages/ui/src/fixtures/):
- cluster.ts: createClusterOverview, createFlinkJob, createJobPlan, createJobVertex, etc.
- task-managers.ts: createTaskManager, createTaskManagerMetrics
- job-manager.ts: createJobManagerInfo, createJobManagerMetrics
- checkpoints.ts: createCheckpointDetail, createCheckpointSubtaskStats
- logs.ts: createLogEntry, createLogEntries(count)
- errors.ts: createJobException, createErrorGroup
- health.ts: createHealthSnapshot, createBottleneckScore, createRecommendation
- deployments.ts: createBlueGreenDeployment
- plans.ts: createSubtaskTimeline, createFlamegraphData
- catalogs.ts: createCatalogSchema, createCatalogColumn
- materialized.ts: createMaterializedTable
- monitoring.ts: createJobCheckpointSummary, createCheckpointTimelineEntry

Scenario presets:
- healthyCluster(): 3 TMs, 2 running jobs, health score 92
- degradedCluster(): elevated backpressure, checkpoint delays
- failingCluster(): OOM failure, high memory, health score 35
- emptyCluster(): fresh cluster, no workload

Build: separate tsup entry point (./fixtures), tree-shakeable
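
A rough sketch of the factory-plus-preset pattern follows. The shapes and the names `makeCluster` and `healthyClusterSketch` are hypothetical and far simpler than the real fixtures; the default health score of 100 is a placeholder, while the 3 TMs, 2 running jobs, and score of 92 mirror the `healthyCluster()` description above.

```typescript
// Hypothetical, simplified shapes; the real types live in packages/ui/src/types/.
interface TaskManagerFixture {
  id: string;
}

interface JobFixture {
  name: string;
  state: "RUNNING" | "FAILED" | "FINISHED";
}

interface ClusterFixture {
  taskManagers: TaskManagerFixture[];
  jobs: JobFixture[];
  healthScore: number;
}

// Factory: placeholder defaults plus per-call overrides.
function makeCluster(overrides: Partial<ClusterFixture> = {}): ClusterFixture {
  return { taskManagers: [], jobs: [], healthScore: 100, ...overrides };
}

// Scenario preset built on the factory.
function healthyClusterSketch(): ClusterFixture {
  return makeCluster({
    taskManagers: [{ id: "tm-1" }, { id: "tm-2" }, { id: "tm-3" }],
    jobs: [
      { name: "job-a", state: "RUNNING" },
      { name: "job-b", state: "RUNNING" },
    ],
    healthScore: 92,
  });
}

const healthy = healthyClusterSketch();
console.log(healthy.taskManagers.length, healthy.jobs.length, healthy.healthScore); // 3 2 92
```

Keeping presets as thin wrappers over the factories means a demo can still override a single field without redefining the whole scenario.
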
Phase 4 of the Tailwind Plus model:

Domain components (packages/ui/src/components/):
- overview/ (4): StatCard, ClusterInfo, SlotUtilization, JobStatusSummary
- jobs/ (13): JobsTable, JobHistoryTable, JobHeader, OperatorNode,
  StrategyEdge, SourceSinkCard, SourcesSinksTab, CheckpointsTab,
  ConfigurationTab, ExceptionsTab, VerticesTab, DataSkewTab, TimelineTab
- logs/ (4): LogLine, LogList, LogDetailPanel, LogHistogram
- errors/ (2): ErrorDetail, ErrorTimeline
- monitoring/ (4): AlertCard, CheckpointTimelineChart, StateSizeChart,
  CheckpointJobTable
- insights/ (5): HealthTrendChart, SubScoreGrid, TopIssuesList,
  BottleneckDag, BottleneckTable
- plan-analyzer/ (5): PlanDag, PlanOperatorNode, PlanStrategyEdge,
  PlanAntiPatternCard, PlanStateForecast
- catalogs/ (3): ColumnsTable, TemplateSelector, SqlHighlight
- tap/ (4): TapDataTable, TapStatusBar, TapSourceConfig, TapErrorPanel
- materialized-tables/ (1): RefreshStatusBadge

All components are prop-driven (no store imports).
Optional peer deps: @xyflow/react, recharts, date-fns, react-icons
Phase 5 of the Tailwind Plus model:

Templates (packages/ui/src/templates/):
- overview/: OverviewSection (stat cards + slot gauge + job lists)
- jobs/: JobsTableSection, JobDetailSection, CheckpointsSection,
  ExceptionsSection, JobGraphSection
- logs/: LogExplorerSection
- errors/: ErrorExplorerSection
- monitoring/: CheckpointAnalyticsSection, AlertsSection
- insights/: HealthDashboardSection, BottleneckSection
- task-managers/: TmListSection
- job-manager/: JmDetailSection
- deployments/: DeploymentsSection
- plan-analyzer/: PlanAnalyzerSection
- catalogs/: CatalogBrowserSection
- materialized-tables/: MatTablesSection

Each template has a companion .demo.tsx showing usage with fixture data.
Templates are dual-mode: importable via @flink-reactor/ui/templates/*
AND browsable as copyable source code.

Build: separate tsup entry per domain, tree-shakeable bundles
Phase 6 of the Tailwind Plus model:

- Vite + TanStack Router + Tailwind v4 app consuming @flink-reactor/ui
- Shell/Sidebar/Header layout using package's layout components
- Route pages: primitives, shared, domain, templates, scenarios
- Primitives page: Button variants, Badge, Alert, Switch, Progress, Card,
  Input, Label, Textarea, Skeleton demos
- Shared page: MetricCard, SeverityBadge, SourceBadge, JobStatusBadge,
  MemoryBar, DurationCell, HealthScoreGauge, EmptyState demos
- Templates page: OverviewSection demo with fixture data
- Scenarios page: links to healthy/degraded/failing/empty cluster views
- Added apps/* to pnpm-workspace.yaml
Phase 7 of the Tailwind Plus model:

Swapped 27 dashboard files from @/components/shared/* imports to
@flink-reactor/ui for pure components: MetricCard, EmptyState,
SeverityBadge, SourceBadge, TextViewer, StackTrace.

Store-dependent wrappers (SearchInput, TimeRange, ThreadDumpViewer,
StaticLogExplorer) remain as local dashboard imports since they
inject Zustand state.
- Add @source directive to scan UI package for Tailwind class names
- Fix Sidebar nav items to use href (not path)
- Add NavLink adapter for TanStack Router (href → to)
- Make domain cards clickable with expand/collapse component lists
The @source path was relative to the CSS file (src/), not the project
root. Fixed to ../../../packages/ui/src/ and also scan dist/ for
compiled class names. CSS output went from 30KB to 84KB — all UI
package utilities now generated correctly.
…bar nav

Add rich per-component demos across all 5 showcase sections:
- Primitives: all 20 with interactive demos, props tables, and rich data table
- Shared: all 15 with demos, controlled state, and fixture data
- Domain: 10 sub-pages covering all 45 components with fixture factories
- Templates: 12 sub-pages rendering all 17 template demo files
- Scenarios: 4 composed dashboard views (healthy/degraded/failing/empty)

Add shared utilities (PropsTable, Section, ShowcasePage, ImportSnippet) and
a sticky secondary sidebar with scroll-spy for navigating component sections.
Configure Vite @ path alias for lib imports.
Add @source directives for @flink-reactor/ui in dashboard CSS so
Tailwind v4 scans the UI package classes (fixes dialog not centered).

Remove border-color from glass-card transition to prevent white
border flash on page navigation during component mount.
@sandboxws self-assigned this on Mar 21, 2026
@sandboxws changed the title from "Sim UI Package" to "Add @flink-reactor/ui package, simulation engine, and EXPLAIN analyzer" on Mar 21, 2026
The checked-in .npmrc pointed all registry lookups at localhost:4873
(Verdaccio), causing pnpm/action-setup to fail in CI with
ECONNREFUSED. Since Verdaccio is only needed locally, .npmrc is now
gitignored and kept as a local-only file.
@sandboxws merged commit 8fe942b into main on Mar 21, 2026
0 of 4 checks passed
@github-actions (bot) locked and limited conversation to collaborators on Mar 21, 2026