Add @flink-reactor/ui package, simulation engine, and EXPLAIN analyzer #1
Merged
- Bump flink-reactor to 0.1.8-rc.1 with schema introspection fixes
- Add pnpm store prune to refresh-dsl.sh to clear stale integrity checksums
Flink's timeSinceLastHeartbeat field is an epoch timestamp, not a duration. The dashboard was computing Date.now() - timestamp, which produced near-zero values. Use the value directly as new Date(value).
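A minimal sketch of the heartbeat fix, assuming the field arrives as epoch milliseconds. The `TaskManagerHeartbeat` interface and both function names are hypothetical, introduced only for illustration; they are not taken from the dashboard code.

```typescript
// Hypothetical shape; only the field name comes from the commit message.
// Assumption: the value is an epoch timestamp in milliseconds.
interface TaskManagerHeartbeat {
  timeSinceLastHeartbeat: number;
}

// The computation described as the bug: subtracting an epoch timestamp
// from "now" yields a small elapsed-time number, not a point in time.
function subtractedValue(tm: TaskManagerHeartbeat, nowMs: number): number {
  return nowMs - tm.timeSinceLastHeartbeat;
}

// The fix as described: use the value directly as a timestamp.
function lastHeartbeat(tm: TaskManagerHeartbeat): Date {
  return new Date(tm.timeSinceLastHeartbeat);
}

const sample: TaskManagerHeartbeat = { timeSinceLastHeartbeat: 1_700_000_000_000 };
const now = 1_700_000_005_000;

console.log(subtractedValue(sample, now)); // 5000, a near-zero value
console.log(lastHeartbeat(sample).toISOString());
```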
…penSpec skills
- EXPLAIN statement support: GraphQL mutation, resolver, dashboard sandbox integration with explain tab in synthesis output
- Plan analyzer: parser (JSON + text), 7 analyzers (bottleneck, changelog, join, skew, state, watermark, window), DAG visualization components, 31 test fixtures, Zustand store
- Sandbox UI: streamlined editor toolbar, removed redundant output header, added explain button
- Sandbox bug fix report: structured analysis of 14 DSL codegen bugs
- OpenSpec skills and prompts for Claude Code
- IDE project files
Replace flat column list in catalog browser with a proper table featuring search, sortable columns, and pagination. Style JM config tag filter pills with their corresponding tag background colors.
Update all references after the DSL package rename:
- dashboard dependency and dynamic imports
- completions generator node_modules path
- refresh-dsl.sh cache paths (using pnpm @scope+name encoding)
- release.yml: remove `if: false`, drop DSL-repo packages (create-app, ts-plugin), keep only UI + dashboard builds
…visibility
Server:
- Add tap_manifests table (migration 007) with pipeline_name PK and JSONB manifest
- Add TapManifestStore with Upsert/GetByPipeline/List/Delete operations
- Replace filesystem tap.Loader with DB-backed tap.Store
- Add POST /api/tap-manifests and DELETE /api/tap-manifests/:pipeline endpoints
- Wire TapStore in main.go when storage is enabled
- Fix detail_snapshot not captured for jobs first seen in terminal state
Dashboard:
- Fix tap manifest fetch using backend base URL (was hitting Vite dev server)
- Show Tap tab only for running jobs with an existing manifest
- Shorten tap job prefix from flink-reactor-tap- to fr-tap-
- Update @flink-reactor/dsl to 0.1.8-rc.3
- Add IsNotFound() helper and FlexFloat64.MarshalJSON() for safe JSON roundtrip
- Add GetJobByID() and UpsertJobSnapshot() for DB-backed job detail fallback
- Add job_db_fallback.go with two-tier recovery (JSONB snapshot → normalized tables)
- Add 006_job_detail_snapshot migration for detail_snapshot JSONB column
- Extract mapJobDetailAggregate() shared mapper for live and DB paths
- Add connector/ package with detector, vertex name patterns, and manifest parsing
- Add sources_sinks.graphqls schema extending JobDetail with sourcesAndSinks field
- Add Sources & Sinks tab to job detail with connector type cards and I/O metrics
sim-infra-01: K8s manifests for minikube simulation stack
- SeaweedFS (S3-compatible checkpoint storage), Kafka KRaft, PostgreSQL, SQL Gateway, reactor-server with ConfigMap
- Custom Flink image Dockerfile with S3 plugin
- README with quick start guide
sim-console-01: Job lifecycle mutations (savepoint, stop, rescale)
- Go: TriggerSavepoint, StopWithSavepoint, RescaleJob service methods
- GraphQL: triggerSavepoint, stopJobWithSavepoint, rescaleJob mutations
- Dashboard: wire savepoint button to real API, add Stop button, add Stop All Jobs to cluster overview page
sim-console-02: Chaos engineering simulation system
- New simulation package: engine orchestrator, 11 scenario presets (resource stress, checkpoint, load, failure), PostgreSQL store
- DB migration 008: simulation_runs + simulation_observations tables
- GraphQL schema: SimulationRun, SimulationPreset, SimulationObservation types with queries (list/get runs, presets) and mutations (run/stop)
- Resolvers map between domain and GraphQL model types
- Engine wired into main.go (conditional on storage enabled)
- One simulation at a time (mutex-guarded), background goroutine execution
sim-console-03: Simulation dashboard UI
- Admin sidebar group with Simulations and Benchmarks items
- Simulation store (Zustand) with presets, runs, active polling
- Preset grid organized by category (resource/checkpoint/load/failure)
- Inline parameter configuration per preset card
- Active simulation panel with live observation polling (3s)
- History table with status badges and run detail links
- Run detail page with observation timeline
sim-console-04: Benchmark collection page
- Run selector table with multi-select checkboxes (max 5)
- Comparison cards showing metric averages per run
- Empty state with link to Simulations page
- GraphQL client functions for simulation queries/mutations
- Guard against undefined observations in timeline component
- Include observations in runSimulation mutation response
- Add error banner to simulations page (shows API errors like missing storage or infrastructure)
Clicking Run now opens a modal that checks:
- Flink cluster reachable (required)
- PostgreSQL storage connected (required)
- Running jobs available (optional, with deploy instructions)
- No other simulation active (required)
Each check shows pass/fail/warn status with fix instructions. Launch button only enabled when all required checks pass. Re-check button to retry. Applied to both grid and list views.
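The launch rule described here (enabled only when every required check passes) can be sketched as a small pure function. The `PreflightCheck` shape and the `canLaunch` name are assumptions made for illustration, not the modal's actual types.

```typescript
type CheckStatus = "pass" | "fail" | "warn";

// Hypothetical shape for one row in the preflight modal.
interface PreflightCheck {
  name: string;
  required: boolean;
  status: CheckStatus;
  fix?: string; // optional fix instructions shown next to the status
}

// Optional checks never block the launch; required checks must all pass.
function canLaunch(checks: PreflightCheck[]): boolean {
  return checks
    .filter((check) => check.required)
    .every((check) => check.status === "pass");
}

const checks: PreflightCheck[] = [
  { name: "Flink cluster reachable", required: true, status: "pass" },
  { name: "PostgreSQL storage connected", required: true, status: "pass" },
  { name: "Running jobs available", required: false, status: "warn", fix: "Deploy a job first" },
  { name: "No other simulation active", required: true, status: "pass" },
];

console.log(canLaunch(checks)); // true: the only non-pass check is optional
```

Treating "warn" on a required check as blocking is a design choice in this sketch; the source only states that all required checks must pass.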
- Each run row has a View link → navigates to full run detail
- Checkboxes for multi-select, Compare button appears when 2+ selected
- Compare fetches full observation data and shows side-by-side metric table
- Metrics show avg, min–max range, and sample count per run
- Clear button to dismiss comparison
- Help text explaining View vs Compare workflow
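One way the per-run comparison metrics (avg, min–max range, sample count) could be derived from raw observations is sketched below. The `Observation` and `MetricSummary` shapes and the `summarizeRun` name are hypothetical; the real observation schema is not shown in this conversation.

```typescript
// Hypothetical observation shape: one sampled value of one named metric.
interface Observation {
  metric: string;
  value: number;
}

interface MetricSummary {
  avg: number;
  min: number;
  max: number;
  samples: number;
}

// Group observations by metric name, then reduce each group to a summary.
function summarizeRun(observations: Observation[]): Record<string, MetricSummary> {
  const grouped: Record<string, number[]> = {};
  for (const obs of observations) {
    const values = grouped[obs.metric] ?? [];
    values.push(obs.value);
    grouped[obs.metric] = values;
  }

  const summary: Record<string, MetricSummary> = {};
  for (const metric of Object.keys(grouped)) {
    const values = grouped[metric] ?? [];
    const total = values.reduce((sum, v) => sum + v, 0);
    summary[metric] = {
      avg: total / values.length,
      min: Math.min(...values),
      max: Math.max(...values),
      samples: values.length,
    };
  }
  return summary;
}

const run = summarizeRun([
  { metric: "throughput", value: 10 },
  { metric: "throughput", value: 20 },
  { metric: "throughput", value: 30 },
  { metric: "checkpointMs", value: 400 },
]);
console.log(run["throughput"]); // { avg: 20, min: 10, max: 30, samples: 3 }
```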
Preflight now validates the full infrastructure chain:
- Kubernetes cluster reachable (kubectl cluster-info)
- flink-demo namespace exists
- Flink Operator running in flink-system
- Kafka, PostgreSQL, SeaweedFS pods running in flink-demo
- PostgreSQL storage connected (server config)
- Flink cluster reachable (REST API)
- FlinkDeployments exist (optional)
- Kafka instrument healthy (optional)
- No active simulation running
Checks run server-side via new simulationPreflight GraphQL query. Docker-only clusters correctly show failures for K8s checks.
…server-unreachable
…arallel
- Check kubectl existence upfront via exec.LookPath; if not found, all K8s checks instantly return "fail" with install instructions
- Run all 11 checks in parallel via goroutines (was sequential)
- Reduce kubectl timeout from 5s to 3s per check
- Eliminates 35s+ hang when kubectl is missing or can't connect
Checks now short-circuit. If a required check fails, downstream checks are not shown:
kubectl → K8s cluster → namespace → [pods] → FlinkDeployments
Pod checks verify actual K8s pods by label selector and phase, tied to specific services (Kafka:9092, PostgreSQL:5432, SeaweedFS:8333, SQL Gateway:8083, reactor-server:8080, Operator). Removes false-positive green checks for local Docker services that aren't the minikube infrastructure.
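The short-circuit behaviour can be illustrated with a small function that walks the chain in order and stops after the first required stage that fails. The server-side checks are described as Go; this TypeScript version is only a sketch of the logic, and the `Stage` shape and `visibleStages` name are hypothetical.

```typescript
type Status = "pass" | "fail" | "warn";

// Hypothetical model: an ordered stage in the chain, holding one or more checks.
interface Stage {
  name: string;
  required: boolean;
  checks: { name: string; status: Status }[];
}

// Return the stages to display: everything up to and including the first
// required stage with a failing check. Later stages are hidden.
function visibleStages(chain: Stage[]): Stage[] {
  const shown: Stage[] = [];
  for (const stage of chain) {
    shown.push(stage);
    const failed = stage.checks.some((check) => check.status === "fail");
    if (stage.required && failed) break;
  }
  return shown;
}

const chain: Stage[] = [
  { name: "kubectl", required: true, checks: [{ name: "kubectl installed", status: "pass" }] },
  { name: "K8s cluster", required: true, checks: [{ name: "cluster reachable", status: "fail" }] },
  { name: "namespace", required: true, checks: [{ name: "flink-demo exists", status: "pass" }] },
  { name: "pods", required: true, checks: [{ name: "Kafka", status: "pass" }] },
  { name: "FlinkDeployments", required: false, checks: [{ name: "deployments exist", status: "warn" }] },
];

console.log(visibleStages(chain).map((s) => s.name)); // ["kubectl", "K8s cluster"]
```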
Iceberg REST deployment manifest for minikube and an optional preflight check that warns (not fails) when the catalog is missing.
Infrastructure manifests belong with the CLI that manages them, not the console. Preflight fix hints now reference `flink-reactor sim up` instead of raw manifest paths.
…theme
Phase 1 of the Tailwind Plus model transformation:
- Add Alert and Switch components to packages/ui
- Port HoverCard Arrow export + Portal wrapping
- Fix Select bg from bg-dash-elevated to bg-dash-panel
- Swap theme defaults: Gruvpuccin is now the base (no selector), Tokyo Night is the override via [data-palette="tokyo-night"]
- Migrate all 45 dashboard files from @/components/ui/* to @flink-reactor/ui
- Delete dashboard/src/components/ui/ (19 local duplicates removed)
- Copy CodeMirror editor themes to packages/ui/src/themes/
- Update dashboard theme switcher, index.html, and ui-store defaults
…link-reactor/ui
Phase 2 of the Tailwind Plus model:
Types (packages/ui/src/types/):
- cluster.ts: FlinkJob, TaskManager, ClusterOverview, JobVertex, JobEdge, etc.
- logs.ts: LogEntry, LogLevel, LogSource, ErrorGroup
- deployments.ts: BlueGreenDeployment, BlueGreenState
- tap.ts: TapConfig, TapMetadata, TapManifest
- materialized.ts: MaterializedTable, MaterializedTableRefreshStatus
- insights.ts: HealthSnapshot, HealthSubScore, HealthIssue, BottleneckScore
- metrics.ts: MetricDataPoint, MetricType, MetricUnit, MetricMeta
- monitoring.ts: JobCheckpointSummary, CheckpointTimelineEntry
Shared components (packages/ui/src/shared/):
- StackTrace, JobStatusBadge, MemoryBar, DurationCell
- TaskCountsBar, HealthScoreGauge, MetricChart
Utilities:
- formatBytes(), formatDuration(), formatTimestamp() in lib/format.ts
- formatMetricValue(), getChartColor(), getUnitBadgeLabel() from MetricChart
Build config: recharts added as optional peer dep, externalized in tsup
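For readers unfamiliar with this kind of formatting helper, here is a rough sketch of what functions named like `formatBytes()` and `formatDuration()` might do. The signatures, the 1024-based units, and the output strings are all assumptions; the actual implementations in lib/format.ts are not shown in this conversation and may differ.

```typescript
const BYTE_UNITS = ["B", "KB", "MB", "GB", "TB"];

// Assumed behaviour: 1024-based units, one decimal place above plain bytes.
function formatBytes(bytes: number, decimals: number = 1): string {
  if (!isFinite(bytes) || bytes <= 0) return "0 B";
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < BYTE_UNITS.length - 1) {
    value /= 1024;
    unit += 1;
  }
  const text = unit === 0 ? String(value) : value.toFixed(decimals);
  return `${text} ${BYTE_UNITS[unit]}`;
}

// Assumed behaviour: input in milliseconds, output shows the two largest units.
function formatDuration(ms: number): string {
  const totalSeconds = Math.floor(Math.max(ms, 0) / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}

console.log(formatBytes(1536)); // "1.5 KB"
console.log(formatDuration(65_000)); // "1m 5s"
```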
Phase 3 of the Tailwind Plus model:
Factory files (packages/ui/src/fixtures/):
- cluster.ts: createClusterOverview, createFlinkJob, createJobPlan, createJobVertex, etc.
- task-managers.ts: createTaskManager, createTaskManagerMetrics
- job-manager.ts: createJobManagerInfo, createJobManagerMetrics
- checkpoints.ts: createCheckpointDetail, createCheckpointSubtaskStats
- logs.ts: createLogEntry, createLogEntries(count)
- errors.ts: createJobException, createErrorGroup
- health.ts: createHealthSnapshot, createBottleneckScore, createRecommendation
- deployments.ts: createBlueGreenDeployment
- plans.ts: createSubtaskTimeline, createFlamegraphData
- catalogs.ts: createCatalogSchema, createCatalogColumn
- materialized.ts: createMaterializedTable
- monitoring.ts: createJobCheckpointSummary, createCheckpointTimelineEntry
Scenario presets:
- healthyCluster(): 3 TMs, 2 running jobs, health score 92
- degradedCluster(): elevated backpressure, checkpoint delays
- failingCluster(): OOM failure, high memory, health score 35
- emptyCluster(): fresh cluster, no workload
Build: separate tsup entry point (./fixtures), tree-shakeable
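The factory-plus-preset pattern listed here might look roughly like the sketch below. The function names `createFlinkJob` and `healthyCluster` and the preset's numbers (3 TMs, 2 running jobs, health score 92) come from the commit message; every field name, the overrides parameter, and the `ClusterScenario` type are assumptions for illustration and may not match the package's real types.

```typescript
// Simplified, hypothetical job shape; the real FlinkJob type has more fields.
interface FlinkJob {
  id: string;
  name: string;
  status: "RUNNING" | "FAILED" | "FINISHED";
  parallelism: number;
}

let jobSequence = 0;

// Factory with sensible defaults; callers override only what a test cares about.
function createFlinkJob(overrides: Partial<FlinkJob> = {}): FlinkJob {
  jobSequence += 1;
  return {
    id: `job-${jobSequence}`,
    name: `example-job-${jobSequence}`,
    status: "RUNNING",
    parallelism: 4,
    ...overrides,
  };
}

// Hypothetical container for a composed scenario.
interface ClusterScenario {
  taskManagers: { id: string }[];
  jobs: FlinkJob[];
  healthScore: number;
}

// Preset composed from factories: 3 TMs, 2 running jobs, health score 92.
function healthyCluster(): ClusterScenario {
  const taskManagers: { id: string }[] = [];
  for (let i = 1; i <= 3; i += 1) {
    taskManagers.push({ id: `tm-${i}` });
  }
  return {
    taskManagers,
    jobs: [createFlinkJob(), createFlinkJob()],
    healthScore: 92,
  };
}

const failedJob = createFlinkJob({ status: "FAILED", name: "oom-job" });
console.log(failedJob.status, healthyCluster().jobs.length);
```

Keeping presets as thin compositions of factories means a demo or test can start from `healthyCluster()` and swap in a single overridden job without rebuilding the whole fixture.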
Phase 4 of the Tailwind Plus model:
Domain components (packages/ui/src/components/):
- overview/ (4): StatCard, ClusterInfo, SlotUtilization, JobStatusSummary
- jobs/ (13): JobsTable, JobHistoryTable, JobHeader, OperatorNode, StrategyEdge, SourceSinkCard, SourcesSinksTab, CheckpointsTab, ConfigurationTab, ExceptionsTab, VerticesTab, DataSkewTab, TimelineTab
- logs/ (4): LogLine, LogList, LogDetailPanel, LogHistogram
- errors/ (2): ErrorDetail, ErrorTimeline
- monitoring/ (4): AlertCard, CheckpointTimelineChart, StateSizeChart, CheckpointJobTable
- insights/ (5): HealthTrendChart, SubScoreGrid, TopIssuesList, BottleneckDag, BottleneckTable
- plan-analyzer/ (5): PlanDag, PlanOperatorNode, PlanStrategyEdge, PlanAntiPatternCard, PlanStateForecast
- catalogs/ (3): ColumnsTable, TemplateSelector, SqlHighlight
- tap/ (4): TapDataTable, TapStatusBar, TapSourceConfig, TapErrorPanel
- materialized-tables/ (1): RefreshStatusBadge
All components are prop-driven (no store imports). Optional peer deps: @xyflow/react, recharts, date-fns, react-icons
Phase 5 of the Tailwind Plus model:
Templates (packages/ui/src/templates/):
- overview/: OverviewSection (stat cards + slot gauge + job lists)
- jobs/: JobsTableSection, JobDetailSection, CheckpointsSection, ExceptionsSection, JobGraphSection
- logs/: LogExplorerSection
- errors/: ErrorExplorerSection
- monitoring/: CheckpointAnalyticsSection, AlertsSection
- insights/: HealthDashboardSection, BottleneckSection
- task-managers/: TmListSection
- job-manager/: JmDetailSection
- deployments/: DeploymentsSection
- plan-analyzer/: PlanAnalyzerSection
- catalogs/: CatalogBrowserSection
- materialized-tables/: MatTablesSection
Each template has a companion .demo.tsx showing usage with fixture data. Templates are dual-mode: importable via @flink-reactor/ui/templates/* AND browsable as copyable source code.
Build: separate tsup entry per domain, tree-shakeable bundles
Phase 6 of the Tailwind Plus model:
- Vite + TanStack Router + Tailwind v4 app consuming @flink-reactor/ui
- Shell/Sidebar/Header layout using package's layout components
- Route pages: primitives, shared, domain, templates, scenarios
- Primitives page: Button variants, Badge, Alert, Switch, Progress, Card, Input, Label, Textarea, Skeleton demos
- Shared page: MetricCard, SeverityBadge, SourceBadge, JobStatusBadge, MemoryBar, DurationCell, HealthScoreGauge, EmptyState demos
- Templates page: OverviewSection demo with fixture data
- Scenarios page: links to healthy/degraded/failing/empty cluster views
- Added apps/* to pnpm-workspace.yaml
Phase 7 of the Tailwind Plus model: Swapped 27 dashboard files from @/components/shared/* imports to @flink-reactor/ui for pure components: MetricCard, EmptyState, SeverityBadge, SourceBadge, TextViewer, StackTrace. Store-dependent wrappers (SearchInput, TimeRange, ThreadDumpViewer, StaticLogExplorer) remain as local dashboard imports since they inject Zustand state.
- Add @source directive to scan UI package for Tailwind class names
- Fix Sidebar nav items to use href (not path)
- Add NavLink adapter for TanStack Router (href → to)
- Make domain cards clickable with expand/collapse component lists
The @source path was relative to the CSS file (src/), not the project root. Fixed to ../../../packages/ui/src/ and also scan dist/ for compiled class names. CSS output went from 30KB to 84KB — all UI package utilities now generated correctly.
…bar nav
Add rich per-component demos across all 5 showcase sections:
- Primitives: all 20 with interactive demos, props tables, and rich data table
- Shared: all 15 with demos, controlled state, and fixture data
- Domain: 10 sub-pages covering all 45 components with fixture factories
- Templates: 12 sub-pages rendering all 17 template demo files
- Scenarios: 4 composed dashboard views (healthy/degraded/failing/empty)
Add shared utilities (PropsTable, Section, ShowcasePage, ImportSnippet) and a sticky secondary sidebar with scroll-spy for navigating component sections. Configure Vite @ path alias for lib imports.
Add @source directives for @flink-reactor/ui in dashboard CSS so Tailwind v4 scans the UI package classes (fixes dialog not centered). Remove border-color from glass-card transition to prevent white border flash on page navigation during component mount.
The checked-in .npmrc pointed all registry lookups at localhost:4873 (Verdaccio), causing pnpm/action-setup to fail in CI with ECONNREFUSED. Since Verdaccio is only needed locally, .npmrc is now gitignored and kept as a local-only file.
Summary
📦 @flink-reactor/ui package extraction (Phases 1–7)
- … @flink-reactor/ui, swap default theme to Gruvpuccin
- … .demo.tsx files
- … (apps/showcase/) with full component coverage and secondary sidebar nav
- … @flink-reactor/ui imports
🧪 Simulation engine (chaos engineering)
🔍 EXPLAIN integration & plan analyzer
🚀 Job lifecycle & storage
🔧 Other
- … @flink-reactor/dsl (0.1.8-rc.3), enable release workflow
- … cla signatures, fix permissions
Test plan
- … (pnpm --filter showcase dev)
- … @flink-reactor/ui imports (pnpm build)
- … (plan-analyzer.test.ts)