Create scheduled replay orchestrator for event streams and block streams #24924

@akdev

Summary

Create a scheduler/orchestrator for replaying historical data into a node state at a configurable rate.

The scheduler should support two replay backends:

  1. Pre-block-stream cutover: replay from event stream using the existing PCLI path
  2. Post-block-stream cutover: replay from block stream using ApplyBlocksCommand

The scheduler should abstract over the cutover boundary so callers do not need separate workflows for event-stream replay vs block-stream replay.

Problem

Today we have replay capabilities, but not a unified scheduler that can:

  • pull replay input from either a local filesystem directory or a bucket/object-store source
  • run at a specified rate
  • select the correct replay engine based on stream format / cutover state
  • provide a consistent interface for differential testing, scheduled reprocessing, and controlled backfill scenarios

Without such a scheduler, replay workflows carry unnecessary operational and scripting complexity.

Proposed Solution

Add a scheduler component that coordinates replay from a source into a target state using a pluggable backend model (if possible).

Replay backends

  • Event stream backend
    • Uses PCLI event replay before block-stream cutover
  • Block stream backend
    • Uses ApplyBlocksCommand after block-stream cutover
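
As a rough illustration of the pluggable backend idea, the sketch below wraps an external replay tool behind a small interface. It is written in Python purely for brevity and is not taken from the codebase. `ReplayBackend`, `CommandBackend`, `BackendFailure`, and `argv_prefix` are hypothetical names, and the sketch deliberately does not assume any real PCLI or ApplyBlocksCommand flags: the caller supplies whatever command line applies.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Protocol, Sequence


class BackendFailure(RuntimeError):
    """Raised when the wrapped replay command exits with a non-zero status."""


class ReplayBackend(Protocol):
    """Hypothetical interface shared by the event-stream and block-stream backends."""

    name: str

    def replay(self, inputs: Sequence[str]) -> None:
        ...


@dataclass(frozen=True)
class CommandBackend:
    """Runs an external replay tool.

    argv_prefix is a placeholder supplied by the caller; the real command
    lines for PCLI and ApplyBlocksCommand are not assumed here.
    """

    name: str
    argv_prefix: tuple[str, ...]

    def replay(self, inputs: Sequence[str]) -> None:
        # Append the inputs to the caller-supplied command and run it.
        argv = [*self.argv_prefix, *inputs]
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the exit code and stderr so the failure is actionable.
            raise BackendFailure(
                f"{self.name} exited with {result.returncode}: {result.stderr.strip()}"
            )
```

Keeping the command line outside the backend class means the scheduler never needs to know how either tool is invoked, only that a backend either completes or raises.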

Source adapters

  • Local filesystem directory
  • Bucket/object-store source
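
One possible shape for the source adapters, again as a Python sketch under stated assumptions. `ReplaySource`, `LocalDirectorySource`, `InMemoryBucketSource`, and `MissingInputError` are hypothetical names. The bucket adapter is an in-memory stand-in, since the issue does not name an object-store client. Sorting by name gives a deterministic order, but that is only the replay order if file names sort chronologically, which is an assumption here.

```python
from pathlib import Path
from typing import Protocol


class MissingInputError(FileNotFoundError):
    """Raised when a source location is absent or holds no replay input."""


class ReplaySource(Protocol):
    def list_inputs(self) -> list[str]:
        """Return input names in a deterministic order."""
        ...


class LocalDirectorySource:
    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def list_inputs(self) -> list[str]:
        if not self.root.is_dir():
            raise MissingInputError(f"source directory not found: {self.root}")
        # Sort by name so repeated runs see the same order (assumes names sort chronologically).
        names = sorted(p.name for p in self.root.iterdir() if p.is_file())
        if not names:
            raise MissingInputError(f"no replay input in: {self.root}")
        return names


class InMemoryBucketSource:
    """Stand-in for an object-store adapter; a real one would call the store's listing API."""

    def __init__(self, objects: dict[str, bytes], prefix: str = "") -> None:
        self._objects = objects
        self._prefix = prefix

    def list_inputs(self) -> list[str]:
        names = sorted(k for k in self._objects if k.startswith(self._prefix))
        if not names:
            raise MissingInputError(f"no objects under prefix: {self._prefix!r}")
        return names
```

Both adapters fail the same way on missing input, so the scheduler can report that condition without knowing which source type is in use.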

Scheduler responsibilities

  • accept a replay source
  • determine or be configured with the replay mode:
    • event-stream
    • block-stream
    • auto
  • pace replay at a configured rate
  • support one-shot and scheduled execution
  • pass through the required replay parameters for the selected backend
  • emit useful logs/metrics/status
  • fail clearly on continuity gaps, missing input, hash mismatch, invalid configuration, or backend failures
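
The replay-mode decision could look roughly like the Python sketch below. `ReplayMode`, `resolve_mode`, and `InvalidConfiguration` are hypothetical names, and the file suffixes are illustrative assumptions only, not confirmed formats. The sketch rejects a source that mixes both formats; a fuller design might instead split the input at the cutover boundary and run each segment on its own backend.

```python
from enum import Enum
from typing import Iterable


class ReplayMode(Enum):
    EVENT_STREAM = "event-stream"
    BLOCK_STREAM = "block-stream"
    AUTO = "auto"


class InvalidConfiguration(ValueError):
    """Raised when the replay mode cannot be determined from the configuration or input."""


# Assumption: these suffixes are placeholders for whatever distinguishes the two formats.
EVENT_SUFFIXES = (".evts",)
BLOCK_SUFFIXES = (".blk", ".blk.gz")


def resolve_mode(requested: ReplayMode, input_names: Iterable[str]) -> ReplayMode:
    """Return the concrete mode to run, inspecting the input only when AUTO is requested."""
    if requested is not ReplayMode.AUTO:
        return requested
    names = list(input_names)
    has_events = any(n.endswith(EVENT_SUFFIXES) for n in names)
    has_blocks = any(n.endswith(BLOCK_SUFFIXES) for n in names)
    if has_events and has_blocks:
        raise InvalidConfiguration(
            "source mixes event-stream and block-stream files; choose a mode explicitly"
        )
    if has_events:
        return ReplayMode.EVENT_STREAM
    if has_blocks:
        return ReplayMode.BLOCK_STREAM
    raise InvalidConfiguration("auto mode found no recognizable replay input")
```

An explicit mode always wins over detection, so a caller who already knows the cutover state never pays for, or is surprised by, the auto heuristic.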

Scope

In scope

  • unified scheduling/orchestration layer
  • source abstraction for local dir and bucket
  • backend abstraction for event-stream and block-stream replay
  • configurable replay rate
  • support for replay controls such as bootstrap state, node ID, output path, and target round
  • observability and error handling
  • automated tests

Out of scope

  • re-implementing the replay logic already provided by PCLI or ApplyBlocksCommand
  • changing replay semantics of the underlying tools
  • broad data-transformation features unrelated to replay scheduling

Functional Requirements

  1. The scheduler can read replay input from:
    • a local filesystem directory
    • a bucket/object-store location
  2. The scheduler supports replay modes:
    • event-stream
    • block-stream
    • auto
  3. In event-stream mode, the scheduler invokes the existing PCLI replay flow.
  4. In block-stream mode, the scheduler invokes ApplyBlocksCommand.
  5. The scheduler supports a configurable replay rate.
    • At minimum this should allow replay to be throttled/paced instead of running as fast as possible.
  6. The scheduler supports one-shot execution and scheduled execution.
  7. The scheduler supports passing backend-specific parameters such as:
    • bootstrap/input state
    • node ID
    • output directory
    • target/final round (optional)
    • expected hash where applicable
  8. The scheduler preserves deterministic ordering of replay input.
  9. The scheduler surfaces actionable failures for:
    • missing source data
    • malformed configuration
    • continuity gaps
    • missing blocks/events
    • hash mismatch
    • backend command failure
  10. The scheduler emits logs and basic metrics/status for:
    • selected backend
    • source type
    • current replay progress
    • effective replay rate
    • success/failure outcome
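
For the pacing requirement, one simple approach is to release replay units on a fixed schedule measured from the first unit, which avoids drift when individual units take uneven time. The Python sketch below assumes a "unit" is whatever the scheduler hands to a backend at once (a file, a round, a batch); the issue leaves that open. `Pacer` and its method names are hypothetical, and the clock and sleep functions are injectable so the pacing logic can be tested without real delays.

```python
import time
from typing import Callable, Optional


class Pacer:
    """Releases replay units no faster than rate_per_sec, measured from the first release."""

    def __init__(
        self,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate_per_sec <= 0:
            raise ValueError("rate_per_sec must be positive")
        self._interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._start: Optional[float] = None
        self._released = 0

    def wait_for_next(self) -> None:
        """Block until the next unit is due, then count it as released."""
        now = self._clock()
        if self._start is None:
            self._start = now
        # The n-th unit (0-based) is due n intervals after the first release.
        due = self._start + self._released * self._interval
        if due > now:
            self._sleep(due - now)
        self._released += 1

    def effective_rate(self) -> float:
        """Units released per second since the first release (0.0 before any time has passed)."""
        if self._start is None:
            return 0.0
        elapsed = self._clock() - self._start
        return self._released / elapsed if elapsed > 0 else 0.0
```

A scheduler loop would call `wait_for_next()` before handing each unit to the backend and could report `effective_rate()` alongside replay progress.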

Acceptance Criteria

  • A caller can configure replay from a local directory
  • A caller can configure replay from a bucket/object-store source
  • A caller can choose event-stream, block-stream, or auto
  • event-stream mode uses the existing PCLI replay command
  • block-stream mode uses ApplyBlocksCommand
  • Replay can be paced by a configurable rate
  • The scheduler supports one-shot execution
  • The scheduler supports scheduled execution
  • Required parameters for both backends can be supplied through a unified config/interface
  • Failures are surfaced with clear diagnostics
  • Tests cover:
    • local source + event-stream backend
    • local source + block-stream backend
    • bucket source + event-stream backend
    • bucket source + block-stream backend
    • rate limiting / pacing
    • backend selection logic
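
The four source/backend combinations above lend themselves to a small test matrix built on fakes, so that neither a real bucket nor the real replay tools are needed. The Python sketch below is only illustrative: `FakeSource`, `RecordingBackend`, `run_once`, and `run_matrix` are hypothetical names, and `run_once` is a minimal stand-in for the scheduler rather than a proposed implementation.

```python
from itertools import product

MODES = ("event-stream", "block-stream")
SOURCE_KINDS = ("local", "bucket")


class FakeSource:
    """Test double that returns a fixed set of input names in sorted order."""

    def __init__(self, kind, names):
        self.kind = kind
        self._names = names

    def list_inputs(self):
        return sorted(self._names)


class RecordingBackend:
    """Test double that records each replay call instead of running a tool."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def replay(self, inputs):
        self.calls.append(list(inputs))


def run_once(source, backends, mode):
    """Minimal stand-in for the scheduler: pick the backend by mode and replay in order."""
    backend = backends[mode]
    backend.replay(source.list_inputs())
    return backend.name


def run_matrix():
    """Exercise every source kind against every backend and check the selection."""
    results = {}
    for kind, mode in product(SOURCE_KINDS, MODES):
        backends = {m: RecordingBackend(m) for m in MODES}
        chosen = run_once(FakeSource(kind, ["0002", "0001"]), backends, mode)
        assert chosen == mode
        # The selected backend sees the inputs once, in deterministic order.
        assert backends[mode].calls == [["0001", "0002"]]
        # The other backend must not be touched.
        assert all(b.calls == [] for m, b in backends.items() if m != mode)
        results[(kind, mode)] = chosen
    return results
```

In a real test suite each combination would more likely be a separate parameterized test case so that failures are reported individually.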

Suggested Design Notes

  • Prefer a small orchestration layer over duplicating replay logic.
  • Keep source access separate from backend execution.
  • Consider an interface like:
    • ReplaySource
    • ReplayBackend
    • ReplayScheduler
  • Consider a config model like:
    • source type/location
    • replay mode
    • rate
    • schedule
    • bootstrap state
    • node ID
    • output path
    • target round
    • expected hash
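
The config model listed above might be captured as a single validated record, so that malformed configuration fails before any replay starts. The Python sketch below is one possible reading: `ReplayConfig` and `InvalidConfiguration` are hypothetical names, and the field types, the choice of required fields, and the defaults (`auto` mode, unthrottled rate, one-shot execution) are all assumptions. The schedule format is left as an opaque string because the issue does not specify one.

```python
from dataclasses import dataclass
from typing import Optional

VALID_SOURCE_TYPES = ("local", "bucket")
VALID_MODES = ("event-stream", "block-stream", "auto")


class InvalidConfiguration(ValueError):
    """Raised with every detected problem so the caller can fix them in one pass."""


@dataclass(frozen=True)
class ReplayConfig:
    source_type: str
    source_location: str
    bootstrap_state: str
    node_id: int
    output_path: str
    mode: str = "auto"
    rate_per_sec: Optional[float] = None  # None means unthrottled (assumed default)
    schedule: Optional[str] = None        # None means one-shot; format left open
    target_round: Optional[int] = None
    expected_hash: Optional[str] = None

    def __post_init__(self) -> None:
        problems = []
        if self.source_type not in VALID_SOURCE_TYPES:
            problems.append(f"unknown source type: {self.source_type!r}")
        if not self.source_location:
            problems.append("source location is required")
        if self.mode not in VALID_MODES:
            problems.append(f"unknown replay mode: {self.mode!r}")
        if self.rate_per_sec is not None and self.rate_per_sec <= 0:
            problems.append("rate must be positive when set")
        if self.target_round is not None and self.target_round < 0:
            problems.append("target round must be non-negative when set")
        if problems:
            raise InvalidConfiguration("; ".join(problems))
```

Collecting all problems into one error, rather than stopping at the first, tends to make configuration mistakes cheaper to fix.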
