Summary
Create a scheduler/orchestrator for replaying historical data into a node state at a configurable rate.
The scheduler should support two replay backends:
- Pre-block-stream cutover: replay from event stream using the existing PCLI path
- Post-block-stream cutover: replay from block stream using ApplyBlocksCommand
The scheduler should abstract over the cutover boundary so callers do not need separate workflows for event-stream replay vs block-stream replay.
Problem
Today we have replay capabilities, but not a unified scheduler that can:
- pull replay input from either a local filesystem directory or a bucket/object-store source
- run at a specified rate
- select the correct replay engine based on stream format / cutover state
- provide a consistent interface for differential testing, scheduled reprocessing, and controlled backfill scenarios
This creates unnecessary operational and scripting complexity around replay workflows.
Proposed Solution
Add a scheduler component that coordinates replay from a source into a target state using a pluggable backend model (if possible).
Replay backends
- Event stream backend
- Uses PCLI event replay before block-stream cutover
- Block stream backend
- Uses ApplyBlocksCommand after block-stream cutover
Source adapters
- Local filesystem directory
- Bucket/object-store source
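As a sketch of the local-directory adapter, the key property the scheduler needs from any source is deterministic input ordering. All names and signatures below are illustrative, not a committed API; a bucket adapter would satisfy the same listing contract against object keys.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical local-directory adapter. Files.list makes no ordering
// guarantee, so inputs are explicitly sorted by name to keep replay
// order deterministic.
final class LocalDirectorySource {
    private final Path root;

    LocalDirectorySource(Path root) {
        this.root = root;
    }

    // Lists every regular file directly under the directory, sorted.
    List<Path> listInputs() throws IOException {
        try (Stream<Path> entries = Files.list(root)) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .toList();
        }
    }
}
```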
Scheduler responsibilities
- accept a replay source
- determine or be configured with the replay mode: event-stream, block-stream, or auto
- pace replay at a configured rate
- support one-shot and scheduled execution
- pass through the required replay parameters for the selected backend
- emit useful logs/metrics/status
- fail clearly on continuity gaps, missing input, hash mismatch, invalid configuration, or backend failures
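The "determine or be configured" responsibility could look like the following sketch. The file-extension check is an assumed naming convention used only for illustration; the real cutover signal would come from the stream format or recorded cutover state.

```java
// Hypothetical mode resolution for the scheduler. An explicitly
// configured mode always wins; "auto" falls back to inspecting the
// first input's name (an assumption for this sketch only).
enum ReplayMode { EVENT_STREAM, BLOCK_STREAM, AUTO }

final class BackendSelector {
    static ReplayMode resolve(ReplayMode configured, String firstInputName) {
        if (configured != ReplayMode.AUTO) {
            return configured;          // explicit configuration wins
        }
        return firstInputName.endsWith(".blk")
                ? ReplayMode.BLOCK_STREAM
                : ReplayMode.EVENT_STREAM;
    }
}
```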
Scope
In scope
- unified scheduling/orchestration layer
- source abstraction for local dir and bucket
- backend abstraction for event-stream and block-stream replay
- configurable replay rate
- support for bootstrap state, node ID, output path, and target round style controls
- observability and error handling
- automated tests
Out of scope
- re-implementing the replay logic already provided by PCLI or ApplyBlocksCommand
- changing replay semantics of the underlying tools
- broad data-transformation features unrelated to replay scheduling
Functional Requirements
- The scheduler can read replay input from:
- a local filesystem directory
- a bucket/object-store location
- The scheduler supports replay modes: event-stream, block-stream, and auto
- In event-stream mode, the scheduler invokes the existing PCLI replay flow.
- In block-stream mode, the scheduler invokes ApplyBlocksCommand.
- The scheduler supports a configurable replay rate.
- At minimum this should allow replay to be throttled/paced instead of running as fast as possible.
- The scheduler supports one-shot execution and scheduled execution.
- The scheduler supports passing backend-specific parameters such as:
- bootstrap/input state
- node ID
- output directory
- target/final round (optional)
- expected hash where applicable
- The scheduler preserves deterministic ordering of replay input.
- The scheduler surfaces actionable failures for:
- missing source data
- malformed configuration
- continuity gaps
- missing blocks/events
- hash mismatch
- backend command failure
- The scheduler emits logs and basic metrics/status for:
- selected backend
- source type
- current replay progress
- effective replay rate
- success/failure outcome
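The configurable-rate requirement above can be sketched as a simple pacing loop. The Consumer stands in for whichever backend is selected (the PCLI event replay flow or an ApplyBlocksCommand wrapper); this is a minimal sketch, not the proposed implementation.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of the pacing loop: one replay item per tick, sleeping between
// items so the effective rate stays at or below itemsPerSecond.
final class PacedReplayer {
    static <T> void replay(List<T> items, double itemsPerSecond, Consumer<T> backend)
            throws InterruptedException {
        long intervalNanos = (long) (1_000_000_000L / itemsPerSecond);
        long next = System.nanoTime();
        for (T item : items) {
            long wait = next - System.nanoTime();
            if (wait > 0) {
                // Thread.sleep(millis, nanos): nanos must stay below 1ms.
                Thread.sleep(wait / 1_000_000L, (int) (wait % 1_000_000L));
            }
            backend.accept(item);       // items are applied in input order
            next += intervalNanos;
        }
    }
}
```

Tracking `next` as an absolute deadline (rather than sleeping a fixed interval after each item) keeps the long-run rate stable even when individual backend calls are slow.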
Acceptance Criteria
- The scheduler accepts replay mode event-stream, block-stream, or auto
- event-stream mode uses the existing PCLI replay command
- block-stream mode uses ApplyBlocksCommand
Suggested Design Notes
- Prefer a small orchestration layer over duplicating replay logic.
- Keep source access separate from backend execution.
- Consider an interface like: ReplaySource, ReplayBackend, ReplayScheduler
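One possible shape for those three interfaces, plus a trivial one-shot scheduler; every name and signature here is a placeholder, not a committed API.

```java
// Sketch of the three interfaces from the design note above.
interface ReplaySource {
    Iterable<byte[]> read();                     // deterministic-order input items
}

interface ReplayBackend {
    void apply(byte[] item) throws Exception;    // wraps PCLI or ApplyBlocksCommand
}

interface ReplayScheduler {
    void run(ReplaySource source, ReplayBackend backend);
}

// Trivial one-shot scheduler: feed every item to the backend and fail
// fast on the first backend error, per the error-handling requirements.
final class OneShotScheduler implements ReplayScheduler {
    @Override
    public void run(ReplaySource source, ReplayBackend backend) {
        for (byte[] item : source.read()) {
            try {
                backend.apply(item);
            } catch (Exception e) {
                throw new IllegalStateException("backend failed during replay", e);
            }
        }
    }
}
```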
- Consider a config model like:
- source type/location
- replay mode
- rate
- schedule
- bootstrap state
- node ID
- output path
- target round
- expected hash
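The config model above could map to a simple record; field names and types are assumptions for illustration, not a committed schema.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical config model mirroring the fields listed above.
record ReplayConfig(
        String sourceType,            // "local-dir" or "bucket"
        String sourceLocation,
        String mode,                  // "event-stream", "block-stream", or "auto"
        double ratePerSecond,
        Optional<String> schedule,    // e.g. a cron expression; empty = one-shot
        Path bootstrapState,
        long nodeId,
        Path outputPath,
        Optional<Long> targetRound,
        Optional<String> expectedHash) {

    // Minimal validation so malformed configuration fails clearly.
    boolean isValid() {
        return ratePerSecond > 0
                && List.of("event-stream", "block-stream", "auto").contains(mode);
    }
}
```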