feat: TraceBuffer and simulation Replay#263
Conversation
Worked on serialization and deserialization of `dyn SendData` and friends until rustc capitulated. It is now possible to run a simulation and capture the inputs, states, effects, and their responses in a binary CBOR log that can later be deserialized again and used to replay a StageGraph to the same state as the traced simulation run. Note that currently the log must be complete, otherwise initial states will be missing and the replay will fail because stage behaviour will be different. Note also that while serialization can turn any `dyn SendData` into byes, it is not possible to get back the same `dyn SendData` by deserializing a `TraceEntry`. This is impossible because we'd need to know the actual deserialization code for the concrete underlying type (which may have type parameters etc.). So the deserialization result is a generic CBOR `Value` that is then redeserialized in the context where the concrete value is needed, which is also where the concrete type is known and the `Deserialize` instance can be captured. Signed-off-by: Roland Kuhn <rk@rkuhn.info>
WalkthroughAlright mate, here’s the lowdown: This batch of changes is a full-on overhaul of the simulation and messaging framework, adding serde serialization/deserialization everywhere, introducing a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant SimulationBuilder
participant StageGraph
participant TraceBuffer
participant Stage
participant Clock
User->>SimulationBuilder: Build simulation (with trace buffer, clock)
SimulationBuilder->>StageGraph: Wire up stages (with SendData, serde)
User->>SimulationBuilder: Run simulation
SimulationBuilder->>Stage: Send input message (serialized)
Stage->>TraceBuffer: Record Input event (CBOR)
Stage->>Clock: Query current time
Stage->>Stage: Process message, produce effect (serialized)
Stage->>TraceBuffer: Record Suspend/Resume/State events (CBOR)
User->>SimulationBuilder: Replay trace
SimulationBuilder->>TraceBuffer: Hydrate entries, replay events
TraceBuffer->>Stage: Restore state, effects, clock
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (3)
⏰ Context from checks skipped due to timeout of 90000ms (8)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 16
🔭 Outside diff range comments (3)
crates/pure-stage/src/simulation/inputs.rs (2)
34-48:⚠️ Potential issuePossible compile error:
castisn’t onBox<dyn SendData>Inside the
map_err, you’re doinge.0.msg.cast::<Msg>().
UnlessSendData : Anyand you’ve provided an extension method namedcast, this won’t compile (stableBox<dyn Any>usesdowncast).- *e.0.msg.cast::<Msg>().expect("message was just boxed") + *e.0 + .msg + .downcast::<Msg>() + .expect("message was just boxed")If
SendDatadoesn’t inheritAny, add it:pub trait SendData: erased_serde::Serialize + Send + Sync + 'static + std::any::Any {}
41-50: 🛠️ Refactor suggestionThe oneshot
rx.await.ok();swallows send-side errorsIf the callee drops the ACK channel without sending, you’ll silently accept it and report
Ok(()).
Bubble the error so the caller knows the message never landed.- rx.await.ok(); - Ok(()) + rx.await.map_err(|_| SenderError::AckDropped)(Replace
SenderErrorwith whatever suits.)
Keeps the contract watertight like Master Chief’s MJOLNIR, yeah?crates/pure-stage/src/simulation/running.rs (1)
615-637: 🛠️ Refactor suggestionDuplicate clock-resume logic – DRY it up, mate
resume_clock_internalhere is a copy-paste of the one insimulation::resume. Keeping two versions is asking for drift and late-night debugging sessions. Re-export and call the shared helper instead of maintaining this local clone.
🧹 Nitpick comments (28)
crates/pure-stage/src/sender.rs (1)
4-6: Tidy up the doc blurb for extra clarityThe new comment is grand, but a tiny tweak would make it crystal-clear to newcomers: explicitly mention that the generic
Msgmust implementSendData. Keeps the docs in sync with the rest of the crate’s messaging story.simulation/amaru-sim/src/echo/message.rs (2)
48-50: Ye’ve got an unused import chillin’ here
SendDatanever gets a cameo in this test module, so the compiler will nag (or even fail ifunused_importsis denied). Either drop it or slap an#[allow(unused_imports)].- use pure_stage::SendData; + // use pure_stage::SendData; // uncomment when a test needs it
64-67: Hard-codedtype_namestring is brittle
std::any::type_name::<T>()bakes in the crate path; one refactor and this assertion goes boom like a bad barrel in Borderlands. Consider a looser check (e.g.ends_with) or snapshot testing instead.- assert_eq!( - env.typetag_name(), - "amaru_sim::echo::message::Envelope<amaru_sim::echo::message::EchoMessage>" - ); + assert!(env + .typetag_name() + .ends_with("Envelope<amaru_sim::echo::message::EchoMessage>"));Cargo.toml (1)
23-33: Mind the CBOR double-upWe now depend on both
minicbor(legacy) andcbor4ii(new hotness). Having two CBOR libs in the workspace can bloat code size and confuse callers (e.g. which encoder do they use?).Consider consolidating on one crate (preferably
cbor4iisince it brings Serde). If a full switch isn’t feasible yet, drop a TODO so folks know the dual stack is intentional.crates/slot-arithmetic/src/lib.rs (2)
41-45: Serialising error enums: check privacy & bloatDeriving Serde on
SlotArithmeticErrorleaks the full enum variants over the wire. That’s usually fine for tests/replay, but if these errors bubble out to remote callers we might expose internal slot numbers we’d rather keep private.If exposure is a concern, wrap the error in an opaque string or add
#[serde(skip)]on verbose fields.
345-353: Same note forTimeHorizonErrorDitto. Make sure we’re comfy shipping these variants outside the process boundary.
crates/pure-stage/src/stage_ref.rs (1)
21-27: Consider addingEqandHashto round-out the handle’s ergonomicsDeriving/implementing
EqandHash(mirroring thePartialEqyou just added) lets aStageRefbe used as a key in hash-maps or sets—handy for routing tables and the like.
The extra derive costs nothing and won’t change the serialised form because the trait is purely semantic.-#[derive(serde::Serialize, serde::Deserialize)] +#[derive(serde::Serialize, serde::Deserialize, Eq, Hash)]crates/pure-stage/README.md (1)
52-60: Tiny copy-editing nits for smoother reading
- Line 57: “weighs too heavily; therefore a different solution is required.”
- Line 59: “where the compiler can therefore instantiate …”
Just a dab of polish so future readers don’t trip over the prose, mate.
🧰 Tools
🪛 LanguageTool
[style] ~57-~57: Consider an alternative to strengthen your wording.
Context: ...striction that weighs too heavily, thus a different solution is required. The current design shifts...(DIFFERENT_ALTERNATIVE)
[typographical] ~59-~59: The word “thus” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...e concrete target type can be named and where thus the compiler can instantiate a `Deseria...(THUS_SENTENCE)
crates/pure-stage/src/lib.rs (1)
1-13: Publicserdemodule name clashes with upstream crateCalling the local helper-module
serdemeans downstream users now have bothpure_stage::serdeand the canonicalserdecrate in scope. In code it’s easy to fat-fingerserde::and grab the wrong one. A tiny rename (e.g.serde_support) or at least a short doc comment warning consumers would save a future Mick-Dundee sized “that’s not the serde you’re looking for” moment.simulation/amaru-sim/src/simulator/simulate.rs (1)
79-81: RedundantSerialize/Deserializebounds?
pure_stage::SendDataalready promises serialisability, so slappingSerialize + DeserializeOwneddirectly ontoMsgmay be belt-and-suspenders. If callers always passMsg: SendData, consider trimming the extra bounds to keep the generic list from looking like the end-credits of a LOTR movie.crates/pure-stage/src/simulation/blocked.rs (1)
61-66: Consumesselfinassert_breakpointunlike siblings
assert_breakpointtakes ownership whereas the otherassert_*methods borrow. That asymmetry trips muscle memory and prevents chained assertions. Consider taking&selfand returning&Effect(or cloning) to keep the API jazz in the same key.crates/pure-stage/src/simulation/state.rs (1)
19-24:Transitionnow boxes twoSendDataobjects – watch the heap churnSwapping
&dynfor ownedBox<dyn SendData>in the transition signature means every message/state hop allocates. If perf becomes a bottleneck during long replays, maybe switch to&dynor smart-pointer tricks (à laArc<dyn SendData>) to keep the garbage compactor quiet.crates/amaru-consensus/src/consensus/store.rs (1)
206-209: No error propagation on insert failures
BTreeMap::insertcan’t fail, but if we ever migrate to a fallible store (sled, rocksdb, etc.) this silentOk(())blanket will swallow IO errors. Tiny note to future devs: bubble up the underlying error once we switch to real storage.crates/pure-stage/src/logging/layer.rs (1)
23-27: G'day mate, looks like we've got some unfinished business here!The WIP note combined with
#![allow(unused)]is a bit like leaving the debug mode on in your favorite RPG - fine for development, but not great for production. Consider removing these once the implementation is complete, yeah?-// ******************************************************** -// NOTE: This is a work in progress. -// ******************************************************** - -#![allow(unused)]crates/pure-stage/src/tokio.rs (1)
49-55: Hey mate, that emptyadvance_tocould use some context!Like an NPC with no dialogue, an empty method implementation leaves players wondering what's going on. A comment explaining why Tokio's clock can't be advanced would be ace!
impl Clock for TokioClock { fn now(&self) -> Instant { Instant::now() } - fn advance_to(&self, _instant: Instant) {} + fn advance_to(&self, _instant: Instant) { + // No-op: Tokio uses real time which cannot be manipulated + } }crates/pure-stage/src/time.rs (1)
103-127: Love the humor in those error messages, but they're still panic bombs!"290 billion years" - that's longer than the entire Mass Effect timeline! But jokes aside, consider using saturating arithmetic or returning Results instead of panicking.
impl std::ops::Add<Duration> for Instant { type Output = Instant; - #[allow(clippy::expect_used)] - fn add(self, duration: Duration) -> Self { - Instant( - self.0 - .checked_add(duration) - .expect("simulation is not supposed to run for more than 290 billion years"), - ) + fn add(self, duration: Duration) -> Self { + Instant(self.0.saturating_add(duration)) } }crates/pure-stage/src/trace_buffer.rs (1)
119-151: That's some intricate buffer juggling, mate!This push logic is more complex than the inventory management in Resident Evil! While it looks correct, the multiple conditions and edge cases make it a bit hard to follow. Consider adding more inline comments or breaking it into smaller helper methods?
fn push(&mut self, msg: Vec<u8>) { if self.max_size == 0 { return; } self.used_size += msg.len(); + // First pass: remove old messages until we're under max_size + // but keep at least min_entries #[allow(clippy::expect_used)] while self.used_size > self.max_size && self.messages.len() > self.min_entries { // ... existing code ... } + // Second pass: if we can fit by removing just one more entry, do it let one_more = self.messages.front().map(|m| m.len()).unwrap_or(0); // ... rest of the logiccrates/pure-stage/src/serde.rs (1)
195-208: Sweet optimization with that thread-local buffer!Like keeping a stash of health potions, reusing buffers saves allocations. But that
expect()could bite you if someone passes a value that can't be serialized (like a type with a broken Serialize impl).Consider returning a Result:
-pub fn to_cbor<T: serde::Serialize>(value: &T) -> Vec<u8> { +pub fn to_cbor<T: serde::Serialize>(value: &T) -> Result<Vec<u8>, cbor4ii::serde::Error> { thread_local! { static BUFFER: RefCell<Vec<u8>> = const { RefCell::new(Vec::new()) }; } BUFFER.with_borrow_mut(|buffer| { - #[allow(clippy::expect_used)] - to_writer(&mut *buffer, value).expect("serialization should not fail"); + to_writer(&mut *buffer, value)?; let ret = Vec::from(buffer.as_slice()); buffer.clear(); - ret + Ok(ret) }) }crates/pure-stage/src/output.rs (2)
40-46: Debug output leaks concrete type name – deliberate?
typetag_name()lands the full path (incl. module) in logs. Grand for debugging, but it’ll spam traces like Diablo’s loot drops.
If that’s too noisy in production, consider stripping the path or gating behinddebug.
56-66: Silent send failures – decide if that’s kosher
runlogs atdebugand returns()when the channel is closed. If the output stage disappearing is exceptional in your sim, surfacing an error (or at leastwarn!) might save some late-night head-scratching.crates/pure-stage/tests/simulation.rs (1)
84-139: Golden-string snapshot is brittleHard-coding the entire pretty-printed trace means one comma change & your test keels over like a beached koala.
Snapshot-testing crates (insta,expect-test) or selective matching (e.g. assert on variants/order) keep intent while dodging churn.crates/pure-stage/src/simulation/replay.rs (1)
124-146: Status helpersunwrap()on missing stages – convert to polite look-ups
is_running/is_idle/is_failedpanic when the caller passes an unknown stage name. A gracefulfalse(orOption<bool>) would make the API as friendly as a pub-quiz mate.crates/pure-stage/src/effect.rs (1)
125-131: Clock skew risk – using wall-clock for call timeout
now = self.clock.now()bypasses the effect system’sClockeffect, meaning replays driven purely by trace won’t re-emit this instant.
If determinism is king, consider funnelling time via the existingclock()effect so traces capture the read.crates/pure-stage/src/simulation/running.rs (2)
411-414: Missing trace entry for Clock resumeWe resume the stage but don’t
push_resume/push_suspendfor the clock response. Replay might become inconsistent. Consider logging it the same way as other resumes.
398-406: Back-pressure not observable in traceWhen the mailbox is full we queue the
(from,msg)insenders, but no trace entry records this drop/park. If replay fidelity matters, emit a trace event here.crates/pure-stage/src/simulation/resume.rs (1)
93-116: Trace only records Receives
resume_clock_internal(and siblings) don’t touchTraceBuffer, unlikeresume_receive_internal. If replay depends on matching resumes, consider adding trace hooks for the other resume paths as well.crates/pure-stage/src/stagegraph.rs (1)
46-50:dummy_responseleaks if used at scale
oneshot::channel().0allocates a real channel that’s never received. For mass-deserialisation this leaks. PreferSender::new()viatokio::sync::oneshot::Sender::default()or a customNoopSenderthat does nothing.crates/pure-stage/src/simulation.rs (1)
144-148: Expose TraceBuffer sizing knobs
with_trace_bufferjust swaps in a ready buffer. Offering a builder-stylewith_trace_capacity(messages, bytes)would save callers the extra dance and encourage consistent sizing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (41)
Cargo.toml(3 hunks)crates/amaru-consensus/Cargo.toml(1 hunks)crates/amaru-consensus/src/consensus/mod.rs(1 hunks)crates/amaru-consensus/src/consensus/store.rs(4 hunks)crates/amaru-consensus/src/consensus/store_header.rs(1 hunks)crates/amaru-consensus/src/consensus/validate_header.rs(3 hunks)crates/amaru-consensus/src/peer.rs(1 hunks)crates/amaru-kernel/src/lib.rs(1 hunks)crates/amaru-kernel/src/protocol_parameters.rs(1 hunks)crates/amaru/src/stages/ledger.rs(1 hunks)crates/ouroboros-traits/Cargo.toml(1 hunks)crates/ouroboros-traits/src/praos/mod.rs(1 hunks)crates/pure-stage/Cargo.toml(1 hunks)crates/pure-stage/README.md(1 hunks)crates/pure-stage/src/effect.rs(19 hunks)crates/pure-stage/src/lib.rs(1 hunks)crates/pure-stage/src/logging.rs(1 hunks)crates/pure-stage/src/logging/layer.rs(1 hunks)crates/pure-stage/src/output.rs(2 hunks)crates/pure-stage/src/receiver.rs(1 hunks)crates/pure-stage/src/sender.rs(1 hunks)crates/pure-stage/src/serde.rs(1 hunks)crates/pure-stage/src/simulation.rs(10 hunks)crates/pure-stage/src/simulation/blocked.rs(1 hunks)crates/pure-stage/src/simulation/inputs.rs(2 hunks)crates/pure-stage/src/simulation/replay.rs(1 hunks)crates/pure-stage/src/simulation/resume.rs(1 hunks)crates/pure-stage/src/simulation/running.rs(35 hunks)crates/pure-stage/src/simulation/state.rs(2 hunks)crates/pure-stage/src/stage_ref.rs(1 hunks)crates/pure-stage/src/stagegraph.rs(7 hunks)crates/pure-stage/src/time.rs(2 hunks)crates/pure-stage/src/tokio.rs(7 hunks)crates/pure-stage/src/trace_buffer.rs(1 hunks)crates/pure-stage/src/types.rs(4 hunks)crates/pure-stage/src/util.rs(0 hunks)crates/pure-stage/tests/simulation.rs(12 hunks)crates/pure-stage/tests/tokio.rs(1 hunks)crates/slot-arithmetic/src/lib.rs(2 hunks)simulation/amaru-sim/src/echo/message.rs(1 hunks)simulation/amaru-sim/src/simulator/simulate.rs(5 hunks)
💤 Files with no reviewable changes (1)
- crates/pure-stage/src/util.rs
🧰 Additional context used
🧠 Learnings (1)
crates/pure-stage/src/simulation/state.rs (2)
Learnt from: rkuhn
PR: pragma-org/amaru#206
File: crates/pure-stage/src/simulation/state.rs:4-8
Timestamp: 2025-05-05T08:15:24.192Z
Learning: The `State` trait in the pure-stage crate already requires `Send` with its definition: `pub trait State: Any + fmt::Debug + Send + 'static`, making additional `+ Send` bounds redundant when using `Box<dyn State>`.
Learnt from: rkuhn
PR: pragma-org/amaru#206
File: crates/pure-stage/src/simulation/state.rs:4-8
Timestamp: 2025-05-05T08:15:24.192Z
Learning: The `State` trait in the pure-stage crate already requires `Send` with its definition: `pub trait State: Any + fmt::Debug + Send + 'static`, making additional `+ Send` bounds redundant when using `Box<dyn State>`.
🧬 Code Graph Analysis (5)
simulation/amaru-sim/src/echo/message.rs (1)
crates/pure-stage/src/types.rs (1)
typetag_name(43-45)
crates/amaru-consensus/src/consensus/store.rs (3)
crates/amaru-kernel/src/network.rs (1)
fmt(279-286)crates/amaru-consensus/src/consensus/store_block.rs (7)
load_header(84-86)store_header(88-90)get_nonces(96-98)put_nonces(100-106)era_history(108-110)load_block(92-94)store_block(79-82)crates/amaru-stores/src/rocksdb/consensus.rs (14)
load_header(53-58)load_header(128-130)store_header(61-67)store_header(132-134)get_nonces(69-76)get_nonces(136-138)put_nonces(78-84)put_nonces(140-143)era_history(86-88)era_history(145-147)load_block(90-98)load_block(149-151)store_block(100-106)store_block(153-155)
crates/amaru-consensus/src/consensus/validate_header.rs (3)
crates/amaru-consensus/src/consensus/store_header.rs (4)
eq(33-35)new(47-49)default_store(28-30)store(51-57)crates/pure-stage/src/effect.rs (4)
eq(589-661)run(198-198)run(246-248)run(285-287)crates/amaru-ledger/src/state.rs (5)
get_pool(700-710)slot_to_kes_period(712-714)max_kes_evolutions(716-718)latest_opcert_sequence_number(720-723)new(111-131)
crates/pure-stage/src/simulation/blocked.rs (3)
crates/pure-stage/src/types.rs (2)
from(163-165)as_ref(151-153)crates/amaru/src/stages/ledger.rs (1)
name(40-42)crates/pure-stage/src/stage_ref.rs (1)
name(53-55)
crates/pure-stage/src/simulation/running.rs (6)
crates/pure-stage/src/simulation/resume.rs (6)
resume_call_internal(143-174)resume_clock_internal(93-116)resume_external_internal(206-228)resume_receive_internal(10-49)resume_respond_internal(176-204)resume_send_internal(63-91)crates/pure-stage/src/trace_buffer.rs (1)
take(159-161)crates/pure-stage/src/effect.rs (12)
new(54-66)new(264-266)clock(97-102)m(473-473)m(518-518)run(198-198)run(246-248)run(285-287)call(119-152)at_stage(436-446)fmt(45-50)eq(589-661)crates/pure-stage/src/simulation/inputs.rs (3)
new(13-15)new(25-32)sender(34-53)crates/pure-stage/src/simulation.rs (4)
msg(217-218)run(270-315)state(216-216)stage(195-244)crates/pure-stage/src/time.rs (1)
fmt(44-48)
🪛 LanguageTool
crates/pure-stage/README.md
[style] ~57-~57: Consider an alternative to strengthen your wording.
Context: ...striction that weighs too heavily, thus a different solution is required. The current design shifts...
(DIFFERENT_ALTERNATIVE)
[typographical] ~59-~59: The word “thus” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ...e concrete target type can be named and where thus the compiler can instantiate a `Deseria...
(THUS_SENTENCE)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Build on macos-latest with target aarch64-apple-darwin
- GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
- GitHub Check: Build on ubuntu-latest with target riscv32im-risc0-zkvm-elf
- GitHub Check: Build on ubuntu-latest with target x86_64-unknown-linux-gnu
- GitHub Check: Build on ubuntu-latest with target aarch64-unknown-linux-musl
- GitHub Check: Snapshots (preprod, 10.1.4)
- GitHub Check: Sanity
- GitHub Check: Coverage
🔇 Additional comments (27)
crates/pure-stage/src/receiver.rs (1)
8-8: Doc reference aligns with theStageGraphrename — tidy work, mateNothing but a quick spring-clean of the docs, swapping the old link for the shiny new one. Code untouched, compile flags green.
crates/ouroboros-traits/Cargo.toml (1)
17-17: Double-check you’ve got thederivefeature onserdeIf any structs in this crate pull a “Luke, I am your
Serialize/Deserialize” move, you’ll need thederivefeature turned on. Unless the workspace root already enables it, consider:-serde.workspace = true +serde = { workspace = true, features = ["derive"] }Better safe than face-palming at compile time.
crates/amaru-consensus/Cargo.toml (1)
21-21: Same serde song, second verseThe consensus crate also leans on serde. Make sure the
derivemics are plugged in here too, or the band won’t play.-serde.workspace = true +serde = { workspace = true, features = ["derive"] }Give it a whirl and confirm the derives light up.
crates/pure-stage/tests/tokio.rs (1)
24-24: API rename fromsendertoinputslides in smoothlyCall site updated, test still reads crystal clear. Fire it up and the messages should flow like Mario on Rainbow Road.
crates/pure-stage/src/logging.rs (1)
1-1: Module stub added — remember to ship the goods
mod layer;is the door; make surelayer.rs(orlayer/mod.rs) is parked in the same folder so the compiler doesn’t throw a tanty.crates/amaru-kernel/src/protocol_parameters.rs (1)
312-344: Serde derives FTWAdding
Serialize/DeserializetoGlobalParametersslots perfectly into the project-wide CBOR/JSON push. No concerns from this corner.crates/amaru-kernel/src/lib.rs (1)
143-147: Double-check Serde wire format forPointDeriving Serde gives us JSON/Bincode/etc. for free, lovely. Just be sure any on-disk/on-wire consumers weren’t relying on the hand-rolled CBOR layout defined below—Serde’s default sequence encoding might not match.
crates/amaru-consensus/src/peer.rs (1)
16-18:Peeris now Serde-friendly – nice oneNothing else to add; this keeps the consensus layer in step with the new trace/replay goodies.
crates/amaru/src/stages/ledger.rs (1)
116-116: All good – semantic no-opSwapping
return Err(err)for the expression formErr(err)is kosher inside thematch; no behaviour or early-return semantics change.crates/ouroboros-traits/src/praos/mod.rs (1)
51-58: ConfirmHash<32>implementsserde::{Serialize, Deserialize}Deriving Serde on
Noncesis sweet as, but it implicitly requires every field—includingtail: Hash<32>—to implement the same traits. IfHash(fromamaru_kernel) hasn’t picked up those derives in this PR, the build will face-plant.Would you run a quick grep or compile check to ensure
amaru_kernel::HashcarriesSerialize+Deserialize?
If it’s missing, we can either add derives there or gate the Serde impl behind a feature flag.Cargo.toml (2)
70-71:typetagbringsinventory– ensure the linker’s happy
typetagdepends oninventory, which in turn uses linker section gymnastics. In-tree crates that build forno_stdor exotic targets sometimes hiccup on this. Nothing to fix now, just flagging for future cross-compiles.
88-96: Dev-deps tidy-up looks aceLove the addition of
pretty_assertions; diff-driven tests read like a graphic novel now.crates/amaru-consensus/src/consensus/mod.rs (1)
78-86: Serde skip attr looks good, but double-check path
Span::noneexists intracing0.1.40, so thedefault = "Span::none"should compile. If CI pops a “can’t find function” error, switch to the fully-qualified path::tracing::Span::none.Also applies to: 91-92
crates/pure-stage/src/stage_ref.rs (1)
29-33:PartialEqignores the phantom types – double-check the intentionRight now,
StageRef<FooMsg, FooState>will compare equal to a differentStageRef<FooMsg, BarState>if the names match.
That’s probably fine (the phantom types are compile-time only), but if the state parameter is used to encode “channel kind” elsewhere this could become a sneaky foot-gun.
Just flagging it so you’re sure the semantics line up with your mental model.crates/pure-stage/Cargo.toml (1)
15-24: 🛠️ Refactor suggestionEnable
serde/stdto avoid duplicateallocfeature gymnasticsYou’re explicitly turning on
alloc+rc, but notstd. That’s perfectly kosher forno_stdcrates, yetpure-stageis clearlystd-bound (thread pools,tokio, etc.).
Turning on thestdfeature keeps you on the happy path and avoids accidentally compiling two copies of serde (onestd, onealloc).-serde = { workspace = true, features = ["derive", "alloc", "rc"] } +serde = { workspace = true, features = ["derive", "std", "alloc", "rc"] }Likely an incorrect or invalid review comment.
crates/pure-stage/src/lib.rs (1)
27-28: Re-exportingtypetagleaks a dependency
pub use typetag;turnstypetaginto part of your public API. That locks you into its semver story and forces every downstream crate to pull exactly the same version. Most folks hide this behind their own new-type or re-export it withpub(crate)visibility. Double-check if you really want to open that kimono.simulation/amaru-sim/src/simulator/simulate.rs (1)
341-342: Doublewithout_state()call might strip too much
stageis already aStageRef<_, Void>afterwire_up; another.without_state()here could yieldStageRef<_, ()>, losing the niceVoidphantom type we rely on elsewhere. Probably harmless, but worth a quick eyeball to be sure we’re not accidentally changing dispatch behaviour.crates/pure-stage/src/logging/layer.rs (1)
224-275: Bonzer test coverage there, mate!Love the thorough testing of different Rust types - tuples, enums, options, the whole shebang! It's like checking all the different weapon types in Monster Hunter. The assertions on exact byte patterns ensure the CBOR format stays consistent.
crates/pure-stage/src/tokio.rs (2)
1-5: Top-notch documentation, legend!Love how you're guiding folks to write reusable stage construction - it's like teaching players to build modular bases in Minecraft. This kind of architectural guidance is gold!
80-152: Smooth as butter migration to SendData!The switch from
Message/StatetoSendDatais cleaner than a headshot in CS:GO. This unification makes the API more consistent and easier to work with. Well played!crates/pure-stage/src/time.rs (1)
129-148: Solid implementation of the time system!The
LazyLockfor EPOCH is clever - like setting a permanent spawn point in your game world. The tests cover all the edge cases nicely too.crates/pure-stage/src/trace_buffer.rs (1)
23-68: Clever dual-enum pattern here, champ!Having both owned and ref variants is like having both single-player and co-op modes - gives you flexibility without sacrificing performance. The serialization attributes are spot on too!
crates/pure-stage/src/serde.rs (1)
89-173: Ripper test suite for the serialization code!Testing tuples, structs, and enums - you've covered more ground than a completionist in The Witcher 3! The round-trip tests ensure the serialization is rock solid.
crates/amaru-consensus/src/consensus/validate_header.rs (1)
150-158: Double-check trait bounds forResult<Nonces, NoncesError>
runwraps theResultinBox<dyn SendData>. Make sure bothNoncesandNoncesErrorimplementSendData; otherwise deserialisation during replay will faceplant.crates/pure-stage/tests/simulation.rs (1)
175-180: Downcast in breakpoint matcher usesAny– double-checkSendDatais'staticThe closure relies on
(&*msg as &dyn Any).downcast_ref. Ensure everySendDatapassed here is'static; otherwise the cast will silently fail and the breakpoint won’t trigger.crates/pure-stage/src/effect.rs (1)
505-525:assert_callrelies on downcast then moves the message – beware double-use
extract(*m.cast::<Msg2>()…)consumes the boxed message. Subsequent equality checks (e.g. inassert_send) on the same effect will now fail.
Maybe clone the value first or leave the original in place.crates/pure-stage/src/simulation/running.rs (1)
172-175: Zero-delay wake-ups disallowed
assert!(duration > Duration::ZERO)forbids instantaneous waits. That’s fine if intentional, but if callers passDuration::ZERO(e.g. to yield) this will panic. Double-check the contract.
| impl PartialEq for StoreHeader { | ||
| fn eq(&self, _other: &Self) -> bool { | ||
| true | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
PartialEq always returns true – this can bite
Two completely different StoreHeaders will compare equal.
If somebody sticks them in a Vec::dedup() or uses them as map keys, chaos will ensue faster than Mario on a Rainbow Road shortcut.
Either remove the impl or compare on an identity (e.g. Arc::as_ptr(&store)).
🤖 Prompt for AI Agents
In crates/amaru-consensus/src/consensus/store_header.rs around lines 32 to 36,
the PartialEq implementation for StoreHeader always returns true, causing all
instances to be considered equal incorrectly. Fix this by either removing the
custom PartialEq impl to use the derived or default behavior, or implement a
proper equality check based on a unique identity such as comparing pointer
addresses with Arc::as_ptr(&store) to ensure only truly identical StoreHeader
instances compare equal.
| #[derive(serde::Serialize, serde::Deserialize)] | ||
| pub struct StoreHeader { | ||
| #[serde(skip, default = "default_store")] | ||
| pub store: Arc<Mutex<dyn ChainStore<Header>>>, | ||
| } | ||
| fn default_store() -> Arc<Mutex<dyn ChainStore<Header>>> { | ||
| Arc::new(Mutex::new(super::store::FakeStore::default())) | ||
| } |
There was a problem hiding this comment.
Serialising drops the actual store – replay may load with an empty ledger
#[serde(skip, default = "default_store")] swaps the real Arc<Mutex<dyn ChainStore>> for a fresh in-memory FakeStore on deserialise.
That’s grand for unit tests, but in a live replay you’ll silently lose all persisted headers and later lookups will come up blank.
Consider emitting a stub handle that can re-attach to the real store (e.g. via a registry key) or, at minimum, loudly logging the downgrade so folks don’t wonder why the chain’s vanished.
🤖 Prompt for AI Agents
In crates/amaru-consensus/src/consensus/store_header.rs around lines 23 to 30,
the current serde attribute skips serializing the actual store and replaces it
with a default FakeStore on deserialization, causing loss of the real store data
during replay. To fix this, modify the serialization logic to emit a stub or
identifier that can be used to re-attach to the real store on deserialization,
such as a registry key or handle. Additionally, add explicit logging when
falling back to the FakeStore to alert users that the real store was not
restored, preventing silent data loss.
| #[derive(Debug, PartialEq, serde::Serialize, serde::Deserialize)] | ||
| struct State(u64, StageRef<Envelope<EchoMessage>, Void>); | ||
|
|
There was a problem hiding this comment.
State derives Serialize but contains StageRef
We’re deriving Serde for State, yet it embeds a StageRef. Unless StageRef now carries #[typetag::serde] goodies, this will fail to compile (or worse, serialise into nonsense during replay). If the struct is only used in-process, drop the derive to dodge unnecessary trait churn.
🤖 Prompt for AI Agents
In simulation/amaru-sim/src/simulator/simulate.rs around lines 298 to 300, the
State struct derives Serialize but contains a StageRef field which likely does
not support serialization. To fix this, remove the serde::Serialize and
serde::Deserialize derives from State to avoid compilation errors or incorrect
serialization, assuming State is only used in-process.
There was a problem hiding this comment.
all code in this PR does indeed compile
There was a problem hiding this comment.
@rkuhn Fair dinkum, mate! You're absolutely right - I jumped the gun there like a kangaroo on a trampoline! 🦘
Looking at the broader picture of this PR, I can see that StageRef was actually given the full serde treatment as part of the TraceBuffer functionality. Cheers for the correction - sometimes even us code-reviewing legends need a reality check!
The serialization support across the board here is actually quite comprehensive, which makes perfect sense for the replay functionality you've built. My bad for not connecting those dots properly.
✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#263
File: simulation/amaru-sim/src/simulator/simulate.rs:298-300
Timestamp: 2025-06-14T16:31:53.110Z
Learning: StageRef in the pure-stage crate supports serde serialization and deserialization (derives serde::Serialize and serde::Deserialize), enabling it to be used in structs that also derive these traits for TraceBuffer and replay functionality.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| Idle(Box<dyn SendData>), | ||
| Running(BoxFuture<'static, anyhow::Result<Box<dyn SendData>>>), | ||
| Failed(String), | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Failed(String) loses error context
Migrating from Failed unit to Failed(String) is grand, but we’re throwing away the original anyhow::Error (backtrace, source chain, the whole shebang). A boxed anyhow::Error or Box<dyn std::error::Error + Send + Sync> would keep detective Pikachu’s evidence intact.
🤖 Prompt for AI Agents
In crates/pure-stage/src/simulation/state.rs around lines 33 to 36, the Failed
variant currently stores a String, which loses the original error context such
as backtrace and source chain. Change the Failed variant to store a boxed
anyhow::Error or Box<dyn std::error::Error + Send + Sync> instead of String to
preserve full error information for better debugging.
There was a problem hiding this comment.
We can’t do that because trait objects cannot be cloned — and anyhow::Error may look magical but it isn’t an exception.
There was a problem hiding this comment.
@rkuhn Ah, absolutely spot on! You're bang on about the cloning constraint, mate. 🎯
In a simulation system where you need to snapshot, serialize, and replay states, trait objects are indeed a no-go. Even anyhow::Error with all its bells and whistles still can't escape the fundamental limitation that trait objects aren't Clone. When you're building a deterministic replay system like this TraceBuffer, the string representation is the pragmatic choice - you get the error info you need while keeping everything cloneable and serializable.
Cheers for the clarification! The architectural constraints of the simulation framework definitely trump the ideal of preserving full error context here.
✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#263
File: crates/pure-stage/src/simulation/state.rs:33-36
Timestamp: 2025-06-14T16:36:04.474Z
Learning: In simulation and replay systems that require cloneable and serializable states, error types must often be converted to String rather than stored as trait objects (like Box<dyn Error> or anyhow::Error) because trait objects cannot be cloned, which breaks the snapshotting and replay functionality needed for deterministic simulation.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| impl PartialEq for ValidateHeader { | ||
| fn eq(&self, _other: &Self) -> bool { | ||
| true | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Equality implementation always returns true – steer clear of accidental foot-guns
PartialEq that unconditionally yields true can mask logic errors and derail HashMap/Set behaviour faster than a choc-top on a summer day.
Consider either removing PartialEq or providing a meaningful comparison of the members you actually care about.
🤖 Prompt for AI Agents
In crates/amaru-consensus/src/consensus/validate_header.rs around lines 67 to
71, the PartialEq implementation for ValidateHeader always returns true, which
can cause incorrect equality checks and break collections relying on it. Fix
this by either removing the PartialEq implementation if not needed or
implementing eq to compare the relevant fields of ValidateHeader that determine
equality meaningfully.
| fn default_ledger() -> Arc<dyn HasStakeDistribution> { | ||
| struct Fake; | ||
| impl HasStakeDistribution for Fake { | ||
| fn get_pool( | ||
| &self, | ||
| _slot: amaru_kernel::Slot, | ||
| _pool: &amaru_kernel::PoolId, | ||
| ) -> Option<amaru_ouroboros::PoolSummary> { | ||
| unimplemented!() | ||
| } | ||
|
|
||
| fn slot_to_kes_period(&self, _slot: amaru_kernel::Slot) -> u64 { | ||
| unimplemented!() | ||
| } | ||
|
|
||
| fn max_kes_evolutions(&self) -> u64 { | ||
| unimplemented!() | ||
| } | ||
|
|
||
| fn latest_opcert_sequence_number(&self, _pool: &amaru_kernel::PoolId) -> Option<u64> { | ||
| unimplemented!() | ||
| } | ||
| } | ||
| Arc::new(Fake) | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
default_ledger panics on every method – supply a harmless stub instead
These unimplemented!() calls will blow up at replay time if any code path touches the fake ledger.
Returning sensible fall-backs (e.g. None, 0) keeps replays crash-free while still signalling “not a real ledger”.
🤖 Prompt for AI Agents
In crates/amaru-consensus/src/consensus/validate_header.rs between lines 73 and
97, the default_ledger function's Fake struct methods use unimplemented!() which
causes panics if called. Replace these with harmless stub implementations that
return safe default values like None for Option returns and 0 for u64 returns to
avoid crashes during replay while indicating this is a non-functional stub.
| pub fn run_trace(&mut self, trace: Vec<TraceEntry>) -> anyhow::Result<()> { | ||
| let mut trace_buffer = TraceBuffer::new(0, 0); | ||
|
|
||
| for (idx, entry) in trace.into_iter().enumerate() { |
There was a problem hiding this comment.
Creating a TraceBuffer with (0, 0) – is that valid?
If TraceBuffer::new expects positive capacity or drop-count, this will slap you with a divide-by-zero or similar nasties.
Probably safer to reuse the shared buffer or pass a minimal non-zero size.
🤖 Prompt for AI Agents
In crates/pure-stage/src/simulation/replay.rs around lines 40 to 43, the
TraceBuffer is being created with zero values for capacity and drop-count, which
may cause runtime errors if these parameters must be positive. To fix this,
initialize TraceBuffer with a minimal non-zero size or reuse an existing shared
buffer to avoid potential divide-by-zero or similar issues.
There was a problem hiding this comment.
No, those arguments ensure a buffer that will quickly drop anything you put in.
There was a problem hiding this comment.
@rkuhn Cheers mate! That makes perfect sense - having a drop-happy buffer during replay means you're not actually storing new trace entries, just using the buffer as a temporary mechanism for the replay process. Brilliant design choice, and thanks for setting me straight on that one!
✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#263
File: crates/pure-stage/src/simulation/replay.rs:40-43
Timestamp: 2025-06-14T16:40:23.307Z
Learning: TraceBuffer::new(0, 0) creates a buffer that will quickly drop anything put into it. This is intentional behavior, particularly useful in replay scenarios where you don't want to store new trace entries but need the buffer for temporary processing.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| *effect.lock() = Some(Right(response)); | ||
| let result = pin.as_mut().poll(&mut Context::from_waker(Waker::noop())); | ||
|
|
||
| if let Poll::Ready(result) = result { | ||
| match result { | ||
| Ok(state) => { | ||
| trace_buffer.push_state(&name, &state); | ||
| data.state = StageState::Idle(state); |
There was a problem hiding this comment.
Clear the air-lock after polling
We shove a Right(response) into effect.lock() before polling, but when the future finishes Ready we never wipe it. The next poll (or another stage) could stumble over stale data in the air-lock. Add a effect.lock().take(); after the Poll::Ready branch.
🤖 Prompt for AI Agents
In crates/pure-stage/src/simulation/running.rs around lines 868 to 875, after
polling the future and handling the Poll::Ready result, the effect lock still
holds the Right(response) value which can cause stale data issues on subsequent
polls. To fix this, add a call to effect.lock().take() immediately after
processing the Poll::Ready branch to clear the stored value and prevent stale
data from affecting future operations.
There was a problem hiding this comment.
No, the protocol is that the other side will have taken the value out (a reference would be of no use)
There was a problem hiding this comment.
@rkuhn Cheers mate! Thanks for clarifying the air-lock protocol - I see now that it's designed as a proper handshake where the stage consumes the value during polling, not just peeking at it. That's a much cleaner design than having to manually mop up afterwards. Appreciate you setting the record straight on how the communication protocol actually works! 🍻
✏️ Learnings added
Learnt from: rkuhn
PR: pragma-org/amaru#263
File: crates/pure-stage/src/simulation/running.rs:868-875
Timestamp: 2025-06-14T16:41:13.041Z
Learning: In the pure-stage simulation framework, the effect air-lock protocol is designed so that when a stage is polled, the stage implementation consumes/takes the value from the effect lock during polling. There's no need to manually clear the effect lock after Poll::Ready because "the other side will have taken the value out" - this is by design, not a bug.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| impl<T: SendData> PartialEq for MpscSender<T> { | ||
| fn eq(&self, _other: &Self) -> bool { | ||
| true | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
PartialEq always ‑-true-- : looks dodgy
impl PartialEq for MpscSender<T> (and Receiver) returns true unconditionally. Two distinct channel handles compare equal, which can mask bugs in state comparisons or test assertions. Either implement a meaningful equality (e.g. compare as_ptr()) or drop PartialEq.
🤖 Prompt for AI Agents
In crates/pure-stage/src/types.rs around lines 192 to 197, the PartialEq
implementation for MpscSender<T> always returns true, causing distinct channel
handles to compare equal incorrectly. Fix this by implementing a meaningful
equality check, such as comparing the internal pointers using as_ptr(), or
remove the PartialEq implementation entirely if meaningful comparison is not
feasible.
|
|
||
| use super::DecodedChainSyncEvent; | ||
|
|
||
| #[derive(serde::Serialize, serde::Deserialize)] |
There was a problem hiding this comment.
@abailly @stevana We need to get rid of all these active pieces of state within the pure stages, all effects need to be moved to ExternalEffect. This will then mean that we only need to deal with skipping this store when persisting the ExternalEffect — deserialization is optional for those because it is not needed for replaying a trace.
| } | ||
|
|
||
| #[derive(Clone)] | ||
| #[derive(Clone, serde::Serialize, serde::Deserialize)] |
There was a problem hiding this comment.
This is polluted because it is used a “state” in a pure stage, which is of course not how it is supposed to be.
Signed-off-by: Roland Kuhn <rk@rkuhn.info>
Signed-off-by: Roland Kuhn <rk@rkuhn.info>
Worked on serialization and deserialization of
dyn SendDataandfriends until rustc capitulated. It is now possible to run a simulation
and capture the inputs, states, effects, and their responses in a binary
CBOR log that can later be deserialized again and used to replay a
StageGraph to the same state as the traced simulation run.
Note that currently the log must be complete, otherwise initial states
will be missing and the replay will fail because stage behaviour will be
different.
Note also that while serialization can turn any
dyn SendDataintobyes, it is not possible to get back the same
dyn SendDatabydeserializing a
TraceEntry. This is impossible because we'd need toknow the actual deserialization code for the concrete underlying type
(which may have type parameters etc.). So the deserialization result is
a generic CBOR
Valuethat is then redeserialized in the context wherethe concrete value is needed, which is also where the concrete type is
known and the
Deserializeinstance can be captured.Signed-off-by: Roland Kuhn rk@rkuhn.info
Summary by CodeRabbit
New Features
Refactor
MessageandStatetraits with a unifiedSendDatatrait supporting serialization and dynamic casting.SendDataand require serialization and deserialization traits.Clocktrait object throughout the simulation stack.SendDataand the new clock abstraction.Documentation
Style
Tests