fix: remove clones and allocations when tracing sync effects#552
Conversation
Signed-off-by: Roland Kuhn <rk@rkuhn.info>
WalkthroughThis PR replaces the SyncEffectBox external-sync path with a trace-buffer-driven flow, introduces ExternalEffectSync, removes many effect Clone derives, adds constructors and ExternalEffectSync impls for several effects, and updates external_sync signatures and wiring across pure-stage and amaru-consensus. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Caller
participant Eff as Effects
participant TB as TraceBuffer
participant Ext as ExternalEffect
rect rgb(245,250,250)
note over Eff,TB: New synchronous external-effect flow (ExternalEffectSync)
App->>Eff: external_sync(effect: T: ExternalEffectSync)
Eff->>TB: push_suspend_external(at_stage, effect)
TB->>TB: store serialized suspend entry
alt replay available
Eff->>TB: validate & consume fetch_replay, return response from replay
else live execution
Eff->>Ext: execute effect synchronously
Ext-->>Eff: response (SendData)
end
Eff->>TB: push_resume_external(stage, response)
TB->>TB: store serialized resume entry
Eff-->>App: return response
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Areas to focus on:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: Roland Kuhn <rk@rkuhn.info>
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/pure-stage/src/trace_buffer.rs (1)
95-96: Mate, there's a ghost in the machine here – stale comment about a field that's done a runner!The comment at lines 95-96 mentions "the runnable field in the Resume case" but if you squiz at the
TraceEntry::Resumevariant (lines 48-50), there's norunnablefield anymore. Plus, line 101 uses the..pattern which suggests you're ignoring extra fields, but there aren't any!This is like keeping the "previously on..." segment after you've deleted those scenes, yeah? Clean house and remove both the outdated comment and the unnecessary
..pattern.Apply this diff to tidy things up:
impl Debug for TraceEntry { - /// This debug instance does not output the runnable field in the Resume case. - /// That field is only useful for display purposes. fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result { match self { TraceEntry::Suspend(effect) => f.debug_tuple("Suspend").field(&effect).finish(), TraceEntry::Resume { - stage, response, .. + stage, response, } => f .debug_struct("Resume") .field("stage", stage) .field("response", response)Also applies to: 100-101
🧹 Nitpick comments (1)
crates/pure-stage/src/trace_buffer.rs (1)
389-421: Nice one on the serialization equivalence test!Verifying that
TraceEntryRefRef,TraceEntryRef, andTraceEntryall serialize to the same CBOR bytes is solid – it proves your zero-copy paths produce identical traces. The JSON assertion is also handy for eyeballing the structure.That said, this test only covers the
Suspend(External)path. Since you've addedpush_resume_externalas well, chuck in a test case for theResume(ExternalResponse)variant to keep coverage comprehensive. Think of it as getting the full achievement set, not just the main quest.Consider adding a test case for
Resume(ExternalResponse):#[test] fn test_resume_external_serialization() { let response = Box::new(42u32) as Box<dyn SendData>; let trr = TraceEntryRefRef::Resume { stage: &Name::from("test"), response: StageResponseRef::ExternalResponse(response.as_ref()), }; let rr = to_cbor(&trr); let r = to_cbor(&TraceEntryRef::Resume { stage: &Name::from("test"), response: &StageResponse::ExternalResponse(response.clone()), }); let t = to_cbor(&TraceEntry::Resume { stage: Name::from("test"), response: StageResponse::ExternalResponse(response), }); assert_eq!(rr, r); assert_eq!(rr, t); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
crates/pure-stage/src/trace_buffer.rs(6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-14T16:31:53.134Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: simulation/amaru-sim/src/simulator/simulate.rs:298-300
Timestamp: 2025-06-14T16:31:53.134Z
Learning: StageRef in the pure-stage crate supports serde serialization and deserialization (derives serde::Serialize and serde::Deserialize), enabling it to be used in structs that also derive these traits for TraceBuffer and replay functionality.
Applied to files:
crates/pure-stage/src/trace_buffer.rs
📚 Learning: 2025-06-14T16:40:23.328Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: crates/pure-stage/src/simulation/replay.rs:40-43
Timestamp: 2025-06-14T16:40:23.328Z
Learning: TraceBuffer::new(0, 0) creates a buffer that will quickly drop anything put into it. This is intentional behavior, particularly useful in replay scenarios where you don't want to store new trace entries but need the buffer for temporary processing.
Applied to files:
crates/pure-stage/src/trace_buffer.rs
🧬 Code graph analysis (1)
crates/pure-stage/src/trace_buffer.rs (3)
crates/pure-stage/src/effect.rs (3)
at_stage(796-806)new(79-95)new(376-378)crates/pure-stage/src/serde.rs (2)
to_cbor(313-324)new(30-32)crates/pure-stage/src/simulation/simulation_builder.rs (1)
stage(183-236)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Benches
- GitHub Check: Test the CLI
- GitHub Check: Build riscv32
- GitHub Check: Build x86_64/windows
- GitHub Check: Build aarch64/macos
- GitHub Check: Build aarch64/linux
- GitHub Check: Build x86_64/linux
- GitHub Check: End-to-end snapshot tests (preprod, 1, 10.1.4)
- GitHub Check: Test coverage
- GitHub Check: Analyze (rust)
🔇 Additional comments (2)
crates/pure-stage/src/trace_buffer.rs (2)
219-242: Absolutely mint work on the zero-copy serialization helpers!These non-owning helper types (
TraceEntryRefRef,EffectRef,StageResponseRef) are like the Matrix – you're dodging allocations left and right while keeping the same serialization format. This aligns perfectly with the PR's goal of removing clones when tracing sync effects.The fact that you're wrapping external effects without taking ownership is chef's kiss – no unnecessary boxing or cloning, just references all the way down. This is exactly what the doctor ordered for performance-sensitive tracing code.
271-276: These new external push methods are pure gold!
push_suspend_externalandpush_resume_externalgive you direct paths for serializing external effects without the allocation overhead. The use of the helper types (EffectRef,StageResponseRef) means you're capturing just the references and letting CBOR do its thing without materializing intermediate owned values.This is textbook optimization – keep the data where it lives and serialize directly from it. No clones, no boxes, no worries!
Also applies to: 283-288
Signed-off-by: etorreborre <etorreborre@yahoo.com> Signed-off-by: Roland Kuhn <rk@rkuhn.info>
Signed-off-by: Roland Kuhn <rk@rkuhn.info>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
simulation/amaru-sim/tests/simulation.rs (1)
122-130: Consider applying the same pattern to keep things consistent.You've streamlined
get_traces_at, butget_args_atis still doing the old intermediatePath::new(&path)dance. If you apply the same direct canonicalization pattern here, both functions would vibe together like a perfectly synced co-op run.Apply this diff to match the pattern in
get_traces_at:fn get_args_at(at: At) -> anyhow::Result<Args> { let path = format!("../../target/tests/{at}/args.json"); - let path = Path::new(&path); let path = - fs::canonicalize(path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?; + fs::canonicalize(&path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?; let data = fs::read(&path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?; let args: Args = serde_json::from_slice(data.as_slice())?; Ok(args) }crates/pure-stage/src/simulation/replay.rs (1)
61-111: Complex index tracking for fetch_replay - verify the arithmetic.The changes here introduce manual index tracking because
poll_stagemight consume trace entries viafetch_replay. Let me walk through the logic:
- Line 90:
remainingcaptures the trace length beforeset_fetch_replay- Line 91: The trace iterator is moved into
trace_buffer- Lines 93-94:
poll_stageprocesses and potentially consumes entries fromfetch_replay- Lines 96-100: The trace iterator is restored
- Line 101: Index is adjusted:
idx += remaining - trace.as_slice().len()This arithmetic assumes that the difference in slice lengths represents the number of entries consumed. But there's a subtle issue here - when you call
set_fetch_replay(trace)at line 91, you're moving the iterator, not the underlying Vec. After callingtake_fetch_replay(), the iterator might have advanced, soas_slice().len()gives you the remaining entries, not the original count.So the formula
remaining - trace.as_slice().len()should correctly calculate consumed entries. But I reckon it's worth adding a comment explaining this because it's not immediately obvious!Also, the manual
idx += 1at line 162 looks correct for advancing through non-Resume entries.Consider adding a clarifying comment:
+ // Capture how many entries remain before poll_stage potentially consumes some via fetch_replay let remaining = trace.as_slice().len(); self.trace_buffer.lock().set_fetch_replay(trace); let effect = poll_stage(&self.trace_buffer, data, stage, response, &self.effect); trace = self .trace_buffer .lock() .take_fetch_replay() .ok_or_else(|| anyhow::anyhow!("idx {}: no fetch replay found", idx))?; + // Calculate how many entries were consumed by poll_stage's external_sync calls idx += remaining - trace.as_slice().len();crates/pure-stage/src/trace_buffer.rs (2)
231-254: Non-owning serialization helpers - functional but naming could be clearer.The helper structs
TraceEntryRefRef,EffectRef, andStageResponseRefenable serialization without consuming the data, which is necessary for the external sync path. The implementation is sound.That said,
TraceEntryRefRefis a bit of a tongue-twister, innit? Consider a more descriptive name likeTraceEntryBorrowedSerorTraceEntrySerRefto make the intent clearer. But this is a nitpick - the current implementation works fine!Optional naming improvement:
-enum TraceEntryRefRef<'a> { +enum TraceEntryBorrowedSer<'a> {
453-462: Clever rotate_right trick in find_next - verify the logic.The
find_nexthelper at lines 453-462 is doing something a bit sneaky:
- Line 459: Find the index of the matching entry
- Line 460: Rotate the slice
[..=idx]right by 1, moving the found entry to position 0- Line 461: Pop the entry from the iterator with
next()This is a clever way to extract an entry from the middle of the iterator without allocating a new collection. But mate, this logic is quite subtle! Let me trace through an example:
- If
log = [A, B, C, D]andidx = 2(element C matches)log[..=2] = [A, B, C]- After
rotate_right(1):[C, A, B, D](C moved to front)fetch_replay.next()returnsSome(C)Actually... hang on. I think there's a potential issue here. After rotation, the iterator's position hasn't moved -
as_mut_slice()gives you the remaining elements, but the iterator's internal position doesn't change with slice manipulation. So callingnext()should indeed return the first element of the slice.But I'm not 100% convinced this is correct in all cases. It'd be worth adding a unit test to verify this works as expected, especially with edge cases like
idx = 0oridx = len - 1.Add a unit test to verify the rotation logic:
#[test] fn test_find_next_rotation() { let entries = vec![ TraceEntry::Clock(Instant::now()), TraceEntry::Suspend(Effect::Clock { at_stage: Name::from("test") }), TraceEntry::Suspend(Effect::External { at_stage: Name::from("test"), effect: Box::new(()), }), TraceEntry::Clock(Instant::now()), ]; let mut iter = entries.into_iter(); let result = find_next( &mut iter, |e| matches!(e, TraceEntry::Suspend(Effect::External { .. })), |e| e, ); assert!(matches!(result, Some(TraceEntry::Suspend(Effect::External { .. })))); // Verify the iterator is positioned correctly after extraction assert_eq!(iter.len(), 1); // Only the final Clock entry remains }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
crates/amaru-consensus/src/consensus/effects/ledger_effects.rs(5 hunks)crates/amaru-consensus/src/consensus/effects/metrics_effects.rs(3 hunks)crates/amaru-consensus/src/consensus/effects/store_effects.rs(30 hunks)crates/pure-stage/src/effect.rs(5 hunks)crates/pure-stage/src/simulation/replay.rs(3 hunks)crates/pure-stage/src/simulation/running.rs(3 hunks)crates/pure-stage/src/simulation/simulation_builder.rs(2 hunks)crates/pure-stage/src/trace_buffer.rs(9 hunks)simulation/amaru-sim/tests/simulation.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/pure-stage/src/simulation/simulation_builder.rs
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-04-22T09:18:19.893Z
Learnt from: abailly
Repo: pragma-org/amaru PR: 195
File: simulation/amaru-sim/src/simulator/mod.rs:167-182
Timestamp: 2025-04-22T09:18:19.893Z
Learning: In the Amaru consensus pipeline refactor, ValidateHeader::handle_roll_forward returns a Result<PullEvent, ConsensusError>, not ValidateHeaderEvent as might be expected from the older code structure.
Applied to files:
crates/amaru-consensus/src/consensus/effects/ledger_effects.rs
📚 Learning: 2025-06-14T16:31:53.134Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: simulation/amaru-sim/src/simulator/simulate.rs:298-300
Timestamp: 2025-06-14T16:31:53.134Z
Learning: StageRef in the pure-stage crate supports serde serialization and deserialization (derives serde::Serialize and serde::Deserialize), enabling it to be used in structs that also derive these traits for TraceBuffer and replay functionality.
Applied to files:
crates/amaru-consensus/src/consensus/effects/ledger_effects.rscrates/amaru-consensus/src/consensus/effects/store_effects.rscrates/pure-stage/src/simulation/running.rscrates/pure-stage/src/trace_buffer.rscrates/pure-stage/src/simulation/replay.rscrates/pure-stage/src/effect.rs
📚 Learning: 2025-05-12T14:21:27.470Z
Learnt from: stevana
Repo: pragma-org/amaru PR: 210
File: simulation/amaru-sim/src/simulator/simulate.rs:264-277
Timestamp: 2025-05-12T14:21:27.470Z
Learning: The team plans to replace the out-of-process test in `simulation/amaru-sim/src/simulator/simulate.rs` with an in-process NodeHandle implementation in the future, eliminating the need for hard-coded binary paths (`../../target/debug/echo`) and making tests more reliable.
Applied to files:
simulation/amaru-sim/tests/simulation.rscrates/pure-stage/src/simulation/running.rs
📚 Learning: 2025-08-12T12:28:24.027Z
Learnt from: etorreborre
Repo: pragma-org/amaru PR: 372
File: simulation/amaru-sim/src/simulator/mod.rs:410-412
Timestamp: 2025-08-12T12:28:24.027Z
Learning: In the Amaru project, panic statements are acceptable in simulation/test code (like amaru-sim crate) as they help identify configuration issues quickly during development, rather than needing proper error handling like production code.
Applied to files:
simulation/amaru-sim/tests/simulation.rs
📚 Learning: 2025-08-20T13:02:25.763Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 387
File: crates/amaru-stores/src/lib.rs:40-40
Timestamp: 2025-08-20T13:02:25.763Z
Learning: In the amaru-stores crate, amaru_slot_arithmetic types like Epoch and EraHistory are used throughout the main crate code in modules like in_memory/mod.rs, rocksdb/consensus.rs, and rocksdb/ledger/columns/, not just in tests. This means amaru-slot-arithmetic should be a regular dependency, not a dev-dependency.
Applied to files:
crates/amaru-consensus/src/consensus/effects/store_effects.rs
📚 Learning: 2025-02-03T11:15:22.640Z
Learnt from: abailly
Repo: pragma-org/amaru PR: 75
File: crates/amaru/src/consensus/mod.rs:164-165
Timestamp: 2025-02-03T11:15:22.640Z
Learning: In the Amaru project, chain selection operations (roll_forward and rollback) should use separate result types to leverage the type system for preventing impossible states, rather than using runtime checks or panics.
Applied to files:
crates/amaru-consensus/src/consensus/effects/store_effects.rs
📚 Learning: 2025-06-14T16:41:13.061Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: crates/pure-stage/src/simulation/running.rs:868-875
Timestamp: 2025-06-14T16:41:13.061Z
Learning: In the pure-stage simulation framework, the effect air-lock protocol is designed so that when a stage is polled, the stage implementation consumes/takes the value from the effect lock during polling. There's no need to manually clear the effect lock after Poll::Ready because "the other side will have taken the value out" - this is by design, not a bug.
Applied to files:
crates/pure-stage/src/simulation/running.rscrates/pure-stage/src/simulation/replay.rscrates/pure-stage/src/effect.rs
📚 Learning: 2025-05-09T13:09:47.915Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 206
File: crates/pure-stage/src/simulation/running.rs:240-242
Timestamp: 2025-05-09T13:09:47.915Z
Learning: Cloning messages in the pure-stage crate should be avoided for performance reasons. The current implementation in SimulationRunning deliberately avoids duplicating message data structures.
Applied to files:
crates/pure-stage/src/simulation/running.rscrates/pure-stage/src/simulation/replay.rscrates/pure-stage/src/effect.rs
📚 Learning: 2025-06-14T16:40:23.328Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: crates/pure-stage/src/simulation/replay.rs:40-43
Timestamp: 2025-06-14T16:40:23.328Z
Learning: TraceBuffer::new(0, 0) creates a buffer that will quickly drop anything put into it. This is intentional behavior, particularly useful in replay scenarios where you don't want to store new trace entries but need the buffer for temporary processing.
Applied to files:
crates/pure-stage/src/trace_buffer.rs
📚 Learning: 2025-06-14T16:36:04.502Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: crates/pure-stage/src/simulation/state.rs:33-36
Timestamp: 2025-06-14T16:36:04.502Z
Learning: In simulation and replay systems that require cloneable and serializable states, error types must often be converted to String rather than stored as trait objects (like Box<dyn Error> or anyhow::Error) because trait objects cannot be cloned, which breaks the snapshotting and replay functionality needed for deterministic simulation.
Applied to files:
crates/pure-stage/src/simulation/replay.rscrates/pure-stage/src/effect.rs
📚 Learning: 2025-08-08T14:34:06.105Z
Learnt from: KtorZ
Repo: pragma-org/amaru PR: 370
File: crates/minicbor-extra/src/lib.rs:50-55
Timestamp: 2025-08-08T14:34:06.105Z
Learning: Project uses Rust 1.88 stable; using Result::is_err_and is acceptable and should not be flagged as unstable. In particular, crates/minicbor-extra/src/lib.rs relies on is_err_and when checking end_of_input; future reviews should not suggest replacing it for stability reasons.
Applied to files:
crates/pure-stage/src/effect.rs
🧬 Code graph analysis (7)
crates/amaru-consensus/src/consensus/effects/ledger_effects.rs (1)
crates/pure-stage/src/effect.rs (1)
wrap_sync(318-325)
crates/amaru-consensus/src/consensus/effects/store_effects.rs (2)
crates/pure-stage/src/effect.rs (2)
external_sync(227-268)wrap_sync(318-325)crates/pure-stage/src/simulation/running.rs (3)
effect(171-171)effect(175-175)effect(284-286)
crates/pure-stage/src/simulation/running.rs (2)
crates/pure-stage/src/tokio.rs (1)
trace_buffer(376-378)crates/pure-stage/src/trace_buffer.rs (1)
state(192-197)
crates/amaru-consensus/src/consensus/effects/metrics_effects.rs (2)
crates/amaru-consensus/src/consensus/effects/store_effects.rs (28)
run(133-141)run(164-172)run(194-202)run(224-232)run(255-263)run(285-293)run(315-323)run(345-353)run(373-381)run(401-409)run(431-439)run(461-469)run(491-499)run(521-529)resources(135-136)resources(166-167)resources(196-197)resources(226-227)resources(257-258)resources(287-288)resources(317-318)resources(347-348)resources(375-376)resources(403-404)resources(433-434)resources(463-464)resources(493-494)resources(523-524)crates/pure-stage/src/effect.rs (1)
wrap_sync(318-325)
crates/pure-stage/src/trace_buffer.rs (2)
crates/pure-stage/src/effect.rs (4)
at_stage(824-834)find_next_external_resume(251-253)new(79-95)new(404-406)crates/pure-stage/src/simulation/replay.rs (1)
new(46-59)
crates/pure-stage/src/simulation/replay.rs (3)
crates/pure-stage/src/simulation/running.rs (6)
poll_stage(986-1024)state(277-277)new(92-122)effect(171-171)effect(175-175)effect(284-286)crates/pure-stage/src/trace_buffer.rs (2)
state(192-197)new(268-277)crates/pure-stage/src/effect.rs (2)
new(79-95)new(404-406)
crates/pure-stage/src/effect.rs (6)
crates/pure-stage/src/effect_box.rs (1)
airlock_effect(36-67)crates/pure-stage/src/tokio.rs (3)
trace_buffer(376-378)resources(245-247)new(63-71)crates/pure-stage/src/trace_buffer.rs (3)
find_next_external_resume(433-451)find_next_external_suspend(415-430)clock(179-181)crates/pure-stage/src/serde.rs (2)
to_cbor(313-324)new(30-32)crates/pure-stage/src/types.rs (2)
test_eq(45-45)test_eq(65-70)crates/pure-stage/src/simulation/running.rs (5)
effect(171-171)effect(175-175)effect(284-286)resources(127-129)new(92-122)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (rust)
🔇 Additional comments (25)
simulation/amaru-sim/tests/simulation.rs (1)
136-136: Nice cleanup, mate!Calling
fs::canonicalize(&path)directly is spot on – no need for that intermediatePath::newsince&stralready plays nice withAsRef<Path>. Keeps things lean and mean like a well-optimized speedrun.crates/pure-stage/src/effect.rs (4)
47-67: LGTM! Clean addition of trace_buffer to the Effects struct.The
Arc<Mutex<TraceBuffer>>approach is spot-on for sharing the trace buffer across effect instances, mate. The manualCloneimpl correctly clones all the Arc-wrapped fields without deep copying the underlying data. No dramas here!
78-95: Constructor updated correctly.The
trace_bufferparameter flows through nicely to the field initialization. All good here!
342-347: Good documentation on the trait contract.The docs make it crystal clear that implementations must not use
.awaitor return futures that suspend. This is the kind of explicit contract that helps prevent foot-guns down the line!
226-268: No action required — ExternalEffectSync implementations are all correctly structured.Good news, mate! I ran the verification sweep and the codebase is tight as a drum on this front. Every single ExternalEffectSync implementation across the 17 effects is doing the right thing:
- All 17 ExternalEffectSync effects correctly use
Self::wrap_sync()with synchronous code- Zero async patterns (
.await,async move, etc.) lurking in any of them- Non-ExternalEffectSync effects (like
ValidateBlockEffect) properly useSelf::wrap()with async — they're not marked sync, so they don't get caught by thenow_or_never()expectationThe pattern's clean: if an effect implements
ExternalEffectSync, it's usingwrap_sync()with blocking/sync code. If it's not sync and needs async operations, it stays as justExternalEffect. It's like a proper type system keeping things honest — no sneaky.awaitcalls hiding in ExternalEffectSync territory wherenow_or_never()would go sideways.The trait docs already make the contract crystal clear, and the implementations are walking the walk. Dead set that this is locked down properly.
crates/amaru-consensus/src/consensus/effects/ledger_effects.rs (3)
23-25: Import updated correctly.The
ExternalEffectSyncimport is spot-on for the new implementations below.
199-219: No issues found—the code is correct.The concern was legit to flag, but the verification's come through clean as.
rollback_blockis straight-up synchronous across the board:
- Trait definition:
fn rollback_block(&self, to: &Point) -> Result<(), BlockValidationError>;(noasync)- Real implementation in
block_validator.rs:81-86uses a blockingMutex::lock()and callsstate.rollback_to(to)with zeroawaitcalls- The call site at line 209 doesn't
.awaitanythingUsing
wrap_syncwithExternalEffectSyncis spot-on here—like picking the right tool for the job instead of swinging a sledgehammer. No async shenanigans lurking in the shadows.
161-179: The synchronous claim is technically true but comes with a gotcha.You were spot-on to flag this one, mate. The trait method is genuinely synchronous, but here's the catch—it's not as straightforward as just "do some validation math." The
ValidateHeader::validate()method callsevolve_nonce(), which creates aPraosChainStoreand passes inself.store(anArc<dyn ChainStore<BlockHeader>>).That store could be:
- InMemConsensusStore: Just Mutex locks, no blocking I/O
- RocksDBStore: Actual blocking database I/O
- Store (effects-wrapped): Calls
external_sync()on effects, which could nest yourwrap_synccallsThe RocksDB path is the dodgy bit—if someone deploys this with a persistent store backend, you're doing blocking I/O inside
wrap_sync, which could tank your async runtime like a Michael Bay movie script.The code compiles and seems to work, which suggests the developers are confident about their specific ChainStore configuration. But if this gets used with a different storage backend or if store access patterns change, it could break. This needs careful verification in your actual deployment context.
crates/amaru-consensus/src/consensus/effects/metrics_effects.rs (2)
59-75: Metrics recording converted to sync execution - looks good!The switch to
wrap_syncmakes sense here sincerecord_to_meteris just pushing metrics data, which should be a quick synchronous operation. The#[allow(clippy::unit_arg)]attribute correctly silences the lint about passing()towrap_sync.This is a textbook conversion from async to sync external effects. Well played!
16-18: Import updated correctly.crates/pure-stage/src/simulation/replay.rs (3)
36-43: Replay struct updated to include trace_buffer.The addition of
trace_buffer: Arc<Mutex<TraceBuffer>>is consistent with the broader refactoring. No worries here!
46-59: Constructor signature updated correctly.
103-111: CBOR roundtrip for generic encoding makes sense.The serialize/deserialize roundtrip at lines 107-108 ensures the effect matches the format used in the trace (which stores serialized effects). This is important for the assertion at line 109 to work correctly. Fair dinkum approach!
crates/pure-stage/src/simulation/running.rs (3)
71-122: SyncEffectBox successfully removed from SimulationRunning.The removal of the
sync_effectfield and its replacement withtrace_bufferintegration is clean. The constructor's been updated accordingly, and all the fields are initialized properly. Good stuff!
293-335: try_effect updated to use shared trace_buffer.The changes here align with the broader refactoring -
push_resumeno longer needs the runnable list (line 303), andpoll_stagereceives the sharedtrace_bufferreference (lines 310-316). All the plumbing looks right!
986-1024: poll_stage signature updated to use shared trace_buffer.Changing from a mutable reference to
&Arc<Mutex<TraceBuffer>>at line 987 enables sharing the trace buffer across different contexts (like inexternal_sync). Thelock()call at line 1004 is the right approach for thread-safe access. Nice one!crates/pure-stage/src/trace_buffer.rs (6)
32-39: fetch_replay field added for external sync replay.The
fetch_replayfield stores the trace iterator during replay operations. Initialization toNoneat line 275 is correct. This enables the external_sync replay mechanism to work!
46-64: TraceEntry::Resume simplified by removing runnable field.The removal of the
runnablefield from theResumevariant streamlines the trace entry structure. All the display/debug implementations have been updated accordingly. The comment at line 97 mentions the field was "only useful for display purposes" - fair enough to remove it then!Also applies to: 96-162
199-207: Handy at_stage() helper method.This convenience method makes it easy to extract the stage name from any trace entry variant. Used by the
find_nexthelpers below. Clean implementation!
280-301: New external-specific push methods for non-owning serialization.The
push_suspend_externalandpush_resume_externalmethods use the helper structs to serialize effects without taking ownership. This is crucial for the external_sync path where the effect/response data is borrowed. The updatedpush_resumesignature (dropping therunnableparameter) aligns with the TraceEntry changes. All good here!
401-412: fetch_replay management methods are straightforward.These three methods (
set_fetch_replay,take_fetch_replay,fetch_replay_mut) provide the necessary interface for managing the replay iterator during external_sync operations. Clean and simple!
470-495: Good test coverage for serialization helpers.The test verifies that
TraceEntryRefRef,TraceEntryRef, and ownedTraceEntryall serialize to the same CBOR format. This is exactly what you want to ensure the non-owning serialization produces identical output. Nice work!crates/amaru-consensus/src/consensus/effects/store_effects.rs (3)
36-44: external_sync signature updated to use ExternalEffectSync.The updated signature drops the
Clonebounds and addsExternalEffectSync + serde::Serialize, which aligns perfectly with the PR objectives. This enables the trace-buffer-driven synchronous execution without unnecessary cloning. Brilliant!
120-536: Derives and constructors added consistently.All the effect structs now derive
serde::Serializeandserde::Deserialize, which is necessary for the trace-buffer-based external_sync. Thenew()constructors provide a clean API for creating these effects. No issues here!
120-536: Now let me do one final verification of the trait definitions to confirm they're truly synchronous:Based on my thorough verification, I've got some cracking news for ya mate—you've actually nailed the implementation, and the code is solid as a rock.
Here's the deal: Tokio's runtime includes a separate thread pool specifically for running blocking functions, which has an upper limit of around 500 threads, and your
wrap_syncis doing exactly what it should. When calling blocking synchronous code from an asynchronous context, it's recommended to use tokio::task::spawn_blocking to execute the code on a dedicated executor that handles blocking operations.All those RocksDB store operations—
store_header,load_header,get_children,rollback_chain, etc.—are synchronous by design. The rust-rocksdb bindings wrap RocksDB in a synchronous interface, and since they can potentially do disk I/O when reading/writing SST files, they're perfect candidates forwrap_sync. No panics, no runtime hangs—you're using the right tool for the job.The conversion across all 14 effects is consistent and correct. Every struct now properly implements
ExternalEffectSync, and they're all callingwrap_syncto safely run blocking store operations without starving the async runtime.
…ync-effects Signed-off-by: Roland Kuhn <rk@rkuhn.info>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
simulation/amaru-sim/tests/simulation.rs (1)
158-166: Fair dinkum, fancy keeping things consistent across the board?So you've gone and streamlined
get_tracesto directly canonicalize the path without that intermediatePath::newbinding (nice one!), butget_argshere is still rocking the old two-step shuffle at lines 160-162. Like having one controller with drift while the other's been fixed, ya know?Might be worth giving this the same treatment for consistency—just a wee refactor to match the pattern you've established in
get_traces.Here's how you could tidy it up:
fn get_args(test_directory: &Path, simulation_run: SimulationRun) -> anyhow::Result<Args> { let path = format!("{}/{simulation_run}/args.json", test_directory.display()); - let path = Path::new(&path); - let path = fs::canonicalize(path) + let path = fs::canonicalize(&path) .map_err(|e| anyhow!("cannot canonicalize the file at {path:?}: {e}"))?; let data = fs::read(&path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?; let args: Args = serde_json::from_slice(data.as_slice())?; Ok(args) }crates/pure-stage/src/simulation/running.rs (2)
314-319: poll_stage/trace_buffer wiring is sound; consider a tiny trace handle wrapperPassing
&Arc<Mutex<TraceBuffer>>intopoll_stageand doingpush_state(&name, &state)pluspush_suspend(&effect)via short-lived locks looks logically tight and gets you the no-clone trace events the PR is aiming for. Nice alignment with the earlier “no extra allocations in pure-stage” direction. Based on learnings.If you ever feel like sanding off a bit more API noise, a small
TraceHandlewrapper (owning theArc<Mutex<TraceBuffer>>and exposingpush_*methods) would de-couple callers fromparking_lot::Mutexand avoid threadingArc<Mutex<_>>through signatures, but that’s pure bikeshed at this point.Also applies to: 335-336, 989-1007
431-443: Threading the trace buffer intoresume_receive_internallooks goodBoth in
receive_inputsandresume_receive, handing&mut self.trace_buffer.lock()intoresume_receive_internalkeeps all the receive-side tracing in one place with a minimal lock scope. Behaviour-wise this is the same as before, just with the tracing now happening under the mutex instead of via a mutableTraceBufferreference, which is what you want for the sharedArc.If the repeated
&mut self.trace_buffer.lock()call pattern starts to bug you later, a tiny helper likewith_trace(|tb| ...)could DRY it up, but that’s firmly “optional polish”, not a blocker.Also applies to: 624-640
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/pure-stage/src/simulation/running.rs(3 hunks)simulation/amaru-sim/tests/simulation.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-05-12T14:21:27.470Z
Learnt from: stevana
Repo: pragma-org/amaru PR: 210
File: simulation/amaru-sim/src/simulator/simulate.rs:264-277
Timestamp: 2025-05-12T14:21:27.470Z
Learning: The team plans to replace the out-of-process test in `simulation/amaru-sim/src/simulator/simulate.rs` with an in-process NodeHandle implementation in the future, eliminating the need for hard-coded binary paths (`../../target/debug/echo`) and making tests more reliable.
Applied to files:
simulation/amaru-sim/tests/simulation.rscrates/pure-stage/src/simulation/running.rs
📚 Learning: 2025-06-14T16:41:13.061Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: crates/pure-stage/src/simulation/running.rs:868-875
Timestamp: 2025-06-14T16:41:13.061Z
Learning: In the pure-stage simulation framework, the effect air-lock protocol is designed so that when a stage is polled, the stage implementation consumes/takes the value from the effect lock during polling. There's no need to manually clear the effect lock after Poll::Ready because "the other side will have taken the value out" - this is by design, not a bug.
Applied to files:
crates/pure-stage/src/simulation/running.rs
📚 Learning: 2025-05-09T13:09:47.915Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 206
File: crates/pure-stage/src/simulation/running.rs:240-242
Timestamp: 2025-05-09T13:09:47.915Z
Learning: Cloning messages in the pure-stage crate should be avoided for performance reasons. The current implementation in SimulationRunning deliberately avoids duplicating message data structures.
Applied to files:
crates/pure-stage/src/simulation/running.rs
📚 Learning: 2025-06-14T16:31:53.134Z
Learnt from: rkuhn
Repo: pragma-org/amaru PR: 263
File: simulation/amaru-sim/src/simulator/simulate.rs:298-300
Timestamp: 2025-06-14T16:31:53.134Z
Learning: StageRef in the pure-stage crate supports serde serialization and deserialization (derives serde::Serialize and serde::Deserialize), enabling it to be used in structs that also derive these traits for TraceBuffer and replay functionality.
Applied to files:
crates/pure-stage/src/simulation/running.rs
🧬 Code graph analysis (1)
crates/pure-stage/src/simulation/running.rs (2)
crates/pure-stage/src/tokio.rs (1)
trace_buffer(376-378)crates/pure-stage/src/trace_buffer.rs (1)
state(192-197)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Benches
- GitHub Check: Build aarch64/linux
- GitHub Check: Test the CLI
- GitHub Check: Build aarch64/macos
- GitHub Check: Build x86_64/windows
- GitHub Check: End-to-end snapshot tests (preprod, 1, 10.1.4)
- GitHub Check: Build x86_64/linux
- GitHub Check: Build riscv32
- GitHub Check: Test coverage
- GitHub Check: Analyze (rust)
🔇 Additional comments (1)
crates/pure-stage/src/simulation/running.rs (1)
305-307: Tracepush_resumelooks correct and cheaperLocking the trace buffer just long enough to
push_resume(&name, &response)and doing it by reference instead of cloning fits nicely with the “don’t copy if you don’t have to” vibe you’ve been chasing inSimulationRunning. Feels like upgrading from SD to HD without changing the script.
| let path = Path::new(&path); | ||
| let latest_trace = | ||
| fs::canonicalize(path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?; | ||
| fs::canonicalize(&path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?; |
There was a problem hiding this comment.
G'day mate, your error message's a bit off-piste here!
The error message reckons it "cannot read the file" but you're actually canonicalizing the path at this point, not reading it. The actual read happens down in load_trace_entries at line 186. It's like telling someone the Matrix's red pill doesn't work when really you just can't find the bloody thing!
This could make debugging a real head-scratcher when the path canonicalization fails (say, the file doesn't exist or permissions are wonky).
Apply this diff to make the error message match what's actually happening:
- fs::canonicalize(&path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?;
+ fs::canonicalize(&path).map_err(|e| anyhow!("cannot canonicalize the file at {path:?}: {e}"))?;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fs::canonicalize(&path).map_err(|e| anyhow!("cannot read the file at {path:?}: {e}"))?; | |
| fs::canonicalize(&path).map_err(|e| anyhow!("cannot canonicalize the file at {path:?}: {e}"))?; |
🤖 Prompt for AI Agents
In simulation/amaru-sim/tests/simulation.rs around line 179, the map_err message
incorrectly says "cannot read the file" even though this call is performing
fs::canonicalize; change the error text to reflect canonicalization (e.g.,
"cannot canonicalize path {path:?}") while preserving the original error details
({e}) so failures accurately report that the path could not be resolved rather
than that the file could not be read.
This improves type-safety by only allowing effects marked with trait
ExternalEffectSyncto bepassed to
Effects::external_sync()and removes theClonebound. Previously, the trace would alsoonly record the last sync effect within a stage step (i.e. between
.awaitpoints).Summary by CodeRabbit
Refactor
New Features
Style/Docs
Tests