sketch possible stage API and wiring#187
Conversation
no implementation at all, just API for now
WalkthroughThis pull request updates the Rust version in the main project configuration and its toolchain files, while adding a new dependency for a fresh crate called "pure-stage." The new crate, including its own Cargo manifest, source library, and README, implements a type‐safe, asynchronous framework for actor model-like networks. Additionally, several new JSON configuration files are introduced for different Cardano eras, outlining genesis parameters, governance settings, and network topology. Changes
Sequence Diagram(s)sequenceDiagram
participant Dev as Developer
participant NT as Network
participant LS as Left Stage
participant RS as Right Stage
Dev->>NT: Create network instance (Network::new)
Dev->>NT: Add stages (Network::stage)
Note over NT, Dev: Stages are created with type-safe inputs/outputs
Dev->>NT: Wire stages together (Network::wire)
NT-->>Dev: Connection established between LS and RS
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
Note 🎁 Summarized by CodeRabbit FreeYour organization has reached its limit of developer seats under the Pro Plan. For new users, CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please add seats to your subscription by visiting https://app.coderabbit.ai/login.If you believe this is a mistake and have available seats, please assign one to the pull request author through the subscription management page using the link above. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
crates/amaru-kernel/src/network.rs (1)
39-40: 💡 Verification agent🧩 Analysis chain
Missed a few conversions there, like side quests in an RPG!
While you've updated some
From::from(...)calls to the more explicitSlot::from(...), there are quite a few that haven't been touched. For consistency's sake, it'd be bonzer to update all of them to use the explicit conversion.Bound { time_ms: 0, - slot: From::from(0), + slot: Slot::from(0), epoch: 0, },And similar updates for all other instances. This ensures the codebase has a consistent approach to type conversions - like having the same control scheme across all games in a series!
Also applies to: 55-56, 60-61, 71-72, 76-77, 88-89, 93-94, 105-106, 110-111, 122-123, 127-128, 139-140, 144-145
🏁 Script executed:
#!/bin/bash # Count how many From::from uses for slot conversions remain rg "slot: From::from\(" crates/amaru-kernel/src/network.rs | wc -lLength of output: 71
Subject: Consistency in Slot Conversions – Update Remaining Instances
Hey mate, looks like the code’s still holding onto 13 instances of
slot: From::from(incrates/amaru-kernel/src/network.rs. While you’ve made some moves by replacing a few with the more explicitSlot::from(...), we need to bring the whole crew in line for consistency—think of it like making sure every character in your RPG party is geared up the same!What needs doing:
Update all remaining occurrences of
slot: From::from(…)toslot: Slot::from(…); this applies to lines 39-40, 55-56, 60-61, 71-72, 76-77, 88-89, 93-94, 105-106, 110-111, 122-123, 127-128, 139-140, 144-145 in the file.The fix should make each conversion explicit, similar to this diff:
Bound { time_ms: 0, - slot: From::from(0), + slot: Slot::from(0), epoch: 0, },Please update these to keep our conversion style as tight as a well-tuned control scheme across all levels!
🧹 Nitpick comments (6)
crates/pure-stage/README.md (4)
20-20: Propose a more direct word than “needed”G’day, mate! “Needed” might be feeling a bit too plain, hey? “Required” usually gives off a more precise ring.
- The main disadvantage is that additional states are needed for modelling (biased) input operations + The main disadvantage is that additional states are required for modelling (biased) input operations🧰 Tools
🪛 LanguageTool
[style] ~20-~20: To elevate your writing, try using a synonym like ‘required’ here. Or, to avoid using the passive voice, try replacing the past participle ‘needed’ with an adjective.
Context: ...advantage is that additional states are needed for modelling (biased) input operations...(IS_NEEDED_NECESSARY)
24-24: Remove indefinite article "an" before.awaitThough “an
.await” has a bit of flare, it’s simpler to say “a.await” and keeps the text easier to parse.- because the return type of an `.await` point cannot be constrained + because the return type of a `.await` point cannot be constrained🧰 Tools
🪛 LanguageTool
[grammar] ~24-~24: If ‘type’ is a classification term, ‘an’ is not necessary. Use “type of”. (The phrases ‘kind of’ and ‘sort of’ are informal if they mean ‘to some extent’.)
Context: ...using runtime checks because the return type of an.awaitpoint cannot be constrained. ...(KIND_OF_A)
88-88: Add missing period for completenessToss a period at the end of that line to tie it up neatly, like adding the final flourish to your best coffee foam art.
- ... difficult to think up and probably still as bugs I didn’t realise + ... difficult to think up and probably still as bugs I didn’t realise.🧰 Tools
🪛 LanguageTool
[uncategorized] ~88-~88: A period might be missing here.
Context: ...icult to think up and probably still as bugs I didn’t realise - state progression is...(AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)
89-89: Swap "hard" with a fancier alternative“Hard” is kind of, well, bland, mate. Go for a more vivid word like “challenging” to jazz up your doc.
- state progression is hard to see when reading the source code + state progression is challenging to see when reading the source code🧰 Tools
🪛 LanguageTool
[style] ~89-~89: To elevate your writing, try using a synonym here.
Context: ...I didn’t realise - state progression is hard to see when reading the source code ##...(HARD_TO)
crates/pure-stage/src/lib.rs (2)
185-190: Handle or eliminatetodo!()blocksYou’ve dropped a few placeholder “to-do” bombs in here, champ. If ya fancy concurrency or want to test your logic, maybe stub out those branches to keep your code lively.
Do you want a rough snippet for these branches?
127-220: Improve test coverageYour
#[cfg(test)]module is a nice start—like a gamer’s tutorial zone. But it’s short on actual gameplay, with so manytodo!(). Consider fleshing out a basic end-to-end scenario.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
Cargo.toml(1 hunks)crates/amaru-kernel/src/network.rs(5 hunks)crates/pure-stage/Cargo.toml(1 hunks)crates/pure-stage/README.md(1 hunks)crates/pure-stage/src/lib.rs(1 hunks)rust-toolchain.toml(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Security audit
Cargo.toml
[error] cargo-machete found unused dependencies: amaru-consensus, amaru-kernel, pallas-network, tokio.
crates/pure-stage/Cargo.toml
[error] cargo-machete found unused dependencies: amaru-consensus, amaru-kernel, pallas-network, tokio.
🪛 LanguageTool
crates/pure-stage/README.md
[style] ~20-~20: To elevate your writing, try using a synonym like ‘required’ here. Or, to avoid using the passive voice, try replacing the past participle ‘needed’ with an adjective.
Context: ...advantage is that additional states are needed for modelling (biased) input operations...
(IS_NEEDED_NECESSARY)
[grammar] ~24-~24: If ‘type’ is a classification term, ‘an’ is not necessary. Use “type of”. (The phrases ‘kind of’ and ‘sort of’ are informal if they mean ‘to some extent’.)
Context: ...using runtime checks because the return type of an .await point cannot be constrained. ...
(KIND_OF_A)
[uncategorized] ~88-~88: A period might be missing here.
Context: ...icult to think up and probably still as bugs I didn’t realise - state progression is...
(AI_EN_LECTOR_MISSING_PUNCTUATION_PERIOD)
[style] ~89-~89: To elevate your writing, try using a synonym here.
Context: ...I didn’t realise - state progression is hard to see when reading the source code ##...
(HARD_TO)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Snapshots (preprod, 10.1.4)
- GitHub Check: Build on ubuntu-24.04 with target aarch64-unknown-linux-musl
- GitHub Check: Build on ubuntu-latest with target x86_64-unknown-linux-gnu
- GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
- GitHub Check: Build on macos-latest with target aarch64-apple-darwin
- GitHub Check: Sanity
🔇 Additional comments (8)
Cargo.toml (1)
10-10: G'day! Rust version bump looks ace!You've kept the helpful comment about updating the toolchain file too - crisp like a fresh pint of Guinness!
🧰 Tools
🪛 GitHub Actions: Security audit
[error] cargo-machete found unused dependencies: amaru-consensus, amaru-kernel, pallas-network, tokio.
rust-toolchain.toml (1)
2-3: Deadly toolchain update, mate!The Rust version bump to 1.85 pairs perfectly with your Cargo.toml change - like Batman and Robin, they're better together! The component list formatting is tidier now too.
crates/amaru-kernel/src/network.rs (3)
15-16: Top-notch import refactoring, mate!Separating out the
Slotimport makes it clear what's being used directly vs. what's being re-exported. It's like organizing your Steam library by "Currently Playing" and "Backlog"!
44-44: Explicit type conversions are the bee's knees!Using
Slot::from(...)instead ofFrom::from(...)is clearer and more type-safe. It's like choosing Master Chief over a generic space marine - you know exactly what you're getting!Also applies to: 168-169, 173-174
295-295: Test fixes are sorted, nice one!You've updated the tests to match the new conversion style - that's proper good coding hygiene, like remembering to save before a boss fight!
Also applies to: 325-327, 333-333, 336-337, 340-341
crates/pure-stage/README.md (1)
14-37: Solid conceptual coverage for your design elementsLove the clarity about state machines versus the wiring. Reminds me of leveling up in a classic RPG—structure’s all there, just waiting to be fleshed out. Rock on!
🧰 Tools
🪛 LanguageTool
[style] ~20-~20: To elevate your writing, try using a synonym like ‘required’ here. Or, to avoid using the passive voice, try replacing the past participle ‘needed’ with an adjective.
Context: ...advantage is that additional states are needed for modelling (biased) input operations...(IS_NEEDED_NECESSARY)
[grammar] ~24-~24: If ‘type’ is a classification term, ‘an’ is not necessary. Use “type of”. (The phrases ‘kind of’ and ‘sort of’ are informal if they mean ‘to some extent’.)
Context: ...using runtime checks because the return type of an.awaitpoint cannot be constrained. ...(KIND_OF_A)
crates/pure-stage/src/lib.rs (2)
1-87: Great start for your type-level designYour Zero, Succ, and Nat traits sparkle like a finely aged Aussie shiraz, mate! They nicely express compile-time invariants. Good job, no major grief here.
108-108:Details
❓ Verification inconclusive
Confirm
std::ops::AsyncFnusageTap the brakes, friend! In stable Rust,
std::ops::AsyncFnmightn’t exist. Double-check if this is a custom nightly feature or a local extension.
🏁 Script executed:
#!/bin/bash # Searching for AsyncFn references in the codebase rg -A 5 "AsyncFn"Length of output: 399
Heads up, mate!
std::ops::AsyncFnisn’t standard on stable Rust.
It looks like we only see it referenced incrates/pure-stage/src/lib.rswith no additional definitions in the repository. Could you confirm whether this is an intentional nightly feature or perhaps a custom extension? If it’s meant to be a nightly-only or locally defined trait, maybe adding a brief comment or appropriate feature gating would clear things up.
stevana
left a comment
There was a problem hiding this comment.
Nice, thanks for putting this together. A couple of thoughts:
- How confident are you/we that the rest of the consensus node can be expressed using this API, i.e. is it expressive enough?
- To me it's not immediately clear that a system written using this API will be easy to simulation test. How can we deterministically "step" one stage, i.e. feed it with an input and observe its output and state change? (So that we can collect a trace of
(state, input, output, state)tuples that we can later write properties against and visualise in a debugger.)
| - fully back-pressured | ||
| - ability for biased reading from inputs (which is necessary to avoid deadlocks with back-pressure) | ||
| - wiring code should be nicely readable | ||
|
|
There was a problem hiding this comment.
Can we add "ability to (easily) dump the state before and after processing an input"? (So that we can write a nice debugger.)
|
|
||
| - no side effects, only explicit effects (which does include internal state changes) | ||
| - easy to use, including the ability to hold references between invocations | ||
| - executable on concurrent thread pool or using a deterministic simulator |
There was a problem hiding this comment.
This seems to suggest that the interleaving of threads doesn't matter in the "production deployment" (when using concurrent thread pool) as opposed to when in "simulation/testing deployment".
That's one strategy, but I think it would make it much harder to argue what we deploy is what we test this way. Alternatively we can try to stay deterministic while performant (in the production deployment) a la Disruptor.
There was a problem hiding this comment.
If we use discipline do ensure that the only effects within a stage consist of updates to local state updates, then it shouldn’t matter whether parallelism is introduced between different stages. One stage will never be executed concurrently with itself, so we get the same ability to parallelise as the actor model provides.
One assumption we need to make is that causality assumptions are transported between stages via proper messaging and not out-of-band (which is included in the above effect assumption). This allows a later state to depend for example on some information having been added to an append-only content-addressed storage facility.
My gut feeling is that adequate node performance — especially in the Leios era — will require parallel execution of processing steps, so we should design for this ability from the get go.
There was a problem hiding this comment.
One stage will never be executed concurrently with itself
I assumed we'd potentially want to shard slow stages (e.g. put two CPUs/threads on one stage and diverge even items to one CPU/thread and odd to the other). This is something that Disruptor can do without breaking determinism, while simply adding a thread pool of workers to each stage breaks determinism (which is what I thought you were suggesting when you mentioned a thread pool.)
There was a problem hiding this comment.
I am not fond of committing those. This is going to confuse the hell out of people if we start versioning configuration for other software at the root of the project. Few notes:
- If we explicitly need those configuration files for testing, let's put them in a tests folder, and make sure we add a relevant README.
- I would rather rely on a git submodule for those. The config "often" changes and are network-dependent. So it's best to rely on something like: https://github.com/cardano-foundation/cardano-configurations
- If they are only versioned for one's convenience because we sometimes need to run a Cardano node for our own development, then: don't 😅 ... Put them somewhere else on your machine, but not under Amaru.
There was a problem hiding this comment.
Of course I didn’t intend to check these in. I have an update to .gitignore pending on another branch (IIRC), and I heroically managed to ignore these files in the beginning, but then they slipped through.
So basically: thanks for highlighting that .gitignore needs some love :-)
There was a problem hiding this comment.
.git/info/exclude would probably be a better place for those as they don't universally apply to the project.
There was a problem hiding this comment.
Hmm, we may be talking past each other or this topic needs more communication between us.
My proposal is to include the instructions for ignoring files at a certain path within the version-controlled repository contents because within that contents there also are a README and a Makefile that populate that certain path, so everyone who follows the instructions will have to manually ignore that path. What downside would there be do have battery shieldings included upon cloning? (the batteries being files that are automatically downloaded for the reasons you state above)
I’ll include the concrete change in #149 so that we can review it there.
| - quite complicated | ||
| - was difficult to think up and probably still as bugs I didn’t realise | ||
| - state progression is hard to see when reading the source code |
There was a problem hiding this comment.
As already discussed, I (we) have written code like that in the past. By separating the interpretation of each transition into functions, things become quite readable and more importantly, the state machine structure becomes obvious.
There was a problem hiding this comment.
In the above example, importing all the constructors' variants would already go a long way towards increasing legibility.
There was a problem hiding this comment.
Perhaps something like this be useful: https://hoverbear.org/blog/rust-state-machine-pattern/ ?
But I suppose that the problem with any "raw state machine" (explicit function from (i,s) -> (o, s)) approach is the "intermediate states", that are not part of what's interesting. For example if we want to add async IO/RPC as part of a transition, then we need to introduce "intermediate states" that suspend the transition and resume it when the IO completes, these intermediate states are clutter if you are familiar with awaits (they don't make things more obvious, but rather obscure the intent).
In the Maelstrom inspired DSL that I've been working with I've hidden these intermediate states in the interpreter/event loop, away from the programmer. What Roland is suggesting is to rely on the Rust compiler to fill in the intermediate states via awaits.
I still don't see exactly how simulating awaits would work though, because it's exactly at these intermediate states where you want to yield to the simulator. It would be good if we could sketch this a bit further. I've been trying to read the async from scratch series: https://natkr.com/tags/async-from-scratch-series/ but it's still not clear to me.
There's also be the option of doing some kind of DSL in Rust, closer to how I and other's have implemented the Maelstrom DSL in Rust. This DSL then gets interpreted into a "raw state machine", but this would obviously require reimplementing some of the await machinery. One potential advantage of this approach is that because we have the "raw state machine" (i.e. explicit function from (i, s) -> (o, s)) in the end, we can easily dump the state between steps in the simulator, which gives us a trace of state transitions which is useful for making assertions and writing a debugger. With the async approach we'd need to dump the state manually somehow, but should still be doable.
There was a problem hiding this comment.
I agree with @stevana that the key here to have a convincing "framework" for writing state machines using async code is to demonstrate how this is amenable to simulating, deterministically, various execution paths generated from a model.
I did some archaeology and found that we discussed different approaches for Hydra in April 2021: https://github.com/cardano-scaling/hydra/wiki/Logbook-2021-H1#2021-04-26
I unfortunately don't have the code for the various other approaches (Monad stack, free monads) we tried.
There was a problem hiding this comment.
Yes, I agree. I’m working on that (besides catching up to main with the downstream server), should have a demonstrator next week.
BTW: the Church numerals were only useful for the initial API sketch, they aren’t practical for an actual implementation. Will probably go with tuples and macros — we don’t expect more than a handful of inputs or outputs, right?
| async fn chain_sync(state: ChainSync, input: ChainSyncMsg) -> (ChainSync, Vec<ChainSyncOutput>) { | ||
| match (state, input) { | ||
| (ChainSync::Initial(tip), ChainSyncMsg::BlockOp(op)) => | ||
| // mostly like the above, but with less intermediate states e.g. for reading the store or for output |
There was a problem hiding this comment.
So basically effects would be just plain async fn functions on which the code awaits?
|
What happend/is going to happen with this PR? |
|
Hi @stevana! I’m developing a better API with an actual implementation (same branch) but it isn’t in a state where anyone (or the rabbit) should look at it. Will open a new PR once I have something working. |
no implementation at all, just API for now
Summary by CodeRabbit
New Features
Chores
Documentation