feat: added EDR for errors #346

Closed
jeluard wants to merge 1 commit into main from jeluard/edr-errors
@jeluard jeluard commented Jul 22, 2025

Contributor

Summary by CodeRabbit

  • Documentation
    • Added a new architecture decision record outlining a standardized error handling strategy, including usage guidelines and examples for consistent error management across the project.

Signed-off-by: jeluard <jeluard@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jul 22, 2025

Contributor

Walkthrough

A new architecture decision record was added, laying out a standardised error handling approach for the Amaru node project. It prescribes using structured error enums with the thiserror crate, Result<T, E> types for fallible functions, and the anyhow crate for application-level error management, complete with context-rich propagation and examples.
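As a rough illustration of the approach summarised above, here is a minimal standard-library-only sketch of a structured error enum returned through `Result<T, E>`. The `ConfigError` type and `parse_port` function are hypothetical names invented for this example and do not come from the ADR; with `thiserror`, the `Display` and `source()` implementations would be derived rather than written by hand.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::num::ParseIntError;

// Hypothetical error enum for an illustrative config-parsing module.
// With `thiserror`, the two impls below would come from the derive macro.
#[derive(Debug)]
pub enum ConfigError {
    MissingKey(&'static str),
    InvalidPort { raw: String, source: ParseIntError },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingKey(key) => write!(f, "missing key: {key}"),
            ConfigError::InvalidPort { raw, .. } => write!(f, "invalid port: {raw:?}"),
        }
    }
}

impl StdError for ConfigError {
    // Expose the underlying cause so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            ConfigError::MissingKey(_) => None,
            ConfigError::InvalidPort { source, .. } => Some(source),
        }
    }
}

// Fallible function: reports failure through Result instead of panicking.
pub fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let raw = raw.ok_or(ConfigError::MissingKey("port"))?;
    raw.parse::<u16>().map_err(|source| ConfigError::InvalidPort {
        raw: raw.to_string(),
        source,
    })
}
```

Callers can match on the variant to decide how to recover, or propagate the error upwards with `?`.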

Changes

| File(s) | Change Summary |
| --- | --- |
| engineering-decision-records/011-error-handling.md | Added architecture decision record detailing consistent error handling strategy, error enums, usage of thiserror and anyhow, and example implementations. |

Estimated code review effort

1 (~2 minutes)

Poem

G'day, mate, a record's here,
To keep our errors crystal clear.
With enums, context, never fear—
No panics lurking in the rear!
Like Mario dodging shells anew,
Our code's robust, and stylish too.
Cheers to errors handled right,
Now bugs won't keep us up at night!



@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 0

🧹 Nitpick comments (5)
engineering-decision-records/011-error-handling.md (5)

19-23: Mind the clash with our current “panic-first” boot-strapping phase

The ADR outlaws panic! except in “unreachable” paths, yet we still have an explicit expect/unwrap-for-fatal policy in several subsystems (see rkuhn’s notes). Without a transition plan we’ll end up with a mixed bag of philosophies – a bit like shipping both VHS and Betamax in the same box.


Consider adding one short paragraph that:

  1. Acknowledges the existing panic hotspots.
  2. Defines a milestone (e.g. “before the 0.3 release”) for migrating them to Result.
  3. Explains the temporary coexistence rule of thumb.

This keeps the record truthful and gives the team a north-star without blocking today’s builds.
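One possible shape for such a coexistence rule, sketched under assumptions: the fallible logic moves into a `Result`-returning function, while a thin legacy wrapper keeps the old panic behaviour until its callers are migrated. `SlotError`, `advance_slot` and `advance_slot_or_panic` are hypothetical names for illustration only; they are not taken from the Amaru codebase.

```rust
use std::fmt;

// Hypothetical error type for an illustrative slot-arithmetic helper.
#[derive(Debug, PartialEq)]
pub enum SlotError {
    Overflow { slot: u64, delta: u64 },
}

impl fmt::Display for SlotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SlotError::Overflow { slot, delta } => {
                write!(f, "slot {slot} + {delta} overflows u64")
            }
        }
    }
}

impl std::error::Error for SlotError {}

// New style: the failure is part of the signature and can be propagated with `?`.
pub fn advance_slot(slot: u64, delta: u64) -> Result<u64, SlotError> {
    slot.checked_add(delta)
        .ok_or(SlotError::Overflow { slot, delta })
}

// Transitional wrapper: preserves the old "panic on fatal error" behaviour
// for call sites that have not been migrated yet.
pub fn advance_slot_or_panic(slot: u64, delta: u64) -> u64 {
    advance_slot(slot, delta).expect("slot arithmetic overflowed")
}
```

Keeping the panic in a single wrapper makes the remaining hotspots easy to find when the migration milestone arrives.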


57-60: Nit: grammar tweak & smoother wording

A couple of tiny hiccups break the reading flow:

-// If extra contexts is required, use anyhow
+// If extra context is required, use `anyhow`

While you’re there, maybe slip in back-ticks around context() for consistency with the rest of the snippet.


77-81: Typo – “makese” looks like a glitch in the Matrix

-// Sometimes it makese sense to have Error close to some module sub-system
+// Sometimes it makes sense to keep Error close to a specific sub-system

93-107: Example can be leaner – duplicate call & verbose match

Calling fn1_bis() twice and wrapping the result in an extra match with a trailing ? is a tad over-caffeinated. A slimmer version reads easier and still shows down-casting:

-fn main() -> anyhow::Result<()> {
-    fn1()?;
-    fn1_bis()?;
-    match fn1_bis() {
-        Ok(_) => Ok::<(), anyhow::Error>(()),
-        Err(e) => {
-            match e.downcast_ref() {
-                Some(Error::IoError(_)) => Ok(()),
-                Some(_) => Ok(()),
-                None => Ok(()),
-            }
-        }
-    }?;
-    Ok(())
-}
+fn main() -> anyhow::Result<()> {
+    fn1()?;          // propagates as anyhow::Error
+
+    if let Err(err) = fn1_bis() {
+        if let Some(Error::IoError(_)) = err.downcast_ref::<Error>() {
+            // handle the specific case
+        }
+    }
+    Ok(())
+}

Same teaching value, half the carbs.
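For readers without `anyhow` at hand, the same down-casting idea can be sketched with only the standard library, using `Box<dyn Error>` in place of `anyhow::Error`. The `Error` enum and the body of `fn1_bis` below are assumptions made for this example; only the names echo the snippet under review.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::io;

// Illustrative error enum; the variants are assumed, not copied from the ADR.
#[derive(Debug)]
pub enum Error {
    IoError(io::Error),
    Other(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::IoError(e) => write!(f, "IO Error: {e}"),
            Error::Other(msg) => write!(f, "Other: {msg}"),
        }
    }
}

impl StdError for Error {}

// Hypothetical fallible function that erases the concrete error type.
pub fn fn1_bis() -> Result<(), Box<dyn StdError>> {
    let cause = io::Error::new(io::ErrorKind::NotFound, "missing file");
    Err(Box::new(Error::IoError(cause)))
}

// Recover the concrete type from the erased error and check the variant.
pub fn is_io_failure(err: &(dyn StdError + 'static)) -> bool {
    matches!(err.downcast_ref::<Error>(), Some(Error::IoError(_)))
}
```

`downcast_ref` returns `None` when the erased error is of a different concrete type, so callers need to handle that case too.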


41-49: Top-level #[error("Error")] dilutes the enum’s flavour

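The derive macro already formats each variant from its own #[error(...)] attribute. An enum-level #[error("Error")] is not prepended to those messages; thiserror uses it only as the fallback message for variants that have no attribute of their own. If every variant already carries its own attribute, the outer one is dead weight, and any variant added later without an attribute would silently display as a bare “Error”.

Unless you really want that fallback, drop the outer attribute: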

-#[derive(thiserror::Error, Debug)]
-#[error("Error")]
+#[derive(thiserror::Error, Debug)]
 pub enum Error {

Keeps the final messages crisp.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7757809 and 94ad105.

📒 Files selected for processing (1)
  • engineering-decision-records/011-error-handling.md (1 hunks)
🧠 Learnings (2)
📓 Common learnings
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward.rs:73-75
Timestamp: 2025-04-20T18:02:25.073Z
Learning: In the current development stage, rkuhn prefers using explicit panics (via `.expect()` or `.unwrap()`) for fatal errors in the application code that would tear down the node, rather than propagating errors with `Result`. The intention is to eventually transition to proper error handling with `Result` as the codebase matures.
Learnt from: abailly
PR: pragma-org/amaru#75
File: crates/amaru/src/consensus/mod.rs:164-165
Timestamp: 2025-02-03T11:15:22.640Z
Learning: In the Amaru project, chain selection operations (roll_forward and rollback) should use separate result types to leverage the type system for preventing impossible states, rather than using runtime checks or panics.
Learnt from: KtorZ
PR: pragma-org/amaru#0
File: :0-0
Timestamp: 2025-04-04T16:49:53.462Z
Learning: The Amaru project follows a design decision to use traits for state management interfaces that integrate validation and state updates in a single pass, while maintaining flexibility and pluggability.
Learnt from: rkuhn
PR: pragma-org/amaru#263
File: crates/pure-stage/src/simulation/state.rs:33-36
Timestamp: 2025-06-14T16:36:04.502Z
Learning: In simulation and replay systems that require cloneable and serializable states, error types must often be converted to String rather than stored as trait objects (like Box<dyn Error> or anyhow::Error) because trait objects cannot be cloned, which breaks the snapshotting and replay functionality needed for deterministic simulation.
Learnt from: rkuhn
PR: pragma-org/amaru#149
File: crates/amaru/src/stages/consensus/chain_forward/test_infra.rs:272-285
Timestamp: 2025-04-20T17:57:23.233Z
Learning: In test infrastructure code, rkuhn prefers explicit panics (using .unwrap() or similar) over returning Result types, as test failures should be immediate and obvious.
Learnt from: abailly
PR: pragma-org/amaru#195
File: simulation/amaru-sim/src/simulator/mod.rs:167-182
Timestamp: 2025-04-22T09:18:19.893Z
Learning: In the Amaru consensus pipeline refactor, ValidateHeader::handle_roll_forward returns a Result<PullEvent, ConsensusError>, not ValidateHeaderEvent as might be expected from the older code structure.
Learnt from: stevana
PR: pragma-org/amaru#210
File: simulation/amaru-sim/src/simulator/simulate.rs:264-277
Timestamp: 2025-05-12T14:21:27.470Z
Learning: The team plans to replace the out-of-process test in `simulation/amaru-sim/src/simulator/simulate.rs` with an in-process NodeHandle implementation in the future, eliminating the need for hard-coded binary paths (`../../target/debug/echo`) and making tests more reliable.
engineering-decision-records/011-error-handling.md (5)

Learnt from: rkuhn
PR: #149
File: crates/amaru/src/stages/consensus/chain_forward.rs:73-75
Timestamp: 2025-04-20T18:02:25.073Z
Learning: In the current development stage, rkuhn prefers using explicit panics (via .expect() or .unwrap()) for fatal errors in the application code that would tear down the node, rather than propagating errors with Result. The intention is to eventually transition to proper error handling with Result as the codebase matures.

Learnt from: rkuhn
PR: #149
File: crates/amaru/src/stages/consensus/chain_forward/test_infra.rs:272-285
Timestamp: 2025-04-20T17:57:23.233Z
Learning: In test infrastructure code, rkuhn prefers explicit panics (using .unwrap() or similar) over returning Result types, as test failures should be immediate and obvious.

Learnt from: rkuhn
PR: #263
File: crates/pure-stage/src/simulation/state.rs:33-36
Timestamp: 2025-06-14T16:36:04.502Z
Learning: In simulation and replay systems that require cloneable and serializable states, error types must often be converted to String rather than stored as trait objects (like Box<dyn Error> or anyhow::Error) because trait objects cannot be cloned, which breaks the snapshotting and replay functionality needed for deterministic simulation.

Learnt from: abailly
PR: #75
File: crates/amaru/src/consensus/mod.rs:164-165
Timestamp: 2025-02-03T11:15:22.640Z
Learning: In the Amaru project, chain selection operations (roll_forward and rollback) should use separate result types to leverage the type system for preventing impossible states, rather than using runtime checks or panics.

Learnt from: abailly
PR: #195
File: simulation/amaru-sim/src/simulator/mod.rs:167-182
Timestamp: 2025-04-22T09:18:19.893Z
Learning: In the Amaru consensus pipeline refactor, ValidateHeader::handle_roll_forward returns a Result<PullEvent, ConsensusError>, not ValidateHeaderEvent as might be expected from the older code structure.
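The learning about cloneable simulation state can be illustrated with a small sketch: the error is rendered to a `String` at the point it is recorded, so the containing state can still derive `Clone` and be compared across snapshots. `StageSnapshot` and its fields are hypothetical names for this example and are not taken from the `pure-stage` crate.

```rust
use std::error::Error as StdError;

// Hypothetical snapshot of a simulated stage. It must be cloneable, so the
// last error is kept as a String rather than as a Box<dyn Error>.
#[derive(Debug, Clone, PartialEq)]
pub struct StageSnapshot {
    pub name: String,
    pub last_error: Option<String>,
}

impl StageSnapshot {
    pub fn new(name: &str) -> Self {
        StageSnapshot {
            name: name.to_string(),
            last_error: None,
        }
    }

    // Render the error eagerly; the concrete error type is not retained.
    pub fn record_error(&mut self, err: &dyn StdError) {
        self.last_error = Some(err.to_string());
    }
}
```

The trade-off is that the recorded error can no longer be down-cast or matched on, only displayed or compared as text.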

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Coverage
  • GitHub Check: Build on windows-latest with target x86_64-pc-windows-msvc
  • GitHub Check: Snapshots (preprod, 1, 10.1.4)
  • GitHub Check: Build on ubuntu-latest with target riscv32im-risc0-zkvm-elf

@codecov

codecov Bot commented Jul 22, 2025


Codecov Report

✅ All modified and coverable lines are covered by tests.
see 7 files with indirect coverage changes


@abailly abailly left a comment

Contributor


I would add a couple of "counterexamples" to show what not to do

@KtorZ KtorZ left a comment

Contributor


Agreed with @abailly

@KtorZ KtorZ closed this Aug 9, 2025
@jeluard

jeluard commented Aug 9, 2025

Contributor Author

Any reason to close this, @KtorZ?

@KtorZ

KtorZ commented Aug 9, 2025

Contributor

Yes, I merged it, but manually after a rebase and a rename (other EDRs got accepted and merged in between), so GitHub shows it as closed 🤷. I should've commented here, my bad.

I also took care of adding some points about "what not to do" and ported some of @rkuhn's comments from that gist you shared some time ago.

-> https://github.com/pragma-org/amaru/blob/main/engineering-decision-records/013-error-handling-strategies.md

@jeluard

jeluard commented Aug 10, 2025

Contributor Author

Ahh, oops 🤦 I was about to push some pitfalls, great to see you did it already :)

@KtorZ

KtorZ commented Aug 10, 2025

Contributor

Feel free to make a follow-up PR 🫡; I just got tired seeing it hanging there.
