Handling of ra machine failures

I have noticed that the system ends up in what looks like a bad state if the machine's `apply` callback crashes (for example via `erlang:error` or `erlang:exit`).

If `apply` crashes, the `ra_server_proc` process exits and is restarted by its supervisor (`ra_server_sup`). It then fails again, `ra_server_sup` exceeds its maximum restart intensity and terminates, and its own supervisor (`ra_server_sup_sup`) detects this. However, the `ra_server_sup` child has restart type `temporary`, so `ra_server_sup_sup` does not restart it and the error propagates no further.

Here's an attempt to illustrate the supervision tree, for a system called `store`:

```
ra_sup [one_for_one, max 1 restarts in 5 secs]
  +-- PERM ra_systems_sup [one_for_one, max 1 restarts in 5 secs]
  |          +-- PERM <0.195.0>/ra_system_sup [one_for_all, max 1 restarts in 5 secs]
  |                     +-- PERM ra_store_server_sup_sup/ra_server_sup_sup [simple_one_for_one, max 1 restarts in 5 secs]
  |                     |          +-- TEMP <0.241.0>/ra_server_sup [one_for_one, max 2 restarts in 5 secs]
  |                     |                     +-- TRAN store_ra/ra_server_proc
  |                     +-- PERM ra_store_log_sup/ra_log_sup [one_for_all, max 5 restarts in 5 secs]
  |                     |          +-- PERM <0.205.0>/ra_log_wal_sup [one_for_one, max 1 restarts in 5 secs]
  |                     |          |          +-- PERM ra_store_log_wal/ra_log_wal
  |                     |          +-- PERM ra_store_segment_writer/ra_log_segment_writer
  |                     |          +-- PERM ra_store_log_meta/ra_log_meta
  |                     |          +-- PERM <0.200.0>/ra_log_pre_init
  |                     +-- PERM ra_store_log_ets/ra_log_ets
  +-- PERM ra_file_handle
  +-- PERM ra_metrics_ets
  +-- PERM ra_machine_ets
```
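The same behaviour can be reproduced without ra, using plain OTP supervisors. The following is a minimal sketch, not ra's actual code: the module name `temp_sup_demo` and the worker that crashes shortly after starting are made up for illustration, and only the strategies, restart types and intensities are copied from the tree above.

```erlang
-module(temp_sup_demo).  %% hypothetical module, not part of ra
-behaviour(supervisor).
-export([run/0, start_inner/0, start_worker/0, init/1]).

%% Starts an outer simple_one_for_one supervisor (standing in for
%% ra_server_sup_sup), adds one temporary inner supervisor (standing in for
%% ra_server_sup), and waits for the inner supervisor to give up.
run() ->
    {ok, Outer} = supervisor:start_link(?MODULE, outer),
    {ok, Inner} = supervisor:start_child(Outer, []),
    Ref = erlang:monitor(process, Inner),
    Reason = receive
                 {'DOWN', Ref, process, Inner, R} -> R
             after 5000 -> timeout
             end,
    OuterAlive = is_process_alive(Outer),
    %% Clean up the outer supervisor without taking the caller down with it.
    unlink(Outer),
    exit(Outer, shutdown),
    {Reason, OuterAlive}.

start_inner() ->
    supervisor:start_link(?MODULE, inner).

%% A worker that starts fine and then crashes, like a server whose
%% apply callback keeps failing.
start_worker() ->
    Pid = spawn_link(fun() -> timer:sleep(50), exit(apply_crashed) end),
    {ok, Pid}.

init(outer) ->
    Child = #{id => inner,
              start => {?MODULE, start_inner, []},
              restart => temporary,   %% never restarted by the outer supervisor
              type => supervisor},
    {ok, {#{strategy => simple_one_for_one, intensity => 1, period => 5}, [Child]}};
init(inner) ->
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => transient},  %% restarted only after an abnormal exit
    {ok, {#{strategy => one_for_one, intensity => 2, period => 5}, [Child]}}.
```

Calling `temp_sup_demo:run()` should return `{shutdown, true}`: the inner supervisor terminates with reason `shutdown` once the worker has crashed more often than its restart intensity allows, while the outer supervisor stays alive because a `temporary` child is never restarted and its termination does not count against the outer supervisor's own restart intensity.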

I expected the error to propagate and eventually terminate the application. What is the intended way to handle these kinds of errors? My current workaround is to find the temporary supervisor (`<0.241.0>` above) and monitor it from another process, but this requires peeking into the internal state of ra, which doesn't seem quite right.
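For reference, that workaround can be sketched roughly as follows. This is only an illustration under assumptions: `ra_sup_watch` is a made-up module name, and it relies on the registered supervisor name `ra_store_server_sup_sup` from the tree above, which is an internal detail of ra and may change between versions. Only standard OTP calls (`supervisor:which_children/1`, `erlang:monitor/2`) are used.

```erlang
-module(ra_sup_watch).  %% hypothetical module, not part of ra
-export([start/2, pids/1]).

%% Keep only entries that currently have a live pid; which_children/1 can
%% also report 'undefined' or 'restarting' in that position.
pids(Children) ->
    [Pid || {_Id, Pid, _Type, _Mods} <- Children, is_pid(Pid)].

%% Spawns a watcher that monitors every current child of SupSup and calls
%% OnDown(Pid, Reason) whenever one of them terminates.
start(SupSup, OnDown) ->
    spawn(fun() ->
        Refs = maps:from_list(
                 [{erlang:monitor(process, P), P}
                  || P <- pids(supervisor:which_children(SupSup))]),
        loop(Refs, OnDown)
    end).

%% Stop once every monitored supervisor has gone down.
loop(Refs, _OnDown) when map_size(Refs) =:= 0 ->
    ok;
loop(Refs, OnDown) ->
    receive
        {'DOWN', Ref, process, Pid, Reason} when is_map_key(Ref, Refs) ->
            OnDown(Pid, Reason),
            loop(maps:remove(Ref, Refs), OnDown)
    end.
```

It could be started with something like `ra_sup_watch:start(ra_store_server_sup_sup, fun(Pid, Reason) -> logger:error("ra server supervisor ~p died: ~p", [Pid, Reason]), init:stop() end)`. Note that it only covers servers that already exist when the watcher starts, and that the callback also fires on an orderly shutdown, so a real version would need to inspect `Reason`.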


Handling of ra machine failures #293

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Handling of ra machine failures #293

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions