I have noticed that the system ends up in a bad state (?) if the apply callback crashes (erlang:error or erlang:exit).
If apply crashes, the ra_server_proc process exits and is restarted by its supervisor (ra_server_sup). It then fails again until ra_server_sup reaches its maximum restart intensity and terminates, which its own supervisor (ra_server_sup_sup) detects. However, the child ra_server_sup has restart type temporary, so ra_server_sup_sup just ignores the exit and never restarts it. Here's an attempt to illustrate the supervision tree:
The system is called store.
ra_sup [one_for_one, max 1 restarts in 5 secs]
+-- PERM ra_systems_sup [one_for_one, max 1 restarts in 5 secs]
|   +-- PERM <0.195.0>/ra_system_sup [one_for_all, max 1 restarts in 5 secs]
|       +-- PERM ra_store_server_sup_sup/ra_server_sup_sup [simple_one_for_one, max 1 restarts in 5 secs]
|       |   +-- TEMP <0.241.0>/ra_server_sup [one_for_one, max 2 restarts in 5 secs]
|       |       +-- TRAN store_ra/ra_server_proc
|       +-- PERM ra_store_log_sup/ra_log_sup [one_for_all, max 5 restarts in 5 secs]
|       |   +-- PERM <0.205.0>/ra_log_wal_sup [one_for_one, max 1 restarts in 5 secs]
|       |   |   +-- PERM ra_store_log_wal/ra_log_wal
|       |   +-- PERM ra_store_segment_writer/ra_log_segment_writer
|       |   +-- PERM ra_store_log_meta/ra_log_meta
|       |   +-- PERM <0.200.0>/ra_log_pre_init
|       +-- PERM ra_store_log_ets/ra_log_ets
+-- PERM ra_file_handle
+-- PERM ra_metrics_ets
+-- PERM ra_machine_ets
I expected the error to propagate and eventually terminate the application. What is the intended way to handle these kinds of errors? My current workaround is to find the temporary supervisor (<0.241.0> above) and monitor it from another process, but this requires peeking into the internal state of ra, which doesn't seem quite right.
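For reference, here is a minimal sketch of that workaround. The module name ra_sup_watcher and both functions are hypothetical, not part of ra. It assumes the simple_one_for_one supervisor stays registered as ra_store_server_sup_sup (as in the tree above) and has exactly one child, which is the internal detail I would rather not depend on.

```erlang
-module(ra_sup_watcher).
-export([start/2, watch_store/0]).

%% Hypothetical helper: spawn a process that monitors Pid and calls
%% OnDown(Reason) once Pid exits.
start(Pid, OnDown) when is_pid(Pid), is_function(OnDown, 1) ->
    spawn(fun() ->
        Ref = erlang:monitor(process, Pid),
        receive
            {'DOWN', Ref, process, Pid, Reason} ->
                OnDown(Reason)
        end
    end).

%% Assumption: ra_store_server_sup_sup is the registered name shown in the
%% tree above and currently has exactly one ra_server_sup child. The match
%% fails (badmatch) if that is not the case.
watch_store() ->
    [{_Id, SupPid, supervisor, _Mods}] =
        supervisor:which_children(ra_store_server_sup_sup),
    start(SupPid, fun(Reason) ->
        logger:error("ra_server_sup exited: ~p", [Reason]),
        %% Shut the node down, since nothing will restart the server.
        init:stop()
    end).
```

Calling ra_sup_watcher:watch_store() after the server has started is enough to get the node to stop when the temporary supervisor gives up, but it only works as long as the supervisor names and tree layout stay the same.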