The operator (PID 1 in its container) ran its monitoring loop with no
rescue, so a single transient Redis error in update() -> hgetall (the
kind of network blip that also makes targets reconnect) would unwind
run(), exit the process, and trigger a full container restart.
- operator.rb: wrap the monitoring cycle in a rescue so a failed cycle
is logged and the loop keeps running, recovering on the next cycle.
Only StandardError is caught, so SIGTERM/SystemExit still shut down
cleanly.
- store_autoload.rb / store_implementation.py: configure the Redis/Valkey
client with equal-jitter reconnect backoff (cap=5s, base=0.625s, 3
retries) so transient blips are absorbed inside the client and many
clients don't reconnect in lockstep. In Ruby the jittered delay array
is sampled once per connection, since redis-rb takes a fixed delay
array rather than a per-failure backoff callable.
- Add specs covering operator loop resilience and the backoff config in
both Ruby and Python.
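
Illustrative sketch only, not the code in this commit: the two behaviours
described above could look roughly like this. The helper names
(equal_jitter_delays, run_cycles) and the 0-based attempt numbering are
assumptions made for the example; the cap, base, and retry count are the
values quoted above, and the equal-jitter formula is assumed to be the
common "half fixed, half random" form.

```ruby
# Hypothetical helpers; names and structure are assumptions for illustration.
CAP = 5.0      # seconds, from the commit message
BASE = 0.625   # seconds (unit assumed), from the commit message
RETRIES = 3    # from the commit message

# Sample one array of equal-jitter delays. Each delay is half of the capped
# exponential value plus a random amount up to the other half, so it lies in
# [full / 2, full) where full = min(cap, base * 2**attempt).
def equal_jitter_delays(cap: CAP, base: BASE, retries: RETRIES, rng: Random.new)
  Array.new(retries) do |attempt|
    half = [cap, base * (2**attempt)].min / 2.0
    half + rng.rand * half
  end
end

# Run `count` cycles, logging and skipping any cycle that raises a
# StandardError. SignalException and SystemExit are not StandardError
# subclasses, so they still propagate and stop the loop.
def run_cycles(count, logger: ->(msg) { warn(msg) })
  completed = 0
  count.times do |i|
    begin
      yield i
      completed += 1
    rescue StandardError => e
      logger.call("cycle #{i} failed: #{e.class}: #{e.message}")
    end
  end
  completed
end
```

Calling equal_jitter_delays once per connection gives each client its own
fixed delay array, which is the shape a fixed-array reconnect option expects,
while still spreading reconnects across clients.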
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>