Refactored health check logic for MultiDBClient #3994
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR
Security scan by Jit
Pull request overview
This PR refactors MultiDBClient health checking to support per-health-check configuration (probes/delay/timeout), adds a configurable health-check timeout, and updates execution to run health checks concurrently, with updated tests and docs to match the new behavior.
Changes:
- Move probes/delay/timeout configuration onto individual `HealthCheck` implementations (via `AbstractHealthCheck`), and wire config defaults into `PingHealthCheck`.
- Add `DEFAULT_HEALTH_CHECK_TIMEOUT` / `health_check_timeout` and enforce timeouts during health check execution.
- Update sync + asyncio clients/tests/docs to reflect concurrent health check execution and new configuration surface.
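To illustrate the first bullet, here is a minimal sketch of what "per-health-check configuration" can look like: each check object carries its own probes, delay, and timeout, and a runner reads them from the check instead of from a global policy. The names `SketchHealthCheck`, `AlwaysHealthy`, `FlakyCheck`, and `run_probes` are hypothetical and are not taken from redis-py; the "all probes must pass" rule is likewise an assumption chosen for the example.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchHealthCheck(ABC):
    """Hypothetical base class: every check owns its probes/delay/timeout."""

    def __init__(self, probes: int = 3, delay: float = 0.0, timeout: float = 1.0):
        self.probes = probes    # how many times to probe per round
        self.delay = delay      # seconds to sleep between probes
        self.timeout = timeout  # seconds allowed for the whole check

    @abstractmethod
    async def check(self, database: str) -> bool:
        """Return True if a single probe against `database` succeeds."""


class AlwaysHealthy(SketchHealthCheck):
    async def check(self, database: str) -> bool:
        return True


class FlakyCheck(SketchHealthCheck):
    """Succeeds on odd-numbered calls, fails on even-numbered ones."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.calls = 0

    async def check(self, database: str) -> bool:
        self.calls += 1
        return self.calls % 2 == 1


async def run_probes(check: SketchHealthCheck, database: str) -> bool:
    """Assumed rule for this sketch: all probes must pass; stop at first failure."""
    for attempt in range(check.probes):
        if not await check.check(database):
            return False
        if attempt < check.probes - 1:
            await asyncio.sleep(check.delay)
    return True


if __name__ == "__main__":
    print(asyncio.run(run_probes(AlwaysHealthy(probes=2), "db-1")))
    print(asyncio.run(run_probes(FlakyCheck(probes=3), "db-1")))
```

Keeping the settings on the check object means two checks registered against the same database can use different probe counts or timeouts without the runner knowing about either.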
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| redis/multidb/healthcheck.py | Refactors health check interfaces/policies, adds per-check timeout and concurrent execution (threads). |
| redis/multidb/config.py | Adds health_check_timeout to config and passes health-check params into default PingHealthCheck. |
| redis/multidb/client.py | Updates policy instantiation and adjusts background health-check round execution handling. |
| redis/asyncio/multidb/healthcheck.py | Async equivalent of per-check config + timeout and concurrent execution (tasks). |
| redis/asyncio/multidb/config.py | Adds health_check_timeout and passes params into default async PingHealthCheck. |
| redis/asyncio/multidb/client.py | Updates policy instantiation and changes background health-check gathering behavior. |
| tests/test_multidb/conftest.py | Updates health-check fixture to include probes/delay/timeout properties. |
| tests/test_multidb/test_healthcheck.py | Updates unit tests for new policy constructors, parallel behavior, and timeout cases. |
| tests/test_multidb/test_client.py | Adjusts expectations around health-check call counts and timing to reflect concurrency. |
| tests/test_multidb/test_pipeline.py | Updates pipeline tests for concurrent/background health-check behavior. |
| tests/test_asyncio/test_multidb/conftest.py | Async fixture updates for probes/delay/timeout properties. |
| tests/test_asyncio/test_multidb/test_healthcheck.py | Async unit tests updated for new constructors/parallelism/timeout behavior. |
| tests/test_asyncio/test_multidb/test_client.py | Async client tests updated for concurrent/background health-check behavior and timing. |
| tests/test_asyncio/test_multidb/test_pipeline.py | Async pipeline tests updated to match concurrent background health checks. |
| docs/geographic_failover.rst | Documents new health_check_timeout and updated custom health check example. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…dis-py into vv-multidb-healthcheck-refactor
Cursor Bugbot has reviewed your changes and found 2 potential issues.
* Refactored sync health check
* Refactored async multidb health check
* Removed double exception handling
* Update docs/geographic_failover.rst
* Update tests/test_asyncio/test_multidb/test_healthcheck.py
* Update tests/test_multidb/test_healthcheck.py
* Refactored health checks to be completion-based
* Codestyle fixes
* Fixed argument name
* Fixed redundant timeout
* Refactored health check to be async-only
* Refactored event loop execution in sync client
* Fixed proper handling of failures and updated docs
* Fixed correct initial health check execution
* Fixed event loop issue
* Fixed flaky test
* Fixed coroutine execution
* Fixed flaky test
* Fixed correct connection cleanup
* Fixed 3.12 compatibility issue
* Fixed race condition
* Updated exception handling
* Fix avoiding indefinite hangs
* Refactor health checks to use clients instead of pools
* Removed unused if statement
* Updated to broader exception
* Fixed passing not supported arguments
* Updated docs, added new properties to health check cluster client

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: petyaslavova <petya.slavova@redis.com>

Description of change
The refactoring in this PR brings:
`TimeoutError` to the end user, but it makes more sense to continue operating and mark the specific database as unhealthy instead.
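The behavior described here, where a timed-out check marks one database as unhealthy instead of raising to the caller, can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `fast_probe`, `slow_probe`, `failing_probe`, `is_healthy`, and `check_all` are hypothetical names, and the probes stand in for real database calls. The only library calls used are `asyncio.wait_for` and `asyncio.gather`.

```python
import asyncio


async def fast_probe() -> bool:
    return True


async def slow_probe() -> bool:
    # Simulates a database that does not answer in time.
    await asyncio.sleep(10)
    return True


async def failing_probe() -> bool:
    # Simulates a database whose connection is broken.
    raise ConnectionError("connection refused")


async def is_healthy(probe, timeout: float) -> bool:
    """Run one probe; a timeout or any error means 'unhealthy', not a raise."""
    try:
        return await asyncio.wait_for(probe(), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    except Exception:
        return False


async def check_all(probes: dict, timeout: float) -> dict:
    """Check every database concurrently and collect a per-database verdict."""
    names = list(probes)
    results = await asyncio.gather(
        *(is_healthy(probes[name], timeout) for name in names)
    )
    return dict(zip(names, results))


if __name__ == "__main__":
    verdicts = asyncio.run(
        check_all({"a": fast_probe, "b": slow_probe, "c": failing_probe}, 0.05)
    )
    print(verdicts)
```

Because each probe is wrapped individually, one slow database costs at most `timeout` seconds and does not prevent the others from being reported as healthy.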
Note
Medium Risk
Refactors core MultiDBClient health-check execution (sync + asyncio), including new timeout/concurrency behavior and dedicated connection management; regressions could affect failover/circuit state transitions under load. Changes are well-covered by updated/added tests but touch critical availability logic.
Overview
MultiDBClient health checks are reworked to be async-only across both sync and asyncio clients, with per-health-check configuration (`health_check_probes`, `health_check_delay`, new `health_check_timeout`) moved onto `AbstractHealthCheck` and propagated via config defaults.

Health checks now run concurrently (across checks and databases) with per-check `asyncio.wait_for` timeouts; timeouts/exceptions are treated as an unhealthy database (`UnhealthyDatabaseException`) rather than bubbling interval timeouts to callers. Policies now manage dedicated cached health-check clients per database and expose `close()`; both clients ensure these pools are closed (`MultiDBClient.close()` / async client `aclose()`), and the sync client uses an enhanced `BackgroundScheduler` with a shared background event loop to run async health checks safely.

Docs and tests are updated to reflect the new async custom health-check interface/signature, new pytest marker (`no_mock_connections`), and expanded coverage for timeout behavior, concurrency, and client lifecycle.

Written by Cursor Bugbot for commit acde146.
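For readers unfamiliar with the "shared background event loop" pattern mentioned in the overview, the sketch below shows one common way synchronous code can run coroutines on a loop owned by a daemon thread. It is not the PR's `BackgroundScheduler`; `BackgroundLoop` and `ping` are hypothetical names, and the sketch relies only on the standard `asyncio.run_coroutine_threadsafe` API.

```python
import asyncio
import threading


class BackgroundLoop:
    """Hypothetical helper: one event loop running in a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        """Submit a coroutine from sync code and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        """Stop the loop from the calling thread, then release its resources."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def ping() -> str:
    # Stand-in for an async health check against a database.
    await asyncio.sleep(0.01)
    return "PONG"


if __name__ == "__main__":
    background = BackgroundLoop()
    try:
        print(background.run(ping(), timeout=1))
    finally:
        background.close()
```

Passing a `timeout` to `future.result` bounds how long the synchronous caller can block, which is one way to avoid the indefinite hangs mentioned in the commit list.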