Pull Request Overview
This PR implements incremental change feed processing for the Partition Key Range (PKR) cache in the Azure Cosmos DB Python SDK. Instead of discarding and fully refreshing the entire cache on partition splits, the SDK now uses ETag-based change tracking to fetch only the changes since the last refresh, which reduces latency and network overhead.
Key Changes:
- Adds incremental change feed support with ETag tracking to routing map provider
- Implements smart merge logic with defensive guards (missing parent detection, coverage validation, automatic fallback)
- Adds comprehensive test coverage for split scenarios, fallback behavior, and edge cases
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/azure/cosmos/_routing/routing_map_provider.py | Refactored to use change feed with incremental updates, added fallback logic for incomplete merges |
| sdk/cosmos/azure-cosmos/azure/cosmos/_routing/aio/routing_map_provider.py | Async version of routing_map_provider with same incremental change feed logic |
| sdk/cosmos/azure-cosmos/azure/cosmos/_routing/collection_routing_map.py | Added ETag property and try_combine method for incremental merge |
| sdk/cosmos/azure-cosmos/azure/cosmos/_gone_retry_policy.py | Updated to support collection-scoped incremental refresh |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_gone_retry_policy_async.py | New async retry policy for handling partition splits with async refresh |
| sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_client_connection.py | Updated refresh_routing_map_provider to support targeted incremental refresh |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client_connection_async.py | Async version with same incremental refresh support |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_retry_utility_async.py | Integrated async retry policy with proper await handling |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py | Added containerRID to feed options |
| sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/multi_execution_aggregator.py | Added partition split handling during iteration |
| sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/multi_execution_aggregator.py | Async version with split handling and priority queue reset |
| sdk/cosmos/azure-cosmos/tests/test_routing_map.py | Updated assertion logic to handle extra change feed metadata |
| sdk/cosmos/azure-cosmos/tests/test_partition_split_query_async.py | Added 5 comprehensive tests for incremental merge scenarios |
| sdk/cosmos/azure-cosmos/tests/test_partition_split_query.py | Added 5 comprehensive tests (sync versions) |
| sdk/cosmos/azure-cosmos/tests/routing/test_routing_map_provider.py | Updated mock to accept **kwargs parameter |
API Change Check
APIView identified API level changes in this PR and created the following API reviews
/azp run python - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
Hi @dibahlfi. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.
* fix: initial commit
* fix: adding tests for the sync version
* fix: adding tests for the async version
* fix: fixing copilot suggestions
* fix: fixing tests
* fix: addressing comments
* fix: addressing pylint comments
* fix: addressing pylint comments
* fix: addressing pylint comments
* fix: addressing pylint comments
* fixing passing containerRid header bug
* adding new tests
* adding new tests
* fixing pylint errors
* adding new tests
* fixing tests
* fixing pytest
* fixing tests
* refactoring
* refactoring and fixing bugs
* fixing bug
* addressing copilot comments
* addressing copilot comments
* addressing copilot comments
* addressing comments
* refactoring and fixing comments
* updated change log
* resolving comments and refactoring code
* resolving comments and refactoring code
* fixing pylint errors
* fixing pylint errors
* fixing tests
* fixing tests
This PR implements incremental change feed processing for the Partition Key Range (PKR) cache, significantly reducing latency and network overhead when handling partition splits in Azure Cosmos DB.
Problem Statement
Previously, when a partition split occurred, the SDK would discard the entire PKR cache and refresh it in full.
Solution: Incremental Change Feed
The routing map provider now supports incremental updates using Cosmos DB's change feed mechanism:
ETag tracking:
- Each CollectionRoutingMap now maintains a _change_feed_etag.
- When refreshing, the SDK passes this ETag to fetch only the changes since the last refresh.
- This reduces the payload from the full partition list to just the split or merged ranges.
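The ETag-based refresh described here can be illustrated with a minimal, self-contained sketch. It does not use the SDK: `FakeRangeFeed` and its `read` method are hypothetical stand-ins for the service, and the ETag is modeled as a simple log position, whereas real ETags are opaque tokens.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FakeRangeFeed:
    """Hypothetical stand-in for the service: an append-only log of range records."""

    log: List[dict] = field(default_factory=list)

    def read(self, etag: Optional[str]) -> Tuple[List[dict], str]:
        # Assumption for illustration: the ETag is the log position as a string.
        # None means "no previous refresh", so everything is returned.
        start = int(etag) if etag is not None else 0
        return self.log[start:], str(len(self.log))


feed = FakeRangeFeed()
feed.log.append({"id": "0", "minInclusive": "", "maxExclusive": "FF"})

# First refresh: no ETag yet, so the full range list comes back.
full, etag = feed.read(None)

# Range "0" splits into "1" and "2" on the service side.
feed.log.append({"id": "1", "minInclusive": "", "maxExclusive": "7F", "parents": ["0"]})
feed.log.append({"id": "2", "minInclusive": "7F", "maxExclusive": "FF", "parents": ["0"]})

# Incremental refresh: passing the stored ETag returns only the two children.
delta, new_etag = feed.read(etag)
```

Reading again with `new_etag` returns an empty list, which is the case where the cache is already current and nothing needs to be merged.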
Incremental merge (try_combine):
- Takes incremental partition range updates (children + deleted parents).
- Validates complete coverage of the partition key space.
- Merges new ranges with the existing cache in place.
- Returns None if the merge would create gaps (triggers fallback).
Defensive guards:
- Missing Parent Detection: Validates that all parent IDs exist before removing them.
- Coverage Validation: Ensures merged ranges span the full keyspace ('' -> 'FF').
- Automatic Fallback: Reverts to a full refresh if the incremental merge fails.
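The merge and its guards can be sketched roughly as follows. This is not the SDK's `try_combine`; `try_combine_sketch`, `covers_keyspace`, and `refresh` are hypothetical names, ranges are plain dicts, and the sketch assumes that `parents` lists only the immediate parent and that range bounds are equal-length hex strings, so plain string comparison orders them correctly.

```python
from typing import Any, Callable, Dict, List, Optional

Range = Dict[str, Any]


def covers_keyspace(ranges: List[Range], lo: str = "", hi: str = "FF") -> bool:
    """Coverage validation: ranges must tile [lo, hi) with no gaps or overlaps."""
    cursor = lo
    for r in sorted(ranges, key=lambda r: r["minInclusive"]):
        if r["minInclusive"] != cursor:
            return False
        cursor = r["maxExclusive"]
    return cursor == hi


def try_combine_sketch(cached: List[Range], changes: List[Range]) -> Optional[List[Range]]:
    """Merge incremental changes into the cached ranges, or return None on failure."""
    merged = {r["id"]: r for r in cached}
    for child in changes:
        merged[child["id"]] = child
    gone = {p for child in changes for p in child.get("parents", [])}
    # Missing parent detection: every parent must be known before it is removed.
    if not gone.issubset(merged.keys()):
        return None
    for parent_id in gone:
        del merged[parent_id]
    result = sorted(merged.values(), key=lambda r: r["minInclusive"])
    return result if covers_keyspace(result) else None


def refresh(cached: List[Range], changes: List[Range], full_fetch: Callable[[], Any]) -> Any:
    """Automatic fallback: use the merged map if valid, else do a full refresh."""
    merged = try_combine_sketch(cached, changes)
    return merged if merged is not None else full_fetch()


cached = [{"id": "0", "minInclusive": "", "maxExclusive": "FF"}]
split = [
    {"id": "1", "minInclusive": "", "maxExclusive": "7F", "parents": ["0"]},
    {"id": "2", "minInclusive": "7F", "maxExclusive": "FF", "parents": ["0"]},
]
ok = try_combine_sketch(cached, split)          # both children present
gap = try_combine_sketch(cached, split[:1])     # one child missing leaves a gap
orphan = try_combine_sketch(
    cached, [{"id": "5", "minInclusive": "", "maxExclusive": "FF", "parents": ["9"]}]
)                                               # parent "9" was never cached
```

Returning None rather than raising keeps the fallback decision in the caller, which matches the description of a failed merge triggering a full refresh.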