
incremental change feed processing #44080

Merged
simorenoh merged 36 commits into main from users/dibahl/pkrange-cache-optimization
Apr 3, 2026

@dibahlfi
Member

This PR implements incremental change feed processing for the Partition Key Range (PKR) cache. Instead of re-fetching every partition key range after a partition split, the SDK fetches only the ranges that changed, reducing latency and network overhead when handling splits in Azure Cosmos DB.

Problem Statement
Previously, when a partition split occurred, the SDK would:

  1. Detect the split via a 410 Gone error
  2. Discard the entire PKR cache for the affected collection
  3. Perform a full refresh by re-fetching all partition ranges from scratch

This was inefficient, especially for collections with many partitions.

Solution: Incremental Change Feed
The routing map provider now supports incremental updates using Cosmos DB's change feed mechanism:

  1. ETag-Based Change Tracking
    • Each CollectionRoutingMap now maintains a _change_feed_etag.
    • When refreshing, the SDK passes this ETag to fetch only the changes since the last refresh.
    • This reduces the payload from the full partition list to just the split or merged ranges.
  2. Smart Merge Logic (try_combine method)
    • Takes incremental partition range updates (new child ranges plus the deleted parent ranges they replace).
    • Validates complete coverage of the partition key space.
    • Merges the new ranges into the existing cache in place.
    • Returns None if the merge would create gaps, which triggers the fallback.
  3. Defensive Guards
    • Missing Parent Detection: validates that all parent IDs exist before removing them.
    • Coverage Validation: ensures the merged ranges span the full key space ('' -> 'FF').
    • Automatic Fallback: reverts to a full refresh if the incremental merge fails.
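The merge and its guards can be illustrated with a small, self-contained sketch. This is not the SDK's try_combine: PKRange, covers_key_space, MIN_KEY/MAX_KEY, and the field names are hypothetical, range bounds are compared as plain strings, and `parents` is simplified to hold only the direct parents that each changed range replaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MIN_KEY = ""    # start of the partition key space (inclusive)
MAX_KEY = "FF"  # end of the partition key space (exclusive)


@dataclass(frozen=True)
class PKRange:
    """Hypothetical stand-in for a partition key range returned by the change feed."""
    id: str
    min_inclusive: str
    max_exclusive: str
    parents: Tuple[str, ...] = ()  # simplified: direct parents this range replaces


def covers_key_space(ranges: List[PKRange]) -> bool:
    """Coverage validation: sorted ranges must be contiguous from MIN_KEY to MAX_KEY."""
    expected = MIN_KEY
    for r in sorted(ranges, key=lambda rng: rng.min_inclusive):
        if r.min_inclusive != expected:
            return False  # gap or overlap between neighbouring ranges
        expected = r.max_exclusive
    return expected == MAX_KEY


def try_combine(existing: Dict[str, PKRange],
                changes: List[PKRange]) -> Optional[Dict[str, PKRange]]:
    """Merge incremental changes into a copy of the cached ranges.

    Returns None when the merge cannot be trusted; the caller should then
    fall back to a full refresh.
    """
    known_ids = set(existing) | {c.id for c in changes}
    parent_ids = set()
    for change in changes:
        for pid in change.parents:
            if pid not in known_ids:
                return None  # missing parent: unclear which cached range this replaces
            parent_ids.add(pid)

    # Drop the replaced parents, then add the new (non-parent) ranges.
    merged = {rid: r for rid, r in existing.items() if rid not in parent_ids}
    for change in changes:
        if change.id not in parent_ids:
            merged[change.id] = change

    if not covers_key_space(list(merged.values())):
        return None  # the merge would leave a gap or an overlap
    return merged


cache = {"0": PKRange("0", "", "80"), "1": PKRange("1", "80", "FF")}
split = [PKRange("2", "", "40", ("0",)), PKRange("3", "40", "80", ("0",))]
print(sorted(try_combine(cache, split)))  # ['1', '2', '3']
```

The sketch builds a new dict rather than mutating the cache, so a rejected merge leaves the cached map untouched; the PR describes an in-place merge, and copying is only a simplification chosen here.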

Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements incremental change feed processing for the Partition Key Range (PKR) cache in the Azure Cosmos DB Python SDK. Instead of discarding and fully refreshing the entire cache on partition splits, the SDK now uses ETag-based change tracking to fetch only the changes since the last refresh, significantly reducing latency and network overhead.

Key Changes:

  • Adds incremental change feed support with ETag tracking to routing map provider
  • Implements smart merge logic with defensive guards (missing parent detection, coverage validation, automatic fallback)
  • Adds comprehensive test coverage for split scenarios, fallback behavior, and edge cases
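The ETag tracking and automatic fallback summarized above can be sketched as a small per-collection cache. This is a minimal sketch, not the SDK's routing_map_provider: RoutingMapCache, fetch, combine, and full_refreshes are hypothetical names, and `fetch` stands in for the change feed read of partition key ranges, with etag=None meaning a full read.

```python
from typing import Callable, Dict, List, Optional, Tuple

Range = dict  # hypothetical: any mapping with an "id" key
# fetch(collection, etag) -> (ranges, new_etag); etag=None asks for the full range
# list, otherwise only the ranges that changed after `etag` are returned.
Fetch = Callable[[str, Optional[str]], Tuple[List[Range], str]]
# combine(cached, changes) -> merged map, or None when the merge is unsafe.
Combine = Callable[[Dict[str, Range], List[Range]], Optional[Dict[str, Range]]]


class RoutingMapCache:
    """Per-collection routing map cache that refreshes incrementally via an ETag."""

    def __init__(self, fetch: Fetch, combine: Combine) -> None:
        self._fetch = fetch
        self._combine = combine
        # collection -> (routing map keyed by range id, change feed etag)
        self._maps: Dict[str, Tuple[Dict[str, Range], str]] = {}
        self.full_refreshes = 0  # counter, handy for observing fallbacks

    def _full_refresh(self, collection: str) -> Dict[str, Range]:
        self.full_refreshes += 1
        ranges, etag = self._fetch(collection, None)
        routing_map = {r["id"]: r for r in ranges}
        self._maps[collection] = (routing_map, etag)
        return routing_map

    def refresh(self, collection: str) -> Dict[str, Range]:
        cached = self._maps.get(collection)
        if cached is None:
            return self._full_refresh(collection)  # nothing to build on yet
        routing_map, etag = cached
        changes, new_etag = self._fetch(collection, etag)  # only the delta since `etag`
        merged = self._combine(routing_map, changes) if changes else routing_map
        if merged is None:
            return self._full_refresh(collection)  # automatic fallback
        self._maps[collection] = (merged, new_etag)
        return merged
```

Only the collection being refreshed is touched, so other collections keep their cached maps and ETags.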

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 15 comments.

  • sdk/cosmos/azure-cosmos/azure/cosmos/_routing/routing_map_provider.py: Refactored to use change feed with incremental updates, added fallback logic for incomplete merges
  • sdk/cosmos/azure-cosmos/azure/cosmos/_routing/aio/routing_map_provider.py: Async version of routing_map_provider with the same incremental change feed logic
  • sdk/cosmos/azure-cosmos/azure/cosmos/_routing/collection_routing_map.py: Added ETag property and try_combine method for incremental merge
  • sdk/cosmos/azure-cosmos/azure/cosmos/_gone_retry_policy.py: Updated to support collection-scoped incremental refresh
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_gone_retry_policy_async.py: New async retry policy for handling partition splits with async refresh
  • sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_client_connection.py: Updated refresh_routing_map_provider to support targeted incremental refresh
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client_connection_async.py: Async version with the same incremental refresh support
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_retry_utility_async.py: Integrated async retry policy with proper await handling
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py: Added containerRID to feed options
  • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/multi_execution_aggregator.py: Added partition split handling during iteration
  • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/multi_execution_aggregator.py: Async version with split handling and priority queue reset
  • sdk/cosmos/azure-cosmos/tests/test_routing_map.py: Updated assertion logic to handle extra change feed metadata
  • sdk/cosmos/azure-cosmos/tests/test_partition_split_query_async.py: Added 5 comprehensive tests for incremental merge scenarios
  • sdk/cosmos/azure-cosmos/tests/test_partition_split_query.py: Added 5 comprehensive tests (sync versions)
  • sdk/cosmos/azure-cosmos/tests/routing/test_routing_map_provider.py: Updated mock to accept **kwargs parameter
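The collection-scoped refresh described for the gone retry policy can be pictured as a bounded retry loop. This is a hedged sketch: GoneError, execute_with_gone_retry, refresh_routing_map, and max_retries are hypothetical names, and the SDK's actual policy works with its own exception and status-code types, which this sketch does not reproduce.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class GoneError(Exception):
    """Hypothetical stand-in for a 410 Gone response caused by a partition split."""

    def __init__(self, collection: str) -> None:
        super().__init__(f"410 Gone for collection {collection}")
        self.collection = collection  # which collection's routing map is stale


def execute_with_gone_retry(
    operation: Callable[[], T],
    refresh_routing_map: Callable[[str], None],
    max_retries: int = 3,
) -> T:
    """Run `operation`; on a 410, refresh only the affected collection and retry."""
    attempt = 0
    while True:
        try:
            return operation()
        except GoneError as err:
            attempt += 1
            if attempt > max_retries:
                raise  # give up and surface the original error
            # Collection-scoped refresh instead of clearing every cached routing map.
            refresh_routing_map(err.collection)
```

An async variant would follow the same shape with `await operation()` and an awaited refresh.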

@github-actions

github-actions Bot commented Nov 17, 2025

API Change Check

APIView identified API level changes in this PR and created the following API reviews

azure-cosmos

@dibahlfi
Member Author

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@github-actions

Hi @dibahlfi. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions added the no-recent-activity label ("There has been no recent activity on this issue.") Jan 30, 2026
@dibahlfi
Member Author

dibahlfi commented Apr 2, 2026

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@dibahlfi
Member Author

dibahlfi commented Apr 2, 2026

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@dibahlfi
Member Author

dibahlfi commented Apr 2, 2026

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@dibahlfi
Member Author

dibahlfi commented Apr 3, 2026

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@dibahlfi
Member Author

dibahlfi commented Apr 3, 2026

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Member

@aayush3011 left a comment


LGTM

@simorenoh merged commit 7f83c8e into main Apr 3, 2026
38 checks passed
@simorenoh deleted the users/dibahl/pkrange-cache-optimization branch April 3, 2026 20:18
fafhrd91 pushed a commit to fafhrd91/azure-sdk-for-python that referenced this pull request Apr 28, 2026
* fix: initial commit

* fix: adding tests for the sync version

* fix: adding tests for the async version

* fix: fixing copilot suggestions

* fix: fixing tests

* fix: addressing comments

* fix: addressing pylint comments

* fix: addressing pylint comments

* fix: addressing pylint comments

* fix: addressing pylint comments

* fixing passing containerRid header bug

* adding new tests

* adding new tests

* fixing pylint errors

* adding new tests

* fixing tests

* fixing pytest

* fixing tests

* refactoring

* refactoring and fixing bugs

* fixing bug

* addressing copilot comments

* addressing copilot comments

* addressing copilot comments

* addressing comments

* refactoring and fixing comments

* updated change log

* resolving comments and refactoring code

* resolving comments and refactoring code

* fixing pylint errors

* fixing pylint errors

* fixing tests

* fixing tests


Projects

Status: Done


6 participants