Revert Dual Endpoint Tracking #40451

Merged
tvaron3 merged 22 commits into Azure:main from tvaron3:tvaron3/revertDualEndpoint
Sep 29, 2025
@tvaron3
Member

@tvaron3 tvaron3 commented Apr 9, 2025

Background

The SDK used to track two endpoints for writes: the global endpoint, https://account.documents.azure.com, and the regional endpoint, https://account-eastus.documents.azure.com. Gateway would return one of the two at random for the write region; the SDK would write to that one and keep the other as a backup. This was done for write availability in single-write-region accounts and for load balancing across compute federations.
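A minimal sketch of the old behavior described above, for illustration only. The names `regional_endpoint`, `DualEndpointContext`, and `build_write_context` are hypothetical and are not taken from the SDK; the region-to-suffix rule (lowercase, spaces removed) is an assumption inferred from the example URLs.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


def regional_endpoint(global_endpoint: str, region: str) -> str:
    """Derive the regional URL by appending '-<region>' to the account name.

    Assumption: the suffix is the region name lowercased with spaces removed,
    as in 'East US' -> 'eastus'.
    """
    parts = urlsplit(global_endpoint)
    account, _, domain = parts.netloc.partition(".")
    suffix = region.replace(" ", "").lower()
    return urlunsplit(parts._replace(netloc=f"{account}-{suffix}.{domain}"))


@dataclass(frozen=True)
class DualEndpointContext:
    """Hypothetical holder for the two write endpoints the SDK used to track."""
    primary: str    # whichever endpoint Gateway returned for the write region
    alternate: str  # the other one, kept as a backup


def build_write_context(gateway_endpoint: str, global_endpoint: str, region: str) -> DualEndpointContext:
    """Use the endpoint Gateway returned as primary and the other as backup."""
    regional = regional_endpoint(global_endpoint, region)
    other = regional if gateway_endpoint == global_endpoint else global_endpoint
    return DualEndpointContext(primary=gateway_endpoint, alternate=other)
```

Whichever of the two URLs Gateway hands back becomes the primary, so the backup is always the remaining one.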

Changes

The dual endpoint logic is being abstracted away by a service called Azure Traffic Manager (https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-overview). Gateway will now send only the regional endpoint to the SDK, and the SDK should use the global endpoint only for metadata calls. This PR removes all the dual endpoint logic from the SDK and its tests.
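A rough sketch of the routing after this change, for illustration only. `RegionalContext` and `resolve_endpoint` are hypothetical names, not SDK code, and the sketch assumes every metadata call goes to the global endpoint, which the description does not state outright.

```python
from dataclasses import dataclass

# Example URL from the description above.
GLOBAL_ENDPOINT = "https://account.documents.azure.com"


@dataclass(frozen=True)
class RegionalContext:
    """Hypothetical routing context: a single regional endpoint, no alternate."""
    endpoint: str


def resolve_endpoint(is_metadata_call: bool, context: RegionalContext,
                     global_endpoint: str = GLOBAL_ENDPOINT) -> str:
    """Send metadata calls to the global endpoint and everything else to the
    regional endpoint that Gateway returned."""
    return global_endpoint if is_metadata_call else context.endpoint
```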

Other Changes

Simplified the health check logic, as there are no alternate endpoints to check anymore.

flowchart TD
    A[Request] --> B(fails due to connection issues)
    B --> C{Have there been 3 retries for this endpoint?}
    C -->|Yes| K(Mark endpoint as unavailable)
    K --> I{Is there another region/endpoint available?}
    I --> |No| J(Bubble up failure to customer)
    I --> |Yes| D[read or write request?]
    C -->|No| E(Retries in same region)
    E --> A
    D -->|Read| F(Retry on another region)
    D -->|Write| G{multi-write account?}
    G --> |Yes| F
    G --> |No| H(Retry on alternate endpoint)
    H --> A
    F --> A
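
The decision flow in the diagram can be sketched as a single function. This is an illustrative reading, not the SDK's retry policy: `Action` and `next_action` are hypothetical names, the sketch takes the "Yes" branch of the retry check to be the one that marks the endpoint unavailable, and it keeps the diagram's last branch ("Retry on alternate endpoint") exactly as drawn.

```python
from enum import Enum


class Action(Enum):
    RETRY_SAME_REGION = "retry in same region"
    RETRY_OTHER_REGION = "retry on another region"
    RETRY_ALTERNATE_ENDPOINT = "retry on alternate endpoint"
    FAIL = "bubble up failure to customer"


MAX_RETRIES_PER_ENDPOINT = 3  # threshold taken from the diagram


def next_action(retries_on_endpoint: int, other_endpoint_available: bool,
                is_write: bool, multi_write_account: bool) -> Action:
    """Return the next step after a request fails due to connection issues."""
    if retries_on_endpoint < MAX_RETRIES_PER_ENDPOINT:
        return Action.RETRY_SAME_REGION
    # From here on the endpoint is considered unavailable.
    if not other_endpoint_available:
        return Action.FAIL
    if not is_write or multi_write_account:
        # Reads, and writes on multi-write accounts, move to another region.
        return Action.RETRY_OTHER_REGION
    return Action.RETRY_ALTERNATE_ENDPOINT
```

For example, a read that has already been retried three times on its endpoint moves to another region if one is available, and otherwise the failure is surfaced to the caller.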
@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@tvaron3 tvaron3 marked this pull request as ready for review April 10, 2025 17:34
Copilot AI review requested due to automatic review settings April 10, 2025 17:34
@tvaron3 tvaron3 requested review from a team and annatisch as code owners April 10, 2025 17:34
Contributor

Copilot AI left a comment

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

sdk/cosmos/azure-cosmos/tests/test_service_retry_policies_async.py:386

  • Verify that the expected number of request endpoints (4) now reflects the intended in-region retry behavior after the dual endpoint removal and that no necessary retries are inadvertently omitted.
assert len(connection_retry_policy.request_endpoints) == 4

sdk/cosmos/azure-cosmos/tests/test_fault_injection_transport.py:349

  • Confirm that enabling multiple write locations in this test fully exercises the new code path and does not impact tests expecting single write location behavior when not explicitly enabled.
use_multiple_write_locations=True,

sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py:59

  • Ensure that downstream components correctly handle regional routing contexts that now contain only a primary endpoint, without relying on any alternate endpoint logic.
def get_regional_routing_contexts_by_loc(new_locations):

Member

@kushagraThapar kushagraThapar left a comment

LGTM, thanks @tvaron3

Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_global_endpoint_manager.py
Comment thread sdk/cosmos/azure-cosmos/tests/test_service_retry_policies.py
Member

@simorenoh simorenoh left a comment

just a small comment but looks good otherwise - thanks Tomas!

Comment thread sdk/cosmos/azure-cosmos/tests/test_fault_injection_transport.py Outdated
tvaron3 and others added 4 commits May 8, 2025 23:22
…into tvaron3/revertDualEndpoint

# Conflicts:
#	sdk/cosmos/azure-cosmos/azure/cosmos/_global_endpoint_manager.py
#	sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py
#	sdk/cosmos/azure-cosmos/azure/cosmos/_request_object.py
#	sdk/cosmos/azure-cosmos/azure/cosmos/aio/_global_endpoint_manager_async.py
#	sdk/cosmos/azure-cosmos/tests/test_fault_injection_transport.py
#	sdk/cosmos/azure-cosmos/tests/test_fault_injection_transport_async.py
#	sdk/cosmos/azure-cosmos/tests/test_health_check.py
#	sdk/cosmos/azure-cosmos/tests/test_health_check_async.py
@github-actions

Hi @tvaron3. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions github-actions Bot added the no-recent-activity There has been no recent activity on this issue. label Aug 15, 2025
@tvaron3
Member Author

tvaron3 commented Aug 15, 2025

Still working on this

@github-actions github-actions Bot removed the no-recent-activity There has been no recent activity on this issue. label Aug 15, 2025
@tvaron3
Member Author

tvaron3 commented Sep 23, 2025

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@simorenoh simorenoh left a comment

The method get_all_write_endpoints(self) -> Set[str] still makes a reference to the alternate endpoint.

@tvaron3
Member Author

tvaron3 commented Sep 25, 2025

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tvaron3
Member Author

tvaron3 commented Sep 26, 2025

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@tvaron3
Member Author

tvaron3 commented Sep 26, 2025

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread sdk/cosmos/azure-cosmos/azure/cosmos/_location_cache.py Outdated
Comment thread sdk/cosmos/azure-cosmos/docs/ErrorCodesAndRetries.md
Co-authored-by: Allen Kim <allenkim0129@gmail.com>
Contributor

@allenkim0129 allenkim0129 left a comment

LGTM

@tvaron3 tvaron3 enabled auto-merge (squash) September 29, 2025 19:26
@tvaron3 tvaron3 merged commit 279a3ce into Azure:main Sep 29, 2025
21 checks passed