azure-cosmos: SDK routes requests through public IP even with private endpoint and enable_endpoint_discovery=False #46219

@KrzysztofKasprowicz

Describe the bug

When using azure.cosmos.aio.CosmosClient to connect to an Azure Cosmos DB account exclusively via a private endpoint (public network access disabled), the SDK intermittently sends requests through the public IP of the Cosmos DB account, resulting in 403 Forbidden errors:

azure.cosmos.exceptions.CosmosHttpResponseError: (Forbidden)
Request originated from IP <public-ip> through public internet.
This is blocked by your Cosmos DB account firewall settings.
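
For anyone collecting these failures, a small helper can pull the reported source IP out of the error text so it can be compared with what the host resolves. This is only a sketch: the regular expression assumes the message wording shown above, and `origin_ip_from_error` is a hypothetical name, not part of azure-cosmos.

```python
import ipaddress
import re

# Assumed message shape, copied from the error text in this issue.
_ORIGIN_IP = re.compile(r"Request originated from IP (\S+) through public internet")


def origin_ip_from_error(message: str):
    """Return the source IP reported by the service, or None if absent."""
    match = _ORIGIN_IP.search(message)
    if match is None:
        return None
    try:
        return ipaddress.ip_address(match.group(1))
    except ValueError:
        # The token was not an IP literal (for example a redacted placeholder).
        return None
```

In an application this would typically be called with `str(exc)` inside an `except CosmosHttpResponseError as exc:` handler.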

Key finding: Setting enable_endpoint_discovery=False does not fully resolve the issue — the 403 errors persist.

Root cause analysis

I traced through the SDK source code (v4.15.0) and identified the endpoint discovery mechanism as a likely contributor:

  1. The _GlobalEndpointManager periodically (every 5 min) calls _GetDatabaseAccount() on the gateway.
  2. The gateway responds with writableLocations / readableLocations containing public regional endpoint URLs (e.g., https://account-northeurope.documents.azure.com:443/), even when the request arrives via a private endpoint.
  3. LocationCache.get_regional_routing_contexts_by_loc() (_location_cache.py:61-83) stores these public URLs verbatim as RegionalRoutingContext objects.
  4. resolve_service_endpoint() (_location_cache.py:335-375) selects endpoints from this cache, potentially routing requests to the public URL.

Setting enable_endpoint_discovery=False should prevent this, but the 403 errors continue — suggesting there may be additional code paths in the SDK that bypass the EnableEndpointDiscovery flag and still attempt connections to public endpoints.
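
To make the suspected flow concrete, here is a simplified model of steps 2 to 4. It is not the SDK's code: the payload is made up, the `databaseAccountEndpoint` field name is an assumption about the gateway response, and `discovered_hosts` is a hypothetical helper. It only shows how hostnames other than the configured one could end up in a routing cache.

```python
from urllib.parse import urlsplit

# Illustrative account metadata; values are invented for this sketch.
REGIONAL_URL = "https://account-northeurope.documents.azure.com:443/"
account_metadata = {
    "writableLocations": [
        {"name": "North Europe", "databaseAccountEndpoint": REGIONAL_URL}
    ],
    "readableLocations": [
        {"name": "North Europe", "databaseAccountEndpoint": REGIONAL_URL}
    ],
}


def discovered_hosts(metadata):
    """Collect every hostname advertised in the location lists."""
    hosts = set()
    for key in ("writableLocations", "readableLocations"):
        for location in metadata.get(key, []):
            hosts.add(urlsplit(location["databaseAccountEndpoint"]).hostname)
    return hosts


# Host the client was initialized with (hypothetical account name).
configured_host = urlsplit("https://account.documents.azure.com:443/").hostname

# Hosts a location cache would hold that differ from the configured one.
unexpected_hosts = discovered_hosts(account_metadata) - {configured_host}
```

If `unexpected_hosts` is non-empty, any code path that routes through the cache could connect to a host other than the one passed at client initialization.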

To Reproduce

  1. Create a Cosmos DB account with:
    • Public network access: Disabled
    • A private endpoint configured in a VNet
  2. Deploy an application in the same VNet (e.g., Azure Container Apps with VNet integration)
  3. Initialize the async client:
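from azure.cosmos.aio import CosmosClient
from azure.identity.aio import DefaultAzureCredential

# Async DefaultAzureCredential, as listed under "Additional context"
my_aad_credential = DefaultAzureCredential()

client = CosmosClient(
    "https://myaccount.documents.azure.com:443/",
    credential=my_aad_credential,
    enable_endpoint_discovery=False,  # does NOT prevent the issue
)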
  4. Perform read/write operations over an extended period (hours)
  5. Intermittently, requests fail with 403 Forbidden: "Request originated from IP through public internet"

DNS verification: Independent DNS monitoring confirms the application's OS-level DNS always resolves the Cosmos FQDN to the correct private IP via the privatelink.documents.azure.com CNAME chain. The public IP in the error message does not match what the OS resolves — it appears the SDK is connecting to a different endpoint entirely.
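
A check along these lines can be run from inside the application container to repeat the DNS verification. It is a minimal sketch using only the standard library; `resolved_addresses` and `all_private` are hypothetical helper names, and the real monitoring used for this report may differ.

```python
import ipaddress
import socket


def resolved_addresses(host: str, port: int = 443):
    """Resolve host through the OS resolver and return the unique addresses."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True only if at least one address was given and every one is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )
```

For example, `all_private(resolved_addresses("myaccount.documents.azure.com"))` should stay true for the lifetime of the process if OS-level resolution always follows the privatelink CNAME chain.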

Expected behavior

When enable_endpoint_discovery=False is set, the SDK should only use the endpoint URL provided at client initialization. No requests should be routed to gateway-discovered regional endpoints, especially public ones.
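
The expectation above can be written as a tiny selection rule. This is a hypothetical function describing the desired contract, not the SDK's `resolve_service_endpoint()`; the parameter names are chosen for illustration.

```python
def select_endpoint(default_endpoint, discovered_endpoints, enable_endpoint_discovery):
    """Pick the endpoint for a request under the expected contract."""
    # With discovery disabled, discovered endpoints must never be used.
    if not enable_endpoint_discovery or not discovered_endpoints:
        return default_endpoint
    # With discovery enabled, prefer the first discovered endpoint.
    return discovered_endpoints[0]
```

Under this rule, a client created with `enable_endpoint_discovery=False` would send every request to the URL given at initialization, regardless of what the gateway advertises.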

Even when enable_endpoint_discovery=True (default), if the client was initialized with a private endpoint URL, the SDK should not route requests to public regional endpoints that are unreachable due to firewall rules.

Additional context

  • Environment: Azure Container Apps with VNet integration, private endpoint to Cosmos DB
  • Authentication: Azure AD (DefaultAzureCredential) — async variant
  • Cosmos DB: Single-region account, North Europe, public access disabled
  • The error is intermittent — most requests succeed (via private IP), but periodically requests hit the public IP
  • The documentdb-dotnet-sdk/2.14.0 string in the full error message comes from the Cosmos DB server side, not from the Python client

Setup

  • OS: Linux (container on Azure Container Apps)
  • Python: 3.12
  • azure-cosmos: 4.15.0
  • azure-identity: latest (async DefaultAzureCredential)

Labels

Cosmos, Service Attention, customer-reported, needs-team-attention, question
