Describe the bug
When using `azure.cosmos.aio.CosmosClient` to connect to an Azure Cosmos DB account exclusively through a private endpoint (public network access disabled), the SDK intermittently sends requests through the public IP of the Cosmos DB account, resulting in `403 Forbidden` errors:
```text
azure.cosmos.exceptions.CosmosHttpResponseError: (Forbidden)
Request originated from IP <public-ip> through public internet.
This is blocked by your Cosmos DB account firewall settings.
```
Key finding: setting `enable_endpoint_discovery=False` does not fully resolve the issue; the 403 errors persist.
Root cause analysis
I traced through the SDK source code (v4.15.0) and identified the endpoint discovery mechanism as a likely contributor:
- The `_GlobalEndpointManager` periodically (every 5 minutes) calls `_GetDatabaseAccount()` on the gateway.
- The gateway responds with `writableLocations` / `readableLocations` containing public regional endpoint URLs (e.g., `https://account-northeurope.documents.azure.com:443/`), even when the request arrives via a private endpoint.
- `LocationCache.get_regional_routing_contexts_by_loc()` (`_location_cache.py:61-83`) stores these public URLs verbatim as `RegionalRoutingContext` objects.
- `resolve_service_endpoint()` (`_location_cache.py:335-375`) selects endpoints from this cache, potentially routing requests to the public URL.
Setting `enable_endpoint_discovery=False` should prevent this, but the 403 errors continue. This suggests that additional code paths in the SDK may bypass the `EnableEndpointDiscovery` flag and still attempt connections to public endpoints.
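To make the suspected routing path concrete, here is a heavily simplified model of it. This is not the SDK's code: `endpoints_by_region` and `resolve_endpoint` are hypothetical names, the `name` / `databaseAccountEndpoint` keys are an assumption about the shape of the gateway's `writableLocations` / `readableLocations` entries, and the real `LocationCache` also tracks unavailable endpoints and multi-write accounts. The model only shows how a URL learned from the gateway, rather than the one passed to the constructor, can end up being used when discovery is on, and what `enable_endpoint_discovery=False` is expected to short-circuit.

```python
from typing import Dict, List, Sequence

# Assumed shape of one entry in writableLocations / readableLocations.
Location = Dict[str, str]


def endpoints_by_region(locations: Sequence[Location]) -> Dict[str, str]:
    """Map region name -> advertised endpoint URL, stored verbatim (public or not)."""
    return {loc["name"]: loc["databaseAccountEndpoint"] for loc in locations}


def resolve_endpoint(
    default_endpoint: str,
    locations: Sequence[Location],
    preferred_regions: List[str],
    discovery_enabled: bool,
) -> str:
    """Simplified model: pick the URL a request would be sent to."""
    if not discovery_enabled:
        # Expected behavior: only the URL given at client construction is used.
        return default_endpoint
    by_region = endpoints_by_region(locations)
    for region in preferred_regions:
        if region in by_region:
            return by_region[region]
    # In this model, with no preferred match, use the first advertised location.
    if locations:
        return locations[0]["databaseAccountEndpoint"]
    return default_endpoint


if __name__ == "__main__":
    default = "https://myaccount.documents.azure.com:443/"  # placeholder account URL
    writable = [{
        "name": "North Europe",
        "databaseAccountEndpoint": "https://myaccount-northeurope.documents.azure.com:443/",
    }]
    print(resolve_endpoint(default, writable, ["North Europe"], discovery_enabled=True))
    print(resolve_endpoint(default, writable, ["North Europe"], discovery_enabled=False))
```

With discovery enabled the model returns the regional URL advertised by the gateway; with it disabled it returns the constructor URL, which is the behavior this report expects but does not observe.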
To Reproduce
- Create a Cosmos DB account with:
  - Public network access: Disabled
  - A private endpoint configured in a VNet
- Deploy an application in the same VNet (e.g., Azure Container Apps with VNet integration)
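- Initialize the async client:
  ```python
  from azure.cosmos.aio import CosmosClient
  from azure.identity.aio import DefaultAzureCredential

  my_aad_credential = DefaultAzureCredential()  # async Azure AD credential

  client = CosmosClient(
      "https://myaccount.documents.azure.com:443/",
      credential=my_aad_credential,
      enable_endpoint_discovery=False,  # does NOT prevent the issue
  )
  ```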
- Perform read/write operations over an extended period (hours)
- Intermittently, requests fail with `403 Forbidden`: `Request originated from IP <public-ip> through public internet`
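Because the failure only shows up over hours, a long-running loop that tallies outcomes by HTTP status makes the intermittent 403s easier to quantify. The sketch below is hypothetical harness code, not part of the SDK: `soak` accepts any zero-argument function returning an awaitable, and it relies only on the `status_code` attribute that `azure.core` `HttpResponseError` subclasses (including `CosmosHttpResponseError`) expose.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable


async def soak(
    op: Callable[[], Awaitable[object]],
    iterations: int,
    delay_s: float = 0.0,
) -> Counter:
    """Run `op` repeatedly and count outcomes as "ok" or "status_<code>"."""
    outcomes: Counter = Counter()
    for _ in range(iterations):
        try:
            await op()
            outcomes["ok"] += 1
        except Exception as exc:  # narrow to CosmosHttpResponseError in real use
            # Errors without an HTTP status (e.g. timeouts) are counted as status_None.
            outcomes[f"status_{getattr(exc, 'status_code', None)}"] += 1
        await asyncio.sleep(delay_s)
    return outcomes
```

Against a real container this could be driven as `await soak(lambda: container.read_item(item="some-id", partition_key="some-pk"), iterations=10_000, delay_s=1.0)`, where `container` comes from `client.get_database_client(...).get_container_client(...)` and the item id and partition key are placeholders.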
DNS verification: independent DNS monitoring confirms that the application's OS-level DNS always resolves the Cosmos DB FQDN to the correct private IP via the `privatelink.documents.azure.com` CNAME chain. The public IP in the error message does not match what the OS resolves, so the SDK appears to be connecting to a different endpoint entirely.
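The same check can be repeated from inside the container for both the account FQDN and the regional FQDN that the gateway advertises, flagging any public address. This is a hypothetical diagnostic built only on the standard library (`socket.getaddrinfo`, `ipaddress`); the hostnames are placeholders following the `myaccount` / `account-northeurope` pattern used in this report, and "private" here means whatever `ipaddress` classifies as `is_private`.

```python
import ipaddress
import socket
from typing import Dict, List


def classify(ip: str) -> str:
    """Label an address as 'private' or 'public' using the stdlib's ranges."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def resolve_and_classify(host: str, port: int = 443) -> Dict[str, str]:
    """Return {ip: 'private' | 'public'} for every address the OS resolver yields."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return {info[4][0]: classify(info[4][0]) for info in infos}


if __name__ == "__main__":
    hosts: List[str] = [
        "myaccount.documents.azure.com",              # placeholder account FQDN
        "myaccount-northeurope.documents.azure.com",  # placeholder regional FQDN
    ]
    for host in hosts:
        try:
            print(host, resolve_and_classify(host))
        except socket.gaierror as exc:
            print(host, "resolution failed:", exc)
```

If the regional hostname resolves to a public address (or fails to resolve) while the account hostname resolves privately, that would be consistent with the routing path described under "Root cause analysis".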
Expected behavior
When enable_endpoint_discovery=False is set, the SDK should only use the endpoint URL provided at client initialization. No requests should be routed to gateway-discovered regional endpoints, especially public ones.
Even when enable_endpoint_discovery=True (default), if the client was initialized with a private endpoint URL, the SDK should not route requests to public regional endpoints that are unreachable due to firewall rules.
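One way to express the second expectation is a guard that accepts a discovered endpoint only when its host resolves exclusively to private addresses, and always keeps the configured URL. This is a hypothetical sketch of the desired behavior, not an existing `azure-cosmos` option; `filter_private_endpoints` and the injected `resolver` are illustrative names, and the resolver is passed in so the logic can be exercised without real DNS.

```python
import ipaddress
from typing import Callable, Iterable, List
from urllib.parse import urlparse

# resolver(host) -> list of IP address strings; injected for testability.
Resolver = Callable[[str], List[str]]


def filter_private_endpoints(
    configured: str, discovered: Iterable[str], resolver: Resolver
) -> List[str]:
    """Keep discovered endpoints whose host resolves only to private IPs.

    The configured endpoint is always kept first, so a client built with a
    private-endpoint URL never ends up with an empty candidate list.
    """
    kept = [configured]
    for url in discovered:
        host = urlparse(url).hostname
        if host is None or url == configured:
            continue
        ips = resolver(host)
        # An unresolvable host (empty list) is treated as unsafe and dropped.
        if ips and all(ipaddress.ip_address(ip).is_private for ip in ips):
            kept.append(url)
    return kept
```

The design choice here is to fail closed: a discovered endpoint that resolves to any public address, or to nothing, is never added as a routing candidate.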
Additional context
- Environment: Azure Container Apps with VNet integration, private endpoint to Cosmos DB
- Authentication: Azure AD (`DefaultAzureCredential`, async variant)
- Cosmos DB: Single-region account, North Europe, public access disabled
- The error is intermittent: most requests succeed (via the private IP), but periodically requests hit the public IP
- The `documentdb-dotnet-sdk/2.14.0` string in the error message comes from the Cosmos DB service side, not from the Python client
Setup
- OS: Linux (container on Azure Container Apps)
- Python: 3.12
- azure-cosmos: 4.15.0
- azure-identity: latest (async DefaultAzureCredential)
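Since `azure-identity` is recorded only as "latest", capturing exact installed versions would make the report easier to triage. A small stdlib-only helper (hypothetical, not part of any SDK) can print them from inside the container; `azure-core` and `aiohttp` are included because the async client's HTTP transport depends on them.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable

DEFAULT_PACKAGES = ("azure-cosmos", "azure-identity", "azure-core", "aiohttp")


def environment_report(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, str]:
    """Collect OS, Python, and package versions; missing packages are marked."""
    report = {"os": platform.platform(), "python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"- {key}: {value}")
```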