Skip to content

Bug Retrying Writes#40672

Merged
FabianMeiswinkel merged 5 commits intoAzure:mainfrom
tvaron3:tvaron3/writeRetriesBug
Apr 24, 2025
Merged

Bug Retrying Writes#40672
FabianMeiswinkel merged 5 commits intoAzure:mainfrom
tvaron3:tvaron3/writeRetriesBug

Conversation

@tvaron3
Copy link
Copy Markdown
Member

@tvaron3 tvaron3 commented Apr 23, 2025

Description

Bug

We were still relying on azure core's is_method_retryable to retry AzureError. There are two issues with using this method. This method will behave differently based on customers passed in options. A customer could change it so that creates are retried. Secondly by default, this method retries patch and replace operations. All these retries are after the request has been sent and certain 5xx status codes are returned.

Solution

Instead of handling CosmosHttpResponseErrors in the connection retry policy, they will immediately be raised. I kept the AzureError code, but unsure when this code path will be executed. If there is some unexpected AzureError, we would retry but only if it is a read request.

Other Changes

  • added new fault injection predicate to filter by operation type

@tvaron3 tvaron3 marked this pull request as ready for review April 23, 2025 05:39
Copilot AI review requested due to automatic review settings April 23, 2025 05:39
@tvaron3 tvaron3 requested a review from a team as a code owner April 23, 2025 05:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a bug in the Cosmos DB retry policy where patch and replace operations were being retried unexpectedly and ensures that CosmosHttpResponseError is raised immediately. It also adds tests for both asynchronous and synchronous behaviors, updates fault injection transport to use header constants, and updates the changelog accordingly.

  • Added async and sync tests (test_patch_replace_no_retry) to verify that patch/replace operations do not retry.
  • Modified the retry policies in both async and sync modules to immediately raise CosmosHttpResponseError.
  • Updated fault injection transports to use constant header values and include a new predicate for operation type.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos/tests/test_retry_policy_async.py Added async test for patch/replace no retry behavior.
sdk/cosmos/azure-cosmos/tests/test_retry_policy.py Added sync test for patch/replace no retry behavior.
sdk/cosmos/azure-cosmos/tests/test_config.py Updated MockConnectionRetryPolicy signature and error handling logic.
sdk/cosmos/azure-cosmos/tests/_fault_injection_transport_async.py Updated header usage to constants and added fault injection predicate for operation type.
sdk/cosmos/azure-cosmos/tests/_fault_injection_transport.py Updated header usage to constants and added fault injection predicate for operation type.
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_retry_utility_async.py Modified retry logic to immediately raise CosmosHttpResponseError.
sdk/cosmos/azure-cosmos/azure/cosmos/_retry_utility.py Modified retry logic to immediately raise CosmosHttpResponseError.
sdk/cosmos/azure-cosmos/CHANGELOG.md Documented the bug fix for not retrying patch/replace operations.

@azure-sdk
Copy link
Copy Markdown
Collaborator

API change check

API changes are not detected in this pull request.

@tvaron3
Copy link
Copy Markdown
Member Author

tvaron3 commented Apr 23, 2025

/azp run python - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@FabianMeiswinkel FabianMeiswinkel merged commit 6f78dc7 into Azure:main Apr 24, 2025
31 checks passed
cRui861 pushed a commit that referenced this pull request May 14, 2025
* No retries for writes

* update changelog

* fix pylint

* fix test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants