Add timeout and retry to framework version of provider#151
Conversation
detro
left a comment
There was a problem hiding this comment.
LGTM, and I left a couple of comments/reflections
- Should we also support setting delay between retries?
- I'm conflicted around the error-code-driven retry: the default behaviour of
retryinghttp, that involves checking 5xx, has strong merits, grounded in practicality.
| retryClient.CheckRetry = func(ctx context.Context, resp *http.Response, err error) (bool, error) { | ||
| if err == nil { | ||
| return false, nil | ||
| } | ||
|
|
||
| return true, fmt.Errorf("retrying as request generated error: %w", err) | ||
| } |
There was a problem hiding this comment.
To me this is a sane approach.
Given we made a choice of letting user deal with error codes themselves, it feels like an actual client error is the only case when retrying make sense.
There was a problem hiding this comment.
I have taken a second to look at how the retryablehttp client does it, and they have this:
// Check the response code. We retry on 500-range responses to allow
// the server time to recover, as 500's are typically not permanent
// errors and may relate to outages on the server side. This will catch
// invalid response codes as well, like 0 and 999.
if resp.StatusCode == 0 || (resp.StatusCode >= 500 && resp.StatusCode != http.StatusNotImplemented) {
return true, fmt.Errorf("unexpected HTTP status %s", resp.Status)
}While I agree that it feels rather odd to ignore error codes, while also checking them, here they are making a fair point: most of 5xx errors are usually temporary issues on the server side of things. Yes, post-conditions will allow TF execution to stop in case of error, but for this class of issues (temporary server-side issues), it would end up having to fail the overall TF execution, and then the user has to find a way to retry.
I'm frankly on the fence about this. Both side of this argument have merits.
There was a problem hiding this comment.
Yep. Agreed. I'm on the fence too.
|
|
||
| "retry": { | ||
| Description: "Retry request configuration.", | ||
| Attributes: tfsdk.SingleNestedAttributes(map[string]tfsdk.Attribute{ |
There was a problem hiding this comment.
One thing that sometimes helps is to introduce a delay between attempts - maybe an exponential-back-off?. Similarly proposed here: #87
There was a problem hiding this comment.
The retryablehttp client has an exponential backoff by default and a default minimum wait between retries of 1 sec.
We could expose the RetryWaitMin (and RetryWaitMax)? One of the reasons for using a SingleNestedAttribute for retry configuration was because of the possibility of needing to expose additional configuration.
There was a problem hiding this comment.
Even better if it's easy to expose and it's nicely "tucked away" in the nested object
c7cebe4 to
e974f5a
Compare
| "retry": { | ||
| Description: "Retry request configuration.", | ||
| Attributes: tfsdk.SingleNestedAttributes(map[string]tfsdk.Attribute{ |
There was a problem hiding this comment.
Since we're still on protocol version 5, this will need to be migrated to Blocks for now -- Nesting: tfsdk.BlockNestingModeList with MaxItems: 1 would be the closest we can get there. 🙁 (It could technically be an Object attribute, but then it'd lose all the "nice" things the framework can provide for the nested attributes).
|
|
||
| client := &http.Client{} | ||
| retryClient := retryablehttp.NewClient() | ||
| retryClient.Logger = levelledLogger{ctx} |
There was a problem hiding this comment.
Aside: This leaves us with an interesting inflection point with terraform-plugin-log. We recently updated the http.Transport in sdk/v2 helper/logging for terraform-plugin-log support: https://www.terraform.io/plugin/sdkv2/logging/http-transport
We thought that it might be the case that framework providers could use something similar and that the answer there might be copying that http.Transport implementation into a new terraform-plugin-log package, since its extending Go standard library functionality and tightly coupled with terraform-plugin-log functionality.
That said, the logging implementation of go-retryablehttp partially overlaps/conflicts with some of that http.Transport implementation if we were to try an also use it here. For example, retryablehttp saves some information in differently named fields and it would always create duplicate log entries for sending/receiving requests unless we overwrite RequestLogHook/ResponseLogHook. The terraform-plugin-log http.Transport implementation also creates a transaction ID for correlating entries easier.
I'm not sure if its worth looking into this further, but wanted to raise it in case it seems like an area we should explore more. I think go-retryablehttp is a really good choice in general since it bakes in a lot of goodies, just not sure about the logging side of things.
There was a problem hiding this comment.
Created hashicorp/terraform-plugin-log#91 to discuss some ideas.
29da528 to
6f95f1f
Compare
|
|
||
| ENHANCEMENTS: | ||
|
|
||
| * data-source/http: `retry` with nested `attempts` has been added ([#151](https://github.com/hashicorp/terraform-provider-http/pull/151). |
There was a problem hiding this comment.
Needs updating to incorporate min_delay and max_delay.
| }, | ||
|
|
||
| "request_timeout": schema.Int64Attribute{ | ||
| Description: "The request timeout in milliseconds.", |
There was a problem hiding this comment.
We could use a custom time.Duration type here.
| }, | ||
| }, | ||
| "min_delay": schema.Int64Attribute{ | ||
| Description: "The minimum delay between retry requests in milliseconds.", |
There was a problem hiding this comment.
We could use a custom time.Duration type here.
| }, | ||
| }, | ||
| "max_delay": schema.Int64Attribute{ | ||
| Description: "The maximum delay between retry requests in milliseconds.", |
There was a problem hiding this comment.
We could use a custom time.Duration type here.
1c016ee to
3ff90d2
Compare
Co-authored-by: Vincent Chenal <vincent.chenal@protonmail.com>
3ff90d2 to
4d22215
Compare
4d22215 to
ccc248e
Compare
|
Any chance someone can review and merge this if it's ready? This feature would be really useful for a terraform project I am currently working on. |
|
Hi @F21 👋 We are currently reviewing this PR and should have an update regarding when it might be merged and released shortly. |
|
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
Closes: #149
One consequence of adding an option to retry is that it raises the question of when a retry should be initiated.
The implementation in this PR will only retry if an error is generated when a request is made.
An option might be to add an additional attribute to
retryto allow specifying which status codes, for instance, should also trigger a retry, something like:However, adding an attribute to allow defining which response status codes should trigger a retry seems slightly odd given that status code checking is removed in #142