Skip to content

Race condition where the HystrixMetricsPublisherThreadPool isn't referencing the correct ThreadPoolExecutor instance#270

Closed
chrisgray wants to merge 1 commit intoNetflix:masterfrom
chrisgray:bug/hystrixThreadPoolMetrics-concurrency
Closed

Race condition where the HystrixMetricsPublisherThreadPool isn't referencing the correct ThreadPoolExecutor instance#270
chrisgray wants to merge 1 commit intoNetflix:masterfrom
chrisgray:bug/hystrixThreadPoolMetrics-concurrency

Conversation

@chrisgray
Copy link
Copy Markdown
Contributor

Race condition exists when constructing a HystrixThreadPool where the ThreadPoolExecutor used by HystrixThreadPool.Factory is not the same instance as the one that is associated with the HystrixMetricsPublisherThreadPool.

This causes a disconnect between the ThreadPoolExecutor's metrics and those supplied by HystrixMetricsThreadPool.

Now the ThreadPoolExecutor itself is retrieved from the HystrixMetricsThreadPool object, because it is protected behind the pattern of a ConcurrentMap#putIfAbsent. This seemed much more difficult to do with the ConcurrencyStrategy
as then any further custom implementations would not have similar concurrency guarantees.

A test has been added to show that the ThreadPoolExecutor construct is indeed correct and the same as the one associated with the metrics publisher. It was a bit difficult to construct a test to prove the bad case and good case at the same time. Once the solution was in place, I had to turn the failing test case into a validation test case.

In addition this exposed some static state that was kept around between some tests that needed to be cleaned up. In particular the HystrixMetricsPublisherFactory has a singleton object that needed to be reset (even though it's a singleton) in order to validate the test. I'm not particularly happy with this approach, so I'd be happy for any help here.

… ThreadPoolExecutor used by HystrixThreadPool.Factory is not the same instance as the one that is associated with the HystrixMetricsPublisherThreadPool.

This causes a disconnect between the ThreadPoolExecutor's metrics and those supplied by HystrixMetricsThreadPool.

Now the ThreadPoolExecutor itself is retrieved from the HystrixMetricsThreadPool object, because it is protected behind the pattern of a ConcurrentMap#putIfAbsent. This seemed much more difficult to do with the ConcurrencyStrategy
as then any further custom implementations would not have similar concurrency guarantees.

A test has been added to show that the ThreadPoolExecutor construct is indeed correct and the same as the one associated with the metrics publisher. It was a bit difficult to construct a test to prove the bad case and good case at the same time. Once the solution was in place, I had to turn the failing test case into a validation test case.

In addition this exposed some static state that was kept around between some tests that needed to be cleaned up. In particular the HystrixMetricsPublisherFactory has a singleton object that needed to be reset (even though it's a singleton) in order to validate the test. I'm not particularly happy with this approach, so I'd be happy for any help here.
@cloudbees-pull-request-builder
Copy link
Copy Markdown

Hystrix-pull-requests #137 SUCCESS
This pull request looks good

@benjchristensen
Copy link
Copy Markdown
Contributor

Thank you @chrisgray for finding this and submitting a fix. This is why we're still at Release Candidate status so I'm very appreciative of you spending time to help on this.

I know of at least 2 more things I need to look at before 1.4.0 leaves RC.

It will be a few more days before I have time in my schedule to thoroughly review this so I'm going to leave it open for now, but will definitely come back to this once I can.

@RyanAD
Copy link
Copy Markdown

RyanAD commented Jul 11, 2014

We are using Hystrix and have run into this issue as well. I wrote a test for this which fails when using master but passes as soon as I merge this pull request into my local fork. Thanks @chrisgray

@benjchristensen
Copy link
Copy Markdown
Contributor

Getting into this again finally ... and cc @mattrjacobs who is helping out.

@mattrjacobs
Copy link
Copy Markdown
Contributor

Thanks for the contribution, @chrisgray. I reviewed and manually merged via #361

chrisgray pushed a commit to yammer/tenacity that referenced this pull request Apr 8, 2015
…ategy because the Netflix/Hystrix#270 was merged into 1.4.0+ version of Hystrix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants