Race condition where the HystrixMetricsPublisherThreadPool isn't referencing the correct ThreadPoolExecutor instance#270
Conversation
… ThreadPoolExecutor used by HystrixThreadPool.Factory is not the same instance as the one that is associated with the HystrixMetricsPublisherThreadPool. This causes a disconnect between the ThreadPoolExecutor's metrics and those supplied by HystrixMetricsThreadPool. Now the ThreadPoolExecutor itself is retrieved from the HystrixMetricsThreadPool object, because it is protected behind the pattern of a ConcurrentMap#putIfAbsent. This seemed much more difficult to do with the ConcurrencyStrategy as then any further custom implementations would not have similar concurrency guarantees. A test has been added to show that the ThreadPoolExecutor construct is indeed correct and the same as the one associated with the metrics publisher. It was a bit difficult to construct a test to prove the bad case and good case at the same time. Once the solution was in place, I had to turn the failing test case into a validation test case. In addition this exposed some static state that was kept around between some tests that needed to be cleaned up. In particular the HystrixMetricsPublisherFactory has a singleton object that needed to be reset (even though it's a singleton) in order to validate the test. I'm not particularly happy with this approach, so I'd be happy for any help here.
|
Hystrix-pull-requests #137 SUCCESS |
|
Thank you @chrisgray for finding this and submitting a fix. This is why we're still at Release Candidate status so I'm very appreciative of you spending time to help on this. I know of at least 2 more things I need to look at before 1.4.0 leaves RC. It will be a few more days before I have time in my schedule to thoroughly review this so I'm going to leave it open for now, but will definitely come back to this once I can. |
|
We are using Hystrix and have run into this issue as well. I wrote a test for this which fails when using master but passes as soon as I merge this pull request into my local fork. Thanks @chrisgray |
|
Getting into this again finally ... and cc @mattrjacobs who is helping out. |
|
Thanks for the contribution, @chrisgray. I reviewed and manually merged via #361 |
…ategy because the Netflix/Hystrix#270 was merged into 1.4.0+ version of Hystrix.
Race condition exists when constructing a HystrixThreadPool where the ThreadPoolExecutor used by HystrixThreadPool.Factory is not the same instance as the one that is associated with the HystrixMetricsPublisherThreadPool.
This causes a disconnect between the ThreadPoolExecutor's metrics and those supplied by HystrixMetricsThreadPool.
Now the ThreadPoolExecutor itself is retrieved from the HystrixMetricsThreadPool object, because it is protected behind the pattern of a ConcurrentMap#putIfAbsent. This seemed much more difficult to do with the ConcurrencyStrategy
as then any further custom implementations would not have similar concurrency guarantees.
A test has been added to show that the ThreadPoolExecutor construct is indeed correct and the same as the one associated with the metrics publisher. It was a bit difficult to construct a test to prove the bad case and good case at the same time. Once the solution was in place, I had to turn the failing test case into a validation test case.
In addition this exposed some static state that was kept around between some tests that needed to be cleaned up. In particular the HystrixMetricsPublisherFactory has a singleton object that needed to be reset (even though it's a singleton) in order to validate the test. I'm not particularly happy with this approach, so I'd be happy for any help here.