client: add lock in tso.Request to avoid data race#10130
client: add lock in tso.Request to avoid data race#10130okJiang wants to merge 6 commits intotikv:masterfrom
Conversation
Signed-off-by: okjiang <819421878@qq.com>
Signed-off-by: okjiang <819421878@qq.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| func (c *Cli) GetTSORequest(ctx context.Context) *Request { | ||
| req := c.tsoReqPool.Get().(*Request) | ||
| // Set needed fields in the request before using it. | ||
| req.mu.Lock() |
There was a problem hiding this comment.
Will it cause performance regression?
There was a problem hiding this comment.
Let's wait a test result. I will do it.
There was a problem hiding this comment.
Almost -2% performance regression. Can we accept it? cc @JmPotato
./bin/pd-tso-bench -client 1 -duration 10m -c 100
Master
Total:
count: 174720411, max: 10.5724ms, min: 0.0911ms, avg: 0.3338ms
<1ms: 174699770, >1ms: 19262, >2ms: 579, >5ms: 700, >10ms: 100, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 174720411, <1ms: 99.99%, >1ms: 0.01%, >2ms: 0.00%, >5ms: 0.00%, >10ms: 0.00%, >30ms: 0.00%, >50ms: 0.00%, >100ms: 0.00%, >200ms:
0.00%, >400ms: 0.00%, >800ms: 0.00%, >1s: 0.00%
P0.5: 0.3348ms, P0.8: 0.3789ms, P0.9: 0.4039ms, P0.99: 0.4830ms
Pr
Total:
count: 171009435, max: 10.3831ms, min: 0.0940ms, avg: 0.3415ms
<1ms: 170987346, >1ms: 20519, >2ms: 871, >5ms: 599, >10ms: 100, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 171009435, <1ms: 99.99%, >1ms: 0.01%, >2ms: 0.00%, >5ms: 0.00%, >10ms: 0.00%, >30ms: 0.00%, >50ms: 0.00%, >100ms: 0.00%, >200ms:
0.00%, >400ms: 0.00%, >800ms: 0.00%, >1s: 0.00%
P0.5: 0.3420ms, P0.8: 0.3875ms, P0.9: 0.4135ms, P0.99: 0.4959ms
There was a problem hiding this comment.
One more test after applying this comment #10130 (comment), the conclusion remains unchanged
Total:
count: 170459945, max: 10.4290ms, min: 0.0907ms, avg: 0.3432ms
<1ms: 170427514, >1ms: 28694, >2ms: 2732, >5ms: 953, >10ms: 52, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 170459945, <1ms: 99.98%, >1ms: 0.02%, >2ms: 0.00%, >5ms: 0.00%, >10ms: 0.00%, >30ms: 0.00%, >50ms: 0.00%, >100ms: 0.00%, >200ms:
0.00%, >400ms: 0.00%, >800ms: 0.00%, >1s: 0.00%
P0.5: 0.3429ms, P0.8: 0.3899ms, P0.9: 0.4173ms, P0.99: 0.5068ms
client/clients/tso/request.go
Outdated
| req.mu.Lock() | ||
| physical, logical = req.physical, req.logical | ||
| req.mu.Unlock() |
Signed-off-by: okjiang <819421878@qq.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10130 +/- ##
==========================================
+ Coverage 78.38% 78.96% +0.58%
==========================================
Files 518 532 +14
Lines 69476 71882 +2406
==========================================
+ Hits 54456 56759 +2303
- Misses 11066 11098 +32
- Partials 3954 4025 +71
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
JmPotato
left a comment
There was a problem hiding this comment.
I don’t understand how the race in the original issue happened—why would a request that already has a result still be written during the doneCollectedRequests processing? In theory, a request should only be returned to the request pool for reuse after it has been completely used.
|
|
Any update? We found this data race in ticdc integration test again. @okJiang https://prow.tidb.net/jenkins/blue/organizations/jenkins/pingcap%2Fticdc%2Fpull_cdc_mysql_integration_heavy/detail/pull_cdc_mysql_integration_heavy/578/pipeline |
|
@JmPotato ptal again |
client/clients/tso/request.go
Outdated
There was a problem hiding this comment.
req.requestCtx is read here without acquiring the lock.
client/clients/tso/request.go
Outdated
| func (req *Request) TryDone(err error) { | ||
| req.mu.RLock() | ||
| defer req.mu.RUnlock() | ||
| select { | ||
| case req.done <- err: | ||
| default: | ||
| } | ||
| } |
There was a problem hiding this comment.
This lock seems unnecessary, because once done is created it won’t be modified, and the channel itself is already thread-safe.
| @@ -174,13 +174,15 @@ func (c *Cli) scheduleUpdateTSOConnectionCtxs() { | |||
| func (c *Cli) GetTSORequest(ctx context.Context) *Request { | |||
There was a problem hiding this comment.
Here the physical, logical, streamID, and other fields are reset, but the done channel is not reset. A req taken from the pool may have leftover data in its done channel. However, this isn’t an issue introduced by this PR, but it’s worth keeping an eye on.
Signed-off-by: okjiang <819421878@qq.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a mutex to TSO Request and applies locking around request initialization, finalization, and waiting to synchronize concurrent access to fields (start, requestCtx, clientCtx, physical, logical, streamID) between client, dispatcher, and stream goroutines. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/retest |
1 similar comment
|
/retest |
Signed-off-by: okjiang <819421878@qq.com>
|
@okJiang: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: Close #10124
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs/pingcap/docs-cn:pingcap/tiup:Release note
Summary by CodeRabbit