schedule/labeler: Optimize locking to fix high-concurrency goroutine surge#9857
ti-chi-bot[bot] merged 2 commits into tikv:master
…surge Signed-off-by: lhy1024 <admin@liudos.us>
	}

	// only Lock for in-memory update
	l.Lock()
In my local experiments, this lock can still experience heavy contention because the range list is built while holding the write lock.
One possible improvement is:
- Make a copy of the rules under a read lock.
- Build the range list without holding any lock.
- Take the write lock only to set the new range list in RegionLabeler.
This approach avoids holding the write lock for an extended period and should significantly reduce contention.
Something like this:
- // only Lock for in-memory update
+ // Build range list outside the write lock to reduce contention
+ l.RLock()
+ rulesCopy := make(map[string]*LabelRule, len(l.labelRules)+1)
+ for k, v := range l.labelRules {
+ rulesCopy[k] = v
+ }
+ l.RUnlock()
+ rulesCopy[rule.ID] = rule
+
+ // Build the new range list without holding any lock (expensive operation)
+ newRangeList, newMinExpire := buildRangeListFrom(rulesCopy)
+
+ // Quick atomic swap under write lock
l.Lock()
- defer l.Unlock()
l.labelRules[rule.ID] = rule
- l.buildRangeList()
+ l.rangeList = newRangeList
+ l.minExpire = newMinExpire
+ l.Unlock()
The above code may not be bulletproof — it’s just to show the idea.
We will need compare and swap to avoid losing updates.
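One way to read the compare-and-swap remark is a version check at commit time: remember a counter when taking the snapshot, and install the rebuilt list only if the counter is unchanged. The sketch below is a simplified illustration, not the PD implementation. The `Labeler` type, its `version` field, the `SetRule` method, and the stand-in `buildRangeListFrom` are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// LabelRule is a simplified stand-in for the real rule type (assumption).
type LabelRule struct {
	ID string
}

// Labeler is a hypothetical, trimmed-down RegionLabeler.
type Labeler struct {
	sync.RWMutex
	labelRules map[string]*LabelRule
	rangeList  []string // placeholder for the real range list
	version    uint64   // bumped on every committed change to labelRules
}

// buildRangeListFrom stands in for the expensive rebuild; here it only
// returns the sorted rule IDs.
func buildRangeListFrom(rules map[string]*LabelRule) []string {
	ids := make([]string, 0, len(rules))
	for id := range rules {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// SetRule retries until it commits a list built from an unchanged snapshot.
func (l *Labeler) SetRule(rule *LabelRule) {
	for {
		// Copy the rules and remember the version under the read lock.
		l.RLock()
		seen := l.version
		snapshot := make(map[string]*LabelRule, len(l.labelRules)+1)
		for k, v := range l.labelRules {
			snapshot[k] = v
		}
		l.RUnlock()
		snapshot[rule.ID] = rule

		// Expensive work happens with no lock held.
		newList := buildRangeListFrom(snapshot)

		// Commit only if no other writer changed the rules in between.
		l.Lock()
		if l.version == seen {
			l.labelRules[rule.ID] = rule
			l.rangeList = newList
			l.version++
			l.Unlock()
			return
		}
		l.Unlock() // stale snapshot: rebuild from fresh state
	}
}

func main() {
	l := &Labeler{labelRules: map[string]*LabelRule{}}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			l.SetRule(&LabelRule{ID: fmt.Sprintf("rule-%02d", i)})
		}(i)
	}
	wg.Wait()
	fmt.Println(len(l.labelRules), len(l.rangeList)) // prints "50 50"
}
```

Without the version check, two writers that snapshot the same state would each install a list missing the other's rule. The trade-off is that a writer may repeat the expensive build several times under heavy write contention, so a real change would probably want a retry cap that falls back to building under the write lock.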
I’m totally fine with handling the further optimization in a separate PR, since the compare-and-swap part could be a bit tricky.
Good job. I think it will be helpful, and we could do it in another PR.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bufferflies, okJiang
/retest
Codecov Report
@@             Coverage Diff             @@
##           master    #9857       +/-   ##
===========================================
+ Coverage   76.82%   78.69%    +1.87%
===========================================
  Files         491      491
  Lines       78473    66137    -12336
===========================================
- Hits        60283    52048     -8235
+ Misses      14486    10397     -4089
+ Partials     3704     3692       -12
Flags with carried forward coverage won't be shown.
/test pull-unit-test-next-gen
/test pull-integration-realcluster-test
In response to a cherrypick label: new pull request created to branch
/label needs-cherry-pick-release-7.5
In response to a cherrypick label: new pull request created to branch
…surge
What problem does this PR solve?
Issue Number: Close #9854
What is changed and how does it work?
Check List
Tests
Release note