core: region heartbeat with bucket meta#10231
core: region heartbeat with bucket meta#10231ti-chi-bot[bot] merged 8 commits intotikv:release-nextgen-20251011from
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds bucket metadata tracking: Changes
Sequence DiagramsequenceDiagram
participant Client as Client/Heartbeat
participant Region as RegionInfo
participant Syncer as Syncer/Cluster
participant Guide as RegionGuide
Client->>Region: RegionFromHeartbeat(hb)
activate Region
Region->>Client: hb.GetBucketMeta()
Client-->>Region: *metapb.BucketMeta
Region->>Region: attach bucketMeta to RegionInfo
deactivate Region
Syncer->>Region: Receive region update / PutRegion
activate Syncer
Note right of Syncer: Previously updated buckets after PutRegion (now removed)
Syncer->>Region: GetBuckets() / GetBucketMeta()
Syncer-->>Region: keep existing buckets (no post-PutRegion update)
deactivate Syncer
Region->>Guide: GenerateRegionGuideFunc()
activate Guide
Guide->>Guide: compare old vs new bucket versions (including 0→non-zero)
alt version changed
Guide->>Guide: mark region for update / log change
end
deactivate Guide
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: tongjian <1045931706@qq.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release-nextgen-20251011 #10231 +/- ##
============================================================
- Coverage 78.73% 78.69% -0.04%
============================================================
Files 491 491
Lines 66124 66341 +217
============================================================
+ Hits 52060 52209 +149
- Misses 10308 10380 +72
+ Partials 3756 3752 -4
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
/retest |
1 similar comment
|
/retest |
|
/retest-required |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/syncer/client.go (1)
222-245:⚠️ Potential issue | 🔴 CriticalData race:
SetBucketMetais called after the region is already stored in the shared cache.On Line 239,
bc.PutRegion(region)makes the region visible to other goroutines. Then on Line 244,region.SetBucketMeta(buckets[i])mutates the region'sbucketMetafield without synchronization, while other goroutines may concurrently callGetBuckets()(which readsbucketMeta).The simplest fix is to set
bucketMetaduring construction via theWithBucketMetaoption (beforePutRegion), keeping it consistent with howSetBucketsis already handled:🐛 Proposed fix to eliminate the data race
if hasBuckets { opts = append(opts, core.SetBuckets(buckets[i])) + opts = append(opts, core.WithBucketMeta(&metapb.BucketMeta{ + Version: buckets[i].GetVersion(), + Keys: buckets[i].GetKeys(), + })) } region = core.NewRegionInfo(r, regionLeader, opts...)And remove or guard the post-
PutRegionSetBucketMetacall:if hasBuckets { if old := origin.GetBuckets(); buckets[i].GetVersion() > old.GetVersion() { region.UpdateBuckets(buckets[i], old) - region.SetBucketMeta(buckets[i]) } }pkg/core/region.go (1)
207-222:⚠️ Potential issue | 🔴 CriticalAdd
BucketMetafield to the scheduling service forwarding request at line 1414.The interface now requires
GetBucketMeta(), andRegionFromHeartbeat()calls it at line 250. However, the forwarding code inserver/grpc_service.go(lines 1414–1433) constructsschedulingpbReqwithout theBucketMetafield. This causes bucket metadata from the original heartbeat to be silently dropped when forwarding to the scheduling service.Missing field in forward request
schedulingpbReq := &schedulingpb.RegionHeartbeatRequest{ // ... existing fields ... QueryStats: request.GetQueryStats(), // Missing: BucketMeta field assignment }Add:
BucketMeta: request.GetBucketMeta(),
🤖 Fix all issues with AI agents
In `@pkg/core/region_test.go`:
- Around line 1460-1461: Replace the truncated comment "// Inherit false if
region" with a complete sentence that explains the expected behavior in this
test: e.g., "Inherit should not override buckets when region has bucketMeta
set." Reference the test target by name (region) and the assertions using
region.GetBuckets().GetVersion() so readers understand this assertion verifies
that Inherit does not modify bucket metadata/version for a region that already
has bucketMeta.
In `@pkg/core/region.go`:
- Around line 617-623: SetBucketMeta writes the plain pointer field bucketMeta
on RegionInfo without synchronization, causing a data race when GetBuckets or
other readers access bucketMeta concurrently (e.g., after bc.PutRegion(region)).
Fix by making bucketMeta an atomic pointer (e.g.,
atomic.Pointer[metapb.BucketMeta] or unsafe.Pointer with
atomic.StorePointer/atomic.LoadPointer) and update SetBucketMeta to use atomic
store and GetBuckets to use atomic load; alternatively, ensure callers set
bucketMeta before calling PutRegion (update syncer path where SetBucketMeta is
invoked) so no concurrent readers observe a non-atomic write. Ensure references
to bucketMeta in RegionInfo, SetBucketMeta, and GetBuckets are updated
consistently.
🧹 Nitpick comments (2)
pkg/core/region.go (1)
645-651:GetBuckets()allocates a newmetapb.Bucketson every call whenbucketMetais set.This is called repeatedly in hot paths — for example, Line 912 calls
GetBuckets()four times withinGenerateRegionGuideFunc(executed per heartbeat). Each call allocates a new struct.At minimum, cache the result in local variables at Line 912:
♻️ Suggested optimization in GenerateRegionGuideFunc
- if region.GetBuckets().GetVersion() > origin.GetBuckets().GetVersion() || (region.GetBuckets().GetVersion() == 0 && origin.GetBuckets().GetVersion() > 0) { + regionBuckets := region.GetBuckets() + originBuckets := origin.GetBuckets() + if regionBuckets.GetVersion() > originBuckets.GetVersion() || (regionBuckets.GetVersion() == 0 && originBuckets.GetVersion() > 0) { if log.GetLevel() <= zap.DebugLevel { debug("bucket key changed", zap.Uint64("region-id", region.GetID()), - zap.Uint64("old-bucket-version", origin.GetBuckets().GetVersion()), - zap.Uint64("new-bucket-version", region.GetBuckets().GetVersion())) + zap.Uint64("old-bucket-version", originBuckets.GetVersion()), + zap.Uint64("new-bucket-version", regionBuckets.GetVersion())) }server/cluster/cluster_test.go (1)
620-654: Consider strengthening assertions and testing the "stale" path.
Line 653 only asserts
re.NotEqual(bucket1, region1.GetBuckets())— a negative check. Consider also asserting the expected state (e.g., the new bucket version) to catch regressions more precisely.The test name is
TestStaleBucketMeta, but it only tests that a newer bucket meta (v3) replaces old buckets (v2). It doesn't verify that a stale (older version) bucket meta is correctly rejected. Adding that scenario would match the test name and increase coverage.💡 Suggested additions
region1 := cluster.GetRegion(1) re.NotEqual(bucket1, region1.GetBuckets()) + re.Equal(uint64(3), region1.GetBuckets().GetVersion()) + + // Stale bucket meta (version < current) should not overwrite + staleBucketMeta := &metapb.BucketMeta{ + Version: 1, + Keys: [][]byte{{'c'}, {'d'}}, + } + region3 := region2.Clone(core.WithIncVersion(), core.WithBucketMeta(staleBucketMeta)) + re.NoError(cluster.processRegionHeartbeat(core.ContextTODO(), region3)) + re.Equal(uint64(3), cluster.GetRegion(1).GetBuckets().GetVersion()) }
a789c5e to
a1f2b2c
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
/retest |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/go.mod (1)
3-13:⚠️ Potential issue | 🟠 MajorUpdate the
godirective to 1.25+ to match CI requirements.The client module targets Go 1.23.12, but CI and the coding guidelines require Go 1.25 or higher. Update the go directive and run
make tidyto ensure go.mod/go.sum remain clean.🔧 Suggested update
-go 1.23.12 +go 1.25.0
🧹 Nitpick comments (2)
server/cluster/cluster_test.go (1)
620-654: Make the stale-bucket assertion explicit.
re.NotEqual(bucket1, region1.GetBuckets())could pass if buckets are nil or unrelated. Assert the expected version/keys so the test fails only when the update is missing.✅ Suggested assertion tightening
- re.NotEqual(bucket1, region1.GetBuckets()) + re.Equal(uint64(3), region1.GetBuckets().GetVersion()) + re.Equal(bucket2.GetKeys(), region1.GetBuckets().GetKeys())pkg/core/region.go (1)
637-643:GetBuckets()allocates a newmetapb.Bucketson every call whenbucketMetais set.This method is called on hot paths (heartbeat guide function, query regions). Each invocation with a non-nil
bucketMetacreates a heap allocation. Consider caching the derived*metapb.Bucketsin the struct at construction time (e.g., inRegionFromHeartbeatandClone) instead of rebuilding it on every read.♻️ Suggested approach
Populate a
bucketsfield frombucketMetaonce during construction inRegionFromHeartbeat:region := &RegionInfo{ ... bucketMeta: heartbeat.GetBucketMeta(), } + if region.bucketMeta != nil { + b := &metapb.Buckets{ + RegionId: region.GetID(), + Version: region.bucketMeta.GetVersion(), + Keys: region.bucketMeta.GetKeys(), + } + atomic.StorePointer(®ion.buckets, unsafe.Pointer(b)) + }Then
GetBuckets()can simply use the existing atomic load path without the per-call allocation branch.Based on learnings: "Mind allocations; reuse buffers/pools where appropriate"
|
/retest |
|
/hold cancel |
|
test failed |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/core/region_option.go`:
- Around line 240-252: SetBuckets currently assigns &metapb.BucketMeta{} when
buckets==nil which leaves region.bucketMeta non-nil and breaks GetBuckets
nil-checks; change SetBuckets to set region.bucketMeta = nil when buckets ==
nil. Additionally, update the Inherit logic (the code that copies bucketMeta
from another RegionInfo) to explicitly clear bucketMeta when bucketEnable ==
false (set r.bucketMeta = nil and clear any derived/atomic r.buckets state if
applicable) so disabling buckets removes stale empty structs.
🧹 Nitpick comments (2)
pkg/core/region.go (1)
637-643:GetBuckets()allocates a new*metapb.Bucketson every call whenbucketMetais set.This method is called in hot paths (e.g.,
GenerateRegionGuideFunc,QueryRegions,sortOutKeyIDMap). Each invocation with a non-nilbucketMetacreates a heap-allocated struct. Consider caching the constructedBucketsat region-creation time (e.g., inRegionFromHeartbeator via theWithBucketMetaoption) instead of rebuilding it on every read.♻️ Sketch: pre-compute buckets from bucketMeta at construction time
For example, in
RegionFromHeartbeat(and analogously inWithBucketMeta):+ if region.bucketMeta != nil { + region.buckets = unsafe.Pointer(&metapb.Buckets{ + RegionId: region.GetID(), + Version: region.bucketMeta.GetVersion(), + Keys: region.bucketMeta.GetKeys(), + }) + }Then
GetBucketscan simply return the atomic-loaded pointer without branching onbucketMeta:func (r *RegionInfo) GetBuckets() *metapb.Buckets { if r == nil { return nil } - if meta := r.bucketMeta; meta != nil { - return &metapb.Buckets{ - RegionId: r.GetID(), - Version: meta.GetVersion(), - Keys: meta.GetKeys(), - } - } buckets := atomic.LoadPointer(&r.buckets) return (*metapb.Buckets)(buckets) }This also eliminates the unsynchronized read of
bucketMetaflagged in the previous review.server/cluster/cluster_test.go (1)
620-654: Test only asserts inequality — consider verifying the expected bucket state.Line 653 checks
re.NotEqual(bucket1, region1.GetBuckets())but doesn't assert whatregion1.GetBuckets()should be after the heartbeat withbucket2. A positive assertion (e.g., verifying the version or keys matchbucket2) would make the test more robust against false passes.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
go.mod (1)
3-3:⚠️ Potential issue | 🟠 MajorUpdate Go version directive to 1.25+ across all modules.
The root go.mod and all submodules (client/, tools/, tests/integrations/) are still at go 1.23.12, which violates the project guideline requiring Go 1.25+. CI uses Go 1.25. Update all four go.mod files to 1.25, and run
make tidyto clean up the dependency tree (go.sum currently contains multiple kvproto versions).Suggested fix for root go.mod
-go 1.23.12 +go 1.25Apply the same change to:
client/go.mod,tools/go.mod,tests/integrations/go.mod
🧹 Nitpick comments (2)
server/cluster/cluster_test.go (1)
620-654: Assert expected bucket2 values for stronger coverage.
Right now the test only checksbucket1 != current; asserting the version/keys matchbucket2reduces false positives.Suggested enhancement
region1 := cluster.GetRegion(1) re.NotEqual(bucket1, region1.GetBuckets()) +re.Equal(bucket2.GetVersion(), region1.GetBuckets().GetVersion()) +re.Equal(bucket2.GetKeys(), region1.GetBuckets().GetKeys())pkg/core/region.go (1)
642-648:GetBuckets()allocates a newmetapb.Bucketson every call whenbucketMetais set — hot-path allocation concern.This method is called on every heartbeat path (e.g.,
GenerateRegionGuideFunccalls it for both new and origin regions). Each invocation with a non-nilbucketMetaallocates a freshBucketsstruct. Consider caching the constructedBucketsin theRegionInfo(e.g., populate it once during construction/clone) or returning a read-only view to avoid per-call allocation on this hot path.♻️ Sketch: pre-compute buckets from bucketMeta at construction time
One approach: when
bucketMetais set (inRegionFromHeartbeat,Clone,Inherit, option), also store the corresponding*metapb.Bucketsin the existingbucketsfield viaunsafe.Pointer, soGetBuckets()can always use the zero-allocation atomic load path.func (r *RegionInfo) GetBuckets() *metapb.Buckets { if r == nil { return nil } - if meta := r.bucketMeta; meta != nil { - return &metapb.Buckets{ - RegionId: r.GetID(), - Version: meta.GetVersion(), - Keys: meta.GetKeys(), - } - } buckets := atomic.LoadPointer(&r.buckets) return (*metapb.Buckets)(buckets) }Then ensure that whenever
bucketMetais set, thebucketsunsafe.Pointer is also atomically stored with the derived value.Based on learnings: "Mind allocations; reuse buffers/pools where appropriate".
cf21a32 to
b07c34d
Compare
|
/retest |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
tools/go.mod (1)
3-3:⚠️ Potential issue | 🟠 MajorUpdate the go directive to 1.25 and re-tidy this module.
This module declares
go 1.23.12but policy requires 1.25+. After the kvproto dependency bump on line 26, runmake tidyto ensurego.sumstays clean.tests/integrations/go.mod (1)
3-3:⚠️ Potential issue | 🟠 MajorUpdate the Go version directive and re-tidy this module.
Line 3 declares
go 1.23.12; policy requires Go 1.25+. After the kvproto dependency update at line 17, runmake tidyto ensurego.modandgo.sumstay clean.client/go.mod (1)
3-3:⚠️ Potential issue | 🟠 MajorUpdate the go directive to 1.25 or higher and re-tidy this module.
This module declares
go 1.23.12but policy requires Go 1.25+. After the kvproto dependency bump, runmake tidyand ensurego.sumstays clean.go.mod (1)
3-3:⚠️ Potential issue | 🟠 MajorUpdate the go directive and re‑tidy this module.
This module still declares
go 1.23.12; policy requires 1.25+. After the kvproto bump, please runmake tidyfor this module and ensurego.sumstays clean.
|
/test pull-unit-test-next-gen-1 |
fb55f2c
into
tikv:release-nextgen-20251011
What problem does this PR solve?
Issue Number: Ref #10117
CP: #10120
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs/pingcap/docs-cn:pingcap/tiup:Release note
Summary by CodeRabbit
New Features
Bug Fixes
Tests