tso: validate callee id for tso requests#10600
tso: validate callee id for tso requests#10600bufferflies wants to merge 7 commits intotikv:masterfrom
Conversation
Signed-off-by: bufferflies <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughClients now include a callee ID in TSO requests and validate it server-side against the server's advertise listen host. On callee mismatches, clients remove the stale gRPC connection and retry with a fresh connection. ServiceDiscovery gained RemoveClientConn to support selective connection cleanup. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as TSO Client
participant SD as ServiceDiscovery
participant Conn as gRPC Conn
participant Server as TSO Server
rect rgba(100,200,100,0.5)
Note over Client,Server: Initial request with callee ID
Client->>Client: extract calleeID from serverURL
Client->>SD: GetOrCreateGRPCConn(url)
SD-->>Client: *grpc.ClientConn
Client->>Server: TsoRequest (CalleeId header)
Server->>Server: parse advertise host & compare CalleeId
alt match
Server-->>Client: Response
else mismatch
Server-->>Client: FailedPrecondition (mismatch)
end
end
rect rgba(100,150,200,0.5)
Note over Client,SD: Recovery on mismatch
Client->>Client: IsCalleeMismatch(err)
Client->>SD: RemoveClientConn(url)
SD->>Conn: Close() and delete cache entry
Client->>SD: GetOrCreateGRPCConn(url) [retry]
SD-->>Client: new *grpc.ClientConn
Client->>Server: TsoRequest (retry)
Server-->>Client: Response
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/ping @iosmanthus |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
client/errs/errs.go (1)
42-48: Harden callee-mismatch detection to include gRPC status code check.Line 47 currently relies only on substring matching against error messages. This can produce false positives if other gRPC errors coincidentally contain "mismatch callee id". Adding a status code check first narrows the scope to only
codes.FailedPreconditionerrors that match the message.♻️ Proposed refactor
import ( "strings" "go.uber.org/zap" "go.uber.org/zap/zapcore" "google.golang.org/grpc/codes" + "google.golang.org/grpc/status" "github.com/pingcap/errors" ) @@ func IsCalleeMismatch(err error) bool { if err == nil { return false } - return strings.Contains(err.Error(), MismatchCalleeIDErr) + cause := errors.Cause(err) + if status.Code(cause) != codes.FailedPrecondition { + return false + } + return strings.Contains(status.Convert(cause).Message(), MismatchCalleeIDErr) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/errs/errs.go` around lines 42 - 48, IsCalleeMismatch currently only does substring matching on err.Error(), which can false-positive; update IsCalleeMismatch to first extract a gRPC status via status.FromError(err), verify the status.Code() == codes.FailedPrecondition, and only then check that st.Message() (or err.Error()) contains the MismatchCalleeIDErr constant; reference the IsCalleeMismatch function, MismatchCalleeIDErr constant, and use status.FromError and codes.FailedPrecondition in the check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/errs/errs.go`:
- Around line 42-48: IsCalleeMismatch currently only does substring matching on
err.Error(), which can false-positive; update IsCalleeMismatch to first extract
a gRPC status via status.FromError(err), verify the status.Code() ==
codes.FailedPrecondition, and only then check that st.Message() (or err.Error())
contains the MismatchCalleeIDErr constant; reference the IsCalleeMismatch
function, MismatchCalleeIDErr constant, and use status.FromError and
codes.FailedPrecondition in the check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 96598a2f-f433-46f3-8eb3-101e98530b4f
⛔ Files ignored due to path filters (4)
client/go.sumis excluded by!**/*.sumgo.sumis excluded by!**/*.sumtests/integrations/go.sumis excluded by!**/*.sumtools/go.sumis excluded by!**/*.sum
📒 Files selected for processing (16)
client/clients/tso/client.goclient/clients/tso/dispatcher.goclient/clients/tso/stream.goclient/errs/errno.goclient/errs/errs.goclient/go.modclient/resource_manager_client_test.goclient/servicediscovery/mock_service_discovery.goclient/servicediscovery/router_service_discovery.goclient/servicediscovery/service_discovery.goclient/servicediscovery/tso_service_discovery.gogo.modpkg/mcs/tso/server/grpc_service.goserver/cluster/cluster.gotests/integrations/go.modtools/go.mod
|
@iosmanthus: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: iosmanthus The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| svcDiscovery := td.provider.getServiceDiscovery() | ||
| if errs.IsLeaderChange(err) { | ||
| switch { | ||
| case errs.IsCalleeMismatch(err): |
There was a problem hiding this comment.
We need a test to cover this change.
Signed-off-by: bufferflies <1045931706@qq.com>
|
@bufferflies: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Linking issues" section in the CONTRIBUTING.md. |
| return &Service{ | ||
| Server: server, | ||
| Server: server, | ||
| advertiseListenHost: getAdvertiseListenHost(server.GetAdvertiseListenAddr()), |
There was a problem hiding this comment.
Will the startServer init advertiseListenHost?
Issue Number
ref #10516, close #10552
What problem does this PR solve?
The TSO client can keep using a stale gRPC connection after service endpoint changes. This patch carries a callee ID in TSO requests and lets the server reject requests that land on the wrong endpoint, so the client can drop the stale connection and reconnect.
What is changed and how does it work?
CalleeIdto TSO request headersRemoveClientConnto service discovery implementations so stale gRPC connections can be closed and recreatedgithub.com/pingcap/kvprototo latestv0.0.0-20260414083400-4388bfaaedabgofmt/test-stub follow-up required formake checkCheck List
Validation
go test ./pkg/mcs/tso/server -count=1cd client && go test ./errs ./servicediscovery ./clients/tso -run TestDoesNotExist -count=1make checkauthor: @iosmanthus
cp
88e4f7d1Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores