domain/affinity: update pd client for api#65592
domain/affinity: update pd client for api#65592ti-chi-bot[bot] merged 6 commits intopingcap:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
Hi @lhy1024. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: lhy1024 <admin@liudos.us>
4de3424 to
9e90c9c
Compare
📝 WalkthroughWalkthroughUpdates pd client dependency and adds resilient affinity-group management in Changes
Sequence Diagram(s)sequenceDiagram
participant Manager as "Affinity Manager\nrgba(100,150,255,0.5)"
participant PD as "PD HTTP API\nrgba(255,200,100,0.5)"
participant Scanner as "Local Scanner\nrgba(150,255,150,0.5)"
Manager->>PD: CreateAffinityGroups (skip-exist)
alt create succeeded
PD-->>Manager: success / skipped
else create failed with conflict/unsupported
PD-->>Manager: error (Conflict/BadRequest/ServiceErr)
Manager->>PD: GetAffinityGroups by IDs
alt PD returns requested subset
PD-->>Manager: existing groups
Manager->>PD: Create missing groups
else PD doesn't support IDs or query too large
Manager->>Scanner: GetAllAffinityGroups (scan)
Scanner-->>Manager: filtered groups for requested IDs
Manager->>PD: Create missing groups
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Command failed Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Line 129: The go.mod update bumps the github.com/tikv/pd/client module but the
PR is missing Bazel metadata changes; run the repository's make bazel_prepare
target to regenerate Bazel metadata (WORKSPACE/BUILD.bazel files and any
generated module hashes) and commit those resulting changes alongside the
go.mod/go.sum update so the Bazel build graph stays in sync with the new
github.com/tikv/pd/client version.
In `@pkg/domain/affinity/manager.go`:
- Around line 81-95: The function affinityGroupIDsQueryLen computes the query
length using raw id lengths but ignores URL encoding which can expand characters
(e.g., spaces -> %20); update affinityGroupIDsQueryLen to account for encoded
length by calling url.QueryEscape (or url.PathEscape) on each id and using len
of the escaped string when summing, preserving the same "&ids=" / "ids=" prefix
logic so the returned total reflects the actual encoded query size.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (2)
go.modpkg/domain/affinity/manager.go
lcwangchao
left a comment
There was a problem hiding this comment.
please make sure CI pass
reset LGTM
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #65592 +/- ##
================================================
+ Coverage 77.6973% 79.5473% +1.8499%
================================================
Files 2007 1956 -51
Lines 549532 541406 -8126
================================================
+ Hits 426972 430674 +3702
+ Misses 120900 109259 -11641
+ Partials 1660 1473 -187
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
|
@lhy1024: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test unit-test |
|
@lhy1024: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
D3Hunter
left a comment
There was a problem hiding this comment.
AI-generated review based on @D3Hunter's standards; manual review will be performed after all comments are resolved.
Summary
- Total findings: 8
- Inline comments: 8
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
⚠️ [Major] (3)
- Idempotent affinity-group creation now depends on new PD query support without a fallback (
pkg/domain/affinity/manager.go:53) - Create-idempotency now depends on PD skip-exist feature without compatibility guard (
pkg/domain/affinity/manager.go:53) - Affinity create/get paths assume new PD query semantics without N-1 fallback (
pkg/domain/affinity/manager.go:53)
🟡 [Minor] (4)
affinityGroupIDsQueryLenimplies exact query size, but current logic is only an estimate (pkg/domain/affinity/manager.go:81)- Fallback branch uses hard thresholds without intent documentation (
pkg/domain/affinity/manager.go:35) - URI-length guard can miss over-limit requests because it does not model escaped query size (
pkg/domain/affinity/manager.go:71) - No targeted tests for new compatibility branches and fallback thresholds (
pkg/domain/affinity/manager.go:71)
ℹ️ [Info] (1)
CreateAffinityGroupsIfNotExistscomment does not describe the current single-call path (pkg/domain/affinity/manager.go:45)
|
|
||
| // Create only the groups that don't exist | ||
| _, err = m.Client.CreateAffinityGroups(ctx, groupsToCreate) | ||
| _, err := m.Client.CreateAffinityGroups(ctx, groups, pdhttp.WithSkipExistCheck()) |
There was a problem hiding this comment.
⚠️ [Major] Idempotent affinity-group creation now depends on new PD query support without a fallback
Why
CreateGroupsIfNotExists is a critical DDL retry path, but the new implementation removed client-side existence filtering and now unconditionally relies on skip_exist_check support in PD.
Scope
pkg/domain/affinity/manager.go:53
Risk if unchanged
In N-1 mixed-version or partially upgraded PD deployments, retries after partial success can fail with duplicate-group errors, causing CREATE/ALTER/TRUNCATE affinity-related DDL jobs to fail instead of being idempotent.
Evidence
This diff replaces the previous GetAffinityGroups + filter-before-create flow with a single call: m.Client.CreateAffinityGroups(ctx, groups, pdhttp.WithSkipExistCheck()), and no compatibility fallback remains in CreateAffinityGroupsIfNotExists.
Change request
is this backward compatible with N-1 nodes? Please add a fallback path: if skip_exist_check is unsupported or rejected, fall back to the prior behavior (query existing IDs and only create missing groups) to preserve retry idempotency.
| pdhttp.Client | ||
| } | ||
|
|
||
| const ( |
There was a problem hiding this comment.
🟡 [Minor] Fallback branch uses hard thresholds without intent documentation
Why
The new GetAllAffinityGroups fallback is non-obvious behavior (switching from targeted fetch to full-list fetch + filter), but there is no rationale comment explaining why 100 IDs and 4096 query length were chosen.
Scope
pkg/domain/affinity/manager.go:35
Risk if unchanged
Thresholds can be changed or copied without understanding protocol and performance tradeoffs, increasing maintenance risk and making future tuning error-prone.
Evidence
Constants maxAffinityGroupIDsQueryLen and maxAffinityGroupIDsCount are introduced and immediately used in a branch that changes retrieval strategy, with no nearby why-comment or source-limit reference.
Change request
Please add a short why-comment for this branch, including where 4096 and 100 come from and why full-scan fallback is the intended behavior.
Signed-off-by: lhy1024 <admin@liudos.us>
|
Review Complete Findings: 1 issues ℹ️ Learn more details on Pantheon AI. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cfzjywxk, D3Hunter, lcwangchao The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/domain/affinity/manager_test.go (1)
160-180: Add explicit 404/414 fallback test cases forGetAffinityGroups.Current fallback tests cover
400 Bad Request, butshouldFallbackGetAffinityGroupsalso includes404 Not Foundand414 Request-URI Too Long. A small table-driven extension would lock all intended compatibility branches.🧪 Suggested extension
func TestGetAffinityGroupsFallbackWhenIDsQueryUnsupported(t *testing.T) { ctx := context.Background() - client := &mockPDClient{} - manager := &pdManager{Client: client} - ids := []string{"g1"} - client.On("GetAffinityGroups", mock.Anything, ids).Return(nil, pdStatusErr(http.StatusBadRequest)).Once() - client.On("GetAllAffinityGroups", mock.Anything).Return( - map[string]*pdhttp.AffinityGroupState{ - "g1": affinityGroupState("g1"), - "g2": affinityGroupState("g2"), - }, nil, - ).Once() - - result, err := manager.GetAffinityGroups(ctx, ids) - require.NoError(t, err) - require.Len(t, result, 1) - require.Contains(t, result, "g1") - require.NotContains(t, result, "g2") - client.AssertExpectations(t) + for _, status := range []int{ + http.StatusBadRequest, + http.StatusNotFound, + http.StatusRequestURITooLong, + } { + t.Run(http.StatusText(status), func(t *testing.T) { + client := &mockPDClient{} + manager := &pdManager{Client: client} + client.On("GetAffinityGroups", mock.Anything, ids).Return(nil, pdStatusErr(status)).Once() + client.On("GetAllAffinityGroups", mock.Anything).Return( + map[string]*pdhttp.AffinityGroupState{ + "g1": affinityGroupState("g1"), + "g2": affinityGroupState("g2"), + }, nil, + ).Once() + + result, err := manager.GetAffinityGroups(ctx, ids) + require.NoError(t, err) + require.Len(t, result, 1) + require.Contains(t, result, "g1") + require.NotContains(t, result, "g2") + client.AssertExpectations(t) + }) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/affinity/manager_test.go` around lines 160 - 180, Extend TestGetAffinityGroupsFallbackWhenIDsQueryUnsupported into a small table-driven test that iterates over the HTTP statuses used by shouldFallbackGetAffinityGroups (400, 404, 414) and verifies the fallback to GetAllAffinityGroups for each case: for each status, configure the mockPDClient to have GetAffinityGroups return (nil, pdStatusErr(status)) and GetAllAffinityGroups return the full map, then call pdManager.GetAffinityGroups and assert no error, result contains only the requested id ("g1") and not "g2", and call client.AssertExpectations; keep the existing test logic but replace the single-status setup with a loop over the [http.StatusBadRequest, http.StatusNotFound, http.StatusRequestURITooLong] table entries so all branches of shouldFallbackGetAffinityGroups are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/domain/affinity/BUILD.bazel`:
- Around line 18-30: The BUILD.bazel change (target "affinity_test") and updated
PD client deps require regenerating Bazel metadata: run the repository's make
bazel_prepare to update generated BUILD/WORKSPACE/third_party metadata, add the
produced files to your commit/PR (the generated Bazel outputs that make
bazel_prepare updates), and push those changes so CI can resolve dependencies
consistently.
In `@pkg/domain/affinity/manager.go`:
- Around line 168-174: The isPDHTTPStatusError function currently matches the
full "status: 'NNN Reason'" substring which is brittle; change the check to only
detect the numeric status code portion (e.g., look for "status: 'NNN" or use a
small regex that extracts the three-digit code) so the reason phrase is ignored.
Update isPDHTTPStatusError to build/check for the numeric prefix using the
statusCode (or parse the error string for a leading numeric status) and return
true when the numeric code matches, referencing the existing isPDHTTPStatusError
function to locate and modify the logic.
---
Nitpick comments:
In `@pkg/domain/affinity/manager_test.go`:
- Around line 160-180: Extend
TestGetAffinityGroupsFallbackWhenIDsQueryUnsupported into a small table-driven
test that iterates over the HTTP statuses used by
shouldFallbackGetAffinityGroups (400, 404, 414) and verifies the fallback to
GetAllAffinityGroups for each case: for each status, configure the mockPDClient
to have GetAffinityGroups return (nil, pdStatusErr(status)) and
GetAllAffinityGroups return the full map, then call pdManager.GetAffinityGroups
and assert no error, result contains only the requested id ("g1") and not "g2",
and call client.AssertExpectations; keep the existing test logic but replace the
single-status setup with a loop over the [http.StatusBadRequest,
http.StatusNotFound, http.StatusRequestURITooLong] table entries so all branches
of shouldFallbackGetAffinityGroups are covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c2a25900-a128-4c3b-99ba-3cec8b594579
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
pkg/domain/affinity/BUILD.bazelpkg/domain/affinity/manager.gopkg/domain/affinity/manager_test.go
| go_test( | ||
| name = "affinity_test", | ||
| timeout = "short", | ||
| srcs = ["manager_test.go"], | ||
| embed = [":affinity"], | ||
| flaky = True, | ||
| shard_count = 8, | ||
| deps = [ | ||
| "@com_github_stretchr_testify//mock", | ||
| "@com_github_stretchr_testify//require", | ||
| "@com_github_tikv_pd_client//http", | ||
| ], | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Get the list of all changed files in this PR
echo "=== All changed files in this PR ==="
git diff --name-only --diff-filter=ACMR HEAD~1...HEAD 2>/dev/null | sort || git diff --name-only HEAD^ HEAD 2>/dev/null | sort || echo "Unable to determine changed files"
echo
echo "=== Bazel and module-related file changes ==="
git diff --name-only --diff-filter=ACMR HEAD~1...HEAD 2>/dev/null | grep -E '(BUILD\.bazel|\.bazel$|\.bzl$|go\.mod|go\.sum|DEPS\.bzl|MODULE\.bazel)' | sort || echo "None found"
echo
echo "=== Checking for common Bazel metadata outputs ==="
git diff --name-only --diff-filter=ACMR HEAD~1...HEAD 2>/dev/null | grep -E '(DEPS\.bzl|MODULE\.bazel\.lock|gazelle_python_manifest\.json)' | sort || echo "No metadata files found"Repository: pingcap/tidb
Length of output: 260
Ensure make bazel_prepare outputs are committed when BUILD.bazel or dependency files change.
This PR modifies BUILD.bazel and updates PD client dependencies. Per repository guidelines, when these files change, the resulting Bazel metadata must be regenerated via make bazel_prepare and committed with the PR to prevent CI resolution drift.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/domain/affinity/BUILD.bazel` around lines 18 - 30, The BUILD.bazel change
(target "affinity_test") and updated PD client deps require regenerating Bazel
metadata: run the repository's make bazel_prepare to update generated
BUILD/WORKSPACE/third_party metadata, add the produced files to your commit/PR
(the generated Bazel outputs that make bazel_prepare updates), and push those
changes so CI can resolve dependencies consistently.
pkg/domain/affinity/manager.go
Outdated
| if err == nil { | ||
| return false | ||
| } | ||
| expectStatus := fmt.Sprintf("status: '%d %s'", statusCode, http.StatusText(statusCode)) |
There was a problem hiding this comment.
[P1] Fragile PD HTTP error-string matching breaks mixed-version fallbacks
Why: isPDHTTPStatusError does brittle substring matching on err.Error() to detect HTTP status codes (400/409/414) for mixed-version compatibility fallbacks. However, in the production wiring, GlobalInfoSyncerInit injects pdResponseHandler via WithRespHandler, and the upstream PD HTTP client short-circuits to that handler — skipping its own "request pd http api failed with status: '...'" error formatting. As a result, errors for those status codes will never contain the expected "status: '<code> <reason>'" substring, so the fallbacks silently never trigger, and affinity DDL (create/batch-get groups) can be incorrectly cancelled when PD returns 400/409/414 in mixed-version deployments.
Evidence: pkg/domain/affinity/manager.go:168-172
expectStatus := fmt.Sprintf("status: '%d %s'", statusCode, http.StatusText(statusCode))
return strings.Contains(err.Error(), expectStatus)Production error path goes through pdResponseHandler → ErrHTTPServiceError, which does not embed this substring. Suggest matching on a typed error (e.g., errors.Is/errors.As on the PD client's error type) or checking the numeric code directly rather than parsing the error string.
Signed-off-by: lhy1024 <admin@liudos.us>
|
Review Complete Findings: 0 issues ℹ️ Learn more details on Pantheon AI. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/domain/affinity/manager_test.go (1)
64-256: Add one focused test forDeleteAffinityGroupsforce-delete behavior.
DeleteAffinityGroupsis new in the manager implementation, but this suite does not currently assert itsforce=truecall contract to PD. A small unit test here would close that gap.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/affinity/manager_test.go` around lines 64 - 256, Add a unit test that verifies pdManager.DeleteAffinityGroups calls the PD client with force=true: create a mockPDClient and pdManager{Client: client}, set up an ids slice (e.g., []string{"g1"}), expect client.On("DeleteAffinityGroups", mock.Anything, ids, true).Return(nil).Once(), call manager.DeleteAffinityGroups(ctx, ids) and assert no error and client.AssertExpectations(t); use the existing test helpers (mock.Anything, mockPDClient, pdManager) so the test mirrors the style of other tests in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/domain/affinity/manager_test.go`:
- Around line 64-256: Add a unit test that verifies
pdManager.DeleteAffinityGroups calls the PD client with force=true: create a
mockPDClient and pdManager{Client: client}, set up an ids slice (e.g.,
[]string{"g1"}), expect client.On("DeleteAffinityGroups", mock.Anything, ids,
true).Return(nil).Once(), call manager.DeleteAffinityGroups(ctx, ids) and assert
no error and client.AssertExpectations(t); use the existing test helpers
(mock.Anything, mockPDClient, pdManager) so the test mirrors the style of other
tests in this file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4dbcfb75-9dfc-4d9f-87f9-654ffe7ae968
📒 Files selected for processing (3)
pkg/domain/affinity/BUILD.bazelpkg/domain/affinity/manager.gopkg/domain/affinity/manager_test.go
|
/retest |
|
@lhy1024: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
|
/retest |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: lhy1024 <admin@liudos.us>
What problem does this PR solve?
Issue Number: ref #64938
Problem Summary:
What changed and how does it work?
Check List
Tests
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit