Skip to content

domain/affinity: update pd client for api#65592

Merged
ti-chi-bot[bot] merged 6 commits intopingcap:masterfrom
lhy1024:affinity-api
Mar 5, 2026
Merged

domain/affinity: update pd client for api#65592
ti-chi-bot[bot] merged 6 commits intopingcap:masterfrom
lhy1024:affinity-api

Conversation

@lhy1024
Copy link
Copy Markdown
Contributor

@lhy1024 lhy1024 commented Jan 15, 2026

What problem does this PR solve?

Issue Number: ref #64938
Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Performance
    • Faster, more scalable retrieval of affinity groups for large ID requests.
  • Bug Fixes
    • Reduced failures by avoiding oversized queries and adding robust fallbacks when the backend rejects ID-based fetches.
    • More reliable affinity group creation with idempotent checks and safe fallback creation paths.
  • New Features
    • Support for forceful batch deletion of affinity groups.
  • Tests
    • Added comprehensive tests for creation, retrieval, fallbacks, and boundary cases.
  • Chores
    • Bumped an internal backend dependency to a newer version.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Jan 15, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 15, 2026
@tiprow
Copy link
Copy Markdown

tiprow bot commented Jan 15, 2026

Hi @lhy1024. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 28, 2026

📝 Walkthrough

Walkthrough

Updates pd client dependency and adds resilient affinity-group management in pkg/domain/affinity: thresholded ID-query vs full-scan fallbacks, create-with-skip-exist plus filtering-based creation, force-delete, helpers, tests, and BUILD changes.

Changes

Cohort / File(s) Summary
Dependency manifest
go.mod, DEPS.bzl
Bumped github.com/tikv/pd/client pseudo-version to v0.0.0-20260228084044-4f5039d43753; updated DEPS.bzl sha256, strip_prefix, and ZIP URLs.
Affinity manager implementation
pkg/domain/affinity/manager.go
Added resilient flows: try create-with-skip-exist then fallback to filtering-based creation; GetAffinityGroups chooses ID-query or full-scan filtering based on ID count/escaped-query length and PD errors; added DeleteAffinityGroups, client-side filtering helpers, error-detection helpers, constants, and idempotence logic.
Build & tests
pkg/domain/affinity/BUILD.bazel, pkg/domain/affinity/manager_test.go
Added affinity_test bazel target, expanded go_library deps; new tests with a mock PD client covering create (successful and fallback), GetAffinityGroups direct and fallback flows, and boundary checks for query-length/ID-count logic.

Sequence Diagram(s)

sequenceDiagram
    participant Manager as "Affinity Manager\nrgba(100,150,255,0.5)"
    participant PD as "PD HTTP API\nrgba(255,200,100,0.5)"
    participant Scanner as "Local Scanner\nrgba(150,255,150,0.5)"

    Manager->>PD: CreateAffinityGroups (skip-exist)
    alt create succeeded
        PD-->>Manager: success / skipped
    else create failed with conflict/unsupported
        PD-->>Manager: error (Conflict/BadRequest/ServiceErr)
        Manager->>PD: GetAffinityGroups by IDs
        alt PD returns requested subset
            PD-->>Manager: existing groups
            Manager->>PD: Create missing groups
        else PD doesn't support IDs or query too large
            Manager->>Scanner: GetAllAffinityGroups (scan)
            Scanner-->>Manager: filtered groups for requested IDs
            Manager->>PD: Create missing groups
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hop through patches, nose to the breeze,
Bumped a client and taught queries to tease,
If PD says no, I scan, filter, then try,
Create, delete, dedupe — retry till it's dry,
A rabbit's small clap for fallbacks that please!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: updating the PD client for API compatibility in the domain/affinity package, reflecting the core intent of dependency updates and implementation changes.
Description check ✅ Passed The description includes the required issue reference and test checklist with both unit and integration tests marked as completed, though the 'What changed and how does it work?' section is empty.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Command failed

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lhy1024 lhy1024 marked this pull request as ready for review February 28, 2026 09:37
@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Feb 28, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 129: The go.mod update bumps the github.com/tikv/pd/client module but the
PR is missing Bazel metadata changes; run the repository's make bazel_prepare
target to regenerate Bazel metadata (WORKSPACE/BUILD.bazel files and any
generated module hashes) and commit those resulting changes alongside the
go.mod/go.sum update so the Bazel build graph stays in sync with the new
github.com/tikv/pd/client version.

In `@pkg/domain/affinity/manager.go`:
- Around line 81-95: The function affinityGroupIDsQueryLen computes the query
length using raw id lengths but ignores URL encoding which can expand characters
(e.g., spaces -> %20); update affinityGroupIDsQueryLen to account for encoded
length by calling url.QueryEscape (or url.PathEscape) on each id and using len
of the escaped string when summing, preserving the same "&ids=" / "ids=" prefix
logic so the returned total reflects the actual encoded query size.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a08765 and 9e90c9c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • go.mod
  • pkg/domain/affinity/manager.go

Copy link
Copy Markdown
Collaborator

@lcwangchao lcwangchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure CI pass

reset LGTM

Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 2, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 79.81651% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.5473%. Comparing base (5a08765) to head (95c2022).
⚠️ Report is 18 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #65592        +/-   ##
================================================
+ Coverage   77.6973%   79.5473%   +1.8499%     
================================================
  Files          2007       1956        -51     
  Lines        549532     541406      -8126     
================================================
+ Hits         426972     430674      +3702     
+ Misses       120900     109259     -11641     
+ Partials       1660       1473       -187     
Flag Coverage Δ
integration 47.4956% <0.0000%> (-0.6873%) ⬇️
unit 76.6860% <79.8165%> (+0.3271%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 56.7974% <ø> (ø)
parser ∅ <ø> (∅)
br 66.2339% <ø> (+5.3721%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lhy1024
Copy link
Copy Markdown
Contributor Author

lhy1024 commented Mar 2, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow bot commented Mar 2, 2026

@lhy1024: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: lhy1024 <admin@liudos.us>
@lhy1024
Copy link
Copy Markdown
Contributor Author

lhy1024 commented Mar 2, 2026

/test unit-test

@tiprow
Copy link
Copy Markdown

tiprow bot commented Mar 2, 2026

@lhy1024: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 3, 2026
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 3, 2026
Copy link
Copy Markdown
Contributor

@D3Hunter D3Hunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-generated review based on @D3Hunter's standards; manual review will be performed after all comments are resolved.

Summary

  • Total findings: 8
  • Inline comments: 8
  • Summary-only findings (no inline anchor): 0

Findings (highest risk first)

⚠️ [Major] (3)

  1. Idempotent affinity-group creation now depends on new PD query support without a fallback (pkg/domain/affinity/manager.go:53)
  2. Create-idempotency now depends on PD skip-exist feature without compatibility guard (pkg/domain/affinity/manager.go:53)
  3. Affinity create/get paths assume new PD query semantics without N-1 fallback (pkg/domain/affinity/manager.go:53)

🟡 [Minor] (4)

  1. affinityGroupIDsQueryLen implies exact query size, but current logic is only an estimate (pkg/domain/affinity/manager.go:81)
  2. Fallback branch uses hard thresholds without intent documentation (pkg/domain/affinity/manager.go:35)
  3. URI-length guard can miss over-limit requests because it does not model escaped query size (pkg/domain/affinity/manager.go:71)
  4. No targeted tests for new compatibility branches and fallback thresholds (pkg/domain/affinity/manager.go:71)

ℹ️ [Info] (1)

  1. CreateAffinityGroupsIfNotExists comment does not describe the current single-call path (pkg/domain/affinity/manager.go:45)


// Create only the groups that don't exist
_, err = m.Client.CreateAffinityGroups(ctx, groupsToCreate)
_, err := m.Client.CreateAffinityGroups(ctx, groups, pdhttp.WithSkipExistCheck())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [Major] Idempotent affinity-group creation now depends on new PD query support without a fallback

Why
CreateGroupsIfNotExists is a critical DDL retry path, but the new implementation removed client-side existence filtering and now unconditionally relies on skip_exist_check support in PD.

Scope
pkg/domain/affinity/manager.go:53

Risk if unchanged
In N-1 mixed-version or partially upgraded PD deployments, retries after partial success can fail with duplicate-group errors, causing CREATE/ALTER/TRUNCATE affinity-related DDL jobs to fail instead of being idempotent.

Evidence
This diff replaces the previous GetAffinityGroups + filter-before-create flow with a single call: m.Client.CreateAffinityGroups(ctx, groups, pdhttp.WithSkipExistCheck()), and no compatibility fallback remains in CreateAffinityGroupsIfNotExists.

Change request
is this backward compatible with N-1 nodes? Please add a fallback path: if skip_exist_check is unsupported or rejected, fall back to the prior behavior (query existing IDs and only create missing groups) to preserve retry idempotency.

pdhttp.Client
}

const (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 [Minor] Fallback branch uses hard thresholds without intent documentation

Why
The new GetAllAffinityGroups fallback is non-obvious behavior (switching from targeted fetch to full-list fetch + filter), but there is no rationale comment explaining why 100 IDs and 4096 query length were chosen.

Scope
pkg/domain/affinity/manager.go:35

Risk if unchanged
Thresholds can be changed or copied without understanding protocol and performance tradeoffs, increasing maintenance risk and making future tuning error-prone.

Evidence
Constants maxAffinityGroupIDsQueryLen and maxAffinityGroupIDsCount are introduced and immediately used in a branch that changes retrieval strategy, with no nearby why-comment or source-limit reference.

Change request
Please add a short why-comment for this branch, including where 4096 and 100 come from and why full-scan fallback is the intended behavior.

Signed-off-by: lhy1024 <admin@liudos.us>
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai bot commented Mar 4, 2026

Review Complete

Findings: 1 issues
Posted: 1
Duplicates/Skipped: 0

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 4, 2026
Copy link
Copy Markdown
Contributor

@D3Hunter D3Hunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Mar 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, D3Hunter, lcwangchao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Mar 4, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/domain/affinity/manager_test.go (1)

160-180: Add explicit 404/414 fallback test cases for GetAffinityGroups.

Current fallback tests cover 400 Bad Request, but shouldFallbackGetAffinityGroups also includes 404 Not Found and 414 Request-URI Too Long. A small table-driven extension would lock all intended compatibility branches.

🧪 Suggested extension
 func TestGetAffinityGroupsFallbackWhenIDsQueryUnsupported(t *testing.T) {
 	ctx := context.Background()
-	client := &mockPDClient{}
-	manager := &pdManager{Client: client}
-
 	ids := []string{"g1"}
-	client.On("GetAffinityGroups", mock.Anything, ids).Return(nil, pdStatusErr(http.StatusBadRequest)).Once()
-	client.On("GetAllAffinityGroups", mock.Anything).Return(
-		map[string]*pdhttp.AffinityGroupState{
-			"g1": affinityGroupState("g1"),
-			"g2": affinityGroupState("g2"),
-		}, nil,
-	).Once()
-
-	result, err := manager.GetAffinityGroups(ctx, ids)
-	require.NoError(t, err)
-	require.Len(t, result, 1)
-	require.Contains(t, result, "g1")
-	require.NotContains(t, result, "g2")
-	client.AssertExpectations(t)
+	for _, status := range []int{
+		http.StatusBadRequest,
+		http.StatusNotFound,
+		http.StatusRequestURITooLong,
+	} {
+		t.Run(http.StatusText(status), func(t *testing.T) {
+			client := &mockPDClient{}
+			manager := &pdManager{Client: client}
+			client.On("GetAffinityGroups", mock.Anything, ids).Return(nil, pdStatusErr(status)).Once()
+			client.On("GetAllAffinityGroups", mock.Anything).Return(
+				map[string]*pdhttp.AffinityGroupState{
+					"g1": affinityGroupState("g1"),
+					"g2": affinityGroupState("g2"),
+				}, nil,
+			).Once()
+
+			result, err := manager.GetAffinityGroups(ctx, ids)
+			require.NoError(t, err)
+			require.Len(t, result, 1)
+			require.Contains(t, result, "g1")
+			require.NotContains(t, result, "g2")
+			client.AssertExpectations(t)
+		})
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/affinity/manager_test.go` around lines 160 - 180, Extend
TestGetAffinityGroupsFallbackWhenIDsQueryUnsupported into a small table-driven
test that iterates over the HTTP statuses used by
shouldFallbackGetAffinityGroups (400, 404, 414) and verifies the fallback to
GetAllAffinityGroups for each case: for each status, configure the mockPDClient
to have GetAffinityGroups return (nil, pdStatusErr(status)) and
GetAllAffinityGroups return the full map, then call pdManager.GetAffinityGroups
and assert no error, result contains only the requested id ("g1") and not "g2",
and call client.AssertExpectations; keep the existing test logic but replace the
single-status setup with a loop over the [http.StatusBadRequest,
http.StatusNotFound, http.StatusRequestURITooLong] table entries so all branches
of shouldFallbackGetAffinityGroups are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/domain/affinity/BUILD.bazel`:
- Around line 18-30: The BUILD.bazel change (target "affinity_test") and updated
PD client deps require regenerating Bazel metadata: run the repository's make
bazel_prepare to update generated BUILD/WORKSPACE/third_party metadata, add the
produced files to your commit/PR (the generated Bazel outputs that make
bazel_prepare updates), and push those changes so CI can resolve dependencies
consistently.

In `@pkg/domain/affinity/manager.go`:
- Around line 168-174: The isPDHTTPStatusError function currently matches the
full "status: 'NNN Reason'" substring which is brittle; change the check to only
detect the numeric status code portion (e.g., look for "status: 'NNN" or use a
small regex that extracts the three-digit code) so the reason phrase is ignored.
Update isPDHTTPStatusError to build/check for the numeric prefix using the
statusCode (or parse the error string for a leading numeric status) and return
true when the numeric code matches, referencing the existing isPDHTTPStatusError
function to locate and modify the logic.

---

Nitpick comments:
In `@pkg/domain/affinity/manager_test.go`:
- Around line 160-180: Extend
TestGetAffinityGroupsFallbackWhenIDsQueryUnsupported into a small table-driven
test that iterates over the HTTP statuses used by
shouldFallbackGetAffinityGroups (400, 404, 414) and verifies the fallback to
GetAllAffinityGroups for each case: for each status, configure the mockPDClient
to have GetAffinityGroups return (nil, pdStatusErr(status)) and
GetAllAffinityGroups return the full map, then call pdManager.GetAffinityGroups
and assert no error, result contains only the requested id ("g1") and not "g2",
and call client.AssertExpectations; keep the existing test logic but replace the
single-status setup with a loop over the [http.StatusBadRequest,
http.StatusNotFound, http.StatusRequestURITooLong] table entries so all branches
of shouldFallbackGetAffinityGroups are covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c2a25900-a128-4c3b-99ba-3cec8b594579

📥 Commits

Reviewing files that changed from the base of the PR and between 3c269da and 361a32d.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • pkg/domain/affinity/BUILD.bazel
  • pkg/domain/affinity/manager.go
  • pkg/domain/affinity/manager_test.go

Comment on lines +18 to +30
go_test(
name = "affinity_test",
timeout = "short",
srcs = ["manager_test.go"],
embed = [":affinity"],
flaky = True,
shard_count = 8,
deps = [
"@com_github_stretchr_testify//mock",
"@com_github_stretchr_testify//require",
"@com_github_tikv_pd_client//http",
],
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the list of all changed files in this PR
echo "=== All changed files in this PR ==="
git diff --name-only --diff-filter=ACMR HEAD~1...HEAD 2>/dev/null | sort || git diff --name-only HEAD^ HEAD 2>/dev/null | sort || echo "Unable to determine changed files"

echo
echo "=== Bazel and module-related file changes ==="
git diff --name-only --diff-filter=ACMR HEAD~1...HEAD 2>/dev/null | grep -E '(BUILD\.bazel|\.bazel$|\.bzl$|go\.mod|go\.sum|DEPS\.bzl|MODULE\.bazel)' | sort || echo "None found"

echo
echo "=== Checking for common Bazel metadata outputs ==="
git diff --name-only --diff-filter=ACMR HEAD~1...HEAD 2>/dev/null | grep -E '(DEPS\.bzl|MODULE\.bazel\.lock|gazelle_python_manifest\.json)' | sort || echo "No metadata files found"

Repository: pingcap/tidb

Length of output: 260


Ensure make bazel_prepare outputs are committed when BUILD.bazel or dependency files change.

This PR modifies BUILD.bazel and updates PD client dependencies. Per repository guidelines, when these files change, the resulting Bazel metadata must be regenerated via make bazel_prepare and committed with the PR to prevent CI resolution drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/affinity/BUILD.bazel` around lines 18 - 30, The BUILD.bazel change
(target "affinity_test") and updated PD client deps require regenerating Bazel
metadata: run the repository's make bazel_prepare to update generated
BUILD/WORKSPACE/third_party metadata, add the produced files to your commit/PR
(the generated Bazel outputs that make bazel_prepare updates), and push those
changes so CI can resolve dependencies consistently.

if err == nil {
return false
}
expectStatus := fmt.Sprintf("status: '%d %s'", statusCode, http.StatusText(statusCode))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Fragile PD HTTP error-string matching breaks mixed-version fallbacks

Why: isPDHTTPStatusError does brittle substring matching on err.Error() to detect HTTP status codes (400/409/414) for mixed-version compatibility fallbacks. However, in the production wiring, GlobalInfoSyncerInit injects pdResponseHandler via WithRespHandler, and the upstream PD HTTP client short-circuits to that handler — skipping its own "request pd http api failed with status: '...'" error formatting. As a result, errors for those status codes will never contain the expected "status: '<code> <reason>'" substring, so the fallbacks silently never trigger, and affinity DDL (create/batch-get groups) can be incorrectly cancelled when PD returns 400/409/414 in mixed-version deployments.

Evidence: pkg/domain/affinity/manager.go:168-172

expectStatus := fmt.Sprintf("status: '%d %s'", statusCode, http.StatusText(statusCode))
return strings.Contains(err.Error(), expectStatus)

Production error path goes through pdResponseHandlerErrHTTPServiceError, which does not embed this substring. Suggest matching on a typed error (e.g., errors.Is/errors.As on the PD client's error type) or checking the numeric code directly rather than parsing the error string.

Signed-off-by: lhy1024 <admin@liudos.us>
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai bot commented Mar 5, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0

ℹ️ Learn more details on Pantheon AI.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/domain/affinity/manager_test.go (1)

64-256: Add one focused test for DeleteAffinityGroups force-delete behavior.

DeleteAffinityGroups is new in the manager implementation, but this suite does not currently assert its force=true call contract to PD. A small unit test here would close that gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/affinity/manager_test.go` around lines 64 - 256, Add a unit test
that verifies pdManager.DeleteAffinityGroups calls the PD client with
force=true: create a mockPDClient and pdManager{Client: client}, set up an ids
slice (e.g., []string{"g1"}), expect client.On("DeleteAffinityGroups",
mock.Anything, ids, true).Return(nil).Once(), call
manager.DeleteAffinityGroups(ctx, ids) and assert no error and
client.AssertExpectations(t); use the existing test helpers (mock.Anything,
mockPDClient, pdManager) so the test mirrors the style of other tests in this
file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/domain/affinity/manager_test.go`:
- Around line 64-256: Add a unit test that verifies
pdManager.DeleteAffinityGroups calls the PD client with force=true: create a
mockPDClient and pdManager{Client: client}, set up an ids slice (e.g.,
[]string{"g1"}), expect client.On("DeleteAffinityGroups", mock.Anything, ids,
true).Return(nil).Once(), call manager.DeleteAffinityGroups(ctx, ids) and assert
no error and client.AssertExpectations(t); use the existing test helpers
(mock.Anything, mockPDClient, pdManager) so the test mirrors the style of other
tests in this file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4dbcfb75-9dfc-4d9f-87f9-654ffe7ae968

📥 Commits

Reviewing files that changed from the base of the PR and between 361a32d and 95c2022.

📒 Files selected for processing (3)
  • pkg/domain/affinity/BUILD.bazel
  • pkg/domain/affinity/manager.go
  • pkg/domain/affinity/manager_test.go

Copy link
Copy Markdown

@pantheon-ai pantheon-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Code looks good. No issues found.

@lhy1024
Copy link
Copy Markdown
Contributor Author

lhy1024 commented Mar 5, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow bot commented Mar 5, 2026

@lhy1024: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lcwangchao
Copy link
Copy Markdown
Collaborator

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Mar 5, 2026
@lhy1024
Copy link
Copy Markdown
Contributor Author

lhy1024 commented Mar 5, 2026

/retest

@ti-chi-bot ti-chi-bot bot merged commit a020ff3 into pingcap:master Mar 5, 2026
33 checks passed
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Mar 5, 2026
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Copy Markdown
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #66708.
But this PR has conflicts, please resolve them!

lhy1024 added a commit to lhy1024/tidb that referenced this pull request Mar 6, 2026
lhy1024 added a commit to ti-chi-bot/tidb that referenced this pull request Mar 6, 2026
lhy1024 added a commit to lhy1024/tidb that referenced this pull request Mar 13, 2026
Signed-off-by: lhy1024 <admin@liudos.us>
ti-chi-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants