*: split bazel coverage test into build and test phases#67829
*: split bazel coverage test into build and test phases#67829winoros wants to merge 13 commits intopingcap:masterfrom
Conversation
|
@winoros I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMakefile coverage targets were changed to a two-phase Bazel flow (build with coverage then coverage), added variables to disable remote cache and control coverage concurrency/targets, and .bazelrc gains CI build-event flags for asynchronous publishing. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #67829 +/- ##
=================================================
- Coverage 77.5964% 49.1566% -28.4398%
=================================================
Files 1982 1796 -186
Lines 548885 514805 -34080
=================================================
- Hits 425915 253061 -172854
- Misses 122165 261446 +139281
+ Partials 805 298 -507
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Temporary CI experiment: I squashed a small local code-change branch into this PR branch to avoid Bazel remote cache fully reusing previous results and to measure CI speed. This temporary commit will be reverted after the measurement. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/statistics/handle/syncload/stats_syncload.go`:
- Around line 334-346: The code directly assigns sessVars.ResourceGroupName and
sessVars.StmtCtx.ResourceGroupName which bypasses metrics updates; change these
direct writes to use sessVars.SetResourceGroupName(...) for session-level
updates and use the documented setter or an equivalent method for the StmtCtx
resource-group mutation (or call the session setter for the statement context if
that's the intended flow) and likewise restore the previous values in the defer
by calling SetResourceGroupName(prevSessionResourceGroupName) instead of
assigning sessVars.ResourceGroupName, while keeping restoration of
StmtCtx.ResourceGroupName via its proper setter or documented API to ensure
metrics are incremented/decremented correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 81774f01-54db-4fcd-955a-c8d7d829b69b
📒 Files selected for processing (9)
br/pkg/metautil/BUILD.bazelpkg/domain/infosync/BUILD.bazelpkg/domain/infosync/info.gopkg/domain/infosync/resource_manager_client.gopkg/importsdk/BUILD.bazelpkg/resourcegroup/checker.gopkg/resourcegroup/tests/resource_group_test.gopkg/statistics/handle/syncload/BUILD.bazelpkg/statistics/handle/syncload/stats_syncload.go
✅ Files skipped from review due to trivial changes (4)
- pkg/statistics/handle/syncload/BUILD.bazel
- br/pkg/metautil/BUILD.bazel
- pkg/resourcegroup/checker.go
- pkg/domain/infosync/BUILD.bazel
| sessVars := sctx.GetSessionVars() | ||
| prevPriority := sessVars.StmtCtx.Priority | ||
| prevStmtResourceGroupName := sessVars.StmtCtx.ResourceGroupName | ||
| prevSessionResourceGroupName := sessVars.ResourceGroupName | ||
| sessVars.StmtCtx.Priority = mysql.HighPriority | ||
| sessVars.StmtCtx.ResourceGroupName = resourcegroup.DefaultHighPriorityResourceGroupName | ||
| // Keep StmtCtx and session resource group aligned to avoid fallback to the session default group. | ||
| sessVars.ResourceGroupName = resourcegroup.DefaultHighPriorityResourceGroupName | ||
| defer func() { | ||
| sctx.GetSessionVars().StmtCtx.Priority = mysql.NoPriority | ||
| sessVars.StmtCtx.Priority = prevPriority | ||
| sessVars.StmtCtx.ResourceGroupName = prevStmtResourceGroupName | ||
| sessVars.ResourceGroupName = prevSessionResourceGroupName | ||
| }() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Contract/comment on SessionVars.ResourceGroupName =="
rg -n --type=go -C2 "please don't change it directly|SetResourceGroupName changes the resource group name" pkg/sessionctx/variable/session.go
echo
echo "== Current direct assignments in sync load path =="
rg -n --type=go -C2 "ResourceGroupName\\s*=" pkg/statistics/handle/syncload/stats_syncload.go
echo
echo "== Available setter usage =="
rg -n --type=go "SetResourceGroupName\\(" pkg/statistics/handle/syncload/stats_syncload.go pkg/sessionctx/variable/session.goRepository: pingcap/tidb
Length of output: 1479
Use SessionVars.SetResourceGroupName() instead of direct field assignment for metrics consistency.
Lines 341 and 345 bypass the setter, which is documented as the required way to update this field. The field carries an explicit note: "please don't change it directly. Use SetResourceGroupName, because it'll need to inc/dec the metrics." Direct assignment skips the metrics side-effects and violates the established mutation contract.
Proposed fix
sessVars := sctx.GetSessionVars()
prevPriority := sessVars.StmtCtx.Priority
prevStmtResourceGroupName := sessVars.StmtCtx.ResourceGroupName
prevSessionResourceGroupName := sessVars.ResourceGroupName
sessVars.StmtCtx.Priority = mysql.HighPriority
sessVars.StmtCtx.ResourceGroupName = resourcegroup.DefaultHighPriorityResourceGroupName
// Keep StmtCtx and session resource group aligned to avoid fallback to the session default group.
- sessVars.ResourceGroupName = resourcegroup.DefaultHighPriorityResourceGroupName
+ sessVars.SetResourceGroupName(resourcegroup.DefaultHighPriorityResourceGroupName)
defer func() {
sessVars.StmtCtx.Priority = prevPriority
sessVars.StmtCtx.ResourceGroupName = prevStmtResourceGroupName
- sessVars.ResourceGroupName = prevSessionResourceGroupName
+ sessVars.SetResourceGroupName(prevSessionResourceGroupName)
}()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/statistics/handle/syncload/stats_syncload.go` around lines 334 - 346, The
code directly assigns sessVars.ResourceGroupName and
sessVars.StmtCtx.ResourceGroupName which bypasses metrics updates; change these
direct writes to use sessVars.SetResourceGroupName(...) for session-level
updates and use the documented setter or an equivalent method for the StmtCtx
resource-group mutation (or call the session setter for the statement context if
that's the intended flow) and likewise restore the previous values in the defer
by calling SetResourceGroupName(prevSessionResourceGroupName) instead of
assigning sessVars.ResourceGroupName, while keeping restoration of
StmtCtx.ResourceGroupName via its proper setter or documented API to ensure
metrics are incremented/decremented correctly.
ad2542a to
49d2fb8
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
Makefile (1)
718-722: Optional: deduplicate the Bazel target set shared withbazel_test.
BAZEL_COVERAGE_TARGETSis the same list hard-coded inbazel_test(lines 715-716). If a target is ever added/removed (e.g. a newtests/<kind>), the two will drift silently. Consider promoting this into a shared variable (e.g.BAZEL_UNIT_TEST_TARGETS) used by bothbazel_testand the coverage targets.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 718 - 722, BAZEL_COVERAGE_TARGETS duplicates the hard-coded target list used by the bazel_test recipe; extract the shared test target list into a new variable (e.g. BAZEL_UNIT_TEST_TARGETS) containing the long "//... -//cmd/... -//tests/..." sequence, replace the inline lists in both bazel_test and the BAZEL_COVERAGE_TARGETS assignment to reference BAZEL_UNIT_TEST_TARGETS, and ensure any existing overrides (like exclusions for specific tests) are preserved by appending/removing entries relative to BAZEL_UNIT_TEST_TARGETS rather than duplicating the full list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.bazelrc:
- Around line 29-32: The CI Bazel flags use
--experimental_build_event_upload_strategy=fully_async together with
--build_event_publish_all_actions which can silently drop BES events; update the
CI config so coverage/test runs use a synchronous upload strategy (e.g., replace
or override fully_async with wait_for_upload_complete) or scope
publish_all_actions to smaller runs, and modify the bazel_coverage_test
invocation in the Makefile to ensure it runs with wait_for_upload_complete (or a
non-fully_async strategy) so coverage/test event uploads are awaited and
failures surface to CI.
In `@Makefile`:
- Line 684: The Makefile currently hard-wires BAZEL_NO_REMOTE_CACHE_CONFIG :=
--remote_cache= --noremote_accept_cached --noremote_upload_local_results which
must not ship to master and is inconsistently applied between the two coverage
targets; change the assignment to an opt-in form (BAZEL_NO_REMOTE_CACHE_CONFIG
?= with empty default) so it is only set by measurement CI, and then either
remove its usage from the bazel_coverage_test invocation or apply it
consistently to bazel_coverage_test_ddlargsv1 so both targets
(bazel_coverage_test and bazel_coverage_test_ddlargsv1) use the same cache
behavior; ensure references to BAZEL_NO_REMOTE_CACHE_CONFIG at the existing
invocation sites are updated accordingly.
- Around line 731-732: Add a new configurable variable (e.g. BAZEL_BES_FLAGS
with an empty default) and replace the hardcoded BES flags in the
bazel_coverage_test recipe with this variable; also apply the same
$(BAZEL_BES_FLAGS) insertion to bazel_coverage_test_ddlargsv1 so both targets
use the same optional BES flags and CI can opt-in via environment without
impacting local builds.
---
Nitpick comments:
In `@Makefile`:
- Around line 718-722: BAZEL_COVERAGE_TARGETS duplicates the hard-coded target
list used by the bazel_test recipe; extract the shared test target list into a
new variable (e.g. BAZEL_UNIT_TEST_TARGETS) containing the long "//...
-//cmd/... -//tests/..." sequence, replace the inline lists in both bazel_test
and the BAZEL_COVERAGE_TARGETS assignment to reference BAZEL_UNIT_TEST_TARGETS,
and ensure any existing overrides (like exclusions for specific tests) are
preserved by appending/removing entries relative to BAZEL_UNIT_TEST_TARGETS
rather than duplicating the full list.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| build:ci --build_event_publish_all_actions | ||
| test:ci --build_event_publish_all_actions | ||
| build:ci --experimental_build_event_upload_strategy=fully_async | ||
| test:ci --experimental_build_event_upload_strategy=fully_async |
There was a problem hiding this comment.
Heads-up: fully_async + publish_all_actions can silently drop events.
--experimental_build_event_upload_strategy=fully_async intentionally decouples BES upload from the build lifecycle, so upload errors (timeouts, backend 5xx) are not surfaced as build failures. Combined with --build_event_publish_all_actions, the BES stream now includes every action event, which can be large for full-repo coverage runs. If the BES backend (see bazel_coverage_test in Makefile) is flaky or rate-limits, you may lose coverage/test event data without any CI signal. Consider wait_for_upload_complete at least for coverage runs whose primary purpose is to publish results.
bazel experimental_build_event_upload_strategy fully_async error handling
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.bazelrc around lines 29 - 32, The CI Bazel flags use
--experimental_build_event_upload_strategy=fully_async together with
--build_event_publish_all_actions which can silently drop BES events; update the
CI config so coverage/test runs use a synchronous upload strategy (e.g., replace
or override fully_async with wait_for_upload_complete) or scope
publish_all_actions to smaller runs, and modify the bazel_coverage_test
invocation in the Makefile to ensure it runs with wait_for_upload_complete (or a
non-fully_async strategy) so coverage/test event uploads are awaited and
failures surface to CI.
| --run_under="cd $(CURDIR) && " \ | ||
| //tools/tazel:tazel | ||
|
|
||
| BAZEL_NO_REMOTE_CACHE_CONFIG := --remote_cache= --noremote_accept_cached --noremote_upload_local_results |
There was a problem hiding this comment.
Temporary cache-bypass must not ship to master, and is inconsistent between the two coverage targets.
Per the PR description, BAZEL_NO_REMOTE_CACHE_CONFIG := --remote_cache= --noremote_accept_cached --noremote_upload_local_results is a measurement aid — it disables both accepting and uploading remote cache results for every invocation that references it. Left in place, this will noticeably slow bazel_coverage_test on CI (every build is effectively a from-scratch local build). Two concerns:
- This variable (and its uses at lines 727 and 731) should be reverted before this PR leaves
do-not-mergestatus. Please gate it behind an explicit opt-in (e.g.BAZEL_NO_REMOTE_CACHE_CONFIG ?=with empty default, set only by the measurement CI job) rather than hard-wiring it into the recipe. - It is applied to
bazel_coverage_testbut not tobazel_coverage_test_ddlargsv1(lines 740, 744). If the intent is to measure, the asymmetry makes the two runs not directly comparable; if the intent is production, the asymmetry is a bug. Either way, please align them.
🔧 Suggested opt-in form
-BAZEL_NO_REMOTE_CACHE_CONFIG := --remote_cache= --noremote_accept_cached --noremote_upload_local_results
+# Opt-in override used only for cache-busting measurement runs; leave empty for normal CI.
+BAZEL_NO_REMOTE_CACHE_CONFIG ?=Also applies to: 727-727, 731-731
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` at line 684, The Makefile currently hard-wires
BAZEL_NO_REMOTE_CACHE_CONFIG := --remote_cache= --noremote_accept_cached
--noremote_upload_local_results which must not ship to master and is
inconsistently applied between the two coverage targets; change the assignment
to an opt-in form (BAZEL_NO_REMOTE_CACHE_CONFIG ?= with empty default) so it is
only set by measurement CI, and then either remove its usage from the
bazel_coverage_test invocation or apply it consistently to
bazel_coverage_test_ddlargsv1 so both targets (bazel_coverage_test and
bazel_coverage_test_ddlargsv1) use the same cache behavior; ensure references to
BAZEL_NO_REMOTE_CACHE_CONFIG at the existing invocation sites are updated
accordingly.
| bazel $(BAZEL_GLOBAL_CONFIG) --nohome_rc coverage $(BAZEL_CMD_CONFIG) $(BAZEL_NO_REMOTE_CACHE_CONFIG) $(BAZEL_INSTRUMENTATION_FILTER) --jobs=$(BAZEL_COVERAGE_TEST_JOBS) --local_test_jobs=$(BAZEL_COVERAGE_TEST_JOBS) --build_tests_only --test_keep_going=false \ | ||
| --bes_backend=grpcs://beplessproxy.hawkingrei.com --bes_results_url=https://bepless.hawkingrei.com/ \ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Search for references to the BES endpoint in the codebase
rg "beplessproxy|bepless|hawkingrei" -iRepository: pingcap/tidb
Length of output: 50369
🏁 Script executed:
# Check the Makefile context around the flagged lines
fd Makefile -x cat -n {} | grep -A 10 -B 5 "bepless"Repository: pingcap/tidb
Length of output: 1405
🏁 Script executed:
# Search for any documentation or comments related to BES configuration
rg "BES|bes_backend|bes_results_url" -A 2 -B 2Repository: pingcap/tidb
Length of output: 50370
🏁 Script executed:
# Check git history for this endpoint (check recent commits)
git log --oneline --all -20 -- Makefile | head -20Repository: pingcap/tidb
Length of output: 1382
Externalize BES configuration to make it optional and environment-specific.
The hardcoded --bes_backend=grpcs://beplessproxy.hawkingrei.com --bes_results_url=https://bepless.hawkingrei.com/ flags on line 732 tie bazel_coverage_test to a developer-managed endpoint with no fallback. Additionally, bazel_coverage_test_ddlargsv1 (line 744) does not include these flags, creating an inconsistency in coverage data collection.
Introduce a configurable variable with empty defaults so CI can opt-in to BES uploads without affecting local builds:
+BAZEL_BES_BACKEND ?=
+BAZEL_BES_RESULTS_URL ?=
+BAZEL_BES_FLAGS = $(if $(BAZEL_BES_BACKEND),--bes_backend=$(BAZEL_BES_BACKEND) --bes_results_url=$(BAZEL_BES_RESULTS_URL),)
@@
bazel $(BAZEL_GLOBAL_CONFIG) --nohome_rc coverage $(BAZEL_CMD_CONFIG) $(BAZEL_NO_REMOTE_CACHE_CONFIG) $(BAZEL_INSTRUMENTATION_FILTER) --jobs=$(BAZEL_COVERAGE_TEST_JOBS) --local_test_jobs=$(BAZEL_COVERAGE_TEST_JOBS) --build_tests_only --test_keep_going=false \
- --bes_backend=grpcs://beplessproxy.hawkingrei.com --bes_results_url=https://bepless.hawkingrei.com/ \
+ $(BAZEL_BES_FLAGS) \Apply the same $(BAZEL_BES_FLAGS) variable to both bazel_coverage_test and bazel_coverage_test_ddlargsv1 to ensure consistent behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 731 - 732, Add a new configurable variable (e.g.
BAZEL_BES_FLAGS with an empty default) and replace the hardcoded BES flags in
the bazel_coverage_test recipe with this variable; also apply the same
$(BAZEL_BES_FLAGS) insertion to bazel_coverage_test_ddlargsv1 so both targets
use the same optional BES flags and CI can opt-in via environment without
impacting local builds.
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
|
/retest |
…into bazel-test-two-phase
|
/retest |
1 similar comment
|
/retest |
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
0864523 to
ff0f901
Compare
|
/retest |
…into bazel-test-two-phase
|
/retest |
|
/retest |
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
What problem does this PR solve?
Issue Number: N/A
Problem Summary:
bazel_coverage_testused a singlebazel coverageinvocation, so the build and test phases had to share the same concurrency setting.What changed and how does it work?
bazel_coverage_testandbazel_coverage_test_ddlargsv1into two phases.bazel build --collect_code_coveragewith higher build concurrency.bazel coveragewith the existing test concurrency andlocal_test_jobs.Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit