
client/resource_group: cache request source RU metrics#10588

Open
YuhaoZhang00 wants to merge 5 commits into tikv:master from YuhaoZhang00:rg-request-source-metrics


@YuhaoZhang00
Contributor

@YuhaoZhang00 YuhaoZhang00 commented Apr 9, 2026

What problem does this PR solve?

Issue Number: ref pingcap/tidb#64339.

client-go is moving RU-by-request-source accounting out of the interceptor hot path.

Add request-source RU metrics owned by pd/client itself, and cache the corresponding metric handles.

Related PR:

What is changed and how does it work?

This PR makes pd/client request accounting aware of RequestSource and records RU-by-request-source metrics inside the existing resource-group controller.

Implementation details:

  • add RequestSource() to controller.RequestInfo
  • add RequestSourceRUCounter under resource_manager_client_request
  • cache rru/wru counters per (resource_group, request_source) in a per-resource-group state that ResourceGroupsController shares with every groupCostController of that group. Reuse the same request-source metric state across normal / tombstone / revived group controllers, and delete the cached handles and Prometheus series when the resource group is finally cleaned up.
  • record request-side and response-side RU deltas through the existing accounting flow

This keeps the existing metric dimensions, but moves the metric ownership to pd/client and avoids repeated WithLabelValues() on the hot path in client-go.
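A minimal, stdlib-only sketch of the caching idea follows. It assumes a hypothetical `counterVec` standing in for `prometheus.CounterVec`, whose `WithLabelValues` resolves the child series on every call. The names `requestSourceMetrics`, `getOrCreate`, and `addRequestSourceRU` are illustrative and are not the PR's actual code. Label resolution happens once per request source, and later requests only touch the cached handles.

```go
package main

import "sync"

// counter is a hypothetical stand-in for prometheus.Counter.
type counter struct {
	mu  sync.Mutex
	val float64
}

func (c *counter) Add(v float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.val += v
}

// counterVec is a hypothetical stand-in for prometheus.CounterVec with the
// labels (resource_group, request_source, type). lookups counts how often
// the label-resolution path is taken.
type counterVec struct {
	mu      sync.Mutex
	series  map[[3]string]*counter
	lookups int
}

func newCounterVec() *counterVec {
	return &counterVec{series: make(map[[3]string]*counter)}
}

func (v *counterVec) WithLabelValues(group, source, typ string) *counter {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.lookups++
	key := [3]string{group, source, typ}
	c, ok := v.series[key]
	if !ok {
		c = &counter{}
		v.series[key] = c
	}
	return c
}

// sourceCounters caches the rru/wru handles for one request source.
type sourceCounters struct{ rru, wru *counter }

// requestSourceMetrics caches handles per request source for one resource group.
type requestSourceMetrics struct {
	group string
	vec   *counterVec
	mu    sync.RWMutex
	bySrc map[string]*sourceCounters
}

func newRequestSourceMetrics(group string, vec *counterVec) *requestSourceMetrics {
	return &requestSourceMetrics{group: group, vec: vec, bySrc: make(map[string]*sourceCounters)}
}

// getOrCreate resolves label values only the first time a source is seen.
func (m *requestSourceMetrics) getOrCreate(source string) *sourceCounters {
	m.mu.RLock()
	sc, ok := m.bySrc[source]
	m.mu.RUnlock()
	if ok {
		return sc
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if sc, ok = m.bySrc[source]; ok { // another goroutine created it first
		return sc
	}
	sc = &sourceCounters{
		rru: m.vec.WithLabelValues(m.group, source, "rru"),
		wru: m.vec.WithLabelValues(m.group, source, "wru"),
	}
	m.bySrc[source] = sc
	return sc
}

// addRequestSourceRU records the RRU/WRU deltas of one request. Only positive
// deltas are added, because a Prometheus counter's Add panics on negative values.
func (m *requestSourceMetrics) addRequestSourceRU(source string, rru, wru float64) {
	sc := m.getOrCreate(source)
	if rru > 0 {
		sc.rru.Add(rru)
	}
	if wru > 0 {
		sc.wru.Add(wru)
	}
}
```

In this sketch a cache hit costs one read-locked map lookup. It does not resolve the three label values on every request.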

Change log (2026-04-13)

Before this change, request-source metric state was controller-instance scoped, which could break cleanup across tombstone / revive.

  • keep request-source metric state per resource group instead of per controller instance
  • preserve request-source metric bookkeeping across tombstone / revive paths
  • clean up request-source metric state on final resource-group cleanup
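The per-group ownership described in this change log can be modeled as follows. This is a simplified sketch. It assumes the states live in a `sync.Map` keyed by resource-group name. The struct fields, the lowercase `resourceGroupsController` type, and the `finalCleanup` helper are hypothetical. Only the names `requestSourceStates`, `getOrCreateRequestSourceMetricsState`, and `newGroupCostController` are taken from the PR.

```go
package main

import "sync"

// requestSourceMetricsState is a hypothetical, simplified per-resource-group
// holder for cached request-source metric handles.
type requestSourceMetricsState struct {
	group string
}

// groupCostController is reduced to the fields this sketch needs.
type groupCostController struct {
	name        string
	tombstone   bool
	sourceState *requestSourceMetricsState
}

// resourceGroupsController keys the states by resource-group name, so their
// lifetime follows the group rather than any single controller instance.
type resourceGroupsController struct {
	requestSourceStates sync.Map // string -> *requestSourceMetricsState
}

func (c *resourceGroupsController) getOrCreateRequestSourceMetricsState(name string) *requestSourceMetricsState {
	if v, ok := c.requestSourceStates.Load(name); ok {
		return v.(*requestSourceMetricsState)
	}
	v, _ := c.requestSourceStates.LoadOrStore(name, &requestSourceMetricsState{group: name})
	return v.(*requestSourceMetricsState)
}

// newGroupCostController always receives the registered state, so normal,
// tombstone, and revived controllers of one group share it.
func (c *resourceGroupsController) newGroupCostController(name string, tombstone bool) *groupCostController {
	return &groupCostController{
		name:        name,
		tombstone:   tombstone,
		sourceState: c.getOrCreateRequestSourceMetricsState(name),
	}
}

// finalCleanup drops the state only when the resource group itself is gone.
func (c *resourceGroupsController) finalCleanup(name string) {
	c.requestSourceStates.Delete(name)
}
```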

Check List

Tests

  • Unit test
  • Manual test

Performed ADD INDEX locally; DDL-related RU showed up, for example:

  • internal_ddl wru: +56.40898437500003
  • leader_internal_ddl rru: +37.54666388932296
  • internal_DistTask wru: +40.74453125
  • leader_internal_DistTask rru: +59.991488047526154

However, no fine-grained request_source matching add_index / merge_temp_index appeared in the new metric.

This is consistent with the bypass logic working in client-go: the fine-grained add_index / merge_temp_index requests are bypassed before entering pd/client RU accounting, while other non-bypassed DDL-related requests in the same workflow are still visible through coarse DDL sources.

Release note

None.

Summary by CodeRabbit

  • New Features

    • Per-request-source RRU/WRU metrics with Prometheus counters and per-group shared state.
  • Bug Fixes

    • Request-source metrics and counters are cleaned up and reset when resource groups are removed, tombstoned, or controllers shut down to avoid stale exports.
  • Tests

    • Extensive tests covering caching, recording, cleanup, lifecycle, and re-creation of request-source metrics.
  • Chores

    • Request metadata contract updated to include request-source information.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 9, 2026
@coderabbitai

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Per-request-source RRU/WRU metrics were added: controllers now cache per-group request-source metric state, record per-request-source RRU/WRU deltas to a new Prometheus counter, RequestInfo requires RequestSource(), and global/group cleanup removes those per-source series.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Global Controller | client/resource_group/controller/global_controller.go | Added requestSourceStates cache; reset RequestSourceRUCounter on controller shutdown; pass per-group request-source state into newGroupCostController; ensure request-source state is cleaned on group deletion/tombstone and periodic cleanup paths. |
| Group Controller & Metrics Integration | client/resource_group/controller/group_controller.go, client/resource_group/controller/metrics/metrics.go | Wired shared requestSourceMetricsState into groupMetricsCollection; added lazy per-request-source counter creation, addRequestSourceRU to record RRU/WRU deltas, and cleanup() to delete labeled series; introduced RequestSourceRUCounter CounterVec and registered it; renamed the failed-request label constant (errType → typeLabel). |
| Model & Test Helpers | client/resource_group/controller/model.go, client/resource_group/controller/testutil.go | Extended RequestInfo interface with RequestSource() string; updated TestRequestInfo to include requestSource field and implement the method. |
| Tests | client/resource_group/controller/request_source_metrics_test.go, client/resource_group/controller/group_controller_test.go, client/resource_group/controller/testutil.go | Added comprehensive tests exercising per-source metrics caching, recording, cleanup, tombstone/revive scenarios; updated test call sites to pass newRequestSourceMetricsState(...) to newGroupCostController. |

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant GC as GroupController
  participant Glo as GlobalController
  participant P as Prometheus
  Client->>GC: send request (RequestInfo with RequestSource)
  GC->>GC: compute RU/WRU delta
  GC->>GC: getOrCreateRequestSourceMetricsState(resource_group, request_source)
  GC->>P: increment RequestSourceRUCounter{resource_group, request_source, type}
  GC->>Client: respond
  Note over Glo,GC: On shutdown / cleanup
  Glo->>GC: trigger cleanup/tombstone
  GC->>P: delete labeled series via cleanup()
  GC->>Glo: remove cached request-source state

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size/L, type/development, lgtm

Suggested reviewers

  • JmPotato
  • rleungx
  • disksing


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.05%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: caching request source RU metrics, which aligns with the core objective of moving RU-by-request-source accounting into pd/client. |
| Description check | ✅ Passed | The description includes problem statement (issue reference), detailed technical explanation of changes, implementation details, and test information, though it lacks the formatted commit-message block that the template specifies. |


@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 9, 2026

Hi @YuhaoZhang00. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added contribution This PR is from a community contributor. dco-signoff: no Indicates the PR's author has not signed dco. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 9, 2026
@YuhaoZhang00 YuhaoZhang00 changed the title from "tso, server: add debug logs for TSO sync, closure, and forwarding paths" to "client/resource_group: cache request source RU metrics" Apr 9, 2026
@YuhaoZhang00 YuhaoZhang00 force-pushed the rg-request-source-metrics branch from 1dd52fd to ebd4f63 April 9, 2026 03:23
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 9, 2026
Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
@YuhaoZhang00 YuhaoZhang00 force-pushed the rg-request-source-metrics branch from ebd4f63 to 492976a April 9, 2026 03:29
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. and removed dco-signoff: no Indicates the PR's author has not signed dco. labels Apr 9, 2026
@codecov

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 89.36170% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.96%. Comparing base (5885cec) to head (492976a).
⚠️ Report is 58 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10588      +/-   ##
==========================================
+ Coverage   78.80%   78.96%   +0.15%     
==========================================
  Files         523      532       +9     
  Lines       70529    71931    +1402     
==========================================
+ Hits        55580    56799    +1219     
- Misses      10955    11107     +152     
- Partials     3994     4025      +31     
Flag Coverage Δ
unittests 78.96% <89.36%> (+0.15%) ⬆️


@YuhaoZhang00 YuhaoZhang00 marked this pull request as ready for review April 9, 2026 05:00
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
client/resource_group/controller/request_source_metrics_test.go (1)

24-35: Consider increasing channel buffer or using unbuffered pattern for robustness.

The channel buffer of 8 could cause the goroutine to block if the collector produces more metrics than the buffer size before the main routine starts consuming. While this is unlikely in controlled test scenarios, a more robust pattern would be to use an unbuffered channel and start consuming immediately, or increase the buffer size.

♻️ Suggested improvement for robustness
 func collectorMetricCount(collector prometheus.Collector) int {
-	ch := make(chan prometheus.Metric, 8)
+	ch := make(chan prometheus.Metric, 128)
 	go func() {
 		collector.Collect(ch)
 		close(ch)
 	}()
 	count := 0
 	for range ch {
 		count++
 	}
 	return count
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/resource_group/controller/request_source_metrics_test.go` around lines
24 - 35, In collectorMetricCount, avoid the fixed small buffered channel which
can block if collector emits >8 metrics; change ch := make(chan
prometheus.Metric, 8) to either an unbuffered channel (ch := make(chan
prometheus.Metric)) so the main goroutine immediately consumes while the
goroutine runs, or increase the buffer to a safely large value (e.g., 256/1024)
to prevent blocking; ensure this change is applied in the collectorMetricCount
function that calls collector.Collect(ch).
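A stdlib-only sketch can check whether the buffer size matters. It makes no assumptions about the Prometheus implementation. `countEmitted` and `emitN` are hypothetical helpers that mirror the structure of `collectorMetricCount`, with `Collect(ch chan<- prometheus.Metric)` replaced by a plain function that writes `int`s.

```go
package main

// countEmitted has the same shape as the collectorMetricCount helper: a
// producer goroutine fills a buffered channel and closes it, while the caller
// drains it concurrently. Because the receive loop starts immediately, a full
// buffer only makes the producer wait for the next receive.
func countEmitted(emit func(ch chan<- int), buf int) int {
	ch := make(chan int, buf)
	go func() {
		emit(ch)
		close(ch)
	}()
	count := 0
	for range ch {
		count++
	}
	return count
}

// emitN is a hypothetical producer that sends n values, standing in for a
// collector whose Collect writes n metrics.
func emitN(n int) func(ch chan<- int) {
	return func(ch chan<- int) {
		for i := 0; i < n; i++ {
			ch <- i
		}
	}
}
```

In this model the buffer size affects only how often the producer waits. The helper could hang only if `Collect` itself never returned.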

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bae52b87-fccd-416c-a4c3-3b5c37206d41

📥 Commits

Reviewing files that changed from the base of the PR and between b21a183 and 492976a.

📒 Files selected for processing (6)
  • client/resource_group/controller/global_controller.go
  • client/resource_group/controller/group_controller.go
  • client/resource_group/controller/metrics/metrics.go
  • client/resource_group/controller/model.go
  • client/resource_group/controller/request_source_metrics_test.go
  • client/resource_group/controller/testutil.go

@YuhaoZhang00
Contributor Author

/cc @JmPotato ptal

@ti-chi-bot ti-chi-bot bot requested a review from JmPotato April 9, 2026 05:17
@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 9, 2026

@YuhaoZhang00: GitHub didn't allow me to request PR reviews from the following users: ptal.

Note that only tikv members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @JmPotato ptal

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@YuhaoZhang00
Contributor Author

/release-note-none

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 9, 2026
func (mc *groupMetricsCollection) cleanupRequestSourceMetrics(resourceGroupName string) {
mc.sourceMetricsMu.Lock()
defer mc.sourceMetricsMu.Unlock()
for requestSource := range mc.sourceMetrics {
Member


Will it leak if getOrCreateRequestSourceMetrics creates a new one?

Contributor Author

@YuhaoZhang00 YuhaoZhang00 Apr 9, 2026


No extra leak from this cache:

1. These cached request-source metrics are cleaned up when the resource group is cleaned up (cleanupRequestSourceMetrics() called), so they do not stay around forever.

2. The request_source cardinality is also bounded in practice. In TiDB/client-go it currently comes from a small set of hardcoded values (< 100), so we do not expect it to grow uncontrollably.

If the concern is about concurrency, all sourceMetrics operations are guarded by a mutex.

@YuhaoZhang00
Contributor Author

/hold

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
@YuhaoZhang00 YuhaoZhang00 force-pushed the rg-request-source-metrics branch from ca98e85 to 3fe18d6 April 13, 2026 08:18
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. and removed dco-signoff: no Indicates the PR's author has not signed dco. labels Apr 13, 2026
@YuhaoZhang00
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 13, 2026
@YuhaoZhang00 YuhaoZhang00 requested a review from rleungx April 13, 2026 08:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/resource_group/controller/global_controller.go`:
- Around line 483-489: getOrCreateRequestSourceMetricsState may return a stale
requestSourceMetricsState that has been marked closed by cleanup(), causing
callers to stop emitting metrics; change getOrCreateRequestSourceMetricsState to
detect state.closed after a Load/LoadOrStore and, if closed, retry by creating a
fresh requestSourceMetricsState and atomically replacing the map entry (e.g.,
loop: Load, if missing create and LoadOrStore, if loaded and closed attempt
CompareAndSwap/Store after validating it is still closed or Delete+retry) so
callers never get a closed state; apply the same pattern to the other similar
helpers noted (the other getOrCreate variants around the 492-497 and 624-626
ranges) so closed entries are always recreated instead of reused.
- Around line 339-342: The shutdown path currently calls the global
RequestSourceRUCounter.Reset() which wipes metric series for other controllers;
instead, invoke this controller's cleanup() to delete only the labels tracked in
requestSourceStates (the existing cleanup method already calls DeleteLabelValues
for each tracked request source). Replace the RequestSourceRUCounter.Reset()
call in the loopCtx.Done() case with a call to cleanup() (while keeping
ResourceGroupStatusGauge.Reset() if intended) so only this controller's metrics
are removed.
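The following stdlib-only sketch shows the difference this comment describes. It assumes a hypothetical `labelVec` in place of the process-wide `RequestSourceRUCounter`, with the `type` label omitted, and a hypothetical `controller` that remembers the label sets it wrote. `reset` behaves like `CounterVec.Reset`, and `deleteLabelValues` like `CounterVec.DeleteLabelValues` from `client_golang`.

```go
package main

import "sync"

// labelVec is a hypothetical stand-in for a process-wide counter vector
// shared by every controller in the process.
type labelVec struct {
	mu     sync.Mutex
	series map[[2]string]float64 // (resource_group, request_source) -> value
}

func newLabelVec() *labelVec { return &labelVec{series: make(map[[2]string]float64)} }

func (v *labelVec) add(group, source string, val float64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.series[[2]string{group, source}] += val
}

// deleteLabelValues removes one series, like CounterVec.DeleteLabelValues.
func (v *labelVec) deleteLabelValues(group, source string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	key := [2]string{group, source}
	_, ok := v.series[key]
	delete(v.series, key)
	return ok
}

// reset removes every series regardless of owner, like CounterVec.Reset.
func (v *labelVec) reset() {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.series = make(map[[2]string]float64)
}

// controller remembers the label sets it wrote so cleanup can delete only those.
type controller struct {
	vec   *labelVec
	mu    sync.Mutex
	owned map[[2]string]struct{}
}

func newController(vec *labelVec) *controller {
	return &controller{vec: vec, owned: make(map[[2]string]struct{})}
}

func (c *controller) record(group, source string, val float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.owned[[2]string{group, source}] = struct{}{}
	c.vec.add(group, source, val)
}

// cleanup deletes this controller's series and leaves the others untouched.
func (c *controller) cleanup() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.owned {
		c.vec.deleteLabelValues(k[0], k[1])
	}
	c.owned = make(map[[2]string]struct{})
}
```

This approach has a limitation. If two controllers in one process write the same label set, deleting by labels still removes the series they share.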

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2d11e34f-f662-42e2-8f08-75f5a8dbd897

📥 Commits

Reviewing files that changed from the base of the PR and between ca98e85 and 3fe18d6.

📒 Files selected for processing (4)
  • client/resource_group/controller/global_controller.go
  • client/resource_group/controller/group_controller.go
  • client/resource_group/controller/group_controller_test.go
  • client/resource_group/controller/request_source_metrics_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/resource_group/controller/group_controller_test.go

@rleungx
Member

rleungx commented Apr 13, 2026

/ok-to-test

@ti-chi-bot ti-chi-bot bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Apr 13, 2026
return tmp.(*groupCostController), loaded
}

func (c *ResourceGroupsController) getOrCreateRequestSourceMetricsState(name string) *requestSourceMetricsState {
Member


Will there be a race between create and cleanup?

Contributor Author


Good catch. Fixed by switching to LoadAndDelete in cleanupRequestSourceMetricsState. Once LoadAndDelete succeeds, the entry is gone from the map, so getOrCreateRequestSourceMetricsState will never observe a closed state, and the next Load/LoadOrStore creates a fresh one. PTAL

…equest-source metrics state

Use LoadAndDelete in cleanupRequestSourceMetricsState so the map entry
is removed atomically before the state is closed. Any hot-path goroutine
still holding the old reference no-ops via the closed check, and the
next getOrCreateRequestSourceMetricsState allocates a fresh state
instead of returning a closed one. Addresses rleungx's review comment.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
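The ordering in this commit is: detach the entry from the map first, then close it. It can be sketched with `sync.Map.LoadAndDelete` and an `atomic.Bool` closed flag, which requires Go 1.19 or later. `sourceState`, `registry`, and their methods are hypothetical simplifications. In the real controller, closing would also delete the tracked Prometheus series.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// sourceState is a hypothetical per-resource-group request-source state.
// Once closed, recording becomes a no-op.
type sourceState struct {
	closed   atomic.Bool
	mu       sync.Mutex
	recorded map[string]float64
}

func newSourceState() *sourceState {
	return &sourceState{recorded: make(map[string]float64)}
}

// add records RU for a source and reports whether the write was accepted.
func (s *sourceState) add(source string, ru float64) bool {
	if s.closed.Load() { // cheap fast-path check
		return false
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed.Load() { // re-check under the lock that closeAndClear holds
		return false
	}
	s.recorded[source] += ru
	return true
}

// closeAndClear marks the state closed and drops what it tracked.
func (s *sourceState) closeAndClear() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.closed.Store(true)
	s.recorded = make(map[string]float64)
}

type registry struct {
	states sync.Map // resource-group name -> *sourceState
}

func (r *registry) getOrCreate(name string) *sourceState {
	if v, ok := r.states.Load(name); ok {
		return v.(*sourceState)
	}
	v, _ := r.states.LoadOrStore(name, newSourceState())
	return v.(*sourceState)
}

// cleanup removes the entry atomically before closing it.
func (r *registry) cleanup(name string) {
	if v, loaded := r.states.LoadAndDelete(name); loaded {
		v.(*sourceState).closeAndClear()
	}
}
```

After `LoadAndDelete` returns, later calls to `getOrCreate` cannot observe the old state. A goroutine that loaded the old state just before the delete may still hold it, but its writes become no-ops.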
@YuhaoZhang00 YuhaoZhang00 requested a review from rleungx April 14, 2026 06:11

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
client/resource_group/controller/global_controller.go (1)

339-342: ⚠️ Potential issue | 🔴 Critical

Call Reset() only for metrics owned by this controller, not global metrics.

The global metrics.RequestSourceRUCounter.Reset() call on shutdown is incorrect. Test code (e.g., resource_manager_test.go:656–661) demonstrates that multiple ResourceGroupsController instances can coexist and be started together in the same process. When one controller's loopCtx exits, calling global Reset() will erase metrics for all other active controllers in that process, causing incorrect metric state.

Instead, track which metrics belong to this controller and reset only those, or pass a per-controller metrics instance that is cleaned up independently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/resource_group/controller/global_controller.go` around lines 339 -
342, The shutdown branch in ResourceGroupsController (when c.loopCtx.Done()
triggers) currently calls global metrics.RequestSourceRUCounter.Reset(), which
wipes metrics for all coexisting controllers; replace this by not resetting
global metrics and instead either (a) remove the RequestSourceRUCounter.Reset()
call and only reset controller-owned metrics (e.g., keep or scope
metrics.ResourceGroupStatusGauge.Reset() if it is per-controller) or (b)
refactor to use a per-controller metrics instance (e.g., a struct passed into
ResourceGroupsController) and call Reset() only on that instance during
shutdown; update the code in the c.loopCtx.Done() case accordingly (remove the
global Reset call or swap it for resetting the per-controller metrics) so only
metrics owned by this controller are cleared.
🧹 Nitpick comments (1)
client/resource_group/controller/request_source_metrics_test.go (1)

24-35: Consider adding a timeout to prevent test hangs.

The collectorMetricCount helper spawns a goroutine that calls Collect(). If Collect() blocks indefinitely (e.g., due to a deadlock in the metrics implementation), the test would hang. Consider adding a timeout:

♻️ Optional: Add timeout protection
 func collectorMetricCount(collector prometheus.Collector) int {
-	ch := make(chan prometheus.Metric, 8)
+	ch := make(chan prometheus.Metric, 64)
 	go func() {
 		collector.Collect(ch)
 		close(ch)
 	}()
 	count := 0
-	for range ch {
-		count++
+	timeout := time.After(5 * time.Second)
+	for {
+		select {
+		case _, ok := <-ch:
+			if !ok {
+				return count
+			}
+			count++
+		case <-timeout:
+			return count
+		}
 	}
-	return count
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/resource_group/controller/request_source_metrics_test.go` around lines
24 - 35, The helper collectorMetricCount can hang if collector.Collect blocks;
add a timeout to avoid test hangs by replacing the range over ch with an
explicit receive loop that selects between reading from ch and a time.After
timeout (e.g., 1s), returning the current count when the timeout fires; update
collectorMetricCount (and import time) and reference the prometheus.Collector
parameter and the collectorMetricCount function so the goroutine won't cause a
stuck test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8ee6c63-fb84-447f-bde9-0f5de75476f8

📥 Commits

Reviewing files that changed from the base of the PR and between 3fe18d6 and 11e1a6c.

📒 Files selected for processing (3)
  • client/resource_group/controller/global_controller.go
  • client/resource_group/controller/group_controller.go
  • client/resource_group/controller/request_source_metrics_test.go

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 14, 2026
@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 14, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-14 06:36:50.813430756 +0000 UTC m=+1456616.018790803: ☑️ agreed by rleungx.

@ti-chi-bot ti-chi-bot bot added the approved label Apr 14, 2026
@YuhaoZhang00
Contributor Author

/retest

Name: "ru_total",
Help: "Counter of request RU consumption grouped by resource group and request source.",
ConstLabels: constLabels,
}, []string{newResourceGroupNameLabel, "request_source", errType})
Member


Although errType is also a "type," the semantics here are just a plain type and not related to err, right? It would be better to give errType a more generic name.

}
ms := initMetrics(group.Name, group.Name)
if sourceState == nil {
sourceState = newRequestSourceMetricsState(group.Name)
Member


The sourceState created here is not registered in the requestSourceStates map, which means it will never be cleaned up by cleanupRequestSourceMetricsState. Currently, only tests pass nil; in production, the path always passes a non-nil value through getOrCreateRequestSourceMetricsState. I suggest adding a comment clarifying that this fallback is for test scenarios only, to avoid future misuse that could lead to metrics leaks.

…abel const

errType was reused for the new RequestSourceRUCounter where the "type"
dimension means rru/wru, not error kind. Rename to typeLabel to avoid
the misleading prefix and align with the file's xxxLabel convention.
Also extract requestSourceLabel for the new counter.

The Prometheus label name "type" is unchanged.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
newGroupCostController had a fallback that constructed a fresh
requestSourceMetricsState when callers passed nil. The state created
by the fallback was never registered in ResourceGroupsController's
requestSourceStates map, so it would never be cleaned up. All three
production call sites pass a registered state via
getOrCreateRequestSourceMetricsState; only two pre-existing tests
relied on nil.

Remove the fallback and update the two tests to construct their own
state explicitly.

Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
@YuhaoZhang00 YuhaoZhang00 requested a review from JmPotato April 14, 2026 13:42
@YuhaoZhang00
Contributor Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
client/resource_group/controller/metrics/metrics.go (1)

53-54: Consider improving the GoDoc comment.

The placeholder comment doesn't follow Go conventions. GoDoc comments should start with the identifier name and describe the purpose.

📝 Suggested improvement
-	// RequestSourceRUCounter comments placeholder
+	// RequestSourceRUCounter tracks RU consumption by resource group, request source, and RU type (rru/wru).
 	RequestSourceRUCounter *prometheus.CounterVec
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/resource_group/controller/metrics/metrics.go` around lines 53 - 54,
The GoDoc for the RequestSourceRUCounter field is a placeholder; update it to
follow Go conventions by starting the comment with "RequestSourceRUCounter" and
briefly describing what the prometheus.CounterVec tracks (e.g., counts RU
requests by source and any label semantics). Edit the comment immediately above
the RequestSourceRUCounter declaration in metrics.go so it clearly states the
purpose, units, and important labels for consumers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1f272965-9cd4-4619-a5a5-88c217c3998c

📥 Commits

Reviewing files that changed from the base of the PR and between 11e1a6c and c7fbfee.

📒 Files selected for processing (3)
  • client/resource_group/controller/group_controller.go
  • client/resource_group/controller/group_controller_test.go
  • client/resource_group/controller/metrics/metrics.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/resource_group/controller/group_controller_test.go

@YuhaoZhang00 YuhaoZhang00 force-pushed the rg-request-source-metrics branch from 3163b67 to c7fbfee April 14, 2026 14:15
@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 14, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456, multiple issues should use full syntax for each issue and be separated by a comma, like: Issue Number: close #123, ref #456.

📖 For more info, you can check the "Linking issues" section in the CONTRIBUTING.md.


Labels

approved contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/needs-linked-issue needs-1-more-lgtm Indicates a PR needs 1 more LGTM. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet


3 participants