
metrics: add keyspace label to show tso request details #9778

Closed
bufferflies wants to merge 3 commits into tikv:master from bufferflies:metrics/keyspace

@bufferflies (Contributor) commented Sep 24, 2025

What problem does this PR solve?

Issue Number: Close #9780, ref #9707

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

@ti-chi-bot (Contributor) commented Sep 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Sep 24, 2025
@ti-chi-bot (Contributor) commented Sep 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zhouqiang-cl for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 24, 2025
Signed-off-by: 童剑 <1045931706@qq.com>
Signed-off-by: 童剑 <1045931706@qq.com>
@codecov bot commented Sep 25, 2025

Codecov Report

❌ Patch coverage is 93.87755% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.80%. Comparing base (e2f7162) to head (bdd8eed).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9778      +/-   ##
==========================================
- Coverage   76.88%   76.80%   -0.08%     
==========================================
  Files         485      486       +1     
  Lines       77372    77586     +214     
==========================================
+ Hits        59485    59592     +107     
- Misses      14270    14346      +76     
- Partials     3617     3648      +31     
Flag Coverage Δ
unittests 76.80% <93.87%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.


@bufferflies bufferflies marked this pull request as ready for review September 25, 2025 13:06
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 25, 2025
Copilot AI left a comment

Pull Request Overview

This PR adds keyspace label support to TSO request metrics to provide better observability by enabling users to distinguish TSO requests across different keyspaces. The change modifies the metrics collection system to include keyspace names in TSO request duration tracking.

Key changes:

  • Add keyspace name tracking throughout the TSO client pipeline
  • Update metrics to include keyspace_name label for TSO request duration
  • Refactor RequestDuration metric to be exported and include keyspace information
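The overview describes threading a keyspace name from the client down to each TSO stream so that it can be used as a metric label. A minimal, stdlib-only sketch of that flow follows. The types `tsoClient` and `tsoStream`, the constructor `newTSOStream`, and the helper `requestDurationLabels` are hypothetical simplifications for illustration, not the PR's actual code.

```go
package main

import "fmt"

// tsoStream is a hypothetical, trimmed-down stand-in for the TSO stream.
// The real stream type carries many more fields.
type tsoStream struct {
	streamID     string
	keyspaceName string
}

// newTSOStream illustrates passing keyspaceName into the stream builder so
// that every stream knows which keyspace it serves.
func newTSOStream(streamID, keyspaceName string) *tsoStream {
	return &tsoStream{streamID: streamID, keyspaceName: keyspaceName}
}

// tsoClient is a hypothetical stand-in for the TSO client. It stores the
// keyspace name resolved once when the client is constructed.
type tsoClient struct {
	keyspaceName string
}

// createStream forwards the client's keyspace name to each new stream.
func (c *tsoClient) createStream(streamID string) *tsoStream {
	return newTSOStream(streamID, c.keyspaceName)
}

// requestDurationLabels returns label values in (type, keyspace_name) order,
// matching RequestDuration.WithLabelValues("tso", s.keyspaceName).
func (s *tsoStream) requestDurationLabels() []string {
	return []string{"tso", s.keyspaceName}
}

func main() {
	c := &tsoClient{keyspaceName: "ks1"}
	s := c.createStream("stream-0")
	fmt.Println(s.requestDurationLabels())
}
```

Resolving the name once in the client and copying it into each stream keeps the per-request path free of any keyspace lookups.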

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:
  • client/servicediscovery/service_discovery.go: Add keyspaceName field and parameter to service discovery
  • client/metrics/metrics.go: Export RequestDuration metric and add keyspace_name label
  • client/inner_client.go: Add keyspaceName field and pass it to TSO client initialization
  • client/constants/constants.go: Add NullKeyspaceName constant for keyspace-agnostic operations
  • client/clients/tso/stream_test.go: Update test mocks to include keyspaceName parameter
  • client/clients/tso/stream.go: Add keyspaceName to stream builders and TSO stream
  • client/clients/tso/dispatcher_test.go: Update test setup with keyspaceName parameter
  • client/clients/tso/client.go: Add keyspaceName field to TSO client and pass it through stream creation
  • client/client.go: Add keyspace name resolution logic and initialization


// case, `Recv` may return an error while no request is pending.
if hasReq {
	metrics.RequestFailedDurationTSO.Observe(latencySeconds)
	successObserver().Observe(latencySeconds)
Copilot AI Sep 26, 2025

The success and failed observers are swapped. When there's an error, it should use the failed observer, not the success observer.

if keyspaceID == constants.NullKeyspaceID {
	return nil
}
metas, err := c.GetAllKeyspaces(clientCtx, keyspaceID, 1)
Member

Can we introduce a separate GetKeyspaceNameByID call? Fetching all keyspaces seems too heavy.

Contributor Author

Yes, a new API should be added to fetch the keyspace name directly. I will do it in the next PR.

	metrics.EstimateTSOLatencyGauge.WithLabelValues(s.streamID).Set(micros * 1e-6)
}
successObserver := sync.OnceValue(func() prometheus.Observer {
	return metrics.RequestDuration.WithLabelValues("tso", s.keyspaceName)
Member

For metrics such as batch size, should we also introduce the keyspace name label?

Contributor Author

Yes, fixed.

Signed-off-by: 童剑 <1045931706@qq.com>
	metrics.EstimateTSOLatencyGauge.WithLabelValues(s.streamID).Set(micros * 1e-6)
}
successObserver := sync.OnceValue(func() prometheus.Observer {
	return metrics.RequestDuration.WithLabelValues("tso", s.keyspaceName)
Member

What if we have thousands of keyspaces?

Contributor Author

These metrics are reported by the TiDB server, and each TiDB server serves only one keyspace name. The metrics agent then collects them into separate tenants.
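The cardinality argument in this reply can be illustrated with a toy labeled histogram vector, a hypothetical stand-in for `prometheus.HistogramVec` that keeps one child per distinct label tuple. Because each TiDB process reports a single keyspace name, the number of children in one process stays bounded by the request types, no matter how many keyspaces the cluster has. Note that a real Prometheus histogram child expands into several series (one per bucket plus `_sum` and `_count`), which this sketch does not model.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// histogramVec is a toy stand-in for prometheus.HistogramVec. It keeps one
// child per distinct label-value tuple, which is what drives cardinality.
type histogramVec struct {
	mu       sync.Mutex
	children map[string][]float64
}

func newHistogramVec() *histogramVec {
	return &histogramVec{children: map[string][]float64{}}
}

// observe records v under the child identified by the label values.
func (h *histogramVec) observe(v float64, labelValues ...string) {
	key := strings.Join(labelValues, "\x00")
	h.mu.Lock()
	defer h.mu.Unlock()
	h.children[key] = append(h.children[key], v)
}

// childCount returns the number of distinct label tuples seen so far.
func (h *histogramVec) childCount() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return len(h.children)
}

func main() {
	// One process serves one keyspace, so 1000 requests still map to a
	// single ("tso", keyspace) child.
	perProcess := newHistogramVec()
	for i := 0; i < 1000; i++ {
		perProcess.observe(0.001, "tso", "ks-42")
	}
	fmt.Println(perProcess.childCount()) // 1
}
```

Total series still grow with the number of processes, which is why splitting collection into per-tenant storage on the agent side matters.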

@ti-chi-bot (Contributor) commented Dec 4, 2025

@bufferflies: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • non-block/pull-unit-test-next-gen (commit b28df15, required: false), rerun with /test pull-unit-test-next-gen
  • pull-unit-test-next-gen-1 (commit b28df15, required: true), rerun with /test pull-unit-test-next-gen-1
  • pull-unit-test-next-gen-2 (commit b28df15, required: true), rerun with /test pull-unit-test-next-gen-2
  • pull-unit-test-next-gen-3 (commit b28df15, required: true), rerun with /test pull-unit-test-next-gen-3



Labels

dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

metrics: Add new keyspace name label to the tso handler duration

4 participants