feat(metering): add storage size metering and keyspace name caching #9774

Merged
ti-chi-bot[bot] merged 4 commits into tikv:master from JmPotato:metering_kv_size
Sep 25, 2025

@JmPotato
Member

What problem does this PR solve?

Issue Number: ref #9707.

What is changed and how does it work?

- Introduced storage size metering:  
  - Added `storageSizeCollector` to periodically record row-based and column-based storage usage per keyspace.  
  - Integrated with cluster metering writer for reporting.  
- Extended `RegionInfo` and statistics with `ApproximateColumnarKvSize` tracking.  
- Added keyspace name caching and lookup in `KeyspaceManager` for efficient resolution.  
- Extracted common metering utilities (`NewRUValue`, `NewBytesValue`, constants for source/fields) into `pkg/metering/utils.go`.  
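To illustrate the keyspace name caching item above, here is a minimal sketch of an ID-to-name cache with a slow-path fallback. It is not the `KeyspaceManager` code from this PR: the type `keyspaceNameCache`, its `load` callback, and the `ks-%d` naming are all hypothetical, chosen only to show the read-mostly lookup pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// keyspaceNameCache is a hypothetical ID-to-name cache.
// It is an illustration only, not PD's KeyspaceManager.
type keyspaceNameCache struct {
	mu    sync.RWMutex
	names map[uint32]string
	// load is the assumed slow path (for example, a storage read).
	load func(id uint32) (string, error)
}

func newKeyspaceNameCache(load func(uint32) (string, error)) *keyspaceNameCache {
	return &keyspaceNameCache{names: make(map[uint32]string), load: load}
}

// Name returns the cached name for id, loading and caching it on a miss.
func (c *keyspaceNameCache) Name(id uint32) (string, error) {
	c.mu.RLock()
	name, ok := c.names[id]
	c.mu.RUnlock()
	if ok {
		return name, nil
	}
	// Cache miss: take the slow path without holding any lock.
	name, err := c.load(id)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.names[id] = name
	c.mu.Unlock()
	return name, nil
}

func main() {
	loads := 0
	cache := newKeyspaceNameCache(func(id uint32) (string, error) {
		loads++
		return fmt.Sprintf("ks-%d", id), nil
	})
	for i := 0; i < 3; i++ {
		name, _ := cache.Name(7)
		fmt.Println(name)
	}
	fmt.Println("slow-path loads:", loads)
}
```

Repeated lookups of the same ID take only the read lock, which is the property that matters when a periodic collector resolves names for many keyspaces.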

Check List

Tests

  • Unit test
  • Integration test

Release note

None.

@JmPotato JmPotato requested review from okJiang and rleungx September 23, 2025 12:18
@JmPotato JmPotato added the nextgen Indicates that the Issue or PR belongs to the nextgen kernel architecture. label Sep 23, 2025
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 23, 2025
@codecov

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 75.13812% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.89%. Comparing base (f9be8dd) to head (1e341ae).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9774      +/-   ##
==========================================
+ Coverage   76.87%   76.89%   +0.02%     
==========================================
  Files         486      488       +2     
  Lines       77567    77722     +155     
==========================================
+ Hits        59628    59768     +140     
- Misses      14306    14326      +20     
+ Partials     3633     3628       -5     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 76.89% <75.13%> (+0.02%) ⬆️ |

regionBoundsMap[keyspaceName] = keyspace.MakeRegionBound(keyspaceID)
return true
})
log.Info("iterated the region bounds of all keyspaces",
Member

Is this log necessary?

Member Author

It's mainly used to log how long the preceding iteration took.
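As a rough sketch of that idea, the snippet below times an iteration and logs its cost once at the end. It uses the standard library `log` and `time` packages rather than the logger in the PR, and `iterateKeyspaces` is a hypothetical stand-in for the real keyspace iteration.

```go
package main

import (
	"log"
	"time"
)

// iterateKeyspaces is a hypothetical stand-in for the keyspace iteration.
// It calls visit for each ID and reports how many IDs were visited and
// how long the whole pass took.
func iterateKeyspaces(ids []uint32, visit func(id uint32)) (int, time.Duration) {
	start := time.Now()
	for _, id := range ids {
		visit(id)
	}
	return len(ids), time.Since(start)
}

func main() {
	bounds := make(map[uint32]string)
	count, cost := iterateKeyspaces([]uint32{1, 2, 3}, func(id uint32) {
		// Placeholder value; the PR stores real region bounds here.
		bounds[id] = "placeholder-bound"
	})
	// One log line per pass keeps the overhead negligible while still
	// exposing the cost of the iteration.
	log.Printf("iterated the region bounds of all keyspaces: count=%d cost=%s", count, cost)
}
```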

@JmPotato JmPotato requested review from okJiang and rleungx September 25, 2025 05:41
storageSizeInfoList := make([]*storageSizeInfo, 0, len(regionBoundsMap))
// Observe the region stats of each keyspace.
for keyspaceName, regionBounds := range regionBoundsMap {
regionStats := c.GetRegionStatsByRange(regionBounds.TxnLeftBound, regionBounds.TxnRightBound)
Member

Will it affect performance?

Member Author

Scanning keyspace by keyspace is effectively the first level of batching, and the internal scan in GetRegionStatsByRange then batches further in groups of 1000. My understanding is that as long as the lock on the region meta tree is not held for long, the performance impact won't be significant. The scan also runs relatively infrequently, so if later testing shows a larger impact we can reduce its frequency; this metering does not need to be highly real-time.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 25, 2025
Signed-off-by: JmPotato <github@ipotato.me>
Signed-off-by: JmPotato <github@ipotato.me>
… collector startup

Signed-off-by: JmPotato <github@ipotato.me>
…ctor

Signed-off-by: JmPotato <github@ipotato.me>
@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 25, 2025

@JmPotato: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| non-block/pull-unit-test-next-gen | 1e341ae | link | false | `/test pull-unit-test-next-gen` |


@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: okJiang, rleungx

@ti-chi-bot ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 25, 2025
@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 25, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-09-25 08:20:28.879288816 +0000 UTC m=+517638.949782499: ☑️ agreed by okJiang.
  • 2025-09-25 10:35:39.847383768 +0000 UTC m=+525749.917877452: ☑️ agreed by rleungx.

@ti-chi-bot ti-chi-bot bot merged commit 0b32e6e into tikv:master Sep 25, 2025
28 of 29 checks passed
@JmPotato JmPotato deleted the metering_kv_size branch September 28, 2025 05:47

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm nextgen Indicates that the Issue or PR belongs to the nextgen kernel architecture. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

3 participants