store/copr: refine RC paging pre-charge with per-scan EMA#67759
store/copr: refine RC paging pre-charge with per-scan EMA#67759YuhaoZhang00 wants to merge 11 commits intopingcap:masterfrom
Conversation
When Resource Control is enabled, force-enable paging for eligible coprocessor requests and set a byte budget (paging_size_bytes) per page. This limits the scanned bytes of each page so that RU cost is bounded and pre-charged in Phase 1, preventing concurrent workers from all hitting Phase 2 simultaneously. Signed-off-by: JmPotato <github@ipotato.me>
Signed-off-by: JmPotato <github@ipotato.me>
…ation Test rcPagingEligible conditions (RC enabled, resource group, TiKV, DAG) and verify pagingSizeBytes is correctly propagated to copTasks and cleared when small limit disables paging. Signed-off-by: JmPotato <github@ipotato.me>
Signed-off-by: JmPotato <github@ipotato.me>
Add a new session variable `tidb_rc_paging_size_bytes` to allow dynamically controlling the byte budget per page for RC paging via SQL. - Default: 4MB (paging.MaxPagingSizeBytes) - Set to 0 to disable byte-budget paging - Supports both GLOBAL and SESSION scope Usage: SET @@tidb_rc_paging_size_bytes = 4194304; -- 4MB (default) SET @@tidb_rc_paging_size_bytes = 1048576; -- 1MB SET @@tidb_rc_paging_size_bytes = 0; -- disable Signed-off-by: JmPotato <github@ipotato.me>
Signed-off-by: JmPotato <github@ipotato.me>
Increase the bucket count from 10 (max 512KB) to 19 (max 256MB) for the tidb_distsql_copr_resp_size histogram to improve percentile accuracy when observing large coprocessor responses during paging size tuning. Signed-off-by: JmPotato <github@ipotato.me>
- Rename all RC-prefixed identifiers to generic names (e.g. RCPagingSizeBytes → PagingSizeBytes, rcPagingEligible → pagingBytesEligible) since byte-budget paging is general infrastructure, not RC-specific. - Remove RC gates (EnableResourceControl, ResourceGroupName) from pagingBytesEligible; only TiKV + DAG checks remain. - Remove unused MinPagingSizeBytes/MaxPagingSizeBytes constants. - Change DefPagingSizeBytes default from 4MB to 0 (disabled, opt-in). - Move byte-budget paging force-enable before checkStoreBatchCopr to preserve the paging-disables-batch-copr invariant. - Add tidb_paging_size_bytes to SET_VAR hint allowlist. - Fix copr_resp_size histogram: use float division to preserve sub-KiB precision, extend buckets to 256MB, correct help text unit to KiB. - Fix anonymous Paging struct literal in copr_test to include the new PagingSizeBytes field. - Add PagingSizeBytes to TestContextDetach fixture. - Add integration test for tidb_paging_size_bytes session variable. Signed-off-by: JmPotato <github@ipotato.me>
Update the PD client replace directive to pick up commit fda21466b565 which adds Limiter.RefundTokens and fixes the pre-charge token leakage in paging byte budget scenarios. Signed-off-by: JmPotato <github@ipotato.me>
Maintain a time-aware EMA of actual per-RPC MVCC read bytes on copTask and, once enough samples are in, pass the prediction to PD via the new tikvrpc.Request.PredictedReadBytes hint so pre-charge uses the learned estimate instead of the 4 MiB worst-case cap. Falls back to PagingSizeBytes on cold start. Propagated across region/lock retries; no-op for non-paging tasks. Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @YuhaoZhang00. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Hi @YuhaoZhang00. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Adds tidb_distsql_copr_ema_observation_total with a {state=cold|ready}
label to quantify, in aggregate, how many per-logical-scan copTask EMA
Observe() calls happen while the EMA is still below its readiness
threshold vs after. This complements the PD-side paging pre-charge
source/bytes counters by showing how often the TiDB side could emit a
non-zero PredictedReadBytes hint for downstream pre-charge.
Signed-off-by: Yuhao Zhang <yhzhang00@outlook.com>
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
Added one observability metric on the TiDB side for upcoming e2e validation of the per-logical-scan EMA path:
Paired with the 4 PD-side paging pre-charge metrics (see tikv/pd#10599 comment), this lets the validation report show the cold-start window quantitatively rather than inferring it from QPS alone. Draft status unchanged. |
What problem does this PR solve?
Issue Number: ref #N/A (draft — issue to be filed)
Problem Summary:
Under Resource Control, TiDB pre-charges RUs for each paging coprocessor
RPC using a fixed
paging_size_bytes(currently 4 MiB, set by #67504). Thisworst-case charge typically overshoots the actual scanned bytes by a large
margin, and under a tight resource tier it stalls concurrent workers at
Phase 1 (token acquisition) even when the cluster has spare capacity. The
result is that large
paging_size_bytesvalues collapse QPS on workloadslike a 1500-thread JOIN benchmark, while small values (256 KiB) look fine.
What changed and how does it work?
Maintain a time-aware EMA of the actual per-RPC MVCC read bytes on
copTask, and once enough samples are in (>= 2), pass the prediction toPD via the new
tikvrpc.Request.PredictedReadByteshint so the pre-chargeuses the learned estimate instead of the 4 MiB cap. On cold start and for
non-paging tasks the behavior is unchanged (falls back to
PagingSizeBytes).The EMA state lives on
copTaskand is naturally scoped to one logical scanacross its paging rounds — no cross-RPC or cross-process sync. The
prediction is propagated across region/lock retries so learning is not lost.
Stacked with:
tikv/client-go*: improve log #1944: adds thePredictedReadBytesfield ontikvrpc.Requestand theRequestInfogettertikv/pdstore/tikv: collect exited goroutines when shutdown the server #10599: consumes the hint inBeforeKVRequest/AfterKVRequestCheck List
Tests
Side effects
Documentation
Release note
Draft note: E2E validation on the simulation cluster is pending. Once
the client-go + PD companion PRs land (or are pointed to via go.mod replace
for CI), this PR's CI will turn green. This is a draft for early visibility
and discussion.