
resource control: add client-side pre-throttling demand RU/s metric #10581

@JmPotato


Enhancement Task

Problem

Currently, there is no metric that accurately reflects the real-time RU/s demand from clients before Resource Control throttling takes effect:

  • Client-side avgRUPerSec (group_controller.go) is computed from getRUValueFromConsumption(), i.e. actual post-throttling consumption. When a resource group is throttled, requests wait in Reserve(), consumption slows, and avgRUPerSec reflects only the throttled rate.
  • Server-side read_request_unit_max_per_sec / write_request_unit_max_per_sec are derived from the Consumption.RRU/WRU values reported by clients, which are also post-throttling.
  • Server-side sampled_request_unit_per_sec is based on requiredToken in AcquireTokenBuckets, computed as avgRUPerSec * targetPeriod * amplification - availableTokens. This is not a clean demand rate, and it lacks per-instance granularity.

This makes it impossible for operators to determine the true workload demand when Resource Control is actively throttling.
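
A toy Go simulation illustrates the first two gaps. It is not code from group_controller.go: simulate and its parameters are hypothetical, and the model assumes clients demand a fixed 2000 RU every second against a token bucket that refills 1000 RU per second, with throttled requests waiting (as in Reserve()) instead of failing.

```go
package main

import "fmt"

// simulate is a hypothetical model, not PD client code. Each simulated
// second, clients demand demandRU RU and the token bucket refills fillRU
// tokens. Requests that cannot be served wait in a backlog (like Reserve())
// instead of failing. It returns the RU consumed in each second and the
// backlog still waiting at the end.
func simulate(demandRU, fillRU float64, seconds int) (consumed []float64, backlog float64) {
	tokens := 0.0
	for i := 0; i < seconds; i++ {
		tokens += fillRU
		backlog += demandRU
		// Serve as much of the backlog as the available tokens allow.
		served := backlog
		if tokens < served {
			served = tokens
		}
		tokens -= served
		backlog -= served
		consumed = append(consumed, served)
	}
	return consumed, backlog
}

func main() {
	consumed, backlog := simulate(2000, 1000, 5)
	for i, c := range consumed {
		fmt.Printf("second %d: demanded 2000 RU, consumed %.0f RU\n", i+1, c)
	}
	fmt.Printf("waiting backlog after 5s: %.0f RU\n", backlog)
}
```

Every simulated second reports 1000 RU consumed while the backlog reaches 5000 RU after five seconds, so any rate derived from consumption settles at the fill rate and hides the extra 1000 RU/s of demand.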

Proposal

Add a new client-side Prometheus Gauge that tracks the EMA of demanded RU/s, sampled at the acquireTokens() entry point (before Reserve() throttling):

  • Metric: resource_manager_client_resource_group_demand_ru_per_sec{resource_group="..."}
  • Data source: the RU cost (v) passed to acquireTokens() in group_controller.go, which represents the true per-request demand before any token bucket throttling
  • Smoothing: time-aware EMA (reuse the existing movingAvgFactor logic)

Expected Usage

# Per-instance demand
resource_manager_client_resource_group_demand_ru_per_sec{instance="tidb-0", resource_group="default"}

# Cluster-wide demand for a resource group
sum(resource_manager_client_resource_group_demand_ru_per_sec) by (resource_group)

# Peak demand over time
max_over_time(sum(resource_manager_client_resource_group_demand_ru_per_sec) by (resource_group)[1h:])

Benefits

  • Accurate: samples the RU cost before throttling, so it reflects true workload demand
  • Per-instance: as a client-side metric, it naturally carries the instance label
  • Aggregatable: use sum by (resource_group) in Grafana for a cluster-wide view
  • Rolling-upgrade friendly: a pure client-side change, with no proto or PD server changes required

Related

Labels: type/enhancement