Optimize text format scrape hot path#202
Conversation
239596f to
d7bb69f
Compare
During every scrape, metric name and help were converted from
atom/list to binary via ensure_binary_or_string/1 in create_mf.
With many metric families this repeated work adds up, especially
the atom_to_binary calls that show up in profiling.
Now, extract_common_params/1 returns pre-computed NameBin and
HelpBin, stored in ETS as a 3-tuple {Labels, HelpBin, NameBin}
alongside label definitions. All six built-in metric collectors
(boolean, counter, gauge, histogram, quantile_summary, summary)
pass these binaries directly into create_mf, avoiding per-scrape
conversion entirely.
A normalize_mf_row/1 function in prometheus_metric handles backward
compatibility: old ETS entries with the 2-tuple {Labels, Help}
format are normalized on read, so hot upgrades don't crash.
d7bb69f to
b4a8202
Compare
|
@the-mikedavis WDYT about the above ☝🏽😄 |
Codecov Report❌ Patch coverage is
🚀 New features to boost your workflow:
|
the-mikedavis
left a comment
There was a problem hiding this comment.
Looks promising! I will rerun my tests with RabbitMQ
lukebakken
left a comment
There was a problem hiding this comment.
Hello! I did a drive-by review with the 🧞
collect_metrics/2 first argument is now always a binary
create_mf/5 calls Collector:collect_metrics(Name, CollectorData) passing whatever was given as its first argument. After this PR, all in-tree metric modules call create_mf(NameBin, HelpBin, ..., ?MODULE, Data) with a pre-computed binary, so collect_metrics/2 now always receives a binary as its first argument.
The in-tree implementations all ignore it (_NameBin) and recover the original name from the data tuple, so nothing breaks here. But the documented pattern in prometheus_collector shows:
collect_metrics(erlang_vm_bytes_total, Memory) ->
...
create_gauge(Name, Help, Data) ->
prometheus_model_helpers:create_mf(Name, Help, gauge, ?MODULE, Data).An external collector following this pattern - with atom-matched collect_metrics/2 clauses and create_mf/5 - would get a function clause error at scrape time if Name is now a binary. The collect_metrics/2 callback spec says Name :: prometheus_metric:name() which permits binaries, so it is not technically a spec violation, but the behaviour change is silent and the documented example does not prepare implementers for it.
Is this an intentional interface change? If so, it may be worth a note in the changelog and an update to the prometheus_collector doc example to show atom-matched clauses are no longer safe.
Backward-compat normalization path has no test coverage
normalize_mf_row/1 has two clauses: the new 3-tuple fast path and a backward-compat clause for old ETS entries that still hold the 2-tuple {Labels, Help} shape. Codecov confirms the backward-compat clause is not exercised by the test suite.
This clause is only reachable during a hot upgrade from a pre-PR node. A test that inserts a raw 2-tuple ETS entry and calls metrics/2 would cover it.
Profiling (tprof) of a RabbitMQ workload shows gauge_metric/2 at
16.32% and counter_metric/2 at 2.45% of total process time, with
~2M calls each allocating 12 words of intermediate records that the
text format immediately destructures and discards.
This commit applies several targeted optimizations:
- Inline gauge_metric/2, counter_metric/2, and label_pair/1 to
eliminate function call overhead on these ~2M-call-per-scrape
functions.
- Add a fast path in create_mf/4 that detects when the input list
already contains #'Metric'{} records (as all built-in collectors
produce), skipping the metrics_from_tuples/2 dispatch entirely.
This avoids the per-element is_record check and type-based
dispatch for every metric.
- Replace lists:map(fun label_pair/1, Labels) with a list
comprehension, which the compiler can optimize better when
combined with the inline directive.
- Add integer and float clauses to ensure_binary_or_string/1,
avoiding the expensive io_lib:format("~p", [Val]) fallback for
numeric label values.
metrics_from_tuples/2 previously traversed the list twice: once
via filter_undefined_metrics (lists:filter) to remove undefined
entries, then a list comprehension to convert tuples to records.
Profiling shows lists:filter alone at 3.26% of scrape time with
~2.4M elements.
Fold both operations into a single list comprehension with a
guard filter. Apply the same pattern to the create_mf/4 fast path
for pre-built #'Metric'{} records.
These two functions are called once per metric series during every scrape (~2.3M calls each in profiled RabbitMQ workloads), together accounting for ~15.6% of scrape time. Inlining eliminates the function call overhead and allows the compiler to optimize the binary append operations across call boundaries.
b4a8202 to
e7a8bcf
Compare
|
Thanks @lukebakken! Waiting for some RMQ tests then @the-mikedavis 🔥 |
|
Thanks a lot for working on this. The changes look good to me, but at the same time, I don't see any meaningful improvements when scraping /metrics/per-object with 100k classic queues imported. I will take another look in a bit, but can you share how you tested it and what your results were? |
mkuratczyk
left a comment
There was a problem hiding this comment.
While I don't see an overall improvement in the time to scrape, flamgraphs confirm less time spent in metrics_from_tuples and lists:filter. Thanks!
|
I don't have services with that gigantic amount of metrics so I didn't have a measurable time difference other than Thanks for running the tests, good to confirm no regressions at least! |
Profiling (tprof) of a RabbitMQ workload with ~2.3M metric series per scrape revealed that a significant portion of scrape time is spent on repeated conversions and intermediate allocations that can be avoided. This spawned from rabbitmq/rabbitmq-server#14885, which spawned itself into #196. I have more changes in mind but they'll be breaking so for now this PR proposes only backwards-compatible stuff:
atom_to_binary/iolist_to_binarywere called per metric family on every scrape; now done once atinsert_mftime and stored in ETS. Includes backward-compat normalization for old ETS entries.gauge_metric/2,counter_metric/2, andlabel_pair/1are inlined (these are ~2M calls/scrape at 12 words each). A fast path increate_mf/4skipsmetrics_from_tuplesdispatch when metrics are already#'Metric'{}records. Numeric clauses added toensure_binary_or_string/1to avoidio_lib:formatfallback.metrics_from_tuples/2previously ranlists:filter/2then a comprehension (two passes over ~2.4M elements,lists:filter/2alone at 3.26% of scrape time). Now a single comprehension with a guard.render_series/4andrender_value/2— called ~2.3M times each per scrape (~15.6% combined), inlining eliminates call overhead and enables cross-boundary binary append optimization.