
Optimize text format scrape hot path #202

Merged
NelsonVides merged 5 commits into master from perf/optimise_scrape_hot_path
Apr 16, 2026

Conversation

@NelsonVides
Member

@NelsonVides NelsonVides commented Apr 14, 2026

Profiling (tprof) of a RabbitMQ workload with ~2.3M metric series per scrape revealed that a significant portion of scrape time is spent on repeated conversions and intermediate allocations that can be avoided. This work grew out of rabbitmq/rabbitmq-server#14885, which in turn led to #196. I have more changes in mind, but they would be breaking, so for now this PR proposes only backwards-compatible changes:

  • Pre-compute metric name/help as binaries at declaration time — atom_to_binary/iolist_to_binary were called per metric family on every scrape; now done once at insert_mf time and stored in ETS. Includes backward-compat normalization for old ETS entries.
  • Inline and fast-path metric record creation — gauge_metric/2, counter_metric/2, and label_pair/1 are inlined (these are ~2M calls/scrape at 12 words each). A fast path in create_mf/4 skips metrics_from_tuples dispatch when metrics are already #'Metric'{} records. Numeric clauses added to ensure_binary_or_string/1 to avoid io_lib:format fallback.
  • Merge filter and map into single list traversal — metrics_from_tuples/2 previously ran lists:filter/2 then a comprehension (two passes over ~2.4M elements, lists:filter/2 alone at 3.26% of scrape time). Now a single comprehension with a guard.
  • Inline render_series/4 and render_value/2 — called ~2.3M times each per scrape (~15.6% combined), inlining eliminates call overhead and enables cross-boundary binary append optimization.

@NelsonVides NelsonVides self-assigned this Apr 14, 2026
@NelsonVides NelsonVides force-pushed the perf/optimise_scrape_hot_path branch 2 times, most recently from 239596f to d7bb69f Compare April 14, 2026 10:53
During every scrape, metric name and help were converted from
atom/list to binary via ensure_binary_or_string/1 in create_mf.
With many metric families this repeated work adds up, especially
the atom_to_binary calls that show up in profiling.

Now, extract_common_params/1 returns pre-computed NameBin and
HelpBin, stored in ETS as a 3-tuple {Labels, HelpBin, NameBin}
alongside label definitions. All six built-in metric collectors
(boolean, counter, gauge, histogram, quantile_summary, summary)
pass these binaries directly into create_mf, avoiding per-scrape
conversion entirely.

A normalize_mf_row/1 function in prometheus_metric handles backward
compatibility: old ETS entries with the 2-tuple {Labels, Help}
format are normalized on read, so hot upgrades don't crash.
@NelsonVides NelsonVides force-pushed the perf/optimise_scrape_hot_path branch from d7bb69f to b4a8202 Compare April 14, 2026 10:57
@NelsonVides
Member Author

@the-mikedavis WDYT about the above ☝🏽😄

@codecov

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 94.64286% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/prometheus_metric_spec.erl | 66.66% | 3 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/formats/prometheus_text_format.erl | 94.52% <ø> (ø) |
| src/metrics/prometheus_boolean.erl | 93.84% <100.00%> (ø) |
| src/metrics/prometheus_counter.erl | 92.30% <100.00%> (ø) |
| src/metrics/prometheus_gauge.erl | 96.55% <100.00%> (ø) |
| src/metrics/prometheus_histogram.erl | 95.76% <100.00%> (ø) |
| src/metrics/prometheus_quantile_summary.erl | 100.00% <100.00%> (ø) |
| src/metrics/prometheus_summary.erl | 96.15% <100.00%> (ø) |
| src/model/prometheus_model_helpers.erl | 93.25% <100.00%> (-1.13%) ⬇️ |
| src/prometheus_collector.erl | 97.14% <ø> (ø) |
| src/prometheus_metric.erl | 100.00% <100.00%> (ø) |

... and 3 more

Contributor

@the-mikedavis the-mikedavis left a comment


Looks promising! I will rerun my tests with RabbitMQ

Comment thread src/model/prometheus_model_helpers.erl Outdated
Contributor

@lukebakken lukebakken left a comment


Hello! I did a drive-by review with the 🧞


collect_metrics/2 first argument is now always a binary

create_mf/5 calls Collector:collect_metrics(Name, CollectorData) passing whatever was given as its first argument. After this PR, all in-tree metric modules call create_mf(NameBin, HelpBin, ..., ?MODULE, Data) with a pre-computed binary, so collect_metrics/2 now always receives a binary as its first argument.

The in-tree implementations all ignore it (_NameBin) and recover the original name from the data tuple, so nothing breaks here. But the documented pattern in prometheus_collector shows:

```erlang
collect_metrics(erlang_vm_bytes_total, Memory) ->
    ...

create_gauge(Name, Help, Data) ->
    prometheus_model_helpers:create_mf(Name, Help, gauge, ?MODULE, Data).
```

An external collector following this pattern (atom-matched collect_metrics/2 clauses plus create_mf/5) would get a function_clause error at scrape time now that Name is a binary. The collect_metrics/2 callback spec says Name :: prometheus_metric:name(), which permits binaries, so it is not technically a spec violation, but the behaviour change is silent and the documented example does not prepare implementers for it.

Is this an intentional interface change? If so, it may be worth a note in the changelog and an update to the prometheus_collector doc example to show atom-matched clauses are no longer safe.
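One way an external collector could stay safe across both behaviours is to match both the atom and the binary form of the name. This is an illustrative sketch, not code from the PR; the metric name and the tuple body are hypothetical stand-ins (a real collector would build the metric via prometheus_model_helpers).

```erlang
-module(binary_safe_collector).
-export([collect_metrics/2]).

%% Accept the pre-PR atom form and the post-PR binary form of the name,
%% so the clause keeps matching regardless of library version.
collect_metrics(erlang_vm_bytes_total, Memory) ->
    gauge(Memory);
collect_metrics(<<"erlang_vm_bytes_total">>, Memory) ->
    gauge(Memory).

gauge(Memory) ->
    %% stand-in for prometheus_model_helpers:gauge_metric([], Memory)
    {gauge, [], Memory}.
```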

Backward-compat normalization path has no test coverage

normalize_mf_row/1 has two clauses: the new 3-tuple fast path and a backward-compat clause for old ETS entries that still hold the 2-tuple {Labels, Help} shape. Codecov confirms the backward-compat clause is not exercised by the test suite.

This clause is only reachable during a hot upgrade from a pre-PR node. A test that inserts a raw 2-tuple ETS entry and calls metrics/2 would cover it.
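The suggested test could follow this pattern. The table name and the normalize logic below are stand-ins (the real table and normalize_mf_row/1 are internal to prometheus_metric); the point is only the shape of the test: write a raw legacy 2-tuple row, then assert it is normalized on read.

```erlang
-module(compat_row_test).
-export([run/0]).

run() ->
    T = ets:new(mf_table, [set]),                    %% stand-in table
    ets:insert(T, {my_metric, {[], "Old help"}}),    %% legacy 2-tuple shape
    [{Key, Row}] = ets:lookup(T, my_metric),
    %% reading must yield the new 3-tuple shape
    {[], <<"Old help">>, <<"my_metric">>} = normalize(Key, Row),
    ok.

%% stand-in for the read-side normalization under test
normalize(Key, {Labels, Help}) ->
    {Labels, iolist_to_binary(Help), atom_to_binary(Key, utf8)};
normalize(_Key, {_, _, _} = Row) ->
    Row.
```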

Profiling (tprof) of a RabbitMQ workload shows gauge_metric/2 at
16.32% and counter_metric/2 at 2.45% of total process time, with
~2M calls each allocating 12 words of intermediate records that the
text format immediately destructures and discards.

This commit applies several targeted optimizations:

- Inline gauge_metric/2, counter_metric/2, and label_pair/1 to
  eliminate function call overhead on these ~2M-call-per-scrape
  functions.

- Add a fast path in create_mf/4 that detects when the input list
  already contains #'Metric'{} records (as all built-in collectors
  produce), skipping the metrics_from_tuples/2 dispatch entirely.
  This avoids the per-element is_record check and type-based
  dispatch for every metric.

- Replace lists:map(fun label_pair/1, Labels) with a list
  comprehension, which the compiler can optimize better when
  combined with the inline directive.

- Add integer and float clauses to ensure_binary_or_string/1,
  avoiding the expensive io_lib:format("~p", [Val]) fallback for
  numeric label values.
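The inline directive and the numeric clauses could look roughly like this. The clause bodies and float-formatting options are illustrative assumptions; the real function lives in prometheus_model_helpers.

```erlang
-module(hot_path_sketch).
-export([ensure_binary_or_string/1]).

%% Mirrors the kind of inline directive the commit adds for the
%% ~2M-call-per-scrape constructors.
-compile({inline, [ensure_binary_or_string/1]}).

ensure_binary_or_string(Val) when is_binary(Val)  -> Val;
ensure_binary_or_string(Val) when is_integer(Val) -> integer_to_binary(Val);
ensure_binary_or_string(Val) when is_float(Val)   ->
    float_to_binary(Val, [{decimals, 10}, compact]);
ensure_binary_or_string(Val) ->
    %% slow fallback, now only reached for non-numeric, non-binary terms
    iolist_to_binary(io_lib:format("~p", [Val])).
```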

metrics_from_tuples/2 previously traversed the list twice: once
via filter_undefined_metrics (lists:filter) to remove undefined
entries, then a list comprehension to convert tuples to records.
Profiling shows lists:filter alone at 3.26% of scrape time with
~2.4M elements.

Fold both operations into a single list comprehension with a
guard filter. Apply the same pattern to the create_mf/4 fast path
for pre-built #'Metric'{} records.
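The traversal merge can be sketched as follows; metric_from_tuple/1 here is a stand-in for the real tuple-to-record conversion.

```erlang
-module(single_pass).
-export([metrics_from_tuples/1]).

%% One pass over the list: the `T =/= undefined` guard in the
%% comprehension replaces the earlier lists:filter/2 pre-pass,
%% halving the number of traversals.
metrics_from_tuples(Tuples) ->
    [metric_from_tuple(T) || T <- Tuples, T =/= undefined].

%% stand-in conversion
metric_from_tuple({Value, Labels}) -> {metric, Labels, Value}.
```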

These two functions are called once per metric series during every
scrape (~2.3M calls each in profiled RabbitMQ workloads), together
accounting for ~15.6% of scrape time. Inlining eliminates the
function call overhead and allows the compiler to optimize the
binary append operations across call boundaries.
@NelsonVides NelsonVides force-pushed the perf/optimise_scrape_hot_path branch from b4a8202 to e7a8bcf Compare April 14, 2026 14:46
Contributor

@lukebakken lukebakken left a comment


🚀 :shipit:

@NelsonVides
Member Author

Thanks @lukebakken! Waiting for some RMQ tests then @the-mikedavis 🔥

@mkuratczyk
Contributor

Thanks a lot for working on this. The changes look good to me, but at the same time, I don't see any meaningful improvements when scraping /metrics/per-object with 100k classic queues imported. I will take another look in a bit, but can you share how you tested it and what your results were?

Contributor

@mkuratczyk mkuratczyk left a comment


While I don't see an overall improvement in the time to scrape, flamegraphs confirm less time spent in metrics_from_tuples and lists:filter. Thanks!

@NelsonVides
Member Author

I don't have services with that gigantic an amount of metrics, so I didn't see a measurable time difference myself, other than tprof profiles saying it actually took less time and memory overall :)

Thanks for running the tests, good to confirm no regressions at least!

@NelsonVides NelsonVides merged commit 16d242f into master Apr 16, 2026
4 of 5 checks passed
@NelsonVides NelsonVides deleted the perf/optimise_scrape_hot_path branch April 16, 2026 14:47