
Optimize text format scrape hot path #202

Merged
NelsonVides merged 5 commits into master from perf/optimise_scrape_hot_path
Apr 16, 2026

Conversation

@NelsonVides
Member

@NelsonVides NelsonVides commented Apr 14, 2026

Profiling (tprof) of a RabbitMQ workload with ~2.3M metric series per scrape revealed that a significant portion of scrape time is spent on repeated conversions and intermediate allocations that can be avoided. This work grew out of rabbitmq/rabbitmq-server#14885, which in turn led to #196. I have more changes in mind, but they would be breaking, so for now this PR proposes only backwards-compatible changes:

  • Pre-compute metric name/help as binaries at declaration time — atom_to_binary/iolist_to_binary were called per metric family on every scrape; now done once at insert_mf time and stored in ETS. Includes backward-compat normalization for old ETS entries.
  • Inline and fast-path metric record creation — gauge_metric/2, counter_metric/2, and label_pair/1 are inlined (these are ~2M calls/scrape at 12 words each). A fast path in create_mf/4 skips metrics_from_tuples dispatch when metrics are already #'Metric'{} records. Numeric clauses added to ensure_binary_or_string/1 to avoid io_lib:format fallback.
  • Merge filter and map into single list traversal — metrics_from_tuples/2 previously ran lists:filter/2 then a comprehension (two passes over ~2.4M elements, lists:filter/2 alone at 3.26% of scrape time). Now a single comprehension with a guard.
  • Inline render_series/4 and render_value/2 — called ~2.3M times each per scrape (~15.6% combined), inlining eliminates call overhead and enables cross-boundary binary append optimization.

@NelsonVides NelsonVides self-assigned this Apr 14, 2026
@NelsonVides NelsonVides force-pushed the perf/optimise_scrape_hot_path branch 2 times, most recently from 239596f to d7bb69f Compare April 14, 2026 10:53
During every scrape, metric name and help were converted from
atom/list to binary via ensure_binary_or_string/1 in create_mf.
With many metric families this repeated work adds up, especially
the atom_to_binary calls that show up in profiling.

Now, extract_common_params/1 returns pre-computed NameBin and
HelpBin, stored in ETS as a 3-tuple {Labels, HelpBin, NameBin}
alongside label definitions. All six built-in metric collectors
(boolean, counter, gauge, histogram, quantile_summary, summary)
pass these binaries directly into create_mf, avoiding per-scrape
conversion entirely.

A normalize_mf_row/1 function in prometheus_metric handles backward
compatibility: old ETS entries with the 2-tuple {Labels, Help}
format are normalized on read, so hot upgrades don't crash.
@NelsonVides NelsonVides force-pushed the perf/optimise_scrape_hot_path branch from d7bb69f to b4a8202 Compare April 14, 2026 10:57
@NelsonVides
Member Author

@the-mikedavis WDYT about the above ☝🏽😄

@codecov

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 94.64286% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/prometheus_metric_spec.erl | 66.66% | 3 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/formats/prometheus_text_format.erl | 94.52% <ø> (ø) |
| src/metrics/prometheus_boolean.erl | 93.84% <100.00%> (ø) |
| src/metrics/prometheus_counter.erl | 92.30% <100.00%> (ø) |
| src/metrics/prometheus_gauge.erl | 96.55% <100.00%> (ø) |
| src/metrics/prometheus_histogram.erl | 95.76% <100.00%> (ø) |
| src/metrics/prometheus_quantile_summary.erl | 100.00% <100.00%> (ø) |
| src/metrics/prometheus_summary.erl | 96.15% <100.00%> (ø) |
| src/model/prometheus_model_helpers.erl | 93.25% <100.00%> (-1.13%) ⬇️ |
| src/prometheus_collector.erl | 97.14% <ø> (ø) |
| src/prometheus_metric.erl | 100.00% <100.00%> (ø) |

... and 3 more

Contributor

@the-mikedavis the-mikedavis left a comment


Looks promising! I will rerun my tests with RabbitMQ

Comment thread src/model/prometheus_model_helpers.erl Outdated
Contributor

@lukebakken lukebakken left a comment


Hello! I did a drive-by review with the 🧞


collect_metrics/2 first argument is now always a binary

create_mf/5 calls Collector:collect_metrics(Name, CollectorData) passing whatever was given as its first argument. After this PR, all in-tree metric modules call create_mf(NameBin, HelpBin, ..., ?MODULE, Data) with a pre-computed binary, so collect_metrics/2 now always receives a binary as its first argument.

The in-tree implementations all ignore it (_NameBin) and recover the original name from the data tuple, so nothing breaks here. But the documented pattern in prometheus_collector shows:

```erlang
collect_metrics(erlang_vm_bytes_total, Memory) ->
    ...

create_gauge(Name, Help, Data) ->
    prometheus_model_helpers:create_mf(Name, Help, gauge, ?MODULE, Data).
```

An external collector following this pattern (atom-matched collect_metrics/2 clauses plus create_mf/5) would get a function_clause error at scrape time now that Name is a binary. The collect_metrics/2 callback spec says Name :: prometheus_metric:name(), which permits binaries, so it is not technically a spec violation, but the behaviour change is silent and the documented example does not prepare implementers for it.

Is this an intentional interface change? If so, it may be worth a note in the changelog and an update to the prometheus_collector doc example to show atom-matched clauses are no longer safe.
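One way an external collector could stay safe across both behaviours is to match both the atom and the binary form of the name. This is an illustrative sketch, not code from the PR; the metric name and the tuple body are hypothetical stand-ins (a real collector would build the metric via prometheus_model_helpers).

```erlang
-module(binary_safe_collector).
-export([collect_metrics/2]).

%% Accept the pre-PR atom form and the post-PR binary form of the name,
%% so the clause keeps matching regardless of library version.
collect_metrics(erlang_vm_bytes_total, Memory) ->
    gauge(Memory);
collect_metrics(<<"erlang_vm_bytes_total">>, Memory) ->
    gauge(Memory).

gauge(Memory) ->
    %% stand-in for prometheus_model_helpers:gauge_metric([], Memory)
    {gauge, [], Memory}.
```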

Backward-compat normalization path has no test coverage

normalize_mf_row/1 has two clauses: the new 3-tuple fast path and a backward-compat clause for old ETS entries that still hold the 2-tuple {Labels, Help} shape. Codecov confirms the backward-compat clause is not exercised by the test suite.

This clause is only reachable during a hot upgrade from a pre-PR node. A test that inserts a raw 2-tuple ETS entry and calls metrics/2 would cover it.
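The suggested test could follow this pattern. The table name and the normalize logic below are stand-ins (the real table and normalize_mf_row/1 are internal to prometheus_metric); the point is only the shape of the test: write a raw legacy 2-tuple row, then assert it is normalized on read.

```erlang
-module(compat_row_test).
-export([run/0]).

run() ->
    T = ets:new(mf_table, [set]),                    %% stand-in table
    ets:insert(T, {my_metric, {[], "Old help"}}),    %% legacy 2-tuple shape
    [{Key, Row}] = ets:lookup(T, my_metric),
    %% reading must yield the new 3-tuple shape
    {[], <<"Old help">>, <<"my_metric">>} = normalize(Key, Row),
    ok.

%% stand-in for the read-side normalization under test
normalize(Key, {Labels, Help}) ->
    {Labels, iolist_to_binary(Help), atom_to_binary(Key, utf8)};
normalize(_Key, {_, _, _} = Row) ->
    Row.
```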

Profiling (tprof) of a RabbitMQ workload shows gauge_metric/2 at
16.32% and counter_metric/2 at 2.45% of total process time, with
~2M calls each allocating 12 words of intermediate records that the
text format immediately destructures and discards.

This commit applies several targeted optimizations:

- Inline gauge_metric/2, counter_metric/2, and label_pair/1 to
  eliminate function call overhead on these ~2M-call-per-scrape
  functions.

- Add a fast path in create_mf/4 that detects when the input list
  already contains #'Metric'{} records (as all built-in collectors
  produce), skipping the metrics_from_tuples/2 dispatch entirely.
  This avoids the per-element is_record check and type-based
  dispatch for every metric.

- Replace lists:map(fun label_pair/1, Labels) with a list
  comprehension, which the compiler can optimize better when
  combined with the inline directive.

- Add integer and float clauses to ensure_binary_or_string/1,
  avoiding the expensive io_lib:format("~p", [Val]) fallback for
  numeric label values.
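The inline directive and the numeric clauses could look roughly like this. The clause bodies and float-formatting options are illustrative assumptions; the real function lives in prometheus_model_helpers.

```erlang
-module(hot_path_sketch).
-export([ensure_binary_or_string/1]).

%% Mirrors the kind of inline directive the commit adds for the
%% ~2M-call-per-scrape constructors.
-compile({inline, [ensure_binary_or_string/1]}).

ensure_binary_or_string(Val) when is_binary(Val)  -> Val;
ensure_binary_or_string(Val) when is_integer(Val) -> integer_to_binary(Val);
ensure_binary_or_string(Val) when is_float(Val)   ->
    float_to_binary(Val, [{decimals, 10}, compact]);
ensure_binary_or_string(Val) ->
    %% slow fallback, now only reached for non-numeric, non-binary terms
    iolist_to_binary(io_lib:format("~p", [Val])).
```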

metrics_from_tuples/2 previously traversed the list twice: once
via filter_undefined_metrics (lists:filter) to remove undefined
entries, then a list comprehension to convert tuples to records.
Profiling shows lists:filter alone at 3.26% of scrape time with
~2.4M elements.

Fold both operations into a single list comprehension with a
guard filter. Apply the same pattern to the create_mf/4 fast path
for pre-built #'Metric'{} records.
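The traversal merge can be sketched as follows; metric_from_tuple/1 here is a stand-in for the real tuple-to-record conversion.

```erlang
-module(single_pass).
-export([metrics_from_tuples/1]).

%% One pass over the list: the `T =/= undefined` guard in the
%% comprehension replaces the earlier lists:filter/2 pre-pass,
%% halving the number of traversals.
metrics_from_tuples(Tuples) ->
    [metric_from_tuple(T) || T <- Tuples, T =/= undefined].

%% stand-in conversion
metric_from_tuple({Value, Labels}) -> {metric, Labels, Value}.
```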

These two functions are called once per metric series during every
scrape (~2.3M calls each in profiled RabbitMQ workloads), together
accounting for ~15.6% of scrape time. Inlining eliminates the
function call overhead and allows the compiler to optimize the
binary append operations across call boundaries.
@NelsonVides NelsonVides force-pushed the perf/optimise_scrape_hot_path branch from b4a8202 to e7a8bcf Compare April 14, 2026 14:46
Contributor

@lukebakken lukebakken left a comment


🚀 :shipit:

@NelsonVides
Member Author

Thanks @lukebakken! Waiting for some RMQ tests then @the-mikedavis 🔥

@mkuratczyk
Contributor

Thanks a lot for working on this. The changes look good to me, but at the same time, I don't see any meaningful improvements when scraping /metrics/per-object with 100k classic queues imported. I will take another look in a bit, but can you share how you tested it and what your results were?

Contributor

@mkuratczyk mkuratczyk left a comment


While I don't see an overall improvement in the time to scrape, flamegraphs confirm less time spent in metrics_from_tuples and lists:filter. Thanks!

@NelsonVides
Member Author

I don't have services with that gigantic an amount of metrics, so I didn't see a measurable time difference myself, other than tprof profiles saying it actually took less time and memory overall :)

Thanks for running the tests, good to confirm no regressions at least!

@NelsonVides NelsonVides merged commit 16d242f into master Apr 16, 2026
4 of 5 checks passed
@NelsonVides NelsonVides deleted the perf/optimise_scrape_hot_path branch April 16, 2026 14:47