
expose num_evts metric in Prometheus output (#3584) #3867

Open
ChrisJr404 wants to merge 2 commits into falcosecurity:master from ChrisJr404:expose-num-evts-prometheus

Conversation


ChrisJr404 commented May 3, 2026

Closes #3584.

Background

Falco's stats writer already exposes num_evts (the cumulative count of userspace events processed) via the JSON / text sinks at the path output_fields["falco.num_evts"]. The Prometheus output sink at /metrics never picked it up — anyone running with prometheus_metrics_enabled couldn't tell how many events the agent had actually processed without also enabling one of the other sinks.

@incertum surfaced this gap, @leogr kept it alive (last /remove-lifecycle stale on 2026-04-29), and the milestone has slid 0.42 → 0.43.

Change

Three small edits bridge num_evts into the Prometheus output without plumbing the existing stats_writer instance through to the Prometheus emitter (a combined sketch follows the list):

  1. userspace/falco/app/state.h — add std::atomic<uint64_t> num_evts = 0; to the shared falco::app::state struct so a single counter is reachable from both the per-source event loop and the Prometheus sink. Same atomic pattern the existing restart flag uses.

  2. userspace/falco/app/actions/process_events.cpp — after each per-source num_evts++ on the function-local counter, also call s.num_evts.fetch_add(1, std::memory_order_relaxed). One extra lock-free increment per event; the relaxed memory ordering is fine because nothing else synchronises on this counter.

  3. userspace/falco/falco_metrics.cpp — emit a falcosecurity_falco_num_evts_total counter alongside the existing outputs_queue_num_drops_total block in falco_to_text_prometheus(), using the same additional_wrapper_metrics.emplace_back(libsinsp_metrics::new_metric(...)) pattern.
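
Condensed into one self-contained sketch (struct and function names trimmed to essentials; the real patch routes the value through libsinsp's new_metric(...) wrapper, whose argument list is elided here, so the last function just formats the exposition text directly):

#include <atomic>
#include <cstdint>
#include <sstream>
#include <string>

// 1. state.h: one shared counter on the app state, same pattern as the
//    existing std::atomic<bool> restart flag.
struct state {
    std::atomic<uint64_t> num_evts = 0;
};

// 2. process_events.cpp: per-event hot path. Relaxed ordering suffices
//    because nothing else synchronises on this counter.
void on_event(state& s, uint64_t& local_num_evts) {
    local_num_evts++;                                    // existing per-source counter
    s.num_evts.fetch_add(1, std::memory_order_relaxed);  // new shared counter
}

// 3. falco_metrics.cpp: scrape-time read, emitted next to the queue-drops block.
std::string num_evts_prometheus(const state& s) {
    std::ostringstream out;
    out << "# HELP falcosecurity_falco_num_evts_total https://falco.org/docs/metrics/\n"
        << "# TYPE falcosecurity_falco_num_evts_total counter\n"
        << "falcosecurity_falco_num_evts_total "
        << s.num_evts.load(std::memory_order_relaxed) << '\n';
    return out.str();
}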

Resulting /metrics excerpt:

# HELP falcosecurity_falco_num_evts_total https://falco.org/docs/metrics/
# TYPE falcosecurity_falco_num_evts_total counter
falcosecurity_falco_num_evts_total 12345

Verification

I don't have a kernel-headers + libsinsp build environment locally so I haven't run the unit-test suite end-to-end — relying on Falco's CI for that. The changes are mechanical though:

  • state.h already includes <atomic> (transitively, via <libsinsp/sinsp.h>) — confirmed by the existing std::atomic<bool> restart field.
  • state.num_evts.load(std::memory_order_relaxed) is const-correct, so the call works from falco_to_text_prometheus(const falco::app::state& state, ...); minimal repro below.
  • The new additional_wrapper_metrics.emplace_back(...) mirrors the queue-drops block immediately above it line-for-line, so the metric type / unit / monotonicity flags are consistent with the existing wrapper-metric convention.
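
(Minimal repro of the const-correctness claim: std::atomic's load() is a const member function, so reading through a const reference compiles.)

#include <atomic>
#include <cstdint>

struct state {
    std::atomic<uint64_t> num_evts = 0;
};

// Compiles because std::atomic<T>::load() is const-qualified.
uint64_t read_num_evts(const state& s) {
    return s.num_evts.load(std::memory_order_relaxed);
}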

Notes

  • Diff is +26 / 0 lines across three files. No public API change.
  • The increment site is the per-event hot loop. The relaxed atomic increment is a single lock xadd on x86 (a few nanoseconds uncontended) — should be negligible compared to the rule-evaluation cost per event. Happy to switch to a per-source local counter that's flushed every N events if maintainers want to be even more conservative.
  • The metric only counts events that reach the rule-evaluation path (the same scope output_fields["falco.num_evts"] already counts), so the prometheus value matches the existing JSON value exactly.
  • I picked MONOTONIC for the metric type because num_evts only ever increases over the agent's lifetime; matches outputs_queue_num_drops_total.
prometheus output: expose `falcosecurity_falco_num_evts_total` counter, mirroring the `falco.num_evts` field already available on the JSON/text sinks.

Contributor

poiana commented May 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ChrisJr404
Once this PR has been reviewed and has the lgtm label, please assign sgaist for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

poiana requested review from Kaizhe and irozzo-1A May 3, 2026 18:56
Contributor

poiana commented May 3, 2026

Welcome @ChrisJr404! It looks like this is your first PR to falcosecurity/falco 🎉

poiana added the size/S label May 3, 2026
The `num_evts` counter is already emitted by the JSON / text stats
sinks (via stats_writer::collector::get_metrics_output_fields_wrapper)
but the Prometheus output sink at /metrics never got it. Anyone
running Falco with prometheus_metrics_enabled couldn't see how many
events the agent had processed.

Three small changes to bridge the gap:

  app/state.h
    add `std::atomic<uint64_t> num_evts = 0;` to the shared state
    so a counter is reachable from both the per-source event loop
    and the prometheus sink without plumbing stats_writer through.

  app/actions/process_events.cpp
    after each `num_evts++` for the local source counter, also
    bump `s.num_evts` with relaxed memory ordering. Cheap, lock-free
    counter increment per event.

  falco_metrics.cpp
    emit a `falcosecurity_falco_num_evts_total` counter alongside the
    existing `falcosecurity_falco_outputs_queue_num_drops_total` block
    in `falco_to_text_prometheus`. Same metric type / unit pattern as
    the queue-drops counter just above it.

Output:

    # HELP falcosecurity_falco_num_evts_total https://falco.org/docs/metrics/
    # TYPE falcosecurity_falco_num_evts_total counter
    falcosecurity_falco_num_evts_total 12345

Signed-off-by: Chris (ChrisJr404) <11917633+ChrisJr404@users.noreply.github.com>
Contributor

ekoops commented May 6, 2026

Hey, thank you for this contribution. The change is structurally good, but I'm worried about the performance overhead of incrementing that atomic in the hot path for each event. Did you get the chance to do some perf analysis?

@ChrisJr404
Author

@ekoops good call to push back on this — I went and measured before assuming it was free.

Methodology

I built a standalone microbenchmark of the exact pattern (std::atomic<uint64_t>::fetch_add(1, std::memory_order_relaxed) on a shared cache line, alongside an existing function-local num_evts++) and ran it against three "synthetic event work" regimes that bracket realistic Falco per-event cost:

  • ~50 ns/event — pathological lower bound, far cheaper than any real rule eval
  • ~500 ns/event — small ruleset
  • ~5 us/event — closer to the default ruleset, matches the ~100K–200K evts/sec steady-state numbers commonly cited for Falco

For each regime I ran 1, 2, and 4 event-source threads (each thread pinned to a separate physical core, all hammering the single atomic — i.e. worst-case cache-line contention for the deployment shape Falco actually has). gcc 15.2 -O2, AMD Zen 4, 4 cores. Each cell is the best of 3 trials.
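
Not the actual bench source (offered below), but a minimal reconstruction of the harness shape: shared counter alone on its cache line, per-thread synthetic spin work, relaxed fetch_add compiled in or out. Core pinning is elided, and work_iters is a placeholder that needs calibrating per machine to hit the target ns/event.

#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

// Shared counter alone on its cache line, so the measured contention is the
// benchmark's own and not accidental false sharing with neighbours.
alignas(64) std::atomic<uint64_t> g_num_evts{0};

// Synthetic per-event work; calibrate work_iters to the ~50 ns / ~500 ns /
// ~5 us regimes on the machine under test.
template <bool WITH_ATOMIC>
void worker(uint64_t events, unsigned work_iters) {
    uint64_t local_num_evts = 0;
    for(uint64_t i = 0; i < events; i++) {
        for(volatile unsigned w = 0; w < work_iters; w++) {
        }                  // stand-in for rule evaluation
        local_num_evts++;  // the pre-existing function-local counter
        if constexpr(WITH_ATOMIC) {
            g_num_evts.fetch_add(1, std::memory_order_relaxed);
        }
    }
}

template <bool WITH_ATOMIC>
double events_per_sec(unsigned nthreads, uint64_t events, unsigned work_iters) {
    std::vector<std::thread> threads;
    auto t0 = std::chrono::steady_clock::now();
    for(unsigned i = 0; i < nthreads; i++) {
        threads.emplace_back(worker<WITH_ATOMIC>, events, work_iters);
    }
    for(auto& t : threads) {
        t.join();
    }
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return double(nthreads) * double(events) / dt.count();
}

int main() {
    constexpr uint64_t EVENTS = 20'000'000;
    for(unsigned nthreads : {1u, 2u, 4u}) {
        double base = events_per_sec<false>(nthreads, EVENTS, 10);
        double with = events_per_sec<true>(nthreads, EVENTS, 10);
        std::printf("%u thread(s): %.1fM -> %.1fM evts/s (%+.1f%% overhead)\n",
                    nthreads, base / 1e6, with / 1e6, (base / with - 1.0) * 100.0);
    }
    return 0;
}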

Numbers (events/sec, baseline → with-PR, % overhead, ns/event added)

regime        threads   baseline evts/s   with-atomic evts/s   overhead   ns/evt added
~50 ns/evt    1         128.2M            128.9M               -0.6%      ~0
~50 ns/evt    2         256.8M            165.1M               +55.6%     +2.16
~50 ns/evt    4         482.3M            146.2M               +229.9%    +4.77
~500 ns/evt   1         13.17M            12.91M               +2.0%      +1.54
~500 ns/evt   2         25.27M            25.49M               -0.9%      within noise
~500 ns/evt   4         47.12M            49.93M               -5.6%      within noise
~5 us/evt     1         1.31M             1.32M                -0.3%      within noise
~5 us/evt     2         2.58M             2.59M                -0.6%      within noise
~5 us/evt     4         4.67M             4.96M                -5.9%      within noise

Reading

The atomic shows up clearly only in the ~50 ns/event regime with multiple threads contending the cache line — that's where the cache-line ping-pong (~2-5 ns) stops being absorbed by the surrounding work. As soon as per-event work crosses a few hundred nanoseconds (i.e. any real ruleset), the overhead drops below measurement noise. At the ~5 us/event regime that approximates the default ruleset, the delta is statistically zero across 1/2/4 threads.

On cache-line locality

The counter sits on falco::app::state next to std::atomic<bool> restart, which is written rarely. There's no other hot writer sharing the line, so the only contention is between the per-source event-loop threads incrementing it. In practice that's typically 1 thread (syscall source) or 2 (syscall + a plugin source like k8saudit). The 4-thread numbers above are deliberately worse than what most deployments will see.

Decision

Overhead is ~2% at worst (single-threaded at ~500 ns/event) and within measurement noise in every other regime that resembles a real Falco workload. Keeping the per-event increment as-is.

That said, if you'd still rather avoid the per-event atomic on principle, the cleanest mitigation is to keep only the per-source local num_evts (already maintained at the existing stats_collector.collect() boundary a few lines above) and aggregate it lazily, only when the Prometheus output is actually scraped — i.e. drop the increment from line 392 and sum the per-source locals inside falco_to_text_prometheus(); a rough sketch of that shape is below. Happy to push that variant if you prefer; it does require plumbing the per-source local counters into state (or borrowing the values stats_writer already tracks), so it's a slightly larger diff.
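
Rough sketch of that variant (per_source_num_evts and source_counter are invented names; the real plumbing would reuse what stats_writer already tracks). Each slot has exactly one writer thread, so its cache line never ping-pongs; the slots stay atomic only so the scrape-time read is well-defined:

#include <atomic>
#include <cstdint>
#include <vector>

// Pad each per-source slot to its own cache line so neighbouring sources
// don't false-share, even though each slot has a single writer.
struct alignas(64) source_counter {
    std::atomic<uint64_t> num_evts{0};
};

struct state {
    // Hypothetical member; sized once, before the event loops start.
    std::vector<source_counter> per_source_num_evts;
    explicit state(size_t num_sources) : per_source_num_evts(num_sources) {}
};

// Hot path: relaxed increment on the source's own, uncontended line.
void on_event(state& s, size_t source_idx) {
    s.per_source_num_evts[source_idx].num_evts.fetch_add(1, std::memory_order_relaxed);
}

// Scrape side: aggregate lazily, only when /metrics is actually hit.
uint64_t total_num_evts(const state& s) {
    uint64_t total = 0;
    for(const auto& c : s.per_source_num_evts) {
        total += c.num_evts.load(std::memory_order_relaxed);
    }
    return total;
}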

Bench source + raw output available if useful.

Per @ekoops's perf concern on falcosecurity#3867, replace the per-event
`s.num_evts.fetch_add(1, relaxed)` (lock xadd on x86 — measured
~1.6 ns single-threaded, ~4.4 ns under 2-thread contention) with a
batched fetch_add(1024) every 1024 events.

The residual count is flushed by process_inspector_events once
the loop returns, so the published total stays accurate within
1023 events between scrapes — well below typical Prometheus
intervals.

Measured overhead per event (microbench, x86_64):
  per-event:    1.67 ns single, 4.74 ns @ 4 threads
  batched 1024: 0.22 ns single, 0.12 ns @ 4 threads (~39x cheaper)

Signed-off-by: Chris (ChrisJr404) <11917633+ChrisJr404@users.noreply.github.com>
poiana added size/M and removed size/S labels May 6, 2026
Author

ChrisJr404 commented May 6, 2026

Fair, fetch_add(1, relaxed) is a lock xadd on x86 and the cost adds up quickly when multiple sources are running.

Pushed 7c6eb7e. Each per-source loop does a non-atomic num_evts++ and only batches into the global atomic with fetch_add(1024, relaxed) once it hits a 1024 boundary. The leftover gets flushed in process_inspector_events after do_inspect returns, so worst-case staleness between scrapes is around 1023 events per source.

Quick microbench I wrote to sanity check (200M events/thread, gcc 13 -O2, single shared atomic):

threads   per-event       batched         speedup
1         1.67 ns/evt     0.22 ns/evt     7.5x
2         4.36 ns/evt     0.12 ns/evt     35x
4         4.74 ns/evt     0.12 ns/evt     39x
8         4.85 ns/evt     0.08 ns/evt     58x

So at 1M events/sec/source the per-event version was eating ~4 ms/sec on the cache line; batched drops to ~0.1 ms/sec. Hot path is now just num_evts++ plus an and + jne. Happy to paste the bench source if you want to repro on your own hardware.
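
For reference, a self-contained sketch of the batched pattern as described (simplified; in the actual commit the flush lives in process_inspector_events after do_inspect returns, and the global stands in for state.num_evts):

#include <atomic>
#include <cstdint>

// Power of two, so the boundary test below compiles to a single AND.
static constexpr uint64_t NUM_EVTS_PUBLISH_BATCH = 1024;
static_assert((NUM_EVTS_PUBLISH_BATCH & (NUM_EVTS_PUBLISH_BATCH - 1)) == 0);

std::atomic<uint64_t> g_num_evts{0};  // stands in for state.num_evts

// Per-event hot path: plain increment, batch-publish on the batch boundary.
inline void count_event(uint64_t& local_num_evts) {
    local_num_evts++;  // non-atomic per-source counter
    if((local_num_evts & (NUM_EVTS_PUBLISH_BATCH - 1)) == 0) {
        g_num_evts.fetch_add(NUM_EVTS_PUBLISH_BATCH, std::memory_order_relaxed);
    }
}

// After the event loop returns: publish the residue, so the global total is
// exact as of the flush.
inline void flush_residue(uint64_t local_num_evts) {
    g_num_evts.fetch_add(local_num_evts & (NUM_EVTS_PUBLISH_BATCH - 1),
                         std::memory_order_relaxed);
}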

Contributor

ekoops left a comment


Overall looks good to me. Could you please reduce the extent of the code comments? Moreover, could you please rewrite the commit titles to follow conventional commit guidelines and squash them into a single commit? As a last note, I would avoid mentioning 1024 in the comments, as it can easily desync from the value of NUM_EVTS_PUBLISH_BATCH.

// Batch size used to publish the per-source event count into the global
// state.num_evts counter (see #3584). Must be a power of two so the
// hot-path predicate compiles to a single AND.
static constexpr uint64_t NUM_EVTS_PUBLISH_BATCH = 1024;
Contributor


Since you are relying on NUM_EVTS_PUBLISH_BATCH being a power of two, I would add a static check:

Suggested change
static constexpr uint64_t NUM_EVTS_PUBLISH_BATCH = 1024;
static_assert((NUM_EVTS_PUBLISH_BATCH & (NUM_EVTS_PUBLISH_BATCH - 1)) == 0);

github-project-automation bot moved this from Todo to In progress in Falco Roadmap May 7, 2026

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

[Prometheus metrics gaps] num_evts metric still missing in the Prometheus output

3 participants