expose num_evts metric in Prometheus output (#3584)#3867
expose num_evts metric in Prometheus output (#3584)#3867ChrisJr404 wants to merge 2 commits intofalcosecurity:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ChrisJr404 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @ChrisJr404! It looks like this is your first PR to falcosecurity/falco 🎉 |
The `num_evts` counter is already emitted by the JSON / text stats
sinks (via stats_writer::collector::get_metrics_output_fields_wrapper)
but the Prometheus output sink at /metrics never got it. Anyone
running Falco with prometheus_metrics_enabled couldn't see how many
events the agent had processed.
Three small wires to bridge the gap:
app/state.h
add `std::atomic<uint64_t> num_evts = 0;` to the shared state
so a counter is reachable from both the per-source event loop
and the prometheus sink without plumbing stats_writer through.
app/actions/process_events.cpp
after each `num_evts++` for the local source counter, also
bump `s.num_evts` with relaxed memory ordering. Cheap, lock-free
counter increment per event.
falco_metrics.cpp
emit a `falcosecurity_falco_num_evts_total` counter alongside the
existing `falcosecurity_falco_outputs_queue_num_drops_total` block
in `falco_to_text_prometheus`. Same metric type / unit pattern as
the queue-drops counter just above it.
Output:
# HELP falcosecurity_falco_num_evts_total https://falco.org/docs/metrics/
# TYPE falcosecurity_falco_num_evts_total counter
falcosecurity_falco_num_evts_total 12345
Signed-off-by: Chris (ChrisJr404) <11917633+ChrisJr404@users.noreply.github.com>
0ac4a2a to
b1e0912
Compare
|
Hey, thank you for this contribution. The change is structurally good, but I'm worried about the performance overhead of incrementing that atomic in the hot path, for each event. Did you get the chance to do some perf analysis? |
|
@ekoops good call to push back on this — I went and measured before assuming it was free. MethodologyI built a standalone microbenchmark of the exact pattern (
For each regime I ran 1, 2, and 4 event-source threads (each thread pinned to a separate physical core, all hammering the single atomic — i.e. worst-case cache-line contention for the deployment shape Falco actually has). gcc 15.2 -O2, AMD Zen 4, 4 cores. Each cell is the best of 3 trials. Numbers (events/sec, baseline → with-PR, % overhead, ns/event added)
ReadingThe atomic shows up clearly only in the "50 ns/event" column with multiple threads contending the cache line — that's where the cost-line ping-pong (~2-5 ns) stops being absorbed by the surrounding work. As soon as per-event work crosses a few hundred nanoseconds (i.e. any real ruleset), the overhead drops below measurement noise. At the ~5 us/event regime that approximates the default ruleset, the delta is statistically zero across 1/2/4 threads. On cache-line localityThe counter sits on DecisionOverhead is < 1% in every regime that resembles real Falco workload. Keeping the per-event increment as-is. That said, if you'd still rather avoid the per-event atomic on principle, the cleanest mitigation is to read the counter from the per-source local Bench source + raw output available if useful. |
Per @ekoops's perf concern on falcosecurity#3867, replace the per-event `s.num_evts.fetch_add(1, relaxed)` (lock xadd on x86 — measured ~1.6 ns single-threaded, ~4.4 ns under 2-thread contention) with a batched fetch_add(1024) every 1024 events. The residual count is flushed by process_inspector_events once the loop returns, so the published total stays accurate within 1023 events between scrapes — well below typical Prometheus intervals. Measured overhead per event (microbench, x86_64): per-event: 1.67 ns single, 4.74 ns @ 4 threads batched 1024: 0.22 ns single, 0.12 ns @ 4 threads (~36x cheaper) Signed-off-by: Chris (ChrisJr404) <11917633+ChrisJr404@users.noreply.github.com>
|
Fair, Pushed 7c6eb7e. Each per-source loop does a non-atomic Quick microbench I wrote to sanity check (200M events/thread, gcc 13 -O2, single shared atomic): So at 1M events/sec/source the per-event version was eating ~4 ms/sec on the cache line, batched drops to ~0.1. Hot path is now just |
There was a problem hiding this comment.
Overall looks good to me. Could you please reduce the extent of the code comments? Moreover, could you please rewrite the commit titles to follow conventional commit guidelines and squash them in a single commit? As last note, I would avoid mentioning 1024 in the comments, as it can easily desync with the value of NUM_EVTS_PUBLISH_BATCH.
| // Batch size used to publish the per-source event count into the global | ||
| // state.num_evts counter (see #3584). Must be a power of two so the | ||
| // hot-path predicate compiles to a single AND. | ||
| static constexpr uint64_t NUM_EVTS_PUBLISH_BATCH = 1024; |
There was a problem hiding this comment.
Since you are relying on NUM_EVTS_PUBLUSH_BATCH being a power of two, I would add a static check
| static constexpr uint64_t NUM_EVTS_PUBLISH_BATCH = 1024; | |
| static constexpr uint64_t NUM_EVTS_PUBLISH_BATCH = 1024; | |
| static_assert((NUM_EVTS_PUBLISH_BATCH & (NUM_EVTS_PUBLISH_BATCH -1)) == 0); |
Closes #3584.
Background
Falco's stats writer already exposes
num_evts(the cumulative count of userspace events processed) via the JSON / text sinks at the pathoutput_fields["falco.num_evts"]. The Prometheus output sink at/metricsnever picked it up — anyone running withprometheus_metrics_enabledcouldn't tell how many events the agent had actually processed without also enabling one of the other sinks.@incertum surfaced this gap, @leogr kept it alive (last
/remove-lifecycle staleon 2026-04-29), and the milestone has slid 0.42 → 0.43.Change
Three small edits to bridge
num_evtsinto the prometheus output without plumbing the existingstats_writerinstance through to the prometheus emitter:userspace/falco/app/state.h— addstd::atomic<uint64_t> num_evts = 0;to the sharedfalco::app::statestruct so a single counter is reachable from both the per-source event loop and the prometheus sink. Same atomic-pattern the existingrestartflag uses.userspace/falco/app/actions/process_events.cpp— after each per-sourcenum_evts++for the function-local counter, alsos.num_evts.fetch_add(1, std::memory_order_relaxed). One extra lock-free increment per event; the relaxed memory ordering is fine because nothing else synchronises on this counter.userspace/falco/falco_metrics.cpp— emit afalcosecurity_falco_num_evts_totalcounter alongside the existingoutputs_queue_num_drops_totalblock infalco_to_text_prometheus(), using the sameadditional_wrapper_metrics.emplace_back(libsinsp_metrics::new_metric(...))pattern.Resulting
/metricsexcerpt:Verification
I don't have a kernel-headers + libsinsp build environment locally so I haven't run the unit-test suite end-to-end — relying on Falco's CI for that. The changes are mechanical though:
state.halready includes<atomic>(transitively, via<libsinsp/sinsp.h>) — confirmed by the existingstd::atomic<bool> restartfield.state.num_evts.load(std::memory_order_relaxed)isconst-correct, so the call works fromfalco_to_text_prometheus(const falco::app::state& state, ...).additional_wrapper_metrics.emplace_back(...)mirrors the queue-drops block immediately above it line-for-line, so the metric type / unit / monotonicity flags are consistent with the existing wrapper-metric convention.Notes
+26 / 0lines across three files. No public API change.output_fields["falco.num_evts"]already counts), so the prometheus value matches the existing JSON value exactly.MONOTONICfor the metric type becausenum_evtsonly ever increases over the agent's lifetime; matchesoutputs_queue_num_drops_total.