perf: reduce per-node allocations in to_native_metric_node#4075

Open
andygrove wants to merge 1 commit into apache:main from andygrove:perf-metric-allocations

Conversation

@andygrove
Member

Which issue does this PR close?

Part of #4072.

Rationale for this change

Issue #4072 documents allocator and JNI overhead in the per-batch metric reporting path. Profiling on the Rust side shows that to_native_metric_node builds a fresh HashMap and Vec for every plan node and goes through unwrap_or_default() on the metric set even when a node has no metrics. None of the individual allocations is large, but the path runs on every batch returned to the JVM, so a few small allocations per node add up quickly.

A small Cargo bench (release, M-series laptop) on synthetic plan shapes:

| plan shape          | tree walk        | tree walk + encode |
| ------------------- | ---------------- | ------------------ |
| linear 3 x 5 mtx    | 826ns -> 614ns   | 1.11us -> 1.03us   |
| linear 8 x 8 mtx    | 3.95us -> 2.24us | 5.67us -> 4.30us   |
| linear 20 x 10 mtx  | 11.1us -> 6.91us | 19.3us -> 15.9us   |
| join 2x5 chains x 8 | 5.46us -> 3.10us | 7.65us -> 5.71us   |

The tree walk alone gets 26-43% faster; combined with the protobuf encode, the net per-call saving is 7-25%.

This does not remove all of the overhead in #4072 (the JVM-side parse, the JNI byte-array copy, and per-metric to_string are unchanged), but it is a low-risk change that touches neither the wire format nor the JVM side. Larger restructuring can be evaluated separately.

What changes are included in this PR?

One file, native/core/src/execution/metrics/utils.rs:

  1. HashMap::with_capacity(16) instead of HashMap::new(). Most operator metric maps are well under 20 entries (e.g. hashJoinMetrics reports 9, nativeScanMetrics ~20), so 16 avoids the default-capacity rehash on virtually every node without over-allocating.
  2. Vec::with_capacity(children.len()) for the children vec.
  3. Replace node_metrics.unwrap_or_default().iter() with an if let Some(metrics) block, skipping the empty MetricsSet allocation when a node produces no metrics.

The wire format is unchanged.
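Taken together, the three changes amount to a shape like the following sketch. The types here (NativeMetricNode, PlanNode and its plain tuple metrics) are simplified stand-ins for illustration, not the actual Comet or protobuf definitions:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real protobuf NativeMetricNode and the
// DataFusion plan/metric types; names and shapes are illustrative only.
#[derive(Default, Debug)]
struct NativeMetricNode {
    metrics: HashMap<String, u64>,
    children: Vec<NativeMetricNode>,
}

struct PlanNode {
    metrics: Option<Vec<(String, u64)>>, // stand-in for Option<MetricsSet>
    children: Vec<PlanNode>,
}

fn to_native_metric_node(node: &PlanNode) -> NativeMetricNode {
    let mut out = NativeMetricNode {
        // (1) pre-size: most operator metric maps have well under 20 entries,
        //     so 16 avoids the default-capacity rehash without over-allocating
        metrics: HashMap::with_capacity(16),
        // (2) pre-size the children vec to the known child count
        children: Vec::with_capacity(node.children.len()),
    };
    // (3) `if let` instead of unwrap_or_default(): no empty metric-set
    //     allocation for nodes that produce no metrics
    if let Some(metrics) = &node.metrics {
        for (name, value) in metrics {
            out.metrics.insert(name.clone(), *value);
        }
    }
    for child in &node.children {
        out.children.push(to_native_metric_node(child));
    }
    out
}
```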

How are these changes tested?

Existing test coverage. CometTaskMetricsSuite continues to pass and exercises the metric pipeline end-to-end. A representative cargo bench (numbers above) confirms the perf intent; the bench itself is not included in this PR to keep it minimal.
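The bench is not part of the patch, but a harness of this general shape (using std::time::Instant instead of Criterion, with a hypothetical Node type and a chain builder mirroring the "linear depth x width mtx" shapes in the table) can reproduce the comparison locally:

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in tree; the bench described above walks real plan nodes.
struct Node {
    metrics: Vec<(String, u64)>,
    children: Vec<Node>,
}

// Build a linear chain of `depth` nodes with `width` metrics each.
fn linear(depth: usize, width: usize) -> Node {
    let metrics = (0..width).map(|i| (format!("metric_{i}"), i as u64)).collect();
    let children = if depth > 1 {
        vec![linear(depth - 1, width)]
    } else {
        vec![]
    };
    Node { metrics, children }
}

// The measured operation: convert every node's metrics into a map,
// returning the total entry count so the work cannot be optimized away.
fn walk(node: &Node) -> usize {
    let mut map: HashMap<&str, u64> = HashMap::with_capacity(16);
    for (name, value) in &node.metrics {
        map.insert(name, *value);
    }
    map.len() + node.children.iter().map(walk).sum::<usize>()
}

fn time_walk(tree: &Node, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(walk(tree));
    }
    start.elapsed()
}
```

Swapping `with_capacity(16)` back to `HashMap::new()` in `walk` gives the before/after comparison for the tree-walk column.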

Pre-size the per-node HashMap and children Vec, and skip the empty
MetricsSet allocation for nodes that produce no metrics. Net effect on
the Rust side of the protobuf metric pipeline (Cargo bench, M-series
laptop, release build):

| plan shape          | tree walk        | tree walk + encode |
| ------------------- | ---------------- | ------------------ |
| linear 3 x 5 mtx    | 826ns -> 614ns   | 1.11us -> 1.03us   |
| linear 8 x 8 mtx    | 3.95us -> 2.24us | 5.67us -> 4.30us   |
| linear 20 x 10 mtx  | 11.1us -> 6.91us | 19.3us -> 15.9us   |
| join 2x5 chains x 8 | 5.46us -> 3.10us | 7.65us -> 5.71us   |

The HashMap pre-sizing dominates. Most operator metric maps are well
under 20 entries (hash-join reports 9, native-scan ~20), so the literal
16 avoids the default-capacity rehash on virtually every node.

Refs apache#4072.
@parthchandra (Contributor) left a comment


lgtm. minor comment.

let mut native_metric_node = NativeMetricNode {
    // Most operator metric maps are well under 20 entries (e.g. hash-join: 9,
    // native-scan: ~20). Pre-sizing to 16 avoids the default-capacity rehash.
    metrics: HashMap::with_capacity(16),

nit: use a constant for readability?
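Applied to the snippet above, the suggestion would look roughly like this (the constant and helper names are hypothetical, not from the patch):

```rust
use std::collections::HashMap;

/// Hypothetical name for the reviewer's suggested constant. Most operator
/// metric maps are well under 20 entries (hash-join: 9, native-scan: ~20),
/// so 16 avoids the default-capacity rehash on virtually every node.
const METRICS_MAP_INITIAL_CAPACITY: usize = 16;

// Hypothetical helper showing the constant in use.
fn preallocated_metrics_map() -> HashMap<String, u64> {
    HashMap::with_capacity(METRICS_MAP_INITIAL_CAPACITY)
}
```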

