Problem
The metrics-proxy JVM heap size is hardcoded in MetricsProxyContainer.java with no override mechanism:
int heapSize = adminCluster ? 96 : 320;
builder.jvm.heapsize(heapSize);
builder.jvm.minHeapsize(heapSize);
For clusters with high metrics cardinality, driven by the multiplicative combination of content containers * document types * rank profiles, the metrics-proxy runs out of memory and stops emitting metrics.
In our case, a cluster with 12 document types and 120+ content containers causes the metrics-proxy to run out of memory. The cardinality of rank-profile metrics alone (with dimensions host * documenttype * rankProfile * metricName * suffix) exceeds what a 320MB heap can handle.
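To make the multiplication concrete, here is a rough back-of-the-envelope sketch. Only the 120 hosts and 12 document types come from our cluster; the rank-profile, metric-name, and suffix counts are placeholder values picked for illustration, and the class and method names are hypothetical (they are not part of Vespa).

```java
public final class RankProfileCardinality {

    /** Upper bound on distinct time series: the product of all dimension sizes. */
    static long seriesCount(long hosts, long documentTypes, long rankProfiles,
                            long metricNames, long suffixes) {
        long perHost = Math.multiplyExact(documentTypes, rankProfiles);
        long perProfile = Math.multiplyExact(metricNames, suffixes);
        return Math.multiplyExact(hosts, Math.multiplyExact(perHost, perProfile));
    }

    public static void main(String[] args) {
        long hosts = 120;         // from the cluster described above
        long documentTypes = 12;  // from the cluster described above
        long rankProfiles = 10;   // assumption: placeholder, per document type
        long metricNames = 20;    // assumption: placeholder
        long suffixes = 5;        // assumption: placeholder
        System.out.println(
                seriesCount(hosts, documentTypes, rankProfiles, metricNames, suffixes));
    }
}
```

Even with these modest placeholder values the product reaches well over a million series, which is why adding hosts or document types has such an outsized effect on proxy memory.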
Proposed solution
Allow the metrics-proxy heap to accommodate large clusters. Some options:
- Scale the heap dynamically based on cluster characteristics (number of content nodes, document types, rank profiles)
- Make it configurable via services.xml
- Increase the default to handle larger deployments
The current fixed value of 320MB breaks for larger clusters.
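As a rough illustration, the first two options could be combined along these lines: an explicit override wins if present, otherwise the heap grows with an estimated series count and never drops below today's values. This is only a sketch, not a proposal for the actual implementation. The class, the method, the per-MB ratio, and the upper bound are all hypothetical placeholders, and the sketch deliberately says nothing about how the override would be expressed in services.xml.

```java
import java.util.OptionalInt;

public final class MetricsProxyHeap {

    static final int ADMIN_HEAP_MB = 96;     // current hardcoded value for admin clusters
    static final int DEFAULT_HEAP_MB = 320;  // current hardcoded value otherwise
    static final int MAX_HEAP_MB = 2048;     // assumption: arbitrary upper bound
    static final long SERIES_PER_MB = 1000;  // assumption: placeholder ratio, not measured

    /**
     * Picks a heap size in MB.
     * An explicit override wins; otherwise the heap scales with the
     * estimated series count, clamped to [DEFAULT_HEAP_MB, MAX_HEAP_MB].
     */
    static int heapSizeMb(boolean adminCluster, long estimatedSeries, OptionalInt overrideMb) {
        if (overrideMb.isPresent()) return overrideMb.getAsInt();
        if (adminCluster) return ADMIN_HEAP_MB;
        long scaled = estimatedSeries / SERIES_PER_MB;
        return (int) Math.max(DEFAULT_HEAP_MB, Math.min(MAX_HEAP_MB, scaled));
    }
}
```

With this shape, small deployments keep the current 96MB and 320MB defaults, so only clusters with high cardinality would see a larger heap.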
Context
- The number of metrics scales multiplicatively with content nodes, document types, and rank profiles
- There is no way to reduce metrics-proxy memory usage through consumer filtering in services.xml. The proxy tracks all metrics internally regardless of the consumer configuration.
- Vertical scaling (fewer, larger nodes) or reducing document types/rank profiles are workarounds, but the root issue is the non-configurable heap