Improve ConfigMap generation: add metricsConfig block when enabled: false #3751
Description
Describe the bug
Summary
When `controller.metricsConfig.enabled` is set to `false`, the argo-workflows Helm chart does not add a `metricsConfig` block to the workflow controller ConfigMap. The controller then falls back to its binary defaults, which include the Prometheus metrics server enabled (and often with TLS). This causes problems when using OTLP-only metrics together with a default PodMonitor that scrapes pods carrying the sidecar-injected label.
Environment
- Chart: argo-workflows (e.g. 0.47.x)
- Controller: Argo Workflows v3.7.x
- Setup: Controller configured for OTLP push to an injected OpenTelemetry sidecar; Prometheus scrapes via a PodMonitor that selects pods with `sidecar.opentelemetry.io/injected: Exists`
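For context, the PodMonitor in this setup looks roughly like the following sketch (the name is illustrative; the selector expression and the port name are the parts that matter for this issue):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: otel-sidecar-pods   # illustrative name
spec:
  selector:
    matchExpressions:
      # select every pod that received an injected OTel sidecar
      - key: sidecar.opentelemetry.io/injected
        operator: Exists
  podMetricsEndpoints:
    # scrape whichever container port is named "metrics"
    - port: metrics
```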
Current Chart Behavior
From `workflow-controller-config-map.yaml`:

```yaml
{{- if .Values.controller.metricsConfig.enabled }}
metricsConfig:
  enabled: {{ .Values.controller.metricsConfig.enabled }}
  path: {{ .Values.controller.metricsConfig.path }}
  port: {{ .Values.controller.metricsConfig.port }}
  # ... secure, etc.
{{- end }}
```

When `metricsConfig.enabled` is `false`:

- The `{{- if }}` condition is false, so the entire `metricsConfig` block is omitted from the ConfigMap.
- The workflow controller reads the ConfigMap and finds no `metricsConfig` key.
- The controller falls back to its built-in defaults, which typically include:
  - `enabled: true` (Prometheus metrics server on)
  - `secure: true` (TLS on port 9090)
  - `port: 9090`
So setting `metricsConfig.enabled: false` in values does not result in the controller actually disabling its Prometheus server.
Problem When Using OTLP + Default PodMonitor
Desired flow: Controller pushes metrics via OTLP only → sidecar → gateway → metric upstreams. No direct scrape of the controller.
What happens instead:

- User sets `controller.metricsConfig.enabled: false` expecting OTLP-only metrics.
- The chart omits `metricsConfig` from the ConfigMap; the controller uses its defaults and still runs the Prometheus server on 9090 (with TLS).
- A default PodMonitor (e.g. one created by the OpenTelemetry setup when PodMonitor is enabled) selects pods with `sidecar.opentelemetry.io/injected: Exists` and scrapes the port named `metrics`.
- The workflow controller pod has two containers with a port named `metrics`:
  - Controller: 9090 (Prometheus server, HTTPS by default)
  - Sidecar: 8888 (collector metrics, HTTP)
- Prometheus discovers both targets and scrapes them over HTTP.
- Scraping the controller's 9090 over HTTP fails with:
  `http: TLS handshake error from <prometheus-ip>: client sent an HTTP request to an HTTPS server`
Result: Continuous TLS handshake errors in controller logs, and the user cannot cleanly achieve an OTLP-only metrics flow without workarounds (e.g. enabling metrics with secure: false just to silence errors, or custom PodMonitors).
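The port-name collision can be illustrated with a trimmed pod spec (container names and the label value are illustrative; the sidecar container name depends on the OpenTelemetry Operator configuration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    sidecar.opentelemetry.io/injected: "true"   # matched by the default PodMonitor
spec:
  containers:
    - name: controller
      ports:
        - name: metrics          # Prometheus server, HTTPS by default
          containerPort: 9090
    - name: otc-container        # injected OTel sidecar (name set by the operator)
      ports:
        - name: metrics          # collector self-metrics, plain HTTP
          containerPort: 8888
```

Because the PodMonitor selects the pod and targets the port *name* `metrics`, both container ports become scrape targets, and the HTTPS one fails.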
Proposed Fix
When metricsConfig.enabled is false, the chart should still emit a metricsConfig block so the controller explicitly disables its Prometheus server:
```yaml
{{- if ne (index .Values.controller "metricsConfig") nil }}
metricsConfig:
  enabled: {{ .Values.controller.metricsConfig.enabled }}
  {{- if .Values.controller.metricsConfig.enabled }}
  path: {{ .Values.controller.metricsConfig.path }}
  port: {{ .Values.controller.metricsConfig.port }}
  secure: {{ .Values.controller.metricsConfig.secure }}
  # ... other fields
  {{- end }}
{{- end }}
```

Or, more simply, always include the block when `metricsConfig` is defined in values, and let `enabled` control behavior:
```yaml
{{- with .Values.controller.metricsConfig }}
metricsConfig:
  enabled: {{ .enabled }}
  path: {{ .path }}
  port: {{ .port }}
  secure: {{ .secure }}
  {{- /* ... other fields when enabled */ -}}
{{- end }}
```

The important change: output `metricsConfig` with `enabled: false` when the user sets `enabled: false`, so the controller does not fall back to defaults that keep the server on.
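For illustration, rendering either proposed template with `controller.metricsConfig.enabled: false` would produce something like the following (the ConfigMap name is illustrative, and the exact `data` layout follows the chart's existing ConfigMap structure; shown here in simplified form):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap   # illustrative name
data:
  # explicit block so the controller does NOT fall back to binary defaults
  metricsConfig: |
    enabled: false
```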
Workaround (Current)
To avoid TLS errors while keeping the default PodMonitor, we must set `metricsConfig.enabled: true` and `metricsConfig.secure: false` so the chart injects the block and the controller serves plain HTTP on 9090. This is a workaround, not the desired OTLP-only setup.
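Expressed as chart values, the current workaround is:

```yaml
controller:
  metricsConfig:
    enabled: true    # forces the chart to emit the metricsConfig block
    secure: false    # controller serves plain HTTP on 9090, so the PodMonitor scrape succeeds
```

This silences the TLS handshake errors but keeps the Prometheus server running, which is exactly what an OTLP-only setup is trying to avoid.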
References
- Argo Workflows metrics docs – OTLP is the recommended approach
- Workflow controller ConfigMap template: `templates/controller/workflow-controller-config-map.yaml`
- Controller defaults when the ConfigMap lacks `metricsConfig`: binary fallback (metrics on, secure on)
Related helm chart
argo-workflows
Helm chart version
0.47.3
To Reproduce
- Set `metricsConfig.enabled: false`.
- The Argo Workflows Controller logs contain `Starting Prometheus metrics exporter`, which is the default binary behaviour when `metricsConfig` does not exist in the ConfigMap.
Expected behavior
`metricsConfig.enabled: false` should result in the Prometheus metrics exporter not starting at all.
Screenshots
No response
Additional context
No response