Improve ConfigMap generation: add metricsConfig block when enabled: false #3751

@SheGe

Description

Describe the bug

Summary

When controller.metricsConfig.enabled is set to false, the argo-workflows Helm chart does not add a metricsConfig block to the workflow controller ConfigMap. The controller then falls back to its binary defaults, which include the Prometheus metrics server enabled (and often with TLS). This causes problems when using OTLP-only metrics with a default PodMonitor that scrapes pods with the sidecar label.

Environment

  • Chart: argo-workflows (e.g. 0.47.x)
  • Controller: Argo Workflows v3.7.x
  • Setup: Controller configured for OTLP push to an injected OpenTelemetry sidecar; Prometheus scrapes via PodMonitor that selects pods with sidecar.opentelemetry.io/injected: Exists

Current Chart Behavior

From workflow-controller-config-map.yaml:

{{- if .Values.controller.metricsConfig.enabled }}
metricsConfig:
  enabled: {{ .Values.controller.metricsConfig.enabled }}
  path: {{ .Values.controller.metricsConfig.path }}
  port: {{ .Values.controller.metricsConfig.port }}
  # ... secure, etc.
{{- end }}

When metricsConfig.enabled is false:

  • The {{- if }} condition is false, so the entire metricsConfig block is omitted from the ConfigMap.
  • The workflow controller reads the ConfigMap and finds no metricsConfig key.
  • The controller falls back to its built-in defaults, which typically include:
    • enabled: true (Prometheus metrics server on)
    • secure: true (TLS on port 9090)
    • port: 9090

So setting metricsConfig.enabled: false in values does not result in the controller actually disabling its Prometheus server.
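For reference, the values a user would set to (attempt to) disable the server look like this (a minimal sketch; the field names follow the chart's controller.metricsConfig structure):

```yaml
# values.yaml -- user intent: OTLP-only metrics, no Prometheus server
controller:
  metricsConfig:
    enabled: false   # expected to disable the Prometheus server; currently has no effect
```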

Problem When Using OTLP + Default PodMonitor

Desired flow: Controller pushes metrics via OTLP only → sidecar → gateway → metric upstreams. No direct scrape of the controller.

What happens instead:

  1. User sets controller.metricsConfig.enabled: false expecting OTLP-only metrics.
  2. Chart omits metricsConfig from the ConfigMap; controller uses defaults and still runs the Prometheus server on 9090 (with TLS).
  3. A default PodMonitor (e.g. one created by the OpenTelemetry Operator when its PodMonitor option is enabled) selects pods with sidecar.opentelemetry.io/injected: Exists and scrapes the port named metrics.
  4. The workflow controller pod has two containers with a port named metrics:
    • Controller: 9090 (Prometheus server, HTTPS by default)
    • Sidecar: 8888 (collector metrics, HTTP)
  5. Prometheus discovers both targets and scrapes them with HTTP.
  6. Scraping the controller’s 9090 over HTTP fails with:
    http: TLS handshake error from <prometheus-ip>: client sent an HTTP request to an HTTPS server
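A sketch of the kind of PodMonitor involved (the exact manifest depends on the OpenTelemetry setup; the name and structure below are assumptions based on the selector and port name described above):

```yaml
# Assumed shape of the default PodMonitor described above
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: otel-sidecar-metrics
spec:
  selector:
    matchExpressions:
      - key: sidecar.opentelemetry.io/injected
        operator: Exists
  podMetricsEndpoints:
    - port: metrics   # matches BOTH the controller (9090, HTTPS) and the sidecar (8888, HTTP)
```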

Result: Continuous TLS handshake errors in controller logs, and the user cannot cleanly achieve an OTLP-only metrics flow without workarounds (e.g. enabling metrics with secure: false just to silence errors, or custom PodMonitors).

Proposed Fix

When metricsConfig.enabled is false, the chart should still emit a metricsConfig block so the controller explicitly disables its Prometheus server:

{{- if ne (index .Values.controller "metricsConfig") nil }}
metricsConfig:
  enabled: {{ .Values.controller.metricsConfig.enabled }}
  {{- if .Values.controller.metricsConfig.enabled }}
  path: {{ .Values.controller.metricsConfig.path }}
  port: {{ .Values.controller.metricsConfig.port }}
  secure: {{ .Values.controller.metricsConfig.secure }}
  # ... other fields
  {{- end }}
{{- end }}

Or, more simply, always include the block when metricsConfig is defined in values, and let enabled control behavior:

{{- with .Values.controller.metricsConfig }}
metricsConfig:
  enabled: {{ .enabled }}
  path: {{ .path }}
  port: {{ .port }}
  secure: {{ .secure }}
  {{- /* ... other fields when enabled */ -}}
{{- end }}

The important change: emit metricsConfig with enabled: false when the user sets enabled: false, so the controller does not fall back to defaults that keep the server on.
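With either template above, setting enabled: false should render roughly the following in the ConfigMap, which the controller reads as an explicit opt-out instead of falling back to binary defaults:

```yaml
# Rendered workflow-controller ConfigMap data (sketch)
metricsConfig:
  enabled: false
```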

Workaround (Current)

To avoid TLS errors while keeping the default PodMonitor, we must set metricsConfig.enabled: true and metricsConfig.secure: false so the chart injects the block and the controller serves HTTP on 9090. This is a workaround, not the desired OTLP-only setup.
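Concretely, the workaround values look like this (a sketch; it keeps the Prometheus server running on plain HTTP just to silence the TLS errors):

```yaml
# values.yaml -- workaround, not the desired OTLP-only setup
controller:
  metricsConfig:
    enabled: true    # forces the chart to emit the metricsConfig block
    secure: false    # controller serves plain HTTP on 9090, so scrapes succeed
```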

References

  • Argo Workflows metrics docs – OTLP is the recommended approach
  • Workflow controller ConfigMap template: templates/controller/workflow-controller-config-map.yaml
  • Controller defaults when ConfigMap lacks metricsConfig: binary fallback (metrics on, secure on)

Related helm chart

argo-workflows

Helm chart version

0.47.3

To Reproduce

  1. Set metricsConfig.enabled: false
  2. Observe that the Argo Workflows controller logs Starting Prometheus metrics exporter, which is the default binary behaviour when metricsConfig is absent from the ConfigMap.

Expected behavior

metricsConfig.enabled == false should result in the Prometheus metrics exporter not starting at all.

Screenshots

No response

Additional context

No response

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests