Monitoring performed by systemd-monitor-counter.json file does not work for type FrequentContainerdRestart #1258
As described in #786, we have the same issue. We use the same Helm chart, version 2.4.0, and are currently testing FrequentContainerdRestart.
I0401 06:17:52.093199 4510 custom_plugin_monitor.go:81] Finish parsing custom plugin monitor config file /config/kernel-monitor-counter.json: {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc0003c0af0 TimeoutString:0xc0003c0b00 InvokeInterval:5m0s Timeout:1m0s MaxOutputLength:0xc000433b00 Concurrency:0xc000433b08 EnableMessageChangeBasedConditionUpdate:0x30fdf95 SkipInitialStatus:0x30fdf96} Source:kernel-monitor DefaultConditions:[{Type:FrequentUnregisterNetDevice Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoFrequentUnregisterNetDevice Message:node is functioning properly}] Rules:[0xc000408d90] EnableMetricsReporting:0xc000433b1e}
I0401 06:17:52.093680 4510 custom_plugin_monitor.go:81] Finish parsing custom plugin monitor config file /config/systemd-monitor-counter.json: {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc0003c0bd0 TimeoutString:0xc0003c0be0 InvokeInterval:5m0s Timeout:1m0s MaxOutputLength:0xc000433bf0 Concurrency:0xc000433bf8 EnableMessageChangeBasedConditionUpdate:0x30fdf95 SkipInitialStatus:0x30fdf96} Source:systemd-monitor DefaultConditions:[{Type:FrequentKubeletRestart Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoFrequentKubeletRestart Message:kubelet is functioning properly} {Type:FrequentDockerRestart Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoFrequentDockerRestart Message:docker is functioning properly} {Type:FrequentContainerdRestart Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}] Rules:[0xc000408fc0 0xc000409030 0xc0004090a0] EnableMetricsReporting:0xc000433c0f}
I0401 06:17:52.093973 4510 custom_plugin_monitor.go:81] Finish parsing custom plugin monitor config file /config/iptables-mode-monitor.json: {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc0003c0dc0 TimeoutString:0xc0003c0dd0 InvokeInterval:24h0m0s Timeout:5s MaxOutputLength:0xc000433d58 Concurrency:0xc000433d60 EnableMessageChangeBasedConditionUpdate:0x30fdf95 SkipInitialStatus:0x30fdf96} Source:iptables-mode-monitor DefaultConditions:[] Rules:[0xc000409260] EnableMetricsReporting:0xc000433d68}
I0401 06:17:52.094308 4510 custom_plugin_monitor.go:81] Finish parsing custom plugin monitor config file /config/network-problem-monitor.json: {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc0003c0e40 TimeoutString:0xc0003c0e50 InvokeInterval:30s Timeout:5s MaxOutputLength:0xc000433de0 Concurrency:0xc000433de8 EnableMessageChangeBasedConditionUpdate:0x30fdf95 SkipInitialStatus:0x30fdf96} Source:network-custom-plugin-monitor DefaultConditions:[] Rules:[0xc000409490 0xc000409500] EnableMetricsReporting:0xc000433df0}
I0401 06:17:52.094939 4510 log_monitor.go:80] Finish parsing log monitor config file /config/kernel-monitor.json: {WatcherConfig:{Plugin:kmsg PluginConfig:map[] SkipList:[] LogPath:/dev/kmsg Lookback:5m Delay:} BufferSize:10 Source:kernel-monitor DefaultConditions:[{Type:KernelDeadlock Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:XfsShutdown Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:XfsHasNotShutDown Message:XFS has not shutdown} {Type:CperHardwareErrorFatal Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:CperHardwareHasNoFatalError Message:UEFI CPER has no fatal error}] Rules:[{Type:temporary Condition: Reason:OOMKilling Pattern:Killed process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB.* PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:TaskHung Pattern:task [\S ]+:\w+ blocked for more than \w+ seconds\. PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:UnregisterNetDevice Pattern:unregister_netdevice: waiting for \w+ to become free. Usage count = \d+ PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:KernelOops Pattern:BUG: unable to handle kernel NULL pointer dereference at .* PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:KernelOops Pattern:divide error: 0000 \[#\d+\] SMP PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:Ext4Error Pattern:EXT4-fs error .* PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:Ext4Warning Pattern:EXT4-fs warning .* PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:IOError Pattern:Buffer I/O error .* PatternGeneratedMessageSuffix:} {Type:permanent Condition:XfsShutdown Reason:XfsHasShutdown Pattern:XFS .* Shutting down filesystem.? 
PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:MemoryReadError Pattern:CE memory read error .* PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:CperHardwareErrorCorrected Pattern:.*\[Hardware Error\]: event severity: corrected$ PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:CperHardwareErrorRecoverable Pattern:.*\[Hardware Error\]: event severity: recoverable$ PatternGeneratedMessageSuffix:} {Type:permanent Condition:CperHardwareErrorFatal Reason:CperHardwareErrorFatal Pattern:.*\[Hardware Error\]: event severity: fatal$ PatternGeneratedMessageSuffix:} {Type:permanent Condition:KernelDeadlock Reason:DockerHung Pattern:task docker:\w+ blocked for more than \w+ seconds\. PatternGeneratedMessageSuffix:}] EnableMetricsReporting:0xc00041241e}
I0401 06:17:52.095034 4510 log_watchers.go:40] Use log watcher of plugin "kmsg"
I0401 06:17:52.095347 4510 log_monitor.go:80] Finish parsing log monitor config file /config/readonly-monitor.json: {WatcherConfig:{Plugin:kmsg PluginConfig:map[] SkipList:[] LogPath:/dev/kmsg Lookback:5m Delay:} BufferSize:10 Source:readonly-monitor DefaultConditions:[{Type:ReadonlyFilesystem Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}] Rules:[{Type:permanent Condition:ReadonlyFilesystem Reason:FilesystemIsReadOnly Pattern:Remounting filesystem read-only PatternGeneratedMessageSuffix:}] EnableMetricsReporting:0xc000412b60}
I0401 06:17:52.095370 4510 log_watchers.go:40] Use log watcher of plugin "kmsg"
I0401 06:17:52.095683 4510 log_monitor.go:80] Finish parsing log monitor config file /config/systemd-monitor.json: {WatcherConfig:{Plugin:journald PluginConfig:map[source:systemd] SkipList:[] LogPath:/var/log/journal Lookback:5m Delay:} BufferSize:10 Source:systemd-monitor DefaultConditions:[] Rules:[{Type:temporary Condition: Reason:KubeletStart Pattern:Started (Kubernetes kubelet|kubelet.service|kubelet.service - Kubernetes kubelet). PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:DockerStart Pattern:Starting (Docker Application Container Engine|docker.service|docker.service - Docker Application Container Engine)... PatternGeneratedMessageSuffix:} {Type:temporary Condition: Reason:ContainerdStart Pattern:Starting (containerd container runtime|containerd.service|containerd.service - containerd container runtime)... PatternGeneratedMessageSuffix:}] EnableMetricsReporting:0xc000412c1f}
I0401 06:17:52.095709 4510 log_watchers.go:40] Use log watcher of plugin "journald"
I0401 06:17:52.096482 4510 k8s_exporter.go:56] Waiting for kube-apiserver to be ready (timeout 5m0s)...
I0401 06:17:52.108894 4510 node_problem_detector.go:56] K8s exporter started.
I0401 06:17:52.109801 4510 node_problem_detector.go:60] Prometheus exporter started.
I0401 06:17:52.109829 4510 custom_plugin_monitor.go:112] Start custom plugin monitor /config/systemd-monitor-counter.json
I0401 06:17:52.109841 4510 custom_plugin_monitor.go:112] Start custom plugin monitor /config/iptables-mode-monitor.json
I0401 06:17:52.109849 4510 custom_plugin_monitor.go:112] Start custom plugin monitor /config/network-problem-monitor.json
I0401 06:17:52.109891 4510 log_monitor.go:112] Start log monitor /config/kernel-monitor.json
I0401 06:17:52.109950 4510 log_monitor.go:112] Start log monitor /config/readonly-monitor.json
I0401 06:17:52.109976 4510 log_monitor.go:112] Start log monitor /config/systemd-monitor.json
I0401 06:17:52.110692 4510 custom_plugin_monitor.go:313] Initialized conditions for /config/network-problem-monitor.json: []
I0401 06:17:52.110892 4510 custom_plugin_monitor.go:302] Sending initial status for network-custom-plugin-monitor with conditions: []
I0401 06:17:52.111819 4510 custom_plugin_monitor.go:313] Initialized conditions for /config/iptables-mode-monitor.json: []
I0401 06:17:52.111835 4510 custom_plugin_monitor.go:302] Sending initial status for iptables-mode-monitor with conditions: []
I0401 06:17:52.111780 4510 custom_plugin_monitor.go:313] Initialized conditions for /config/systemd-monitor-counter.json: [{Type:FrequentKubeletRestart Status:False Transition:2026-04-01 06:17:52.111766166 +0000 UTC m=+0.051430149 Reason:NoFrequentKubeletRestart Message:kubelet is functioning properly} {Type:FrequentDockerRestart Status:False Transition:2026-04-01 06:17:52.111766317 +0000 UTC m=+0.051430300 Reason:NoFrequentDockerRestart Message:docker is functioning properly} {Type:FrequentContainerdRestart Status:False Transition:2026-04-01 06:17:52.111766412 +0000 UTC m=+0.051430394 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]
I0401 06:17:52.111910 4510 custom_plugin_monitor.go:302] Sending initial status for systemd-monitor with conditions: [{Type:FrequentKubeletRestart Status:False Transition:2026-04-01 06:17:52.111766166 +0000 UTC m=+0.051430149 Reason:NoFrequentKubeletRestart Message:kubelet is functioning properly} {Type:FrequentDockerRestart Status:False Transition:2026-04-01 06:17:52.111766317 +0000 UTC m=+0.051430300 Reason:NoFrequentDockerRestart Message:docker is functioning properly} {Type:FrequentContainerdRestart Status:False Transition:2026-04-01 06:17:52.111766412 +0000 UTC m=+0.051430394 Reason:NoFrequentContainerdRestart Message:containerd is functioning properly}]
I0401 06:17:52.111943 4510 log_monitor.go:237] Initialize condition generated: [{Type:KernelDeadlock Status:False Transition:2026-04-01 06:17:52.111933414 +0000 UTC m=+0.051597396 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:XfsShutdown Status:False Transition:2026-04-01 06:17:52.111933556 +0000 UTC m=+0.051597539 Reason:XfsHasNotShutDown Message:XFS has not shutdown} {Type:CperHardwareErrorFatal Status:False Transition:2026-04-01 06:17:52.111933662 +0000 UTC m=+0.051597644 Reason:CperHardwareHasNoFatalError Message:UEFI CPER has no fatal error}]
I0401 06:17:52.121795 4510 log_monitor.go:237] Initialize condition generated: [{Type:ReadonlyFilesystem Status:False Transition:2026-04-01 06:17:52.121766199 +0000 UTC m=+0.061430184 Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}]
I0401 06:17:52.125033 4510 custom_plugin_monitor.go:112] Start custom plugin monitor /config/kernel-monitor-counter.json
I0401 06:17:52.125090 4510 problem_detector.go:77] Problem detector started
I0401 06:17:52.131797 4510 log_monitor.go:237] Initialize condition generated: []
I0401 06:17:52.131869 4510 custom_plugin_monitor.go:313] Initialized conditions for /config/kernel-monitor-counter.json: [{Type:FrequentUnregisterNetDevice Status:False Transition:2026-04-01 06:17:52.131855316 +0000 UTC m=+0.071519288 Reason:NoFrequentUnregisterNetDevice Message:node is functioning properly}]
I0401 06:17:52.131914 4510 custom_plugin_monitor.go:302] Sending initial status for kernel-monitor with conditions: [{Type:FrequentUnregisterNetDevice Status:False Transition:2026-04-01 06:17:52.131855316 +0000 UTC m=+0.071519288 Reason:NoFrequentUnregisterNetDevice Message:node is functioning properly}]
I0401 06:17:52.192138 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:15:36.77789 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:19:41.671949 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:19:41.593223 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:20:01.626657 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:20:01.333858 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:20:16.786375 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:20:16.519023 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:20:22.396804 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:20:22.082176 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:20:28.629390 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:20:28.391211 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:21:01.648406 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:21:01.309319 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:21:35.137999 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:21:34.906559 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:21:44.142341 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:21:43.904364 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:21:45.891443 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:21:45.612809 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:22:02.079859 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:22:01.827101 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:22:05.393587 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:22:05.156369 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:44:06.497624 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:44:06.257501 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 06:44:42.636206 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 06:44:42.353282 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:02:47.651327 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:02:47.389301 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:03.414655 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:03.027195 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:05.394581 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:05.132242 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:06.774630 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:06.55276 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:08.057202 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:07.709293 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:09.899276 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:09.613928 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:14.390863 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:14.107426 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:16.369108 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:16.111857 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:18.645132 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:18.377895 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:49.915407 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:49.887923 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:53.394354 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:53.101364 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:03:56.689072 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:56.462619 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:00.034938 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:03:59.807404 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:03.155947 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:02.894065 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:05.643607 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:05.42907 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:08.399255 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:08.147461 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:15.120155 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:14.699852 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:17.389640 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:17.1056 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:20.632406 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:20.385319 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:47.400920 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:47.16846 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:04:49.644431 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:04:49.336994 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:05:12.890177 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:05:12.671305 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:05:16.082684 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:05:15.971226 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:05:23.141585 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:05:22.921637 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:05:26.405019 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:05:26.163813 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:05:33.653184 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:05:33.424341 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:05:58.399398 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:05:58.151135 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:06:00.405801 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:06:00.054218 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:06:23.895000 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:06:23.622262 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:06:26.696025 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:06:26.451342 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:08:33.505146 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:08:33.205087 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
I0401 07:14:35.785010 4510 log_monitor.go:161] New status generated: &{Source:systemd-monitor Events:[{Severity:warn Timestamp:2026-04-01 07:14:35.446393 +0000 UTC Reason:ContainerdStart Message:Starting containerd.service - containerd container runtime...}] Conditions:[]}
As you can see, I have manually restarted containerd on a node multiple times (more than 5):
sudo systemctl restart containerd
NPD detects each restart without any problem.
If I run the log-counter script manually, the restarts are detected as expected:
root@node-problem-detector-xhxkh:/# /home/kubernetes/bin/log-counter \
> --journald-source=systemd \
> --log-path=/var/log/journal \
> --lookback=20m \
> --count=5 \
> --pattern="Starting (containerd container runtime|containerd.service|containerd.service - containerd container runtime)..."
Found 32 matching logs, which meets the threshold of 5
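Conceptually, log-counter's thresholding boils down to counting matching events inside the lookback window and comparing against the threshold. A minimal sketch of that decision (a conceptual re-implementation for illustration, not the actual NPD code):

```python
from datetime import datetime, timedelta

def meets_threshold(event_times, now, lookback, count):
    """Return True if at least `count` events fall within `lookback` of `now`.

    The real log-counter matches a regex pattern against journal entries
    instead of taking timestamps directly; this only mirrors the decision.
    """
    recent = [t for t in event_times if now - t <= lookback]
    return len(recent) >= count

# Six containerd restarts spread over the last 20 minutes, threshold 5:
now = datetime(2026, 4, 1, 7, 15)
restarts = [now - timedelta(minutes=m) for m in (1, 3, 5, 8, 12, 19)]
print(meets_threshold(restarts, now, timedelta(minutes=20), 5))  # True
```

Note that, as far as I understand the NPD custom plugin contract, it is the plugin's exit status (0 = OK, 1 = problem) rather than the printed message that drives the condition, so when testing manually it is worth checking `echo $?` after the run as well.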
But the condition on the node is not being updated.
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
FrequentDockerRestart False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 NoFrequentDockerRestart docker is functioning properly
FrequentContainerdRestart False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 NoFrequentContainerdRestart containerd is functioning properly
KernelDeadlock False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 KernelHasNoDeadlock kernel has no deadlock
XfsShutdown False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 XfsHasNotShutDown XFS has not shutdown
CperHardwareErrorFatal False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 CperHardwareHasNoFatalError UEFI CPER has no fatal error
ReadonlyFilesystem False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 FilesystemIsNotReadOnly Filesystem is not read-only
FrequentUnregisterNetDevice False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 NoFrequentUnregisterNetDevice node is functioning properly
FrequentKubeletRestart False Wed, 01 Apr 2026 09:33:02 +0200 Wed, 01 Apr 2026 08:17:52 +0200 NoFrequentKubeletRestart kubelet is functioning properly
NetworkUnavailable False Wed, 01 Apr 2026 08:17:39 +0200 Wed, 01 Apr 2026 08:17:39 +0200 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Apr 2026 09:33:31 +0200 Wed, 01 Apr 2026 08:17:08 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 01 Apr 2026 09:33:31 +0200 Wed, 01 Apr 2026 08:17:08 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 01 Apr 2026 09:33:31 +0200 Wed, 01 Apr 2026 08:17:08 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Apr 2026 09:33:31 +0200 Wed, 01 Apr 2026 08:17:27 +0200 KubeletReady kubelet is posting ready status
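For reference, the rule under test in /config/systemd-monitor-counter.json presumably looks roughly like the following (reconstructed from the manual log-counter invocation above; the exact fields and values are assumptions and may differ between chart versions):

```json
{
  "type": "permanent",
  "condition": "FrequentContainerdRestart",
  "reason": "FrequentContainerdRestart",
  "path": "/home/kubernetes/bin/log-counter",
  "args": [
    "--journald-source=systemd",
    "--log-path=/var/log/journal",
    "--lookback=20m",
    "--count=5",
    "--pattern=Starting (containerd container runtime|containerd.service|containerd.service - containerd container runtime)..."
  ],
  "timeout": "1m"
}
```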
HelmRelease:
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: node-problem-detector
  namespace: kube-system
spec:
  interval: 2m
  releaseName: node-problem-detector
  targetNamespace: kube-system
  install:
    disableWait: true
    crds: "CreateReplace"
  chart:
    spec:
      chart: node-problem-detector
      version: "2.4.0"
      sourceRef:
        kind: HelmRepository
        name: node-problem-detector
        namespace: flux-system
      interval: 10m
  upgrade:
    force: false
    disableWait: true
    remediation:
      remediateLastFailure: false
  values:
    maxUnavailable: 10%
    hostPID: true
    tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
    resources:
      requests:
        cpu: 20m
        memory: 64Mi
      limits:
        memory: 256Mi
    settings:
      log_monitors:
        - /config/kernel-monitor.json
        - /config/readonly-monitor.json
        - /config/systemd-monitor.json
      custom_plugin_monitors:
        - /config/kernel-monitor-counter.json
        - /config/systemd-monitor-counter.json
        - /config/iptables-mode-monitor.json
        - /config/network-problem-monitor.json
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        additionalLabels:
          prometheus: system
        attachMetadata:
          node: true
      prometheusRule:
        enabled: true