Summary
A race condition between CRI-O and dranet causes the pod's network namespace to be destroyed before the NRI `StopPodSandbox` hook is fired to dranet.
What should happen
- Pod deletion requested
- CRI-O calls the NRI `StopPodSandbox` hook
- dranet opens the pod's netns via `netns.GetFromPath(containerNsPath)` (nri_hooks.go:372)
- dranet calls `nsDetachNetdev` / `nsDetachRdmadev` to move NICs + RDMA devices back to the root namespace
- CRI-O tears down the network namespace
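The ordering invariant above can be sketched in isolation. This is a hypothetical stand-in, not dranet's actual code: `detach` represents the `nsDetachNetdev` / `nsDetachRdmadev` calls and `teardown` represents CRI-O's netns destruction; the point is only that the hook must complete before the teardown runs.

```go
package main

import "fmt"

// stopPodSandbox sketches the intended teardown ordering: the NRI hook
// must run (and move devices out of the pod netns) BEFORE the runtime
// destroys the namespace.
func stopPodSandbox(detach, teardown func() error) error {
	if err := detach(); err != nil {
		return fmt.Errorf("detaching devices: %w", err)
	}
	return teardown()
}

func main() {
	var order []string
	_ = stopPodSandbox(
		func() error { order = append(order, "detach"); return nil },
		func() error { order = append(order, "teardown"); return nil },
	)
	fmt.Println(order) // detach runs first, then teardown
}
```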
What actually happens in CRI-O 1.34
- Pod deletion requested
- CRI-O tears down the network namespace first
- CRI-O calls the NRI `StopPodSandbox` hook
- dranet tries `netns.GetFromPath(containerNsPath)`, which fails because the namespace file descriptor no longer exists at that path
- `nsDetachNetdev` and `nsDetachRdmadev` both error out
- But `StopPodSandbox` swallows errors (logs them, returns nil), so the pod shutdown continues
- The physical NICs and RDMA link devices that were moved into the now-destroyed namespace are orphaned
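The failure mode is easy to reproduce in isolation: `netns.GetFromPath` is essentially an open(2) of the path, so once the netns file is gone it fails with ENOENT, and a hook that logs-and-returns-nil hides that from the runtime. A minimal sketch (the function name and path are hypothetical, not dranet's actual code):

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// stopPodSandboxHook mimics the swallowing behavior: it tries to open
// the pod's netns path, and on any error it only logs and returns nil,
// so the runtime proceeds with the shutdown and the devices stay orphaned.
func stopPodSandboxHook(containerNsPath string) error {
	f, err := os.Open(containerNsPath) // netns.GetFromPath opens this path under the hood
	if err != nil {
		log.Printf("StopPodSandbox: %v", err) // error is swallowed here
		return nil                            // pod shutdown continues regardless
	}
	defer f.Close()
	// ... nsDetachNetdev / nsDetachRdmadev would run here ...
	return nil
}

func main() {
	// The netns file was already torn down, so the path no longer exists.
	err := stopPodSandboxHook("/var/run/netns/gone")
	fmt.Println("hook returned:", err) // nil, despite the ENOENT underneath
}
```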
Current process to get NICs back
After this happens, the NICs have disappeared from both the host and the `ResourceSlice`, so `rdma: false`. The NICs must then be manually unbound and re-bound on each node.
Proposed Solution (open to suggestions here)
As a short-term fix, it may be possible to anchor a file descriptor to the netns path in the pod so that CRI-O does not remove it / tear it down before the hook runs. The existing logic to move the devices back from the pod to the root namespace should then work unchanged.