What happened:
During PVC provisioning with topology constraints, the external-provisioner fails with "error generating accessibility requirements: topology labels from selected node map[] does not match topology keys from CSINode", even though all required topology labels are present on the node.
The provisioner's topology cache can capture an incomplete pvcNodeStore entry when additional labels are patched on after node creation. On later reconciles the cache is never updated with the newer labels, because the lookup is a cache hit for the node (code ref). This causes provisioning to keep failing until the controller is restarted.
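The stale-cache behaviour described above can be illustrated with a minimal sketch. This is not the external-provisioner's code: labelCache, getStale, getFresh, hasAllKeys and the label keys are hypothetical names chosen for illustration, and the real cache may be structured differently.

```go
package main

import "fmt"

// labelCache is a hypothetical stand-in for a per-node label cache.
type labelCache struct {
	entries map[string]map[string]string
}

func newLabelCache() *labelCache {
	return &labelCache{entries: map[string]map[string]string{}}
}

// copyLabels snapshots a label map so later changes to the source are not shared.
func copyLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// getStale stores the labels the first time a node is seen and returns that
// snapshot on every later call, ignoring the live labels on a cache hit.
func (c *labelCache) getStale(node string, live map[string]string) map[string]string {
	if cached, ok := c.entries[node]; ok {
		return cached
	}
	c.entries[node] = copyLabels(live)
	return c.entries[node]
}

// getFresh overwrites the cached entry with the live labels on every call.
func (c *labelCache) getFresh(node string, live map[string]string) map[string]string {
	c.entries[node] = copyLabels(live)
	return c.entries[node]
}

// hasAllKeys reports whether every topology key is present in labels.
func hasAllKeys(labels map[string]string, keys []string) bool {
	for _, k := range keys {
		if _, ok := labels[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	keys := []string{"topology.example.com/zone", "topology.example.com/rack"}
	partial := map[string]string{"topology.example.com/zone": "z1"}
	full := map[string]string{
		"topology.example.com/zone": "z1",
		"topology.example.com/rack": "r1",
	}

	c := newLabelCache()
	fmt.Println(hasAllKeys(c.getStale("node-a", partial), keys)) // false: label not yet added
	fmt.Println(hasAllKeys(c.getStale("node-a", full), keys))    // false: cache hit hides the new label
	fmt.Println(hasAllKeys(c.getFresh("node-a", full), keys))    // true: entry refreshed from live labels
}
```

Refreshing on every lookup is only one possible remedy. Invalidating the entry when the Node object changes would serve the same purpose.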
Additionally, the full selectedNodeLabels is never logged in the error message
fmt.Errorf("topology labels from selected node %v does not match topology keys from CSINode %v", selectedNodeLabels, topologyKeys)
even if some of the topology labels exist. This is because extractTopologyTerm returns nil as selectedNodeLabels if any topology label is missing from the Node.
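The logging gap can be sketched as follows. The functions below are hypothetical and only mimic the behaviour described in this report; they are not the actual extractTopologyTerm implementation, and the error wording in mismatchError is illustrative. Note that fmt prints a nil map as map[], which is consistent with the map[] seen in the error.

```go
package main

import (
	"fmt"
	"sort"
)

// extractAllOrNothing mimics the reported behaviour: it returns nil as soon
// as any topology key is missing, discarding the labels already found.
func extractAllOrNothing(nodeLabels map[string]string, topologyKeys []string) map[string]string {
	term := map[string]string{}
	for _, k := range topologyKeys {
		v, ok := nodeLabels[k]
		if !ok {
			return nil
		}
		term[k] = v
	}
	return term
}

// extractWithMissing is a hypothetical alternative that returns the labels it
// found together with a sorted list of the keys that were missing.
func extractWithMissing(nodeLabels map[string]string, topologyKeys []string) (map[string]string, []string) {
	found := map[string]string{}
	var missing []string
	for _, k := range topologyKeys {
		if v, ok := nodeLabels[k]; ok {
			found[k] = v
		} else {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return found, missing
}

// mismatchError builds an illustrative error that names what was found and
// what was missing.
func mismatchError(found map[string]string, missing []string) error {
	return fmt.Errorf("selected node labels %v are missing topology keys %v", found, missing)
}

func main() {
	labels := map[string]string{"zone": "z1"}
	keys := []string{"zone", "rack"}

	fmt.Printf("%v\n", extractAllOrNothing(labels, keys)) // prints map[]

	found, missing := extractWithMissing(labels, keys)
	fmt.Println(mismatchError(found, missing))
}
```

Returning the partial result alongside the missing keys would let the error message show which label is absent instead of an empty map.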
What you expected to happen:
The provisioner should not be blocked by stale cached data. After the necessary topology labels are added to the Node, provisioning should succeed.
How to reproduce it:
To reproduce this deterministically:
- Launch a Node with all required topology labels
- Remove one topology label (ensure nothing syncs it back on)
- Create a StatefulSet (which in turn triggers volume provisioning)
- The PVC will remain unbound. If the provisioner logs show
error generating accessibility requirements: topology labels from selected node map[] does not match topology keys from CSINode
you've hit the race condition
- Add the topology label back onto the Node. The PVC will continue to remain unbound
Anything else we need to know?:
I'll be happy to submit a PR for this
Environment:
- Driver version:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others: