Cleanup job pod can be assigned to incorrect node #1453

@dkwon17

Description

The cleanup job pod can be assigned to the incorrect node.

To summarize, the common PVC cleanup pod should be scheduled on the same node as a running workspace pod that mounts the PVC, if such a pod exists.

This issue happens when the workspace pod is scheduled on a different node than the one named in the claim-devworkspace PVC's volume.kubernetes.io/selected-node annotation.
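
To check whether a cluster is in this state, the two node names can be compared directly. The following is a minimal sketch, not part of the operator: the helper names pvc_selected_node, pod_node and compare_nodes are hypothetical, and it assumes oc is logged in and pointed at the workspace namespace.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of devworkspace-operator.

# Print the node recorded in the PVC's selected-node annotation.
# $1: PVC name (claim-devworkspace in this issue)
pvc_selected_node() {
  oc get pvc "$1" \
    -o jsonpath='{.metadata.annotations.volume\.kubernetes\.io/selected-node}'
}

# Print the node a pod is actually scheduled on.
# $1: pod name
pod_node() {
  oc get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# Compare two node names and report whether they agree.
# $1: node from the PVC annotation, $2: node from the pod spec
compare_nodes() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "mismatch: PVC selected-node=$1, pod node=$2"
    return 1
  fi
}
```

For example, compare_nodes "$(pvc_selected_node claim-devworkspace)" "$(pod_node <workspace-pod-name>)" reports a mismatch when the condition described above holds.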

How To Reproduce

  1. On a multinode cluster, create a devworkspace:
$ curl https://raw.githubusercontent.com/devfile/devworkspace-operator/refs/heads/main/samples/code-latest.yaml | oc apply -f -

This will create a workspace pod, and a PVC named claim-devworkspace with:

metadata:
  annotations:
    volume.kubernetes.io/selected-node: <node-name>
  2. Create additional devworkspaces named code-latest-2, code-latest-3, code-latest-4, etc. until two workspace pods are scheduled on a node <node-name-*> that is different from <node-name> in the previous step. These workspace pods will not be Running because of a PVC attach error.
$ curl https://raw.githubusercontent.com/devfile/devworkspace-operator/refs/heads/main/samples/code-latest.yaml | yq '.metadata.name = "code-latest-2"' | oc apply -f -

$ curl https://raw.githubusercontent.com/devfile/devworkspace-operator/refs/heads/main/samples/code-latest.yaml | yq '.metadata.name = "code-latest-3"' | oc apply -f -

$ curl https://raw.githubusercontent.com/devfile/devworkspace-operator/refs/heads/main/samples/code-latest.yaml | yq '.metadata.name = "code-latest-4"' | oc apply -f -

For the sake of clarity, let's assume that code-latest-3 and code-latest-4 have their pods scheduled on <node-name-*>.

  3. Stop all devworkspaces except for code-latest-3 and code-latest-4. This will fix the PVC attach error, and the two workspace pods will enter the Running state.

  4. Delete the code-latest-3 devworkspace. Since the PVC's volume.kubernetes.io/selected-node is <node-name> instead of <node-name-*>, there will be a multi-attach error for the common PVC cleanup pod.
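
One way to confirm the multi-attach error after the last step is to look at the attach-failure events in the namespace. This is a rough sketch: the helper names list_attach_failures and filter_multi_attach are hypothetical, and it assumes the failure surfaces as a Kubernetes FailedAttachVolume event whose message begins with "Multi-Attach error", which is how the attach/detach controller normally reports it.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of devworkspace-operator.

# List FailedAttachVolume events in the current namespace,
# one per line, as "<pod name><TAB><event message>".
list_attach_failures() {
  oc get events --field-selector reason=FailedAttachVolume \
    -o jsonpath='{range .items[*]}{.involvedObject.name}{"\t"}{.message}{"\n"}{end}'
}

# Read event lines on stdin and keep only the Multi-Attach errors.
# "|| true" keeps the exit status zero when nothing matches.
filter_multi_attach() {
  grep 'Multi-Attach error' || true
}
```

Running list_attach_failures | filter_multi_attach should then print a line for the cleanup pod if it was placed on a node that cannot attach the volume.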

Expected behavior

The common PVC cleanup pod should be scheduled on <node-name-*>.
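
The expected selection rule can be sketched as a small shell function. This only illustrates the rule stated in this issue and is not the operator's implementation: the function name pick_cleanup_node is hypothetical, and falling back to the PVC's selected-node annotation when no running workspace pod mounts the PVC is an assumption.

```shell
#!/bin/sh
# Hypothetical illustration of the expected rule; not operator code.

# Choose the node for the common PVC cleanup pod.
# $1: value of the PVC's volume.kubernetes.io/selected-node annotation
# $2...: nodes of Running workspace pods that mount the PVC (may be empty)
pick_cleanup_node() {
  fallback=$1
  shift
  if [ "$#" -gt 0 ]; then
    # A running workspace pod mounts the PVC: use that pod's node.
    echo "$1"
  else
    # Assumed fallback: no running pod, so use the annotation.
    echo "$fallback"
  fi
}
```

With the names from the reproduction steps, pick_cleanup_node "<node-name>" "<node-name-*>" prints <node-name-*>, while pick_cleanup_node "<node-name>" with no running pods prints <node-name>.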

Additional context

Here is an example where the workspace pod is scheduled on ip-10-0-98-78.us-west-2.compute.internal, but the claim-devworkspace PVC's volume.kubernetes.io/selected-node is ip-10-0-101-28.us-west-2.compute.internal:

output.mp4
