This guide walks through deploying the karpenter controller to an AKS Flex cluster and using Karpenter to automatically provision and deprovision cloud nodes. By the end you will have:
- The karpenter controller running in the cluster
- `NodeClass` and `NodePool` resources configured for Azure and/or Nebius compute instances
- Workloads that trigger automatic node scale-up
- An understanding of how to scale down and clean up provisioned nodes
Karpenter watches for unschedulable pods and automatically provisions new nodes to meet demand. The karpenter controller extends upstream Karpenter with support for multiple cloud providers:
- Azure (`AKSNodeClass`) — provisions Azure VMs directly into the cluster's node resource group, joining the existing AKS cluster.
- Nebius (`NebiusNodeClass`) — provisions Nebius VMs that join the AKS cluster as worker nodes over WireGuard or Unbounded CNI.
- AKS Flex CLI -- installed and configured with a `.env` file. See CLI Setup.
- AKS cluster -- an AKS cluster provisioned via the CLI. For Nebius nodes, the cluster must also have WireGuard or Unbounded CNI enabled for cross-cloud connectivity. See AKS Cluster Setup.
- Nebius service account credentials (Nebius only) -- a Nebius credentials JSON file for the karpenter controller. See the Nebius authorized keys documentation.
- Helm -- required for installing the karpenter chart.
Ensure your `.env` file contains the standard Azure settings:

```shell
export LOCATION=southcentralus
export AZURE_SUBSCRIPTION_ID=<your-subscription-id>
export RESOURCE_GROUP_NAME=rg-aks-flex-<username>
export CLUSTER_NAME=aks
```

The CLI resolves all Helm chart values from these environment variables and the live AKS cluster. No additional Karpenter-specific environment variables are required.
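As a quick sanity check before generating Helm values, you can confirm that each required variable resolves in your shell. A minimal sketch (the values below are examples; substitute your own):

```shell
# Example values only: substitute your real settings, or source your .env file instead.
export LOCATION=southcentralus
export AZURE_SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000
export RESOURCE_GROUP_NAME=rg-aks-flex-demo
export CLUSTER_NAME=aks

# Fail fast if any required variable is empty or unset.
for v in LOCATION AZURE_SUBSCRIPTION_ID RESOURCE_GROUP_NAME CLUSTER_NAME; do
  eval "val=\${$v:-}"
  [ -n "$val" ] || { echo "$v is not set" >&2; exit 1; }
  echo "$v=$val"
done
```

In practice you would `source .env` first rather than exporting values inline.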
Create the karpenter namespace:

```shell
$ kubectl create namespace karpenter
```

The karpenter controller needs Nebius API credentials to provision VMs. The credentials file is a JSON file generated by the Nebius console (see the Nebius authorized keys documentation).
Note the local path to this file — you will pass it to the CLI in step 4 via `--nebius-credentials-file`. The chart creates the `nebius-credentials` Secret in the `karpenter` namespace automatically during `helm upgrade --install`; no separate `kubectl create secret` step is needed.
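For reference, the Secret the chart renders is roughly this shape. This is an illustrative sketch only (the data key name is an assumption; the chart's template is authoritative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: nebius-credentials
  namespace: karpenter
type: Opaque
stringData:
  credentials.json: |    # key name illustrative
    { "comment": "contents of the Nebius credentials JSON file" }
```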
The AKS ARM template (`aks-flex-cli aks deploy`) automatically provisions a user-assigned managed identity named `karpenter-flex` and assigns the following roles:
| Role | Scope | Purpose |
|---|---|---|
| Network Contributor | Resource group | VNET GUID resolution at startup, subnet join when creating NICs |
| Virtual Machine Contributor | Node resource group | VM lifecycle — create and delete Azure VMs |
| Network Contributor | Node resource group | NIC lifecycle — create and delete NICs for provisioned VMs |
| Managed Identity Operator | Node resource group | Assign managed identities to provisioned VMs |
The template also creates a federated identity credential that pairs the managed identity with the AKS cluster's OIDC issuer, granting access to the karpenter service account in the karpenter namespace. This enables workload identity — no manual role assignment steps are required.
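Concretely, workload identity hinges on the chart annotating its service account with the identity's client ID (the same annotation that appears in the generated values file below). Rendered, the service account looks roughly like this sketch:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: karpenter
  namespace: karpenter
  annotations:
    azure.workload.identity/client-id: "<karpenter-flex-client-id>"
```

The federated identity credential ties this service account (via the cluster's OIDC issuer) to the `karpenter-flex` managed identity, so no client secret is needed.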
Use the CLI to generate a `karpenter_values.yaml` file with all required values pre-populated. Pass `--nebius-credentials-file` to have the chart create the `nebius-credentials` Secret automatically, and `--ssh-public-key-file` to embed the SSH public key used when bootstrapping provisioned nodes:

```shell
$ aks-flex-cli config karpenter helm \
    --nebius-credentials-file ~/.nebius/credentials.json \
    --ssh-public-key-file ~/.ssh/id_ed25519.pub
```

The command reads both files, embeds their contents into `karpenter_values.yaml`, and prints the install command to stdout:

```shell
helm upgrade --install karpenter charts/karpenter \
  --namespace karpenter --create-namespace \
  --values karpenter_values.yaml
```
The generated `karpenter_values.yaml` looks like:

```yaml
# Karpenter Helm values — generated by: aks-flex config karpenter helm
settings:
  clusterName: "aks"
  clusterEndpoint: "https://aks-xxxx.hcp.eastus2.azmk8s.io:443"
logLevel: debug
replicas: 1
serviceAccount:
  annotations:
    azure.workload.identity/client-id: "<karpenter-flex-client-id>"
podLabels:
  azure.workload.identity/use: "true"
controller:
  nebiusCredentials:
    enabled: true
  image:
    digest: ""
  env:
    - name: ARM_CLOUD
      value: "AzurePublicCloud"
    - name: LOCATION
      value: "southcentralus"
    - name: ARM_RESOURCE_GROUP
      value: "rg-aks-flex-<username>"
    - name: AZURE_TENANT_ID
      value: "<tenant-id>"
    - name: AZURE_CLIENT_ID
      value: "<karpenter-flex-client-id>"
    - name: AZURE_SUBSCRIPTION_ID
      value: "<subscription-id>"
    - name: AZURE_NODE_RESOURCE_GROUP
      value: "<node-resource-group>"
    - name: SSH_PUBLIC_KEY
      value: "ssh-key-not-set"
    - name: VNET_SUBNET_ID
      value: "/subscriptions/.../subnets/aks"
    - name: KUBELET_BOOTSTRAP_TOKEN
      value: "<token-id>.<token-secret>"
    - name: DISABLE_LEADER_ELECTION
      value: "false"
```

If any value cannot be resolved (e.g. the cluster is not reachable), it is replaced with `<replace-with-actual-value>`. Edit the file before running the install command.
Run the install from the `karpenter/` directory:

```shell
$ helm upgrade --install karpenter charts/karpenter \
    --namespace karpenter --create-namespace \
    --values karpenter_values.yaml
```

By default the values file is written to `karpenter_values.yaml` in the current directory. Use `--output` to write it elsewhere:
```shell
$ aks-flex-cli config karpenter helm \
    --nebius-credentials-file ~/.nebius/credentials.json \
    --ssh-public-key-file ~/.ssh/id_ed25519.pub \
    --output /path/to/my-values.yaml
```

To use a custom controller image instead of the chart default, pass the `--image` flag:
```shell
$ aks-flex-cli config karpenter helm \
    --nebius-credentials-file ~/.nebius/credentials.json \
    --ssh-public-key-file ~/.ssh/id_ed25519.pub \
    --image myregistry.io/karpenter:v0.2.0
```

This adds `controller.image.repository` and `controller.image.tag` entries to the generated values file.
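Based on that description, the added entries take roughly this shape in the generated values file (exact rendering may differ):

```yaml
controller:
  image:
    repository: myregistry.io/karpenter
    tag: v0.2.0
```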
Verify the controller pod is running:

```shell
$ kubectl -n karpenter get pods
NAME                         READY   STATUS    RESTARTS      AGE
karpenter-6b55df659d-m2d5g   1/1     Running   7 (13m ago)   20m
```

With the karpenter controller running, you can define an `AKSNodeClass` and `NodePool` to provision Azure VMs directly into the cluster's node resource group.
The `AKSNodeClass` defines the Azure-specific configuration for provisioned nodes:

```shell
$ kubectl apply -f examples/azure/nodeclass.yaml
```

Verify the node class is ready:

```shell
$ kubectl get aksnodeclass
NAME    READY   AGE
azure   True    5s
```

The NodePool defines scheduling constraints and references the `AKSNodeClass`:

```shell
$ kubectl apply -f examples/azure/cpu_nodepool.yaml
```

Verify the node pool is ready:
```shell
$ kubectl get nodepool
NAME                 NODECLASS   NODES   READY   AGE
azure-cpu-nodepool   azure       0       True    4s
```

For GPU workloads, create a NodePool that pins to a specific GPU SKU via `node.kubernetes.io/instance-type`:

```shell
$ kubectl apply -f examples/azure/gpu_nodepool.yaml
```

Both node pools should now be ready:
```shell
$ kubectl get nodepool
NAME                 NODECLASS   NODES   READY   AGE
azure-cpu-nodepool   azure       0       True    4s
azure-gpu-nodepool   azure       0       True    2s
```

Create a deployment that schedules pods away from system nodes:

```shell
$ kubectl apply -f examples/azure/cpu_deployment.yaml
```

Karpenter detects the unschedulable pod and creates a NodeClaim:
```shell
$ kubectl get nodeclaims
NAME                       TYPE   CAPACITY   ZONE   NODE                           READY   AGE
azure-cpu-nodepool-6rhlk                            aks-azure-cpu-nodepool-6rhlk   True    2m
```

Note: GPU workloads require an NVIDIA plugin to advertise GPU resources. Install one with the CLI before creating GPU workloads:
```shell
# NVIDIA GPU Device Plugin (standard resource-based allocation)
aks-flex-cli aks deploy --nvidia-device-plugin --skip-arm

# NVIDIA DRA Driver (Dynamic Resource Allocation)
aks-flex-cli aks deploy --nvidia-dra-driver --skip-arm
```
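The SKU pinning described above also shows up on the workload side. A minimal sketch of a pod template targeting a specific GPU SKU (the SKU name and image are illustrative assumptions, not taken from the `examples/` files):

```yaml
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: Standard_NC24ads_A100_v4   # illustrative GPU SKU
  containers:
    - name: gpu-app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04                 # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1   # advertised only after the NVIDIA plugin above is installed
```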
With the karpenter controller running, you can define a `NebiusNodeClass` and NodePool to tell Karpenter how and when to provision Nebius nodes.
The `NebiusNodeClass` defines the Nebius-specific configuration for provisioned nodes:

```shell
$ kubectl apply -f examples/nebius/nodeclass.yaml
```

Verify the node class is ready:
```shell
$ kubectl get nebiusnodeclass
NAME     READY   AGE
nebius   True    3s
```

Note: The `wireguardPeerCIDR` field in the `NebiusNodeClass` is only required when using WireGuard for cross-cloud connectivity. When using Unbounded CNI, this field should not be set.
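To illustrate where that field lives, a hypothetical `NebiusNodeClass` sketch (the API group/version and field layout are assumptions; `examples/nebius/nodeclass.yaml` is authoritative):

```yaml
apiVersion: karpenter.nebius.sh/v1alpha1   # illustrative group/version
kind: NebiusNodeClass
metadata:
  name: nebius
spec:
  wireguardPeerCIDR: 172.16.0.0/16   # set only when using WireGuard; omit for Unbounded CNI
```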
The NodePool defines scheduling constraints and references the `NebiusNodeClass`:

```shell
$ kubectl apply -f examples/nebius/cpu_nodepool.yaml
```

Verify the node pool is ready:

```shell
$ kubectl get nodepool
NAME                  NODECLASS   NODES   READY   AGE
nebius-cpu-nodepool   nebius      0       True    4s
```

For GPU workloads, create a separate NodePool that does not restrict by CPU SKU. The `karpenter.azure.com/sku-cpu` label is not present on GPU instance types, so the CPU NodePool's `Gt` requirement would prevent GPU instances from ever being selected. The GPU NodePool omits that constraint and relies on the workload's node affinity (via `node.kubernetes.io/instance-type`) to select the appropriate GPU instance:
```shell
$ kubectl apply -f examples/nebius/gpu_nodepool.yaml
```

Both node pools should now be ready:

```shell
$ kubectl get nodepool
NAME                  NODECLASS   NODES   READY   AGE
nebius-cpu-nodepool   nebius      0       True    4s
nebius-gpu-nodepool   nebius      0       True    2s
```

Create a deployment that schedules pods away from system nodes. Karpenter will detect the unschedulable pods and provision a new Nebius node:

```shell
$ kubectl apply -f examples/nebius/cpu_deployment.yaml
```

The pod starts out Pending while Karpenter provisions a new node, then transitions to Running once the node joins:
```shell
$ kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
sample-cpu-app-6c7bb4ccb-wbl5h   1/1     Running   0          9m51s
```

Karpenter creates a NodeClaim to request a new node from Nebius:

```shell
$ kubectl get nodeclaims
NAME                        TYPE                 CAPACITY    ZONE   NODE                                 READY   AGE
nebius-cpu-nodepool-6g8v8   cpu-d3-16vcpu-64gb   on-demand   1      computeinstance-e00a4p0rrnms9n24jp   True    9m35s
```

After a few minutes, the new Nebius node should appear:
```shell
$ kubectl get nodes -o wide
NAME                                 STATUS   ROLES    AGE     VERSION   INTERNAL-IP    EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
aks-system-94214615-vmss000000       Ready    <none>   3h13m   v1.34.2   172.16.1.4     <none>          Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
aks-system-94214615-vmss000001       Ready    <none>   3h13m   v1.34.2   172.16.1.5     <none>          Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
aks-system-94214615-vmss000002       Ready    <none>   3h13m   v1.34.2   172.16.1.6     <none>          Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
aks-wireguard-23306360-vmss000000    Ready    <none>   3h9m    v1.34.2   172.16.2.4     20.91.194.208   Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
computeinstance-e00a4p0rrnms9n24jp   Ready    <none>   8m30s   v1.33.3   100.96.1.237   <none>          Ubuntu 24.04.4 LTS   6.11.0-1016-nvidia   containerd://2.0.4
```

The pod is now running on the Nebius node:
```shell
$ kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE                                 NOMINATED NODE   READINESS GATES
sample-cpu-app-6c7bb4ccb-wbl5h   1/1     Running   0          10m   10.0.10.159   computeinstance-e00a4p0rrnms9n24jp   <none>           <none>
```

Check the pod logs to confirm the application started:

```shell
$ kubectl logs -f sample-cpu-app-6c7bb4ccb-wbl5h
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2026/02/27 20:37:12 [notice] 1#1: using the "epoll" event method
2026/02/27 20:37:12 [notice] 1#1: nginx/1.21.6
2026/02/27 20:37:12 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
2026/02/27 20:37:12 [notice] 1#1: OS: Linux 6.11.0-1016-nvidia
2026/02/27 20:37:12 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
```

For GPU workloads, create a deployment that requests GPU resources and targets GPU instance types:
```shell
$ kubectl apply -f examples/nebius/gpu_deployment.yaml
```

The GPU pod will be pending while Karpenter provisions a GPU node:

```shell
$ kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
sample-cpu-app-6c7bb4ccb-wbl5h    1/1     Running   0          11m
sample-gpu-app-5d8b85c989-5l9zt   0/1     Pending   0          5s
```

Karpenter creates a new NodeClaim for the GPU instance:
```shell
$ kubectl get nodeclaims
NAME                        TYPE                               CAPACITY    ZONE   NODE                                 READY     AGE
nebius-cpu-nodepool-6g8v8   cpu-d3-16vcpu-64gb                 on-demand   1      computeinstance-e00a4p0rrnms9n24jp   True      11m
nebius-gpu-nodepool-r2qwq   gpu-h100-sxm-8gpu-128vcpu-1600gb   on-demand                                                Unknown   16s
```

Note: GPU workloads require an NVIDIA plugin to be installed so that GPU resources are advertised to the scheduler. Install one with the CLI before creating GPU workloads:

```shell
# NVIDIA GPU Device Plugin (standard resource-based allocation)
aks-flex-cli aks deploy --nvidia-device-plugin --skip-arm

# NVIDIA DRA Driver (Dynamic Resource Allocation)
aks-flex-cli aks deploy --nvidia-dra-driver --skip-arm
```
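The CPU-vs-GPU NodePool split described earlier comes down to the requirements lists. A sketch of the contrast (values illustrative; the `examples/nebius/*_nodepool.yaml` files are authoritative):

```yaml
# CPU pool: only instance types that advertise karpenter.azure.com/sku-cpu qualify,
# which excludes GPU SKUs that lack the label.
requirements:
  - key: karpenter.azure.com/sku-cpu
    operator: Gt
    values: ["3"]               # illustrative threshold

# GPU pool: no sku-cpu requirement; the workload's instance-type
# node affinity selects the appropriate GPU SKU instead.
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]
```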
After the GPU node is provisioned, both nodes and pods should be running:

```shell
$ kubectl get nodes
NAME                                 STATUS   ROLES    AGE     VERSION
aks-system-94214615-vmss000000       Ready    <none>   3h19m   v1.34.2
aks-system-94214615-vmss000001       Ready    <none>   3h19m   v1.34.2
aks-system-94214615-vmss000002       Ready    <none>   3h18m   v1.34.2
aks-wireguard-23306360-vmss000000    Ready    <none>   3h14m   v1.34.2
computeinstance-e00a4p0rrnms9n24jp   Ready    <none>   13m     v1.33.3
computeinstance-e00zjdx1e50bxcfekk   Ready    <none>   107s    v1.33.3
```

```shell
$ kubectl get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP            NODE                                 NOMINATED NODE   READINESS GATES
sample-cpu-app-6c7bb4ccb-7dg9t    1/1     Running   0          75s     10.0.12.199   computeinstance-e00zjdx1e50bxcfekk   <none>           <none>
sample-gpu-app-5d8b85c989-5l9zt   1/1     Running   0          4m17s   10.0.12.66    computeinstance-e00zjdx1e50bxcfekk   <none>           <none>
```

Check the GPU pod's logs; the sample app runs nvidia-smi, confirming the GPU is attached:

```shell
$ kubectl logs -f sample-gpu-app-76b4884cbd-m8bft
Sun Feb 22 22:05:49 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.195.03             Driver Version: 570.195.03     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:8D:00.0 Off |                    0 |
| N/A   28C    P0             68W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

When demand decreases, Karpenter automatically deprovisions nodes that are no longer needed. To test this, scale the deployments down:
```shell
$ kubectl scale deployment sample-cpu-app --replicas=0
$ kubectl scale deployment sample-gpu-app --replicas=0
```

You can observe the disruption lifecycle by describing a node claim:
```shell
$ kubectl describe nodeclaims nebius-nodepool-dfpmw
Events:
  Type    Reason                 Age                   From       Message
  ----    ------                 ----                  ----       -------
  Normal  Launched               9m15s                 karpenter  Status condition transitioned, Type: Launched, Status: Unknown -> True, Reason: Launched
  Normal  DisruptionBlocked      7m5s (x2 over 9m15s)  karpenter  Nodeclaim does not have an associated node
  Normal  Registered             5m30s                 karpenter  Status condition transitioned, Type: Registered, Status: Unknown -> True, Reason: Registered
  Normal  DisruptionBlocked      4m41s                 karpenter  Node isn't initialized
  Normal  Initialized            2m56s                 karpenter  Status condition transitioned, Type: Initialized, Status: Unknown -> True, Reason: Initialized
  Normal  Ready                  2m56s                 karpenter  Status condition transitioned, Type: Ready, Status: Unknown -> True, Reason: Ready
  Normal  DisruptionBlocked      2m35s                 karpenter  Node is nominated for a pending pod
  Normal  Unconsolidatable       111s                  karpenter  Not all pods would schedule, default/sample-gpu-app-76b4884cbd-m8bft => would schedule against uninitialized nodeclaim/nebius-nodepool-7g2rq default/sample-app-66986dd6c6-qs6gt => would schedule against uninitialized nodeclaim/nebius-nodepool-7g2rq
  Normal  DisruptionTerminating  19s                   karpenter  Disrupting NodeClaim: Underutilized
  Normal  DisruptionBlocked      19s                   karpenter  Node is deleting or marked for deletion
```

After the disruption grace period, Karpenter will terminate the idle Nebius nodes and they will be removed from the cluster:
```shell
$ kubectl get nodes
NAME                                STATUS   ROLES    AGE   VERSION
aks-system-32742974-vmss000000      Ready    <none>   18h   v1.33.6
aks-system-32742974-vmss000001      Ready    <none>   18h   v1.33.6
aks-wireguard-12237243-vmss000000   Ready    <none>   18h   v1.33.6
```

