
Nebius Cloud Integration

Overview

This guide walks through joining nodes from Nebius cloud to an existing AKS Flex cluster. By the end you will have:

  • A Nebius VPC network peered with the Azure-side network
  • A CPU node running in Nebius, joined to the AKS cluster
  • A GPU node running in Nebius, joined to the AKS cluster

The workflow uses two CLI command groups:

  • aks-flex-cli config -- generates JSON configuration templates for Nebius resources
  • aks-flex-cli plugin -- applies, queries, and deletes Nebius resources via the plugin backend

Setup

Prerequisites

  • An AKS cluster with network resources already deployed -- see AKS Cluster Setup
  • The .env file must include Nebius configuration (generate with aks-flex-cli config env --nebius):
# Nebius Config
export NEBIUS_PROJECT_ID=<your-nebius-project-id>
export NEBIUS_REGION=<your-nebius-region>
export NEBIUS_CREDENTIALS_FILE=<path-to-nebius-credentials-json>

See the Nebius authorized keys documentation for creating the credentials file.
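Since every later step reads these variables, it can help to confirm they are actually set before continuing. A minimal sketch, assuming a bash shell; `check_nebius_env` is a hypothetical helper, not part of aks-flex-cli:

```shell
# Sketch: fail fast when a required Nebius variable is unset or empty.
# check_nebius_env is a hypothetical helper, not part of aks-flex-cli.
check_nebius_env() {
  local var missing=0
  for var in NEBIUS_PROJECT_ID NEBIUS_REGION NEBIUS_CREDENTIALS_FILE; do
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical usage after generating the file:
#   . ./.env && check_nebius_env
```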

Desired Cluster Setup

We will join two Nebius nodes to the AKS cluster:

| Node | Platform | Preset | Image Family | Purpose |
| --- | --- | --- | --- | --- |
| CPU node | cpu-d3 | 4vcpu-16gb | ubuntu24.04-driverless | General-purpose workloads |
| GPU node | gpu-h100-sxm | 1gpu-16vcpu-200gb | ubuntu24.04-cuda12 | GPU-accelerated workloads |

Create Nebius Network Resources

Before creating nodes, you need to provision a VPC network in Nebius that will be connected to the Azure-side network.

Generate the network config

Use config networks to generate a default Nebius network JSON template:

$ aks-flex-cli config networks nebius > nebius-network.json

This produces a JSON file like:

{
  "metadata": {
    "type": "networks.nebius.network.Network",
    "id": "<replace-with-unique-network-name>"
  },
  "spec": {
    "projectId": "<your-nebius-project-id>",
    "region": "<your-nebius-region>",
    "vnet": {
      "cidrBlock": "172.20.0.0/16"
    }
  }
}

Review the generated file and update the placeholder values:

| Field | Description | Default |
| --- | --- | --- |
| metadata.id | Name of the network resource; must be unique within the Nebius project | nebius-default (replace with your own) |
| spec.projectId | Nebius project ID from .env | |
| spec.region | Nebius region from .env | |
| spec.vnet.cidrBlock | CIDR block for the Nebius VPC | 172.20.0.0/16 |
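One way to fill the placeholders non-interactively is a sed pass over the generated file. The sketch below recreates the template inline so it is self-contained; the network name `my-nebius-net` and the fallback project/region values are examples only, so substitute your own:

```shell
# Self-contained sketch: a copy of the generated template, then a sed
# pass that fills in the placeholders. Values here are examples.
cat > nebius-network.json <<'EOF'
{
  "metadata": {
    "type": "networks.nebius.network.Network",
    "id": "<replace-with-unique-network-name>"
  },
  "spec": {
    "projectId": "<your-nebius-project-id>",
    "region": "<your-nebius-region>",
    "vnet": { "cidrBlock": "172.20.0.0/16" }
  }
}
EOF

# Substitute placeholders, falling back to example values when the
# .env variables are not set in the current shell.
sed \
  -e 's|<replace-with-unique-network-name>|my-nebius-net|' \
  -e "s|<your-nebius-project-id>|${NEBIUS_PROJECT_ID:-example-project}|" \
  -e "s|<your-nebius-region>|${NEBIUS_REGION:-example-region}|" \
  nebius-network.json > nebius-network.json.tmp \
  && mv nebius-network.json.tmp nebius-network.json
```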

Apply the network config

Pipe the JSON into the plugin apply command:

$ cat nebius-network.json | aks-flex-cli plugin apply networks

Expected output:

2026/02/21 20:07:00 Applied "nebius-default" (type: networks.nebius.network.Network)

Verify the network

$ aks-flex-cli plugin get networks nebius-default
{
  "metadata": {
    "type": "type.googleapis.com/networks.nebius.network.Network",
    "id": "<your-network-name>"
  },
  "spec": {
    "projectId": "<your-nebius-project-id>",
    "region": "<your-nebius-region>",
    "vnet": {
      "cidrBlock": "172.20.0.0/16"
    }
  },
  "status": {
    ...
  }
}

Nebius - Azure Network Connectivity

AKS Flex uses WireGuard to establish an encrypted site-to-site tunnel between the Azure VNet and the Nebius VPC. On top of this tunnel, Cilium's VXLAN overlay is used to extend the Kubernetes pod network across clouds so that pods on Azure and Nebius nodes can communicate seamlessly.

The following diagram illustrates the connectivity:

                  Azure                                             Nebius
  ┌──────────────────────────────┐             ┌──────────────────────────────┐
  │  VNet: 172.16.0.0/16         │             │  VPC: 172.20.0.0/16          │
  │                              │             │                              │
  │  ┌────────────┐              │  WireGuard  │              ┌────────────┐  │
  │  │ AKS Node   │              │  Tunnel     │              │ Nebius VM  │  │
  │  │            │              │◄───────────►│              │            │  │
  │  └────────────┘              │  (UDP/51820)│              └────────────┘  │
  │                              │             │                              │
  │  ┌────────────┐              │             │              ┌────────────┐  │
  │  │ WireGuard  │──────────────┼─────────────┼──────────────│ WireGuard  │  │
  │  │ Gateway    │  Peer IP: 100.96.x.x       │              │ Peer       │  │
  │  └────────────┘              │             │              └────────────┘  │
  │                              │             │                              │
  │         Cilium VXLAN overlay spans across both clouds                     │
  └──────────────────────────────┘             └──────────────────────────────┘
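For orientation, a WireGuard peer entry on the Azure-side gateway resembles the following. This is an illustrative sketch only: AKS Flex generates and manages this configuration itself, and the gateway address, keys, and endpoint shown here are placeholders.

```
# /etc/wireguard/wg0.conf on the Azure-side gateway (illustrative sketch)
[Interface]
Address = <gateway-peer-ip>/12    # gateway's own address in 100.96.0.0/12
ListenPort = 51820
PrivateKey = <gateway-private-key>

[Peer]
# Nebius CPU node
PublicKey = <cpu-node-public-key>
AllowedIPs = 100.96.1.111/32      # route only this peer IP to this node
Endpoint = <nebius-node-address>:51820
```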

Peer IP assignment

Each node that participates in the WireGuard mesh is assigned a peer IP from the 100.96.0.0/12 range. This peer IP is critical because it serves as the node's address within the WireGuard tunnel and must be set as the node's InternalIP in Kubernetes. This allows kube-proxy to correctly forward service traffic to the node, since kube-proxy routes based on the node's InternalIP.

When configuring agent pools, each node must be assigned a unique peer IP from this range. For example:

| Node | Peer IP |
| --- | --- |
| CPU node | 100.96.1.111 |
| GPU node | 100.96.1.112 |
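When picking peer IPs by hand it is easy to step outside 100.96.0.0/12, which spans 100.96.0.0 through 100.111.255.255. A small bash sanity check, not part of the CLI:

```shell
# Sketch: succeed iff $1 is an IPv4 address inside 100.96.0.0/12.
# A /12 mask covers the first octet (100) plus the top 4 bits of the
# second octet, i.e. second octet 96..111.
in_wireguard_range() {
  local o1 o2 rest
  IFS=. read -r o1 o2 rest <<< "$1"
  [ "$o1" = 100 ] && [ "$o2" -ge 96 ] && [ "$o2" -le 111 ]
}

in_wireguard_range 100.96.1.111 && echo "ok"   # inside the /12, prints "ok"
```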

Create Nebius CPU Node

Generate the agent pool config

Use config agentpools to generate a default Nebius agent pool JSON template:

$ aks-flex-cli config agentpools nebius > nebius-cpu.json

This produces a JSON file like:

{
  "metadata": {
    "type": "agentpools.nebius.instance.AgentPool",
    "id": "nebius-default"
  },
  "spec": {
    "projectId": "<your-nebius-project-id>",
    "region": "<your-nebius-region>",
    "subnetId": "<replace-with-actual-value>",
    "platform": "<replace-with-actual-value>",
    "preset": "<replace-with-actual-value>",
    "imageFamily": "<replace-with-actual-value>",
    "osDiskSizeGibibytes": "128",
    "kubeadm": {
      "server": "https://<aks-cluster-fqdn>:443",
      "certificateAuthorityData": "<base64-ca-cert>",
      "token": "<bootstrap-token>",
      "nodeLabels": {
        "aks.azure.com/stretch-managed": "true",
        ...
      }
    },
    "wireguard": {
      "peerIp": "<replace-with-actual-value>"
    }
  }
}

Edit the file to configure a CPU node. Update the placeholder fields:

| Field | Value for CPU node | Description |
| --- | --- | --- |
| metadata.id | nebius-cpu | Unique name for this agent pool |
| spec.subnetId | (from Nebius network output) | Subnet ID from the network created in the previous step |
| spec.platform | cpu-d3 | Nebius compute platform |
| spec.preset | 4vcpu-16gb | VM size preset |
| spec.imageFamily | ubuntu24.04-driverless | OS image family |
| spec.wireguard.peerIp | (unique IP in 100.96.0.0/12) | WireGuard peer IP for this node (see Peer IP assignment) |

The kubeadm section is auto-populated from the running AKS cluster when the .env is configured correctly. If the cluster is not reachable, placeholder values are generated that must be replaced manually.
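Before applying, a quick grep catches any placeholders that were left unfilled, for example when the cluster was unreachable and the kubeadm section was not auto-populated. The filename matches the step above:

```shell
# Sketch: warn if any angle-bracket placeholders remain in the config.
if grep -qE '<[A-Za-z][A-Za-z0-9-]*>' nebius-cpu.json 2>/dev/null; then
  echo "nebius-cpu.json still contains placeholders; fill them in before applying" >&2
fi
```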

Apply the agent pool config

$ cat nebius-cpu.json | aks-flex-cli plugin apply agentpools

Expected output:

2026/02/21 20:10:24 Applied "nebius-cpu" (type: agentpools.nebius.instance.AgentPool)

Verify the node joined the cluster

After the node provisions and bootstraps (this may take a few minutes), verify it appears in the AKS cluster:

$ aks-flex-cli plugin get agentpools nebius-cpu
$ export KUBECONFIG=./aks.kubeconfig
$ kubectl get nodes -o wide
NAME                                 STATUS     ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
aks-system-32742974-vmss000000       Ready      <none>   40m   v1.33.6   172.16.1.4     <none>        Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
aks-system-32742974-vmss000001       Ready      <none>   40m   v1.33.6   172.16.1.5     <none>        Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
aks-wireguard-12237243-vmss000000    Ready      <none>   21m   v1.33.6   172.16.2.4     <MASKED>      Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
computeinstance-e00c3m3yvj3rhnvhan   Ready      <none>   58s   v1.33.8   100.96.1.111   <none>        Ubuntu 24.04.4 LTS   6.11.0-1016-nvidia   containerd://1.7.28

Create Nebius GPU Node

The process is the same as for the CPU node, but with GPU-specific platform and image settings.

Generate and edit the agent pool config

$ aks-flex-cli config agentpools nebius > nebius-gpu.json

Edit the file to configure a GPU node:

| Field | Value for GPU node | Description |
| --- | --- | --- |
| metadata.id | nebius-gpu | Unique name for this agent pool |
| spec.subnetId | (from Nebius network output) | Same subnet as the CPU node |
| spec.platform | (GPU platform, e.g. gpu-h100-sxm) | Nebius GPU compute platform |
| spec.preset | (GPU preset, e.g. 1gpu-16vcpu-200gb) | GPU VM size preset |
| spec.imageFamily | (GPU image, e.g. ubuntu24.04-cuda12) | OS image with GPU drivers |
| spec.wireguard.peerIp | (unique IP in 100.96.0.0/12) | WireGuard peer IP (must differ from CPU node, see Peer IP assignment) |

Apply the agent pool config

$ cat nebius-gpu.json | aks-flex-cli plugin apply agentpools

Expected output:

2026/02/21 20:16:36 Applied "nebius-gpu" (type: agentpools.nebius.instance.AgentPool)

Verify the node joined the cluster

$ aks-flex-cli plugin get agentpools nebius-gpu
$ kubectl get nodes -o wide
NAME                                 STATUS   ROLES    AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
aks-system-32742974-vmss000000       Ready    <none>   50m     v1.33.6   172.16.1.4     <none>        Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
aks-system-32742974-vmss000001       Ready    <none>   50m     v1.33.6   172.16.1.5     <none>        Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
aks-wireguard-12237243-vmss000000    Ready    <none>   31m     v1.33.6   172.16.2.4     <MASKED>      Ubuntu 22.04.5 LTS   5.15.0-1102-azure    containerd://1.7.30-1
computeinstance-e00c3m3yvj3rhnvhan   Ready    <none>   9m57s   v1.33.8   100.96.1.111   <none>        Ubuntu 24.04.4 LTS   6.11.0-1016-nvidia   containerd://1.7.28
computeinstance-e00vm3hfp0gac4e5vz   Ready    <none>   117s    v1.33.8   100.96.1.112   <none>        Ubuntu 24.04.4 LTS   6.11.0-1016-nvidia   containerd://1.7.28

Validating cross-cloud connectivity

With the WireGuard tunnel and Cilium VXLAN overlay in place, pods running on the Nebius nodes should be able to communicate with pods on the AKS nodes, and vice versa. One quick validation is to stream logs from a pod on a Nebius node: kubectl logs requires the API server to reach the kubelet at the node's InternalIP (its WireGuard peer IP), so a successful stream exercises the cross-cloud path. For example, tail kube-proxy on the GPU node:

$ export GPU_NODE_NAME="computeinstance-e00vm3hfp0gac4e5vz"
$ kubectl -n kube-system logs -f $(kubectl -n kube-system get pod --field-selector spec.nodeName=$GPU_NODE_NAME -l component=kube-proxy -o jsonpath='{.items[*].metadata.name}')
Defaulted container "kube-proxy" out of: kube-proxy, kube-proxy-bootstrap (init)
I0222 04:20:45.184240       1 server_linux.go:63] "Using iptables proxy"
I0222 04:20:45.184345       1 flags.go:64] FLAG: --bind-address="0.0.0.0"
I0222 04:20:45.184351       1 flags.go:64] FLAG: --bind-address-hard-fail="false"
I0222 04:20:45.184355       1 flags.go:64] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
I0222 04:20:45.184357       1 flags.go:64] FLAG: --cleanup="false"
I0222 04:20:45.184359       1 flags.go:64] FLAG: --cluster-cidr="10.244.0.0/16"

GPU Device Plugin

TODO: GPU workloads require a device plugin to expose GPU resources to the Kubernetes scheduler. Currently this must be installed manually. Document the steps for installing the NVIDIA device plugin (or GPU operator) on the Nebius GPU node once the process is finalized.

Clean up resources

To remove the Nebius nodes and network, delete them in reverse order: agent pools first, then the network.
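The two delete steps below can be wrapped in a small helper that enforces this ordering; the pool and network names match those used earlier in this guide, and `cleanup_nebius` is our own wrapper, not a CLI command:

```shell
# Sketch: delete agent pools before the network they depend on.
# cleanup_nebius is a hypothetical wrapper around the plugin commands.
cleanup_nebius() {
  local pool
  for pool in nebius-gpu nebius-cpu; do
    aks-flex-cli plugin delete agentpools "$pool" || return 1
  done
  aks-flex-cli plugin delete networks nebius-default
}
```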

Delete agent pools

$ aks-flex-cli plugin delete agentpools nebius-gpu
$ aks-flex-cli plugin delete agentpools nebius-cpu

Expected output:

...
2026/02/21 20:29:19 Deleting "nebius-cpu"...
2026/02/21 20:30:47 Successfully deleted "nebius-cpu"

Verify that the nodes now show NotReady in Kubernetes, indicating that their kubelets have stopped reporting.

$ kubectl get nodes
NAME                                 STATUS     ROLES    AGE   VERSION
aks-system-32742974-vmss000000       Ready      <none>   58m   v1.33.6
aks-system-32742974-vmss000001       Ready      <none>   58m   v1.33.6
aks-wireguard-12237243-vmss000000    Ready      <none>   39m   v1.33.6
computeinstance-e00c3m3yvj3rhnvhan   NotReady   <none>   18m   v1.33.8
computeinstance-e00vm3hfp0gac4e5vz   NotReady   <none>   10m   v1.33.8

Delete the network

$ aks-flex-cli plugin delete networks nebius-default

Expected output:

2026/02/21 20:31:35 Deleting "nebius-default"...
2026/02/21 20:31:37 Successfully deleted "nebius-default"

List remaining resources

Confirm all Nebius resources are cleaned up:

$ aks-flex-cli plugin get networks
[]
$ aks-flex-cli plugin get agentpools
[]

Both commands should return empty lists.