Pull request overview
Adds a new AKS website blog post explaining how to use DRANET + Dynamic Resource Allocation (DRA) for NUMA-aware GPU/NIC alignment (RDMA performance) and includes accompanying control-plane/data-plane diagrams (Mermaid sources + exported SVGs).
Changes:
- New blog post: RDMA/NUMA scheduling problem statement, DRANET architecture, DRA ResourceClaimTemplate examples, and NCCL benchmark walkthrough/results.
- Added control-plane and data-plane diagrams as both `.mmd` sources and rendered `.svg` assets.
Reviewed changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| website/blog/2026-04-01-dranet-rdma-optimization-for-ai-on-aks/index.md | New blog post content + configuration examples + benchmark walkthrough |
| website/blog/2026-04-01-dranet-rdma-optimization-for-ai-on-aks/control-plane-diagram.mmd | Mermaid source for control-plane diagram |
| website/blog/2026-04-01-dranet-rdma-optimization-for-ai-on-aks/control-plane-diagram.svg | Rendered control-plane diagram used by the post |
| website/blog/2026-04-01-dranet-rdma-optimization-for-ai-on-aks/data-plane-diagram.mmd | Mermaid source for data-plane diagram |
| website/blog/2026-04-01-dranet-rdma-optimization-for-ai-on-aks/data-plane-diagram.svg | Rendered data-plane diagram used by the post |
website/blog/2026-04-01-dranet-rdma-optimization-for-ai-on-aks/index.md
|
|
||
| Large-scale AI training and inferencing on Kubernetes depends on high-throughput, low-latency GPU-to-GPU communication. [DRANET](https://github.com/kubernetes-sigs/dranet) is an open-source DRA network driver that discovers RDMA capable devices, exposes their topology as Kubernetes DRA attributes, and injects only desired devices into each container. Combined with the [NVIDIA GPU DRA driver](https://github.com/kubernetes-purgatory/nvidia-dra-driver-gpu), it enables topology-aware co-scheduling of GPUs and NICs to deliver high-performance networking for demanding applications in Kubernetes. | ||
|
|
||
| In previous post, we covered [fundamental DRA concepts](/2025/11/17/dra-devices-and-drivers-on-kubernetes). In this post, we walk through how DRANET works on [AKS 1.34](https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/) with [ND GB300-v6](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic) nodes, demonstrate three NUMA (Non-uniform memory access) alignment scenarios, and show the benchmark results. |
Grammar: "In previous post" is missing an article. Consider changing it to "In a previous post, we covered..." (or similar).
| In previous post, we covered [fundamental DRA concepts](/2025/11/17/dra-devices-and-drivers-on-kubernetes). In this post, we walk through how DRANET works on [AKS 1.34](https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/) with [ND GB300-v6](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic) nodes, demonstrate three NUMA (Non-uniform memory access) alignment scenarios, and show the benchmark results. | |
| In a previous post, we covered [fundamental DRA concepts](/2025/11/17/dra-devices-and-drivers-on-kubernetes). In this post, we walk through how DRANET works on [AKS 1.34](https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/) with [ND GB300-v6](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic) nodes, demonstrate three NUMA (Non-uniform memory access) alignment scenarios, and show the benchmark results. |
|
|
||
| Large-scale AI training and inferencing on Kubernetes depends on high-throughput, low-latency GPU-to-GPU communication. [DRANET](https://github.com/kubernetes-sigs/dranet) is an open-source DRA network driver that discovers RDMA capable devices, exposes their topology as Kubernetes DRA attributes, and injects only desired devices into each container. Combined with the [NVIDIA GPU DRA driver](https://github.com/kubernetes-purgatory/nvidia-dra-driver-gpu), it enables topology-aware co-scheduling of GPUs and NICs to deliver high-performance networking for demanding applications in Kubernetes. | ||
|
|
||
| In previous post, we covered [fundamental DRA concepts](/2025/11/17/dra-devices-and-drivers-on-kubernetes). In this post, we walk through how DRANET works on [AKS 1.34](https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/) with [ND GB300-v6](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic) nodes, demonstrate three NUMA (Non-uniform memory access) alignment scenarios, and show the benchmark results. |
The Microsoft Learn URL uses a locale-specific path (/en-us/). Repo blog guidance recommends using locale-agnostic Learn links (no /en-us/) to avoid unnecessary redirects and keep links consistent.
| In previous post, we covered [fundamental DRA concepts](/2025/11/17/dra-devices-and-drivers-on-kubernetes). In this post, we walk through how DRANET works on [AKS 1.34](https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/) with [ND GB300-v6](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic) nodes, demonstrate three NUMA (Non-uniform memory access) alignment scenarios, and show the benchmark results. | |
| In previous post, we covered [fundamental DRA concepts](/2025/11/17/dra-devices-and-drivers-on-kubernetes). In this post, we walk through how DRANET works on [AKS 1.34](https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/) with [ND GB300-v6](https://learn.microsoft.com/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic) nodes, demonstrate three NUMA (Non-uniform memory access) alignment scenarios, and show the benchmark results. |
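The locale rule in this comment could be checked mechanically. The helper below is a minimal sketch, not part of the repo tooling: the function name `strip_learn_locale` is hypothetical, and it assumes the locale always appears as the first path segment in the `xx-yy` form.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Leading locale segment such as /en-us/ or /de-de/ (assumed xx-yy form).
_LOCALE_SEGMENT = re.compile(r"^/[a-z]{2}-[a-z]{2}(?=/)", re.IGNORECASE)


def strip_learn_locale(url: str) -> str:
    """Return a locale-agnostic Microsoft Learn URL; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.netloc.lower() != "learn.microsoft.com":
        return url
    # Drop only the first path segment, and only if it looks like a locale.
    path = _LOCALE_SEGMENT.sub("", parts.path, count=1)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Applied to the ND GB300-v6 link above, this yields the same URL as the suggested change, with the `?tabs=sizebasic` query preserved.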
| | Resource | Count | Detail | | ||
| |---|---|---| | ||
| | GPU | 4x NVIDIA GB300 | 288 GB HBM3E each, NVLink-18 all-to-all | | ||
| | NIC | 4x Mellanox ConnectX | 800 Gb/s InfiniBand each | | ||
| | NUMA nodes | 2 | 2 GPUs + 2 NICs per NUMA node | |
Several Markdown tables start with || (double leading pipe), which renders as an extra empty first column in CommonMark/Docusaurus tables. Removing the extra leading | (use | ... | ... |) will make the tables render as intended.
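To confirm whether the doubled pipe is really in `index.md` (and not just a product of how the diff is displayed), a quick scan of the source file is enough. This is a rough sketch; `rows_with_double_leading_pipe` is a hypothetical helper name, and it flags only rows that begin with `||`, leaving the intentional `| |` blank-cell form alone.

```python
import re

# A row that starts with two adjacent pipes, ignoring leading indentation.
_DOUBLE_PIPE = re.compile(r"^\s*\|\|")


def rows_with_double_leading_pipe(markdown: str) -> list[int]:
    """Return 1-based line numbers of rows that start with '||'."""
    return [
        number
        for number, line in enumerate(markdown.splitlines(), start=1)
        if _DOUBLE_PIPE.match(line)
    ]
```

An empty result for the post's source would suggest the extra column exists only in the rendered diff view.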
| - name: nic | ||
| exactly: | ||
| deviceClassName: dranet.net | ||
| count: 1 | ||
| selectors: | ||
| - cel: | ||
| expression: >- | ||
| device.attributes["dra.net"]["rdmaDevice"] == "mlx5_2" | ||
| ``` |
Same inconsistency here as in the earlier templates: deviceClassName: dranet.net should match the driver/DeviceClass identifier used elsewhere in the post (e.g., the dra.net driver/attribute namespace shown above).
9842ea5 to ff109d6
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| dra.net/numaNode: | ||
| int: 0 | ||
| dra.net/pciAddress: | ||
| string: "0101:00:00.0" | ||
| dra.net/rdma: | ||
| bool: true | ||
| dra.net/rdmaDevice: | ||
| string: mlx5_0 | ||
| dra.net/pciVendor: | ||
| string: Mellanox Technologies |
The ResourceSlice example defines attribute keys like dra.net/numaNode / dra.net/rdmaDevice with typed values (int, string, bool), but the CEL selectors later access them as device.attributes["dra.net"]["numaNode"] / ... ["rdmaDevice"] and compare directly to primitives. Please make the attribute schema and the selector syntax consistent (either update the ResourceSlice example to match the selector structure, or update selectors to reference the exact attribute keys/types shown in the ResourceSlice example).
| dra.net/numaNode: | |
| int: 0 | |
| dra.net/pciAddress: | |
| string: "0101:00:00.0" | |
| dra.net/rdma: | |
| bool: true | |
| dra.net/rdmaDevice: | |
| string: mlx5_0 | |
| dra.net/pciVendor: | |
| string: Mellanox Technologies | |
| dra.net: | |
| numaNode: | |
| int: 0 | |
| pciAddress: | |
| string: "0101:00:00.0" | |
| rdma: | |
| bool: true | |
| rdmaDevice: | |
| string: mlx5_0 | |
| pciVendor: | |
| string: Mellanox Technologies |
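Before applying this suggestion, it may be worth checking it against the upstream DRA API. As documented there, ResourceSlice attributes are keyed by qualified names (`domain/id`) with typed values, and the CEL environment exposes them grouped by domain, so `device.attributes["dra.net"]["numaNode"]` would resolve a flat `dra.net/numaNode` key and compare against the unwrapped value. The sketch below only models that mapping for illustration; `cel_attribute_view` is a hypothetical name, not a Kubernetes function.

```python
def cel_attribute_view(attributes: dict, driver: str) -> dict:
    """Group flat 'domain/id' attribute keys the way CEL selectors address them.

    Illustrative model only: names without a domain are assumed to fall
    under the driver name, and each typed wrapper ({"int": 0}) is unwrapped.
    """
    view: dict = {}
    for name, typed_value in attributes.items():
        domain, sep, attr_id = name.rpartition("/")
        if not sep:
            # Unqualified name: assume the driver name acts as the domain.
            domain, attr_id = driver, name
        (value,) = typed_value.values()  # exactly one of int/string/bool/version
        view.setdefault(domain, {})[attr_id] = value
    return view


# Attributes copied from the ResourceSlice excerpt in the diff above.
slice_attributes = {
    "dra.net/numaNode": {"int": 0},
    "dra.net/pciAddress": {"string": "0101:00:00.0"},
    "dra.net/rdma": {"bool": True},
    "dra.net/rdmaDevice": {"string": "mlx5_0"},
    "dra.net/pciVendor": {"string": "Mellanox Technologies"},
}
view = cel_attribute_view(slice_attributes, driver="dra.net")
```

Under this model, `view["dra.net"]["rdmaDevice"]` is `"mlx5_0"`, which is what the selector `device.attributes["dra.net"]["rdmaDevice"] == "mlx5_0"` compares against. If that matches the real driver output, the flat keys and the nested selector syntax may already be consistent.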
| expression: >- | ||
| device.attributes["dra.net"]["rdmaDevice"] == "mlx5_0" | ||
| ``` | ||
|
|
The CEL selector examples appear to assume attributes are nested under device.attributes["dra.net"] and directly comparable (e.g., == 0 / == true). If the published attributes follow the dra.net/<key>: {int|bool|string: ...} pattern shown earlier, these selectors won’t match as written. Please update the selector examples to the correct attribute access pattern so readers can copy/paste them successfully.
| | Resource | Count | Detail | | ||
| |---|---|---| | ||
| | GPU | 4x NVIDIA GB300 | 288 GB HBM3E each, NVLink-18 all-to-all | | ||
| | NIC | 4x Mellanox ConnectX | 800 Gb/s InfiniBand each | | ||
| | NUMA nodes | 2 | 2 GPUs + 2 NICs per NUMA node | |
The hardware-topology table is written with a leading || on each row (for example || Resource | Count | Detail |), which renders as an extra empty first column in Markdown. Convert these rows to standard table syntax (single leading |, or | | only when you intentionally need a blank header cell) so the table renders as intended.
| dra.net/numaNode: | ||
| int: 0 | ||
| dra.net/pciAddress: | ||
| string: "0101:00:00.0" | ||
| dra.net/rdma: | ||
| bool: true | ||
| dra.net/rdmaDevice: | ||
| string: mlx5_0 | ||
| dra.net/pciVendor: | ||
| string: Mellanox Technologies |
In the ResourceSlice example, attributes are shown with flat keys like dra.net/numaNode and dra.net/pciAddress, but later the CEL selectors access attributes as a nested map (device.attributes["dra.net"]["..."]). These formats are inconsistent; update the examples so the published attributes and selector expressions use the same schema.
| dra.net/numaNode: | |
| int: 0 | |
| dra.net/pciAddress: | |
| string: "0101:00:00.0" | |
| dra.net/rdma: | |
| bool: true | |
| dra.net/rdmaDevice: | |
| string: mlx5_0 | |
| dra.net/pciVendor: | |
| string: Mellanox Technologies | |
| dra.net: | |
| numaNode: | |
| int: 0 | |
| pciAddress: | |
| string: "0101:00:00.0" | |
| rdma: | |
| bool: true | |
| rdmaDevice: | |
| string: mlx5_0 | |
| pciVendor: | |
| string: Mellanox Technologies |
|
|
||
| ## ResourceClaimTemplates for topology-aware allocation | ||
|
|
||
| With both drivers publishing ResourceSlices, we can write ResourceClaimTemplates that use CEL selectors to express precise GPU-NIC co-location constraints. Each template creates a per-pod ResourceClaim that requests devices from both the `gpu.nvidia.com` and `dranet.net` DeviceClasses, filtered by attributes like NUMA node or PCI address. We define three templates to demonstrate different NUMA placement strategies. |
This section says the NIC devices come from the dranet.net DeviceClass, but elsewhere in the post the NIC driver/namespace is dra.net (for example driver: dra.net and dra.net/* attributes). Please clarify the intended identifiers (DeviceClass vs driver name) and make the examples consistent so readers can copy/paste them reliably.
| With both drivers publishing ResourceSlices, we can write ResourceClaimTemplates that use CEL selectors to express precise GPU-NIC co-location constraints. Each template creates a per-pod ResourceClaim that requests devices from both the `gpu.nvidia.com` and `dranet.net` DeviceClasses, filtered by attributes like NUMA node or PCI address. We define three templates to demonstrate different NUMA placement strategies. | |
| With both drivers publishing ResourceSlices, we can write ResourceClaimTemplates that use CEL selectors to express precise GPU-NIC co-location constraints. Each template creates a per-pod ResourceClaim that requests devices from both the `gpu.nvidia.com` DeviceClass and the DRANET NIC DeviceClass, `dranet.net`, filtered by attributes like NUMA node or PCI address. In these examples, `dranet.net` is the DeviceClass name, while `dra.net` is the DRANET driver name and attribute namespace used in the published device attributes and CEL selectors. We define three templates to demonstrate different NUMA placement strategies. |
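To make the DeviceClass versus driver-namespace distinction concrete, here is a rough sketch that assembles one such template as a Python dict and prints it as JSON. It is not one of the post's three templates: the template name `gpu-nic-numa0`, the device counts, and the NUMA selector value are illustrative assumptions, and the field layout assumes the `resource.k8s.io/v1` ResourceClaimTemplate shape used by the NIC request excerpt earlier in the diff.

```python
import json

DEVICE_CLASS_GPU = "gpu.nvidia.com"  # DeviceClass name for GPUs (from the post)
DEVICE_CLASS_NIC = "dranet.net"      # DeviceClass name for DRANET NICs (from the post)
ATTR_DOMAIN_NIC = "dra.net"          # DRANET driver name / attribute namespace


def build_claim_template(name: str, numa_node: int) -> dict:
    """Build a hypothetical template requesting one GPU and one NIC on a NUMA node."""
    nic_selector = f'device.attributes["{ATTR_DOMAIN_NIC}"]["numaNode"] == {numa_node}'
    return {
        "apiVersion": "resource.k8s.io/v1",
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": name},
        "spec": {
            "spec": {
                "devices": {
                    "requests": [
                        {
                            "name": "gpu",
                            "exactly": {"deviceClassName": DEVICE_CLASS_GPU, "count": 1},
                        },
                        {
                            "name": "nic",
                            "exactly": {
                                "deviceClassName": DEVICE_CLASS_NIC,
                                "count": 1,
                                "selectors": [{"cel": {"expression": nic_selector}}],
                            },
                        },
                    ]
                }
            }
        },
    }


template = build_claim_template("gpu-nic-numa0", numa_node=0)
print(json.dumps(template, indent=2))
```

Keeping the two identifiers in separate constants makes it harder to mix them up: `dranet.net` appears only in `deviceClassName`, and `dra.net` appears only inside the CEL expression.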