Risk of Terraform state drift when storage_profile_blob_driver_enabled is true #424

@zioproto

Introduction

When storage_profile_blob_driver_enabled is true, the blob CSI driver running in the AKS cluster creates a "Microsoft.Storage" service endpoint on the node subnet as soon as the first PersistentVolumeClaim is created. This change is made outside Terraform, so it is not tracked in the Terraform state and causes state drift: the next plan proposes removing the endpoint. Because of the dependencies between the modules, this state drift causes the cluster to be destroyed and recreated.
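
One possible way to avoid the drift is to declare the endpoint up front, so that the CSI driver has nothing left to change on the subnet. The sketch below reuses the module block from the configuration in this report and assumes that the Azure/network module version in use exposes the subnet_service_endpoints input (a map from subnet name to a list of service endpoints). It has not been verified against this exact setup.

```hcl
# Sketch only: assumes the module version in use supports subnet_service_endpoints.
module "network" {
  source              = "Azure/network/azurerm"
  vnet_name           = azurerm_resource_group.example.name
  resource_group_name = azurerm_resource_group.example.name
  address_space       = "10.52.0.0/16"
  subnet_prefixes     = ["10.52.0.0/16"]
  subnet_names        = ["system"]
  use_for_each        = true

  # Declare the endpoint that the blob CSI driver would otherwise add out of band.
  subnet_service_endpoints = {
    system = ["Microsoft.Storage"]
  }

  depends_on = [azurerm_resource_group.example]
}
```

If this works as intended, a plan run after the first PersistentVolumeClaim is created should no longer propose removing "Microsoft.Storage" from the subnet.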

Is there an existing issue for this?

  • I have searched the existing issues

Greenfield/Brownfield provisioning

greenfield

Terraform Version

1.5.5

Module Version

7.3.0

AzureRM Provider Version

v3.68.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

# Minimal code to explain the issue
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.56"
    }
  }
  required_version = ">= 1.1.0"
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "testResourceGroup"
  location = "eastus"
}

module "network" {
  source              = "Azure/network/azurerm"
  vnet_name           = azurerm_resource_group.example.name
  resource_group_name = azurerm_resource_group.example.name
  address_space       = "10.52.0.0/16"
  subnet_prefixes     = ["10.52.0.0/16"]
  subnet_names        = ["system"]
  use_for_each        = true
  depends_on          = [azurerm_resource_group.example]
}

resource "azurerm_role_assignment" "aks" {
  scope                = module.network.vnet_id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.example.identity[0].principal_id
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  default_node_pool {
    name           = "default"
    node_count     = 1
    vm_size        = "Standard_DS3_v2"
    vnet_subnet_id = module.network.vnet_subnets[0]
  }

  identity {
    type = "SystemAssigned"
  }

  storage_profile {
    blob_driver_enabled = true
  }
}

tfvars variables values

N/A

Debug Output/Panic Output

% terraform apply
azurerm_resource_group.example: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup]
module.network.data.azurerm_resource_group.network[0]: Reading...
module.network.data.azurerm_resource_group.network[0]: Read complete after 1s [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup]
module.network.azurerm_virtual_network.vnet: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup]
module.network.azurerm_subnet.subnet_for_each["system"]: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/subnets/system]
azurerm_kubernetes_cluster.example: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.ContainerService/managedClusters/example-aks1]
azurerm_role_assignment.aks: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/providers/Microsoft.Authorization/roleAssignments/918a9acd-be5c-8f85-6b01-8866e927e2e2]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.network.azurerm_subnet.subnet_for_each["system"] will be updated in-place
  ~ resource "azurerm_subnet" "subnet_for_each" {
        id                                             = "/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/subnets/system"
        name                                           = "system"
      ~ service_endpoints                              = [
          - "Microsoft.Storage",
        ]
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Expected Behaviour

After the PersistentVolumeClaim is created, terraform plan should show no changes.

Actual Behaviour

Terraform will perform the following actions:

  # module.network.azurerm_subnet.subnet_for_each["system"] will be updated in-place
  ~ resource "azurerm_subnet" "subnet_for_each" {
        id                                             = "/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/subnets/system"
        name                                           = "system"
      ~ service_endpoints                              = [
          - "Microsoft.Storage",
        ]
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Steps to Reproduce

terraform apply
az aks get-credentials --resource-group testResourceGroup --name example-aks1

Then apply the following manifest with kubectl:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: echoserver-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: azureblob-nfs-premium
  resources:
    requests:
      storage: 10Gi

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
spec:
  replicas: 1
  selector:
    matchLabels:
      run: echoserver
  template:
    metadata:
      labels:
        run: echoserver
    spec:
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: echoserver-pvc
      containers:
      - name: echoserver
        image: gcr.io/google_containers/echoserver:1.10
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: "/data"
          name: volume
        ports:
        - containerPort: 8080
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 6
          periodSeconds: 10

Once the Pod is running, the Terraform state has drifted. Run terraform apply again to confirm.
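
Another option, if the subnet is managed directly rather than through the network module, is to tell Terraform to ignore out-of-band changes to the service endpoints. The following is a minimal sketch, not a tested fix: the resource names azurerm_virtual_network.example and azurerm_subnet.system and the VNet name "example-vnet" are hypothetical, and the block assumes the azurerm_resource_group.example resource from the configuration above.

```hcl
# Hypothetical replacement for the module-managed network; names are illustrative.
resource "azurerm_virtual_network" "example" {
  name                = "example-vnet"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  address_space       = ["10.52.0.0/16"]
}

resource "azurerm_subnet" "system" {
  name                 = "system"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.52.0.0/16"]

  lifecycle {
    # The blob CSI driver adds "Microsoft.Storage" outside Terraform;
    # ignoring this attribute keeps later plans from removing it.
    ignore_changes = [service_endpoints]
  }
}
```

The trade-off is that Terraform then stops reconciling service endpoints on this subnet entirely, so any endpoint changes made in the configuration after creation would also be ignored.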

Important Factoids

No response

References

No response

Metadata

Labels: bug (Something isn't working)
Status: Done