Release 18+ Upgrade Guide Breaks Existing Deployments #1744

@jseiser

Description

I attempted to follow the upgrade guide to get to 18+. Our Terraform deployments generally run from a Jenkins worker pod that lives on the same cluster we are upgrading. The pod has a service account that uses IRSA, which grants it access to the cluster.
This all worked before the upgrade.

Reproduction

Attempt to follow the upgrade guide for 18.

Code Snippet to Reproduce

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.0.1"

  cluster_name    = format("eks-%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env)
  cluster_version = var.cluster_version

  subnet_ids = data.aws_subnet_ids.private.ids
  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  cluster_security_group_additional_rules = {
    admin_access = {
      description = "Admin ingress to Kubernetes API"
      cidr_blocks = [data.terraform_remote_state.vpc.outputs.vpc_cidr_block]
      protocol    = "tcp"
      from_port   = 443
      to_port     = 443
      type        = "ingress"
    }
  }

  eks_managed_node_group_defaults = {
    ami_type                   = "AL2_x86_64"
    disk_size                  = var.node_group_default_disk_size
    enable_bootstrap_user_data = true
    pre_bootstrap_user_data    = templatefile("${path.module}/templates/userdata.tpl", {})
    desired_size               = lower(var.platform_env) == "prod" ? 3 : 2
    max_size                   = lower(var.platform_env) == "prod" ? 6 : 3
    min_size                   = lower(var.platform_env) == "prod" ? 3 : 1
    instance_types             = lower(var.platform_env) == "prod" ? var.prod_instance_types : var.dev_instance_types
    capacity_type              = "ON_DEMAND"
    additional_tags = {
      Name = format("%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env)
    }
    update_config = {
      max_unavailable_percentage = 50
    }
    update_launch_template_default_version = true
    create_launch_template                 = true
    create_iam_role                        = true
    iam_role_name                          = format("iam-%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env)
    iam_role_use_name_prefix               = false
    iam_role_description                   = "EKS managed node group Role"
    iam_role_tags                          = local.tags
    iam_role_additional_policies = [
      "arn:aws-us-gov:iam::aws:policy/AmazonSSMManagedInstanceCore"
    ]
  }
  eks_managed_node_groups = {
    private1 = {
      subnet_ids = [tolist(data.aws_subnet_ids.private.ids)[0]]
    }
    private2 = {
      subnet_ids = [tolist(data.aws_subnet_ids.private.ids)[1]]
    }
    private3 = {
      subnet_ids = [tolist(data.aws_subnet_ids.private.ids)[2]]
    }
  }

  cluster_enabled_log_types              = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  cloudwatch_log_group_retention_in_days = 7

  enable_irsa = true

  cluster_encryption_config = [
    {
      provider_key_arn = aws_kms_key.eks.arn
      resources        = ["secrets"]
    }
  ]

  tags = merge(
    local.tags,
    {
      "Name"        = format("eks-%s-%s-%s", var.layer, var.vpc_id_tag, var.platform_env),
      "EKS_VERSION" = var.cluster_version
    }
  )

}

resource "null_resource" "patch" {
  triggers = {
    kubeconfig = base64encode(local.kubeconfig)
    cmd_patch  = "kubectl patch configmap/aws-auth --patch \"${local.aws_auth_configmap_yaml}\" -n kube-system --kubeconfig <(echo $KUBECONFIG | base64 --decode)"
  }

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    environment = {
      KUBECONFIG = self.triggers.kubeconfig
    }
    command = self.triggers.cmd_patch
  }
}

locals

locals {

  kubeconfig = yamlencode({
    apiVersion      = "v1"
    kind            = "Config"
    current-context = "terraform"
    clusters = [{
      name = module.eks.cluster_id
      cluster = {
        certificate-authority-data = module.eks.cluster_certificate_authority_data
        server                     = module.eks.cluster_endpoint
      }
    }]
    contexts = [{
      name = "terraform"
      context = {
        cluster = module.eks.cluster_id
        user    = "terraform"
      }
    }]
    users = [{
      name = "terraform"
      user = {
        token = data.aws_eks_cluster_auth.eks.token
      }
    }]
  })

  aws_auth_configmap_yaml = <<-EOT
  ${chomp(module.eks.aws_auth_configmap_yaml)}
      - rolearn: arn:${var.iam_partition}:iam::${data.aws_caller_identity.current.account_id}:role/role-gitlab-runner-eks-${var.platform_env}
        username: gitlab:{{SessionName}}
        groups:
          - system:masters
      - rolearn: arn:${var.iam_partition}:iam::${data.aws_caller_identity.current.account_id}:role/role-jenkins-worker-eks-${var.platform_env}
        username: jenkins:{{SessionName}}
        groups:
          - system:masters
      - rolearn: arn:${var.iam_partition}:iam::${data.aws_caller_identity.current.account_id}:role/AWSReservedSSO_AdministratorAccess_f50fcd43baf05a89
        username: AWSAdministratorAccess:{{SessionName}}
        groups:
          - system:masters
  EOT
}

Expected behavior

The module runs to completion.

Actual behavior

Current aws-auth

sh-4.2$ kubectl get configmap aws-auth -n kube-system -o yaml
apiVersion: v1
data:
  mapAccounts: |
    []
  mapRoles: |
    - "groups":
      - "system:bootstrappers"
      - "system:nodes"
      "rolearn": "arn:aws-us-gov:iam:::role/eks-ops-eks-dev20211104211936784200000009"
      "username": "system:node:{{EC2PrivateDNSName}}"
    - "groups":
      - "system:masters"
      "rolearn": "arn:aws-us-gov:iam:::role/role-gitlab-runner-eks-dev"
      "username": "gitlab-runner-dev"
    - "groups":
      - "system:masters"
      "rolearn": "arn:aws-us-gov:iam:::role/role-jenkins-worker-eks-dev"
      "username": "jenkins-dev"
  mapUsers: |
    - "groups":
      - "system:masters"
      "userarn": "arn:aws-us-gov:iam:::user/justin.seiser"
      "username": "jseiser"
kind: ConfigMap

The service account on the pod that Terraform runs from:

sh-4.2$ kubectl get sa jenkins-worker -n jenkins -o yaml
apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws-us-gov:iam:::role/role-jenkins-worker-eks-dev

The error Terraform returns:

module.eks.kubernetes_config_map.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]

Error: configmaps "aws-auth" is forbidden: User "system:serviceaccount:jenkins:jenkins-worker" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
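
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: with no explicit `kubernetes` provider configuration, the provider may be falling back to the pod's in-cluster service account token (`system:serviceaccount:jenkins:jenkins-worker`) instead of authenticating as the IRSA role that `aws-auth` maps. A minimal sketch of pinning the provider to IAM-based authentication is shown here. The `data "aws_eks_cluster_auth" "eks"` definition is assumed, since the locals reference it but the snippet does not show it.

```hcl
# Assumed definition: the locals reference data.aws_eks_cluster_auth.eks,
# but its declaration is not included in the reproduction snippet.
data "aws_eks_cluster_auth" "eks" {
  name = module.eks.cluster_id
}

# Explicit provider configuration so the Kubernetes provider does not
# fall back to the pod's in-cluster service account credentials.
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.eks.token
}
```

If that assumption holds, requests would be authenticated as the IAM role from the pod's IRSA credentials, which `aws-auth` maps to `system:masters`, rather than as the raw service account.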

Additional context

I don't doubt that I'm missing something, but whatever it is does not appear to be covered in any documentation I can find.
