
fix(eks): throw error when kubectl subnets are isolated #37217

Merged
mergify[bot] merged 7 commits into main from fix/issue-26613-isolated-subnet-validation
Mar 19, 2026

@Abogical
Member

Issue # (if applicable)

Closes #26613.

Reason for this change

When creating an eks.FargateCluster (or any EKS cluster with coreDnsComputeType: FARGATE) in a VPC with only PRIVATE_ISOLATED subnets and private endpoint access enabled, the kubectl Lambda is placed in those isolated subnets. Isolated subnets have no internet access by definition (no NAT Gateway, no Internet Gateway route), so unless VPC endpoints are configured the Lambda cannot reach the EKS API, STS, or other AWS service endpoints. As a result, the CoreDnsComputeTypePatch custom resource (and any other kubectl operation) silently times out after 15 minutes, and the CloudFormation deployment fails with a confusing error.

Description of changes

Added a ValidationError at synth time in both aws-eks and aws-eks-v2 modules that detects when the selected kubectl private subnets include PRIVATE_ISOLATED subnets. The check runs in the Cluster constructor, right after kubectlPrivateSubnets is assigned, by comparing the selected subnets against vpc.isolatedSubnets.
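
For illustration, the comparison described above can be sketched outside of CDK as a plain function. This is a minimal sketch, not the code in cluster.ts: `SubnetRef` and `selectIsolated` are hypothetical names, and the only assumption carried over from the PR is that subnets are matched by `subnetId`.

```typescript
// Hypothetical stand-in for a subnet; only subnetId matters for the comparison.
interface SubnetRef {
  readonly subnetId: string;
}

// Returns the selected subnets that also appear in the VPC's isolated subnets.
function selectIsolated(selected: SubnetRef[], isolated: SubnetRef[]): SubnetRef[] {
  // Build a set of isolated subnet IDs once so each lookup is O(1).
  const isolatedIds = new Set(isolated.map(s => s.subnetId));
  return selected.filter(s => isolatedIds.has(s.subnetId));
}

// Usage: one of the two selected subnets is isolated, so the result is non-empty.
const offending = selectIsolated(
  [{ subnetId: 'subnet-egress-1' }, { subnetId: 'subnet-isolated-1' }],
  [{ subnetId: 'subnet-isolated-1' }, { subnetId: 'subnet-isolated-2' }],
);
console.log(offending.map(s => s.subnetId)); // [ 'subnet-isolated-1' ]
```

A non-empty result corresponds to the condition under which the constructor throws.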

The error message tells users exactly what is wrong and how to fix it:

  • Use PRIVATE_WITH_EGRESS subnets with a NAT Gateway
  • Or configure VPC endpoints for STS, EKS, and ECR
  • Links to the AWS private clusters documentation

Why a hard error instead of a warning: PRIVATE_ISOLATED subnets are created by CDK with no egress route. If CDK created the VPC and the kubectl subnets are isolated, we know with certainty there is no egress and the deployment will fail. Failing fast at synth time is better than a 15-minute Lambda timeout.

Files changed:

  • packages/aws-cdk-lib/aws-eks/lib/cluster.ts — validation after kubectlPrivateSubnets assignment
  • packages/aws-cdk-lib/aws-eks-v2/lib/cluster.ts — same validation after kubectlSubnets assignment
  • packages/aws-cdk-lib/aws-eks/test/cluster.test.ts — 2 new tests
  • packages/aws-cdk-lib/aws-eks-v2/test/cluster.test.ts — 2 new tests

Describe any new or updated permissions being added

No new or updated IAM permissions. This is a synth-time validation only.

Description of how you validated changes

  • Added 4 unit tests (2 per module):
    • throws when kubectl private subnets include isolated subnets — verifies the ValidationError is thrown
    • does not throw when kubectl private subnets are PRIVATE_WITH_EGRESS — verifies no error for valid subnets
  • All 147 existing aws-eks tests pass
  • All 120 existing aws-eks-v2 tests pass
  • No TypeScript diagnostics errors in any modified file

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Throw a ValidationError at synth time when kubectl private subnets
include PRIVATE_ISOLATED subnets. Isolated subnets have no internet
access by definition, so the kubectl Lambda cannot reach the EKS API,
STS, or other AWS service endpoints required for kubectl operations
(including the CoreDNS compute type patch for FargateCluster).

Previously this resulted in a silent 15-minute Lambda timeout during
deployment. Now users get a clear error at synth time with guidance
to use PRIVATE_WITH_EGRESS subnets or configure VPC endpoints.

Fix applied to both aws-eks and aws-eks-v2 modules.

Closes #26613
@github-actions github-actions bot added bug This issue is a bug. effort/medium Medium work item – several days of effort p1 labels Mar 10, 2026
@aws-cdk-automation aws-cdk-automation requested a review from a team March 10, 2026 16:13
@github-actions github-actions bot added the star-contributor [Pilot] contributed between 25-49 PRs to the CDK label Mar 10, 2026
@Abogical Abogical added the pr-linter/exempt-integ-test The PR linter will not require integ test changes label Mar 10, 2026
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Mar 10, 2026
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Mar 10, 2026
@Abogical Abogical added the pr/do-not-merge This PR should not be merged at this time. label Mar 11, 2026
@Abogical Abogical removed the pr/do-not-merge This PR should not be merged at this time. label Mar 11, 2026
Comment on lines +1876 to +1886
const isolatedSubnetIds = new Set(this.vpc.isolatedSubnets.map(s => s.subnetId));
const hasIsolatedSubnets = privateSubnets.some(s => isolatedSubnetIds.has(s.subnetId));
if (hasIsolatedSubnets) {
  throw new ValidationError(
    'Isolated subnets cannot be used for kubectl private subnets. Isolated subnets have no internet access, '
    + 'which is required for the kubectl Lambda to reach the EKS API, STS, and other AWS service endpoints. '
    + 'Use PRIVATE_WITH_EGRESS subnets with a NAT Gateway instead, or configure VPC endpoints for STS, EKS, and ECR. '
    + 'See https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html',
    this,
  );
}
Contributor

@vishaalmehrishi vishaalmehrishi Mar 18, 2026

This validation throws unconditionally for isolated subnets, but the error message itself suggests VPC endpoints as a valid alternative. Users who have properly configured VPC endpoints for STS, EKS, ECR, and S3 in their isolated subnets will be blocked at synth time with no workaround.

The DNS validation directly above (line 1867) uses this.vpc instanceof ec2.Vpc to avoid false positives with imported VPCs. Consider applying the same guard here — when the VPC is CDK-created, we know isolated subnets have no egress; when it's imported, we can't be sure.

Per the AWS private clusters documentation, isolated subnets with VPC endpoints are a supported configuration.

if (this.vpc instanceof ec2.Vpc) {
  const isolatedSubnetIds = new Set(this.vpc.isolatedSubnets.map(s => s.subnetId));
  const hasIsolatedSubnets = privateSubnets.some(s => isolatedSubnetIds.has(s.subnetId));
  if (hasIsolatedSubnets) {
    throw new ValidationError(
      'Isolated subnets cannot be used for kubectl private subnets. ...',
      this,
    );
  }
}

+ 'which is required for the kubectl Lambda to reach the EKS API, STS, and other AWS service endpoints. '
+ 'Use PRIVATE_WITH_EGRESS subnets with a NAT Gateway instead, or configure VPC endpoints for STS, EKS, and ECR. '
+ 'See https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html',
this,
Contributor

The error message mentions "VPC endpoints for STS, EKS, and ECR" but per the AWS private clusters documentation, ECR also requires an S3 gateway endpoint (com.amazonaws.region-code.s3) for pulling container images. Consider updating to "STS, EKS, ECR, and S3" for completeness.

@vishaalmehrishi vishaalmehrishi self-assigned this Mar 18, 2026
// reach the EKS API, STS, or other AWS service endpoints required for
// kubectl operations (including the CoreDNS compute type patch).
// See https://github.com/aws/aws-cdk/issues/26613
const isolatedSubnetIds = new Set(this.vpc.isolatedSubnets.map(s => s.subnetId));
Contributor

Same as the aws-eks module — consider adding the this.vpc instanceof ec2.Vpc guard here for consistency with the DNS validation at line 1384 and to avoid blocking users with VPC endpoints on imported VPCs.

Per the AWS private clusters documentation, isolated subnets with VPC endpoints are a supported configuration.

Wrap the isolated subnet validation with 'this.vpc instanceof ec2.Vpc'
in both aws-eks and aws-eks-v2 modules to avoid false positives for
imported VPCs that may have VPC endpoints configured. This is consistent
with the existing DNS validation pattern.

Also update the error message to include S3 in the VPC endpoint list,
since ECR requires an S3 gateway endpoint per the AWS private clusters
documentation.

Add test cases for imported VPCs with isolated subnets in both modules.
Update the error message to reference all required AWS services and
link directly to the private clusters documentation for the complete
list of VPC endpoints needed.
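
The guarded behaviour described in these commits can be modelled without aws-cdk-lib roughly as follows. This is a sketch under stated assumptions: `CdkCreatedVpc` and `ImportedVpc` are hypothetical stand-ins for a VPC created with ec2.Vpc and an imported one, `validateKubectlSubnets` is an illustrative name, and a plain Error with a shortened message replaces ValidationError.

```typescript
// Hypothetical stand-ins; these are not the aws-cdk-lib types.
interface SubnetLike {
  readonly subnetId: string;
}

interface VpcLike {
  readonly isolatedSubnets: SubnetLike[];
}

// Models a VPC created by CDK in the same app (the real code checks `instanceof ec2.Vpc`).
class CdkCreatedVpc implements VpcLike {
  readonly isolatedSubnets: SubnetLike[];
  constructor(isolatedSubnets: SubnetLike[]) {
    this.isolatedSubnets = isolatedSubnets;
  }
}

// Models an imported VPC, whose routing and endpoints are unknown at synth time.
class ImportedVpc implements VpcLike {
  readonly isolatedSubnets: SubnetLike[];
  constructor(isolatedSubnets: SubnetLike[]) {
    this.isolatedSubnets = isolatedSubnets;
  }
}

function validateKubectlSubnets(vpc: VpcLike, kubectlSubnets: SubnetLike[]): void {
  // Imported VPCs may already have the required VPC endpoints, so skip the check.
  if (!(vpc instanceof CdkCreatedVpc)) {
    return;
  }
  const isolatedIds = new Set(vpc.isolatedSubnets.map(s => s.subnetId));
  if (kubectlSubnets.some(s => isolatedIds.has(s.subnetId))) {
    // Shortened placeholder message; the merged message is longer.
    throw new Error('Isolated subnets cannot be used for kubectl private subnets.');
  }
}
```

With this shape, a CDK-created VPC whose kubectl subnets are isolated fails fast, while the same subnet selection on an imported VPC passes through untouched, which matches the review request to avoid false positives.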
@mergify
Contributor

mergify bot commented Mar 19, 2026

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify
Contributor

mergify bot commented Mar 19, 2026

Merge Queue Status

  • Entered queue 2026-03-19 11:10 UTC · Rule: default-squash
  • Checks passed · in-place
  • Merged 2026-03-19 12:46 UTC · at 28b830af6f97afef43a67fb5b273ed40fd36a25b

This pull request spent 1 hour 36 minutes 16 seconds in the queue, including 43 minutes 39 seconds running CI.

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Mar 19, 2026
@mergify mergify bot merged commit 73e5006 into main Mar 19, 2026
18 of 19 checks passed
@mergify mergify bot deleted the fix/issue-26613-isolated-subnet-validation branch March 19, 2026 12:46
@github-actions
Contributor

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 19, 2026

Labels

bug This issue is a bug. contribution/core This is a PR that came from AWS. effort/medium Medium work item – several days of effort p1 pr-linter/exempt-integ-test The PR linter will not require integ test changes star-contributor [Pilot] contributed between 25-49 PRs to the CDK

Development

Successfully merging this pull request may close these issues.

aws_eks: Error creating FargateCluster in cn-north-1 due to CoreDnsComputeTypePatch creation error

3 participants