fix(eks): throw error when kubectl subnets are isolated (#37217)
mergify[bot] merged 7 commits into main from
Conversation
Throw a ValidationError at synth time when kubectl private subnets include PRIVATE_ISOLATED subnets. Isolated subnets have no internet access by definition, so the kubectl Lambda cannot reach the EKS API, STS, or other AWS service endpoints required for kubectl operations (including the CoreDNS compute type patch for FargateCluster). Previously this resulted in a silent 15-minute Lambda timeout during deployment. Now users get a clear error at synth time with guidance to use PRIVATE_WITH_EGRESS subnets or configure VPC endpoints. Fix applied to both aws-eks and aws-eks-v2 modules. Closes #26613
```ts
const isolatedSubnetIds = new Set(this.vpc.isolatedSubnets.map(s => s.subnetId));
const hasIsolatedSubnets = privateSubnets.some(s => isolatedSubnetIds.has(s.subnetId));
if (hasIsolatedSubnets) {
  throw new ValidationError(
    'Isolated subnets cannot be used for kubectl private subnets. Isolated subnets have no internet access, '
    + 'which is required for the kubectl Lambda to reach the EKS API, STS, and other AWS service endpoints. '
    + 'Use PRIVATE_WITH_EGRESS subnets with a NAT Gateway instead, or configure VPC endpoints for STS, EKS, and ECR. '
    + 'See https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html',
    this,
  );
}
```
This validation throws unconditionally for isolated subnets, but the error message itself suggests VPC endpoints as a valid alternative. Users who have properly configured VPC endpoints for STS, EKS, ECR, and S3 in their isolated subnets will be blocked at synth time with no workaround.
The DNS validation directly above (line 1867) uses this.vpc instanceof ec2.Vpc to avoid false positives with imported VPCs. Consider applying the same guard here — when the VPC is CDK-created, we know isolated subnets have no egress; when it's imported, we can't be sure.
Per the AWS private clusters documentation, isolated subnets with VPC endpoints are a supported configuration.
```ts
if (this.vpc instanceof ec2.Vpc) {
  const isolatedSubnetIds = new Set(this.vpc.isolatedSubnets.map(s => s.subnetId));
  const hasIsolatedSubnets = privateSubnets.some(s => isolatedSubnetIds.has(s.subnetId));
  if (hasIsolatedSubnets) {
    throw new ValidationError(
      'Isolated subnets cannot be used for kubectl private subnets. ...',
      this,
    );
  }
}
```

```ts
+ 'which is required for the kubectl Lambda to reach the EKS API, STS, and other AWS service endpoints. '
+ 'Use PRIVATE_WITH_EGRESS subnets with a NAT Gateway instead, or configure VPC endpoints for STS, EKS, and ECR. '
+ 'See https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html',
this,
```
The error message mentions "VPC endpoints for STS, EKS, and ECR" but per the AWS private clusters documentation, ECR also requires an S3 gateway endpoint (com.amazonaws.region-code.s3) for pulling container images. Consider updating to "STS, EKS, ECR, and S3" for completeness.
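For reference, the endpoint list the reviewer is pointing at can be sketched as plain data. This is a hedged illustration, not CDK API: the `REQUIRED_ENDPOINT_SERVICES` names and the `missingEndpoints` helper are assumptions drawn from the AWS private clusters documentation linked above.

```typescript
// Hedged sketch: VPC endpoint services a fully private EKS cluster
// typically needs, per the AWS private clusters documentation.
// The names and the helper below are illustrative, not CDK API.
const REQUIRED_ENDPOINT_SERVICES = ['sts', 'eks', 'ecr.api', 'ecr.dkr', 's3'];

// Return the endpoint services still missing from a configuration.
function missingEndpoints(configured: string[]): string[] {
  const have = new Set(configured);
  return REQUIRED_ENDPOINT_SERVICES.filter(svc => !have.has(svc));
}

// A setup with only the interface endpoints named in the original error
// message still lacks the S3 gateway endpoint that ECR pulls depend on.
console.log(missingEndpoints(['sts', 'eks', 'ecr.api', 'ecr.dkr'])); // → [ 's3' ]
```

This is why the comment suggests "STS, EKS, ECR, and S3" rather than the original three-service list.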
```ts
// reach the EKS API, STS, or other AWS service endpoints required for
// kubectl operations (including the CoreDNS compute type patch).
// See https://github.com/aws/aws-cdk/issues/26613
const isolatedSubnetIds = new Set(this.vpc.isolatedSubnets.map(s => s.subnetId));
```
Same as the aws-eks module — consider adding the this.vpc instanceof ec2.Vpc guard here for consistency with the DNS validation at line 1384 and to avoid blocking users with VPC endpoints on imported VPCs.
Per the AWS private clusters documentation, isolated subnets with VPC endpoints are a supported configuration.
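The overlap test at the core of the requested guard can be sketched in isolation, decoupled from CDK. `SubnetLike` below is a hypothetical stand-in for `ec2.ISubnet`; in the real modules this check would additionally be wrapped in `this.vpc instanceof ec2.Vpc` so imported VPCs are never blocked.

```typescript
// Hypothetical stand-in for ec2.ISubnet, reduced to the one field used.
interface SubnetLike { subnetId: string }

// True when any selected kubectl private subnet is also one of the
// VPC's isolated subnets (i.e. has no egress route in a CDK-created VPC).
function hasIsolatedKubectlSubnets(
  kubectlPrivateSubnets: SubnetLike[],
  vpcIsolatedSubnets: SubnetLike[],
): boolean {
  const isolatedIds = new Set(vpcIsolatedSubnets.map(s => s.subnetId));
  return kubectlPrivateSubnets.some(s => isolatedIds.has(s.subnetId));
}
```

Building the `Set` once keeps the check O(n + m) over the two subnet lists rather than O(n × m).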
Wrap the isolated subnet validation with 'this.vpc instanceof ec2.Vpc' in both aws-eks and aws-eks-v2 modules to avoid false positives for imported VPCs that may have VPC endpoints configured. This is consistent with the existing DNS validation pattern. Also update the error message to include S3 in the VPC endpoint list, since ECR requires an S3 gateway endpoint per the AWS private clusters documentation. Add test cases for imported VPCs with isolated subnets in both modules.
Update the error message to reference all required AWS services and link directly to the private clusters documentation for the complete list of VPC endpoints needed.
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).
Merge Queue Status
This pull request spent 1 hour 36 minutes 16 seconds in the queue, including 43 minutes 39 seconds running CI.

Required conditions to merge
Comments on closed issues and PRs are hard for our team to see.
Issue # (if applicable)
Closes #26613.
Reason for this change
When creating an `eks.FargateCluster` (or any EKS cluster with `coreDnsComputeType: FARGATE`) in a VPC with only `PRIVATE_ISOLATED` subnets and private endpoint access enabled, the kubectl Lambda is placed in those isolated subnets. Since isolated subnets have no internet access by definition (no NAT Gateway, no Internet Gateway route), the Lambda cannot reach the EKS API, STS, or other AWS service endpoints. This causes the `CoreDnsComputeTypePatch` custom resource (and any other kubectl operation) to silently time out after 15 minutes, resulting in a confusing CloudFormation deployment failure.

Description of changes
Added a `ValidationError` at synth time in both `aws-eks` and `aws-eks-v2` modules that detects when the selected kubectl private subnets include `PRIVATE_ISOLATED` subnets. The check runs in the `Cluster` constructor, right after `kubectlPrivateSubnets` is assigned, by comparing the selected subnets against `vpc.isolatedSubnets`.

The error message tells users exactly what is wrong and how to fix it:
- Use `PRIVATE_WITH_EGRESS` subnets with a NAT Gateway
- Or configure VPC endpoints for STS, EKS, and ECR

Why a hard error instead of a warning:
`PRIVATE_ISOLATED` subnets are created by CDK with no egress route. If CDK created the VPC and the kubectl subnets are isolated, we know with certainty there is no egress and the deployment will fail. Failing fast at synth time is better than a 15-minute Lambda timeout.

Files changed:
- `packages/aws-cdk-lib/aws-eks/lib/cluster.ts` — validation after `kubectlPrivateSubnets` assignment
- `packages/aws-cdk-lib/aws-eks-v2/lib/cluster.ts` — same validation after `kubectlSubnets` assignment
- `packages/aws-cdk-lib/aws-eks/test/cluster.test.ts` — 2 new tests
- `packages/aws-cdk-lib/aws-eks-v2/test/cluster.test.ts` — 2 new tests

Describe any new or updated permissions being added
No new or updated IAM permissions. This is a synth-time validation only.
Description of how you validated changes
- `throws when kubectl private subnets include isolated subnets` — verifies the `ValidationError` is thrown
- `does not throw when kubectl private subnets are PRIVATE_WITH_EGRESS` — verifies no error for valid subnets
- `aws-eks` tests pass
- `aws-eks-v2` tests pass

Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license