feat(sagemaker): add containerStartupHealthCheckTimeoutInSeconds support for EndpointConfig #35626

Merged
mergify[bot] merged 9 commits into aws:main from amandladev:feature/add-container-startup-healthcheck-timeout
Jan 20, 2026

Conversation

@amandladev amandladev commented Sep 30, 2025

Implements container startup health check timeout configuration for SageMaker endpoint production variants. The setting is available in CloudFormation but was missing from the CDK constructs.

Issue #35566

  • Add containerStartupHealthCheckTimeout property to InstanceProductionVariantProps interface
  • Add comprehensive validation for timeout range (60-3600 seconds)
  • Add CloudFormation template generation for ContainerStartupHealthCheckTimeoutInSeconds property
  • Include test coverage for validation scenarios and edge cases
  • Update README documentation with usage examples and constraints

Reason for this change

AWS SageMaker EndpointConfig supports ContainerStartupHealthCheckTimeoutInSeconds in CloudFormation to configure health check timeout for inference containers, but this property is not exposed in the CDK SageMaker L2 constructs. Users with models that require longer initialization time cannot configure appropriate health check timeouts, leading to premature health check failures.

Description of changes

Implements AWS SageMaker container startup health check timeout support in CDK SageMaker L2 constructs, enabling users to configure appropriate health check timeouts for inference containers:

  • New containerStartupHealthCheckTimeout property in InstanceProductionVariantProps interface with AWS-compliant validation:
    Range: 60-3600 seconds (1 minute to 1 hour)
    Type: cdk.Duration for intuitive time specification
    Optional property maintaining backward compatibility
  • Enhanced addInstanceProductionVariant() method with comprehensive input validation
  • Automatic conversion from cdk.Duration to seconds for CloudFormation compatibility
  • Synthesis-time validation with clear, actionable error messages
  • CloudFormation integration mapping to ContainerStartupHealthCheckTimeoutInSeconds property
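
As a rough illustration of the validation and conversion listed above, here is a minimal sketch. It is not the code merged in this PR: `SketchDuration` is a hypothetical stand-in for `cdk.Duration` so the snippet runs without `aws-cdk-lib`, and the `renderStartupTimeout` helper and its error wording are assumptions.

```typescript
// Hypothetical stand-in for cdk.Duration so this sketch runs without aws-cdk-lib.
class SketchDuration {
  private constructor(private readonly totalSeconds: number) {}

  static seconds(amount: number): SketchDuration {
    return new SketchDuration(amount);
  }

  static minutes(amount: number): SketchDuration {
    return new SketchDuration(amount * 60);
  }

  toSeconds(): number {
    return this.totalSeconds;
  }
}

// Bounds stated in the PR description: 60 to 3600 seconds.
const MIN_TIMEOUT_SECONDS = 60;
const MAX_TIMEOUT_SECONDS = 3600;

// Hypothetical helper: returns the number to emit as
// ContainerStartupHealthCheckTimeoutInSeconds, or undefined to omit the property.
function renderStartupTimeout(timeout?: SketchDuration): number | undefined {
  if (timeout === undefined) {
    return undefined; // optional property: nothing is rendered
  }
  const seconds = timeout.toSeconds();
  if (seconds < MIN_TIMEOUT_SECONDS || seconds > MAX_TIMEOUT_SECONDS) {
    throw new Error(
      `containerStartupHealthCheckTimeout must be between ${MIN_TIMEOUT_SECONDS} ` +
        `and ${MAX_TIMEOUT_SECONDS} seconds, got ${seconds}`,
    );
  }
  return seconds;
}
```

In this sketch both boundary values (60 and 3600 seconds) are accepted, and an omitted timeout leaves the property out entirely, which mirrors the backward-compatible behavior described in the list.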

Usage Example:

import * as cdk from 'aws-cdk-lib';
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.IModel;

// Create endpoint configuration with health check timeout
const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
  instanceProductionVariants: [{
    variantName: 'my-variant',
    model: model,
    containerStartupHealthCheckTimeout: cdk.Duration.minutes(5), // 5 minutes timeout
  }],
});
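
Assuming the Duration-to-seconds conversion described in this PR, the variant in the example above should synthesize with a timeout of 300 seconds. The sketch below models only the relevant part of one `ProductionVariants` entry; the other properties are omitted, and the `VariantFragment` and `expectedVariantFragment` names are hypothetical.

```typescript
// Only the two properties relevant here; a real synthesized variant has more.
interface VariantFragment {
  VariantName: string;
  ContainerStartupHealthCheckTimeoutInSeconds: number;
}

// Matches cdk.Duration.minutes(5) in the usage example.
const timeoutMinutes = 5;

const expectedVariantFragment: VariantFragment = {
  VariantName: 'my-variant',
  ContainerStartupHealthCheckTimeoutInSeconds: timeoutMinutes * 60,
};

console.log(JSON.stringify(expectedVariantFragment, null, 2));
```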

Describe any new or updated permissions being added

N/A - No new IAM permissions required. Leverages existing SageMaker endpoint configuration permissions.

Description of how you validated changes

Unit tests: Added 5 comprehensive container startup health check timeout tests covering all validation scenarios:

  • Property inclusion in CloudFormation template when provided
  • Property absence in CloudFormation template when not provided
  • Range validation for minimum value (60 seconds)
  • Range validation for maximum value (3600 seconds)
  • Acceptance of valid timeout values at boundaries
  • Duration to seconds conversion verification

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added the beginning-contributor ([Pilot] contributed between 0-2 PRs to the CDK) and p2 labels Sep 30, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team September 30, 2025 03:59
@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review label (This PR needs a review from a Trusted Community Member or Core Team Member.) Sep 30, 2025
@kumsmrit kumsmrit self-assigned this Oct 30, 2025
@amandladev amandladev force-pushed the feature/add-container-startup-healthcheck-timeout branch 3 times, most recently from 5f7170a to dee1fb1, November 12, 2025 22:50
@amandladev amandladev force-pushed the feature/add-container-startup-healthcheck-timeout branch from dee1fb1 to 2a74cef, November 13, 2025 14:42
@kumsmrit kumsmrit removed their assignment Dec 1, 2025
@abidhasan-aws abidhasan-aws self-assigned this Jan 19, 2026
@abidhasan-aws abidhasan-aws removed the pr/needs-community-review label (This PR needs a review from a Trusted Community Member or Core Team Member.) Jan 19, 2026

@abidhasan-aws abidhasan-aws left a comment

Hi @amandladev,
Thanks for your contribution. I have added a few comments.

You might also need to resolve the merge conflicts. :)

* The timeout value, in seconds, for your inference container to pass health check.
* @default - none
*/
readonly containerStartupHealthCheckTimeout?: cdk.Duration;

We should remove containerStartupHealthCheckTimeout from ProductionVariantProps and keep it only in InstanceProductionVariantProps.

ContainerStartupHealthCheckTimeoutInSeconds is an instance-only property; it is not supported for serverless endpoints.

@amandladev (Author):

Done!
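
The review point in this thread, that the timeout belongs on the instance variant props rather than on the shared base, can be sketched with simplified interfaces. The `*Sketch` names and the `maxConcurrency` field are hypothetical; the real CDK interfaces have more members and take a `cdk.Duration` rather than a plain number.

```typescript
// Shared base: only fields that are valid for every variant type.
interface ProductionVariantPropsSketch {
  readonly variantName: string;
}

// Instance-based variants carry the startup health check timeout.
interface InstanceProductionVariantPropsSketch extends ProductionVariantPropsSketch {
  readonly containerStartupHealthCheckTimeoutSeconds?: number;
}

// Hypothetical serverless shape: it has no timeout field, so setting one
// in an object literal is rejected by the type checker.
interface ServerlessVariantPropsSketch extends ProductionVariantPropsSketch {
  readonly maxConcurrency: number;
}

const instanceVariant: InstanceProductionVariantPropsSketch = {
  variantName: 'my-variant',
  containerStartupHealthCheckTimeoutSeconds: 300,
};

const serverlessVariant: ServerlessVariantPropsSketch = {
  variantName: 'my-serverless-variant',
  maxConcurrency: 5,
  // containerStartupHealthCheckTimeoutSeconds: 300, // would not compile
};
```

Keeping the property off the base interface means a misuse on a serverless variant surfaces at compile time instead of as a deployment failure.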

@mergify mergify bot dismissed abidhasan-aws’s stale review January 19, 2026 21:08

Pull request has been modified.

github-actions bot commented Jan 19, 2026

⚠️ Experimental Feature: This security report is currently in experimental phase. Results may include false positives and the rules are being actively refined.
Please try merge from main to avoid findings unrelated to the PR.


Security Guardian Results: 24 ran, 24 passed ✅
No test annotations available

github-actions bot commented Jan 19, 2026

⚠️ Experimental Feature: This security report is currently in experimental phase. Results may include false positives and the rules are being actively refined.
Please try merge from main to avoid findings unrelated to the PR.


Security Guardian Results with resolved templates: 24 ran, 24 passed ✅
No test annotations available

@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review label (This PR needs a review from a Trusted Community Member or Core Team Member.) Jan 19, 2026
@amandladev

Hello @abidhasan-aws!

Thanks a lot for the feedback! I’ve applied the suggested changes and I’m around if there’s anything else to tweak.
Have a great week 😄

@abidhasan-aws abidhasan-aws added the pr/needs-integration-tests-deployment label (Requires the PR to deploy the integration test snapshots.) Jan 20, 2026
@abidhasan-aws abidhasan-aws temporarily deployed to deployment-integ-test January 20, 2026 16:16 — with GitHub Actions Inactive

@abidhasan-aws abidhasan-aws left a comment

LGTM, thanks :)

mergify bot commented Jan 20, 2026

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

mergify bot commented Jan 20, 2026

Merge Queue Status

✅ The pull request has been merged at 2ec085e

This pull request spent 5 hours 25 minutes 55 seconds in the queue, including 32 minutes 41 seconds running CI.
The checks were run in-place.

@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review label (This PR needs a review from a Trusted Community Member or Core Team Member.) Jan 20, 2026
mergify bot commented Jan 20, 2026

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@Abogical Abogical removed the pr/needs-integration-tests-deployment label (Requires the PR to deploy the integration test snapshots.) Jan 20, 2026
mergify bot commented Jan 20, 2026

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

mergify bot commented Jan 20, 2026

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 47d707a into aws:main Jan 20, 2026
23 of 24 checks passed
@github-actions

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 20, 2026