fix(sagemaker): Add proactive credential refresh for SSO-based SageMaker Space connections#8755
Open
amzn-kvk wants to merge 8 commits into
Open
fix(sagemaker): Add proactive credential refresh for SSO-based SageMaker Space connections#8755amzn-kvk wants to merge 8 commits into
amzn-kvk wants to merge 8 commits into
Conversation
|
⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done |
|
|
✅ I finished the code review, and didn't find any security or code quality issues. |
ddbcf01 to
dc450a2
Compare
Contributor
|
/retryBuilds |
yuriivv
approved these changes
Apr 21, 2026
af06e7d to
ebebf4d
Compare
When connecting to a SageMaker Space via IAM Identity Center (SSO), the Toolkit snapshots STS credentials once at connection time and never refreshes them. After ~1 hour (default STS credential TTL), the SSM tunnel drops and reconnection fails because the detached server reads expired credentials from the mapping file. This is an asymmetry in the codebase: - IAM connections: detached server calls fromIni() which resolves credentials dynamically on each request - SMUS connections: persistSmusProjectCreds() calls startProactiveCredentialRefresh() which periodically writes fresh credentials to the mapping file - SSO connections: persistLocalCredentials() writes once and returns with no refresh mechanism These tests document the gap. All three will pass once startProactiveCredentialRefresh() is wired up for SSO connections using the SSO token cache and GetRoleCredentials as the credential source.
Add SsoCredentialRefresher that follows the proven SMUS startProactiveCredentialRefresh() pattern: - 10s check interval using setTimeout (handles sleep/resume) - 5min safety buffer before credential expiry - Writes fresh credentials to ~/.aws/.sagemaker-space-profiles persistLocalCredentials() now starts the refresher for SSO connections after the initial credential write. This prevents the ~1h disconnect caused by stale STS credentials in the mapping file. The detached server and SSH ProxyCommand are unchanged - they continue reading credentials from the mapping file, which now stays fresh.
…isposables Register a disposable in the SageMaker activation subscriptions that calls stopAllSsoCredentialRefreshers() on extension deactivation, ensuring all refresh timers are cleaned up on reload or uninstall.
40a9b96 to
0105627
Compare
Move markPending() before open(url) in getSessionAsync handler to prevent browser callback from arriving before pending status is written. Add downgrade guard in SessionStore.markPending() to never overwrite a 'fresh' status with 'pending'. The race condition caused the PS1 reconnect script to poll indefinitely with 204 responses on Windows, where the browser auth redirect completes faster than the async file I/O in markPending. 🤖 Assisted by AI aws#8755
…er stop Add structured logging to detached server with timestamp, method, path, port, and response status. Sensitive params (token, ws_url, session) are redacted. Helps diagnose reconnection issues without exposing secrets. Handle EPERM error gracefully in stopLocalServer() on Windows where the server process may be elevated or in a different security context. 🤖 Assisted by AI aws#8755
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When connecting to a SageMaker Space via IAM Identity Center (SSO), the session disconnects after ~1 hour. Active Jupyter kernels are killed and the user must re-authenticate via browser and reload the window.
The root cause is that
persistLocalCredentials()snapshots STS credentials once at connection time and writes them as static strings to~/.aws/.sagemaker-space-profiles. The detached server'sresolveCredentialsFor()returns these values as-is with no expiry check or re-derivation. After ~1 hour (default STS credential TTL), the SSM tunnel drops and reconnection fails because the mapping file contains expired credentials.This only affects SSO connections:
fromIni({ profile })which resolves dynamically.persistSmusProjectCreds()callsstartProactiveCredentialRefresh()which periodically writes fresh credentials to the mapping file.Solution
Add
SsoCredentialRefresherthat follows the proven SMUSProjectRoleCredentialsProvider.startProactiveCredentialRefresh()pattern:setTimeout(handles sleep/resume correctly)~/.aws/.sagemaker-space-profilesviasetSpaceSsoProfile()persistLocalCredentials()now starts the refresher for SSO connections after the initial credential write. The detached server and SSH ProxyCommand are unchanged - they continue reading from the mapping file, which now stays fresh.Files changed:
credentialMapping.ts- AddedSsoCredentialRefresherclass; wired intopersistLocalCredentials()for SSOssoCredentialRefresh.test.ts- 4 tests covering refresh on expiry, skip when fresh, end-to-end read, and SSO/SMUS parityfeature/xbranches will not be squash-merged at release time.