Describe the bug
`BucketDeployment` appears to unconditionally run its full Lambda-side deployment pipeline even when the deployment cannot possibly contain deploy-time substitutions.
In my case, the deployment sources are only `Source.asset()` and `Source.bucket()`. There is no `Source.jsonData()`, `Source.data()`, `deployTime`, or other deploy-time-substituted source involved. Even so, the custom resource still:
- Cold-starts a Lambda.
- Downloads the entire asset zip from the CDK staging bucket into `/tmp`.
- Extracts the entire archive into `/tmp`.
- Scans every extracted file line-by-line looking for CloudFormation token placeholders / substitution markers.
- Uploads the extracted tree to the destination bucket via `aws s3 sync`.
- Optionally performs CloudFront invalidation.
- Signals CloudFormation completion.
That scan step is especially problematic. For deployments that contain large binary assets, it performs completely useless work by opening and scanning files that cannot contain meaningful text substitutions in the first place.
My concrete case is a web UI bundle that includes large WASM binaries, auxiliary `.data` files, and many static assets. The effective payload is roughly 11-28 MB per zone. The deployment is binary-heavy enough that the mandatory "look for token markers in everything" step becomes a major and unnecessary part of the deployment time.
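For illustration only (this is not the actual aws-s3-deployment handler code, and `looks_binary` is an invented helper): a cheap NUL-byte sniff on the first few KB is one way a scan step could rule out binary payloads like `.wasm` and `.data` before reading them line by line.

```python
import os
import tempfile

def looks_binary(path: str, probe_size: int = 8192) -> bool:
    """Cheap sniff: treat a file as binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(probe_size)

# Demo files: a WASM-style blob (its magic header contains NUL bytes) vs. plain text.
tmp = tempfile.mkdtemp()
wasm_path = os.path.join(tmp, "app.wasm")
html_path = os.path.join(tmp, "index.html")
with open(wasm_path, "wb") as f:
    f.write(b"\x00asm\x01\x00\x00\x00" + os.urandom(1024))
with open(html_path, "w") as f:
    f.write("<html><body>hello</body></html>")
```

A check like this would let the handler skip the line-by-line scan for exactly the files where it is most expensive and least useful.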
Observed impact:
- each `CustomCDKBucketDeployment` timeline entry takes roughly 15-30 seconds
- 2-3 `BucketDeployment` constructs can turn into 6+ `CustomCDKBucketDeployment` timeline entries on updates where the asset hash changes
- on replacement, the work is effectively paid twice (create new + delete old)
- the work happens inside a Python 3.13 Lambda custom resource rather than in CI, so the deployment path is constrained by Lambda cold start, `/tmp` staging, archive extraction, and the custom resource lifecycle
This is not just "S3 upload is slow". The design forces a sequential `download -> extract -> scan -> sync` pipeline inside the custom resource, and the scan is performed even when there is nothing to substitute.
Regression Issue
Last Known Working CDK Library Version
No response
Expected Behavior
If a `BucketDeployment` does not use any deploy-time-substitution source, CDK should skip deploy-time marker analysis entirely.
At minimum, there should be an explicit opt-out such as `skipDeployTimeSubstitutionScan`, `assumeNoDeployTimeValues`, or equivalent, so users can avoid paying for a feature they are not using.
More specifically, for deployments composed only of fully synth-resolved inputs such as `Source.asset()` and `Source.bucket()`:
- the custom resource should know there are no deploy-time values to replace
- it should not iterate through every extracted file searching for token markers
- it should especially avoid scanning obviously binary payloads like `.wasm` and `.data`
- the implementation should avoid unnecessary Lambda-side staging work where possible
Current Behavior
The custom resource appears to always perform deploy-time marker scanning after extraction, regardless of whether the deployment can actually contain deploy-time substitutions.
In my case, that means the handler still opens and scans every file in the extracted asset tree, including large `.wasm` and `.data` files, even though:
- the deployment sources are already fully resolved at synth time
- there are no CloudFormation token placeholders to replace
- the scan finds nothing and moves on
This is pure overhead.
The issue becomes very visible with binary-heavy web bundles. Scanning megabytes of WASM/data blobs line-by-line for marker strings is wasted work, and it compounds with the rest of the custom resource pipeline:
- Lambda cold start
- full zip download
- full archive extraction
- token scan over the extracted tree
- `aws s3 sync` upload

The Python code path is effectively sequential. The only meaningful concurrency seems to come from `aws s3 sync` itself (roughly 10 concurrent transfers). Everything else is serialized inside the custom resource.
Reproduction Steps
Minimal reproduction:
```ts
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as s3deploy from 'aws-cdk-lib/aws-s3-deployment';
import { Construct } from 'constructs';

export class ReproStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const bucket = new s3.Bucket(this, 'SiteBucket');

    new s3deploy.BucketDeployment(this, 'DeploySite', {
      destinationBucket: bucket,
      sources: [
        s3deploy.Source.asset('./dist'),
      ],
    });
  }
}
```
Populate `./dist` with binary-heavy assets, for example:
- one multi-MB `app.wasm`
- one multi-MB `app.data`
- a few hundred SVG/static files
- standard JS/CSS/HTML output
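One way to generate such a tree for the reproduction (a sketch; the file names match the list above, but the exact sizes and counts are illustrative):

```python
import os
import pathlib

def make_dist(root: str = "dist") -> None:
    """Create a binary-heavy asset tree for the reproduction."""
    p = pathlib.Path(root)
    p.mkdir(parents=True, exist_ok=True)
    # Multi-MB binary payloads; the WASM magic header starts with a NUL byte.
    (p / "app.wasm").write_bytes(b"\x00asm\x01\x00\x00\x00" + os.urandom(4 * 1024 * 1024))
    (p / "app.data").write_bytes(os.urandom(4 * 1024 * 1024))
    # A few hundred small static files.
    for i in range(200):
        (p / f"icon-{i}.svg").write_text('<svg xmlns="http://www.w3.org/2000/svg"/>')
    (p / "index.html").write_text("<html><body>repro</body></html>")

make_dist()
```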
Important: do not use `Source.jsonData()`, `Source.data()`, or any deploy-time token-based content generation. The point is that the deployment is fully synth-resolved.
Deploy the stack and inspect the `Custom::CDKBucketDeployment` resource timing / logs. Even though no deploy-time substitution is needed, the custom resource still does the full archive extraction and token-scan path over the entire extracted tree.
For an even clearer reproduction, compare:
- a small text-only asset directory,
- an otherwise similar directory that additionally contains several multi-MB `.wasm` / `.data` files.
The second case pays a much larger penalty despite there being no deploy-time substitutions to resolve in either case.
Possible Solution
Any of the following would be a substantial improvement:
- Detect at synth time whether any source can require deploy-time substitution, and pass that fact into the custom resource so it can skip the scan entirely when impossible.
- Expose an explicit escape hatch flag so users can assert "there are no deploy-time values in this deployment".
- Avoid scanning obviously binary files when substitution is enabled, or switch to a more targeted marker-replacement strategy rather than scanning every extracted file.
- Longer term, reconsider whether the current Lambda-side `download -> extract -> scan -> sync` design is appropriate for large static/binary web deployments at all.
The most important point is that users should not be forced to pay for unconditional deploy-time marker scanning when they are only deploying ordinary static assets.
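The first two options could be combined into a simple short-circuit in the handler. This is a hypothetical sketch, not the actual aws-s3-deployment handler: `NoDeployTimeValues` is an invented resource property, and the marker prefix shown is illustrative.

```python
# Invented property name and illustrative marker prefix; the real handler's
# internals may differ.
MARKER_PREFIX = b"<<marker:"

def maybe_substitute(props: dict, extracted_files: list) -> int:
    """Return how many extracted files were scanned for deploy-time markers."""
    if props.get("NoDeployTimeValues"):
        # Synth already proved no source can contain markers: skip the scan
        # entirely and proceed straight to the upload step.
        return 0
    scanned = 0
    for path in extracted_files:
        with open(path, "rb") as f:
            if MARKER_PREFIX in f.read():
                pass  # substitution would happen here (omitted)
        scanned += 1
    return scanned
```

With a synth-time flag like this, a fully synth-resolved deployment never pays for the scan, while deployments that actually use deploy-time values keep the current behavior.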
Additional Information/Context
The deploy-time scan exists for a legitimate feature: sources such as `Source.jsonData()` can contain CloudFormation tokens that are not known until deploy time. In that case, a post-synth substitution step makes sense.
For example:
```ts
new s3deploy.BucketDeployment(this, 'Deploy', {
  destinationBucket: myBucket,
  sources: [
    s3deploy.Source.jsonData('config.json', {
      apiUrl: api.url,
      bucketName: someBucket.bucketName,
    }),
  ],
});
```
In a case like that, scanning for substitution markers is understandable.
The problem is that the current behavior seems to apply that same cost to deployments that do not use any of those features.
In other words, `BucketDeployment` is optimized around the most dynamic case and makes every user pay for it, including users doing simple static asset publication.
AWS CDK Library version (aws-cdk-lib)
2.243.0
AWS CDK CLI version
2.1109.0
Node.js Version
24.14.0
OS
Ubuntu 24.04
Language
TypeScript
Language Version
No response
Other information
No response