
(aws-s3-deployment): BucketDeployment always downloads, extracts, and scans every file for deploy-time substitutions, causing severe unnecessary slowdowns for binary-heavy assets #37234

@garysassano

Describe the bug

BucketDeployment appears to unconditionally run its full Lambda-side deployment pipeline even when the deployment cannot possibly contain deploy-time substitutions.

In my case, the deployment sources are only Source.asset() and Source.bucket(). There is no Source.jsonData(), Source.data(), deployTime option, or any other deploy-time-substituted source involved. Even so, the custom resource still:

  1. Cold-starts a Lambda.
  2. Downloads the entire asset zip from the CDK staging bucket into /tmp.
  3. Extracts the entire archive into /tmp.
  4. Scans every extracted file line-by-line looking for CloudFormation token placeholders / substitution markers.
  5. Uploads the extracted tree to the destination bucket via aws s3 sync.
  6. Optionally performs CloudFront invalidation.
  7. Signals CloudFormation completion.

That scan step is especially problematic. For deployments that contain large binary assets, it performs completely useless work by opening and scanning files that cannot contain meaningful text substitutions in the first place.
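To make the cost of step 4 concrete, here is a simplified sketch of what a per-file marker scan amounts to. This is illustrative only: the real handler is Python, and the `<<marker:` prefix shown here is an assumption about the marker format, not the confirmed one.

```typescript
// Illustrative model of the per-file marker scan. The actual handler is
// Python inside the custom resource Lambda; names and marker format here
// are assumptions for demonstration.
const MARKER = '<<marker:'; // assumed deploy-time substitution marker prefix

function fileNeedsSubstitution(content: string): boolean {
  return content.includes(MARKER);
}

// Every file in the extracted tree is opened and checked, binary or not,
// which is the overhead this issue is about.
function scanTree(files: Map<string, string>): string[] {
  const hits: string[] = [];
  for (const [name, body] of files) {
    if (fileNeedsSubstitution(body)) hits.push(name);
  }
  return hits;
}
```

Even when `scanTree` returns an empty list, every byte of every file has still been read, which is where binary-heavy bundles pay the penalty.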

My concrete case is a web UI bundle that includes large WASM binaries, auxiliary .data files, and many static assets. The effective payload is roughly 11-28 MB per zone. The deployment is binary-heavy enough that the mandatory "look for token markers in everything" step becomes a major and unnecessary part of the deployment time.

Observed impact:

  • each CustomCDKBucketDeployment timeline entry takes roughly 15-30 seconds
  • 2-3 BucketDeployment constructs can turn into 6+ CustomCDKBucketDeployment timeline entries on updates where the asset hash changes
  • on replacement, the work is effectively paid twice (create new + delete old)
  • the work happens inside a Python 3.13 Lambda custom resource rather than in CI, so the deployment path is constrained by Lambda cold start, /tmp staging, archive extraction, and the custom resource lifecycle

This is not just "S3 upload is slow". The design forces a sequential download -> extract -> scan -> sync pipeline inside the custom resource, and the scan is performed even when there is nothing to substitute.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Library Version

No response

Expected Behavior

If a BucketDeployment does not use any deploy-time-substitution source, CDK should skip deploy-time marker analysis entirely.

At minimum, there should be an explicit opt-out such as skipDeployTimeSubstitutionScan, assumeNoDeployTimeValues, or equivalent, so users can avoid paying for a feature they are not using.

More specifically, for deployments composed only of fully synth-resolved inputs such as Source.asset() and Source.bucket():

  • the custom resource should know there are no deploy-time values to replace
  • it should not iterate through every extracted file searching for token markers
  • it should especially avoid scanning obviously binary payloads like .wasm and .data
  • the implementation should avoid unnecessary Lambda-side staging work where possible
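The synth-time check could be as simple as classifying each source. The sketch below is hypothetical: `SourceKind` and `requiresDeployTimeScan` are illustrative names, not CDK API, and the classification assumes only the data-style sources can embed deploy-time tokens.

```typescript
// Hypothetical synth-time classification of BucketDeployment sources.
// None of these names exist in aws-cdk-lib; this only sketches the idea.
type SourceKind = 'asset' | 'bucket' | 'data' | 'jsonData' | 'yamlData';

// Assumption: only data-style sources can carry deploy-time tokens.
const MARKER_BEARING: ReadonlySet<SourceKind> = new Set([
  'data',
  'jsonData',
  'yamlData',
]);

function requiresDeployTimeScan(kinds: SourceKind[]): boolean {
  return kinds.some((k) => MARKER_BEARING.has(k));
}
```

If the construct computed this at synth time and passed it as a custom resource property, the handler could skip the scan branch outright for asset/bucket-only deployments.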

Current Behavior

The custom resource appears to always perform deploy-time marker scanning after extraction, regardless of whether the deployment can actually contain deploy-time substitutions.

In my case, that means the handler still opens and scans every file in the extracted asset tree, including large .wasm and .data files, even though:

  • the deployment sources are already fully resolved at synth time
  • there are no CloudFormation token placeholders to replace
  • the scan finds nothing and moves on

This is pure overhead.

The issue becomes very visible with binary-heavy web bundles. Scanning megabytes of WASM/data blobs line-by-line for marker strings is wasted work, and it compounds with the rest of the custom resource pipeline:

  • Lambda cold start
  • full zip download
  • full archive extraction
  • token scan over the extracted tree
  • aws s3 sync upload

The Python code path is effectively sequential. The only meaningful concurrency seems to come from aws s3 sync itself (roughly 10 concurrent transfers). Everything else is serialized inside the custom resource.

Reproduction Steps

Minimal reproduction:

import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as s3deploy from 'aws-cdk-lib/aws-s3-deployment';
import { Construct } from 'constructs';

export class ReproStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const bucket = new s3.Bucket(this, 'SiteBucket');

    new s3deploy.BucketDeployment(this, 'DeploySite', {
      destinationBucket: bucket,
      sources: [
        s3deploy.Source.asset('./dist'),
      ],
    });
  }
}

Populate ./dist with binary-heavy assets, for example:

  • one multi-MB app.wasm
  • one multi-MB app.data
  • a few hundred SVG/static files
  • standard JS/CSS/HTML output

Important: do not use Source.jsonData(), Source.data(), or any deploy-time token-based content generation. The point is that the deployment is fully synth-resolved.

Deploy the stack and inspect the Custom::CDKBucketDeployment resource timing / logs. Even though no deploy-time substitution is needed, the custom resource still does the full archive extraction and token-scan path over the entire extracted tree.

For an even clearer reproduction, compare:

  1. a small text-only asset directory,
  2. an otherwise similar directory that additionally contains several multi-MB .wasm / .data files.

The second case pays a much larger penalty despite there being no deploy-time substitutions to resolve in either case.
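For convenience, a small Node script can generate a suitably binary-heavy ./dist. The helper below is not part of the reproduction per se; file names and sizes are arbitrary stand-ins for a real WASM bundle.

```typescript
// Helper to populate a dist directory with dummy binary-heavy assets for
// the reproduction. File names and sizes are arbitrary.
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

function populateDist(dir: string, payloadMb = 8): void {
  mkdirSync(dir, { recursive: true });
  // Multi-MB opaque binary payloads that the scan will still open.
  writeFileSync(join(dir, 'app.wasm'), Buffer.alloc(payloadMb * 1024 * 1024, 0xab));
  writeFileSync(join(dir, 'app.data'), Buffer.alloc(payloadMb * 1024 * 1024, 0xcd));
  // A small text asset for the text-only comparison case.
  writeFileSync(join(dir, 'index.html'), '<!doctype html><title>repro</title>');
}
```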

Possible Solution

Any of the following would be a substantial improvement:

  1. Detect at synth time whether any source can require deploy-time substitution, and pass that fact into the custom resource so it can skip the scan entirely when impossible.
  2. Expose an explicit escape hatch flag so users can assert "there are no deploy-time values in this deployment".
  3. Avoid scanning obviously binary files when substitution is enabled, or switch to a more targeted marker-replacement strategy rather than scanning every extracted file.
  4. Longer term, reconsider whether the current Lambda-side download -> extract -> scan -> sync design is appropriate for large static/binary web deployments at all.
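For option 3, a cheap binary-detection heuristic would already avoid most of the waste. One common approach (used by tools like grep and git, sketched here as an assumption about what the handler could do, not what it does) is to sniff a prefix of each file for NUL bytes:

```typescript
// Sketch of a cheap binary-file heuristic: NUL bytes almost never occur in
// text files, so their presence in a prefix is a strong binary signal.
function isLikelyBinary(buf: Uint8Array, sniffLen = 8000): boolean {
  const limit = Math.min(buf.length, sniffLen);
  for (let i = 0; i < limit; i++) {
    if (buf[i] === 0) return true;
  }
  return false;
}
```

Conveniently, the WASM magic number begins with a NUL byte (`\0asm`), so this heuristic would classify .wasm payloads as binary from their very first byte.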

The most important point is that users should not be forced to pay for unconditional deploy-time marker scanning when they are only deploying ordinary static assets.

Additional Information/Context

The deploy-time scan exists for a legitimate feature: sources such as Source.jsonData() can contain CloudFormation tokens that are not known until deploy time. In that case, a post-synth substitution step makes sense.

For example:

new s3deploy.BucketDeployment(this, 'Deploy', {
  destinationBucket: myBucket,
  sources: [
    s3deploy.Source.jsonData('config.json', {
      apiUrl: api.url,
      bucketName: someBucket.bucketName,
    }),
  ],
});

In a case like that, scanning for substitution markers is understandable.

The problem is that the current behavior seems to apply that same cost to deployments that do not use any of those features.

In other words, BucketDeployment is optimized around the most dynamic case and makes every user pay for it, including users doing simple static asset publication.

AWS CDK Library version (aws-cdk-lib)

2.243.0

AWS CDK CLI version

2.1109.0

Node.js Version

24.14.0

OS

Ubuntu 24.04

Language

TypeScript

Language Version

No response

Other information

No response
