Skip to content

ci: add codebuild deployment integ runner PR workflow.#35459

Merged
mergify[bot] merged 26 commits intomainfrom
deployment-integ-workflow-squashed
Oct 7, 2025
Merged

ci: add codebuild deployment integ runner PR workflow.#35459
mergify[bot] merged 26 commits intomainfrom
deployment-integ-workflow-squashed

Conversation

@Abogical
Copy link
Copy Markdown
Member

@Abogical Abogical commented Sep 10, 2025

Reason for this change

  • Deploying integration tests has been time-consuming and expensive for both maintainers and contributors of aws-cdk. Snapshot update and deployment may take from 10 minutes to hours depending on the snapshots updates made for each pull request. This can make any changes to aws-cdk, no matter how simple they are, be needlessly time-consuming and error-prone.
  • AWS CDK's PR workflows does not validate that the snapshots of contributors are deployable. Although this is not common, snapshots updates can be made via a dry run, and there is a possibility that those snapshot updates won't be deployable. No PR Github action at the moment prevents this scenario.

Description of changes

This adds a PR action which will use our Internal environment to test snapshot updates by attempting to deploy them. This uses the same service used by aws-cdk-cli (called Atmosphere) to deploy only the snapshot updates for the PR. This PR will need manual approval by a maintainer to run for the PR to work.

Mergify rules will also be updated such that if a request to run the integ tests are made, then this workflow should pass successfully in order for the PR to be merged.

Description of how you validated changes

This workflow was validated with my fork of aws-cdk. See this example run: https://github.com/Abogical/aws-cdk/actions/runs/17458740897/job/49578262885

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team September 10, 2025 12:55
@github-actions github-actions bot added p2 admired-contributor [Pilot] contributed between 13-24 PRs to the CDK labels Sep 10, 2025
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Sep 10, 2025
@Abogical Abogical added the pr/do-not-merge This PR should not be merged at this time. label Sep 11, 2025
@Abogical Abogical marked this pull request as ready for review September 11, 2025 14:35
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Sep 11, 2025
@Abogical
Copy link
Copy Markdown
Member Author

Abogical commented Sep 11, 2025

Waiting for the backend infrastructure to be setup before this is merged. Hence the do-not-merge label.

@Abogical
Copy link
Copy Markdown
Member Author

Backend infrastructure is setup. Removing do-not-merge label.

@Abogical Abogical removed the pr/do-not-merge This PR should not be merged at this time. label Sep 11, 2025
@gasolima gasolima added the pr/do-not-merge This PR should not be merged at this time. label Sep 11, 2025

export const runInteg = async (paths: string[], allocation: Allocation) => {
if (paths.length == 0) {
console.log('No snapshots changes were made, skipping deployment integ test.');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't throw an error here? cause if we run the workflow that mean someone know that there's integration tests that need to be deployed

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would not be considered an error of the PR or the workflow. We may notify the maintainers of this via a job summary stating that no snapshots has been deployed:

Related: https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-commands#adding-a-job-summary

const spawnProcess = spawn('yarn', ['integ-runner', '--directory', 'packages', '--force', ...paths], {
stdio: ['ignore', 'inherit', 'inherit'],
env: {
...process.env,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other envs needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just allows passing env variables to be passed on to the integ runner so that we can extend the workflow easily in the future.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about this from security perspective, not sure what can process.env contain, don't want to run a command with every thing in our environments.

Can we have here something like

env: {
  AWS_ACCESS_KEY_ID: props.awsAccessKeyId ?? process.env.AWS_ACCESS_KEY_ID
, etc...
}

import { spawn } from 'child_process';

export const gitDiff = async (): Promise<string> => {
const spawnProcess = spawn('git', ['diff', process.env.BASE_COMMIT ?? 'main', process.env.HEAD_COMMIT ?? 'HEAD', '--name-only'], {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When are we using this default values(main & HEAD)?

~/.s3buildcache
key: s3buildcache-${{ runner.os }}

- name: Configure system settings
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of configuration we're setting here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just increasing memory so that workflow does not fail due to out of memory error.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add this in the naming? If it's not possible as comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment has been added.

sudo sysctl -w vm.max_map_count=2251954

- name: Build
run: /bin/bash ./build.sh --ci --skip-test --skip-prereqs --skip-compat
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm expecting to put this in a package.json file, then here only have something like yarn build-github

run: /bin/bash ./build.sh --ci --skip-test --skip-prereqs --skip-compat

- name: Run deployment integration tests
run: node tools/@aws-cdk/deployment-integ/bin/deployment-integ.js
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment: I'm expecting to put this in a package.json file, then here only have something like yarn deploy-integration-tests


```bash
# Install dependencies
npm install
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we shouldn't use npm, right?

val => val.match(/^.*integ\.[^/]*\.js/)?.[0] || null,
).filter(val => val !== null))];

export const main = async ({ endpoint, pool }: {endpoint: string; pool: string}) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have more descriptive name than main?

import { Allocation } from '@cdklabs/cdk-atmosphere-client';
import { spawn } from 'child_process';

export const runInteg = async (paths: string[], allocation: Allocation) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like from this function to not have anything related to Atmosphere so people can run the same script locally using their credentials. Does that make sense?

working-directory: base
run: yarn --cwd tools/@aws-cdk/integration-test-deployment build

- name: Install dependencies for framework-integ (HEAD)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

framework-integ is integration tests, right? can we make call it integration test also for line 103.


try {
const env = {
...process.env,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid doing this? you can extract whatever variables you want manually

}
};

export const runInteg = async (env: NodeJS.ProcessEnv) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it better to make it deployIntegrationTests?

Copy link
Copy Markdown
Contributor

@abidhasan-aws abidhasan-aws Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also will add a parameter to run deploy all integt est as per our last conversation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have here max 5 files? just for readability and easy spotting problems

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed some, but reducing more defeats the purpose of the test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you differentiate here between calling Atmosphere vs ur setup? can u state how each one can be done.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can state how our setup can be used but, atmosphere cannot be used by anyone since it requires permission. For now only the pr build role will have the permission to assume role for atmosphere..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmosphere cannot be used by anyone since it requires permission.

mention that in the doc

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

jobs:
integration_test_deployment:
runs-on: codebuild-aws-cdk-github-actions-deployment-integ-runner-${{ github.run_id }}-${{ github.run_attempt }}
environment: deployment-integ-test
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here warning to never change this unless rediscussing the topic with Appsec?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A code comment saying that never change this unless rediscussing the topic with Appsec, i believe this line if we removed will remove the approval step, which we were always having this assumption in our talks, and whenever we want to change this assumption we need to go back and see the different threats.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general now it's way more clear! Good job 👏

on:
workflow_dispatch: {}
merge_group: {}
pull_request_target:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Since I was the one who suggested using pull_request_target instead of pull_request, I wanted to clarify the reasoning — some of the behavior is a bit unintuitive.

pull_request_target runs with write permissions and access to repository secrets, whereas pull_request does not have these for external forks.

However, the key difference is where the workflow runs from.

With pull_request, the workflow is executed from the PR’s commit (HEAD). That means a malicious contributor could modify the workflow in their fork, and it would execute automatically as soon as they open a PR — before we ever review or approve it.

With pull_request_target, the workflow runs from the base branch of our repository (e.g., main). This ensures that if someone tries to sneak malicious code into a workflow file in their PR, it won’t run until we explicitly approve and merge it.

Note: “write permission and secrets access” only apply to GitHub-managed credentials. Our main concern here is the OIDC role — we want to prevent anyone from being able to exploit that role. GitHub doesn’t enforce that restriction for us.

So, while pull_request_target technically grants more powerful permissions if misused, it is still the safer option. It prevents unreviewed, potentially malicious workflows from running automatically, which is a bigger risk in our case.

};

export const deployIntegrationTests = async (env: NodeJS.ProcessEnv) => {
const changedSnapshotPaths = await getChangedSnapshots();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a followup PR to make this function possible to deploy all the tests by specific parameter send?

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Sep 26, 2025
@acornea
Copy link
Copy Markdown

acornea commented Sep 30, 2025

LGTM

@gasolima gasolima removed the pr/do-not-merge This PR should not be merged at this time. label Oct 7, 2025
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Oct 7, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot added the queued label Oct 7, 2025
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Oct 7, 2025

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 919b14b into main Oct 7, 2025
19 checks passed
@mergify mergify bot removed the queued label Oct 7, 2025
@mergify mergify bot deleted the deployment-integ-workflow-squashed branch October 7, 2025 09:59
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Oct 7, 2025

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

admired-contributor [Pilot] contributed between 13-24 PRs to the CDK contribution/core This is a PR that came from AWS. p2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants