ROSAENG-58088: ci: add fake_cluster to rosa-sts-ad profile #1178

Closed
amandahla wants to merge 1 commit into terraform-redhat:main from amandahla:test-classic-fake-cluster

@amandahla amandahla commented Jun 3, 2026

Member

PR Summary

Adds fake_cluster=true support to the rosa-sts-ad and rosa-sts-pl CI profiles, replacing real AWS cluster provisioning in PR presubmit jobs with OCM-registered fake clusters. Setup time drops from 20+ minutes (plus frequent NAT Gateway quota failures) to under 5 minutes, and all previously failing Day2 tests now pass because the cluster exists in OCM.

Detailed Description of the Issue

The e2e-presubmits-rosa-sts-advanced-critical-high-presubmit and e2e-presubmits-rosa-sts-private-critical-high-presubmit prow jobs created real ROSA Classic clusters with full AWS infrastructure (VPC with 3 NAT Gateways, KMS key, proxy EC2 instance, security groups). This caused three recurring problems:

  1. NAT Gateway quota exhaustion — the ap-northeast-1 CI account has a soft limit of 5 NAT Gateways per region; leaked resources from prior failed runs blocked new jobs entirely (0 tests reached execution).
  2. VPC quota exhaustion — the us-east-1 CI account has a VPC limit; the private presubmit hit this limit and failed before any test ran.
  3. Slow feedback — full cluster provisioning added 20+ minutes to every PR, of which ~8 minutes was the OCM polling wait alone.

The terraform provider does not interact with AWS directly; it translates Terraform config into OCM API calls. AWS provisioning is OCM's responsibility and is already covered by uhc-clusters-service full-cycle tests. Running provider tests against real AWS duplicates that coverage unnecessarily.

Setting fake_cluster=true in cluster custom_properties tells Hive to skip real provisioning while OCM still validates all STS roles, subnets, and configuration in preflight. All OCM API operations (create, read, update resources) work normally against a fake cluster, so Day1Post and Day2 test assertions are identical to the real-cluster case.

Known Limitations and Risks

rosa-sts-private-critical-high-presubmit — VPC Quota Risk

OCM validates real subnets in preflight (before fake_cluster is checked by Hive), so the private profile still requires a real VPC. The us-east-1 CI account has a VPC soft limit; leaked VPCs from previous failed PR runs can exhaust this quota and block the job.

Workarounds:

  • Request a VPC quota increase for us-east-1 in the oex-aws-qe account.
  • Move rosa-sts-private-critical-high-presubmit to a nightly periodic job — the private-link assertion is low-risk and does not need to block every PR.
  • Make the job optional: true so failures do not block merge.

rosa-sts-advanced-critical-high-presubmit — Trust Bundle Validation

The fake proxy implementation injects a dummy PEM certificate as the trust bundle. If OCM validates X.509 content strictly in preflight (not yet confirmed), the trust bundle constant in FakeClusterTrustBundle may need replacing with a real self-signed certificate.
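
If strict X.509 validation does turn out to be enforced, one option is to generate a throwaway self-signed certificate at test setup instead of hard-coding a constant. The following is a minimal sketch under that assumption, using only the Go standard library. The function names and the certificate subject are hypothetical and are not taken from this PR.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedTrustBundle returns a PEM-encoded, self-signed CA certificate.
// Hypothetical helper: the name and the subject below are illustrative only.
func newSelfSignedTrustBundle() (string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "fake-cluster-proxy-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	// Passing the template as its own parent makes the certificate self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return "", err
	}
	return string(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})), nil
}

// isParsableCertPEM reports whether s holds a PEM block that parses as X.509.
func isParsableCertPEM(s string) bool {
	block, _ := pem.Decode([]byte(s))
	if block == nil || block.Type != "CERTIFICATE" {
		return false
	}
	_, err := x509.ParseCertificate(block.Bytes)
	return err == nil
}

func main() {
	bundle, err := newSelfSignedTrustBundle()
	if err != nil {
		panic(err)
	}
	fmt.Println("parses as X.509:", isParsableCertPEM(bundle))
}
```

A check like isParsableCertPEM could also be run against the existing constant to find out whether it would survive strict parsing before OCM's behavior is confirmed.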

Related Issues and PRs

  • Jira: ROSAENG-58088
  • Fixes: #
  • Related PR(s):
  • Related design/docs:

Type of Change

  • feat - adds a new user-facing capability.
  • fix - resolves an incorrect behavior or bug.
  • docs - updates documentation only.
  • style - formatting or naming changes with no logic impact.
  • refactor - code restructuring with no behavior change.
  • test - adds or updates tests only.
  • chore - maintenance work (tooling, housekeeping, non-product code).
  • build - changes build system, packaging, or dependencies for build output.
  • ci - changes CI pipelines, jobs, or automation workflows.
  • perf - improves performance without changing intended behavior.

Previous Behavior

  • rosa-sts-ad and rosa-sts-pl profiles created real ROSA Classic clusters with full AWS infrastructure (VPC + NAT Gateways, KMS key, proxy EC2 instance, security groups).
  • Jobs frequently failed at CreateClusterByProfile due to NAT Gateway or VPC quota exhaustion, resulting in 0 tests running.
  • When quota was available, cluster provisioning took 20+ minutes before any test could execute.
  • Profile-level CustomProperties were silently overwritten by global defaults.
  • Tests 88408 (autoscaling) and 67607 (proxy) were not covered in the PR gate.

Behavior After This Change

  • Both profiles set custom_properties: {fake_cluster: "true"}, skipping real Hive/HyperShift provisioning.
  • The VPC Terraform manifest gains a no_nat_gateway path (IGW + public/private subnets, no NAT Gateways) that satisfies OCM's 6-subnet Multi-AZ requirement without consuming NAT Gateway quota.
  • KMS key and proxy PrepareProxy() provisioning are skipped for fake clusters; account roles and OIDC still run (OCM validates STS roles and subnets in preflight).
  • For fake clusters with proxy: true, dummy proxy values are injected directly into cluster args — OCM stores them and the test reads them back from OCM.
  • For rosa-sts-pl (private-link), only private subnets are passed and Private/PrivateLink flags are set correctly.
  • Profile-level CustomProperties are merged on top of global defaults via helper.MergeMaps.
  • The rosa-sts-ad profile adds autoscaling_enabled: true and proxy: true so tests 88408 and 67607 now run instead of skipping.
  • The rosa-sts-ad-real and rosa-sts-pl-real profiles are added for full real-AWS validation when needed.
  • The security-group module is pinned to ~> 5.0 to fix a pre-existing v6 breakage (ingress_cidr_blocks rename).
  • Setup completes in ~5 minutes; the suite passes with 13/19 tests for both profiles (remainder always skipped in real CI too).
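
The merge behavior in the list above (profile-level properties layered on top of global defaults) can be sketched as follows. The mergeMaps function here is a hypothetical stand-in written for illustration; the real signature of helper.MergeMaps is not shown in this PR, and the keys and values in main are invented.

```go
package main

import "fmt"

// mergeMaps is a hypothetical stand-in for helper.MergeMaps.
// It copies base into a new map, then lets overlay win on key conflicts,
// so neither input map is mutated.
func mergeMaps(base, overlay map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(overlay))
	for k, v := range base {
		merged[k] = v
	}
	for k, v := range overlay {
		merged[k] = v
	}
	return merged
}

func main() {
	// Invented example values: global defaults shared by every profile.
	globalDefaults := map[string]string{"owner": "ci", "fake_cluster": "false"}
	// Profile-level custom_properties, as set on rosa-sts-ad.
	profileProps := map[string]string{"fake_cluster": "true"}

	fmt.Println(mergeMaps(globalDefaults, profileProps))
}
```

Copying into a fresh map rather than writing into the defaults matters if the defaults are shared across profiles in one test run; otherwise one profile's fake_cluster flag could leak into the next.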

How to Test (Step-by-Step)

Preconditions

  • AWS credentials for ap-northeast-1 (advanced) and us-east-1 (private).
  • OCM staging token (rosa token).
  • fake_cluster support present in the target OCM service.

Test Steps

  1. Trigger the prow job via /test rosa-sts-advanced-critical-high-presubmit on a PR.
  2. Observe cluster creation completes without NAT Gateway errors in ~5 minutes.
  3. Verify the suite reports SUCCESS! with 13 passing tests.
  4. Optionally trigger /test rosa-sts-private-critical-high-presubmit and confirm the same result.

Expected Results

Both jobs pass. No NAT Gateways are created. fake_cluster=true is visible in the cluster's custom_properties in the OCM API.

Proof of the Fix

  • Logs/CLI output: Prow job 2062919008268587008: SUCCESS! 3m47.987639185s PASS, 13 passed, 0 failed.

Test Coverage

ci/prow/e2e-presubmits-rosa-sts-advanced-critical-high-presubmit

Profile rosa-sts-ad: fips=true, etcd=true, proxy=true, labeling=true, tagging=true, autoscaling=true, private_link=false, imdsv2=required, additional_sg=4, worker_disk=200, fake_cluster=true

| ID | Description | Labels | Real CI | Fake cluster | Notes |
|---|---|---|---|---|---|
| 63134 | is successfully installed | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 63140 | fips correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 63133 | private_link correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 63143 | etcd-encryption correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; etcd stored without KMS key |
| 63950 | imdsv2 correctly set (Classic) | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion |
| 68423 | compute_labels correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 63777 | AWS tags are set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 75107 | multiarch correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 69145 | additional security groups | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion |
| 69143 | worker disk size correctly set | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion |
| 88408 | autoscaling correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; autoscaling config stored in OCM |
| 67607 | proxy correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; dummy proxy values stored in OCM |
| 69137 | cluster autoscaler day2 | Day2, High | ✅ runs | ✅ PASS | Cluster exists in OCM |
| 63316 | delete account roles | Day2, Critical | ✅ runs | ✅ PASS | Cluster exists in OCM |
| 70128 | kubelet config | Day2, High | ✅ runs | ✅ PASS | Cluster exists in OCM |
| 72485 | etcd-encryption key correctly set | Day1Post, High | 🚫 always skipped (HCP-only) | 🚫 always skipped | |
| 74096 | resources will wait for cluster ready | Day1Post, Critical | 🚫 always skipped (full-resources only) | 🚫 always skipped | |
| 75372 | imdsv2 correctly set (HCP) | Day1Post, Critical | 🚫 always skipped (HCP-only) | 🚫 always skipped | |
| 85748 | break glass credential | Day2, Critical | 🚫 always skipped (HCP-only) | 🚫 always skipped | |

Summary: 15 ✅ PASS / 0 ⚠️ gap / 4 🚫 always skipped / 0 ❌


ci/prow/e2e-presubmits-rosa-sts-private-critical-high-presubmit

Profile rosa-sts-pl: fips=false, etcd=false, proxy=false, labeling=false, tagging=false, private_link=true, autoscaling=false, imdsv2=optional, additional_sg=0, fake_cluster=true

| ID | Description | Labels | Real CI | Fake cluster | Notes |
|---|---|---|---|---|---|
| 63134 | is successfully installed | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 63133 | private_link correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; PrivateLink flag set on fake cluster |
| 63140 | fips correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; both false |
| 63143 | etcd-encryption correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; both false |
| 63950 | imdsv2 correctly set (Classic) | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion; default optional |
| 68423 | compute_labels correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; labels==nil |
| 63777 | AWS tags are set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; 2 built-in tags only |
| 75107 | multiarch correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 69145 | additional security groups | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion; empty SGs |
| 69143 | worker disk size correctly set | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion; default 300 |
| 69137 | cluster autoscaler day2 | Day2, High | ✅ runs | ✅ PASS | Cluster exists in OCM |
| 63316 | delete account roles | Day2, Critical | ✅ runs | ✅ PASS | Cluster exists in OCM |
| 70128 | kubelet config | Day2, High | ✅ runs | ✅ PASS | Cluster exists in OCM |
| 88408 | autoscaling correctly set | Day1Post, High | 🚫 always skipped (autoscaling=false in profile) | 🚫 always skipped | |
| 67607 | proxy correctly set | Day1Post, High | 🚫 always skipped (proxy=false in profile) | 🚫 always skipped | |
| 72485 | etcd-encryption key correctly set | Day1Post, High | 🚫 always skipped (HCP-only) | 🚫 always skipped | |
| 74096 | resources will wait for cluster ready | Day1Post, Critical | 🚫 always skipped (full-resources only) | 🚫 always skipped | |
| 75372 | imdsv2 correctly set (HCP) | Day1Post, Critical | 🚫 always skipped (HCP-only) | 🚫 always skipped | |
| 85748 | break glass credential | Day2, Critical | 🚫 always skipped (HCP-only) | 🚫 always skipped | |

Summary: 13 ✅ PASS / 0 ⚠️ gap / 6 🚫 always skipped / 0 ❌


Breaking Changes

  • No breaking changes

Developer Verification Checklist

  • Commit subject/title follows [JIRA-TICKET] | [TYPE][(scope)][!]: <MESSAGE>.
  • PR description clearly explains both what changed and why.
  • Relevant Jira/GitHub issues and related PRs are linked.
  • make install-hooks has been run in this clone.
  • Tests were added/updated where appropriate.
  • I manually tested the change.
  • make pre-push-checks passes.
  • make fmt-check passes.
  • make build passes.
  • Documentation was added/updated where appropriate.
  • Any risk, limitation, or follow-up work is documented.

@openshift-ci openshift-ci Bot requested review from aaraj7 and jerichokeyne June 3, 2026 21:41
@amandahla amandahla marked this pull request as draft June 3, 2026 21:41
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds profile-level custom properties: a CustomProperties field on Profile, the handler merges profile properties into cluster creation args, the rosa-sts-ad test profile sets fake_cluster: "true", and a test Terraform security-group submodule is pinned to ~> 5.0.

Changes

Custom Properties Profile Feature

| Layer / File(s) | Summary |
|---|---|
| Profile struct custom properties field (tests/utils/profilehandler/profile.go) | The Profile struct adds a new CustomProperties field of type map[string]string with ini and json tags to carry custom key/value pairs from profiles. |
| Handler: fake-cluster gating, BYOVPC and etcd/KMS handling (tests/utils/profilehandler/handler.go) | Adds isFakeCluster helper; BYOVPC provisioning/cleanup branches now skip when a profile marks the cluster as fake; etcd/KMS key preparation is skipped for fake clusters and clusterArgs.Etcd is set when ctx.profile.Etcd is true. |
| Handler merges profile custom properties (tests/utils/profilehandler/handler.go) | GenerateClusterCreationArgs updates clusterArgs.CustomProperties by copying global defaults and overlaying profile-level CustomProperties. |
| Test profile configuration with custom properties (tests/ci/profiles/tf_classic_cluster_profiles.yml) | The rosa-sts-ad profile is configured with a custom_properties block containing fake_cluster: "true". |
| Terraform security-group module version pin (tests/tf-manifests/aws/security-groups/main.tf) | Pins the terraform-aws-modules/security-group/aws//modules/http-80 submodule with version = "~> 5.0" in the web_server_sg module reference. |

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Pr Checklist Claims Vs Evidence (Generic) | ✅ Passed | PR body contains no markdown checklist items (- [ ] or - [x]); check not applicable per instructions. |
| Title check | ✅ Passed | The pull request title clearly and accurately describes the main change: adding a fake_cluster property to the rosa-sts-ad profile in the CI configuration. |
| Description check | ✅ Passed | The PR description comprehensively covers all required template sections with detailed context, problem statement, implementation details, testing approach, and risk analysis. |

@amandahla amandahla marked this pull request as ready for review June 3, 2026 21:41
@amandahla amandahla force-pushed the test-classic-fake-cluster branch from 76a3023 to 0439334 Compare June 3, 2026 21:42
@amandahla amandahla force-pushed the test-classic-fake-cluster branch 3 times, most recently from ce008b4 to 48cff57 Compare June 3, 2026 21:55

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/utils/profilehandler/profile.go (1)

4-57: 💤 Low value

Consider breaking long lines for linter compliance.

The static analysis tool flags multiple lines exceeding 120 characters (lines 9, 10, 15, 16, 33, 38, 46, 57). While these appear to be alignment-only reformatting changes, the linter violations could be addressed by:

  • Splitting tags across multiple lines
  • Using shorter field names (less desirable)
  • Adjusting struct tag formatting

Since this affects existing fields and doesn't change functionality, this can be deferred if alignment consistency is preferred.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/utils/profilehandler/profile.go` around lines 4 - 57, Multiple struct
field lines (e.g., MajorVersion, Version, Zones, StorageLB, Etcd,
WorkerDiskSize, AdditionalSGNumber, DontWaitForCluster) exceed the 120-char
linter limit due to long inline comments/tags; fix by moving the trailing inline
comments off the field line into their own preceding or following comment lines
(or shortening those comments) so the field name + tag line stays <120 chars
without changing tags or field names (move comments for MajorVersion, Version,
Zones, StorageLB, Etcd, WorkerDiskSize, AdditionalSGNumber, DontWaitForCluster).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2b0b950b-10bc-4566-b5c5-98b537d391eb

📥 Commits

Reviewing files that changed from the base of the PR and between bf83d01 and ce008b4.

📒 Files selected for processing (3)
  • tests/ci/profiles/tf_classic_cluster_profiles.yml
  • tests/utils/profilehandler/handler.go
  • tests/utils/profilehandler/profile.go

@amandahla

Member Author

/test e2e-presubmits-rosa-sts-advanced-critical-high-presubmit

@amandahla amandahla force-pushed the test-classic-fake-cluster branch from 48cff57 to f3c7732 Compare June 3, 2026 23:52
@amandahla

Member Author

/test e2e-presubmits-rosa-sts-advanced-critical-high-presubmit

@amandahla amandahla force-pushed the test-classic-fake-cluster branch 2 times, most recently from 8a225fa to f139e9c Compare June 5, 2026 12:37

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/utils/profilehandler/handler.go (1)

235-237: ⚡ Quick win

Make fake-cluster detection tolerant to case and whitespace.

Line 236 only matches exact "true", so values like "True" or " true " silently disable fake-cluster behavior.

Suggested fix
 func (ctx *profileContext) isFakeCluster() bool {
-	return ctx.profile.CustomProperties["fake_cluster"] == "true"
+	return strings.EqualFold(strings.TrimSpace(ctx.profile.CustomProperties["fake_cluster"]), "true")
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/utils/profilehandler/handler.go` around lines 235 - 237, The
isFakeCluster method currently only returns true for an exact "true" value;
update it to trim whitespace and do a case-insensitive comparison against "true"
so values like " True " or "TRUE" are accepted. In the
profileContext.isFakeCluster function, use strings.TrimSpace on
ctx.profile.CustomProperties["fake_cluster"] and compare with strings.EqualFold
(or strings.ToLower) to "true"; add the necessary import for the strings
package. This preserves the same semantics but makes fake-cluster detection
tolerant to case and surrounding whitespace.
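
The suggested check can be tried in isolation with the sketch below. It takes the properties map as a plain argument instead of reading ctx.profile, which is a simplification made here so the example is self-contained; the comparison itself is the one from the suggested fix.

```go
package main

import (
	"fmt"
	"strings"
)

// isFakeCluster mirrors the suggested fix: trim surrounding whitespace and
// compare case-insensitively against "true". Taking the map as a parameter
// is a simplification for this sketch; the PR's method reads ctx.profile.
func isFakeCluster(customProperties map[string]string) bool {
	return strings.EqualFold(strings.TrimSpace(customProperties["fake_cluster"]), "true")
}

func main() {
	for _, v := range []string{"true", "True", " true ", "TRUE", "false", "yes", ""} {
		props := map[string]string{"fake_cluster": v}
		fmt.Printf("%q -> %v\n", v, isFakeCluster(props))
	}
	// Indexing a nil map is safe in Go and yields the zero value "".
	fmt.Println("nil map ->", isFakeCluster(nil))
}
```

Only spellings of "true" are accepted; values such as "yes" or "1" still count as not fake, which keeps the behavior close to the original exact match.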
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/utils/profilehandler/handler.go`:
- Around line 894-898: The code dereferences clusterArgs.AccountRolePrefix
inside the block that calls ctx.PrepareKMSKey without ensuring AccountRolePrefix
is set, which can panic when Etcd/KMSKey is true but STS (and thus
AccountRolePrefix) is unset; update the guard on the if that uses
ctx.profile.Etcd || ctx.profile.KMSKey && !ctx.isFakeCluster() to also require
clusterArgs.AccountRolePrefix != nil (or equivalently check STS flag) before
calling ctx.PrepareKMSKey, and only dereference *clusterArgs.AccountRolePrefix
inside that guarded block so PrepareKMSKey is invoked with safe, initialized
arguments (functions/variables to locate: ctx.PrepareKMSKey,
clusterArgs.AccountName/ClusterName, clusterArgs.AccountRolePrefix,
ctx.profile.UnifiedAccRolesPath, ctx.profile.Etcd, ctx.profile.KMSKey,
ctx.isFakeCluster()).
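
A minimal sketch of the guard described in this inline comment follows. The profile and clusterArgs types are trimmed, hypothetical stand-ins carrying only the fields named above, and shouldPrepareKMSKey is an invented helper, not a function from handler.go. The sketch also shows why the parentheses matter: in Go, && binds tighter than ||, so the unparenthesized condition quoted above reads as Etcd || (KMSKey && !fake).

```go
package main

import "fmt"

// profile and clusterArgs are hypothetical, trimmed stand-ins for the
// handler's types; only the fields mentioned in the review are kept.
type profile struct {
	Etcd             bool
	KMSKey           bool
	CustomProperties map[string]string
}

type clusterArgs struct {
	AccountRolePrefix *string
}

// shouldPrepareKMSKey (invented name) is true only when a KMS key is wanted,
// the cluster is not fake, and AccountRolePrefix is safe to dereference.
func shouldPrepareKMSKey(p profile, args clusterArgs) bool {
	fake := p.CustomProperties["fake_cluster"] == "true"
	return (p.Etcd || p.KMSKey) && !fake && args.AccountRolePrefix != nil
}

func main() {
	etcd, kmsKey, fake := true, false, true

	// && has higher precedence than ||, so these two expressions differ
	// whenever etcd is true and the cluster is fake.
	unparenthesized := etcd || kmsKey && !fake
	parenthesized := (etcd || kmsKey) && !fake
	fmt.Println("unparenthesized:", unparenthesized, "parenthesized:", parenthesized)

	prefix := "ci-prefix" // invented value
	p := profile{Etcd: true}
	fmt.Println("with prefix:", shouldPrepareKMSKey(p, clusterArgs{AccountRolePrefix: &prefix}))
	fmt.Println("without prefix:", shouldPrepareKMSKey(p, clusterArgs{}))
}
```

Whether a missing AccountRolePrefix should silently skip KMS preparation or fail the setup with an explicit error is a design choice for the handler; the sketch only shows the skip variant.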

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: terraform-redhat/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1ea99c0b-b48d-40a1-8e46-53787995ef91

📥 Commits

Reviewing files that changed from the base of the PR and between 8a225fa and f139e9c.

📒 Files selected for processing (4)
  • tests/ci/profiles/tf_classic_cluster_profiles.yml
  • tests/tf-manifests/aws/security-groups/main.tf
  • tests/utils/profilehandler/handler.go
  • tests/utils/profilehandler/profile.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/ci/profiles/tf_classic_cluster_profiles.yml
  • tests/tf-manifests/aws/security-groups/main.tf
  • tests/utils/profilehandler/profile.go

Comment thread tests/utils/profilehandler/handler.go
@amandahla amandahla force-pushed the test-classic-fake-cluster branch from f139e9c to 3e68ac1 Compare June 5, 2026 14:01
@openshift-ci

openshift-ci Bot commented Jun 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign davidleerh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@amandahla amandahla force-pushed the test-classic-fake-cluster branch 3 times, most recently from 203eef9 to 868d5ac Compare June 5, 2026 16:21
@amandahla amandahla changed the title from "OCM-00000 | ci: add fake_cluster to rosa-sts-ad profile" to "ROSAENG-58088: ci: add fake_cluster to rosa-sts-ad profile" Jun 5, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 5, 2026

@amandahla: This pull request references ROSAENG-58088 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

PR Summary

Adds fake_cluster support to the rosa-sts-ad CI profile and updates the profile handler to skip real AWS resource provisioning (VPC setup, KMS key creation) when testing against a fake cluster.

Detailed Description of the Issue

The rosa-sts-ad profile runs with BYOVPC and etcd/KMS flags enabled. When testing against OCM staging using a fake cluster, GenerateClusterCreationArgs attempted to provision real VPCs and KMS keys, steps that fail without real AWS infrastructure. This change adds an isFakeCluster() guard (keyed on custom_properties.fake_cluster == "true") so the profile can complete end-to-end without those provisioning steps.

A secondary bug is also fixed: profile-level CustomProperties were silently dropped in favor of global defaults. They are now merged so both sources are forwarded to cluster args.

Profile rosa-sts-ad: fips=true, etcd=true, proxy=false, labeling=true, tagging=true, private_link=false, autoscaling=false, imdsv2=required, additional_sg=4, worker_disk=200, fake_cluster=true

| ID | Description | Labels | Real CI | Fake cluster | Notes |
|---|---|---|---|---|---|
| 63134 | is successfully installed | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; OCM sets these for fake clusters |
| 63140 | fips correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; FIPS flag stored in OCM |
| 63133 | private_link correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 63143 | etcd-encryption correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; etcd flag stored in OCM without KMS key |
| 63950 | imdsv2 correctly set (Classic) | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion |
| 68423 | compute_labels correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; labels stored in OCM |
| 63777 | AWS tags are set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion; tags stored in OCM |
| 75107 | multiarch correctly set | Day1Post, High | ✅ runs | ✅ PASS | Same assertion |
| 69145 | additional security groups | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion |
| 69143 | worker disk size correctly set | Day1Post, Critical | ✅ runs | ✅ PASS | Same assertion; size stored in OCM |
| 69137 | cluster autoscaler day2 | Day2, High | ✅ runs | ✅ PASS | Was ❌, now passes; cluster exists in OCM |
| 63316 | delete account roles | Day2, Critical | ✅ runs | ✅ PASS | Was ❌, now passes; uses cluster ID from workspace |
| 70128 | kubelet config | Day2, High | ✅ runs | ✅ PASS | Was ❌, now passes; cluster exists in OCM |
| 88408 | autoscaling correctly set | Day1Post, High | ✅ runs | ⚠️ SKIP | profile.IsAutoscaling()=false |
| 67607 | proxy correctly set | Day1Post, High | ✅ runs | ⚠️ SKIP | proxy:false in profile |
| 72485 | etcd-encryption key correctly set | Day1Post, High | 🚫 always skipped (HCP-only) | 🚫 always skipped | |
| 74096 | resources will wait for cluster ready | Day1Post, Critical | 🚫 always skipped (full-resources only) | 🚫 always skipped | |
| 75372 | imdsv2 correctly set (HCP) | Day1Post, Critical | 🚫 always skipped (HCP-only) | 🚫 always skipped | |
| 85748 | break glass credential | Day2, Critical | 🚫 always skipped (HCP-only) | 🚫 always skipped | |

Summary: 13 ✅ PASS / 2 ⚠️ gap / 4 🚫 always skipped / 0 ❌


Missing assertions vs real cluster

Same assertions, same results:
All 10 Day1Post tests assert against OCM-stored cluster configuration (fips, etcd,
tags, labels, imdsv2, disk size) — identical whether cluster is real or fake.
All 3 Day2 tests (69137, 63316, 70128) test provider operations against OCM; OCM
does not distinguish real from fake for these. Assertions are identical.

Genuinely missing (the 2 gaps):

| ID | What is not tested | Why |
|---|---|---|
| 67607 | Proxy values on cluster (HTTPProxy, HTTPSProxy, NoProxy, trust bundle) | Proxy needs a real EC2 instance; structurally impossible with a fake cluster |
| 88408 | Node pool autoscaling min/max replicas | autoscaling_enabled not set; ClusterAutoscaler (69137) IS tested, node autoscaling is not |

Related Issues and PRs

  • Jira: OCM-00000
  • Fixes: #
  • Related PR(s):
  • Related design/docs:

Type of Change

  • feat - adds a new user-facing capability.
  • fix - resolves an incorrect behavior or bug.
  • docs - updates documentation only.
  • style - formatting or naming changes with no logic impact.
  • refactor - code restructuring with no behavior change.
  • test - adds or updates tests only.
  • chore - maintenance work (tooling, housekeeping, non-product code).
  • build - changes build system, packaging, or dependencies for build output.
  • ci - changes CI pipelines, jobs, or automation workflows.
  • perf - improves performance without changing intended behavior.

Previous Behavior

  • The rosa-sts-ad profile always triggered BYOVPC provisioning and KMS key creation during cluster creation args generation.
  • The Etcd flag and KMS key provisioning were coupled; neither could be set without the other.
  • Profile-level CustomProperties were overwritten by global defaults.

Behavior After This Change

  • The rosa-sts-ad profile sets fake_cluster: "true" in custom_properties, enabling the new guard.
  • isFakeCluster() returns true for any profile with that flag; GenerateClusterCreationArgs and DestroyRHCSClusterResources skip BYOVPC setup and KMS key provisioning in that case.
  • Etcd is now set independently of KMS key provisioning; the KMS block is gated on (profile.Etcd || profile.KMSKey) && !isFakeCluster().
  • Profile-level CustomProperties are merged on top of global defaults via helper.MergeMaps before being assigned to clusterArgs.CustomProperties.

How to Test (Step-by-Step)

Preconditions

  • OCM staging environment access with a valid token.
  • fake_cluster support present in the target OCM service.

Test Steps

  1. Set OCM_TOKEN and configure the provider to target OCM staging.
  2. Run the rosa-sts-ad profile through the CI test suite.
  3. Confirm no VPC or KMS key provisioning is attempted.
  4. Confirm fake_cluster: "true" appears in the cluster's custom_properties in OCM staging.

Expected Results

The rosa-sts-ad profile completes without AWS provisioning errors and registers the fake cluster in OCM staging with the correct custom properties.

Proof of the Fix

  • Screenshots:
  • Videos:
  • Logs/CLI output:
  • Other artifacts:

Breaking Changes

  • No breaking changes
  • Yes, this PR introduces a breaking change (describe impact and migration plan below)

Breaking Change Details / Migration Plan

Developer Verification Checklist

  • Commit subject/title follows [JIRA-TICKET] | [TYPE][(scope)][!]: <MESSAGE>.
  • PR description clearly explains both what changed and why.
  • Relevant Jira/GitHub issues and related PRs are linked.
  • make install-hooks has been run in this clone.
  • Tests were added/updated where appropriate.
  • I manually tested the change.
  • make pre-push-checks passes.
  • make fmt-check passes.
  • make build passes.
  • Documentation was added/updated where appropriate.
  • Any risk, limitation, or follow-up work is documented.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@amandahla amandahla force-pushed the test-classic-fake-cluster branch from 868d5ac to 233a859 Compare June 5, 2026 17:19
@amandahla amandahla force-pushed the test-classic-fake-cluster branch from df0e524 to df29c55 Compare June 5, 2026 17:46
@amandahla

Member Author

/test e2e-presubmits-rosa-sts-private-critical-high-presubmit

@amandahla

Member Author

/test e2e-presubmits-rosa-sts-private-critical-high-presubmit

@amandahla amandahla force-pushed the test-classic-fake-cluster branch 2 times, most recently from 9bf020f to dcab47f Compare June 5, 2026 21:08
@amandahla

Member Author

/test e2e-presubmits-rosa-sts-advanced-critical-high-presubmit

Signed-off-by: Amanda Hager Lopes de Andrade Katz <amanda.katz@redhat.com>
@amandahla amandahla force-pushed the test-classic-fake-cluster branch from dcab47f to cbbd226 Compare June 8, 2026 14:03
@openshift-ci

openshift-ci Bot commented Jun 8, 2026

@amandahla: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-presubmits-rosa-hcp-private-critical-high-presubmit | cbbd226 | link | true | /test e2e-presubmits-rosa-hcp-private-critical-high-presubmit |
| ci/prow/e2e-presubmits-rosa-sts-advanced-critical-high-presubmit | cbbd226 | link | true | /test e2e-presubmits-rosa-sts-advanced-critical-high-presubmit |
| ci/prow/e2e-presubmits-rosa-hcp-advanced-critical-high-presubmit | cbbd226 | link | true | /test e2e-presubmits-rosa-hcp-advanced-critical-high-presubmit |
| ci/prow/e2e-presubmits-rosa-sts-private-critical-high-presubmit | cbbd226 | link | true | /test e2e-presubmits-rosa-sts-private-critical-high-presubmit |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@amandahla

Member Author

Closing: after our meeting we decided to follow a different approach, improving our unit tests and reviewing heavy e2e tests (moving them to periodic, for example).

@amandahla amandahla closed this Jun 8, 2026