BE-1454 - Split CloudWatch rate limits per account region #13

Open
omerkarj wants to merge 5 commits into master from omerk/be-1454-split-rate-limits

Conversation


omerkarj commented May 6, 2026

Summary

  • Split CloudWatch API rate limiters into independent (account_id, region) buckets while preserving the existing per-API limit values.
  • Pass the resolved AWS account ID into CloudWatch client construction so discovery, custom namespace, static, ListMetrics, GetMetricData, and GetMetricStatistics use the correct bucket.
  • Add account_id to rate-limit telemetry and cover independent account/region buckets with a regression test.
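
For reviewers, a minimal sketch of the bucket lookup this introduces (simplified; the real implementation is in pkg/clients/cloudwatch/rate_limiter.go, and the exact field layout here is illustrative):

package cloudwatch

import (
	"sync"

	"golang.org/x/time/rate"
)

// rateLimiterBucketKey identifies one limiter bucket.
type rateLimiterBucketKey struct {
	accountID string
	region    string
}

// rateLimiterBuckets lazily creates one rate.Limiter per (account, region)
// pair, all sharing the same per-API limit and burst values.
type rateLimiterBuckets struct {
	mu       sync.Mutex
	limit    rate.Limit
	burst    int
	limiters map[rateLimiterBucketKey]*rate.Limiter
}

func newRateLimiterBuckets(limit rate.Limit, burst int) *rateLimiterBuckets {
	return &rateLimiterBuckets{
		limit:    limit,
		burst:    burst,
		limiters: map[rateLimiterBucketKey]*rate.Limiter{},
	}
}

// get returns the limiter for (accountID, region), creating it on first use.
func (b *rateLimiterBuckets) get(accountID, region string) *rate.Limiter {
	b.mu.Lock()
	defer b.mu.Unlock()
	key := rateLimiterBucketKey{accountID: accountID, region: region}
	if limiter, ok := b.limiters[key]; ok {
		return limiter
	}
	limiter := rate.NewLimiter(b.limit, b.burst)
	b.limiters[key] = limiter
	return limiter
}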

Test plan

  • go test ./pkg/clients/cloudwatch
  • go test ./pkg/clients/... ./pkg/job/... ./pkg/promutil
  • go test ./...

Made with Cursor

Summary by CodeRabbit

  • Refactor

    • CloudWatch rate limiting now uses per-account-and-region buckets for better isolation and accuracy.
  • Observability

    • Rate-limit metrics now include account_id for per-account monitoring and more precise labeling.
  • Breaking Changes

    • CloudWatch client factory and client creation APIs now require an account identifier parameter.
  • Tests

    • Test suite updated to exercise account-scoped rate limiting, cancellation behavior, and updated metric labeling.

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot commented May 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 903d9619-7e15-4749-b9b4-09b04d7f1011

📥 Commits

Reviewing files that changed from the base of the PR and between 9c31851 and dc063fc.

📒 Files selected for processing (5)
  • pkg/clients/cloudwatch/rate_limiter_test.go
  • pkg/clients/v1/factory.go
  • pkg/clients/v1/factory_test.go
  • pkg/clients/v2/factory.go
  • pkg/clients/v2/factory_test.go

Walkthrough

Refactors CloudWatch rate limiting to per-account-and-region limiter buckets, adds Bucketed config, propagates accountID through factories and NewRateLimitedClient, extends SimpleRateLimitedClient with account/role, updates Prometheus labels to include account_id, and updates tests and call sites.
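
One detail worth calling out: when Bucketed is disabled, every caller should collapse onto a single shared bucket, preserving the previous single-limiter behavior. A sketch of one plausible gating (the key type and function here are hypothetical; the actual wiring is in rate_limiter.go):

package cloudwatch

// bucketKey identifies a limiter bucket; the zero value doubles as the
// single global bucket used when bucketing is disabled.
type bucketKey struct {
	accountID string
	region    string
}

// keyFor maps a caller to its bucket. With bucketing off, all callers share
// the zero-value key and therefore a single rate.Limiter.
func keyFor(bucketed bool, accountID, region string) bucketKey {
	if !bucketed {
		return bucketKey{}
	}
	return bucketKey{accountID: accountID, region: region}
}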

Changes

Account-Scoped Rate Limiting

  • Data Shape (pkg/clients/cloudwatch/rate_limiter.go): RateLimiterConfig adds Bucketed bool. GlobalRateLimiter now stores per-API *rateLimiterBuckets. Introduces rateLimiterBucketKey. SimpleRateLimitedClient gains accountID and role. The NewRateLimitedClient signature adds accountID.
  • Bucket Management (pkg/clients/cloudwatch/rate_limiter.go): Adds the rateLimiterBuckets type, newRateLimiterBuckets, and (*rateLimiterBuckets).get for concurrency-safe lazy creation and caching of per-account-region rate.Limiter instances.
  • Core Rate-Limiting Logic (pkg/clients/cloudwatch/rate_limiter.go): The rate-limited methods (ListMetrics, GetMetricData, GetMetricStatistics) call a new limit implementation that resolves a per-bucket limiter via accountID+region, applies Allow()/Wait(), and records Prometheus metrics including account_id and role.
  • Factory API (pkg/clients/factory.go, pkg/clients/v1/factory.go, pkg/clients/v2/factory.go): The GetCloudwatchClient signature now includes accountID string between region and role. Factories forward accountID into NewRateLimitedClient when wrapping clients with rate limiting.
  • Call Sites (pkg/job/scrape.go): All CloudWatch client creation call sites now pass accountID to GetCloudwatchClient.
  • Metrics Definition (pkg/promutil/prometheus.go): The CloudwatchRateLimitWaitCounter and CloudwatchRateLimitAllowedCounter label vectors are extended to include account_id.
  • Tests & Verification (pkg/clients/cloudwatch/rate_limiter_test.go, pkg/clients/v1/factory_test.go, pkg/clients/v2/factory_test.go): Tests are updated for the new factory/client signatures and metric labels; new tests verify independent per-account/region buckets, global-bucket behavior when disabled, and per-account metric labeling.
sequenceDiagram
    participant Client as Client
    participant SRLC as SimpleRateLimitedClient
    participant GL as GlobalRateLimiter
    participant RB as rateLimiterBuckets
    participant RL as rate.Limiter
    participant P as Prometheus

    Client->>SRLC: API call (ListMetrics/GetMetricData/GetMetricStatistics) [region, accountID, role]
    SRLC->>GL: limit(op, accountID, region, role)
    GL->>RB: get(accountID, region)
    RB->>RL: create or return limiter
    SRLC->>RL: Allow()/Wait()
    RL-->>SRLC: permit/grant
    SRLC->>Client: forward call to underlying client
    SRLC->>P: increment allowed/wait counters (labels include account_id, region, role)
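
A minimal Go sketch of the limit step in that flow (counter names are taken from the review comments below; label order and the exact Allow/Wait accounting are assumptions):

package cloudwatch

import (
	"context"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/time/rate"
)

// Illustrative stand-ins for the promutil counter vectors.
var (
	allowedCounter = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "yace_cloudwatch_rate_limit_allowed_total"},
		[]string{"operation", "account_id", "region", "role"},
	)
	waitCounter = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "yace_cloudwatch_rate_limit_waits_total"},
		[]string{"operation", "account_id", "region", "role"},
	)
)

// limit applies the resolved per-bucket limiter: fast path through Allow,
// blocking fallback through Wait, recording a counter either way.
func limit(ctx context.Context, limiter *rate.Limiter, op, accountID, region, role string) error {
	if limiter.Allow() {
		allowedCounter.WithLabelValues(op, accountID, region, role).Inc()
		return nil
	}
	waitCounter.WithLabelValues(op, accountID, region, role).Inc()
	// Wait blocks until a token is available or ctx is done, which is what
	// the short-timeout contexts in the tests rely on.
	return limiter.Wait(ctx)
}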

🎯 Estimated review effort: 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 5.88%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title accurately reflects the main change, splitting CloudWatch rate limits into per-account and per-region buckets, which is the core refactoring described throughout the changeset.
  • Linked Issues Check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check (✅ Passed): check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/clients/v2/factory.go (1)

170-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply the rate-limit wrapper after cache lookup, not before caching.

Line 177 returns the cached client before the account-aware wrapper is applied, so a warmed cache bypasses rate limiting, and the first cold miss permanently bakes one wrapper/account into cachedClients.cloudwatch. Keep the cached value as the raw CloudWatch client and build the rate-limited wrapper per GetCloudwatchClient(...) call.

Suggested fix
 func (c *CachingFactory) GetCloudwatchClient(region string, accountID string, role model.Role, concurrency cloudwatch_client.ConcurrencyConfig, globalRateLimiter *cloudwatch_client.GlobalRateLimiter) cloudwatch_client.Client {
 	if !c.refreshed.Load() {
 		// if we have not refreshed then we need to lock in case we are accessing concurrently
 		c.mu.Lock()
 		defer c.mu.Unlock()
 	}
 
-	if cachedClient := c.clients[role][region].cloudwatch; cachedClient != nil {
-		return cloudwatch_client.NewLimitedConcurrencyClient(cachedClient, concurrency.NewLimiter())
-	}
+	baseClient := c.clients[role][region].cloudwatch
+	if baseClient == nil {
+		baseClient = cloudwatch_v2.NewClient(c.logger, c.createCloudwatchClient(c.clients[role][region].awsConfig))
+		c.clients[role][region].cloudwatch = baseClient
+	}
 
-	// Create new client and cache it
-	client := cloudwatch_v2.NewClient(c.logger, c.createCloudwatchClient(c.clients[role][region].awsConfig))
-
 	// Apply global rate limiter if provided
+	client := baseClient
 	if globalRateLimiter != nil {
 		client = cloudwatch_client.NewRateLimitedClient(client, globalRateLimiter, region, accountID, role.RoleArn)
 	}
 
-	c.clients[role][region].cloudwatch = client
 	return cloudwatch_client.NewLimitedConcurrencyClient(client, concurrency.NewLimiter())
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/clients/v2/factory.go` around lines 170 - 190, GetCloudwatchClient
currently returns or caches a wrapped client which bakes a specific
account-aware rate limiter into c.clients[role][region].cloudwatch; change it so
the cache stores only the raw cloudwatch_v2 client
(c.clients[role][region].cloudwatch should be the unwrapped client), and always
apply the account-aware rate-limit wrapper
(cloudwatch_client.NewRateLimitedClient) and the concurrency wrapper
(cloudwatch_client.NewLimitedConcurrencyClient) per call after retrieving the
cached raw client. Concretely, in GetCloudwatchClient: when reading
c.clients[role][region].cloudwatch treat it as the raw client and wrap it with
NewRateLimitedClient (if globalRateLimiter != nil) and then
NewLimitedConcurrencyClient before returning, and when creating and storing a
new client use cloudwatch_v2.NewClient(...) to cache the raw client (do not
cache a rate-limited client).
pkg/clients/v1/factory.go (1)

290-309: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t cache the account-aware wrapper here.

Line 298 returns the cached client before the new wrapper is applied, so accountID and globalRateLimiter only matter on a cold miss. That means Refresh() hot paths skip rate limiting entirely, and cold misses lock the first wrapper state into the cache. Cache the raw CloudWatch client, then apply NewRateLimitedClient(...) per call before the concurrency wrapper.

Suggested fix
 func (c *CachingFactory) GetCloudwatchClient(region string, accountID string, role model.Role, concurrency cloudwatch_client.ConcurrencyConfig, globalRateLimiter *cloudwatch_client.GlobalRateLimiter) cloudwatch_client.Client {
 	if !c.refreshed.Load() {
 
 		// if we have not refreshed then we need to lock in case we are accessing concurrently
 		c.mu.Lock()
 		defer c.mu.Unlock()
 	}
 
-	if cachedClient := c.clients[role][region].cloudwatch; cachedClient != nil {
-		return cloudwatch_client.NewLimitedConcurrencyClient(cachedClient, concurrency.NewLimiter())
-	}
-	client := createCloudWatchClient(c.logger, c.session, &region, role, c.fips)
+	baseClient := c.clients[role][region].cloudwatch
+	if baseClient == nil {
+		baseClient = createCloudWatchClient(c.logger, c.session, &region, role, c.fips)
+		c.clients[role][region].cloudwatch = baseClient
+	}
 
 	// Apply global rate limiter if provided
+	client := baseClient
 	if globalRateLimiter != nil {
 		client = cloudwatch_client.NewRateLimitedClient(client, globalRateLimiter, region, accountID, role.RoleArn)
 	}
 
-	c.clients[role][region].cloudwatch = client
 	return cloudwatch_client.NewLimitedConcurrencyClient(client, concurrency.NewLimiter())
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/clients/v1/factory.go` around lines 290 - 309, GetCloudwatchClient is
caching the already-wrapped client so globalRateLimiter/accountID are only
applied on cold misses; instead store the raw CloudWatch client in
c.clients[role][region].cloudwatch (the value set by createCloudWatchClient) and
on every call, wrap that cached raw client with
cloudwatch_client.NewRateLimitedClient(...) when globalRateLimiter != nil, then
always return cloudwatch_client.NewLimitedConcurrencyClient(...) around the
per-call (possibly rate-limited) client; adjust the cachedClient check to detect
only the raw cached client and apply NewRateLimitedClient and
NewLimitedConcurrencyClient per invocation, keeping existing locking
(refreshed.Load, c.mu.Lock) and use createCloudWatchClient to populate the cache
when nil.
🧹 Nitpick comments (1)
pkg/promutil/prometheus.go (1)

88-95: Document the metric-schema break.

Adding account_id changes the series identity for both existing metric names. Dashboards, alerts, and recording rules on these counters will need a migration note or compatibility plan.
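
For concreteness, a sketch of the extended vector (Help text and label order here are assumptions; the real definitions are in pkg/promutil/prometheus.go):

package promutil

import "github.com/prometheus/client_golang/prometheus"

// account_id is the newly added label. Any new label changes series
// identity, so queries grouped on the old label set must be updated.
var CloudwatchRateLimitWaitCounter = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "yace_cloudwatch_rate_limit_waits_total",
		Help: "Count of CloudWatch API calls that waited on the rate limiter.",
	},
	[]string{"operation", "account_id", "region", "role"},
)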

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/promutil/prometheus.go` around lines 88 - 95, Adding the new "account_id"
label to CloudwatchRateLimitWaitCounter and CloudwatchRateLimitAllowedCounter
changes series identity and will break dashboards/alerts; update the metric
schema documentation and provide a compatibility plan: either rename the new
counters (e.g., add a "_v2" suffix) and register both old and new metrics, or
keep the old metric shape by exposing a legacy label value (e.g., account_id="")
while introducing the new metric name; update README/CHANGELOG with the
migration note and list of affected metric names
(yace_cloudwatch_rate_limit_waits_total,
yace_cloudwatch_rate_limit_allowed_total), and adjust any recording
rules/dashboards/alerting docs to point to the new metric or explain how to
transition.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/clients/cloudwatch/rate_limiter_test.go`:
- Around line 284-291: The test reuses a single shortCtx (context.WithTimeout
50ms) for two independent ListMetrics calls which can make the second assertion
fail due to elapsed time; update the test to create a fresh context.WithTimeout
(and corresponding cancel) for each call to
sameRegionDifferentAccountClient.ListMetrics and
sameAccountDifferentRegionClient.ListMetrics so each assertion has its own
independent deadline and is properly cancelled after the call.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d3c4ccec-754d-442a-8f7c-83d3e41cbfb4

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf1705 and d5f11ab.

📒 Files selected for processing (9)
  • pkg/clients/cloudwatch/rate_limiter.go
  • pkg/clients/cloudwatch/rate_limiter_test.go
  • pkg/clients/factory.go
  • pkg/clients/v1/factory.go
  • pkg/clients/v1/factory_test.go
  • pkg/clients/v2/factory.go
  • pkg/clients/v2/factory_test.go
  • pkg/job/scrape.go
  • pkg/promutil/prometheus.go

omerkarj and others added 2 commits May 6, 2026 09:23
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/clients/cloudwatch/rate_limiter_test.go (1)

301-324: ⚡ Quick win

Also pin the same-account/same-region/different-role case.

This test proves bucket isolation by account and region, but the new wrapper now also carries role. One more assertion here would lock in that role affects metrics only, not limiter bucket selection.

Possible extension
 	firstBucketClient := NewRateLimitedClient(&mockClient{}, limiter, "us-east-1", "111111111111", "test-role")
 	sameRegionDifferentAccountClient := NewRateLimitedClient(&mockClient{}, limiter, "us-east-1", "222222222222", "test-role")
 	sameAccountDifferentRegionClient := NewRateLimitedClient(&mockClient{}, limiter, "us-west-2", "111111111111", "test-role")
+	sameBucketDifferentRoleClient := NewRateLimitedClient(&mockClient{}, limiter, "us-east-1", "111111111111", "other-role")
@@
 	err = sameAccountDifferentRegionClient.ListMetrics(shortCtx2, "test", nil, false, nil)
 	require.NoError(t, err)
+
+	sharedCtx, sharedCancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
+	defer sharedCancel()
+	err = sameBucketDifferentRoleClient.ListMetrics(sharedCtx, "test", nil, false, nil)
+	require.Error(t, err)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/clients/cloudwatch/rate_limiter_test.go` around lines 301 - 324, Add a
same-account/same-region/different-role case to
TestRateLimitingUsesIndependentAccountRegionBuckets: create a client via
NewRateLimitedClient using the same region ("us-east-1") and account
("111111111111") as firstBucketClient but with a different role string, call
ListMetrics with the shortCtx used for the other checks, and assert it is
rate-limited (expect a non-nil error) so the test verifies that role does not
change bucket selection — only account+region do.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ebd74e5-d831-48fa-b55a-b03ff78e43a6

📥 Commits

Reviewing files that changed from the base of the PR and between d5f11ab and 28b3a57.

📒 Files selected for processing (2)
  • pkg/clients/cloudwatch/rate_limiter.go
  • pkg/clients/cloudwatch/rate_limiter_test.go

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/clients/cloudwatch/rate_limiter.go`:
- Around line 151-160: The cache currently stores a role/region-scoped wrapped
client while NewRateLimitedClient(now capturing accountID) requires per-account
labeling; update the factories that populate c.clients[role][region] so they
either include accountID in the cache key (e.g.,
c.clients[role][region][accountID]) or stop caching the rate-limited wrapper and
instead cache the raw SDK client and call NewRateLimitedClient(rawClient,
globalLimiter, region, accountID, role) per request to build the
SimpleRateLimitedClient with the correct accountID and limiter bucket; change
the code in the factory methods that create the client entries to implement one
of these approaches and ensure the cached value is now the raw SDK client if you
choose the latter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3e20c28d-15b4-4cde-8eed-e418f5f12ebf

📥 Commits

Reviewing files that changed from the base of the PR and between 28b3a57 and e3210ce.

📒 Files selected for processing (2)
  • pkg/clients/cloudwatch/rate_limiter.go
  • pkg/clients/cloudwatch/rate_limiter_test.go


coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/clients/cloudwatch/rate_limiter.go (1)

134-152: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't cache an account-bound wrapper under a role/region key.

Lines 134-152 now bind accountID into SimpleRateLimitedClient, but pkg/clients/v1/factory.go Lines 290-310 and pkg/clients/v2/factory.go Lines 170-191 still cache that wrapper as c.clients[role][region].cloudwatch. The first account to populate a slot will be reused for later accounts in the same role/region, so those requests hit the wrong limiter bucket and export the wrong account_id. Cache by account too, or cache the raw SDK client and rebuild the rate-limited wrapper per request.
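
A hypothetical sketch of the first option, keying the cache by account as well (shape and names are illustrative, not the factories' actual fields):

package clients

// Client stands in for the CloudWatch client interface.
type Client interface{}

// cloudwatchCache keys cached clients role -> region -> accountID, so an
// account-bound wrapper is never reused for a different account.
type cloudwatchCache map[string]map[string]map[string]Client

func (c cloudwatchCache) get(role, region, accountID string) (Client, bool) {
	byRegion, ok := c[role]
	if !ok {
		return nil, false
	}
	byAccount, ok := byRegion[region]
	if !ok {
		return nil, false
	}
	client, ok := byAccount[accountID]
	return client, ok
}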

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/clients/cloudwatch/rate_limiter.go` around lines 134 - 152, The wrapper
SimpleRateLimitedClient created by NewRateLimitedClient currently embeds
accountID but the factories cache it under c.clients[role][region].cloudwatch,
causing the first account to be reused for other accounts; fix by changing the
caching strategy so the rate-limited wrapper is keyed by account as well (e.g.
include accountID in the cache key) or alternatively stop caching the wrapper
and instead cache the raw SDK client and call NewRateLimitedClient per request
to rebuild a fresh SimpleRateLimitedClient with the correct accountID; update
the factory cache accesses that set c.clients[role][region].cloudwatch (and any
equivalent v1/v2 factory code paths) to use the account-aware approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5825532d-2cc0-4d2d-983d-f11fc2d0c9e9

📥 Commits

Reviewing files that changed from the base of the PR and between e3210ce and 9c31851.

📒 Files selected for processing (2)
  • pkg/clients/cloudwatch/rate_limiter.go
  • pkg/clients/cloudwatch/rate_limiter_test.go

Co-authored-by: Cursor <cursoragent@cursor.com>
omerkarj (Author) commented May 6, 2026

Metrics schema note: leaving the account_id label on the existing YACE internal rate-limit counters. These counters are internal exporter telemetry, and groundcover-private currently defaults bucketed behavior off, so I do not think adding parallel compatibility metrics or a _v2 rename is worth the extra surface area for this PR.
