[OpenTelemetry] Metrics - Raise min CircularBufferBuckets capacity to 2#7078

Merged
rajkumar-rangaraj merged 2 commits into open-telemetry:main from ysolomchenko:TryIncrement-scale-reduction-can-hang-for-capacity=1-buckets
Apr 15, 2026

Conversation

@ysolomchenko
Contributor

@ysolomchenko ysolomchenko commented Apr 14, 2026

Fixes # N/A
Design discussion issue # N/A

Found by Codex/security scans.

Changes

Base2ExponentialBucketHistogram bucket initialization now rejects invalid maxBuckets values smaller than 2, preventing a possible hang during metrics recording.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@github-actions github-actions bot added the pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package label Apr 14, 2026
@ysolomchenko ysolomchenko marked this pull request as ready for review April 14, 2026 13:23
@ysolomchenko ysolomchenko requested a review from a team as a code owner April 14, 2026 13:23
@ysolomchenko ysolomchenko changed the title from "[OpenTelemetry] Raise min CircularBufferBuckets capacity to 2" to "[OpenTelemetry] Metrics - Raise min CircularBufferBuckets capacity to 2" Apr 14, 2026
@codecov

codecov bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.74%. Comparing base (53d590c) to head (af3c762).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7078      +/-   ##
==========================================
+ Coverage   88.72%   88.74%   +0.02%     
==========================================
  Files         270      270              
  Lines       12928    12928              
==========================================
+ Hits        11470    11473       +3     
+ Misses       1458     1455       -3     

| Flag | Coverage Δ |
|---|---|
| unittests-Project-Experimental | 88.67% <100.00%> (+<0.01%) ⬆️ |
| unittests-Project-Stable | 88.70% <100.00%> (+0.06%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| src/OpenTelemetry/Metrics/CircularBufferBuckets.cs | 100.00% <100.00%> (ø) |

... and 2 files with indirect coverage changes

@reyang
Member

reyang commented Apr 14, 2026

> preventing a possible hang during metrics recording

@ysolomchenko details?

Member

@reyang reyang left a comment

It is unclear what was the problem, and how the change proposed in this PR will solve the problem.

@ysolomchenko
Contributor Author

ysolomchenko commented Apr 15, 2026

> It is unclear what was the problem, and how the change proposed in this PR will solve the problem.

The problem is that CalculateScaleReduction can enter an infinite loop for a valid input: capacity = 1.

If the range spans negative to non-negative values (e.g., -1 and 0), the loop converges to begin = -1 and end = 0. Further right shifts don’t change these values, so diff stays 1 and the loop condition remains true forever.

The PR fixes this by ensuring that this non-converging case cannot occur (e.g., by disallowing capacity = 1), thus preventing the infinite loop in TryIncrement.

For end users, nothing changes because existing usage already enforces capacity >= 2.

This PR simply adds an extra safeguard to handle the edge case (capacity = 1) and prevent a potential infinite loop, even though it is not reachable in current usage.
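
The fixed point described in this comment can be reproduced in isolation. The sketch below is a hypothetical reconstruction of the loop shape from the description (`begin`, `end`, `diff`, repeated right shifts); it is not the library's `CalculateScaleReduction`, and the method name, the `maxIterations` cap, and the `-1` sentinel are assumptions added so that the non-converging case returns instead of hanging.

```csharp
using System;

public static class ScaleReductionSketch
{
    // Hypothetical reconstruction, not the library's code.
    // Returns how many times begin/end must be halved (arithmetic right shift)
    // until (end - begin) < capacity, or -1 if that has not happened after
    // maxIterations halvings (the stand-in here for "loops forever").
    public static int TryCalculateScaleReduction(int begin, int end, int capacity, int maxIterations = 64)
    {
        int reduction = 0;

        // long avoids overflow when computing the difference of two ints.
        while ((long)end - begin >= capacity)
        {
            if (reduction == maxIterations)
            {
                return -1; // no progress possible: begin/end reached a fixed point
            }

            begin >>= 1; // arithmetic shift keeps the sign: -1 >> 1 == -1
            end >>= 1;   // 0 >> 1 == 0
            reduction++;
        }

        return reduction;
    }
}
```

Under these assumptions, `TryCalculateScaleReduction(-1, 0, capacity: 1)` returns `-1`, because `begin` stays at `-1`, `end` stays at `0`, and the difference of `1` never drops below the capacity. The same range with `capacity: 2` returns `0` immediately, and a wider range that straddles zero, such as `(-8, 7)` with `capacity: 2`, returns `3` once it has collapsed to `(-1, 0)`.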

@reyang
Member

reyang commented Apr 15, 2026

> > It is unclear what was the problem, and how the change proposed in this PR will solve the problem.
>
> The problem is that CalculateScaleReduction can enter an infinite loop for a valid input: capacity = 1.
>
> If the range spans negative to non-negative values (e.g., -1 and 0), the loop converges to begin = -1 and end = 0. Further right shifts don’t change these values, so diff stays 1 and the loop condition remains true forever.
>
> The PR fixes this by ensuring that this non-converging case cannot occur (e.g., by disallowing capacity = 1), thus preventing the infinite loop in TryIncrement.
>
> For end users, nothing changes because existing usage already enforces capacity >= 2.
>
> This PR simply adds an extra safeguard to handle the edge case (capacity = 1) and prevent a potential infinite loop, even though it is not reachable in current usage.

Thanks @ysolomchenko! I have two suggestions:

  1. In the PR description or issue, explain the exact problem, following the spirit of https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/.github/ISSUE_TEMPLATE/bug_report.yml and be specific about - how to reproduce the issue, what's the expected result and what's the actual result.
  2. Add a test case which would fail before the change and would pass after the change, so we avoid the regression in the future.

@martincostello
Member

> In the PR description or issue, explain the exact problem, following the spirit of https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/.github/ISSUE_TEMPLATE/bug_report.yml and be specific about - how to reproduce the issue, what's the expected result and what's the actual result.

I think there's some intentional vagueness being applied to pull requests being raised at the moment ("Found by Codex/security scans.").

@reyang
Member

reyang commented Apr 15, 2026

> > In the PR description or issue, explain the exact problem, following the spirit of https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/.github/ISSUE_TEMPLATE/bug_report.yml and be specific about - how to reproduce the issue, what's the expected result and what's the actual result.
>
> I think there's some intentional vagueness being applied to pull requests being raised at the moment ("Found by Codex/security scans.").

I disagree, wearing my SIG security maintainer hat.
If this is indeed a security issue, we should use a security advisory and follow https://github.com/open-telemetry/sig-security/blob/main/security-response.md, and depending on how sensitive it is, the fix might need to happen in a private fork.

@martincostello
Member

martincostello commented Apr 15, 2026

I can't speak to that specifically - the first I heard about this particular PR was when it was opened.

@reyang
Member

reyang commented Apr 15, 2026

> I can't speak to that specifically - the first I heard about this particular PR was when it was opened.

No worries, I'm here to help.

Here is my recommendation:

  1. If this seems to be a real security issue, we (maintainers + folks who reported the issue + folks who have the best context to provide a fix) should understand the exact problem, determine the severity (e.g. CVSS score) and decide how to fix it (publicly, privately) and communicate it (publish CVE directly, catch a monthly release train without declaring CVE, work with major customers to apply a private fix before making it public).
  2. If this is not a real security issue, but is an improvement for general goodness, we should be very explicit about it in PR/issue/changelog.

@Kielek
Member

Kielek commented Apr 15, 2026

@reyang, it is not a security issue. It was detected during the Codex/security scans.
All production constructor usages are guarded with a minimum of 2 (except maybe some reflection stuff, but such scenarios are not supported).

Guard.ThrowIfOutOfRange(maxBuckets, min: 2);
this.Scale = scale;
this.PositiveBuckets = new CircularBufferBuckets(maxBuckets);
this.NegativeBuckets = new CircularBufferBuckets(maxBuckets);
}

IMO the tests after the changes are good enough, as they cover -1, 0, and 1 with exception expectations. They should protect us from any unintentional changes in the future.

@ysolomchenko, could you please extend the description with this information? With this, we could proceed and merge the changes IMO.

@reyang
Member

reyang commented Apr 15, 2026

> @reyang, it is not a security issue. It was detected during the Codex/security scans. All production constructor usages are guarded with a minimum of 2 (except maybe some reflection stuff, but such scenarios are not supported).
>
> Guard.ThrowIfOutOfRange(maxBuckets, min: 2);
> this.Scale = scale;
> this.PositiveBuckets = new CircularBufferBuckets(maxBuckets);
> this.NegativeBuckets = new CircularBufferBuckets(maxBuckets);
> }
>
> IMO the tests after the changes are good enough, as they cover -1, 0, and 1 with exception expectations. They should protect us from any unintentional changes in the future.
>
> @ysolomchenko, could you please extend the description with this information? With this, we could proceed and merge the changes IMO.

Got it, thank you very much!

Member

@reyang reyang left a comment

LGTM.

@rajkumar-rangaraj rajkumar-rangaraj added this pull request to the merge queue Apr 15, 2026
Merged via the queue into open-telemetry:main with commit ea20342 Apr 15, 2026
63 checks passed
@github-actions
Contributor

Thank you for your contribution @ysolomchenko! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey.

tarekgh added a commit to tarekgh/runtime that referenced this pull request Apr 15, 2026
Defense-in-depth: CalculateScaleReduction can infinite-loop when
capacity is 1 and the index range spans negative-to-non-negative
values (e.g. begin=-1, end=0), because right-shifting converges to
a fixed point where diff remains >= capacity forever.

This is not reachable in current usage because the only call site
(Base2ExponentialHistogramAggregator) already enforces maxBuckets >= 2,
and there is a Debug.Assert(capacity >= 2) in CalculateScaleReduction.
This change hardens CircularBufferBuckets itself so future callers
cannot accidentally trigger the infinite loop.

Inspired by open-telemetry/opentelemetry-dotnet#7078.

Co-authored-by: Copilot <[email protected]>
tarekgh added a commit to dotnet/runtime that referenced this pull request Apr 16, 2026
## Summary

Defense-in-depth change to `CircularBufferBuckets`: raise the minimum
accepted `capacity` from 1 to 2.

## Motivation

`CalculateScaleReduction` (inside `TryIncrement`) can infinite-loop when
`capacity == 1` and the index range spans negative-to-non-negative
values (e.g. `begin = -1`, `end = 0`). In that scenario, right-shifting
converges to a fixed point where `diff` remains `>= capacity` forever.

**This is not reachable in current usage**: the only caller
(`Base2ExponentialHistogramAggregator`) already enforces `maxBuckets >=
2` via `MinBuckets`, and there is an existing `Debug.Assert(capacity >=
2)` inside `CalculateScaleReduction`. This change simply hardens
`CircularBufferBuckets` itself so future callers cannot accidentally
trigger the issue.

Inspired by open-telemetry/opentelemetry-dotnet#7078.

## Changes

- **`CircularBufferBuckets.cs`**: Changed constructor guard from
`capacity < 1` to `capacity < 2`, updated error message.
- **`Base2ExponentialHistogramAggregatorTests.cs`**: Added
`CircularBufferBucketsRejectsInvalidCapacity` (Theory with 0, 1, -1) and
`CircularBufferBucketsAcceptsCapacityOfTwo` tests.

Co-authored-by: Copilot <[email protected]>
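
The guard change described in this commit message can be sketched as follows. This is a minimal illustration, not the actual `CircularBufferBuckets` source: the class names `BucketBufferSketch` and `GuardCheck`, the `MinCapacity` constant, the exception type, and the message text are all assumptions made for the example. Only the `capacity < 2` condition comes from the text above.

```csharp
using System;

// Hypothetical stand-in for CircularBufferBuckets; only the constructor guard is modelled.
public sealed class BucketBufferSketch
{
    // Assumed name for the lower bound; the value 2 is the one stated in the commit message.
    public const int MinCapacity = 2;

    public BucketBufferSketch(int capacity)
    {
        // The guard was previously "capacity < 1"; raising it to "capacity < 2"
        // makes the non-converging capacity == 1 case unconstructible.
        if (capacity < MinCapacity)
        {
            throw new ArgumentOutOfRangeException(
                nameof(capacity),
                capacity,
                $"Capacity must be at least {MinCapacity}.");
        }

        this.Capacity = capacity;
    }

    public int Capacity { get; }
}

public static class GuardCheck
{
    // Returns true when constructing a buffer with the given capacity is rejected.
    public static bool IsRejected(int capacity)
    {
        try
        {
            _ = new BucketBufferSketch(capacity);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

With this sketch, `GuardCheck.IsRejected` returns `true` for `0`, `1`, and `-1`, and `false` for `2`, which mirrors the inputs named for the two added tests.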

5 participants