
[OpenTelemetry] Optimize trace sampling #7057

Merged: martincostello merged 5 commits into open-telemetry:main from martincostello:optimize-trace-sampling on Apr 18, 2026

@martincostello (Member) commented Apr 10, 2026

Changes

While looking at some profiles for an OTel-instrumented application of mine, I noticed that TracerProviderSdk.ComputeActivitySamplingResult() came up in the top 10 OTel-related samples.

This PR adopts a Copilot-authored suggestion to avoid allocating an array in SamplingResult when there are no attributes, so that TracerProviderSdk can skip enumeration.
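A minimal sketch of the idea, not the SDK source: `SketchSamplingResult` and `SketchTagger` are hypothetical stand-ins, and only the `Attributes` and `AttributesOrNull` names come from this PR. The attributes are stored as nullable, the public property still returns a non-null sequence, and the consumer checks the internal accessor so that it never asks for an enumerator when nothing was supplied.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage: a result without attributes never enters the loop.
var tags = new List<KeyValuePair<string, object>>();
var none = new SketchSamplingResult(null);
var some = new SketchSamplingResult(new[] { new KeyValuePair<string, object>("sampler", "demo") });

Console.WriteLine(SketchTagger.Apply(none, tags)); // prints 0
Console.WriteLine(SketchTagger.Apply(some, tags)); // prints 1

// Hypothetical stand-in for SamplingResult (assumption, not the SDK type).
public readonly struct SketchSamplingResult
{
    private readonly IEnumerable<KeyValuePair<string, object>>? attributes;

    public SketchSamplingResult(IEnumerable<KeyValuePair<string, object>>? attributes)
        => this.attributes = attributes;

    // Public shape stays non-null, so existing callers can always enumerate it.
    public IEnumerable<KeyValuePair<string, object>> Attributes
        => this.attributes ?? Array.Empty<KeyValuePair<string, object>>();

    // Internal accessor: null means "no attributes were supplied".
    internal IEnumerable<KeyValuePair<string, object>>? AttributesOrNull => this.attributes;
}

// Hypothetical stand-in for the tag-application step in the provider.
public static class SketchTagger
{
    // Copies the attributes into target and returns how many were copied.
    public static int Apply(SketchSamplingResult result, List<KeyValuePair<string, object>> target)
    {
        var copied = 0;

        // The property pattern doubles as a null check and a local declaration.
        if (result.AttributesOrNull is { } attributes)
        {
            foreach (var attribute in attributes)
            {
                target.Add(attribute);
                copied++;
            }
        }

        return copied;
    }
}
```

The public `Attributes` property keeps its non-null contract, so only the internal caller needs to know about the null case.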

Benchmark results

Copilot Summary

This PR is faster in every case, with allocation improvements in one case and unchanged allocations in the others:

| Method | main Mean | pr-7057 Mean | Time delta | main Alloc | pr-7057 Alloc | Alloc delta |
|---|---|---|---|---|---|---|
| NoAttributes | 122.1 ns | 110.2 ns | -9.8% | 328 B | 328 B | 0.0% |
| WithAttributeArray | 181.9 ns | 166.5 ns | -8.5% | 672 B | 640 B | -4.8% |
| WithAttributeList | 277.0 ns | 275.5 ns | -0.5% | 752 B | 752 B | 0.0% |
| Drop | 104.7 ns | 101.0 ns | -3.5% | 328 B | 328 B | 0.0% |
| ParentBasedSampled | 114.3 ns | 109.5 ns | -4.2% | 328 B | 328 B | 0.0% |

This PR improves latency by about 0.5% to 9.8% across all cases. Allocation is mostly unchanged, with a small improvement in WithAttributeArray (672 B -> 640 B).

main + the new benchmarks

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
  [Host]     : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| NoAttributes | 122.1 ns | 1.09 ns | 1.02 ns | 1.00 | 0.01 | 0.0260 | 328 B | 1.00 |
| WithAttributeArray | 181.9 ns | 1.75 ns | 1.63 ns | 1.49 | 0.02 | 0.0534 | 672 B | 2.05 |
| WithAttributeList | 277.0 ns | 2.85 ns | 2.53 ns | 2.27 | 0.03 | 0.0596 | 752 B | 2.29 |
| Drop | 104.7 ns | 0.74 ns | 0.69 ns | 0.86 | 0.01 | 0.0261 | 328 B | 1.00 |
| ParentBasedSampled | 114.3 ns | 0.56 ns | 0.47 ns | 0.94 | 0.01 | 0.0261 | 328 B | 1.00 |

This PR

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
  [Host]     : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| NoAttributes | 110.2 ns | 0.65 ns | 0.61 ns | 1.00 | 0.01 | 0.0261 | 328 B | 1.00 |
| WithAttributeArray | 166.5 ns | 1.11 ns | 0.99 ns | 1.51 | 0.01 | 0.0508 | 640 B | 1.95 |
| WithAttributeList | 275.5 ns | 1.69 ns | 1.50 ns | 2.50 | 0.02 | 0.0596 | 752 B | 2.29 |
| Drop | 101.0 ns | 0.75 ns | 0.70 ns | 0.92 | 0.01 | 0.0261 | 328 B | 1.00 |
| ParentBasedSampled | 109.5 ns | 2.18 ns | 2.04 ns | 0.99 | 0.02 | 0.0261 | 328 B | 1.00 |

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@github-actions github-actions bot added pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package perf Performance related labels Apr 10, 2026
@codecov codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.80%. Comparing base (a2b6372) to head (b800efc).
⚠️ Report is 35 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7057      +/-   ##
==========================================
+ Coverage   88.54%   88.80%   +0.25%     
==========================================
  Files         270      270              
  Lines       12884    13110     +226     
==========================================
+ Hits        11408    11642     +234     
+ Misses       1476     1468       -8     
| Flag | Coverage Δ |
|---|---|
| unittests-Project-Experimental | 88.52% <100.00%> (+0.07%) ⬆️ |
| unittests-Project-Stable | 88.60% <100.00%> (+0.12%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| src/OpenTelemetry/Trace/Sampler/SamplingResult.cs | 100.00% <100.00%> (ø) |
| src/OpenTelemetry/Trace/TracerProviderSdk.cs | 99.35% <100.00%> (+<0.01%) ⬆️ |

... and 13 files with indirect coverage changes

@martincostello martincostello marked this pull request as ready for review April 10, 2026 14:29
@martincostello martincostello requested a review from a team as a code owner April 10, 2026 14:29
Copilot AI review requested due to automatic review settings April 10, 2026 14:29
Copilot AI (Contributor) left a comment

Pull request overview

This PR optimizes trace sampling in the OpenTelemetry .NET SDK by avoiding unnecessary attribute enumeration/allocations on the hot path when applying sampler-provided tags.

Changes:

  • Update TracerProviderSdk sampling/tag application to skip attribute iteration when none are provided, and add an array fast-path for tag application.
  • Adjust SamplingResult to store sampler attributes as nullable internally (enabling the skip in TracerProviderSdk) while preserving the public Attributes API.
  • Add BenchmarkDotNet benchmarks to measure sampling result/tagging scenarios (no attributes, array-backed attributes, enumerable-backed attributes, drop, and parent-based).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| test/Benchmarks/Trace/SamplingResultBenchmarks.cs | Adds benchmarks covering common sampling/tagging scenarios to validate perf impact. |
| src/OpenTelemetry/Trace/TracerProviderSdk.cs | Skips attribute processing when none are provided; adds array fast-path for applying sampling tags. |
| src/OpenTelemetry/Trace/Sampler/SamplingResult.cs | Stores attributes internally as nullable and exposes an internal accessor used to avoid iteration on the no-attributes path. |

Comment thread src/OpenTelemetry/Trace/Sampler/SamplingResult.cs
Comment thread src/OpenTelemetry/Trace/Sampler/SamplingResult.cs Outdated
@martincostello martincostello marked this pull request as draft April 13, 2026 13:07
@martincostello

This comment was marked as resolved.

Optimize `SamplingResult` for cases when there are no attributes to avoid allocating an enumerator and add benchmarks.
The benchmark doesn't call the method directly, so it can stay private.
Remove inaccurate comment.
@martincostello martincostello force-pushed the optimize-trace-sampling branch from 0ab86a9 to de0e117 Compare April 13, 2026 14:26
@martincostello martincostello marked this pull request as ready for review April 13, 2026 15:25
Comment thread src/OpenTelemetry/Trace/TracerProviderSdk.cs
  if (activitySamplingResult > ActivitySamplingResult.PropagationData)
  {
-     foreach (var att in samplingResult.Attributes)
+     if (samplingResult.AttributesOrNull is { } attributes)
Member

❤️ nice elimination of the alloc here!

Member

  • Would we get the same benefit by comparing Attributes to the empty array (Array.Empty<KeyValuePair<string, object>>()) first, before attempting to iterate? That would be simpler in terms of code complexity, even though it may not be future-proof.

Member Author

Not sure, but the compiler can use special private types for collection expressions, so checking for "not null" is probably better than assuming it's the exact object returned by Array.Empty<T>().
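To illustrate the concern, here is a small sketch (the `EmptyChecks` helper is hypothetical, not part of the SDK) showing that a reference comparison against `Array.Empty<T>()` only recognises that one shared instance. Any other empty collection a sampler might return would fall through to the enumeration path.

```csharp
using System;
using System.Collections.Generic;

// Three "empty" attribute collections a sampler could plausibly hand back.
KeyValuePair<string, object>[] shared = Array.Empty<KeyValuePair<string, object>>();
IEnumerable<KeyValuePair<string, object>>[] empties =
{
    shared,                                   // the cached singleton
    new KeyValuePair<string, object>[0],      // a freshly allocated empty array
    new List<KeyValuePair<string, object>>(), // an empty list
};

foreach (var candidate in empties)
{
    // Only the first candidate is the shared instance.
    Console.WriteLine($"{candidate.GetType().Name}: {EmptyChecks.IsSharedEmptyArray(candidate)}");
}

// Hypothetical helper for illustration only.
public static class EmptyChecks
{
    // True only when the argument is the exact object returned by Array.Empty<T>().
    public static bool IsSharedEmptyArray(IEnumerable<KeyValuePair<string, object>> attributes)
        => ReferenceEquals(attributes, Array.Empty<KeyValuePair<string, object>>());
}
```

A null check on an internally nullable field does not depend on which empty instance the caller happened to pass.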

@@ -0,0 +1,134 @@
// Copyright The OpenTelemetry Authors
Member

nit: could we put in the same SamplerBenchmarks?

Member Author

Sorry, I don't quite follow this comment.

Do you mean to just move the new benchmarks into SamplerBenchmarks?

Comment thread src/OpenTelemetry/Trace/TracerProviderSdk.cs Outdated
@cijothomas (Member) left a comment

The change itself is fine. I left a couple of suggestions to reduce complexity. This is a crucial code path, so perf gains are always worth fighting for, but I also think we should keep complexity minimal, if feasible.

No blockers. I suggest getting an additional maintainer review, given this is in such a crucial path.

Thanks for looking for performance improvements! Very excited for this (and other PRs towards perf)

Comment thread src/OpenTelemetry/Trace/TracerProviderSdk.cs Outdated
@Kielek (Member) left a comment

I fully agree with @cijothomas's comments.

Remove special casing for arrays.
@martincostello (Member Author)

With the latest changes:

Copilot Summary

Overall: This PR is faster on 7/8 common benchmarks. The only regression is SamplerNotModifyingTraceState at 1.03x / +2.9% slower. Allocations are unchanged everywhere at 1.00x / 0%.

PR/main ratios are shown below; for duration, < 1.00x is faster.

| Suite | Benchmark | Duration (main -> PR) | Duration ratio / change | Alloc (main -> PR) | Alloc ratio / change |
|---|---|---|---|---|---|
| SamplerBenchmarks | SamplerNotModifyingTraceState | 117.6 ns -> 121.0 ns | 1.03x / +2.9% | 328 B -> 328 B | 1.00x / 0.0% |
| SamplerBenchmarks | SamplerModifyingTraceState | 125.1 ns -> 120.3 ns | 0.96x / -3.8% | 328 B -> 328 B | 1.00x / 0.0% |
| SamplerBenchmarks | SamplerAppendingTraceState | 153.7 ns -> 129.6 ns | 0.84x / -15.7% | 384 B -> 384 B | 1.00x / 0.0% |
| SamplingResultBenchmarks | NoAttributes | 121.4 ns -> 116.5 ns | 0.96x / -4.0% | 328 B -> 328 B | 1.00x / 0.0% |
| SamplingResultBenchmarks | WithAttributeArray | 177.7 ns -> 175.3 ns | 0.99x / -1.4% | 672 B -> 672 B | 1.00x / 0.0% |
| SamplingResultBenchmarks | WithAttributeList | 298.1 ns -> 282.9 ns | 0.95x / -5.1% | 752 B -> 752 B | 1.00x / 0.0% |
| SamplingResultBenchmarks | Drop | 104.3 ns -> 101.2 ns | 0.97x / -3.0% | 328 B -> 328 B | 1.00x / 0.0% |
| SamplingResultBenchmarks | ParentBasedSampled | 141.6 ns -> 111.4 ns | 0.79x / -21.3% | 328 B -> 328 B | 1.00x / 0.0% |

Suite-level summary

  • SamplerBenchmarks: geometric mean duration ratio 0.94x (about 5.9% faster overall), allocations 1.00x / 0%.
  • SamplingResultBenchmarks: geometric mean duration ratio 0.93x (about 7.3% faster overall), allocations 1.00x / 0%.
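The suite-level figures can be reproduced from the per-benchmark means. The sketch below (the `RatioMath` helper is hypothetical) takes the (main, PR) durations from the table in this comment and computes the geometric mean of the PR/main ratios as the exponential of the mean log-ratio. It should come out at roughly 0.94 and 0.93, matching the summary.

```csharp
using System;
using System.Linq;

// (main, PR) mean durations in nanoseconds, copied from the benchmark results in this comment.
(double Main, double Pr)[] sampler =
{
    (117.6, 121.0), (125.1, 120.3), (153.7, 129.6),
};
(double Main, double Pr)[] samplingResult =
{
    (121.4, 116.5), (177.7, 175.3), (298.1, 282.9), (104.3, 101.2), (141.6, 111.4),
};

Console.WriteLine($"SamplerBenchmarks: {RatioMath.GeometricMeanRatio(sampler):F3}");
Console.WriteLine($"SamplingResultBenchmarks: {RatioMath.GeometricMeanRatio(samplingResult):F3}");

// Hypothetical helper for illustration only.
public static class RatioMath
{
    // Geometric mean of PR/main ratios; values below 1.0 mean the PR is faster overall.
    public static double GeometricMeanRatio((double Main, double Pr)[] pairs)
        => Math.Exp(pairs.Average(p => Math.Log(p.Pr / p.Main)));
}
```

A geometric mean is the usual choice for averaging ratios, because a 2x speed-up and a 2x slow-down cancel out to 1.00x rather than to 1.25x.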

main + the new benchmarks

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
  [Host]     : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| SamplerNotModifyingTraceState | 117.6 ns | 0.60 ns | 0.53 ns | 0.0261 | 328 B |
| SamplerModifyingTraceState | 125.1 ns | 0.36 ns | 0.32 ns | 0.0260 | 328 B |
| SamplerAppendingTraceState | 153.7 ns | 1.04 ns | 0.92 ns | 0.0305 | 384 B |

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
  [Host]     : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| NoAttributes | 121.4 ns | 1.01 ns | 0.90 ns | 121.7 ns | 1.00 | 0.01 | 0.0260 | 328 B | 1.00 |
| WithAttributeArray | 177.7 ns | 3.16 ns | 5.86 ns | 174.8 ns | 1.46 | 0.05 | 0.0534 | 672 B | 2.05 |
| WithAttributeList | 298.1 ns | 1.81 ns | 1.69 ns | 298.8 ns | 2.46 | 0.02 | 0.0596 | 752 B | 2.29 |
| Drop | 104.3 ns | 0.41 ns | 0.39 ns | 104.2 ns | 0.86 | 0.01 | 0.0261 | 328 B | 1.00 |
| ParentBasedSampled | 141.6 ns | 0.94 ns | 0.88 ns | 141.5 ns | 1.17 | 0.01 | 0.0260 | 328 B | 1.00 |

This PR

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
  [Host]     : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| SamplerNotModifyingTraceState | 121.0 ns | 1.20 ns | 1.06 ns | 0.0260 | 328 B |
| SamplerModifyingTraceState | 120.3 ns | 0.89 ns | 0.79 ns | 0.0260 | 328 B |
| SamplerAppendingTraceState | 129.6 ns | 0.47 ns | 0.44 ns | 0.0305 | 384 B |

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
  [Host]     : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| NoAttributes | 116.5 ns | 1.92 ns | 1.80 ns | 1.00 | 0.02 | 0.0261 | 328 B | 1.00 |
| WithAttributeArray | 175.3 ns | 1.60 ns | 1.50 ns | 1.51 | 0.03 | 0.0534 | 672 B | 2.05 |
| WithAttributeList | 282.9 ns | 2.91 ns | 2.72 ns | 2.43 | 0.04 | 0.0596 | 752 B | 2.29 |
| Drop | 101.2 ns | 0.87 ns | 0.77 ns | 0.87 | 0.01 | 0.0261 | 328 B | 1.00 |
| ParentBasedSampled | 111.4 ns | 0.97 ns | 0.90 ns | 0.96 | 0.02 | 0.0261 | 328 B | 1.00 |

@martincostello (Member Author)

Without special-casing the concrete types (that is, when iterating through IEnumerable<T> instead), the allocation gains disappear.
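For context, this is roughly what a type-specific fast path looks like. It is a sketch under assumptions, not the code that was removed from this PR, and `TagCopier` is a hypothetical name. When the source is known to be an array, an indexed loop needs no enumerator object. Going through `IEnumerable<T>` calls `GetEnumerator()` on the interface, which typically returns a heap-allocated enumerator. That is consistent with the 672 B -> 640 B result seen for WithAttributeArray in the first benchmark run, though the runtime may optimise some cases.

```csharp
using System;
using System.Collections.Generic;

var tags = new List<KeyValuePair<string, object>>();
var fromArray = new[]
{
    new KeyValuePair<string, object>("a", 1),
    new KeyValuePair<string, object>("b", 2),
};

TagCopier.Copy(fromArray, tags);                                         // takes the array path
TagCopier.Copy(new List<KeyValuePair<string, object>>(fromArray), tags); // takes the interface path
Console.WriteLine(tags.Count); // prints 4

// Hypothetical helper for illustration only.
public static class TagCopier
{
    public static void Copy(
        IEnumerable<KeyValuePair<string, object>> source,
        List<KeyValuePair<string, object>> target)
    {
        if (source is KeyValuePair<string, object>[] array)
        {
            // Indexed loop over the concrete array: no enumerator object is requested.
            for (var i = 0; i < array.Length; i++)
            {
                target.Add(array[i]);
            }

            return;
        }

        // General path: enumeration goes through the IEnumerable<T> interface.
        foreach (var item in source)
        {
            target.Add(item);
        }
    }
}
```

Both paths produce the same tags. The trade-off raised in review is the extra branch and duplicated loop body on a hot path versus the per-call enumerator allocation.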

@martincostello martincostello added this pull request to the merge queue Apr 18, 2026
Merged via the queue into open-telemetry:main with commit 810fba4 Apr 18, 2026
63 checks passed
@martincostello martincostello deleted the optimize-trace-sampling branch April 18, 2026 07:29