[OpenTelemetry] Optimize trace sampling#7057
[OpenTelemetry] Optimize trace sampling#7057martincostello merged 5 commits intoopen-telemetry:mainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7057 +/- ##
==========================================
+ Coverage 88.54% 88.80% +0.25%
==========================================
Files 270 270
Lines 12884 13110 +226
==========================================
+ Hits 11408 11642 +234
+ Misses 1476 1468 -8
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Pull request overview
This PR optimizes trace sampling in the OpenTelemetry .NET SDK by avoiding unnecessary attribute enumeration/allocations on the hot path when applying sampler-provided tags.
Changes:
- Update
TracerProviderSdksampling/tag application to skip attribute iteration when none are provided, and add an array fast-path for tag application. - Adjust
SamplingResultto store sampler attributes as nullable internally (enabling the skip inTracerProviderSdk) while preserving the publicAttributesAPI. - Add BenchmarkDotNet benchmarks to measure sampling result/tagging scenarios (no attributes, array-backed attributes, enumerable-backed attributes, drop, and parent-based).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/Benchmarks/Trace/SamplingResultBenchmarks.cs | Adds benchmarks covering common sampling/tagging scenarios to validate perf impact. |
| src/OpenTelemetry/Trace/TracerProviderSdk.cs | Skips attribute processing when none are provided; adds array fast-path for applying sampling tags. |
| src/OpenTelemetry/Trace/Sampler/SamplingResult.cs | Stores attributes internally as nullable and exposes an internal accessor used to avoid iteration on the no-attributes path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This comment was marked as resolved.
This comment was marked as resolved.
Optimize `SamplingResult` for cases when there are no attributes to avoid allocating an enumerator and add benchmarks.
Use normal dashes.
The benchmark doesn't call the method directly, so it can stay private.
Remove inaccurate comment.
0ab86a9 to
de0e117
Compare
| if (activitySamplingResult > ActivitySamplingResult.PropagationData) | ||
| { | ||
| foreach (var att in samplingResult.Attributes) | ||
| if (samplingResult.AttributesOrNull is { } attributes) |
There was a problem hiding this comment.
❤️ nice elimination of the alloc here!
There was a problem hiding this comment.
- would we get same benefit by comparing Attributes to empty array (
Array.Empty<KeyValuePair<string, object>>()))first before attempting to iterate? That'd be simpler from code complexity, even though it may not be future proof..
There was a problem hiding this comment.
Not sure - but the compiler can use special private types for collection expressions so "not null" is probably best than assuming it's the exact object returned by Array.Empty<T>().
| @@ -0,0 +1,134 @@ | |||
| // Copyright The OpenTelemetry Authors | |||
There was a problem hiding this comment.
nit: could we put in the same SamplerBenchmarks?
There was a problem hiding this comment.
Sorry, I don't quite follow this comment.
Do you mean to just move the new benchmarks into SamplerBenchmarks?
cijothomas
left a comment
There was a problem hiding this comment.
The change itself is fine. I left couple of suggestions to reduce complexity. This is a crucial code path, so perf gains are always worth fighting for, but I also think we should keep complexity minimal, if feasible.
No blockers. Suggest getting additional maintainer review given this is in such crucial path.
Thanks for looking for performance improvements! Very excited for this (and other PRs towards perf)
Kielek
left a comment
There was a problem hiding this comment.
I fully agree with @cijothomas comments.
Remove special casing for arrays.
|
With the latest changes: Copilot SummaryOverall: This PR is faster on 7/8 common benchmarks. The only regression is SamplerNotModifyingTraceState at 1.03x / +2.9% slower. Allocations are unchanged PR/main ratios are shown below; for duration, < 1.00x is faster.
Suite-level summary
Expand to see
|
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| SamplerNotModifyingTraceState | 117.6 ns | 0.60 ns | 0.53 ns | 0.0261 | 328 B |
| SamplerModifyingTraceState | 125.1 ns | 0.36 ns | 0.32 ns | 0.0260 | 328 B |
| SamplerAppendingTraceState | 153.7 ns | 1.04 ns | 0.92 ns | 0.0305 | 384 B |
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
[Host] : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| NoAttributes | 121.4 ns | 1.01 ns | 0.90 ns | 121.7 ns | 1.00 | 0.01 | 0.0260 | 328 B | 1.00 |
| WithAttributeArray | 177.7 ns | 3.16 ns | 5.86 ns | 174.8 ns | 1.46 | 0.05 | 0.0534 | 672 B | 2.05 |
| WithAttributeList | 298.1 ns | 1.81 ns | 1.69 ns | 298.8 ns | 2.46 | 0.02 | 0.0596 | 752 B | 2.29 |
| Drop | 104.3 ns | 0.41 ns | 0.39 ns | 104.2 ns | 0.86 | 0.01 | 0.0261 | 328 B | 1.00 |
| ParentBasedSampled | 141.6 ns | 0.94 ns | 0.88 ns | 141.5 ns | 1.17 | 0.01 | 0.0260 | 328 B | 1.00 |
This PR
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
[Host] : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| SamplerNotModifyingTraceState | 121.0 ns | 1.20 ns | 1.06 ns | 0.0260 | 328 B |
| SamplerModifyingTraceState | 120.3 ns | 0.89 ns | 0.79 ns | 0.0260 | 328 B |
| SamplerAppendingTraceState | 129.6 ns | 0.47 ns | 0.44 ns | 0.0305 | 384 B |
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8117/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.201
[Host] : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
DefaultJob : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| NoAttributes | 116.5 ns | 1.92 ns | 1.80 ns | 1.00 | 0.02 | 0.0261 | 328 B | 1.00 |
| WithAttributeArray | 175.3 ns | 1.60 ns | 1.50 ns | 1.51 | 0.03 | 0.0534 | 672 B | 2.05 |
| WithAttributeList | 282.9 ns | 2.91 ns | 2.72 ns | 2.43 | 0.04 | 0.0596 | 752 B | 2.29 |
| Drop | 101.2 ns | 0.87 ns | 0.77 ns | 0.87 | 0.01 | 0.0261 | 328 B | 1.00 |
| ParentBasedSampled | 111.4 ns | 0.97 ns | 0.90 ns | 0.96 | 0.02 | 0.0261 | 328 B | 1.00 |
|
Not special casing the types (rather than using the |
Changes
While looking at some profiles for an OTel instrumented application of mine, I noticed that
TraceProviderSdk.ComputeActivitySamplingResult()came up in the top 10 OTel-related samples.This PR adopts a Copilot-authored suggestion to avoid allocating an array in
SamplingResultwhen there's no attributes so thatTraceProviderSdkcan skip enumeration.Benchmark results
Copilot Summary
This PR is faster in every case, with allocation improvements in one case and unchanged allocations in the others:
mainMeanpr-7057MeanmainAllocpr-7057AllocThis PR improves latency by about 0.5% to 9.8% across all cases. Allocation is mostly unchanged, with a small improvement in
WithAttributeArray(672 B -> 640 B).Expand to see
main+ the new benchmarksThis PR
Merge requirement checklist
Unit tests added/updatedAppropriateCHANGELOG.mdfiles updated for non-trivial changesChanges in public API reviewed (if applicable)