---
title: "Inside Adobe's OpenTelemetry pipeline: simplicity at scale"
linkTitle: "Inside Adobe's OpenTelemetry pipeline: simplicity at scale"
date: 2026-04-08
author: >-
  [Johanna Öjeling](https://github.com/johannaojeling) (Grafana Labs), [Juliano
  Costa](https://github.com/julianocosta89) (Datadog), [Tristan
  Sloughter](https://github.com/tsloughter) (community), [Damien
  Mathieu](https://github.com/dmathieu) (Elastic), [Bogdan
  Stancu](https://github.com/bogdan-st) (Adobe)
sig: Developer Experience SIG
cSpell:ignore: devex Sloughter Öjeling
---

As part of an ongoing series, the Developer Experience SIG interviews
organizations about their real-world OpenTelemetry Collector deployments to
share practical lessons with the broader community. This post features Adobe, a
global software company whose observability team has built an
OpenTelemetry-based telemetry pipeline designed for simplicity at massive scale,
with thousands of collectors running per signal type across the company's
infrastructure.

## Organizational structure

Adobe's central observability team is responsible for providing observability
infrastructure across the company. However, as
[Bogdan Stancu](https://github.com/bogdan-st), Senior Software Engineer,
explained, Adobe's history of acquisitions means the landscape is not fully
consolidated. Some large product groups have their own dedicated observability
teams, while the central team serves as the primary provider.

The OpenTelemetry-based pipeline was introduced as a new option alongside
existing monitoring solutions, designed primarily for new applications and
deployments. Adoption is voluntary, not mandated. Existing applications with
established monitoring have not been migrated.

## OpenTelemetry adoption

The decision to adopt OpenTelemetry was driven by alignment between the
project's capabilities and the team's goals. The observability team needed a
solution that could serve Adobe's diverse technology landscape, support multiple
backends, and remain simple for service teams to adopt.

> "It matched everything that we wanted," Bogdan said.

The [OpenTelemetry Operator](/docs/platforms/kubernetes/operator/), the
Collector's component model, and community Helm charts provided the building
blocks for a platform-level observability offering that could scale without
requiring deep OpenTelemetry expertise from individual service teams.

## Architecture: a three-tier collector pipeline

Adobe's collector architecture follows a three-tier design: a user-facing Helm
chart containing two collectors, a centralized managed namespace with per-signal
collector deployments, and the observability backends.

### Tier 1: the user Helm chart

The observability team provides a Helm chart that service teams deploy into
their own namespaces. This chart creates two collectors:

**Sidecar Collector (in the application pod)**: Runs alongside the application
container and is intentionally locked down. Service teams cannot modify its
configuration. It collects all telemetry (metrics, logs, and traces), regardless
of what the team has chosen to export downstream. The configuration is immutable
to prevent application restarts caused by configuration changes.

**Deployment Collector (standalone)**: Receives telemetry from the sidecar over
OTLP and handles routing and export. Unlike the sidecar, this collector _is_
configurable through Helm values. The observability team provides sensible
defaults, but service teams can customize exporters and add new destinations.
When configuration changes, only the deployment collector restarts; the
application pod and its sidecar remain untouched.

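As a rough illustration of this split, customizing the deployment collector might look like the following Helm values sketch. The key names (`deploymentCollector`), exporter names, and endpoints are hypothetical, not Adobe's actual chart schema; only the pattern (team-editable exporters on the deployment collector, untouched sidecar) reflects the article.

```yaml
# Hypothetical values.yaml for the deployment collector.
# Key names and endpoints are illustrative, not Adobe's chart schema.
deploymentCollector:
  config:
    exporters:
      # Default export to the managed namespace, provided by the
      # observability team.
      otlp/managed:
        endpoint: otel-gateway.observability.svc:4317
      # A team-added extra destination.
      otlphttp/debug:
        endpoint: http://localhost:4318
    service:
      pipelines:
        metrics:
          exporters: [otlp/managed, otlphttp/debug]
```

Because only these values change, a redeploy restarts the deployment collector alone, which is exactly the isolation property described above.
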
### Tier 2: the managed namespace

The deployment collectors forward telemetry to a centralized namespace managed
entirely by the observability team. A key architectural decision here is
signal-level isolation: the managed namespace runs a separate collector
deployment for each telemetry type, with one for metrics, one for logs, and one
for traces.

If a backend becomes rate-limited or starts rejecting data for one signal type,
the others continue flowing uninterrupted. Despite handling thousands of
collectors' worth of upstream traffic, these managed deployments have generally
operated at default replica counts without requiring aggressive auto-scaling.

Service teams configure their desired backend through Helm values, and that
choice is carried as an HTTP header on OTLP exports. The managed namespace
collectors use this header with the
[routing connector](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/routingconnector)
to direct telemetry to the correct exporter.

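Header-based routing of this kind can be sketched with the routing connector's `request` context, which matches on incoming request metadata. The sketch below is not Adobe's actual configuration: the `X-Backend` header name, pipeline names, and exporters are hypothetical, and the OTLP receiver needs `include_metadata: true` for request headers to be visible to the connector.

```yaml
# Sketch of header-based routing in a managed namespace collector.
# Header name, pipelines, and exporters are hypothetical.
receivers:
  otlp:
    protocols:
      http:
        # Required so incoming request headers reach the routing connector.
        include_metadata: true

connectors:
  routing:
    default_pipelines: [metrics/default]
    table:
      - context: request
        condition: request["X-Backend"] == "prometheus"
        pipelines: [metrics/prometheus]
      - context: request
        condition: request["X-Backend"] == "datadog"
        pipelines: [metrics/datadog]

service:
  pipelines:
    metrics/in:
      receivers: [otlp]
      exporters: [routing]
    metrics/prometheus: # exporter definitions omitted for brevity
      receivers: [routing]
      exporters: [prometheusremotewrite]
    metrics/datadog:
      receivers: [routing]
      exporters: [datadog]
    metrics/default:
      receivers: [routing]
      exporters: [otlphttp]
```
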
### Tier 3: the observability backends

The managed namespace collectors export telemetry to backend destinations
managed by the observability team. Multiple backends are supported, and teams
select their destination through the Helm chart's values file.

## Auto-instrumentation: two lines and it works

Adobe leverages the OpenTelemetry Operator for auto-instrumentation across the
languages supported by OpenTelemetry. The Operator is deployed to every cluster,
and service teams enable instrumentation by adding two annotations to their
Kubernetes deployment manifests:

```yaml
instrumentation.opentelemetry.io/inject-java: 'true'
sidecar.opentelemetry.io/inject: 'true'
```

> "People add two lines in their deployment. And it just works," Bogdan said.

Teams select their language in the Helm values, and the Operator handles the
rest. Teams are free to add manual SDK instrumentation, since the sidecar
accepts all OTLP data, but the observability team's supported path focuses on
the auto-instrumentation experience. The Operator has handled the scale of
managing sidecars and auto-instrumentation across the deployment fleet without
issues.

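In practice the two annotations go on the pod template metadata, where the Operator's admission webhook sees them at pod creation time. A minimal sketch (the service name and image are hypothetical):

```yaml
# Minimal Deployment sketch; the annotations sit on the pod template,
# not on the Deployment object itself. Name and image are examples.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
      annotations:
        instrumentation.opentelemetry.io/inject-java: 'true'
        sidecar.opentelemetry.io/inject: 'true'
    spec:
      containers:
        - name: app
          image: example.com/checkout-service:1.0
```

The annotation values can also name a specific `Instrumentation` or `OpenTelemetryCollector` resource instead of `'true'`, which is how a platform team can point workloads at its own defaults.
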
This design philosophy runs through the entire platform: make the default path
require as little effort as possible, while leaving the door open for advanced
use cases.

## Custom distribution and components

Adobe builds its own OpenTelemetry Collector distribution to include only the
components they use, avoiding unnecessary dependencies from Contrib. This custom
distribution is the default in the Helm chart provided to service teams.
However, teams can manually switch to the Contrib distribution if they need
components not included in the custom build.

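Custom distributions like this are typically assembled with the OpenTelemetry Collector Builder (`ocb`), which compiles a binary from an explicit component list. The manifest below is an illustrative sketch, not Adobe's actual build file; the distribution name, component selection, and module versions are examples.

```yaml
# Illustrative ocb manifest (builder-config.yaml). The name, components,
# and versions are examples, not Adobe's actual build configuration.
dist:
  name: custom-otelcol
  description: Minimal in-house Collector distribution
  output_path: ./dist

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.120.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.120.0
connectors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/routingconnector v0.120.0
```

Keeping this list short is what removes the unneeded Contrib dependencies: anything not in the manifest is simply not compiled into the binary.
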
Adobe also maintains custom components, most notably an extension addressing a
fundamental challenge in their chained collector architecture.

### The chain collector problem

When collectors are chained, error visibility becomes a problem. The OTLP
transaction between the user's deployment collector and the managed namespace
collector completes with a 200 response _before_ the managed namespace collector
attempts to export to the backend. If the backend rejects the data, the error is
only visible in the managed namespace collector's logs.

> "The user would just see 200s. Metrics exported, all good," Bogdan explained.
> "Which we didn't want."

To address this, Bogdan built a custom extension that acts as a circuit breaker
for backend authentication. The extension runs in the managed namespace
collector's receiver, proactively sending mock authentication requests to the
backend and caching results. If authentication fails, it returns a 401 to the
upstream collector before the OTLP transaction completes, propagating the error
back to where users can see it.

Building this extension was one of Bogdan's first Go projects. The experience of
trying to contribute upstream sparked deeper involvement with the OpenTelemetry
community. Looking ahead, Bogdan would welcome a more general back-pressure
mechanism in the Collector, where exporter failures propagate upstream through
chained collectors.

## Deployment and lifecycle management

The observability team upgrades their collector distribution and the
OpenTelemetry Operator on a quarterly cadence. Upgrade issues have been rare.

When the Helm chart is updated, service teams pick up the new collector version
on their next deployment. However, the observability team has encountered a
compatibility challenge between the Operator and older collector versions: when
the Operator is upgraded, it can modify the `OpenTelemetryCollector` custom
resource to align with new configuration expectations. If a service team is
running a significantly older collector version, these changes can be
incompatible, preventing collectors from starting.

The resolution is straightforward (upgrading the collector fixes the issue),
but it has caused confusion for teams whose collectors suddenly break without
any changes on their end.

### Navigating component deprecations

Adobe's deployment has also navigated component deprecations as OpenTelemetry
evolves. The team originally used the routing processor to direct telemetry to
different backends based on HTTP headers, but migrated to the routing connector
when the processor was deprecated.

While the migration required work, the team views this as an expected part of
working with a rapidly evolving project.

> "This is a risk we knew about, the whole OpenTelemetry landscape is changing
> constantly and the benefits outweigh the 'issues' if you can call fast
> development an issue," Bogdan explained.

## What works well

The overall experience has been positive. The Collector's component model, the
auto-instrumentation experience via the Operator, and the Helm chart-based
deployment model have all worked reliably. The plug-and-play nature of the
platform, where teams go from zero to full observability with minimal
configuration, has been well received by adopting teams.

## Advice for others

Based on Adobe's experience building a platform-level observability pipeline:

- **Treat OpenTelemetry as a platform to build on**: Don't expect it to solve
  all your problems out of the box. It's designed to be extended and customized
  for your specific needs.
- **Don't be afraid to build custom components**: The Collector's architecture
  makes it straightforward to build extensions tailored to your needs.
- **Design for user simplicity**: Make the default path require minimal effort.
  The teams consuming your platform are not observability experts.
- **Plan for error visibility in chained collectors**: OTLP transaction success
  does not guarantee end-to-end delivery. Consider how errors will surface to
  users.

## What's next

Adobe's story illustrates how a central observability team can offer a scalable,
self-service OpenTelemetry pipeline across a large and diverse organization. By
combining the Operator, Helm charts, sidecars, and per-signal collector
deployments, they've created a platform where service teams get observability
with minimal effort, while the observability team retains control over
centralized infrastructure.

We'll continue sharing stories like this one, highlighting how different
organizations tackle the challenges of running OpenTelemetry in production.

Have your own OpenTelemetry story to share? Join us in the CNCF
[#otel-devex](https://cloud-native.slack.com/archives/C01S42U83B2) Slack
channel. We'd love to hear how you're using OpenTelemetry and how we can keep
improving the developer experience together.