---
title: "Inside Adobe's OpenTelemetry pipeline: simplicity at scale"
linkTitle: "Inside Adobe's OpenTelemetry pipeline: simplicity at scale"
date: 2026-04-08
author: >-
  [Johanna Öjeling](https://github.com/johannaojeling) (Grafana Labs), [Juliano
  Costa](https://github.com/julianocosta89) (Datadog), [Tristan
  Sloughter](https://github.com/tsloughter) (community), [Damien
  Mathieu](https://github.com/dmathieu) (Elastic), [Bogdan
  Stancu](https://github.com/bogdan-st) (Adobe)
sig: Developer Experience SIG
cSpell:ignore: devex Sloughter Öjeling
---

As part of an ongoing series, the Developer Experience SIG interviews
organizations about their real-world OpenTelemetry Collector deployments to
share practical lessons with the broader community. This post features Adobe, a
global software company whose observability team has built an
OpenTelemetry-based telemetry pipeline designed for simplicity at massive scale,
with thousands of collectors running per signal type across the company's
infrastructure.
## Organizational structure

Adobe's central observability team is responsible for providing observability
infrastructure across the company. However, as
[Bogdan Stancu](https://github.com/bogdan-st), Senior Software Engineer,
explained, Adobe's history of acquisitions means the landscape is not fully
consolidated. Some large product groups have their own dedicated observability
teams, while the central team serves as the primary provider.

The OpenTelemetry-based pipeline was introduced as a new option alongside
existing monitoring solutions, designed primarily for new applications and
deployments. Adoption is voluntary, not mandated. Existing applications with
established monitoring have not been migrated.
## OpenTelemetry adoption

The decision to adopt OpenTelemetry was driven by alignment between the
project's capabilities and the team's goals. The observability team needed a
solution that could serve Adobe's diverse technology landscape, support multiple
backends, and remain simple for service teams to adopt.

> "It matched everything that we wanted," Bogdan said.

The [OpenTelemetry Operator](/docs/platforms/kubernetes/operator/), the
Collector's component model, and community Helm charts provided the building
blocks for a platform-level observability offering that could scale without
requiring deep OpenTelemetry expertise from individual service teams.
## Architecture: a three-tier collector pipeline

Adobe's collector architecture follows a three-tier design: a user-facing Helm
chart containing two collectors, a centralized managed namespace with per-signal
collector deployments, and the observability backends.

![Adobe architecture diagram](adobe-architecture.png)
### Tier 1: the user Helm chart

The observability team provides a Helm chart that service teams deploy into
their own namespaces. This chart creates two collectors:

**Sidecar Collector (in the application pod)**: Runs alongside the application
container and is intentionally locked down. Service teams cannot modify its
configuration. It collects all telemetry (metrics, logs, and traces), regardless
of what the team has chosen to export downstream. The configuration is immutable
to prevent application restarts caused by configuration changes.

**Deployment Collector (standalone)**: Receives telemetry from the sidecar over
OTLP and handles routing and export. Unlike the sidecar, this collector _is_
configurable through Helm values. The observability team provides sensible
defaults, but service teams can customize exporters and add new destinations.
When configuration changes, only the deployment collector restarts; the
application pod and its sidecar remain untouched.
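The two-collector split can be sketched in the chart's values file. This is a
hypothetical sketch, not Adobe's actual schema; the key names, endpoint, and
`X-Backend` header are illustrative:

```yaml
# values.yaml (hypothetical schema; names are illustrative)
sidecarCollector: {} # locked down: no user-facing configuration knobs

deploymentCollector:
  # Service teams may override exporters and add destinations here.
  exporters:
    otlphttp:
      endpoint: https://otel-gateway.managed-namespace.svc.cluster.local:4318
      headers:
        # Header read downstream to select the backend (see Tier 2)
        X-Backend: backend-a
```

Because only the deployment collector's values are user-editable, a
`helm upgrade` rolls the standalone collector while the application pods, and
the sidecars inside them, keep running.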
### Tier 2: the managed namespace

The deployment collectors forward telemetry to a centralized namespace managed
entirely by the observability team. A key architectural decision here is
signal-level isolation: the managed namespace runs a separate collector
deployment for each telemetry type, one each for metrics, logs, and traces.

If a backend becomes rate-limited or starts rejecting data for one signal type,
the others continue flowing uninterrupted. Despite handling thousands of
collectors' worth of upstream traffic, these managed deployments have generally
operated at default replica counts without requiring aggressive auto-scaling.

Service teams configure their desired backend through Helm values, which sets an
HTTP header on OTLP exports. The managed namespace collectors use this header
with the
[routing connector](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/6aff35ab5351482a4664f29a7d5428cedcf61a92/connector/routingconnector?from_branch=main)
to direct telemetry to the correct exporter.
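A minimal sketch of header-based routing in a managed namespace collector,
assuming the upstream sets an illustrative `X-Backend` header (backend names,
endpoints, and pipeline names are made up):

```yaml
receivers:
  otlp:
    protocols:
      http:
        include_metadata: true # keep HTTP headers available to the router

exporters:
  otlphttp/backend-a:
    endpoint: https://backend-a.example.com:4318
  otlphttp/backend-b:
    endpoint: https://backend-b.example.com:4318

connectors:
  routing:
    default_pipelines: [metrics/backend-a]
    table:
      # Match on request metadata (the HTTP header set via Helm values)
      - context: request
        condition: request["X-Backend"] == "backend-b"
        pipelines: [metrics/backend-b]

service:
  pipelines:
    metrics/in:
      receivers: [otlp]
      exporters: [routing]
    metrics/backend-a:
      receivers: [routing]
      exporters: [otlphttp/backend-a]
    metrics/backend-b:
      receivers: [routing]
      exporters: [otlphttp/backend-b]
```

Note that `include_metadata: true` on the OTLP receiver is what makes request
headers visible to the routing connector's `request` context.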
### Tier 3: the observability backends

The managed namespace collectors export telemetry to backend destinations
managed by the observability team. Multiple backends are supported, and teams
select their destination through the Helm chart's values file.
## Auto-instrumentation: two lines and it works

Adobe leverages the OpenTelemetry Operator for auto-instrumentation across the
languages supported by OpenTelemetry. The Operator is deployed to every cluster,
and service teams enable instrumentation by adding two annotations to their
Kubernetes deployment manifests:

```yaml
instrumentation.opentelemetry.io/inject-java: 'true'
sidecar.opentelemetry.io/inject: 'true'
```

> "People add two lines in their deployment. And it just works," Bogdan said.

Teams select their language in the Helm values, and the Operator handles the
rest. While teams are free to add manual SDK instrumentation (the sidecar
accepts all OTLP data), the observability team's supported path focuses on the
auto-instrumentation experience. The Operator has handled the scale of managing
sidecars and auto-instrumentation across the deployment fleet without issues.

This design philosophy runs through the entire platform: make the default path
require as little effort as possible, while leaving the door open for advanced
use cases.
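In a deployment manifest, those annotations belong on the pod template, where
the Operator's admission webhook picks them up at pod creation. A sketch with
illustrative names and image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-service # illustrative
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-java-service
  template:
    metadata:
      labels:
        app: my-java-service
      annotations:
        # Inject the Java auto-instrumentation agent
        instrumentation.opentelemetry.io/inject-java: 'true'
        # Inject the sidecar collector into the pod
        sidecar.opentelemetry.io/inject: 'true'
    spec:
      containers:
        - name: app
          image: example/my-java-service:1.0 # illustrative
```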
## Custom distribution and components

Adobe builds its own OpenTelemetry Collector distribution to include only the
components they use, avoiding unnecessary dependencies from Contrib. This custom
distribution is the default in the Helm chart provided to service teams.
However, teams can manually switch to the Contrib distribution if they need
components not included in the custom build.

Adobe also maintains custom components, most notably an extension addressing a
fundamental challenge in their chained collector architecture.
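One common way to build such a trimmed-down distribution is the OpenTelemetry
Collector Builder (`ocb`), which compiles a binary from a manifest listing only
the required components. A sketch of a builder manifest; the component set and
versions are illustrative, not Adobe's actual selection:

```yaml
# builder-config.yaml: input to `ocb --config builder-config.yaml`
dist:
  name: otelcol-custom
  description: Custom Collector with only the components in use
  output_path: ./otelcol-custom

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.116.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.116.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.116.0
connectors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/routingconnector v0.116.0
```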
### The chain collector problem

When collectors are chained, error visibility becomes a problem. The OTLP
transaction between the user's deployment collector and the managed namespace
collector completes with a 200 response _before_ the managed namespace collector
attempts to export to the backend. If the backend rejects the data, the error is
only visible in the managed namespace collector's logs.

> "The user would just see 200s. Metrics exported, all good," Bogdan explained.
> "Which we didn't want."

To address this, Bogdan built a custom extension that acts as a circuit breaker
for backend authentication. The extension runs in the managed namespace
collector's receiver, proactively sending mock authentication requests to the
backend and caching the results. If authentication fails, it returns a 401 to
the upstream collector before the OTLP transaction completes, propagating the
error back to where users can see it.

Building this extension was one of Bogdan's first Go projects. The experience of
trying to contribute it upstream sparked deeper involvement with the
OpenTelemetry community. Looking ahead, Bogdan would welcome a more general
back-pressure mechanism in the Collector, where exporter failures propagate
upstream through chained collectors.
## Deployment and lifecycle management

The observability team upgrades their collector distribution and the
OpenTelemetry Operator on a quarterly cadence. Upgrade issues have been rare.

When the Helm chart is updated, service teams pick up the new collector version
on their next deployment. However, the observability team has encountered a
compatibility challenge between the Operator and older collector versions: when
the Operator is upgraded, it can modify the `OpenTelemetryCollector` custom
resource to align with new configuration expectations. If a service team is
running a significantly older collector version, these changes can be
incompatible, preventing collectors from starting.

The resolution is straightforward (upgrading the collector fixes the issue), but
it has caused confusion for teams whose collectors suddenly break without any
changes on their end.
### Navigating component deprecations

Adobe's deployment has also navigated component deprecations as OpenTelemetry
evolves. The team originally used the routing processor to direct telemetry to
different backends based on HTTP headers, but migrated to the routing connector
when the processor was deprecated.

While the migration required work, the team views this as an expected part of
working with a rapidly evolving project.

> "This is a risk we knew about, the whole OpenTelemetry landscape is changing
> constantly and the benefits outweigh the 'issues' if you can call fast
> development an issue," Bogdan explained.
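The shape of that migration, in an illustrative sketch (header, exporter, and
pipeline names are made up): the deprecated routing processor mapped a context
value straight to exporters, while the routing connector routes between named
pipelines instead.

```yaml
# Before: routing processor (deprecated)
processors:
  routing:
    attribute_source: context
    from_attribute: X-Backend
    table:
      - value: backend-a
        exporters: [otlphttp/backend-a]

# After: routing connector
connectors:
  routing:
    table:
      - context: request
        condition: request["X-Backend"] == "backend-a"
        pipelines: [metrics/backend-a]
```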
## What works well

The overall experience has been positive. The Collector's component model, the
auto-instrumentation experience via the Operator, and the Helm chart-based
deployment model have all worked reliably. The plug-and-play nature of the
platform, where teams go from zero to full observability with minimal
configuration, has been well received by adopting teams.
## Advice for others

Based on Adobe's experience building a platform-level observability pipeline:

- **Treat OpenTelemetry as a platform to build on**: Don't expect it to solve
  all your problems out of the box. It's designed to be extended and customized
  for your specific needs.
- **Don't be afraid to build custom components**: The Collector's architecture
  makes it straightforward to build extensions tailored to your needs.
- **Design for user simplicity**: Make the default path require minimal effort.
  The teams consuming your platform are not observability experts.
- **Plan for error visibility in chained collectors**: OTLP transaction success
  does not guarantee end-to-end delivery. Consider how errors will surface to
  users.
## What's next

Adobe's story illustrates how a central observability team can offer a scalable,
self-service OpenTelemetry pipeline across a large and diverse organization. By
combining the Operator, Helm charts, sidecars, and per-signal collector
deployments, they've created a platform where service teams get observability
with minimal effort, while the observability team retains control over the
centralized infrastructure.

We'll continue sharing stories like this one, highlighting how different
organizations tackle the challenges of running OpenTelemetry in production.

Have your own OpenTelemetry story to share? Join us in the CNCF
[#otel-devex](https://cloud-native.slack.com/archives/C01S42U83B2) Slack
channel. We'd love to hear how you're using OpenTelemetry and how we can keep
improving the developer experience together.