Commit 08e603f

Merge remote-tracking branch 'origin/main' into otelbot/spec-integration-v1.55.0-dev
2 parents c5f9b16 + 7a15512 commit 08e603f

26 files changed, +1310 -32 lines changed

.cspell/en-words.txt

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -38,6 +38,7 @@ emailservice
3838
EMEA
3939
erlang
4040
errorf
41+
extensionless
4142
featureflagservice
4243
flagd
4344
frauddetectionservice
79.7 KB
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
---
2+
title: "Inside Adobe's OpenTelemetry pipeline: simplicity at scale"
3+
linkTitle: "Inside Adobe's OpenTelemetry pipeline: simplicity at scale"
4+
date: 2026-04-08
5+
author: >-
6+
[Johanna Öjeling](https://github.com/johannaojeling) (Grafana Labs), [Juliano
7+
Costa](https://github.com/julianocosta89) (Datadog), [Tristan
8+
Sloughter](https://github.com/tsloughter) (community), [Damien
9+
Mathieu](https://github.com/dmathieu) (Elastic), [Bogdan
10+
Stancu](https://github.com/bogdan-st) (Adobe)
11+
sig: Developer Experience SIG
12+
cSpell:ignore: devex Sloughter Öjeling
13+
---
14+
15+
As part of an ongoing series, the Developer Experience SIG interviews
16+
organizations about their real-world OpenTelemetry Collector deployments to
17+
share practical lessons with the broader community. This post features Adobe, a
18+
global software company whose observability team has built an
19+
OpenTelemetry-based telemetry pipeline designed for simplicity at massive scale,
20+
with thousands of collectors running per signal type across the company's
21+
infrastructure.
22+
23+
## Organizational structure
24+
25+
Adobe's central observability team is responsible for providing observability
26+
infrastructure across the company. However, as
27+
[Bogdan Stancu](https://github.com/bogdan-st), Senior Software Engineer,
28+
explained, Adobe's history of acquisitions means the landscape is not fully
29+
consolidated. Some large product groups have their own dedicated observability
30+
teams, while the central team serves as the primary provider.
31+
32+
The OpenTelemetry-based pipeline was introduced as a new option alongside
33+
existing monitoring solutions, designed primarily for new applications and
34+
deployments. Adoption is voluntary, not mandated. Existing applications with
35+
established monitoring have not been migrated.
36+
37+
## OpenTelemetry adoption
38+
39+
The decision to adopt OpenTelemetry was driven by alignment between the
40+
project's capabilities and the team's goals. The observability team needed a
41+
solution that could serve Adobe's diverse technology landscape, support multiple
42+
backends, and remain simple for service teams to adopt.
43+
44+
> "It matched everything that we wanted," Bogdan said.
45+
46+
The [OpenTelemetry Operator](/docs/platforms/kubernetes/operator/), the
47+
Collector's component model, and community Helm charts provided the building
48+
blocks for a platform-level observability offering that could scale without
49+
requiring deep OpenTelemetry expertise from individual service teams.
50+
51+
## Architecture: a three-tier collector pipeline
52+
53+
Adobe's collector architecture follows a three-tier design: a user-facing Helm
54+
chart containing two collectors, a centralized managed namespace with per-signal
55+
collector deployments, and the observability backends.
56+
57+
![Adobe architecture diagram](adobe-architecture.png)
58+
59+
### Tier 1: the user Helm chart
60+
61+
The observability team provides a Helm chart that service teams deploy into
62+
their own namespaces. This chart creates two collectors:
63+
64+
**Sidecar Collector (in the application pod)**: Runs alongside the application
65+
container and is intentionally locked down. Service teams cannot modify its
66+
configuration. It collects all telemetry (metrics, logs, and traces), regardless of
67+
what the team has chosen to export downstream. The configuration is immutable to
68+
prevent application restarts caused by configuration changes.
69+
70+
**Deployment Collector (standalone)**: Receives telemetry from the sidecar over
71+
OTLP and handles routing and export. Unlike the sidecar, this collector _is_
72+
configurable through Helm values. The observability team provides sensible
73+
defaults, but service teams can customize exporters and add new destinations.
74+
When configuration changes, only the deployment collector restarts. The
75+
application pod and its sidecar remain untouched.
76+
77+
### Tier 2: the managed namespace
78+
79+
The deployment collectors forward telemetry to a centralized namespace managed
80+
entirely by the observability team. A key architectural decision here is
81+
signal-level isolation: the managed namespace runs a separate collector
82+
deployment for each telemetry type (one for metrics, one for logs, and one for
83+
traces).
84+
85+
If a backend becomes rate-limited or starts rejecting data for one signal type,
86+
the others continue flowing uninterrupted. Despite handling thousands of
87+
collectors' worth of upstream traffic, these managed deployments have generally
88+
operated at default replica counts without requiring aggressive auto-scaling.
89+
90+
Service teams configure their desired backend through Helm values, which sets an
91+
HTTP header on OTLP exports. The managed namespace collectors use this header
92+
with the
93+
[routing connector](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/6aff35ab5351482a4664f29a7d5428cedcf61a92/connector/routingconnector?from_branch=main)
94+
to direct telemetry to the correct exporter.
95+
96+
### Tier 3: the observability backends
97+
98+
The managed namespace collectors export telemetry to backend destinations
99+
managed by the observability team. Multiple backends are supported, and teams
100+
select their destination through the Helm chart's values file.
101+
102+
## Auto-instrumentation: two lines and it works
103+
104+
Adobe leverages the OpenTelemetry Operator for auto-instrumentation across the
105+
languages supported by OpenTelemetry. The Operator is deployed to every cluster,
106+
and service teams enable instrumentation by adding two annotations to their
107+
Kubernetes deployment manifests:
108+
109+
```yaml
110+
instrumentation.opentelemetry.io/inject-java: 'true'
111+
sidecar.opentelemetry.io/inject: 'true'
112+
```
113+
114+
> "People add two lines in their deployment. And it just works," Bogdan said.
115+
116+
Teams select their language in the Helm values, and the Operator handles the
117+
rest. While teams are free to add manual SDK instrumentation—the sidecar accepts
118+
all OTLP data—the observability team's supported path focuses on the
119+
auto-instrumentation experience. The Operator has handled the scale of managing
120+
sidecars and auto-instrumentation across the deployment fleet without issues.
121+
122+
This design philosophy runs through the entire platform: make the default path
123+
require as little effort as possible, while leaving the door open for advanced
124+
use cases.
125+
126+
## Custom distribution and components
127+
128+
Adobe builds its own OpenTelemetry Collector distribution to include only the
129+
components they use, avoiding unnecessary dependencies from Contrib. This custom
130+
distribution is the default in the Helm chart provided to service teams.
131+
However, teams can manually switch to the Contrib distribution if they need
132+
components not included in the custom build.
133+
134+
Adobe also maintains custom components, most notably an extension addressing a
135+
fundamental challenge in their chained collector architecture.
136+
137+
### The chained collector problem
138+
139+
When collectors are chained, error visibility becomes a problem. The OTLP
140+
transaction between the user's deployment collector and the managed namespace
141+
collector completes with a 200 response _before_ the managed namespace collector
142+
attempts to export to the backend. If the backend rejects the data, the error is
143+
only visible in the managed namespace collector's logs.
144+
145+
> "The user would just see 200s. Metrics exported, all good," Bogdan explained.
146+
> "Which we didn't want."
147+
148+
To address this, Bogdan built a custom extension that acts as a circuit breaker
149+
for backend authentication. The extension runs in the managed namespace
150+
collector's receiver, proactively sending mock authentication requests to the
151+
backend and caching results. If authentication fails, it returns a 401 to the
152+
upstream collector before the OTLP transaction completes, propagating the error
153+
back to where users can see it.
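
The pre-check described above can be sketched roughly as follows. This is only an illustration of the idea (probe the backend, cache the result, reject early with a 401), not Adobe's extension, which the post says was written in Go. The names `AuthGate`, `AuthProbe`, `refresh`, and `statusFor`, the per-tenant key, and the TTL-based cache are all assumptions, and the probe is synchronous here for brevity where a real one would be asynchronous.

```typescript
// Hypothetical sketch: every name below is illustrative, not from Adobe's code.
type AuthProbe = (tenant: string) => boolean;

interface CachedAuth {
  ok: boolean;
  checkedAt: number;
}

class AuthGate {
  private readonly cache = new Map<string, CachedAuth>();
  private readonly probe: AuthProbe;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(probe: AuthProbe, ttlMs: number, now: () => number = () => Date.now()) {
    this.probe = probe;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Send a mock authentication request to the backend and cache the outcome.
  refresh(tenant: string): CachedAuth {
    const entry = { ok: this.probe(tenant), checkedAt: this.now() };
    this.cache.set(tenant, entry);
    return entry;
  }

  // Status to return to the upstream collector before accepting its data.
  statusFor(tenant: string): 200 | 401 {
    let entry = this.cache.get(tenant);
    if (!entry || this.now() - entry.checkedAt > this.ttlMs) {
      entry = this.refresh(tenant); // missing or stale: probe again
    }
    return entry.ok ? 200 : 401;
  }
}
```

Because the decision comes from a cached result, the receiver can reject a request without waiting on the backend for every export, which is how the failure becomes visible to the upstream collector.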
154+
155+
Building this extension was one of Bogdan's first Go projects. The experience of
156+
trying to contribute upstream sparked deeper involvement with the OpenTelemetry
157+
community. Looking ahead, Bogdan would welcome a more general back-pressure
158+
mechanism in the Collector, where exporter failures propagate upstream through
159+
chained collectors.
160+
161+
## Deployment and lifecycle management
162+
163+
The observability team upgrades their collector distribution and the
164+
OpenTelemetry Operator on a quarterly cadence. Upgrade issues have been rare.
165+
166+
When the Helm chart is updated, service teams pick up the new collector version
167+
on their next deployment. However, the observability team has encountered a
168+
compatibility challenge between the Operator and older collector versions: when
169+
the Operator is upgraded, it can modify the `OpenTelemetryCollector` custom
170+
resource to align with new configuration expectations. If a service team is
171+
running a significantly older collector version, these changes can be
172+
incompatible, preventing collectors from starting.
173+
174+
The resolution is straightforward—upgrading the collector fixes the issue—but it
175+
has caused confusion for teams whose collectors suddenly break without any
176+
changes on their end.
177+
178+
### Navigating component deprecations
179+
180+
Adobe's deployment has also navigated component deprecations as OpenTelemetry
181+
evolves. The team originally used the routing processor to direct telemetry to
182+
different backends based on HTTP headers, but migrated to the routing connector
183+
when the processor was deprecated.
184+
185+
While the migration required work, the team views this as an expected part of
186+
working with a rapidly evolving project.
187+
188+
> "This is a risk we knew about, the whole OpenTelemetry landscape is changing
189+
> constantly and the benefits outweigh the 'issues' if you can call fast
190+
> development an issue," Bogdan explained.
191+
192+
## What works well
193+
194+
The overall experience has been positive. The Collector's component model, the
195+
auto-instrumentation experience via the Operator, and the Helm chart-based
196+
deployment model have all worked reliably. The plug-and-play nature of the
197+
platform, where teams go from zero to full observability with minimal
198+
configuration, has been positively received by adopting teams.
199+
200+
## Advice for others
201+
202+
Based on Adobe's experience building a platform-level observability pipeline:
203+
204+
- **Treat OpenTelemetry as a platform to build on**: Don't expect it to solve
205+
all your problems out of the box. It's designed to be extended and customized
206+
for your specific needs.
207+
- **Don't be afraid to build custom components**: The Collector's architecture
208+
makes it straightforward to build extensions tailored to your needs.
209+
- **Design for user simplicity**: Make the default path require minimal effort.
210+
The teams consuming your platform are not observability experts.
211+
- **Plan for error visibility in chained collectors**: OTLP transaction success
212+
does not guarantee end-to-end delivery. Consider how errors will surface to
213+
users.
214+
215+
## What's next
216+
217+
Adobe's story illustrates how a central observability team can offer a scalable,
218+
self-service OpenTelemetry pipeline across a large and diverse organization. By
219+
combining the Operator, Helm charts, sidecars, and per-signal collector
220+
deployments, they've created a platform where service teams get observability
221+
with minimal effort, while the observability team retains control over
222+
centralized infrastructure.
223+
224+
We'll continue sharing stories like this one, highlighting how different
225+
organizations tackle the challenges of running OpenTelemetry in production.
226+
227+
Have your own OpenTelemetry story to share? Join us in the CNCF
228+
[#otel-devex](https://cloud-native.slack.com/archives/C01S42U83B2) Slack
229+
channel. We'd love to hear how you're using OpenTelemetry and how we can keep
230+
improving the developer experience together.

content/en/search.md

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
11
---
22
title: Search Results
33
layout: search
4+
outputs: [HTML]
45
---

content/en/site/_index.md

Lines changed: 12 additions & 0 deletions
@@ -23,17 +23,29 @@ Tentatively planned content organization:
2323

2424
- **About** — High-level information about the website project, including its
2525
purpose, ownership, and overall status.
26+
- **Needs, requirements, and features** — Stakeholder needs, requirements, and
27+
other relevant information broken down into features.
2628
- **Design** — Architectural design, Information Architecture (IA), layout, UX
2729
choices, theme related decisions, and other design-level artifacts.
2830
- **Implementation** — Code-level structure and conventions, Hugo/Docsy
2931
templates, SCSS/JS customizations, patches, and internal shims.
3032
- [**Build**](./build/) — Tooling, local development setup, CI/CD workflows,
3133
deployment environments, and automation details.
34+
- **Deployment** — Deployment-specific behavior for the OpenTelemetry website.
3235
- **Quality** — Link checking, accessibility standards, tests, review practices,
3336
and other quality-related processes.
3437
- **Roadmap** — Milestones, backlog, priorities, technical debt, and
3538
design/implementation decisions.
3639

40+
## Adding content
41+
42+
Keep pages short and high signal.
43+
44+
- Record decisions, rationale, constraints, and key rules.
45+
- Prefer concise summaries over long background sections.
46+
- Link to issues, plans, or code for detail instead of repeating them here.
47+
- Add only the content needed to explain how the site works and why.
48+
3749
## Site build information
3850

3951
{{% td/site-build-info/netlify "opentelemetry" %}}

content/en/site/build/npm-scripts.md

Lines changed: 13 additions & 11 deletions
@@ -85,17 +85,19 @@ are internal helpers and are not intended to be run directly.
8585

8686
## Test and CI
8787

88-
| Script | Description |
89-
| -------------------------- | ----------------------------------------------------------------- |
90-
| `test` | Run the most commonly needed tests. |
91-
| `test:base` | Base tests. |
92-
| `test:all` | Run all tests: base checks plus collector-sync tests and lint. |
93-
| `test:collector-sync` | Collector-sync tests. |
94-
| `test-and-fix` | Run fix scripts (excluding i18n/refcache/submodule), then checks. |
95-
| `diff:check` | Warn if working tree has uncommitted changes. |
96-
| `diff:fail` | Fail if working tree has changes (e.g. after build). |
97-
| `netlify-build:preview` | `build:preview` then `diff:check`. |
98-
| `netlify-build:production` | `build:production` then `diff:check`. |
88+
| Script | Description |
89+
| -------------------------- | ------------------------------------------------------------------- |
90+
| `test` | Run the most commonly needed tests. |
91+
| `test:base` | Base tests. |
92+
| `test:all` | Runs `test:base`, `test:collector-sync`, and `test:edge-functions`. |
93+
| `test:collector-sync` | Collector-sync tests. |
94+
| `test:edge-functions` | Node test runner over `netlify/edge-functions/**/*.test.ts`. |
95+
| `test:edge-functions:live` | Optional `node:test` live suite; supports `--help`. |
96+
| `test-and-fix` | Run fix scripts (excluding i18n/refcache/submodule), then checks. |
97+
| `diff:check` | Warn if working tree has uncommitted changes. |
98+
| `diff:fail` | Fail if working tree has changes (e.g. after build). |
99+
| `netlify-build:preview` | `build:preview` then `diff:check`. |
100+
| `netlify-build:production` | `build:production` then `diff:check`. |
99101

100102
## Utilities
101103

content/en/site/design/_index.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
---
2+
title: Design
3+
description: >-
4+
Architectural and lower-level design documentation for the OpenTelemetry
5+
website.
6+
weight: 30
7+
---
8+
9+
This section records design decisions for the OpenTelemetry website.
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
---
2+
title: Agent support
3+
description: >-
4+
Design notes for making OpenTelemetry website content easier for agents to
5+
consume.
6+
weight: 10
7+
---
8+
9+
Design notes for the broader [agent-friendly content delivery](/site/features/)
10+
feature.
11+
12+
## Markdown content negotiation
13+
14+
Use a Netlify Edge Function to serve Hugo's prebuilt `index.md` output when a
15+
request explicitly asks for or prefers `text/markdown`.
16+
17+
### Rationale
18+
19+
- Not every HTML page should have a Markdown equivalent.
20+
- HTTP negotiation belongs at the delivery layer.
21+
- The function can fall back to normal HTML when no Markdown artifact exists.
22+
23+
### Rules
24+
25+
- Only `GET` and `HEAD` are considered.
26+
- Requests for `.md` and other non-page resources bypass negotiation.
27+
- Page-like requests include:
28+
- slash paths
29+
- extensionless paths
30+
- `.../index.html` paths
31+
- Markdown is served when `text/markdown` is accepted with `q` greater than zero
32+
and its `q` is **greater than or equal to** the highest `q` for `text/html` /
33+
`application/xhtml+xml` (equal weights choose Markdown).
34+
- Wildcards such as `*/*` are ignored by design: only explicit markdown/html
35+
media types contribute q-values. This is a conservative choice that may be
36+
revisited later.
37+
- Missing Markdown falls back to the normal HTML response.
38+
- Negotiated responses set `Vary: Accept`.
39+
- `/search/` emits only HTML and therefore always falls back to HTML.
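
The q-value rules above can be sketched as a small function. This is a minimal illustration rather than the site's actual Edge Function: the names `prefersMarkdown` and `parseEntry` are hypothetical, and treating an absent HTML type as `q=0` is an assumption the rules do not spell out.

```typescript
// Hypothetical helper names; not taken from the real implementation.
const HTML_TYPES = new Set(["text/html", "application/xhtml+xml"]);

// Parse one Accept entry such as "text/html;q=0.8" into [mediaType, q].
function parseEntry(entry: string): [string, number] {
  const parts = entry.split(";").map((part) => part.trim());
  const type = (parts[0] ?? "").toLowerCase();
  let q = 1; // a missing q parameter means q=1
  for (const param of parts.slice(1)) {
    const [name = "", value = ""] = param.split("=").map((part) => part.trim());
    if (name.toLowerCase() === "q") {
      const parsed = Number.parseFloat(value);
      // Treat an unparsable q as 0 and clamp the rest to the range [0, 1].
      q = Number.isNaN(parsed) ? 0 : Math.min(Math.max(parsed, 0), 1);
    }
  }
  return [type, q];
}

// True when Markdown should be served for the given Accept header value.
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  let markdownQ = 0;
  let htmlQ = 0; // assumption: no explicit HTML type counts as q=0
  for (const entry of accept.split(",")) {
    const [type, q] = parseEntry(entry);
    if (type === "text/markdown") markdownQ = Math.max(markdownQ, q);
    else if (HTML_TYPES.has(type)) htmlQ = Math.max(htmlQ, q);
    // Wildcards such as */* contribute nothing, per the rule above.
  }
  return markdownQ > 0 && markdownQ >= htmlQ;
}
```

Under this sketch, `Accept: text/markdown;q=0.8, text/html;q=0.8` selects Markdown because equal weights choose Markdown, while `Accept: */*` alone does not.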
40+
41+
A note on path mapping:
42+
43+
- Pretty URLs like `/docs/` map to Hugo's `/docs/index.md` output.
44+
- `index.html` maps to the sibling `.md` file (for example, `/docs/index.html`
45+
maps to `/docs/index.md`).
46+
- Other `.html` paths are left to Netlify's normal redirects and routing: e.g.,
47+
Netlify redirects `/docs.html` to `/docs/`.
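
The path mapping above might look like the following sketch. The function name `markdownPathFor` is hypothetical, and mapping an extensionless path such as `/docs` to `/docs/index.md` is an assumption: the notes list extensionless paths as page-like but do not state their target.

```typescript
// Hypothetical helper; returns the Markdown artifact path to try, or null
// when the request should bypass negotiation.
function markdownPathFor(pathname: string): string | null {
  // Pretty URLs: /docs/ -> /docs/index.md
  if (pathname.endsWith("/")) return `${pathname}index.md`;

  // Explicit index pages: /docs/index.html -> /docs/index.md
  if (pathname.endsWith("/index.html")) {
    return `${pathname.slice(0, -"index.html".length)}index.md`;
  }

  // Extensionless paths: /docs -> /docs/index.md (assumption, see lead-in)
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  if (!lastSegment.includes(".")) return `${pathname}/index.md`;

  // .md files, other .html paths, and assets are left to normal routing.
  return null;
}
```

A caller would fetch the returned path and fall back to the normal HTML response when the Markdown file is missing.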
48+
49+
### Related implementation
50+
51+
- `config/_default/hugo.yaml` enables Markdown outputs for this site.
52+
- `content/en/search.md` opts the search page out with `outputs: [HTML]`.
53+
- `netlify.toml` wires the Edge Function ahead of other route handling.
54+
- `netlify/edge-functions/markdown-negotiation/index.ts` implements negotiation;
55+
`netlify/edge-functions/markdown-negotiation.ts` is the Netlify entry stub
56+
that re-exports it.

content/en/site/features.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
---
2+
title: Features
3+
description: >-
4+
Brief summaries of notable site features with links to their primary
5+
references.
6+
weight: 20
7+
cSpell:ignore: docsy
8+
---
9+
10+
## Agent-friendly content delivery
11+
12+
Make site content easier for agents to discover and consume. Current work adds
13+
Markdown output for content pages and HTTP negotiation for
14+
`Accept: text/markdown`.
15+
16+
- Status: in progress
17+
- Design: [Agent support](../design/agent-support/)
18+
- Implementation: the `netlify/edge-functions/markdown-negotiation.ts` entry
19+
stub, with a sibling folder for logic and tests.
20+
- References:
21+
[opentelemetry.io#9449](https://github.com/open-telemetry/opentelemetry.io/issues/9449),
22+
[docsy#2596](https://github.com/google/docsy/issues/2596)
