Adopt `TracingChannel` for observability

I'd like to propose adding first-class [`TracingChannel`](https://nodejs.org/api/diagnostics_channel.html#class-tracingchannel) support to the Anthropic Node.js SDK, following the pattern established by [`undici`](https://github.com/nodejs/undici) in Node.js core and adopted across the npm ecosystem.

`TracingChannel` is a higher-level API built on top of `diagnostics_channel`, specifically designed for tracing async operations. It provides structured lifecycle channels (`start`, `end`, `error`, `asyncStart`, `asyncEnd`) and handles async context propagation correctly. Correct context propagation is exactly what monkey-patching approaches struggle to provide, and its absence is what makes them fragile in real-world async applications.
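
To make the lifecycle concrete, here is a minimal sketch that records the order in which the sub-channels fire around `tracePromise`. The channel name `example:lifecycle` and the `recordLifecycle` helper are invented for this illustration and are not part of any SDK.

```ts
import { tracingChannel } from 'node:diagnostics_channel';

// Hypothetical channel name, used only for this demonstration.
const lifecycleChannel = tracingChannel('example:lifecycle');

// Runs `work` under tracePromise and returns the names of the events that fired, in order.
export async function recordLifecycle(work: () => Promise<unknown>): Promise<string[]> {
  const seen: string[] = [];
  const handlers = {
    start: () => { seen.push('start'); },
    end: () => { seen.push('end'); },
    asyncStart: () => { seen.push('asyncStart'); },
    asyncEnd: () => { seen.push('asyncEnd'); },
    error: () => { seen.push('error'); },
  };
  lifecycleChannel.subscribe(handlers);
  try {
    await lifecycleChannel.tracePromise(work, {});
  } catch {
    // The rejection was already observed through the `error` event; swallow it here.
  } finally {
    lifecycleChannel.unsubscribe(handlers);
  }
  return seen;
}
```

For a promise that resolves, the recorded order is `start`, `end`, `asyncStart`, `asyncEnd`. `end` only marks the end of the synchronous part of the call, which is why subscribers typically close their spans in `asyncEnd`. For a rejected promise, `error` fires before `asyncStart` and `asyncEnd`.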

Current APM instrumentations use IITM (import-in-the-middle) for ESM and RITM (require-in-the-middle) for CJS to monkey-patch SDK internals. This has several fragility concerns:

- **Runtime lock-in:** both RITM and IITM rely on Node.js-specific module loader internals (`Module._resolveFilename`, `module.register()`). They don't work on Bun or Deno, which implement the Node.js API surface but not the module loader internals. The Anthropic SDK explicitly supports Node.js, Deno, and Bun, making monkey-patching especially inadequate.
- **ESM fragility:** IITM is built on Node.js's module customization hooks, which are still evolving and have been a persistent source of breakage in the OTel JS ecosystem.
- **Initialization ordering:** both require instrumentation to be set up before the SDK is first `require()`'d / `import`'d. Get the order wrong and instrumentation silently does nothing, which is very hard to debug in production.
- **Bundling and externalization:** users have to ensure their instrumented modules are externalized, which is increasingly hard to guarantee as more frameworks bundle server-side code into single executables, binaries, or deployment files.

The current instrumentation landscape for the Anthropic SDK illustrates this problem well. There is no official `@opentelemetry/instrumentation-anthropic` for JavaScript. Instead, APM vendors have independently built their own solutions:

- **Sentry** uses module-loader hooks (IITM for `import`, RITM for `require`) to intercept loading of `@anthropic-ai/sdk`, replaces the `Anthropic` constructor, and creates a **deep recursive Proxy** around the client instance to intercept method calls (`client.messages.create`, `client.messages.stream`, `client.completions.create`, etc.). This spans **~800 lines across 6 files**: constructor wrapping, deep Proxy creation, method interception, streaming event accumulation, attribute extraction, error mapping, and type definitions.
- **Traceloop's OpenLLMetry** patches `Messages.prototype.create` and `Completions.prototype.create` directly.
- **Arize AI's OpenInference** patches `Messages.prototype.create` with its own span creation logic.

Every vendor independently replicates the same logic: intercept construction or patch prototypes, extract model/token attributes, handle streaming chunk accumulation, map error types. With native TracingChannel support, all of this reduces to one subscription per channel of interest.

If the Anthropic SDK emits structured events through `TracingChannel`, instrumentation libraries become **subscribers**, not **patches**. Each tool listens independently with no ordering concerns, no clobbering, and no internal API dependency.

---

## Proposed Tracing Channels

All channels use the Node.js [`TracingChannel`](https://nodejs.org/api/diagnostics_channel.html#class-tracingchannel) API, which provides `start`, `end`, `asyncStart`, `asyncEnd`, and `error` sub-channels automatically. The channel design aims to support the [OpenTelemetry Semantic Conventions for Generative AI](https://opentelemetry.io/docs/specs/semconv/gen-ai/) systems, enabling APM vendors to produce standard `gen_ai.*` spans and attributes from the emitted events.

| TracingChannel | Tracks | Context fields |
|---|---|---|
| `@anthropic-ai/sdk:messages.create` | Non-streaming message creation (`messages.create` without `stream: true`) | `model`, `params` |
| `@anthropic-ai/sdk:messages.stream` | Streaming message creation (`messages.stream()` or `messages.create` with `stream: true`), from request initiation to stream completion | `model`, `params` |
| `@anthropic-ai/sdk:completions.create` | Legacy completions endpoint | `model`, `params` |

### Why Separate Channels

Each API method gets its own `TracingChannel`. This follows the `diagnostics_channel` design philosophy: many purpose-focused channels with their own subscriber sets, so dispatch is extremely cheap. Subscribers listen only to the operations they care about rather than filtering a firehose channel, which would add continuous overhead on every published message.

This also eliminates the need for a `method` discriminator field in the context — the channel name itself identifies the operation.

Operations like `messages.countTokens`, `models.get`, and `messages.batches.create` are administrative/utility calls that don't represent AI inference operations. APMs generally don't create GenAI spans for these. They are excluded to keep the channels focused on the operations that matter for tracing.
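
As a rough illustration of per-channel dispatch, the sketch below subscribes to only one of the proposed channels and confirms that activity on the other never reaches the subscriber. The SDK does not emit these events today, so the demo publishes them itself. The `demoSeparateChannels` helper and the model names are made up for the example.

```ts
import { tracingChannel } from 'node:diagnostics_channel';

type Ctx = { model: string };

// Proposed channel names from the table above. The events are published by
// this demo, standing in for the SDK.
const createChannel = tracingChannel<unknown, Ctx>('@anthropic-ai/sdk:messages.create');
const streamChannel = tracingChannel<unknown, Ctx>('@anthropic-ai/sdk:messages.stream');

export async function demoSeparateChannels(): Promise<string[]> {
  const seen: string[] = [];
  const noop = () => {};
  const handlers = {
    start: (ctx: Ctx) => { seen.push(ctx.model); },
    end: noop,
    asyncStart: noop,
    asyncEnd: noop,
    error: noop,
  };
  // Subscribe to the non-streaming channel only.
  createChannel.subscribe(handlers);
  try {
    await createChannel.tracePromise(async () => 'response', { model: 'model-a' });
    await streamChannel.tracePromise(async () => 'stream', { model: 'model-b' });
  } finally {
    createChannel.unsubscribe(handlers);
  }
  return seen;
}
```

Only `model-a` is recorded: the subscriber never has to inspect or discard streaming events, because they are published on a channel it did not subscribe to.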

### Context Properties

Shared across all channels:

| Field | Source | OTel attribute it enables |
|---|---|---|
| `model` | `params.model` | `gen_ai.request.model` |
| `params` | Raw request parameters object | APMs extract: `gen_ai.request.temperature`, `gen_ai.request.top_p`, `gen_ai.request.top_k`, `gen_ai.request.max_tokens`, `gen_ai.input.messages`, `gen_ai.system_instructions`, `gen_ai.request.available_tools` |
| `result` | Raw response object (auto-set by TracingChannel on completion) | APMs extract: `gen_ai.response.id`, `gen_ai.response.model`, `gen_ai.response.finish_reasons`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.cache_creation_input_tokens`, `gen_ai.usage.cache_read_input_tokens`, `gen_ai.response.text`, `gen_ai.response.tool_calls` |
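
A subscriber could turn this context into span attributes along the following lines. This is a sketch under assumptions: `MessagesContext` is a simplified, hypothetical typing of the proposed context that covers only a few request and response fields, and `toGenAiAttributes` is an illustrative helper, not an existing API.

```ts
// Simplified, hypothetical view of the proposed context object.
// `result` is only present once the operation has completed.
interface MessagesContext {
  model: string;
  params: { max_tokens?: number; temperature?: number; top_p?: number; top_k?: number };
  result?: {
    id?: string;
    model?: string;
    stop_reason?: string | null;
    usage?: { input_tokens?: number; output_tokens?: number };
  };
}

type AttributeValue = string | number | string[];
type Attributes = Record<string, AttributeValue>;

// Maps the raw params/result onto a subset of the gen_ai.* attributes listed above.
export function toGenAiAttributes(ctx: MessagesContext): Attributes {
  const attrs: Attributes = { 'gen_ai.request.model': ctx.model };
  const set = (key: string, value: AttributeValue | null | undefined) => {
    if (value !== undefined && value !== null) attrs[key] = value;
  };

  set('gen_ai.request.max_tokens', ctx.params.max_tokens);
  set('gen_ai.request.temperature', ctx.params.temperature);
  set('gen_ai.request.top_p', ctx.params.top_p);
  set('gen_ai.request.top_k', ctx.params.top_k);

  const result = ctx.result;
  if (result) {
    set('gen_ai.response.id', result.id);
    set('gen_ai.response.model', result.model);
    set('gen_ai.response.finish_reasons', result.stop_reason ? [result.stop_reason] : undefined);
    set('gen_ai.usage.input_tokens', result.usage?.input_tokens);
    set('gen_ai.usage.output_tokens', result.usage?.output_tokens);
  }
  return attrs;
}
```

A subscriber would typically call a function like this from its `asyncEnd` handler, once `ctx.result` is populated, and pass the output to `span.setAttributes(...)`.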

### Why Raw Params and Response

The context passes raw `params` and the auto-set `result` (the API response) rather than pre-extracting individual attributes. This follows the pattern established by framework TracingChannel proposals (h3, Hono, Elysia) where raw objects are passed and APMs extract what they need. Benefits:

1. **Forward-compatible.** New API parameters and response fields (thinking, citations, new content block types) are automatically available to subscribers without SDK changes.
2. **No duplication.** `model` is a convenience accessor for the most common attribute. Everything else comes from the raw objects.
3. **Privacy is the subscriber's concern.** The SDK emits what it has. APMs decide what to record based on their own `recordInputs`/`recordOutputs` policies.

---

## Streaming

For non-streaming requests, `tracePromise` wraps the full operation: `start` fires before the request, `asyncEnd` fires when the response promise resolves.

For streaming (`messages.stream()` or `stream: true` on `messages.create`), the SDK returns a stream object immediately, but the work continues until all chunks arrive. The TracingChannel lifecycle should cover the full duration, from request initiation to stream completion. The `result` is populated with the final accumulated message (including total token usage, finish reasons, response ID, and content blocks) when the stream ends. This ensures APM spans reflect total generation time, not just time to first chunk.

This is particularly important for the Anthropic SDK because streaming responses arrive as a sequence of typed events (`message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`). Today, every APM vendor independently implements streaming event accumulation logic. With TracingChannel, the SDK handles this internally and exposes the final accumulated result on `asyncEnd`.
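
One possible shape for the manual lifecycle is sketched below. It is not the SDK's implementation: `StreamEvent` and `StreamContext` are simplified stand-ins for the real types, text accumulation stands in for full message accumulation, and `traceStream` is a hypothetical helper. It publishes `start`/`end` when the stream is handed back and `asyncStart`/`asyncEnd` once iteration finishes.

```ts
import { tracingChannel } from 'node:diagnostics_channel';

// Simplified stand-ins for the SDK's streaming event types (assumptions).
type TextDelta = { type: 'content_block_delta'; delta: { text: string } };
type OtherEvent = { type: 'message_start' | 'message_stop' };
export type StreamEvent = TextDelta | OtherEvent;

export interface StreamContext {
  model: string;
  params: unknown;
  result?: { text: string };
  error?: unknown;
}

const streamChannel = tracingChannel<unknown, StreamContext>('@anthropic-ai/sdk:messages.stream');

// Hypothetical helper: wraps a stream so the trace covers its full duration.
export function traceStream(
  ctx: StreamContext,
  source: AsyncIterable<StreamEvent>,
): AsyncIterable<StreamEvent> {
  // start/end bracket the synchronous part: the stream is returned immediately.
  streamChannel.start.publish(ctx);
  streamChannel.end.publish(ctx);

  async function* traced(): AsyncGenerator<StreamEvent> {
    let text = '';
    try {
      for await (const event of source) {
        if (event.type === 'content_block_delta') text += event.delta.text;
        yield event;
      }
      ctx.result = { text }; // accumulated result, visible to asyncEnd subscribers
    } catch (err) {
      ctx.error = err;
      streamChannel.error.publish(ctx);
      throw err;
    } finally {
      // Runs when the stream is exhausted, fails, or the consumer stops early.
      streamChannel.asyncStart.publish(ctx);
      streamChannel.asyncEnd.publish(ctx);
    }
  }
  return traced();
}
```

Two caveats with this sketch: publishing manually bypasses the store binding that `tracePromise` performs through `runStores`, and the closing events only fire if the consumer actually iterates the stream. A real implementation would need to address both.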

---

## Example: What the SDK Emits

A simplified sketch of what the instrumentation looks like inside the SDK:

```ts
import dc from 'node:diagnostics_channel';
const messagesCreateChannel = dc.tracingChannel('@anthropic-ai/sdk:messages.create');
const messagesStreamChannel = dc.tracingChannel('@anthropic-ai/sdk:messages.stream');

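// Inside messages.create (non-streaming; per the channel table, calls with
// `stream: true` would be traced on the messages.stream channel instead)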
async function create(params) {
  if (messagesCreateChannel.hasSubscribers === false) {
    return this._makeRequest(params);
  }

  const context = { model: params.model, params };
  return messagesCreateChannel.tracePromise(() => this._makeRequest(params), context);
}

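// Inside messages.stream
// Simplified: tracePromise only traces until the returned promise settles.
// To cover the full stream, that promise must settle at stream completion,
// or the lifecycle events must be published manually (see Streaming).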
async function stream(params) {
  if (messagesStreamChannel.hasSubscribers === false) {
    return this._makeStreamingRequest(params);
  }

  const context = { model: params.model, params };
  return messagesStreamChannel.tracePromise(() => this._makeStreamingRequest(params), context);
}
```

Each method gets its own channel. No Proxy, no constructor wrapping, no stream accumulation logic pushed onto consumers.

---

## How APM Tools Use This

### Today: Deep Proxy on the Client Constructor

Taking Sentry as an example, their Anthropic instrumentation uses module-loader hooks (IITM for `import`, RITM for `require`) to intercept loading of `@anthropic-ai/sdk`, replaces the `Anthropic` constructor, and creates a deep recursive Proxy around the resulting client instance to intercept method calls at arbitrary nesting depth. This spans **~800 lines across 6 files**:

- [instrumentation.ts](https://github.com/getsentry/sentry-javascript/blob/develop/packages/node/src/integrations/tracing/anthropic-ai/instrumentation.ts) (~100 lines): IITM module patching, constructor wrapping, prototype chain preservation
- [index.ts](https://github.com/getsentry/sentry-javascript/blob/develop/packages/core/src/tracing/anthropic-ai/index.ts) (~280 lines): deep Proxy creation, method interception, span lifecycle management, request/response attribute extraction
- [streaming.ts](https://github.com/getsentry/sentry-javascript/blob/develop/packages/core/src/tracing/anthropic-ai/streaming.ts): async iterable wrapping, streaming event accumulation, tool call reconstruction across fragmented events
- [utils.ts](https://github.com/getsentry/sentry-javascript/blob/develop/packages/core/src/tracing/anthropic-ai/utils.ts) (~80 lines): message extraction, error type mapping, system prompt handling
- [constants.ts](https://github.com/getsentry/sentry-javascript/blob/develop/packages/core/src/tracing/anthropic-ai/constants.ts): method registry mapping API paths to operation types
- [types.ts](https://github.com/getsentry/sentry-javascript/blob/develop/packages/core/src/tracing/anthropic-ai/types.ts) (~130 lines): type definitions for responses, streaming events, content blocks, options

This approach has several problems:
- **Replaces the `Anthropic` constructor**, wrapping every client instance even if no APM is listening
- **Deep recursive Proxy on every property access.** Every `client.messages.create()` call goes through multiple levels of Proxy `get` traps
- **IITM dependency.** Only works on Node.js, not on Deno or Bun where the SDK also runs
- **Stream accumulation is fragile.** Each vendor independently implements streaming event processing, reconstructing tool calls from `content_block_start` / `content_block_delta` / `content_block_stop` sequences
- **Each APM vendor builds their own.** Sentry, Traceloop, Arize AI all independently replicate the same deep-proxy or prototype-patching pattern

### With TracingChannel: Subscribe to Structured Events

```ts
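import dc from 'node:diagnostics_channel';
import { trace } from '@opentelemetry/api';

// Illustrative tracer name; any OpenTelemetry tracer works here
const tracer = trace.getTracer('anthropic-sdk-subscriber');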

// Subscribe to each channel independently — only pay for what you listen to
const handlers = {
  start(ctx) {
    // ctx.model, ctx.params available
    ctx.span = tracer.startSpan(`chat ${ctx.model}`);
  },
  asyncEnd(ctx) {
    // ctx.result is auto-set by TracingChannel with the API response
    // (or accumulated final message for streaming)
    ctx.span?.end();
  },
  error(ctx) {
    ctx.span?.recordException(ctx.error);
  },
};

dc.tracingChannel('@anthropic-ai/sdk:messages.create').subscribe(handlers);
dc.tracingChannel('@anthropic-ai/sdk:messages.stream').subscribe(handlers);
```

**What changes for APM vendors:**

| Concern | Monkey-patching (today) | TracingChannel (proposed) |
|---|---|---|
| **Setup** | IITM/RITM must intercept `@anthropic-ai/sdk` before it is first imported or required | Subscribe to `diagnostics_channel` at any time. No ordering constraint |
| **Scope** | Replace constructor + deep Proxy on every client instance | One subscription per method of interest |
| **Method interception** | Recursive Proxy intercepts every property access on the client, even non-traced methods | No proxying. SDK emits events at execution time, subscribers observe |
| **Streaming** | Each vendor independently processes streaming events, accumulates content blocks, reconstructs tool calls from fragmented events | SDK handles accumulation internally; subscribers see a single span with the final message |
| **Multi-vendor** | Each vendor builds their own deep-proxy + stream accumulation logic | Independent subscribers, no interference |
| **Teardown** | Cannot cleanly remove a recursive Proxy from a constructor replacement | `unsubscribe()`, clean and reversible |
| **Runtime support** | IITM: Node.js only. SDK runs on Node.js, Deno, and Bun | Any runtime with `diagnostics_channel` |
| **Maintenance** | External packages must track SDK internal structure changes across Stainless regenerations | Native, maintained as part of the SDK |
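
The multi-vendor and teardown rows can be illustrated with a small sketch: two independent subscribers on one channel, one of which later unsubscribes. The channel name `example:multi-vendor` and the `makeVendor` / `demoMultiVendor` helpers are hypothetical.

```ts
import { tracingChannel } from 'node:diagnostics_channel';

type Ctx = { model: string };

// Hypothetical channel, standing in for one of the proposed SDK channels.
const channel = tracingChannel<unknown, Ctx>('example:multi-vendor');

// Builds a subscriber that logs "<vendor>:<model>" whenever an operation starts.
function makeVendor(name: string, log: string[]) {
  const noop = () => {};
  return {
    start: (ctx: Ctx) => { log.push(`${name}:${ctx.model}`); },
    end: noop,
    asyncStart: noop,
    asyncEnd: noop,
    error: noop,
  };
}

export async function demoMultiVendor(): Promise<string[]> {
  const log: string[] = [];
  const vendorA = makeVendor('A', log);
  const vendorB = makeVendor('B', log);

  channel.subscribe(vendorA);
  channel.subscribe(vendorB);
  await channel.tracePromise(async () => 'first', { model: 'one' });

  // Teardown for vendor A only; vendor B keeps observing.
  channel.unsubscribe(vendorA);
  await channel.tracePromise(async () => 'second', { model: 'two' });

  channel.unsubscribe(vendorB);
  return log;
}
```

Both subscribers see the first operation and only vendor B sees the second. Neither needs to know the other exists, and removing one leaves no wrapper behind.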

---

## Implementation Notes

### Insertion Points

The Anthropic SDK is generated by [Stainless](https://www.stainless.com/). TracingChannel instrumentation should be added at the resource method level, where each API method (`messages.create`, `messages.stream`, `completions.create`) calls into the core HTTP client. This is where the operation type, model, and parameters are known.

Stainless supports custom code that persists across regeneration. TracingChannel support could be added as:
1. **A core middleware in the HTTP client pipeline**, triggered for specific resource methods
2. **Custom method wrappers** at the resource class level, configured through Stainless

The exact integration point is an implementation detail. The key requirement is that events are emitted at the right lifecycle moments with the right context.

### Async Model

Non-streaming SDK methods return promises, so `tracePromise` is the correct wrapper for them. Streaming operations hand back a stream before the work is finished and therefore need manual lifecycle management (see the Streaming section above).

### shouldTrace Helper

```ts
const shouldTrace = (ch) => ch.hasSubscribers !== false;
```

This treats `undefined` (Node.js releases whose `TracingChannel` lacks the aggregated `hasSubscribers` getter, such as Node 18) as "trace anyway" and an explicit `false` (releases that have the getter, Node 20.13+ and 22+) as "skip". See [Node.js #54470](https://github.com/nodejs/node/issues/54470) for background.

### Zero-Cost Guarantee

Context objects should only be constructed inside a `hasSubscribers` guard:

```ts
if (shouldTrace(messagesCreateChannel)) {
  const context = { model: params.model, params };
  return messagesCreateChannel.tracePromise(fn, context);
} else {
  return fn();
}
```

When no APM subscribes, the overhead is a single boolean check per API call.
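
One way to keep this guard in a single place is a small helper that takes a context factory instead of a context, so the object is only built when tracing happens. This is a sketch: `TraceableChannel` is a minimal structural type declared here for the example, and `traced` is a hypothetical name.

```ts
// Minimal structural view of a TracingChannel: only what the helper needs.
interface TraceableChannel<C extends object> {
  readonly hasSubscribers?: boolean;
  tracePromise<R>(fn: () => Promise<R>, context: C): Promise<R>;
}

// Runs `fn`, traced only when the channel may have subscribers.
// `makeContext` is invoked lazily, so the untraced path allocates nothing.
export function traced<C extends object, R>(
  channel: TraceableChannel<C>,
  makeContext: () => C,
  fn: () => Promise<R>,
): Promise<R> {
  // Same rule as shouldTrace: only an explicit `false` skips tracing.
  if (channel.hasSubscribers === false) return fn();
  return channel.tracePromise(fn, makeContext());
}
```

Inside a resource method this would read roughly as `traced(messagesCreateChannel, () => ({ model: params.model, params }), () => this._makeRequest(params))`.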

---

## Backward Compatibility

The change is additive. When no subscribers are registered, the cost is a single `hasSubscribers` check, performed before any context object is constructed. On runtimes where `TracingChannel` is unavailable, tracing is silently skipped.

Since the Anthropic SDK supports Node.js, Deno, and Bun, a cross-runtime loading pattern is needed:

```ts
let dc;
try {
  if (typeof process !== 'undefined' && typeof process.getBuiltinModule === 'function') {
    dc = process.getBuiltinModule('node:diagnostics_channel');
  }
  if (!dc) {
    dc = require('node:diagnostics_channel');
  }
} catch {
  // diagnostics_channel not available on this runtime, no-op
}
```

- `typeof process` guard: safe in browsers and edge runtimes where `process` doesn't exist
- `getBuiltinModule` path: bundler-invisible (no static import to resolve), works in Node 20.16+ and 22.3+, Deno, and Bun 1.2.7+
- `require` fallback: covers older Node, Bun, and Cloudflare Workers (with `nodejs_compat`)
- `try/catch`: swallows the error in browsers or any runtime without `diagnostics_channel`
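
Because `dc` may end up `undefined`, channel creation and tracing both need a fallback. A minimal sketch of that no-op path follows. The `createChannel` and `traceIfAvailable` helpers are hypothetical names, and the type-only import is erased at compile time, so it does not reintroduce a static dependency on the module.

```ts
import type * as DiagnosticsChannel from 'node:diagnostics_channel';

type Dc = Pick<typeof DiagnosticsChannel, 'tracingChannel'>;

// Returns a TracingChannel, or undefined when the runtime cannot provide one
// (no diagnostics_channel at all, or a version without tracingChannel).
export function createChannel(dc: Partial<Dc> | undefined, name: string) {
  const make = dc?.tracingChannel;
  return typeof make === 'function' ? make(name) : undefined;
}

// Runs `fn` traced when a channel exists and untraced otherwise.
export function traceIfAvailable<R>(
  channel: ReturnType<typeof createChannel>,
  context: object,
  fn: () => Promise<R>,
): Promise<R> {
  return channel ? channel.tracePromise(fn, context) : fn();
}
```

With this shape, a browser or edge runtime takes the `fn()` branch on every call and behaves exactly as the SDK does today.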

---

## Prior Art

This approach follows the same pattern already adopted or in progress by other libraries:

**AI / ML:**
- **`openai`**: [openai/openai-node#1819](https://github.com/openai/openai-node/issues/1819), issue opened
- **`ai` (Vercel AI SDK)**: [vercel/ai#14410](https://github.com/vercel/ai/issues/14410), issue opened

**Frameworks:**
- **`undici`** (Node.js core): ships `TracingChannel` support since Node 20.12 ([`undici:request`](https://nodejs.org/api/diagnostics_channel.html#undici-channels))
- **`fastify`**: ships `TracingChannel` support natively (`tracing:fastify.request.handler`)
- **`h3`**: [h3js/h3#1251](https://github.com/h3js/h3/pull/1251) ✅ merged
- **`srvx`**: [h3js/srvx#141](https://github.com/h3js/srvx/pull/141) ✅ merged
- **`elysia`**: [elysiajs/elysia#1809](https://github.com/elysiajs/elysia/issues/1809), in discussion
- **`hono`**: [honojs/hono#4842](https://github.com/honojs/hono/issues/4842), issue opened
- **`koa`**: proposal drafted
- **`express`**: [pillarjs/router#196](https://github.com/pillarjs/router/pull/196), PR open

**Databases:**
- **`mysql2`**: [sidorares/node-mysql2#4178](https://github.com/sidorares/node-mysql2/pull/4178) ✅ merged
- **`node-redis`**: [redis/node-redis#3195](https://github.com/redis/node-redis/pull/3195) ✅ merged
- **`ioredis`**: [redis/ioredis#2089](https://github.com/redis/ioredis/pull/2089) ✅ merged
- **`pg` / `pg-pool`**: [brianc/node-postgres#3650](https://github.com/brianc/node-postgres/pull/3650), PR open
- **`knex`**: [knex/knex#6410](https://github.com/knex/knex/pull/6410), PR open
- **`mongodb`**: [NODE-7472](https://jira.mongodb.org/browse/NODE-7472), issue opened
- **`mongoose`**: [Automattic/mongoose#16105](https://github.com/Automattic/mongoose/issues/16105), issue opened
- **`tedious`**: [tediousjs/tedious#1727](https://github.com/tediousjs/tedious/issues/1727), issue opened
- **`@prisma/client`**: [prisma/prisma#29353](https://github.com/prisma/prisma/issues/29353), issue opened

**Other:**
- **`graphql`**: [graphql/graphql-js#4670](https://github.com/graphql/graphql-js/pull/4670), PR open
- **`unstorage`**: [unjs/unstorage#707](https://github.com/unjs/unstorage/pull/707) ✅ merged
- **`db0`**: [unjs/db0#193](https://github.com/unjs/db0/pull/193), PR open
- **`nitro`**: [nitrojs/nitro#4001](https://github.com/nitrojs/nitro/pull/4001) ✅ merged

---

Would love to hear if there's appetite for this. Happy to put together a PR with the implementation if so.



Adopt `TracingChannel` for observability #1036
