I'd like to propose adding first-class TracingChannel support to the Anthropic Node.js SDK, following the pattern established by undici in Node.js core and adopted across the npm ecosystem.
TracingChannel is a higher-level API built on top of diagnostics_channel, specifically designed for tracing async operations. It provides structured lifecycle channels (start, end, error, asyncStart, asyncEnd) and handles async context propagation correctly. That propagation is the piece monkey-patching approaches lack, and its absence is what makes them fragile in real-world async applications.
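To make the lifecycle concrete, here is a minimal sketch of the sub-channels firing around a promise-returning function. The channel name demo:lifecycle and the recordLifecycle helper are placeholders for illustration only and are not part of any SDK.

```javascript
const dc = require('node:diagnostics_channel');

// 'demo:lifecycle' is a placeholder channel name for this sketch.
const lifecycleChannel = dc.tracingChannel('demo:lifecycle');

// Runs one traced async operation and returns the order in which events fired.
async function recordLifecycle() {
  const seen = [];
  const handlers = {
    start: () => seen.push('start'),
    end: () => seen.push('end'),
    asyncStart: () => seen.push('asyncStart'),
    asyncEnd: (ctx) => seen.push(`asyncEnd(result=${ctx.result})`),
    error: () => seen.push('error'),
  };
  lifecycleChannel.subscribe(handlers);
  try {
    // tracePromise publishes the events and sets ctx.result on success.
    await lifecycleChannel.tracePromise(async () => 'done', { op: 'demo' });
  } finally {
    lifecycleChannel.unsubscribe(handlers);
  }
  return seen;
}
```

For a resolved promise, start and end bracket the synchronous part of the call, and asyncStart and asyncEnd fire once the promise settles. The error event only appears when the function throws or the promise rejects.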
Current APM instrumentations use IITM (import-in-the-middle) for ESM and RITM (require-in-the-middle) for CJS to monkey-patch SDK internals. This has several fragility concerns:
- Runtime lock-in: both RITM and IITM rely on Node.js-specific module loader internals (Module._resolveFilename, module.register()). They don't work on Bun or Deno, which implement the Node.js API surface but not the module loader internals. The Anthropic SDK explicitly supports Node.js, Deno, and Bun, making monkey-patching especially inadequate.
- ESM fragility: IITM is built on Node.js's module customization hooks, which are still evolving and have been a persistent source of breakage in the OTel JS ecosystem.
- Initialization ordering: both require instrumentation to be set up before the SDK is first require()'d / import'd. Get the order wrong and instrumentation silently does nothing, which is very hard to debug in production.
- Bundling and Externalization: Users have to ensure their instrumented modules are externalized, which is becoming very difficult to guarantee with more and more frameworks bundling server-side code into single executables, binaries, or deployment files.
The current instrumentation landscape for the Anthropic SDK illustrates this problem well. There is no official @opentelemetry/instrumentation-anthropic for JavaScript. Instead, APM vendors have independently built their own solutions:
- Sentry uses IITM to intercept require('@anthropic-ai/sdk'), replaces the Anthropic constructor, and creates a deep recursive Proxy around the client instance to intercept method calls (client.messages.create, client.messages.stream, client.completions.create, etc.). This spans ~800 lines across 6 files: constructor wrapping, deep Proxy creation, method interception, streaming event accumulation, attribute extraction, error mapping, and type definitions.
- Traceloop's OpenLLMetry patches Messages.prototype.create and Completions.prototype.create directly.
- Arize AI's OpenInference patches Messages.prototype.create with its own span creation logic.
Every vendor independently replicates the same logic: intercept construction or patch prototypes, extract model/token attributes, handle streaming chunk accumulation, map error types. With native TracingChannel support, all of this becomes a single subscription.
If the Anthropic SDK emits structured events through TracingChannel, instrumentation libraries become subscribers, not patches. Each tool listens independently with no ordering concerns, no clobbering, and no internal API dependency.
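The sketch below illustrates the "subscribers, not patches" idea with two independent listeners. It assumes a stand-in function (fakeCreate) and a hypothetical channel name (demo-sdk:messages.create); neither exists in the real SDK.

```javascript
const dc = require('node:diagnostics_channel');

// Stand-in for an SDK method that already emits events. The channel name
// 'demo-sdk:messages.create' is hypothetical.
const sdkChannel = dc.tracingChannel('demo-sdk:messages.create');
function fakeCreate(params) {
  const context = { model: params.model, params };
  return sdkChannel.tracePromise(async () => ({ id: 'msg_demo' }), context);
}

// Two unrelated "vendors" subscribe after the SDK code is already defined.
const vendorALog = [];
const vendorBLog = [];
const vendorA = { asyncEnd: (ctx) => vendorALog.push(ctx.result.id) };
const vendorB = { asyncEnd: (ctx) => vendorBLog.push(ctx.model) };

async function runVendors() {
  sdkChannel.subscribe(vendorA);
  sdkChannel.subscribe(vendorB);
  await fakeCreate({ model: 'example-model' });

  // Vendor A detaches; vendor B keeps receiving events.
  sdkChannel.unsubscribe(vendorA);
  await fakeCreate({ model: 'example-model' });
  sdkChannel.unsubscribe(vendorB);
  return { vendorALog, vendorBLog };
}
```

Neither subscriber has to load before the instrumented code, and removing one does not affect the other.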
Proposed Tracing Channels
All channels use the Node.js TracingChannel API, which provides start, end, asyncStart, asyncEnd, and error sub-channels automatically. The channel design aims to support the OpenTelemetry Semantic Conventions for Generative AI systems, enabling APM vendors to produce standard gen_ai.* spans and attributes from the emitted events.
| TracingChannel | Tracks | Context fields |
| --- | --- | --- |
| @anthropic-ai/sdk:messages.create | Non-streaming message creation (messages.create without stream: true) | model, params |
| @anthropic-ai/sdk:messages.stream | Streaming message creation (messages.stream() or messages.create with stream: true), from request initiation to stream completion | model, params |
| @anthropic-ai/sdk:completions.create | Legacy completions endpoint | model, params |
Why Separate Channels
Each API method gets its own TracingChannel. This follows the diagnostics_channel design philosophy: many purpose-focused channels with their own subscriber sets, so dispatch is extremely cheap. Subscribers listen only to the operations they care about rather than filtering a firehose channel, which would add continuous overhead on every published message.
This also eliminates the need for a method discriminator field in the context, because the channel name itself identifies the operation.
Operations like messages.countTokens, models.get, and messages.batches.create are administrative/utility calls that don't represent AI inference operations. APMs generally don't create GenAI spans for these. They are excluded to keep the channels focused on the operations that matter for tracing.
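As a rough illustration of why separate channels keep dispatch cheap, the sketch below subscribes to one channel and traces on two. The demo-sdk:* channel names are placeholders that only mirror the proposed naming scheme.

```javascript
const dc = require('node:diagnostics_channel');

// Placeholder names mirroring the proposed naming scheme.
const createCh = dc.tracingChannel('demo-sdk:separate.create');
const streamCh = dc.tracingChannel('demo-sdk:separate.stream');

async function runSeparateChannels() {
  const starts = [];
  // This subscriber only cares about non-streaming calls.
  const handlers = { start: () => starts.push('messages.create') };
  createCh.subscribe(handlers);

  await createCh.tracePromise(async () => 'a', {});
  // Nobody listens on the stream channel, so no handler runs for this call.
  await streamCh.tracePromise(async () => 'b', {});

  createCh.unsubscribe(handlers);
  return starts;
}
```

The subscriber never sees the streaming call and does not need a method field to filter on.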
Context Properties
Shared across all channels:
| Field | Source | OTel attribute it enables |
| --- | --- | --- |
| model | params.model | gen_ai.request.model |
| params | Raw request parameters object | APMs extract: gen_ai.request.temperature, gen_ai.request.top_p, gen_ai.request.top_k, gen_ai.request.max_tokens, gen_ai.input.messages, gen_ai.system_instructions, gen_ai.request.available_tools |
| result | Raw response object (auto-set by TracingChannel on completion) | APMs extract: gen_ai.response.id, gen_ai.response.model, gen_ai.response.finish_reasons, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.cache_creation_input_tokens, gen_ai.usage.cache_read_input_tokens, gen_ai.response.text, gen_ai.response.tool_calls |
Why Raw Params and Response
The context passes raw params and the auto-set result (the API response) rather than pre-extracting individual attributes. This follows the pattern established by framework TracingChannel proposals (h3, Hono, Elysia) where raw objects are passed and APMs extract what they need. Benefits:
- Forward-compatible. New API parameters and response fields (thinking, citations, new content block types) are automatically available to subscribers without SDK changes.
- No duplication. model is a convenience accessor for the most common attribute. Everything else comes from the raw objects.
- Privacy is the subscriber's concern. The SDK emits what it has. APMs decide what to record based on their own recordInputs/recordOutputs policies.
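On the subscriber side, attribute extraction from the raw objects could look roughly like the sketch below. The helper name extractGenAiAttributes and its recordInputs option are hypothetical; the result fields read here (id, model, stop_reason, usage) are assumed to follow the Messages API response shape.

```javascript
// Hypothetical subscriber-side helper: maps the raw context onto a few
// gen_ai.* attributes. Only a subset of the attributes is shown.
function extractGenAiAttributes(ctx, { recordInputs = false } = {}) {
  const { params = {}, result } = ctx;
  const attrs = {
    'gen_ai.request.model': ctx.model,
    'gen_ai.request.max_tokens': params.max_tokens,
    'gen_ai.request.temperature': params.temperature,
  };
  if (recordInputs) {
    // Privacy stays with the subscriber: inputs are recorded only on opt-in.
    attrs['gen_ai.input.messages'] = JSON.stringify(params.messages);
  }
  if (result) {
    attrs['gen_ai.response.id'] = result.id;
    attrs['gen_ai.response.model'] = result.model;
    attrs['gen_ai.response.finish_reasons'] = result.stop_reason
      ? [result.stop_reason]
      : undefined;
    attrs['gen_ai.usage.input_tokens'] = result.usage?.input_tokens;
    attrs['gen_ai.usage.output_tokens'] = result.usage?.output_tokens;
  }
  // Drop attributes whose source field was absent.
  return Object.fromEntries(
    Object.entries(attrs).filter(([, value]) => value !== undefined),
  );
}
```

Because the helper reads from the raw objects, a new request or response field only requires a change in the subscriber, not in the SDK.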
Streaming
For non-streaming requests, tracePromise wraps the full operation: start fires before the request, asyncEnd fires when the response promise resolves.
For streaming (messages.stream() or stream: true on messages.create), the SDK returns a stream object immediately, but the work continues until all chunks arrive. The TracingChannel lifecycle should cover the full duration, from request initiation to stream completion. The result is populated with the final accumulated message (including total token usage, finish reasons, response ID, and content blocks) when the stream ends. This ensures APM spans reflect total generation time, not just time to first chunk.
This is particularly important for the Anthropic SDK because streaming responses arrive as a sequence of typed events (message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop). Today, every APM vendor independently implements streaming event accumulation logic. With TracingChannel, the SDK handles this internally and exposes the final accumulated result on asyncEnd.
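One possible shape for the streaming lifecycle is sketched below, under several assumptions: traceStream and accumulate are hypothetical helpers, only text blocks are accumulated, and events are published directly on the sub-channels, so store binding via runStores is not covered.

```javascript
// Folds one stream event into the accumulated message (text blocks only).
function accumulate(message, event) {
  switch (event.type) {
    case 'message_start':
      Object.assign(message, event.message, { content: [] });
      break;
    case 'content_block_start':
      message.content[event.index] = { ...event.content_block };
      break;
    case 'content_block_delta':
      if (event.delta.type === 'text_delta') {
        message.content[event.index].text += event.delta.text;
      }
      break;
    case 'message_delta':
      message.stop_reason = event.delta.stop_reason;
      message.usage = { ...message.usage, ...event.usage };
      break;
  }
}

// Hypothetical wrapper: publishes `start` when iteration begins and
// `asyncEnd` only once the source is exhausted, fails, or is abandoned.
async function* traceStream(channel, context, source) {
  channel.start.publish(context);
  channel.end.publish(context);
  const message = { content: [], usage: {} };
  try {
    for await (const event of source) {
      accumulate(message, event);
      yield event;
    }
    context.result = message; // final accumulated message
  } catch (err) {
    context.error = err;
    channel.error.publish(context);
    throw err;
  } finally {
    channel.asyncStart.publish(context);
    channel.asyncEnd.publish(context);
  }
}
```

Since this is an async generator, start fires on the first iteration rather than at call time; an implementation that must cover request initiation would publish start before returning the stream object. If the consumer stops early, asyncEnd still fires but context.result stays unset.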
Example: What the SDK Emits
A simplified sketch of what the instrumentation looks like inside the SDK:
    import dc from 'node:diagnostics_channel';

    const messagesCreateChannel = dc.tracingChannel('@anthropic-ai/sdk:messages.create');
    const messagesStreamChannel = dc.tracingChannel('@anthropic-ai/sdk:messages.stream');

    // Inside messages.create (non-streaming)
    async function create(params) {
      // Fast path: no subscribers, so no context object is built.
      if (messagesCreateChannel.hasSubscribers === false) {
        return this._makeRequest(params);
      }
      const context = { model: params.model, params };
      return messagesCreateChannel.tracePromise(() => this._makeRequest(params), context);
    }

    // Inside messages.stream
    // Simplified: tracePromise settles once the stream object is returned, so a
    // real implementation needs manual lifecycle management to cover the full
    // stream (see the Streaming section above).
    async function stream(params) {
      if (messagesStreamChannel.hasSubscribers === false) {
        return this._makeStreamingRequest(params);
      }
      const context = { model: params.model, params };
      return messagesStreamChannel.tracePromise(() => this._makeStreamingRequest(params), context);
    }
Each method gets its own channel. No Proxy, no constructor wrapping, no stream accumulation logic pushed onto consumers.
How APM Tools Use This
Today: Deep Proxy on the Client Constructor
Taking Sentry as an example, their Anthropic instrumentation uses IITM to intercept require('@anthropic-ai/sdk'), replaces the Anthropic constructor, and creates a deep recursive Proxy around the resulting client instance to intercept method calls at arbitrary nesting depth. This spans ~800 lines across 6 files:
- instrumentation.ts (~100 lines): IITM module patching, constructor wrapping, prototype chain preservation
- index.ts (~280 lines): deep Proxy creation, method interception, span lifecycle management, request/response attribute extraction
- streaming.ts: async iterable wrapping, streaming event accumulation, tool call reconstruction across fragmented events
- utils.ts (~80 lines): message extraction, error type mapping, system prompt handling
- constants.ts: method registry mapping API paths to operation types
- types.ts (~130 lines): type definitions for responses, streaming events, content blocks, options
This approach has several problems:
- Replaces the Anthropic constructor, wrapping every client instance even if no APM is listening
- Deep recursive Proxy on every property access. Every client.messages.create() call goes through multiple levels of Proxy get traps
- IITM dependency. Only works on Node.js, not on Deno or Bun where the SDK also runs
- Stream accumulation is fragile. Each vendor independently implements streaming event processing, reconstructing tool calls from content_block_start / content_block_delta / content_block_stop sequences
- Each APM vendor builds their own. Sentry, Traceloop, Arize AI all independently replicate the same deep-proxy or prototype-patching pattern
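To show where the per-access cost of a deep Proxy comes from, here is an illustrative sketch. It is not any vendor's actual code; deepProxy, plainClient, and proxyStats are made-up names, and the stand-in client only mimics the nesting of client.messages.create().

```javascript
// Illustration only: a recursive Proxy that wraps every nested object so
// that calls at any depth can be intercepted.
function deepProxy(target, path, stats) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      stats.getTraps += 1; // paid on every property read, traced or not
      const value = Reflect.get(obj, prop, receiver);
      const nextPath = [...path, String(prop)];
      if (typeof value === 'function') {
        // Wrap the method so the call itself can be observed.
        return (...args) => {
          stats.calls.push(nextPath.join('.'));
          return value.apply(obj, args);
        };
      }
      if (value !== null && typeof value === 'object') {
        return deepProxy(value, nextPath, stats);
      }
      return value;
    },
  });
}

// Stand-in client with the same nesting as client.messages.create().
const plainClient = {
  apiVersion: 'demo',
  messages: { create: async (params) => ({ model: params.model }) },
};
const proxyStats = { getTraps: 0, calls: [] };
const proxiedClient = deepProxy(plainClient, [], proxyStats);
```

A single proxiedClient.messages.create() call passes through two get traps and allocates a fresh nested Proxy and wrapper function, and reading an untraced property such as apiVersion still hits a trap.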
With TracingChannel: Subscribe to Structured Events
    import dc from 'node:diagnostics_channel';

    // `tracer` is assumed to be an OpenTelemetry Tracer created elsewhere.
    // Subscribe to each channel independently: only pay for what you listen to
    const handlers = {
      start(ctx) {
        // ctx.model, ctx.params available
        ctx.span = tracer.startSpan(`chat ${ctx.model}`);
      },
      asyncEnd(ctx) {
        // ctx.result is auto-set by TracingChannel with the API response
        // (or accumulated final message for streaming)
        ctx.span?.end();
      },
      error(ctx) {
        ctx.span?.recordException(ctx.error);
      },
    };

    dc.tracingChannel('@anthropic-ai/sdk:messages.create').subscribe(handlers);
    dc.tracingChannel('@anthropic-ai/sdk:messages.stream').subscribe(handlers);
What changes for APM vendors:
| Concern | Monkey-patching (today) | TracingChannel (proposed) |
| --- | --- | --- |
| Setup | IITM intercepts require('@anthropic-ai/sdk') before first import | Subscribe to diagnostics_channel at any time. No ordering constraint |
| Scope | Replace constructor + deep Proxy on every client instance | One subscription per method of interest |
| Method interception | Recursive Proxy intercepts every property access on the client, even non-traced methods | No proxying. SDK emits events at execution time, subscribers observe |
| Streaming | Each vendor independently processes streaming events, accumulates content blocks, reconstructs tool calls from fragmented events | SDK handles accumulation internally; subscribers see a single span with the final message |
| Multi-vendor | Each vendor builds their own deep-proxy + stream accumulation logic | Independent subscribers, no interference |
| Teardown | Cannot cleanly remove a recursive Proxy from a constructor replacement | unsubscribe(), clean and reversible |
| Runtime support | IITM: Node.js only. SDK runs on Node.js, Deno, and Bun | Any runtime with diagnostics_channel |
| Maintenance | External packages must track SDK internal structure changes across Stainless regenerations | Native, maintained as part of the SDK |
Implementation Notes
Insertion Points
The Anthropic SDK is generated by Stainless. TracingChannel instrumentation should be added at the resource method level, where each API method (messages.create, messages.stream, completions.create) calls into the core HTTP client. This is where the operation type, model, and parameters are known.
Stainless supports custom code that persists across regeneration. TracingChannel support could be added as:
- A core middleware in the HTTP client pipeline, triggered for specific resource methods
- Custom method wrappers at the resource class level, configured through Stainless
The exact integration point is an implementation detail. The key requirement is that events are emitted at the right lifecycle moments with the right context.
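As one way to picture the "custom method wrappers" option, the sketch below wraps a promise-returning method on a stand-in resource class. withTracing, DemoMessages, and the channel name are hypothetical and do not correspond to Stainless or SDK APIs.

```javascript
const dc = require('node:diagnostics_channel');

// Hypothetical helper: wraps a promise-returning resource method so that it
// emits TracingChannel events under the given channel name.
function withTracing(channelName, method) {
  const channel = dc.tracingChannel(channelName);
  return function traced(params, ...rest) {
    // Skip context construction entirely when nobody is listening.
    if (channel.hasSubscribers === false) {
      return method.call(this, params, ...rest);
    }
    const context = { model: params.model, params };
    // tracePromise(fn, context, thisArg, ...args) preserves `this`.
    return channel.tracePromise(method, context, this, params, ...rest);
  };
}

// Stand-in resource class; the real SDK classes are generated.
class DemoMessages {
  constructor(prefix) {
    this.prefix = prefix;
  }
  async create(params) {
    return { id: `${this.prefix}_${params.model}` };
  }
}
DemoMessages.prototype.create = withTracing(
  'demo-sdk:wrapped.create',
  DemoMessages.prototype.create,
);
```

Wrapping at the prototype level keeps the operation name, model, and parameters in scope, which matches the insertion point described above.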
Async Model
All SDK methods return Promises. tracePromise is the correct wrapper for non-streaming operations. Streaming operations need manual lifecycle management (see the Streaming section above).
shouldTrace Helper
    const shouldTrace = (ch) => ch.hasSubscribers !== false;
This treats undefined (Node 18, where the aggregated hasSubscribers is broken) as "trace anyway" and false (Node 20+) as "skip". See Node.js #54470 for background.
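The helper's three possible outcomes can be checked with plain channel-like stubs. The stub objects below are illustrative stand-ins for what different runtimes report, not real channels.

```javascript
const shouldTrace = (ch) => ch.hasSubscribers !== false;

// Channel-like stubs standing in for the three states a runtime can report.
const states = {
  missingGetter: {}, // hasSubscribers is undefined
  noSubscribers: { hasSubscribers: false },
  hasSubscribers: { hasSubscribers: true },
};

// Map each state name to the helper's decision.
const decisions = Object.fromEntries(
  Object.entries(states).map(([name, ch]) => [name, shouldTrace(ch)]),
);
// decisions => { missingGetter: true, noSubscribers: false, hasSubscribers: true }
```

Only an explicit false skips tracing, so a runtime that does not report subscriber state falls back to tracing rather than silently dropping events.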
Zero-Cost Guarantee
Context objects should only be constructed inside a hasSubscribers guard:
    if (shouldTrace(messagesCreateChannel)) {
      const context = { model: params.model, params };
      return messagesCreateChannel.tracePromise(fn, context);
    } else {
      return fn();
    }
When no APM subscribes, the overhead is a single boolean check per API call.
Backward Compatibility
Zero-cost when no subscribers are registered. hasSubscribers is checked before constructing any context objects. Silently skipped on runtimes where TracingChannel is unavailable.
Since the Anthropic SDK supports Node.js, Deno, and Bun, a cross-runtime loading pattern is needed:
    let dc;
    try {
      // Preferred path: invisible to bundlers, no static import to resolve.
      if (typeof process !== 'undefined' && typeof process.getBuiltinModule === 'function') {
        dc = process.getBuiltinModule('node:diagnostics_channel');
      }
      // Fallback for runtimes without getBuiltinModule.
      if (!dc) {
        dc = require('node:diagnostics_channel');
      }
    } catch {
      // diagnostics_channel not available on this runtime, no-op
    }
- typeof process guard: safe in browsers and edge runtimes where process doesn't exist
- getBuiltinModule path: bundler-invisible (no static import to resolve), works in Node 22.3+, Deno, and Bun 1.2.7+
- require fallback: covers older Node, Bun, and Cloudflare Workers (with nodejs_compat)
- try/catch: swallows the error in browsers or any runtime without diagnostics_channel
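The loading pattern can be folded into small helpers so that call sites stay no-op safe. This is a sketch under assumptions: loadDiagnosticsChannel, tracingChannelOrNull, and maybeTrace are hypothetical names, and the sketch targets CommonJS, where require is defined.

```javascript
// Returns the diagnostics_channel module, or undefined if unavailable.
function loadDiagnosticsChannel() {
  try {
    if (typeof process !== 'undefined' && typeof process.getBuiltinModule === 'function') {
      const mod = process.getBuiltinModule('node:diagnostics_channel');
      if (mod) return mod;
    }
    return require('node:diagnostics_channel');
  } catch {
    return undefined; // browsers, or runtimes without diagnostics_channel
  }
}

// Returns a TracingChannel, or null where the API does not exist.
function tracingChannelOrNull(name) {
  const dc = loadDiagnosticsChannel();
  return dc && typeof dc.tracingChannel === 'function' ? dc.tracingChannel(name) : null;
}

// Traces `fn` when possible; otherwise just calls it. `makeContext` is only
// invoked on the traced path, so the untraced path allocates nothing extra.
function maybeTrace(channel, fn, makeContext) {
  if (!channel || channel.hasSubscribers === false) {
    return fn();
  }
  return channel.tracePromise(fn, makeContext());
}
```

With this shape, a resource method could call maybeTrace(channel, () => request(params), () => ({ model: params.model, params })) and behave identically on runtimes that lack the API.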
Prior Art
This approach follows the same pattern already adopted or in progress by other libraries:
AI / ML:
- openai: openai/openai-node#1819, issue opened
- ai (Vercel AI SDK): vercel/ai#14410, issue opened
Frameworks:
- undici (Node.js core): ships TracingChannel support since Node 20.12 (undici:request)
- fastify: ships TracingChannel support natively (tracing:fastify.request.handler)
- h3: h3js/h3#1251 ✅ merged
- srvx: h3js/srvx#141 ✅ merged
- elysia: elysiajs/elysia#1809, in discussion
- hono: honojs/hono#4842, issue opened
- koa: proposal drafted
- express: pillarjs/router#196, PR open
Databases:
- mysql2: sidorares/node-mysql2#4178 ✅ merged
- node-redis: redis/node-redis#3195 ✅ merged
- ioredis: redis/ioredis#2089 ✅ merged
- pg / pg-pool: brianc/node-postgres#3650, PR open
- knex: knex/knex#6410, PR open
- mongodb: NODE-7472, issue opened
- mongoose: Automattic/mongoose#16105, issue opened
- tedious: tediousjs/tedious#1727, issue opened
- @prisma/client: prisma/prisma#29353, issue opened
Other:
- graphql: graphql/graphql-js#4670, PR open
- unstorage: unjs/unstorage#707 ✅ merged
- db0: unjs/db0#193, PR open
- nitro: nitrojs/nitro#4001 ✅ merged
Would love to hear if there's appetite for this. Happy to put together a PR with the implementation if so.