
feat(instrumentation-openai): Migrate OpenAI instrumentation to OTel 1.40 semantic conventions (#909)

Merged
dvirski merged 12 commits into traceloop:main from dvirski:feat-migrate-openai-semcov-1.40
Mar 30, 2026

Conversation


@dvirski (Contributor) commented Mar 29, 2026

[Screenshot 2026-03-29 at 14:25:50]

Summary by CodeRabbit

Release Notes

  • New Features

    • Added enhanced telemetry support for OpenAI image operations with improved attribute tracking.
    • Extended OpenAI SDK compatibility to version 6.x.
  • Improvements

    • Standardized telemetry attributes for chat completions, text completions, and image generation operations.
    • Enhanced token usage tracking with separate input and output token accounting.
    • Improved finish reason and tool call reporting in telemetry data.
  • Dependencies

    • Upgraded OpenTelemetry semantic conventions dependency.
    • Updated OpenAI SDK from 5.x to 6.x.


coderabbitai bot commented Mar 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough


This PR migrates OpenAI instrumentation to standardized OpenTelemetry GenAI semantic conventions (v1.40.0), replacing legacy prompt/completion attributes with structured input/output messages. It updates OpenAI SDK support to v6, introduces new message-building helpers in instrumentation-utils, and updates all test assertions accordingly.

Changes

Cohort / File(s) Summary
Package Dependencies
packages/instrumentation-openai/package.json, packages/sample-app/package.json
Updated @opentelemetry/semantic-conventions to ^1.40.0, OpenAI SDK from 5.12.2 to ^6.32.0, added @traceloop/instrumentation-utils workspace dependency, and added @opentelemetry/instrumentation ^0.203.0 to sample app.
Core OpenAI Instrumentation
packages/instrumentation-openai/src/instrumentation.ts, packages/instrumentation-openai/src/image-wrappers.ts
Migrated from legacy prompt/completion attributes to standardized gen_ai.input.messages, gen_ai.output.messages, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens. Updated span naming to include operation/model. Replaced incubating finish-reason attributes with mapped finish-reason arrays. Added provider/operation name constants.
Message Helpers
packages/instrumentation-openai/src/message-helpers.ts
New module providing OpenAI-specific converters: buildOpenAIInputMessages, buildOpenAIOutputMessage, buildOpenAICompletionOutputMessage, and openaiFinishReasonMap to translate between OpenAI SDK shapes and OTel message structures, including content-block mapping and tool-call handling.
Shared Utilities
packages/instrumentation-utils/src/content-block-mappers.ts, packages/instrumentation-utils/src/index.ts
Added mapOpenAIContentBlock function to convert OpenAI request content parts (text, images, audio, files) into OTel-compliant part objects alongside existing Anthropic mapper. Exported new function in package index.
Test Updates
packages/instrumentation-openai/test/instrumentation.test.ts, packages/traceloop-sdk/test/...
Updated test assertions to validate structured ATTR_GEN_AI_INPUT_MESSAGES/ATTR_GEN_AI_OUTPUT_MESSAGES (parsed from JSON), span discovery by name prefix matching, finish reasons via mapped arrays, and split token usage into input/output. Updated OpenAI mock SDK usage for new v6 API surface.
Documentation
packages/traceloop-sdk/test/test-setup.ts
Added TODO comment noting planned migration from Polly.js to msw/undici MockAgent for OpenAI v6 compatibility.
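The attribute migration summarized above is easiest to see as data. Below is a minimal sketch of the structured shapes the new attributes carry; the attribute keys are the OTel GenAI semconv names, while the message contents and token counts are purely illustrative:

```typescript
// Illustrative only: the JSON-serialized message arrays that replace the
// legacy flat prompt/completion attributes after the migration.
const inputMessages = [
  { role: "user", parts: [{ type: "text", content: "Tell me a joke" }] },
];
const outputMessages = [
  {
    role: "assistant",
    parts: [{ type: "text", content: "Why did the span cross the trace?" }],
    finish_reason: "stop",
  },
];

// Structured attributes: messages are stored as JSON strings, token usage
// is split into separate input/output counters.
const attributes: Record<string, string | number> = {
  "gen_ai.input.messages": JSON.stringify(inputMessages),
  "gen_ai.output.messages": JSON.stringify(outputMessages),
  "gen_ai.usage.input_tokens": 15,
  "gen_ai.usage.output_tokens": 8,
};
```

Consumers parse the message attributes back from JSON, which is exactly what the updated test assertions do.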

Sequence Diagram

sequenceDiagram
    participant Client as Client Code
    participant Instr as OpenAI Instrumentation
    participant Helpers as Message Helpers
    participant OTel as OTel Attributes
    participant SDK as OpenAI SDK

    Client->>Instr: Chat/Image Request
    Instr->>Helpers: buildOpenAIInputMessages(messages)
    Helpers->>Helpers: mapOpenAIContentBlock(parts)
    Helpers->>OTel: gen_ai.input.messages
    
    Instr->>SDK: Execute Request
    SDK->>SDK: Process (model inference)
    SDK-->>Instr: Response (choices, usage)
    
    Instr->>Helpers: buildOpenAIOutputMessage(choice)
    Helpers->>Helpers: Map finish_reason via map
    Helpers->>Helpers: Build tool_calls/content parts
    Helpers->>OTel: gen_ai.output.messages
    
    Instr->>OTel: gen_ai.provider_name<br/>gen_ai.operation_name<br/>gen_ai.usage.input_tokens<br/>gen_ai.usage.output_tokens<br/>gen_ai.response.finish_reasons
    Instr-->>Client: Return Response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • galkleinman
  • galzilber
  • doronkopit5
  • OzBenSimhonTraceloop

Poem

🐰 OpenAI now speaks in OTel's tongue,
Messages and tokens perfectly strung,
No more legacy attributes to confuse,
Structured conventions—the right path to choose!
hops 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 46.15%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: migrating OpenAI instrumentation to OTel 1.40 semantic conventions, which is reflected throughout the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Copy link
Copy Markdown
Contributor

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
packages/instrumentation-openai/src/instrumentation.ts (1)

872-874: OpenRouter provider uses raw string instead of constant.

Unlike other providers that use GEN_AI_PROVIDER_NAME_VALUE_* constants, OpenRouter returns a hardcoded "openrouter" string. This is acceptable since there's no OTel standard constant for OpenRouter. Consider defining a local constant for consistency if more non-standard providers are added.
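A minimal sketch of the suggested change, assuming the constant name and detection helper shown here (both hypothetical), mirroring the generated GEN_AI_PROVIDER_NAME_VALUE_* style:

```typescript
// Hypothetical local constant for a provider with no official OTel value yet,
// defined alongside the other GEN_AI_PROVIDER_NAME_VALUE_* constants.
const GEN_AI_PROVIDER_NAME_VALUE_OPENROUTER = "openrouter";

// Sketch of the detection branch: return the constant instead of a literal.
function detectOpenRouter(baseURL: string): { provider: string } | null {
  if (baseURL.includes("openrouter")) {
    return { provider: GEN_AI_PROVIDER_NAME_VALUE_OPENROUTER };
  }
  return null;
}
```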

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/src/instrumentation.ts` around lines 872 -
874, The return for the OpenRouter detection currently uses a hardcoded string
in the conditional that checks baseURL.includes("openrouter") and returns {
provider: "openrouter" }; define a local constant (e.g.,
GEN_AI_PROVIDER_NAME_VALUE_OPENROUTER) near the other
GEN_AI_PROVIDER_NAME_VALUE_* constants and replace the literal with that
constant in the detection branch inside instrumentation.ts so the provider name
follows the same constant-based style as other providers (update the
conditional/return that uses baseURL.includes("openrouter")).
packages/instrumentation-openai/src/message-helpers.ts (1)

270-277: Audio response handling missing mime_type.

The blob part for audio responses includes modality: "audio" but no mime_type. OpenAI's audio response format isn't captured here. Consider adding mime_type if available from message.audio.format or defaulting to a common format.

♻️ Proposed fix to include mime_type
   // Audio response
   if (message.audio?.data) {
     parts.push({
       type: "blob",
       modality: "audio",
+      mime_type: `audio/${message.audio.format || "wav"}`,
       content: message.audio.data,
     });
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/src/message-helpers.ts` around lines 270 -
277, The audio blob part currently pushed by parts.push({ type: "blob",
modality: "audio", content: message.audio.data }) lacks a mime_type; update the
logic that handles message.audio?.data to include a mime_type property (prefer
message.audio.format or message.audio.mime_type if present, otherwise default to
a sensible value like "audio/mpeg" or "audio/wav") when constructing the blob
object so the audio modality and format are both captured.
packages/instrumentation-utils/src/content-block-mappers.ts (1)

185-200: Data URI parsing may miss edge cases.

The regex /^data:([^;]+);base64,(.+)$/ assumes:

  1. A semicolon always separates MIME type from encoding
  2. Encoding is always base64

Some valid data URIs use different formats (e.g., data:text/plain,Hello without encoding, or data:image/png;charset=utf-8;base64,...). However, for OpenAI image inputs, base64 is the standard, so this is acceptable.
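If the mapper is ever hardened, one possible approach is a looser pattern that tolerates extra parameters before the base64 marker and returns null (fall back to treating the value as a plain URI) for non-base64 data URIs. A sketch, with a hypothetical helper name:

```typescript
// Lenient data-URI parser: accepts optional parameters (e.g. ;charset=utf-8)
// before the ;base64 marker and only treats confirmed-base64 URIs as blobs.
function parseDataUri(
  url: string,
): { mimeType: string; content: string } | null {
  const match = url.match(/^data:([^,;]+)(?:;[^,]*)*;base64,(.+)$/i);
  if (!match) return null; // not base64-encoded: caller keeps it as a URI
  return { mimeType: match[1], content: match[2] };
}
```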

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-utils/src/content-block-mappers.ts` around lines 185
- 200, The data-URI handling in the "image_url" case currently uses
url.match(/^data:([^;]+);base64,(.+)$/) which misses variants like charset
parameters; update the logic in the image_url branch (the url variable and the
match check) to only treat as a blob when the data URI actually contains a
base64 segment and to extract the MIME type even if additional params exist
(e.g., handle patterns with extra params before ;base64), otherwise fall back to
returning the URI; implement this by loosening the match to allow optional
params before the final ";base64" (case-insensitive) or by parsing the data URI
components and checking for a base64 marker, then populate mime_type and content
only when base64 is confirmed.
packages/instrumentation-openai/src/image-wrappers.ts (1)

229-245: JSON parse/stringify pattern is fragile but functional.

The code parses the existing ATTR_GEN_AI_INPUT_MESSAGES JSON, appends a new message, and re-serializes. This works but has risks:

  1. If existingMessages is malformed, JSON.parse throws
  2. Type safety is lost during the parse

Consider building the messages array in memory before serializing once, rather than parsing/modifying/re-serializing.

♻️ Suggested refactor to avoid parse/stringify cycle
 export async function setImageEditRequestAttributes(
   span: Span,
   params: ImageEditParams,
   uploadCallback?: ImageUploadCallback,
 ): Promise<void> {
   const attributes: Attributes = {};
+  const inputMessages: object[] = [];

   if (params.model) {
     attributes[ATTR_GEN_AI_REQUEST_MODEL] = params.model;
   }

   // ... other attributes ...

   if (params.prompt) {
-    attributes[ATTR_GEN_AI_INPUT_MESSAGES] = JSON.stringify([
-      { role: "user", parts: [{ type: "text", content: params.prompt }] },
-    ]);
+    inputMessages.push({
+      role: "user",
+      parts: [{ type: "text", content: params.prompt }],
+    });
   }

   // Process input image if upload callback is available
   if (params.image && uploadCallback && ...) {
     // ...
     if (imageUrl) {
-      // Add the image as a second input message
-      const existingMessages = attributes[ATTR_GEN_AI_INPUT_MESSAGES];
-      if (existingMessages) {
-        const parsed = JSON.parse(existingMessages as string);
-        parsed.push({
-          role: "user",
-          parts: [{ type: "uri", modality: "image", uri: imageUrl }],
-        });
-        attributes[ATTR_GEN_AI_INPUT_MESSAGES] = JSON.stringify(parsed);
-      } else {
-        attributes[ATTR_GEN_AI_INPUT_MESSAGES] = JSON.stringify([
-          {
-            role: "user",
-            parts: [{ type: "uri", modality: "image", uri: imageUrl }],
-          },
-        ]);
-      }
+      inputMessages.push({
+        role: "user",
+        parts: [{ type: "uri", modality: "image", uri: imageUrl }],
+      });
     }
   }

+  if (inputMessages.length > 0) {
+    attributes[ATTR_GEN_AI_INPUT_MESSAGES] = JSON.stringify(inputMessages);
+  }

   // ... set attributes on span ...
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/src/image-wrappers.ts` around lines 229 -
245, The current code directly
JSON.parse(attributes[ATTR_GEN_AI_INPUT_MESSAGES]) which will throw on malformed
JSON and loses type safety; instead, read attributes[ATTR_GEN_AI_INPUT_MESSAGES]
into a local variable, attempt a guarded parse (try/catch) or validate its type,
fall back to an empty array on failure, then create a newMessages array in
memory, push the new message object ({ role: "user", parts: [{ type: "uri",
modality: "image", uri: imageUrl }] }) into that array, and finally set
attributes[ATTR_GEN_AI_INPUT_MESSAGES] = JSON.stringify(newMessages); use
ATTR_GEN_AI_INPUT_MESSAGES, attributes, and imageUrl to locate and replace the
parse/modify/re-serialize pattern.
packages/instrumentation-openai/test/instrumentation.test.ts (1)

182-186: Inconsistent type comparison for token values.

Token assertions mix string and number comparisons across tests:

  • Lines 183-184: "15" (string)
  • Line 231: "8" (string)
  • Line 414: 82 (number)
  • Line 492: 82 (number)

While assert.equal uses loose equality so these will pass, the inconsistency suggests the implementation may return different types in different contexts, or this is a test oversight. Consider normalizing to a consistent type.

Suggested fix for consistency
     assert.equal(
       completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS],
-      "15",
+      15,
     );

Apply similar changes to line 231 ("8" to 8).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
182 - 186, The token assertions are inconsistent: some tests expect strings
("15", "8") while others expect numbers (82); update the tests to normalize
types by comparing numeric values consistently. In the assertions that reference
completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS] and
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, convert the asserted expected values to numbers
(e.g., 15, 8) or coerce the actual attribute with + before asserting so all
tests use numeric comparisons consistently; ensure you update both occurrences
(the lines using "15" and "8") to match the numeric style used elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/traceloop-sdk/package.json`:
- Line 122: The lockfile is out of sync with package.json's "openai" dependency
("openai": "^6.32.0"); run pnpm install at the repository root to regenerate the
pnpm lockfile (pnpm-lock.yaml), verify the updated lockfile includes the new
openai version, and commit the updated lockfile so CI's pnpm install
--frozen-lockfile succeeds.

---

Nitpick comments:
In `@packages/instrumentation-openai/src/image-wrappers.ts`:
- Around line 229-245: The current code directly
JSON.parse(attributes[ATTR_GEN_AI_INPUT_MESSAGES]) which will throw on malformed
JSON and loses type safety; instead, read attributes[ATTR_GEN_AI_INPUT_MESSAGES]
into a local variable, attempt a guarded parse (try/catch) or validate its type,
fall back to an empty array on failure, then create a newMessages array in
memory, push the new message object ({ role: "user", parts: [{ type: "uri",
modality: "image", uri: imageUrl }] }) into that array, and finally set
attributes[ATTR_GEN_AI_INPUT_MESSAGES] = JSON.stringify(newMessages); use
ATTR_GEN_AI_INPUT_MESSAGES, attributes, and imageUrl to locate and replace the
parse/modify/re-serialize pattern.

In `@packages/instrumentation-openai/src/instrumentation.ts`:
- Around line 872-874: The return for the OpenRouter detection currently uses a
hardcoded string in the conditional that checks baseURL.includes("openrouter")
and returns { provider: "openrouter" }; define a local constant (e.g.,
GEN_AI_PROVIDER_NAME_VALUE_OPENROUTER) near the other
GEN_AI_PROVIDER_NAME_VALUE_* constants and replace the literal with that
constant in the detection branch inside instrumentation.ts so the provider name
follows the same constant-based style as other providers (update the
conditional/return that uses baseURL.includes("openrouter")).

In `@packages/instrumentation-openai/src/message-helpers.ts`:
- Around line 270-277: The audio blob part currently pushed by parts.push({
type: "blob", modality: "audio", content: message.audio.data }) lacks a
mime_type; update the logic that handles message.audio?.data to include a
mime_type property (prefer message.audio.format or message.audio.mime_type if
present, otherwise default to a sensible value like "audio/mpeg" or "audio/wav")
when constructing the blob object so the audio modality and format are both
captured.

In `@packages/instrumentation-openai/test/instrumentation.test.ts`:
- Around line 182-186: The token assertions are inconsistent: some tests expect
strings ("15", "8") while others expect numbers (82); update the tests to
normalize types by comparing numeric values consistently. In the assertions that
reference completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS] and
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, convert the asserted expected values to numbers
(e.g., 15, 8) or coerce the actual attribute with + before asserting so all
tests use numeric comparisons consistently; ensure you update both occurrences
(the lines using "15" and "8") to match the numeric style used elsewhere.

In `@packages/instrumentation-utils/src/content-block-mappers.ts`:
- Around line 185-200: The data-URI handling in the "image_url" case currently
uses url.match(/^data:([^;]+);base64,(.+)$/) which misses variants like charset
parameters; update the logic in the image_url branch (the url variable and the
match check) to only treat as a blob when the data URI actually contains a
base64 segment and to extract the MIME type even if additional params exist
(e.g., handle patterns with extra params before ;base64), otherwise fall back to
returning the URI; implement this by loosening the match to allow optional
params before the final ";base64" (case-insensitive) or by parsing the data URI
components and checking for a base64 marker, then populate mime_type and content
only when base64 is confirmed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f4c7037-eaba-4676-8e22-a0047995420a

📥 Commits

Reviewing files that changed from the base of the PR and between 643425a and 20e53cf.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (9)
  • packages/instrumentation-openai/package.json
  • packages/instrumentation-openai/src/image-wrappers.ts
  • packages/instrumentation-openai/src/instrumentation.ts
  • packages/instrumentation-openai/src/message-helpers.ts
  • packages/instrumentation-openai/test/instrumentation.test.ts
  • packages/instrumentation-utils/src/content-block-mappers.ts
  • packages/instrumentation-utils/src/index.ts
  • packages/sample-app/package.json
  • packages/traceloop-sdk/package.json

Comment thread on packages/traceloop-sdk/package.json (outdated):
attributes[`${ATTR_GEN_AI_PROMPT}.0.content`] = params.prompt;
attributes[`${ATTR_GEN_AI_PROMPT}.0.role`] = "user";
attributes[ATTR_GEN_AI_INPUT_MESSAGES] = JSON.stringify([
{ role: "user", parts: [{ type: "text", content: params.prompt }] },
Contributor

I wonder if it would be cleaner to have a util function that creates this JSON, since it will be used all over the place.
WDYT?

Contributor Author

Not worth extracting further — it's only 2 places, and they're building arrays they may extend with image parts. A function that returns one message object would save almost nothing and add indirection.

* @param messages - The messages array from the OpenAI chat completion request
* @returns Array of OTel-shaped chat messages
*/
export function buildOpenAIInputMessages(messages: any[]): OTelChatMessage[] {
Contributor

This is not in use


* @param finishReasonMap - Mapping of OpenAI finish reasons to OTel standard values
* @returns Array with a single OTelOutputMessage
*/
export function buildOpenAIOutputMessage(
Contributor

This is also not in use


* Maps a single OpenAI SDK content part to its OTel-compliant part object.
* Used for input content parts in user messages.
*/
export function mapOpenAIContentBlock(block: any): object {
Contributor

Not in use


@dvirski force-pushed the feat-migrate-openai-semcov-1.40 branch from 915c2a8 to 6fcadd6 on March 29, 2026, 15:20.

coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/instrumentation-openai/src/image-wrappers.ts (2)

355-403: ⚠️ Potential issue | 🟠 Major

ATTR_GEN_AI_OUTPUT_MESSAGES only captures the first generated image.

setImageGenerationResponseAttributes() is shared by generate/edit/variation, but this block always reads response.data[0] even though token accounting uses the full response.data.length. For n > 1, the span silently drops the rest of the model output, so downstream viewers cannot reconstruct the actual response payload.

Based on learnings, "Instrumentations must extract request/response data and token usage from wrapped calls."
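One way to address this, sketched below with a hypothetical helper (types simplified; the real upload callback and response shapes come from the OpenAI SDK), is to build one part per generated image instead of reading only data[0]:

```typescript
// Minimal shape of one entry in response.data (simplified for the sketch).
interface ImageDatum {
  url?: string;
  b64_json?: string;
}

// Build one output-message part per generated image. The optional upload
// callback may return a hosted URI for an image; otherwise fall back to the
// item's own url, and drop items with no resolvable URI.
function buildImageOutputParts(
  data: ImageDatum[],
  upload?: (img: ImageDatum) => string | undefined,
): object[] {
  return data
    .map((item) => upload?.(item) ?? item.url)
    .filter((uri): uri is string => Boolean(uri))
    .map((uri) => ({ type: "uri", modality: "image", uri }));
}
```

The assembled parts array would then be stringified into ATTR_GEN_AI_OUTPUT_MESSAGES, keeping the span consistent with the n-aware token accounting.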

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/src/image-wrappers.ts` around lines 355 -
403, setImageGenerationResponseAttributes currently only reads response.data[0]
and so ATTR_GEN_AI_OUTPUT_MESSAGES drops all but the first image; iterate
response.data instead of using firstImage: for each item in response.data call
uploadCallback (or fallback to item.url) and build the parts array with one
entry per image ({type:"uri", modality:"image", uri}). Replace the single-part
payload assignment with the assembled parts list and stringify that into
ATTR_GEN_AI_OUTPUT_MESSAGES; reference the setImageGenerationResponseAttributes
function, ATTR_GEN_AI_OUTPUT_MESSAGES, uploadCallback, and response.data when
making the change.

321-325: ⚠️ Potential issue | 🟠 Major

gpt-image-1 still uses the fallback token cost.

calculateImageGenerationTokens() documents gpt-image-1 pricing above, but the helper only branches on DALL-E models. Because this result is now written directly into the new usage attrs, any gpt-image-1 span reports 1056 output tokens regardless of size/quality, and total tokens inherit the same error.

Based on learnings, "Instrumentations must extract request/response data and token usage from wrapped calls."

Also applies to: 344-350

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/src/image-wrappers.ts` around lines 321 -
325, The image token calculation is using the DALL·E fallback for all image
models, causing gpt-image-1 spans to always report the same output tokens;
update the logic in calculateImageGenerationTokens (or immediately before it's
used where completionTokens is assigned) to detect the model name 'gpt-image-1'
and apply its correct pricing/token computation (based on response size/quality
as documented) instead of the DALL·E branch, and ensure the computed value is
written to ATTR_GEN_AI_USAGE_OUTPUT_TOKENS and that
ATTR_GEN_AI_USAGE_TOTAL_TOKENS is updated accordingly; also apply the same fix
to the other usage calculation site that mirrors this code path so both output
and total token attributes reflect gpt-image-1 correctly.
🧹 Nitpick comments (2)
packages/instrumentation-openai/test/instrumentation.test.ts (2)

385-388: Use the shared tool-definition attribute constant here.

The runtime side already relies on ATTR_GEN_AI_TOOL_DEFINITIONS; keeping the raw "gen_ai.tool.definitions" literal in these assertions makes the tests drift-prone for no gain.

♻️ Suggested change
 import {
   ATTR_GEN_AI_REQUEST_MODEL,
   ATTR_GEN_AI_PROVIDER_NAME,
   ATTR_GEN_AI_OPERATION_NAME,
   ATTR_GEN_AI_INPUT_MESSAGES,
   ATTR_GEN_AI_OUTPUT_MESSAGES,
   ATTR_GEN_AI_USAGE_INPUT_TOKENS,
   ATTR_GEN_AI_USAGE_OUTPUT_TOKENS,
   ATTR_GEN_AI_RESPONSE_FINISH_REASONS,
+  ATTR_GEN_AI_TOOL_DEFINITIONS,
 } from "@opentelemetry/semantic-conventions/incubating";
@@
-    const toolDefs = JSON.parse(
-      completionSpan.attributes["gen_ai.tool.definitions"] as string,
-    );
+    const toolDefs = JSON.parse(
+      completionSpan.attributes[ATTR_GEN_AI_TOOL_DEFINITIONS] as string,
+    );

Also applies to: 467-470

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
385 - 388, Replace the hard-coded attribute key string usage
completionSpan.attributes["gen_ai.tool.definitions"] with the shared constant
ATTR_GEN_AI_TOOL_DEFINITIONS (import it from its module if not already) in the
test, and do the same for the second occurrence around the other assertion
(lines noted in the review); update both assertions to read
completionSpan.attributes[ATTR_GEN_AI_TOOL_DEFINITIONS] so the tests use the
runtime-shared key constant.

649-656: The image edit/variation tests still don't guard the migrated operation attribute.

image_generation now validates ATTR_GEN_AI_OPERATION_NAME, but the edit test still checks the legacy gen_ai.request.type key and the variation test doesn't assert an operation at all. Once these are re-enabled, they still won't catch regressions in the semconv migration.

♻️ Suggested change
     assert.strictEqual(
-      editSpan.attributes["gen_ai.request.type"],
+      editSpan.attributes[ATTR_GEN_AI_OPERATION_NAME],
       "image_edit",
     );
@@
+    assert.strictEqual(
+      variationSpan.attributes[ATTR_GEN_AI_OPERATION_NAME],
+      "image_variation",
+    );

Also applies to: 690-702

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
649 - 656, Update the image edit and image variation tests to assert the
migrated semantic convention attribute ATTR_GEN_AI_OPERATION_NAME instead of (or
in addition to) the legacy "gen_ai.request.type" key: for the edit test
(referencing editSpan and ATTR_GEN_AI_PROVIDER_NAME) assert
ATTR_GEN_AI_OPERATION_NAME === "image.edit" and for the variation test assert
ATTR_GEN_AI_OPERATION_NAME === "image.variation" (also ensure the provider
assertion still checks ATTR_GEN_AI_PROVIDER_NAME === "openai"); replace the old
"gen_ai.request.type" checks at the edit and variation locations so the tests
will catch regressions in the semconv migration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/instrumentation-openai/src/image-wrappers.ts`:
- Around line 355-403: setImageGenerationResponseAttributes currently only reads
response.data[0] and so ATTR_GEN_AI_OUTPUT_MESSAGES drops all but the first
image; iterate response.data instead of using firstImage: for each item in
response.data call uploadCallback (or fallback to item.url) and build the parts
array with one entry per image ({type:"uri", modality:"image", uri}). Replace
the single-part payload assignment with the assembled parts list and stringify
that into ATTR_GEN_AI_OUTPUT_MESSAGES; reference the
setImageGenerationResponseAttributes function, ATTR_GEN_AI_OUTPUT_MESSAGES,
uploadCallback, and response.data when making the change.
- Around line 321-325: The image token calculation is using the DALL·E fallback
for all image models, causing gpt-image-1 spans to always report the same output
tokens; update the logic in calculateImageGenerationTokens (or immediately
before it's used where completionTokens is assigned) to detect the model name
'gpt-image-1' and apply its correct pricing/token computation (based on response
size/quality as documented) instead of the DALL·E branch, and ensure the
computed value is written to ATTR_GEN_AI_USAGE_OUTPUT_TOKENS and that
ATTR_GEN_AI_USAGE_TOTAL_TOKENS is updated accordingly; also apply the same fix
to the other usage calculation site that mirrors this code path so both output
and total token attributes reflect gpt-image-1 correctly.

---

Nitpick comments:
In `@packages/instrumentation-openai/test/instrumentation.test.ts`:
- Around line 385-388: Replace the hard-coded attribute key string usage
completionSpan.attributes["gen_ai.tool.definitions"] with the shared constant
ATTR_GEN_AI_TOOL_DEFINITIONS (import it from its module if not already) in the
test, and do the same for the second occurrence around the other assertion
(lines noted in the review); update both assertions to read
completionSpan.attributes[ATTR_GEN_AI_TOOL_DEFINITIONS] so the tests use the
runtime-shared key constant.
- Around line 649-656: Update the image edit and image variation tests to assert
the migrated semantic convention attribute ATTR_GEN_AI_OPERATION_NAME instead of
(or in addition to) the legacy "gen_ai.request.type" key: for the edit test
(referencing editSpan and ATTR_GEN_AI_PROVIDER_NAME) assert
ATTR_GEN_AI_OPERATION_NAME === "image.edit" and for the variation test assert
ATTR_GEN_AI_OPERATION_NAME === "image.variation" (also ensure the provider
assertion still checks ATTR_GEN_AI_PROVIDER_NAME === "openai"); replace the old
"gen_ai.request.type" checks at the edit and variation locations so the tests
will catch regressions in the semconv migration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 52d3fe45-dc87-4fa7-90bc-93ce84db9aa0

📥 Commits

Reviewing files that changed from the base of the PR and between 20e53cf and 6fcadd6.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (9)
  • packages/instrumentation-openai/package.json
  • packages/instrumentation-openai/src/image-wrappers.ts
  • packages/instrumentation-openai/src/instrumentation.ts
  • packages/instrumentation-openai/src/message-helpers.ts
  • packages/instrumentation-openai/test/instrumentation.test.ts
  • packages/instrumentation-utils/src/content-block-mappers.ts
  • packages/instrumentation-utils/src/index.ts
  • packages/sample-app/package.json
  • packages/traceloop-sdk/package.json
✅ Files skipped from review due to trivial changes (3)
  • packages/traceloop-sdk/package.json
  • packages/instrumentation-openai/package.json
  • packages/instrumentation-openai/src/message-helpers.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/instrumentation-utils/src/index.ts
  • packages/sample-app/package.json
  • packages/instrumentation-utils/src/content-block-mappers.ts
  • packages/instrumentation-openai/src/instrumentation.ts

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/instrumentation-openai/test/instrumentation.test.ts (1)

608-612: ⚠️ Potential issue | 🟠 Major

Add constants for image attributes to SemanticAttributes and use them throughout.

Four image attribute keys are hardcoded as strings in both the implementation (image-wrappers.ts) and test file:

  • gen_ai.request.image.size
  • gen_ai.request.image.count
  • gen_ai.request.image.quality
  • gen_ai.request.image.style

Per coding guidelines, these should be defined as constants in packages/ai-semantic-conventions/src/SemanticAttributes.ts and imported in packages/instrumentation-openai/src/image-wrappers.ts and the test file rather than hardcoded.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
608 - 612, Add four new constants in the SemanticAttributes export (e.g.,
IMAGE_SIZE, IMAGE_COUNT, IMAGE_QUALITY, IMAGE_STYLE) with values
"gen_ai.request.image.size", "gen_ai.request.image.count",
"gen_ai.request.image.quality", and "gen_ai.request.image.style"; then replace
the hardcoded string keys in the image wrappers module (image-wrappers.ts) and
the test file (instrumentation.test.ts) to import and use these new
SemanticAttributes constants instead of literal strings so all references share
the canonical identifiers.
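A minimal sketch of what the proposed additions could look like — the constant names and file layout here are illustrative suggestions, not the actual contents of SemanticAttributes.ts; only the string values come from the review above:

```typescript
// Hypothetical additions to packages/ai-semantic-conventions/src/SemanticAttributes.ts.
// Constant names are suggestions; the attribute-key strings are from the review.
export const ATTR_GEN_AI_REQUEST_IMAGE_SIZE = "gen_ai.request.image.size";
export const ATTR_GEN_AI_REQUEST_IMAGE_COUNT = "gen_ai.request.image.count";
export const ATTR_GEN_AI_REQUEST_IMAGE_QUALITY = "gen_ai.request.image.quality";
export const ATTR_GEN_AI_REQUEST_IMAGE_STYLE = "gen_ai.request.image.style";
```

image-wrappers.ts and instrumentation.test.ts would then import these constants instead of repeating the literal strings.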
🧹 Nitpick comments (2)
packages/instrumentation-openai/test/instrumentation.test.ts (2)

385-393: Hardcoded attribute string violates coding guidelines.

"gen_ai.tool.definitions" should use a constant from @traceloop/ai-semantic-conventions or @opentelemetry/semantic-conventions/incubating rather than a hardcoded string. The same issue appears at line 469.

♻️ Suggested fix

Import and use the constant:

 import {
   ATTR_GEN_AI_REQUEST_MODEL,
   ATTR_GEN_AI_PROVIDER_NAME,
   ATTR_GEN_AI_OPERATION_NAME,
   ATTR_GEN_AI_INPUT_MESSAGES,
   ATTR_GEN_AI_OUTPUT_MESSAGES,
   ATTR_GEN_AI_USAGE_INPUT_TOKENS,
   ATTR_GEN_AI_USAGE_OUTPUT_TOKENS,
   ATTR_GEN_AI_RESPONSE_FINISH_REASONS,
+  ATTR_GEN_AI_TOOL_DEFINITIONS,
 } from "@opentelemetry/semantic-conventions/incubating";

Then replace:

-    const toolDefs = JSON.parse(
-      completionSpan.attributes["gen_ai.tool.definitions"] as string,
-    );
+    const toolDefs = JSON.parse(
+      completionSpan.attributes[ATTR_GEN_AI_TOOL_DEFINITIONS] as string,
+    );

As per coding guidelines: "Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
385 - 393, The test currently uses a hardcoded attribute key
completionSpan.attributes["gen_ai.tool.definitions"]; update the test to import
the AI semantic attribute constant (e.g., GEN_AI_TOOL_DEFINITIONS) from
`@traceloop/ai-semantic-conventions` (or the appropriate `@opentelemetry` incubating
constant) and replace the hardcoded string usages with that constant in the
assertions for completionSpan.attributes; also update the other occurrence noted
in the file (the similar use around line ~469) to use the same imported constant
so both tests follow the coding guideline.

638-643: Type assertion bypasses type safety.

Using (openai as any) hides potential API mismatches. Since these tests are skipped, consider adding a comment explaining why the type assertion is needed (e.g., API method deprecated, type definitions outdated) or removing the skipped tests if they're no longer relevant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
638 - 643, The test uses a type-escaping cast "(openai as any)" when calling
images.edit with mockImageFile which hides API/type mismatches; either remove
the skipped test if it's obsolete or make the intent explicit by adding a
one-line comment above the call explaining why the assertion is necessary (e.g.,
deprecated API, outdated types) and, if you want safer typing, replace the broad
any with a targeted cast to the expected client interface (reference the openai
variable and the images.edit invocation and mockImageFile) so the reason and
scope of the bypass are clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/instrumentation-openai/test/instrumentation.test.ts`:
- Around line 188-192: Tests assert token attributes as strings but
instrumentation sets them as numbers; update the assertions to expect numeric
values. In the test file change assertions referencing
completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS] and
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS (and the similar string-quoted checks) to
compare numeric values (e.g., remove quotes or coerce to Number) so they match
how enrichTokens/tokenCountFromString and
result.usage?.prompt_tokens/completion_tokens set attributes as numbers.

---

Outside diff comments:
In `@packages/instrumentation-openai/test/instrumentation.test.ts`:
- Around line 608-612: Add four new constants in the SemanticAttributes export
(e.g., IMAGE_SIZE, IMAGE_COUNT, IMAGE_QUALITY, IMAGE_STYLE) with values
"gen_ai.request.image.size", "gen_ai.request.image.count",
"gen_ai.request.image.quality", and "gen_ai.request.image.style"; then replace
the hardcoded string keys in the image wrappers module (image-wrappers.ts) and
the test file (instrumentation.test.ts) to import and use these new
SemanticAttributes constants instead of literal strings so all references share
the canonical identifiers.

---

Nitpick comments:
In `@packages/instrumentation-openai/test/instrumentation.test.ts`:
- Around line 385-393: The test currently uses a hardcoded attribute key
completionSpan.attributes["gen_ai.tool.definitions"]; update the test to import
the AI semantic attribute constant (e.g., GEN_AI_TOOL_DEFINITIONS) from
`@traceloop/ai-semantic-conventions` (or the appropriate `@opentelemetry` incubating
constant) and replace the hardcoded string usages with that constant in the
assertions for completionSpan.attributes; also update the other occurrence noted
in the file (the similar use around line ~469) to use the same imported constant
so both tests follow the coding guideline.
- Around line 638-643: The test uses a type-escaping cast "(openai as any)" when
calling images.edit with mockImageFile which hides API/type mismatches; either
remove the skipped test if it's obsolete or make the intent explicit by adding a
one-line comment above the call explaining why the assertion is necessary (e.g.,
deprecated API, outdated types) and, if you want safer typing, replace the broad
any with a targeted cast to the expected client interface (reference the openai
variable and the images.edit invocation and mockImageFile) so the reason and
scope of the bypass are clear.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0796ca96-1e4c-4b39-bcf7-c1f8f93e2bc0

📥 Commits

Reviewing files that changed from the base of the PR and between 6fcadd6 and 34529f2.

📒 Files selected for processing (2)
  • packages/instrumentation-openai/src/image-wrappers.ts
  • packages/instrumentation-openai/test/instrumentation.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/instrumentation-openai/src/image-wrappers.ts

Comment on lines 188 to +192
-    assert.equal(
-      completionSpan.attributes[`${ATTR_GEN_AI_USAGE_PROMPT_TOKENS}`],
-      "8",
-    );
-    assert.ok(
-      +completionSpan.attributes[`${ATTR_GEN_AI_USAGE_COMPLETION_TOKENS}`]! > 0,
-    );
+    assert.equal(
+      completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS],
+      "15",
+    );
+    assert.ok(+completionSpan.attributes[ATTR_GEN_AI_USAGE_OUTPUT_TOKENS]! > 0);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how token attributes are being set in the instrumentation
rg -n "ATTR_GEN_AI_USAGE_INPUT_TOKENS|ATTR_GEN_AI_USAGE_OUTPUT_TOKENS" --type ts -A2 -B2

Repository: traceloop/openllmetry-js

Length of output: 25662


🏁 Script executed:

cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '170,200p'

Repository: traceloop/openllmetry-js

Length of output: 1256


🏁 Script executed:

cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '220,245p'

Repository: traceloop/openllmetry-js

Length of output: 1140


🏁 Script executed:

cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '400,425p'

Repository: traceloop/openllmetry-js

Length of output: 1176


🏁 Script executed:

cat -n packages/instrumentation-openai/src/instrumentation.ts | sed -n '690,720p'

Repository: traceloop/openllmetry-js

Length of output: 1325


🏁 Script executed:

# Look for streaming-specific token handling
rg -n "streaming|stream" packages/instrumentation-openai/src/instrumentation.ts -A 3 -B 3 | head -80

Repository: traceloop/openllmetry-js

Length of output: 922


🏁 Script executed:

cat -n packages/instrumentation-openai/src/instrumentation.ts | sed -n '422,550p'

Repository: traceloop/openllmetry-js

Length of output: 5078


🏁 Script executed:

cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '140,200p'

Repository: traceloop/openllmetry-js

Length of output: 2245


🏁 Script executed:

cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '195,245p'

Repository: traceloop/openllmetry-js

Length of output: 2115


🏁 Script executed:

cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '1,120p'

Repository: traceloop/openllmetry-js

Length of output: 4622


🏁 Script executed:

rg -n "toString|String|\.split|String\(" packages/instrumentation-openai/src/ -A 2 -B 2 | grep -A 5 -B 5 "token"

Repository: traceloop/openllmetry-js

Length of output: 2379


🏁 Script executed:

# Check the recorded mock responses to see what type tokens are
find packages/instrumentation-openai/test/recordings -name "*.json" -exec grep -l "prompt_tokens\|completion_tokens" {} \; 2>/dev/null | head -3

Repository: traceloop/openllmetry-js

Length of output: 50


🏁 Script executed:

cat -n packages/instrumentation-openai/src/instrumentation.ts | sed -n '528,560p'

Repository: traceloop/openllmetry-js

Length of output: 1198


🏁 Script executed:

# Look at lines 416 and 492 test context
cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '360,420p'

Repository: traceloop/openllmetry-js

Length of output: 2453


🏁 Script executed:

# Check if there's any token coercion or conversion happening in the span attribute setter
rg -n "setAttribute" packages/instrumentation-openai/src/instrumentation.ts | head -20

Repository: traceloop/openllmetry-js

Length of output: 506


🏁 Script executed:

# Verify the enrichTokens test configuration
cat -n packages/instrumentation-openai/test/instrumentation.test.ts | sed -n '99,110p'

Repository: traceloop/openllmetry-js

Length of output: 631


Token value assertions use inconsistent types; tests expect strings but instrumentation sets numbers.

Lines 189-190 and 236-237 assert token values as strings ("15" and "8"), while lines 416 and 492 assert them as numbers (82). The instrumentation code at lines 701-703 consistently sets these attributes directly from result.usage?.prompt_tokens and result.usage?.completion_tokens, which are numbers. When enrichTokens is enabled (as configured in all tests), tokens are computed numerically via tokenCountFromString() and stored as numbers (lines 543-547). The string assertions should be changed to expect numbers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
188 - 192, Tests assert token attributes as strings but instrumentation sets
them as numbers; update the assertions to expect numeric values. In the test
file change assertions referencing
completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS] and
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS (and the similar string-quoted checks) to
compare numeric values (e.g., remove quotes or coerce to Number) so they match
how enrichTokens/tokenCountFromString and
result.usage?.prompt_tokens/completion_tokens set attributes as numbers.

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/instrumentation-openai/src/image-wrappers.ts (2)

355-404: ⚠️ Potential issue | 🟠 Major

Export all generated images, not just response.data[0].

When n > 1, this serializes a single assistant image even though the API returned multiple results. The span then loses most of the response payload while gen_ai.request.image.count and token usage still reflect all images.

Based on learnings, "Instrumentations must extract request/response data and token usage from wrapped calls".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/src/image-wrappers.ts` around lines 355 -
404, The code only serializes response.data[0]; change it to iterate over all
entries in response.data, processing each item's b64_json or url similarly to
the current logic (use span.spanContext().traceId/spanId and uploadCallback when
provided, fetch+base64 when only url is present, fallback to the original url on
error), collect a URI for each image, and then set ATTR_GEN_AI_OUTPUT_MESSAGES
to a JSON string containing a single assistant message whose parts array
contains one { type: "uri", modality: "image", uri: ... } entry per image;
ensure per-image errors are caught and do not abort processing of remaining
images.
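A minimal sketch of the multi-image serialization described above, under stated assumptions: `ImageDatum` is a stand-in for the real OpenAI response item type, and the inline data-URI fallback replaces the wrapper's actual uploadCallback / fetch+base64 handling:

```typescript
// Illustrative stand-in for the shape of one entry in response.data.
interface ImageDatum {
  b64_json?: string;
  url?: string;
}

// Builds the gen_ai.output_messages payload: one assistant message whose
// parts array holds one image-URI part per returned image, not just data[0].
function buildOutputMessage(data: ImageDatum[]): string {
  const parts = data.map((item) => ({
    type: "uri",
    modality: "image",
    // The real wrapper would route this through uploadCallback or fetch+base64;
    // here we fall back to whatever identifier the item already carries.
    uri: item.url ?? `data:image/png;base64,${item.b64_json ?? ""}`,
  }));
  return JSON.stringify([{ role: "assistant", parts }]);
}
```

For n > 1 requests this keeps the span's output consistent with gen_ai.request.image.count and the reported token usage.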

322-326: ⚠️ Potential issue | 🟠 Major

gpt-image-1 usage tokens still come out of the generic fallback.

calculateImageGenerationTokens() documents gpt-image-1 pricing, but this path still publishes the fallback 1056 value for any non-DALL·E model. That makes both ATTR_GEN_AI_USAGE_OUTPUT_TOKENS and SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS inaccurate for gpt-image-1 requests.

Possible fix
-  } else {
-    // Default fallback for unknown models
-    tokensPerImage = 1056;
+  } else if (model === "gpt-image-1") {
+    const gptImageCosts: Record<string, Record<string, number>> = {
+      low: {
+        "1024x1024": 272,
+        "1024x1536": 408,
+        "1536x1024": 400,
+      },
+      medium: {
+        "1024x1024": 1056,
+        "1024x1536": 1584,
+        "1536x1024": 1568,
+      },
+      high: {
+        "1024x1024": 4160,
+        "1024x1536": 6240,
+        "1536x1024": 6208,
+      },
+    };
+    tokensPerImage =
+      gptImageCosts[quality]?.[size] ?? gptImageCosts.medium["1024x1024"];
+  } else {
+    // Default fallback for unknown models
+    tokensPerImage = 1056;

Based on learnings, "Instrumentations must extract request/response data and token usage from wrapped calls".

Also applies to: 341-351

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/src/image-wrappers.ts` around lines 322 -
326, The image token count path currently falls back to the generic 1056 for
non-DALL·E models; update the two spots that set ATTR_GEN_AI_USAGE_OUTPUT_TOKENS
(the block using calculateImageGenerationTokens and the similar 341-351 block)
to detect and pass the actual image model identifier (e.g., params.model or
params.modelName) into calculateImageGenerationTokens so it can apply the
gpt-image-1 pricing branch, and ensure you compute the response byte/length from
response.data (or sum of response.data entries) for gpt-image-1 instead of using
the generic fallback so SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS reflects the
real token usage.
♻️ Duplicate comments (1)
packages/instrumentation-openai/test/instrumentation.test.ts (1)

188-192: ⚠️ Potential issue | 🟡 Minor

Tighten these token checks to numeric strictEqual.

These assertions still use string literals with assert.equal, so a numeric attribute passes via coercion. The rest of the file already treats token attrs as numbers, so this won't catch a type regression.

Suggested change
-    assert.equal(
-      completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS],
-      "15",
-    );
+    assert.strictEqual(
+      completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS],
+      15,
+    );
@@
-    assert.equal(
-      completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS],
-      "8",
-    );
+    assert.strictEqual(
+      completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS],
+      8,
+    );

Also applies to: 235-239

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
188 - 192, The assertions use string coercion and non-strict equality; convert
token attributes to numbers and use strict assertions: replace checks on
completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS] with numeric
comparisons via Number(...) (or unary +) and assert.strictEqual to 15, and
replace the output token presence check on
completionSpan.attributes[ATTR_GEN_AI_USAGE_OUTPUT_TOKENS] with a strict numeric
assertion (e.g. assert.strictEqual(Number(... ) > 0, true) or
assert.ok(Number(... ) > 0) if you prefer truthiness). Do the same fixes for the
duplicated assertions later in the file (the other occurrences around the second
block at lines ~235-239) referencing the same ATTR_GEN_AI_USAGE_INPUT_TOKENS and
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS symbols.
🧹 Nitpick comments (1)
packages/instrumentation-openai/test/instrumentation.test.ts (1)

629-667: The migrated image edit/variation paths still have no active regression coverage.

Both cases stay it.skip(...), and the skipped edit case still checks the legacy gen_ai.request.type field instead of the new operation attr. Right now the image-generation path is covered, but the two other image endpoints changed in this PR can still regress silently.

Also applies to: 669-706

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/instrumentation-openai/test/instrumentation.test.ts` around lines
629 - 667, Un-skip the image edit and image variation tests (remove it.skip) so
they run, and update their assertion that checks the legacy
"gen_ai.request.type" to assert the new operation attribute instead (use the new
operation attribute symbol/name used in your codebase, e.g.
ATTR_GEN_AI_OPERATION or the string key "gen_ai.operation") while keeping the
existing checks for ATTR_GEN_AI_PROVIDER_NAME, ATTR_GEN_AI_USAGE_OUTPUT_TOKENS,
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS and ATTR_GEN_AI_OUTPUT_MESSAGES; ensure
you locate the assertions in instrumentation.test.ts that reference it.skip,
memoryExporter.getFinishedSpans(), the editSpan variable, and the legacy
"gen_ai.request.type" to perform the replacement.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/instrumentation-openai/src/image-wrappers.ts`:
- Around line 355-404: The code only serializes response.data[0]; change it to
iterate over all entries in response.data, processing each item's b64_json or
url similarly to the current logic (use span.spanContext().traceId/spanId and
uploadCallback when provided, fetch+base64 when only url is present, fallback to
the original url on error), collect a URI for each image, and then set
ATTR_GEN_AI_OUTPUT_MESSAGES to a JSON string containing a single assistant
message whose parts array contains one { type: "uri", modality: "image", uri:
... } entry per image; ensure per-image errors are caught and do not abort
processing of remaining images.
- Around line 322-326: The image token count path currently falls back to the
generic 1056 for non-DALL·E models; update the two spots that set
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS (the block using calculateImageGenerationTokens
and the similar 341-351 block) to detect and pass the actual image model
identifier (e.g., params.model or params.modelName) into
calculateImageGenerationTokens so it can apply the gpt-image-1 pricing branch,
and ensure you compute the response byte/length from response.data (or sum of
response.data entries) for gpt-image-1 instead of using the generic fallback so
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS reflects the real token usage.

---

Duplicate comments:
In `@packages/instrumentation-openai/test/instrumentation.test.ts`:
- Around line 188-192: The assertions use string coercion and non-strict
equality; convert token attributes to numbers and use strict assertions: replace
checks on completionSpan.attributes[ATTR_GEN_AI_USAGE_INPUT_TOKENS] with numeric
comparisons via Number(...) (or unary +) and assert.strictEqual to 15, and
replace the output token presence check on
completionSpan.attributes[ATTR_GEN_AI_USAGE_OUTPUT_TOKENS] with a strict numeric
assertion (e.g. assert.strictEqual(Number(... ) > 0, true) or
assert.ok(Number(... ) > 0) if you prefer truthiness). Do the same fixes for the
duplicated assertions later in the file (the other occurrences around the second
block at lines ~235-239) referencing the same ATTR_GEN_AI_USAGE_INPUT_TOKENS and
ATTR_GEN_AI_USAGE_OUTPUT_TOKENS symbols.

---

Nitpick comments:
In `@packages/instrumentation-openai/test/instrumentation.test.ts`:
- Around line 629-667: Un-skip the image edit and image variation tests (remove
it.skip) so they run, and update their assertion that checks the legacy
"gen_ai.request.type" to assert the new operation attribute instead (use the new
operation attribute symbol/name used in your codebase, e.g.
ATTR_GEN_AI_OPERATION or the string key "gen_ai.operation") while keeping the
existing checks for ATTR_GEN_AI_PROVIDER_NAME, ATTR_GEN_AI_USAGE_OUTPUT_TOKENS,
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS and ATTR_GEN_AI_OUTPUT_MESSAGES; ensure
you locate the assertions in instrumentation.test.ts that reference it.skip,
memoryExporter.getFinishedSpans(), the editSpan variable, and the legacy
"gen_ai.request.type" to perform the replacement.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 91ea0ab4-cc48-4fe7-a58e-df17baf2ee3a

📥 Commits

Reviewing files that changed from the base of the PR and between 34529f2 and 511c570.

📒 Files selected for processing (2)
  • packages/instrumentation-openai/src/image-wrappers.ts
  • packages/instrumentation-openai/test/instrumentation.test.ts

…tercept the new version http requests so tests fail
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
packages/traceloop-sdk/test/agent_decorator.test.ts (2)

281-289: Consider using the imported constant for consistency.

The comment helpfully explains that manual instrumentation uses the legacy indexed format. However, the hardcoded strings gen_ai.prompt.0.role and gen_ai.prompt.0.content could use the imported ATTR_GEN_AI_PROMPT constant for consistency:

-    assert.strictEqual(
-      completionSpan.attributes[`gen_ai.prompt.0.role`],
+    assert.strictEqual(
+      completionSpan.attributes[`${ATTR_GEN_AI_PROMPT}.0.role`],

This would require importing ATTR_GEN_AI_PROMPT (already imported in the other test file). As per coding guidelines, AI/LLM semantic attribute constants should be imported from the conventions package rather than hardcoding strings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/test/agent_decorator.test.ts` around lines 281 - 289,
Replace the hardcoded legacy attribute keys in the assertions with the imported
ATTR_GEN_AI_PROMPT constant: import ATTR_GEN_AI_PROMPT at the top of the test
file and use string concatenation (e.g., ATTR_GEN_AI_PROMPT + ".0.role" and
ATTR_GEN_AI_PROMPT + ".0.content") when accessing completionSpan.attributes in
the assertions so the test uses the canonical constant instead of raw strings.

110-110: Prefix matching is appropriate but could be more specific.

The change from exact "openai.chat" to startsWith("chat ") aligns with the new span naming convention that includes the model (e.g., "chat gpt-4o-mini"). This is correct for the migration.

Consider whether a more specific pattern like matching against a regex ^chat .+ or checking for additional attributes would reduce false positive matches in more complex test scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/test/agent_decorator.test.ts` at line 110, The test is
using a broad prefix match for span names (spans.find(... startsWith("chat ")))
which may yield false positives; update the matcher to be more specific by using
a regex match like /^chat .+/ on span.name (e.g., replace the startsWith check
in the spans.find callback) and/or assert an additional distinguishing attribute
on the found span (e.g., check span.attributes.model or
span.attributes["telemetry.span_type"]) to ensure the correct "chat <model>"
span is selected in agent_decorator.test.ts.
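A sketch of the stricter matcher suggested above — `SpanLike` is an illustrative stand-in for the ReadableSpan objects returned by memoryExporter.getFinishedSpans():

```typescript
// Minimal span shape for the example; the real test uses ReadableSpan.
interface SpanLike {
  name: string;
  attributes: Record<string, unknown>;
}

// Matches "chat <model>" span names, e.g. "chat gpt-4o-mini", and rejects
// a bare "chat " prefix with no model, unlike startsWith("chat ").
function findChatSpan(spans: SpanLike[]): SpanLike | undefined {
  return spans.find((span) => /^chat .+/.test(span.name));
}
```

Asserting an additional distinguishing attribute on the found span would tighten the match further in more complex scenarios.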
packages/traceloop-sdk/test/decorators.test.ts (1)

571-580: Functional but fragile string search in JSON.

The approach of using .includes() on the raw JSON string works for these tests with unique content, but is somewhat fragile. A more robust approach would parse the JSON first:

const openAI1Span = spans.find((span) => {
  const input = span.attributes[ATTR_GEN_AI_INPUT_MESSAGES] as string;
  if (!input) return false;
  return JSON.parse(input).some((msg: any) => 
    msg.parts?.some((p: any) => p.content?.includes("OpenTelemetry"))
  );
});

However, for test code with known unique content strings, the current approach is acceptable and more concise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/test/decorators.test.ts` around lines 571 - 580, The
test currently looks for substrings inside the stored JSON by using spans.find
with .includes on the ATTR_GEN_AI_INPUT_MESSAGES string which is fragile; change
the predicate to parse the JSON string from
span.attributes[ATTR_GEN_AI_INPUT_MESSAGES] (guarding for falsy), iterate the
array of messages and their parts, and check message.parts[].content for the
expected keyword (e.g., "OpenTelemetry" or "Typescript") so the test asserts
against parsed structure rather than raw string matching.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/traceloop-sdk/test/agent_decorator.test.ts`:
- Around line 281-289: Replace the hardcoded legacy attribute keys in the
assertions with the imported ATTR_GEN_AI_PROMPT constant: import
ATTR_GEN_AI_PROMPT at the top of the test file and use string concatenation
(e.g., ATTR_GEN_AI_PROMPT + ".0.role" and ATTR_GEN_AI_PROMPT + ".0.content")
when accessing completionSpan.attributes in the assertions so the test uses the
canonical constant instead of raw strings.
- Line 110: The test is using a broad prefix match for span names
(spans.find(... startsWith("chat "))) which may yield false positives; update
the matcher to be more specific by using a regex match like /^chat .+/ on
span.name (e.g., replace the startsWith check in the spans.find callback) and/or
assert an additional distinguishing attribute on the found span (e.g., check
span.attributes.model or span.attributes["telemetry.span_type"]) to ensure the
correct "chat <model>" span is selected in agent_decorator.test.ts.

In `@packages/traceloop-sdk/test/decorators.test.ts`:
- Around line 571-580: The test currently looks for substrings inside the stored
JSON by using spans.find with .includes on the ATTR_GEN_AI_INPUT_MESSAGES string
which is fragile; change the predicate to parse the JSON string from
span.attributes[ATTR_GEN_AI_INPUT_MESSAGES] (guarding for falsy), iterate the
array of messages and their parts, and check message.parts[].content for the
expected keyword (e.g., "OpenTelemetry" or "Typescript") so the test asserts
against parsed structure rather than raw string matching.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2b959867-d793-4bd0-9d22-c20b2bb111d9

📥 Commits

Reviewing files that changed from the base of the PR and between 511c570 and 0d61000.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (3)
  • packages/traceloop-sdk/test/agent_decorator.test.ts
  • packages/traceloop-sdk/test/decorators.test.ts
  • packages/traceloop-sdk/test/test-setup.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/traceloop-sdk/test/test-setup.ts


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants