fix(ai-sdk): report cached tokens usage #839
Conversation
Walkthrough

This PR adds support for caching and reasoning token attributes from AI providers. It introduces three new semantic constants for cache creation/read input tokens and reasoning tokens, updates the Anthropic SDK dependency, adds provider metadata transformation logic to map vendor-specific fields into standardized GenAI attributes, and updates tests to validate the new cache token capture for Anthropic and OpenAI.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant AIProvider as AI Provider<br/>(Anthropic/OpenAI)
    participant SDK as AI SDK
    participant Transformer as Provider Metadata<br/>Transformer
    participant Spans as Span Attributes
    Client->>SDK: Generate text/message with system context
    SDK->>AIProvider: HTTP request (POST)
    AIProvider-->>SDK: HTTP response + metadata<br/>(cache tokens, reasoning)
    SDK->>Transformer: Call transformLLMSpans with response
    rect rgb(200, 220, 255)
        Note over Transformer: Extract Provider Metadata Phase
        Transformer->>Transformer: Parse AI_RESPONSE_PROVIDER_METADATA
        Transformer->>Transformer: Extract cache_creation/read tokens<br/>& reasoning tokens
    end
    rect rgb(220, 200, 255)
        Note over Transformer: Map to Standard Attributes Phase
        Transformer->>Spans: Set GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
        Transformer->>Spans: Set GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
        Transformer->>Spans: Set GEN_AI_USAGE_REASONING_TOKENS
        Transformer->>Transformer: Remove AI_RESPONSE_PROVIDER_METADATA
    end
    Transformer->>Spans: Continue standard transformations<br/>(vendor, telemetry)
    Spans-->>Client: Enriched spans with cache/reasoning tokens
```
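The two phases in the diagram can be sketched as a single transformation function. The attribute keys below match the PR's new constants; the function name, the source-attribute key, and the exact metadata shape (anthropic/openai sub-objects with camelCase token fields) are illustrative assumptions, not the SDK's actual code:

```typescript
// Sketch of the extract-then-map flow in the diagram above.
// The "ai.response.providerMetadata" key and the metadata shape are
// assumptions based on the AI SDK; the gen_ai.usage.* keys come from the PR.
const AI_RESPONSE_PROVIDER_METADATA = "ai.response.providerMetadata";
const CACHE_CREATION = "gen_ai.usage.cache_creation_input_tokens";
const CACHE_READ = "gen_ai.usage.cache_read_input_tokens";
const REASONING = "gen_ai.usage.reasoning_tokens";

type Attrs = Record<string, unknown>;

function transformProviderMetadataSketch(attributes: Attrs): void {
  const raw = attributes[AI_RESPONSE_PROVIDER_METADATA];
  if (raw === undefined) return;
  try {
    // Metadata may arrive as a JSON string or as a plain object.
    const metadata = typeof raw === "string" ? JSON.parse(raw) : raw;
    const anthropic = metadata?.anthropic ?? {};
    const openai = metadata?.openai ?? {};
    if (typeof anthropic.cacheCreationInputTokens === "number") {
      attributes[CACHE_CREATION] = anthropic.cacheCreationInputTokens;
    }
    if (typeof anthropic.cacheReadInputTokens === "number") {
      attributes[CACHE_READ] = anthropic.cacheReadInputTokens;
    }
    if (typeof openai.cachedPromptTokens === "number") {
      attributes[CACHE_READ] = openai.cachedPromptTokens;
    }
    if (typeof openai.reasoningTokens === "number") {
      attributes[REASONING] = openai.reasoningTokens;
    }
    // Drop the raw metadata once its fields are mapped, as the diagram shows.
    delete attributes[AI_RESPONSE_PROVIDER_METADATA];
  } catch {
    // Mirrors the file's existing pattern of silently ignoring parse errors.
  }
}
```

The real implementation additionally continues with standard vendor/telemetry transformations; this sketch covers only the cache/reasoning mapping step.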
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Important
Looks good to me! 👍
Reviewed everything up to 54fc119 in 2 minutes and 26 seconds.
- Reviewed 1127 lines of code in 11 files
- Skipped 1 file when reviewing
- Skipped posting 5 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/ai-semantic-conventions/src/SemanticAttributes.ts:38
- Draft comment: New semantic attribute keys for cached tokens (cache creation, cache read, reasoning) have been added. Ensure these keys are consistently named and documented for downstream usage.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. The comment is asking the PR author to ensure that the new semantic attribute keys are consistently named and documented. This falls under the category of asking the author to ensure something is done, which is against the rules.
2. packages/instrumentation-anthropic/src/instrumentation.ts:317
- Draft comment: Usage object initialization now uses 'cache_creation' (set to null) alongside '_input_tokens' fields. Consider clarifying the purpose of the 'cache_creation' property and whether it should contain a nested object versus individual token metrics.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable; usefulness confidence = 15% vs. threshold = 50%. The comment is asking the author to "consider clarifying" the purpose of the cache_creation property. This is not pointing out a clear bug or issue - it's asking for clarification or documentation. The rules explicitly state "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended" and "Do NOT make comments that are obvious or unimportant." This comment doesn't identify a concrete problem; it's speculative about whether the structure is correct. The change from 0 to null for the cache fields and the addition of cache_creation appears intentional to match the API response structure. Without evidence that this structure is wrong, this is just asking for clarification. Perhaps the comment is valid if cache_creation should actually be an object containing nested properties rather than a separate field. Maybe there's a type mismatch with the Anthropic API that would cause issues. The author might have made a mistake in the structure. Even if there's uncertainty about the correct structure, the comment doesn't provide evidence of an actual problem. It's asking the author to "consider clarifying" rather than pointing out a definite issue. Without seeing the Anthropic API types or evidence that this structure is incorrect, this is speculative. The rules say to only comment if there's STRONG EVIDENCE of an issue. This comment should be deleted. It's asking for clarification about the purpose and structure of a field without providing evidence of an actual problem. It violates the rule against asking authors to explain or clarify their intentions.
3. packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts:369
- Draft comment: The transformProviderMetadata function correctly parses provider metadata for both Anthropic and OpenAI. Consider logging errors in the catch block to aid troubleshooting in case JSON parsing fails.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable; usefulness confidence = 20% vs. threshold = 50%. The comment is suggesting a code quality improvement (adding logging). However, I need to consider: 1) This is a new function being added, so the comment is about changed code. 2) The pattern of silent error handling is consistent across the entire file - there are at least 4 other similar catch blocks that also silently ignore errors. 3) The comment doesn't provide strong justification for why this particular function needs logging when others don't. 4) This could be considered a "nice to have" suggestion rather than a clear code issue. 5) The rules state that refactor suggestions are good "if they are actionable and clear" - this one is actionable, but it's not clear why this specific location needs it when the established pattern is to ignore errors silently. The comment is technically reasonable - logging errors can help with debugging. However, it singles out one function when the entire file follows a pattern of silently ignoring parsing errors. If this is a valid concern, it should apply to all similar functions, not just this new one. The comment might be seen as inconsistent with the existing codebase patterns. While the comment points out a valid improvement, it's not addressing a clear bug or issue with the new code. The new function follows the exact same error handling pattern as existing functions in the file. Without evidence that this specific function needs different error handling, or without a broader refactoring suggestion for all similar functions, this comment appears to be an arbitrary suggestion that doesn't meet the bar of "strong evidence" that something is wrong. This comment should be deleted. It suggests a code quality improvement that is inconsistent with the established patterns in the codebase. Multiple other transformation functions use the same silent error handling approach, and there's no strong evidence that this particular function requires different treatment. The comment is more of a "nice to have" suggestion rather than identifying a clear issue.
4. packages/traceloop-sdk/test/ai-sdk-integration.test.ts:275
- Draft comment: New tests for capturing and transforming cached token usage from providerMetadata (both Anthropic and OpenAI) provide good coverage. Verify that the expected numeric token values are correctly set on the span attributes.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. This comment is asking the PR author to verify something, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue in the code.
5. packages/traceloop-sdk/test/ai-sdk-integration.test.ts:350
- Draft comment: The tests for Anthropic and OpenAI cache tokens accurately assert that the new attributes are present and valid. Ensure that any future provider metadata changes continue to update these span attributes as expected.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. This comment is asking the PR author to ensure future changes are made correctly, which violates the rule against asking the author to ensure behavior is intended or tested. It doesn't provide a specific suggestion or point out a specific issue with the current code.
Workflow ID: wflow_bQvJm9Jt36hzqsih
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/traceloop-sdk/test/ai-sdk-integration.test.ts (1)
342-347: Consider relaxing the assertion to match the OpenAI test pattern.

The assertion that cache_creation_input_tokens > 0 could be flaky if the API doesn't always return cache creation tokens (e.g., first request with no prior cache). The OpenAI test uses a more defensive pattern with >= 0 and conditional checks. Consider applying the same approach here.

```diff
- assert.ok(
-   (generateTextSpan.attributes[
-     SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
-   ] as number) > 0,
-   "cache_creation_input_tokens should be greater than 0",
- );
+ assert.ok(
+   (generateTextSpan.attributes[
+     SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
+   ] as number) >= 0,
+   "cache_creation_input_tokens should be a valid number",
+ );
```

packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har (1)
51-90: Optionally scrub account-identifying response header values

The anthropic-organization-id, cf-ray, and request-id headers are not secrets but are account/request identifiers. If you prefer minimizing identifiable metadata in fixtures, you could redact them to sentinel values without impacting the tests.

Example minimal change:

```diff
- "name": "anthropic-organization-id",
- "value": "617d109c-a187-4902-889d-689223d134aa"
+ "name": "anthropic-organization-id",
+ "value": "ORG_ID_REDACTED"
@@
- "name": "cf-ray",
- "value": "9a7b83b54e107da0-TLV"
+ "name": "cf-ray",
+ "value": "CF_RAY_ID_REDACTED"
@@
- "name": "request-id",
- "value": "req_011CVi24YsBaALhQPhhdTaoi"
+ "name": "request-id",
+ "value": "REQ_ID_REDACTED"
```

Only needed if it matches your compliance/privacy posture.
Also applies to: 124-130
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
- pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
- packages/ai-semantic-conventions/src/SemanticAttributes.ts (1 hunks)
- packages/instrumentation-anthropic/package.json (1 hunks)
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions-streaming_2198009633/recording.har (0 hunks)
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions_1224394582/recording.har (0 hunks)
- packages/instrumentation-anthropic/src/instrumentation.ts (1 hunks)
- packages/instrumentation-anthropic/test/instrumentation.test.ts (0 hunks)
- packages/traceloop-sdk/package.json (1 hunks)
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har (1 hunks)
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-OpenAI-cache-tokens-from-providerMetadata_2332139343/recording.har (1 hunks)
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts (3 hunks)
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts (3 hunks)
💤 Files with no reviewable changes (3)
- packages/instrumentation-anthropic/test/instrumentation.test.ts
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions-streaming_2198009633/recording.har
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions_1224394582/recording.har
🧰 Additional context used
📓 Path-based instructions (8)
packages/ai-semantic-conventions/src/SemanticAttributes.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Define all AI/LLM span attribute constants in packages/ai-semantic-conventions/src/SemanticAttributes.ts
Files:
packages/ai-semantic-conventions/src/SemanticAttributes.ts
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Files:
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/instrumentation-anthropic/src/instrumentation.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Files:
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
**/recordings/**
📄 CodeRabbit inference engine (CLAUDE.md)
Store HTTP interaction recordings for tests under recordings/ directories for Polly.js replay
Files:
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-OpenAI-cache-tokens-from-providerMetadata_2332139343/recording.har
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har
packages/instrumentation-*/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place each provider integration in its own package under packages/instrumentation-[provider]/
Files:
- packages/instrumentation-anthropic/package.json
- packages/instrumentation-anthropic/src/instrumentation.ts
packages/*/package.json
📄 CodeRabbit inference engine (CLAUDE.md)
Use workspace:* for intra-repo package dependencies in package.json
Files:
- packages/instrumentation-anthropic/package.json
- packages/traceloop-sdk/package.json
packages/traceloop-sdk/package.json
📄 CodeRabbit inference engine (CLAUDE.md)
When adding a new instrumentation package, add it to the main SDK dependencies
Files:
packages/traceloop-sdk/package.json
packages/instrumentation-*/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/instrumentation-*/**/*.{ts,tsx}: Instrumentation classes must extend InstrumentationBase and register hooks using InstrumentationModuleDefinition
Instrumentations must create spans with appropriate AI/LLM semantic attributes for calls they wrap
Instrumentations must extract request/response data and token usage from wrapped calls
Instrumentations must capture and record errors appropriately
Do not implement anonymous telemetry collection in instrumentation packages; telemetry is collected only in the SDK
Files:
packages/instrumentation-anthropic/src/instrumentation.ts
🧠 Learnings (9)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/ai-semantic-conventions/src/SemanticAttributes.ts : Define all AI/LLM span attribute constants in packages/ai-semantic-conventions/src/SemanticAttributes.ts
Applied to files:
- packages/ai-semantic-conventions/src/SemanticAttributes.ts
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
- packages/ai-semantic-conventions/src/SemanticAttributes.ts
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/instrumentation-anthropic/package.json
- packages/traceloop-sdk/package.json
- packages/instrumentation-anthropic/src/instrumentation.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Instrumentations must create spans with appropriate AI/LLM semantic attributes for calls they wrap
Applied to files:
- packages/ai-semantic-conventions/src/SemanticAttributes.ts
- packages/instrumentation-anthropic/src/instrumentation.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
Applied to files:
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/traceloop-sdk/package.json
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/package.json : When adding a new instrumentation package, add it to the main SDK dependencies
Applied to files:
- packages/instrumentation-anthropic/package.json
- packages/traceloop-sdk/package.json
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Do not implement anonymous telemetry collection in instrumentation packages; telemetry is collected only in the SDK
Applied to files:
- packages/instrumentation-anthropic/package.json
- packages/instrumentation-anthropic/src/instrumentation.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Applied to files:
packages/traceloop-sdk/package.json
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Instrumentations must extract request/response data and token usage from wrapped calls
Applied to files:
packages/instrumentation-anthropic/src/instrumentation.ts
📚 Learning: 2025-08-12T13:57:05.901Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/test/datasets-final.test.ts:97-105
Timestamp: 2025-08-12T13:57:05.901Z
Learning: The traceloop-sdk uses a response transformer (`transformApiResponse` in `packages/traceloop-sdk/src/lib/utils/response-transformer.ts`) that converts snake_case API responses to camelCase for SDK interfaces. Raw API responses use snake_case but SDK consumers see camelCase fields.
Applied to files:
packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
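The snake_case-to-camelCase conversion this learning describes can be sketched as below. This is an illustrative stand-in, not the actual transformApiResponse implementation in response-transformer.ts, and unlike a production version it does not recurse into nested objects:

```typescript
// Illustrative snake_case -> camelCase key transformer, mirroring the
// behavior the learning describes: raw API responses use snake_case,
// SDK consumers see camelCase.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_m, c: string) => c.toUpperCase());
}

// Transform the top-level keys of an API response object.
function transformKeys(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(obj)) {
    out[snakeToCamel(k)] = v;
  }
  return out;
}
```

For example, a response field like cache_read_input_tokens would surface to SDK consumers as cacheReadInputTokens.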
🧬 Code graph analysis (2)
packages/traceloop-sdk/test/ai-sdk-integration.test.ts (1)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
SpanAttributes (23-79)
packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts (1)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
SpanAttributes (23-79)
🔇 Additional comments (10)
packages/traceloop-sdk/package.json (1)
96-99: LGTM!

The new @ai-sdk/anthropic devDependency and the @anthropic-ai/sdk version bump appropriately support the new Anthropic cache token tests added in this PR.

packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-OpenAI-cache-tokens-from-providerMetadata_2332139343/recording.har (1)
1-172: LGTM!

The HAR recording correctly captures an OpenAI API response with cached_tokens and reasoning_tokens in the usage details, which validates the provider metadata transformation logic. The recording follows the project's Polly.js conventions for test artifacts.

packages/instrumentation-anthropic/src/instrumentation.ts (1)
317-325: LGTM!

Using null instead of 0 for cache-related token fields is semantically correct: it distinguishes "not available/not applicable" from "zero tokens used." The addition of cache_creation aligns with the updated Anthropic SDK structure.

packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
38-41: LGTM!

The new semantic attribute constants follow the established gen_ai.usage.* naming convention and are properly centralized in the ai-semantic-conventions package, as per coding guidelines.

packages/traceloop-sdk/test/ai-sdk-integration.test.ts (2)
21-21: LGTM! The Anthropic provider import follows the same pattern as the OpenAI and Google integrations.
63-63: LGTM! Correctly sets the dummy API key for Anthropic in replay mode, consistent with OpenAI and Google.
packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts (2)
514-514: LGTM!

Correctly positioned in the transformation pipeline: after token transformations but before calculateTotalTokens, ensuring provider-specific cache metrics are available for any downstream calculations.
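The ordering constraint above can be illustrated with a toy pipeline. Everything here is hypothetical scaffolding; it only demonstrates that running the metadata step before the totals step makes cache metrics visible downstream:

```typescript
// Toy pipeline: the metadata-mapping step writes cache attributes
// before the totals step runs, so the totals step can observe them.
// All names and values are illustrative, not the SDK's real code.
type Step = (attrs: Record<string, number>) => void;

const CACHE_READ = "gen_ai.usage.cache_read_input_tokens";

const transformProviderMetadataStep: Step = (attrs) => {
  attrs[CACHE_READ] = 5; // stand-in for the real metadata mapping
};

const calculateTotalTokensStep: Step = (attrs) => {
  // By this point the cache metric is already on the span.
  attrs["saw_cache_read"] = CACHE_READ in attrs ? 1 : 0;
};

function runPipeline(attrs: Record<string, number>): void {
  for (const step of [transformProviderMetadataStep, calculateTotalTokensStep]) {
    step(attrs);
  }
}
```

Reversing the two steps would leave the downstream calculation blind to the cache metrics, which is the point the review makes about line 514.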
369-416: LGTM!

The provider metadata transformation correctly:
- Handles both string and object formats
- Maps Anthropic's cacheCreationInputTokens/cacheReadInputTokens and OpenAI's cachedPromptTokens/reasoningTokens to standardized GenAI usage attributes (field names verified against @ai-sdk/openai documentation)
- Uses SpanAttributes constants from @traceloop/ai-semantic-conventions per coding guidelines
- Cleans up the source attribute after transformation
packages/instrumentation-anthropic/package.json (1)
49-49: No action needed. The version bump to @anthropic-ai/sdk@0.71.0 is safe; this release contains only additive changes (new agent/agent-skills features) with no documented breaking changes. The instrumentation code uses compatible type imports and safe optional chaining patterns throughout, properly handling the usage object fields regardless of SDK version.packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har (1)
1-178: HAR recording structure and placement look correct

The HAR is well-formed (HAR 1.2 log/entries structure), lives under recordings/ for Polly.js replay as per guidelines, and captures the Anthropic usage payload including cache_creation_input_tokens and cache_read_input_tokens, which aligns with the PR's cache-token reporting goals. No secrets or auth headers are present.
Important
Add attributes for cache token usage tracking in AI SDKs and update tests and dependencies accordingly.
- Adds GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS, GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, and GEN_AI_USAGE_REASONING_TOKENS to SemanticAttributes.ts for tracking cache token usage.
- Adds transformProviderMetadata() in ai-sdk-transformations.ts to handle new attributes for Anthropic and OpenAI metadata.
- Updates instrumentation.test.ts.
- Updates @anthropic-ai/sdk to ^0.71.0 in instrumentation-anthropic/package.json and traceloop-sdk/package.json.
- Adds tests in ai-sdk-integration.test.ts for capturing and transforming Anthropic and OpenAI cache tokens.

This description was created by Ellipsis for 54fc119. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit

Release Notes
- New Features
- Bug Fixes
- Chores
✏️ Tip: You can customize this high-level summary in your review settings.