fix(ai-sdk): report cached tokens usage #839
Conversation
Walkthrough

This PR adds support for caching and reasoning token attributes from AI providers. It introduces three new semantic constants for cache creation/read input tokens and reasoning tokens, updates the Anthropic SDK dependency, adds provider metadata transformation logic to map vendor-specific fields into standardized GenAI attributes, and updates tests to validate the new cache token capture for Anthropic and OpenAI.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant AIProvider as AI Provider<br/>(Anthropic/OpenAI)
    participant SDK as AI SDK
    participant Transformer as Provider Metadata<br/>Transformer
    participant Spans as Span Attributes
    Client->>SDK: Generate text/message with system context
    SDK->>AIProvider: HTTP request (POST)
    AIProvider-->>SDK: HTTP response + metadata<br/>(cache tokens, reasoning)
    SDK->>Transformer: Call transformLLMSpans with response
    rect rgb(200, 220, 255)
        Note over Transformer: Extract Provider Metadata Phase
        Transformer->>Transformer: Parse AI_RESPONSE_PROVIDER_METADATA
        Transformer->>Transformer: Extract cache_creation/read tokens<br/>& reasoning tokens
    end
    rect rgb(220, 200, 255)
        Note over Transformer: Map to Standard Attributes Phase
        Transformer->>Spans: Set GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
        Transformer->>Spans: Set GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
        Transformer->>Spans: Set GEN_AI_USAGE_REASONING_TOKENS
        Transformer->>Transformer: Remove AI_RESPONSE_PROVIDER_METADATA
    end
    Transformer->>Spans: Continue standard transformations<br/>(vendor, telemetry)
    Spans-->>Client: Enriched spans with cache/reasoning tokens
```
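The two phases in the diagram can be sketched as a single transformation function. The attribute keys below match the PR's new constants; the function name, the source-attribute key, and the exact metadata shape (anthropic/openai sub-objects with camelCase token fields) are illustrative assumptions, not the SDK's actual code:

```typescript
// Sketch of the extract-then-map flow in the diagram above.
// The "ai.response.providerMetadata" key and the metadata shape are
// assumptions based on the AI SDK; the gen_ai.usage.* keys come from the PR.
const AI_RESPONSE_PROVIDER_METADATA = "ai.response.providerMetadata";
const CACHE_CREATION = "gen_ai.usage.cache_creation_input_tokens";
const CACHE_READ = "gen_ai.usage.cache_read_input_tokens";
const REASONING = "gen_ai.usage.reasoning_tokens";

type Attrs = Record<string, unknown>;

function transformProviderMetadataSketch(attributes: Attrs): void {
  const raw = attributes[AI_RESPONSE_PROVIDER_METADATA];
  if (raw === undefined) return;
  try {
    // Metadata may arrive as a JSON string or as a plain object.
    const metadata = typeof raw === "string" ? JSON.parse(raw) : raw;
    const anthropic = metadata?.anthropic ?? {};
    const openai = metadata?.openai ?? {};
    if (typeof anthropic.cacheCreationInputTokens === "number") {
      attributes[CACHE_CREATION] = anthropic.cacheCreationInputTokens;
    }
    if (typeof anthropic.cacheReadInputTokens === "number") {
      attributes[CACHE_READ] = anthropic.cacheReadInputTokens;
    }
    if (typeof openai.cachedPromptTokens === "number") {
      attributes[CACHE_READ] = openai.cachedPromptTokens;
    }
    if (typeof openai.reasoningTokens === "number") {
      attributes[REASONING] = openai.reasoningTokens;
    }
    // Drop the raw metadata once its fields are mapped, as the diagram shows.
    delete attributes[AI_RESPONSE_PROVIDER_METADATA];
  } catch {
    // Mirrors the file's existing pattern of silently ignoring parse errors.
  }
}
```

The real implementation additionally continues with standard vendor/telemetry transformations; this sketch covers only the cache/reasoning mapping step.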
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Important
Looks good to me! 👍
Reviewed everything up to 54fc119 in 2 minutes and 26 seconds.
- Reviewed 1127 lines of code in 11 files
- Skipped 1 file when reviewing
- Skipped posting 5 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/ai-semantic-conventions/src/SemanticAttributes.ts:38
- Draft comment: New semantic attribute keys for cached tokens (cache creation, cache read, reasoning) have been added. Ensure these keys are consistently named and documented for downstream usage.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. The comment is asking the PR author to ensure that the new semantic attribute keys are consistently named and documented. This falls under the category of asking the author to ensure something is done, which is against the rules.
2. packages/instrumentation-anthropic/src/instrumentation.ts:317
- Draft comment: Usage object initialization now uses 'cache_creation' (set to null) alongside '_input_tokens' fields. Consider clarifying the purpose of the 'cache_creation' property and whether it should contain a nested object versus individual token metrics.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable; usefulness confidence = 15% vs. threshold = 50%. The comment is asking the author to "consider clarifying" the purpose of the cache_creation property. This is not pointing out a clear bug or issue - it's asking for clarification or documentation. The rules explicitly state "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended" and "Do NOT make comments that are obvious or unimportant." This comment doesn't identify a concrete problem; it's speculative about whether the structure is correct. The change from 0 to null for the cache fields and the addition of cache_creation appears intentional to match the API response structure. Without evidence that this structure is wrong, this is just asking for clarification. Perhaps the comment is valid if cache_creation should actually be an object containing nested properties rather than a separate field. Maybe there's a type mismatch with the Anthropic API that would cause issues. The author might have made a mistake in the structure. Even if there's uncertainty about the correct structure, the comment doesn't provide evidence of an actual problem. It's asking the author to "consider clarifying" rather than pointing out a definite issue. Without seeing the Anthropic API types or evidence that this structure is incorrect, this is speculative. The rules say to only comment if there's STRONG EVIDENCE of an issue. This comment should be deleted. It's asking for clarification about the purpose and structure of a field without providing evidence of an actual problem. It violates the rule against asking authors to explain or clarify their intentions.
3. packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts:369
- Draft comment: The transformProviderMetadata function correctly parses provider metadata for both Anthropic and OpenAI. Consider logging errors in the catch block to aid troubleshooting in case JSON parsing fails.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable; usefulness confidence = 20% vs. threshold = 50%. The comment is suggesting a code quality improvement (adding logging). However, I need to consider: 1) This is a new function being added, so the comment is about changed code. 2) The pattern of silent error handling is consistent across the entire file - there are at least 4 other similar catch blocks that also silently ignore errors. 3) The comment doesn't provide strong justification for why this particular function needs logging when others don't. 4) This could be considered a "nice to have" suggestion rather than a clear code issue. 5) The rules state that refactor suggestions are good "if they are actionable and clear" - this one is actionable, but it's not clear why this specific location needs it when the established pattern is to ignore errors silently. The comment is technically reasonable - logging errors can help with debugging. However, it singles out one function when the entire file follows a pattern of silently ignoring parsing errors. If this is a valid concern, it should apply to all similar functions, not just this new one. The comment might be seen as inconsistent with the existing codebase patterns. While the comment points out a valid improvement, it's not addressing a clear bug or issue with the new code. The new function follows the exact same error handling pattern as existing functions in the file. Without evidence that this specific function needs different error handling, or without a broader refactoring suggestion for all similar functions, this comment appears to be an arbitrary suggestion that doesn't meet the bar of "strong evidence" that something is wrong. This comment should be deleted. It suggests a code quality improvement that is inconsistent with the established patterns in the codebase. Multiple other transformation functions use the same silent error handling approach, and there's no strong evidence that this particular function requires different treatment. The comment is more of a "nice to have" suggestion rather than identifying a clear issue.
4. packages/traceloop-sdk/test/ai-sdk-integration.test.ts:275
- Draft comment: New tests for capturing and transforming cached token usage from providerMetadata (both Anthropic and OpenAI) provide good coverage. Verify that the expected numeric token values are correctly set on the span attributes.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. This comment is asking the PR author to verify something, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue in the code.
5. packages/traceloop-sdk/test/ai-sdk-integration.test.ts:350
- Draft comment: The tests for Anthropic and OpenAI cache tokens accurately assert that the new attributes are present and valid. Ensure that any future provider metadata changes continue to update these span attributes as expected.
- Reason this comment was not posted: Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. This comment is asking the PR author to ensure future changes are made correctly, which violates the rule against asking the author to ensure behavior is intended or tested. It doesn't provide a specific suggestion or point out a specific issue with the current code.
Workflow ID: wflow_bQvJm9Jt36hzqsih
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/traceloop-sdk/test/ai-sdk-integration.test.ts (1)
342-347: Consider relaxing the assertion to match the OpenAI test pattern.

The assertion that cache_creation_input_tokens > 0 could be flaky if the API doesn't always return cache creation tokens (e.g., first request with no prior cache). The OpenAI test uses a more defensive pattern with >= 0 and conditional checks. Consider applying the same approach here.

```diff
- assert.ok(
-   (generateTextSpan.attributes[
-     SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
-   ] as number) > 0,
-   "cache_creation_input_tokens should be greater than 0",
- );
+ assert.ok(
+   (generateTextSpan.attributes[
+     SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
+   ] as number) >= 0,
+   "cache_creation_input_tokens should be a valid number",
+ );
```

packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har (1)
51-90: Optionally scrub account-identifying response header values

The anthropic-organization-id, cf-ray, and request-id headers are not secrets but are account/request identifiers. If you prefer minimizing identifiable metadata in fixtures, you could redact them to sentinel values without impacting the tests.

Example minimal change:

```diff
- "name": "anthropic-organization-id",
- "value": "617d109c-a187-4902-889d-689223d134aa"
+ "name": "anthropic-organization-id",
+ "value": "ORG_ID_REDACTED"
@@
- "name": "cf-ray",
- "value": "9a7b83b54e107da0-TLV"
+ "name": "cf-ray",
+ "value": "CF_RAY_ID_REDACTED"
@@
- "name": "request-id",
- "value": "req_011CVi24YsBaALhQPhhdTaoi"
+ "name": "request-id",
+ "value": "REQ_ID_REDACTED"
```

Only needed if it matches your compliance/privacy posture.
Also applies to: 124-130
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
- pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
- packages/ai-semantic-conventions/src/SemanticAttributes.ts (1 hunks)
- packages/instrumentation-anthropic/package.json (1 hunks)
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions-streaming_2198009633/recording.har (0 hunks)
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions_1224394582/recording.har (0 hunks)
- packages/instrumentation-anthropic/src/instrumentation.ts (1 hunks)
- packages/instrumentation-anthropic/test/instrumentation.test.ts (0 hunks)
- packages/traceloop-sdk/package.json (1 hunks)
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har (1 hunks)
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-OpenAI-cache-tokens-from-providerMetadata_2332139343/recording.har (1 hunks)
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts (3 hunks)
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts (3 hunks)
💤 Files with no reviewable changes (3)
- packages/instrumentation-anthropic/test/instrumentation.test.ts
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions-streaming_2198009633/recording.har
- packages/instrumentation-anthropic/recordings/Test-Anthropic-instrumentation_3769946143/should-set-attributes-in-span-for-completions_1224394582/recording.har
🧰 Additional context used
📓 Path-based instructions (8)
packages/ai-semantic-conventions/src/SemanticAttributes.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Define all AI/LLM span attribute constants in packages/ai-semantic-conventions/src/SemanticAttributes.ts
Files:
packages/ai-semantic-conventions/src/SemanticAttributes.ts
packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Files:
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/instrumentation-anthropic/src/instrumentation.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
packages/traceloop-sdk/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/traceloop-sdk/**/*.{ts,tsx}: Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Files:
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
**/recordings/**
📄 CodeRabbit inference engine (CLAUDE.md)
Store HTTP interaction recordings for tests under recordings/ directories for Polly.js replay
Files:
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-OpenAI-cache-tokens-from-providerMetadata_2332139343/recording.har
- packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har
packages/instrumentation-*/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place each provider integration in its own package under packages/instrumentation-[provider]/
Files:
- packages/instrumentation-anthropic/package.json
- packages/instrumentation-anthropic/src/instrumentation.ts
packages/*/package.json
📄 CodeRabbit inference engine (CLAUDE.md)
Use workspace:* for intra-repo package dependencies in package.json
Files:
- packages/instrumentation-anthropic/package.json
- packages/traceloop-sdk/package.json
packages/traceloop-sdk/package.json
📄 CodeRabbit inference engine (CLAUDE.md)
When adding a new instrumentation package, add it to the main SDK dependencies
Files:
packages/traceloop-sdk/package.json
packages/instrumentation-*/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
packages/instrumentation-*/**/*.{ts,tsx}: Instrumentation classes must extend InstrumentationBase and register hooks using InstrumentationModuleDefinition
Instrumentations must create spans with appropriate AI/LLM semantic attributes for calls they wrap
Instrumentations must extract request/response data and token usage from wrapped calls
Instrumentations must capture and record errors appropriately
Do not implement anonymous telemetry collection in instrumentation packages; telemetry is collected only in the SDK
Files:
packages/instrumentation-anthropic/src/instrumentation.ts
🧠 Learnings (9)
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/ai-semantic-conventions/src/SemanticAttributes.ts : Define all AI/LLM span attribute constants in packages/ai-semantic-conventions/src/SemanticAttributes.ts
Applied to files:
- packages/ai-semantic-conventions/src/SemanticAttributes.ts
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/{instrumentation-*,traceloop-sdk}/**/*.{ts,tsx} : Import AI/LLM semantic attribute constants from @traceloop/ai-semantic-conventions rather than hardcoding strings
Applied to files:
- packages/ai-semantic-conventions/src/SemanticAttributes.ts
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/instrumentation-anthropic/package.json
- packages/traceloop-sdk/package.json
- packages/instrumentation-anthropic/src/instrumentation.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Instrumentations must create spans with appropriate AI/LLM semantic attributes for calls they wrap
Applied to files:
- packages/ai-semantic-conventions/src/SemanticAttributes.ts
- packages/instrumentation-anthropic/src/instrumentation.ts
- packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : Use the provided decorators (@workflow, @task, @agent) for workflow/task/agent spans instead of re-implementing them
Applied to files:
- packages/traceloop-sdk/test/ai-sdk-integration.test.ts
- packages/traceloop-sdk/package.json
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/package.json : When adding a new instrumentation package, add it to the main SDK dependencies
Applied to files:
- packages/instrumentation-anthropic/package.json
- packages/traceloop-sdk/package.json
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Do not implement anonymous telemetry collection in instrumentation packages; telemetry is collected only in the SDK
Applied to files:
- packages/instrumentation-anthropic/package.json
- packages/instrumentation-anthropic/src/instrumentation.ts
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/traceloop-sdk/**/*.{ts,tsx} : For manual LLM operations, use trace.withLLMSpan from @traceloop/node-server-sdk
Applied to files:
packages/traceloop-sdk/package.json
📚 Learning: 2025-08-24T22:08:07.023Z
Learnt from: CR
Repo: traceloop/openllmetry-js PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-24T22:08:07.023Z
Learning: Applies to packages/instrumentation-*/**/*.{ts,tsx} : Instrumentations must extract request/response data and token usage from wrapped calls
Applied to files:
packages/instrumentation-anthropic/src/instrumentation.ts
📚 Learning: 2025-08-12T13:57:05.901Z
Learnt from: galzilber
Repo: traceloop/openllmetry-js PR: 643
File: packages/traceloop-sdk/test/datasets-final.test.ts:97-105
Timestamp: 2025-08-12T13:57:05.901Z
Learning: The traceloop-sdk uses a response transformer (`transformApiResponse` in `packages/traceloop-sdk/src/lib/utils/response-transformer.ts`) that converts snake_case API responses to camelCase for SDK interfaces. Raw API responses use snake_case but SDK consumers see camelCase fields.
Applied to files:
packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts
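The snake_case-to-camelCase conversion this learning describes can be sketched as below. This is an illustrative stand-in, not the actual transformApiResponse implementation in response-transformer.ts, and unlike a production version it does not recurse into nested objects:

```typescript
// Illustrative snake_case -> camelCase key transformer, mirroring the
// behavior the learning describes: raw API responses use snake_case,
// SDK consumers see camelCase.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_m, c: string) => c.toUpperCase());
}

// Transform the top-level keys of an API response object.
function transformKeys(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(obj)) {
    out[snakeToCamel(k)] = v;
  }
  return out;
}
```

For example, a response field like cache_read_input_tokens would surface to SDK consumers as cacheReadInputTokens.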
🧬 Code graph analysis (2)
packages/traceloop-sdk/test/ai-sdk-integration.test.ts (1)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
SpanAttributes (23-79)
packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts (1)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
SpanAttributes (23-79)
🔇 Additional comments (10)
packages/traceloop-sdk/package.json (1)
96-99: LGTM!

The new @ai-sdk/anthropic devDependency and the @anthropic-ai/sdk version bump appropriately support the new Anthropic cache token tests added in this PR.

packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-OpenAI-cache-tokens-from-providerMetadata_2332139343/recording.har (1)
1-172: LGTM!

The HAR recording correctly captures an OpenAI API response with cached_tokens and reasoning_tokens in the usage details, which validates the provider metadata transformation logic. The recording follows the project's Polly.js conventions for test artifacts.

packages/instrumentation-anthropic/src/instrumentation.ts (1)
317-325: LGTM!

Using null instead of 0 for cache-related token fields is semantically correct: it distinguishes "not available/not applicable" from "zero tokens used." The addition of cache_creation aligns with the updated Anthropic SDK structure.

packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
38-41: LGTM!

The new semantic attribute constants follow the established gen_ai.usage.* naming convention and are properly centralized in the ai-semantic-conventions package, as per coding guidelines.

packages/traceloop-sdk/test/ai-sdk-integration.test.ts (2)
21-21: LGTM! The Anthropic provider import follows the same pattern as the OpenAI and Google integrations.
63-63: LGTM! Correctly sets the dummy API key for Anthropic in replay mode, consistent with OpenAI and Google.
packages/traceloop-sdk/src/lib/tracing/ai-sdk-transformations.ts (2)
514-514: LGTM!

Correctly positioned in the transformation pipeline: after token transformations but before calculateTotalTokens, ensuring provider-specific cache metrics are available for any downstream calculations.
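The ordering constraint above can be illustrated with a toy pipeline. Everything here is hypothetical scaffolding; it only demonstrates that running the metadata step before the totals step makes cache metrics visible downstream:

```typescript
// Toy pipeline: the metadata-mapping step writes cache attributes
// before the totals step runs, so the totals step can observe them.
// All names and values are illustrative, not the SDK's real code.
type Step = (attrs: Record<string, number>) => void;

const CACHE_READ = "gen_ai.usage.cache_read_input_tokens";

const transformProviderMetadataStep: Step = (attrs) => {
  attrs[CACHE_READ] = 5; // stand-in for the real metadata mapping
};

const calculateTotalTokensStep: Step = (attrs) => {
  // By this point the cache metric is already on the span.
  attrs["saw_cache_read"] = CACHE_READ in attrs ? 1 : 0;
};

function runPipeline(attrs: Record<string, number>): void {
  for (const step of [transformProviderMetadataStep, calculateTotalTokensStep]) {
    step(attrs);
  }
}
```

Reversing the two steps would leave the downstream calculation blind to the cache metrics, which is the point the review makes about line 514.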
369-416: LGTM!

The provider metadata transformation correctly:
- Handles both string and object formats
- Maps Anthropic's cacheCreationInputTokens/cacheReadInputTokens and OpenAI's cachedPromptTokens/reasoningTokens to standardized GenAI usage attributes (field names verified against @ai-sdk/openai documentation)
- Uses SpanAttributes constants from @traceloop/ai-semantic-conventions per coding guidelines
- Cleans up the source attribute after transformation
packages/instrumentation-anthropic/package.json (1)
49-49: No action needed. The version bump to @anthropic-ai/sdk@0.71.0 is safe; this release contains only additive changes (new agent/agent-skills features) with no documented breaking changes. The instrumentation code uses compatible type imports and safe optional chaining patterns throughout, properly handling the usage object fields regardless of SDK version.packages/traceloop-sdk/recordings/Test-AI-SDK-Integration-with-Recording_156038438/should-capture-and-transform-Anthropic-cache-tokens-from-providerMetadata_2925856219/recording.har (1)
1-178: HAR recording structure and placement look correct

The HAR is well-formed (HAR 1.2 log/entries structure), lives under recordings/ for Polly.js replay as per guidelines, and captures the Anthropic usage payload including cache_creation_input_tokens and cache_read_input_tokens, which aligns with the PR's cache-token reporting goals. No secrets or auth headers are present.
Important
Add attributes for cache token usage tracking in AI SDKs and update tests and dependencies accordingly.
- Adds GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS, GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, and GEN_AI_USAGE_REASONING_TOKENS to SemanticAttributes.ts for tracking cache token usage.
- Adds transformProviderMetadata() in ai-sdk-transformations.ts to handle new attributes for Anthropic and OpenAI metadata.
- Updates instrumentation.test.ts.
- Updates @anthropic-ai/sdk to ^0.71.0 in instrumentation-anthropic/package.json and traceloop-sdk/package.json.
- Adds tests in ai-sdk-integration.test.ts for capturing and transforming Anthropic and OpenAI cache tokens.

This description was created by Ellipsis for 54fc119. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit

Release Notes
- New Features
- Bug Fixes
- Chores
✏️ Tip: You can customize this high-level summary in your review settings.