fix(ollama,together,replicate,sagemaker,bedrock): record exceptions on error spans#4005
Conversation
…n error spans When API calls raise exceptions, spans were left in UNSET state with no error information. Add span.record_exception() and set StatusCode.ERROR on all sync/async wrappers in each affected package. Fixes traceloop#412
📝 WalkthroughWalkthroughInstrumentation wrappers across five OpenTelemetry packages now explicitly catch exceptions from instrumented SDK calls, record the exception on the active span, set the span status to ERROR, ensure spans are ended where applicable, and re-raise the original exception. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 224-229: The start_as_current_span context managers are
duplicating exception telemetry because the code manually calls
span.record_exception and span.set_status while the context manager also records
exceptions by default; update both start_as_current_span(...) usages (the blocks
wrapping the fn call where span.record_exception(e) and
span.set_status(Status(StatusCode.ERROR, str(e))) are invoked) to pass
record_exception=False and set_status_on_exception=False so only the manual
exception recording and status setting runs.
In
`@packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py`:
- Around line 100-105: The try/except inside the non-streaming path is
explicitly calling span.record_exception(e) and
span.set_status(Status(StatusCode.ERROR, str(e))) while using
start_as_current_span which, by default, will also record exceptions and set
status on exit; to avoid duplicate events, call start_as_current_span with
record_exception=False and set_status_on_exception=False (i.e., adjust the
start_as_current_span invocation around the non-streaming block) so only the
explicit span.record_exception and span.set_status are applied.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e3af10b6-bce9-4d75-98cf-798bac4171fc
📒 Files selected for processing (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.pypackages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.pypackages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.pypackages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.pypackages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
For start_as_current_span blocks (bedrock invoke/converse, sagemaker endpoint invoke), add record_exception=False, set_status_on_exception=False so the manual try/except is the sole exception handler without duplication. Drop str(e) from all Status(StatusCode.ERROR) calls to prevent sensitive request data from leaking into span status descriptions. Addresses coderabbitai review feedback on traceloop#4005
…ings to reach 80% coverage Adds one-line docstrings to all functions that were missing them across the five LLM instrumentation packages. This brings docstring coverage from ~14-18% to 100% per file, satisfying CodeRabbit's 80% threshold.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Line 85: Replace the incorrect copied docstring on MetricParams.__init__ with
a concise description that reflects its real purpose: explain that this
initializer constructs a MetricParams instance and stores metric state (e.g.,
name, description, unit, aggregation/aggregation_temporality settings,
boundaries, initial value/observed state, and any label or resource defaults)
rather than initializing the instrumentor or applying configuration settings;
update the docstring on MetricParams.__init__ accordingly so it documents the
parameters stored and the state behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 37ab9282-b952-46eb-bc02-f962770246e7
📒 Files selected for processing (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.pypackages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.pypackages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.pypackages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.pypackages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/init.py
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/init.py
- packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/init.py
- packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/init.py
| guardrail_words: Counter, | ||
| prompt_caching: Counter, | ||
| ): | ||
| """Initialize the instrumentor and apply configuration settings.""" |
There was a problem hiding this comment.
Fix the copied docstring on MetricParams.__init__.
This initializer stores metric state; it does not initialize the instrumentor or apply configuration settings.
Proposed wording
- """Initialize the instrumentor and apply configuration settings."""
+ """Initialize metric state used by Bedrock instrumentation."""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """Initialize the instrumentor and apply configuration settings.""" | |
| """Initialize metric state used by Bedrock instrumentation.""" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`
at line 85, Replace the incorrect copied docstring on MetricParams.__init__ with
a concise description that reflects its real purpose: explain that this
initializer constructs a MetricParams instance and stores metric state (e.g.,
name, description, unit, aggregation/aggregation_temporality settings,
boundaries, initial value/observed state, and any label or resource defaults)
rather than initializing the instrumentor or applying configuration settings;
update the docstring on MetricParams.__init__ accordingly so it documents the
parameters stored and the state behavior.
Summary
Fixes #412 (partial — LLM instrumentation packages)
When API calls to LLM providers raise exceptions, spans were left in
UNSETstatus with no error information attached. This PR adds proper error recording to five packages:_wrapand async_awrap— exceptions now callspan.record_exception(e),span.set_status(ERROR),span.end()before re-raising_wrap— same pattern_wrap— same pattern_instrumented_endpoint_invokeand_instrumented_endpoint_invoke_with_response_stream_instrumented_model_invoke,_instrumented_model_invoke_with_response_stream,_instrumented_converse,_instrumented_converse_stream— addsStatus, StatusCodeimportThis follows the same approach as #3970 (anthropic/groq/mistralai).
Test plan
status=ERRORand an attached exception eventstatus=OKspansnpx nx run-many -t test --projects=opentelemetry-instrumentation-ollama,opentelemetry-instrumentation-together,opentelemetry-instrumentation-replicate,opentelemetry-instrumentation-sagemaker,opentelemetry-instrumentation-bedrockSummary by CodeRabbit