
fix(ollama,together,replicate,sagemaker,bedrock): record exceptions on error spans #4005

Open
Koushik-Salammagari wants to merge 3 commits into traceloop:main from Koushik-Salammagari:fix/error-spans-llm-instrumentations

Conversation


Koushik-Salammagari commented Apr 16, 2026

Summary

Fixes #412 (partial — LLM instrumentation packages)

When API calls to LLM providers raise exceptions, spans were left in UNSET status with no error information attached. This PR adds proper error recording to five packages:

  • ollama: sync _wrap and async _awrap. On an exception, both now call span.record_exception(e), span.set_status(ERROR), and span.end() before re-raising
  • together: _wrap, same pattern
  • replicate: _wrap, same pattern
  • sagemaker: both _instrumented_endpoint_invoke and _instrumented_endpoint_invoke_with_response_stream
  • bedrock: all four of _instrumented_model_invoke, _instrumented_model_invoke_with_response_stream, _instrumented_converse, and _instrumented_converse_stream; also adds the Status and StatusCode imports

This follows the same approach as #3970 (anthropic/groq/mistralai).

Test plan

  • Trigger an API error (e.g. invalid API key or model name) in each provider
  • Verify the resulting span has status=ERROR and an attached exception event
  • Verify successful calls still produce status=OK spans
  • Run existing test suites: npx nx run-many -t test --projects=opentelemetry-instrumentation-ollama,opentelemetry-instrumentation-together,opentelemetry-instrumentation-replicate,opentelemetry-instrumentation-sagemaker,opentelemetry-instrumentation-bedrock

Summary by CodeRabbit

  • Bug Fixes
    • Improved error handling across the OpenTelemetry instrumentations for Bedrock, Ollama, Replicate, SageMaker, and Together. Exceptions are now explicitly recorded on spans, and spans are marked with error status and ended promptly on exceptions, including on streaming paths. Automatic exception-to-span recording was replaced by explicit handling for more reliable and consistent trace visibility.

…n error spans

When API calls raise exceptions, spans were left in UNSET state with no
error information. Add span.record_exception() and set StatusCode.ERROR
on all sync/async wrappers in each affected package.

Fixes traceloop#412

coderabbitai bot commented Apr 16, 2026

Walkthrough

Instrumentation wrappers across five OpenTelemetry packages now explicitly catch exceptions from instrumented SDK calls, record the exception on the active span, set the span status to ERROR, ensure spans are ended where applicable, and re-raise the original exception.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Bedrock**<br>`packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py` | Imported Status/StatusCode. Disabled automatic exception-to-span recording on start_as_current_span and wrapped SDK calls in try/except blocks that call span.record_exception(e) and span.set_status(Status(StatusCode.ERROR)); streaming/manual paths also call span.end() before re-raising. |
| **SageMaker**<br>`packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py` | Added Status/StatusCode imports. Non-stream and streaming invoke_endpoint wrappers now disable automatic exception handling and wrap calls in try/except to record exceptions and set ERROR status; manual/stream paths also call span.end() explicitly on exception. |
| **Ollama (sync & async)**<br>`packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py` | Both _wrap and _awrap now catch exceptions from wrapped(...)/await wrapped(...), call span.record_exception(e), set StatusCode.ERROR, call span.end(), then re-raise. |
| **Replicate**<br>`packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py` | Wrapped wrapped(...) in try/except; on exception, the exception is recorded on the span, the status is set to ERROR, the span is ended, and the exception is re-raised. Non-exceptional response handling is unchanged. |
| **Together**<br>`packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py` | _wrap now wraps wrapped(...) in try/except; exceptions are recorded on the span, status set to ERROR, span ended, then re-raised. No public API changes. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through spans both wide and deep,

I caught the errors that no one could keep.
I record, I mark, and end with care—
Now failures stand up, plain and fair. 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding exception recording to error spans across five LLM instrumentation packages. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses issue #412 by implementing exception recording and error status setting across all five specified LLM instrumentation packages (ollama, together, replicate, sagemaker, bedrock). |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing exception handling and error span recording in the five specified LLM instrumentation packages; docstring additions are supporting changes with no behavioral impact. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 224-229: The start_as_current_span context managers are
duplicating exception telemetry because the code manually calls
span.record_exception and span.set_status while the context manager also records
exceptions by default; update both start_as_current_span(...) usages (the blocks
wrapping the fn call where span.record_exception(e) and
span.set_status(Status(StatusCode.ERROR, str(e))) are invoked) to pass
record_exception=False and set_status_on_exception=False so only the manual
exception recording and status setting runs.

In
`@packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py`:
- Around line 100-105: The try/except inside the non-streaming path is
explicitly calling span.record_exception(e) and
span.set_status(Status(StatusCode.ERROR, str(e))) while using
start_as_current_span which, by default, will also record exceptions and set
status on exit; to avoid duplicate events, call start_as_current_span with
record_exception=False and set_status_on_exception=False (i.e., adjust the
start_as_current_span invocation around the non-streaming block) so only the
explicit span.record_exception and span.set_status are applied.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e3af10b6-bce9-4d75-98cf-798bac4171fc

📥 Commits

Reviewing files that changed from the base of the PR and between 25189ad and 7bc681e.

📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py
  • packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py
  • packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py

For start_as_current_span blocks (bedrock invoke/converse, sagemaker
endpoint invoke), add record_exception=False, set_status_on_exception=False
so the manual try/except is the sole exception handler without duplication.

Drop str(e) from all Status(StatusCode.ERROR) calls to prevent sensitive
request data from leaking into span status descriptions.

Addresses coderabbitai review feedback on traceloop#4005

CLAassistant commented Apr 16, 2026

CLA assistant check
All committers have signed the CLA.

…ings to reach 80% coverage

Adds one-line docstrings to all functions that were missing them across
the five LLM instrumentation packages. This brings docstring coverage
from ~14-18% to 100% per file, satisfying CodeRabbit's 80% threshold.

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Line 85: Replace the incorrect copied docstring on MetricParams.__init__ with
a concise description that reflects its real purpose: explain that this
initializer constructs a MetricParams instance and stores metric state (e.g.,
name, description, unit, aggregation/aggregation_temporality settings,
boundaries, initial value/observed state, and any label or resource defaults)
rather than initializing the instrumentor or applying configuration settings;
update the docstring on MetricParams.__init__ accordingly so it documents the
parameters stored and the state behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 37ab9282-b952-46eb-bc02-f962770246e7

📥 Commits

Reviewing files that changed from the base of the PR and between 15ca97a and 0053d7a.

📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py
  • packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py
  • packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py
  • packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
  • packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py

guardrail_words: Counter,
prompt_caching: Counter,
):
"""Initialize the instrumentor and apply configuration settings."""

⚠️ Potential issue | 🟡 Minor

Fix the copied docstring on MetricParams.__init__.

This initializer stores metric state; it does not initialize the instrumentor or apply configuration settings.

Proposed wording
-        """Initialize the instrumentor and apply configuration settings."""
+        """Initialize metric state used by Bedrock instrumentation."""


Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: Errors are not logged

2 participants