Skip to content

feat(evaluators): add standalone evaluator create/list/get/update/run SDK methods#908

Merged
lenatraceloop merged 2 commits intomainfrom
lena/standalone-evaluator-sdk-methods
Mar 29, 2026
Merged

feat(evaluators): add standalone evaluator create/list/get/update/run SDK methods#908
lenatraceloop merged 2 commits intomainfrom
lena/standalone-evaluator-sdk-methods

Conversation

@lenatraceloop
Copy link
Copy Markdown
Member

@lenatraceloop lenatraceloop commented Mar 23, 2026

Summary

Adds SDK support for managing standalone custom evaluators (not bound to any project or environment): create, list, get, update, and run.

Fixed TLP-1998

Test plan

  • pnpm test passes (all existing + new standalone evaluator tests)

Summary by CodeRabbit

  • New Features

    • Complete evaluator lifecycle: create custom evaluators, list (filter by source), retrieve full configurations, update settings, and execute evaluators with custom inputs.
    • Full support for LLM provider configuration and input/output schemas (including enum handling).
    • SDK types exported for evaluator operations and schemas to enable typed usage in client code.
  • Tests

    • Added end-to-end tests covering create, list, get, update, and run behaviors, request/response mapping, validation, and error handling.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a standalone evaluator API to the SDK: new Evaluator methods create, list, get, update, run; a patch HTTP helper on TraceloopClient; new evaluator TypeScript interfaces and re-exports; and an end-to-end test suite validating HTTP contracts and transformations.

Changes

Cohort / File(s) Summary
Evaluator Client Implementation
packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
Added public methods create, list, get, update, run. Introduced private helpers validateIdentifier, buildPayload, and toEvaluatorData to handle identifier validation, camelCase→snake_case payload building (including enumValuesenum_values), and response→EvaluatorData conversion with required-field checks.
HTTP Client Enhancement
packages/traceloop-sdk/src/lib/client/traceloop-client.ts
Added patch(path: string, body) method that sends authenticated JSON PATCH requests with Authorization, Content-Type, and X-Traceloop-SDK-Version headers.
Type Definitions
packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts
Added evaluator-related types and interfaces: LLMMessage, PropertySchema, CreateCustomEvaluatorRequest, EvaluatorCreateResponse, EvaluatorUpdateRequest, EvaluatorUpdateResponse, EvaluatorExecuteOptions, EvaluatorExecuteResponse, EvaluatorSource, EvaluatorCatalogItem, EvaluatorData.
Public API Exports
packages/traceloop-sdk/src/lib/node-server-sdk.ts
Re-exported new evaluator interfaces/types so they are available from the SDK entry point.
Test Suite
packages/traceloop-sdk/test/standalone-evaluators.test.ts
Added comprehensive E2E-style tests covering create, list, get, update, run: request URLs/methods/headers, payload snake_case and enum handling, response mappings, required-field validation, slug/id handling, and error propagation.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant EC as Evaluator
    participant TC as TraceloopClient
    participant API as Traceloop API

    rect rgba(100,150,255,0.5)
    Note over App,API: Create Evaluator
    App->>EC: create(options)
    EC->>EC: buildPayload (schema & snake_case)
    EC->>TC: post("/v2/evaluators", payload)
    TC->>API: POST /v2/evaluators (auth headers)
    API-->>TC: { evaluatorId, slug }
    TC-->>EC: response
    EC-->>App: { id, slug }
    end

    rect rgba(150,200,100,0.5)
    Note over App,API: List / Get Evaluator
    App->>EC: list(source?) / get(identifier)
    EC->>TC: get("/v2/evaluators" or "/v2/evaluators/:id")
    TC->>API: GET (auth)
    API-->>TC: { evaluators | evaluator }
    TC-->>EC: response
    EC->>EC: toEvaluatorData (convert schemas, validate llm config)
    EC-->>App: EvaluatorCatalogItem | EvaluatorData
    end

    rect rgba(200,150,100,0.5)
    Note over App,API: Run Execution
    App->>EC: run(identifier, { input })
    EC->>TC: post("/v2/evaluators/:id/executions", { input })
    TC->>API: POST execution (auth)
    API-->>TC: { executionId, result }
    TC-->>EC: response
    EC-->>App: { executionId, result }
    end
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through specs and tiny trails,
Built payloads, checked the missing tails.
Create, list, fetch, patch, then run—
Schemas aligned, the tests had fun.
A carrot for each passing run. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding five new standalone evaluator methods (create/list/get/update/run) to the SDK.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch lena/standalone-evaluator-sdk-methods

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 23, 2026

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/traceloop-sdk/test/standalone-evaluators.test.ts (1)

333-361: Use raw API payload shapes in the response-mapping tests.

BaseDatasetEntity.handleResponse() normalizes successful JSON via transformApiResponse(), but these cases stub already-camelCased inputSchema / outputSchema / enumValues. That means the tests won't catch a regression in the actual wire-format mapping path.

🧪 Example fixture change
-                inputSchema: [{ name: "text", type: "string" }],
-                outputSchema: [
-                  { name: "label", type: "enum", enumValues: ["good", "bad"] },
-                ],
+                input_schema: [{ name: "text", type: "string" }],
+                output_schema: [
+                  { name: "label", type: "enum", enum_values: ["good", "bad"] },
+                ],

Also applies to: 591-621

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/test/standalone-evaluators.test.ts` around lines 333 -
361, The test stubs use already camelCased fields
(inputSchema/outputSchema/enumValues) so it bypasses the API-to-model mapping in
BaseDatasetEntity.handleResponse() which uses transformApiResponse(); update the
test fixture in standalone-evaluators.test.ts to return the raw wire-format
(snake_case) payload (e.g., input_schema, output_schema, enum_values) so the
response-mapping path is exercised and assertions still verify the post-mapping
properties (inputSchema, outputSchema) on the result of client.evaluator.list().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts`:
- Around line 64-66: The evaluator methods get, update, and run interpolate
arbitrary evaluator identifiers directly into the URL, which allows reserved
characters to break the path; fix this by wrapping the identifier with
encodeURIComponent(...) wherever it's used in the URL templates (e.g., change
this.client.get(`/v2/evaluators/${identifier}`) and the similar
this.client.put(...) and this.client.post(...) calls in Evaluator class methods
get, update, and run to use `/v2/evaluators/${encodeURIComponent(identifier)}`)
so the identifier is treated as a single path segment.
- Around line 205-220: The getter in evaluator.ts is stripping explicit zero
numeric LLM settings by only copying llmConfig fields into result when they are
non-zero; update the code that builds result (the block that checks
llmConfig.temperature, maxTokens, topP, frequencyPenalty, presencePenalty) to
copy values whenever the field is defined (!== undefined) regardless of being 0
so zero is preserved on read-after-write — locate the logic around llmConfig and
result in the get() path of Evaluator (evaluator.ts) and change the conditional
checks to test for undefined only.

---

Nitpick comments:
In `@packages/traceloop-sdk/test/standalone-evaluators.test.ts`:
- Around line 333-361: The test stubs use already camelCased fields
(inputSchema/outputSchema/enumValues) so it bypasses the API-to-model mapping in
BaseDatasetEntity.handleResponse() which uses transformApiResponse(); update the
test fixture in standalone-evaluators.test.ts to return the raw wire-format
(snake_case) payload (e.g., input_schema, output_schema, enum_values) so the
response-mapping path is exercised and assertions still verify the post-mapping
properties (inputSchema, outputSchema) on the result of client.evaluator.list().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e5ca8cf5-492d-4019-8dac-15f703557978

📥 Commits

Reviewing files that changed from the base of the PR and between 50a591c and c3ecce5.

📒 Files selected for processing (5)
  • packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts
  • packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/test/standalone-evaluators.test.ts

Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 23, 2026

Request timed out after 900000ms (requestId=9a405c62-4ad9-4f45-976c-b48aed23c314)

@lenatraceloop lenatraceloop force-pushed the lena/standalone-evaluator-sdk-methods branch from 077fdb5 to 7276037 Compare March 23, 2026 16:17
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts (1)

85-98: Extract shared schema mapping to one helper.

update() and buildPayload() duplicate the same PropertySchema → API field mapping. Centralizing this avoids future drift.

♻️ Suggested refactor
+  private toApiPropertySchema(p: PropertySchema): Record<string, unknown> {
+    return {
+      name: p.name,
+      type: p.type,
+      description: p.description,
+      enum_values: p.enumValues,
+    };
+  }
+
   async update(
@@
     if (patch.inputSchema !== undefined)
-      payload.input_schema = patch.inputSchema.map((p) => ({
-        name: p.name,
-        type: p.type,
-        description: p.description,
-        enum_values: p.enumValues,
-      }));
+      payload.input_schema = patch.inputSchema.map((p) =>
+        this.toApiPropertySchema(p),
+      );
     if (patch.outputSchema !== undefined)
-      payload.output_schema = patch.outputSchema.map((p) => ({
-        name: p.name,
-        type: p.type,
-        description: p.description,
-        enum_values: p.enumValues,
-      }));
+      payload.output_schema = patch.outputSchema.map((p) =>
+        this.toApiPropertySchema(p),
+      );
@@
       model: options.model,
-      input_schema: options.inputSchema.map((p) => ({
-        name: p.name,
-        type: p.type,
-        description: p.description,
-        enum_values: p.enumValues,
-      })),
-      output_schema: options.outputSchema.map((p) => ({
-        name: p.name,
-        type: p.type,
-        description: p.description,
-        enum_values: p.enumValues,
-      })),
+      input_schema: options.inputSchema.map((p) => this.toApiPropertySchema(p)),
+      output_schema: options.outputSchema.map((p) => this.toApiPropertySchema(p)),
     };

Also applies to: 144-155

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts` around lines 85
- 98, Both update() and buildPayload() duplicate mapping logic from
PropertySchema objects (patch.inputSchema / patch.outputSchema) into API fields
(payload.input_schema / payload.output_schema); extract that mapping into a
single helper function (e.g., mapPropertySchemaToApiField or formatSchemaFields)
that takes a PropertySchema and returns { name, type, description, enum_values
}, then replace the inline .map(...) calls in both buildPayload() and update()
to call this helper for inputSchema and outputSchema; ensure the helper is
exported/placed near the evaluator utilities so both call sites can reuse it and
update any types if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts`:
- Around line 64-66: Validate that the evaluator identifier is non-empty and not
just whitespace before building URLs: in methods like get, listExecutions, and
getExecution (and any other methods that accept an identifier) check that
identifier.trim().length > 0 and throw a clear Error (e.g., "Invalid evaluator
identifier") if it fails; only after validation use
encodeURIComponent(identifier) to construct paths such as
`/v2/evaluators/${encodeURIComponent(identifier)}` or
`/v2/evaluators/${encodeURIComponent(identifier)}/executions`.
- Around line 49-58: The list() output mapping assumes data.evaluators is an
array; guard against non-array payloads by checking
Array.isArray(data.evaluators) before mapping. In evaluator.ts update the items
assignment used by the mapping in list() (currently `items: any[] =
data.evaluators ?? []`) to set items to data.evaluators only when
Array.isArray(...) is true, otherwise use an empty array; keep the mapping that
returns id, slug, name, description, source, inputSchema and outputSchema
unchanged.

---

Nitpick comments:
In `@packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts`:
- Around line 85-98: Both update() and buildPayload() duplicate mapping logic
from PropertySchema objects (patch.inputSchema / patch.outputSchema) into API
fields (payload.input_schema / payload.output_schema); extract that mapping into
a single helper function (e.g., mapPropertySchemaToApiField or
formatSchemaFields) that takes a PropertySchema and returns { name, type,
description, enum_values }, then replace the inline .map(...) calls in both
buildPayload() and update() to call this helper for inputSchema and
outputSchema; ensure the helper is exported/placed near the evaluator utilities
so both call sites can reuse it and update any types if needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1429ab7f-59a6-4b03-9223-c28c0c7e8e4e

📥 Commits

Reviewing files that changed from the base of the PR and between 077fdb5 and 7276037.

📒 Files selected for processing (5)
  • packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts
  • packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/test/standalone-evaluators.test.ts
✅ Files skipped from review due to trivial changes (4)
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/test/standalone-evaluators.test.ts
  • packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts

Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
@lenatraceloop
Copy link
Copy Markdown
Member Author

Also addressed the nitpick about snake_case test fixtures — updated the list and get mock responses to use raw wire-format keys (input_schema, output_schema, enum_values) so the transformApiResponse mapping path is actually exercised in the tests. Thanks for the thorough review! 🙏

@lenatraceloop lenatraceloop force-pushed the lena/standalone-evaluator-sdk-methods branch from 4f40f41 to 19c6f72 Compare March 23, 2026 16:28
@lenatraceloop lenatraceloop force-pushed the lena/standalone-evaluator-sdk-methods branch 2 times, most recently from a93341e to 85eacd9 Compare March 24, 2026 13:29
Comment thread packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts Outdated
Comment thread packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts Outdated
Copy link
Copy Markdown
Contributor

@nina-kollman nina-kollman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Add a sample app that demonstrates the flow of creating and patching an evaluator
  2. In JS you need to add jsdocs to make it user friendly - as detaild as possible

JSDocs Added.

Sample app results:
lena.badinter@MREMD84ACDDC sample-app % npm run run:standalone_evaluators

sample-app@0.0.1 run:standalone_evaluators
npm run build && node dist/src/sample_standalone_evaluators.js

sample-app@0.0.1 build

tsc --build tsconfig.json

🚀 Standalone Evaluator SDK — Full Coverage Sample App
API: https://api.traceloop.dev

── 1. CREATE ──────────────────────────────────────────────────
✅ create with all fields → {"id":"cmnbnq18z000g01siegvop9m8","slug":"quality-eval-1774782576827"}
✅ create with minimal fields → {"id":"cmnbnq1n1000i01siea0uelog","slug":"minimal-evaluator-1774782577670"}
✅ create with enum output schema → {"id":"cmnbnq1rj000k01siuxgyolo0","slug":"enum-evaluator-1774782578164"}
✅ create with zero LLM params (temperature=0) → {"id":"cmnbnq1w2000m01sim5qtt6rl"}
✅ create missing name (correctly threw) → Key: 'CreateCustomEvaluatorInput.Name' Error:Field validation for 'Name' failed on the 'required' tag
✅ create missing provider (correctly threw) → Key: 'CreateCustomEvaluatorInput.Provider' Error:Field validation for 'Provider' failed on the 'required' tag
✅ create missing model (correctly threw) → Key: 'CreateCustomEvaluatorInput.Model' Error:Field validation for 'Model' failed on the 'required' tag
✅ create missing messages (correctly threw) → messages is required
✅ create missing input_schema (correctly threw) → input_schema is required
✅ create missing output_schema (correctly threw) → output_schema is required
✅ create input_schema property missing name (correctly threw) → input_schema validation failed: property name is required

── 2. LIST ────────────────────────────────────────────────────
✅ list all evaluators (316 found)
✅ list item has required fields → {"id":"cmhvvwu9w000001w72ccqzc4l","name":"cc","slug":"cc","type":"WORD_COUNT","source":"custom"}
✅ list custom evaluators (281 found)
✅ all listed items have source=custom → true
✅ list prebuilt evaluators (35 found)
✅ all listed items have source=prebuilt → true
✅ list with invalid source (correctly threw) → invalid source parameter: must be 'prebuilt' or 'custom'

── 3. GET ─────────────────────────────────────────────────────
✅ get by ID → {"id":"cmnbnq18z000g01siegvop9m8","slug":"quality-eval-1774782576827","provider":"openai"}
✅ get by slug → {"id":"cmnbnq18z000g01siegvop9m8","slug":"quality-eval-1774782576827"}
⚠️ No prebuilt evaluator with provider found, skipping MBT get test
✅ get returns all required fields → "yes"
✅ get non-existent ID (correctly threw) → Evaluator not found
✅ get non-existent slug (correctly threw) → Evaluator not found
✅ get empty identifier (correctly threw) → Evaluator identifier must be a non-empty string
✅ get whitespace identifier (correctly threw) → Evaluator identifier must be a non-empty string

── 4. UPDATE ──────────────────────────────────────────────────
✅ update name only → {"id":"cmnbnq18z000g01siegvop9m8"}
✅ update by slug → {"id":"cmnbnq18z000g01siegvop9m8"}
✅ update with duplicate name (correctly threw) → evaluator with this name already exists in this project
✅ update config and output_schema → {"id":"cmnbnq18z000g01siegvop9m8"}
✅ update input_schema only → {"id":"cmnbnq18z000g01siegvop9m8"}
✅ update output_schema and config → {"id":"cmnbnq18z000g01siegvop9m8"}
✅ update all fields → {"id":"cmnbnq18z000g01siegvop9m8"}
✅ update with temperature=0 → {"id":"cmnbnq18z000g01siegvop9m8"}
✅ update config without output_schema (correctly threw) (correctly threw) → config and output_schema must be provided together
✅ update output_schema without config (correctly threw) (correctly threw) → config and output_schema must be provided together
✅ update with empty body (correctly threw) → at least one field must be provided for evaluator patch
✅ update non-existent evaluator (correctly threw) → Evaluator not found
✅ update empty identifier (correctly threw) → Evaluator identifier must be a non-empty string

── 5. RUN ─────────────────────────────────────────────────────
✅ run by ID with minimal input → {"result":{"passed":true,"reason":"The statement is clear, accurate, and provides a scientific explanation for the color of the sky."}}
✅ run by slug → {"result":{"passed":true,"reason":"The statement is factually correct and clearly conveys accurate information."}}
✅ run with multiple input fields → {"result":{"passed":true,"reason":"The statement is correct; water boils at 100 degrees Celsius at standard atmospheric pressure (1 atm)."}}
✅ run second evaluator → {"result":{"error":"Error code: 400"}}
✅ run with empty input (correctly threw) → input is required for custom evaluator execution
✅ run non-existent evaluator (correctly threw) → Evaluator not found
✅ run empty identifier (correctly threw) → Evaluator identifier must be a non-empty string
✅ run whitespace identifier (correctly threw) → Evaluator identifier must be a non-empty string

══════════════════════════════════════════════════════════════
Results: 46 passed, 0 failed
✅ All tests passed!
lena.badinter@MREMD84ACDDC sample-app %

@lenatraceloop lenatraceloop force-pushed the lena/standalone-evaluator-sdk-methods branch from 1c33acb to 97b92f7 Compare March 26, 2026 20:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts`:
- Around line 235-239: The run() return currently sets result: data which
returns the whole API response; change it to return the actual evaluation
payload by using data.result (with a safe fallback) so that executionId remains
executionId and result contains only the evaluation output; update the return in
the run() function to set result: data.result ?? data to preserve backward
compatibility if result is missing.
- Around line 207-209: The returned object is extracting the id from a nested
evaluator object but the API/test uses a flat evaluatorId field; in evaluator.ts
update the extraction after handleResponse to read data.evaluatorId (e.g. return
{ id: data.evaluatorId }) instead of data.evaluator?.id, and adjust any related
typing in the enclosing method (the function that currently does "const data =
await this.handleResponse(response); return { id: ... }") so it matches the
create() behavior and the test mock.
- Around line 175-201: The update() payload is incorrectly nesting fields under
payload.config and payload.config.llm_config (see payload.config,
config.llm_config) while tests expect the same flat structure as
create()/buildPayload(); fix by flattening the patch into the top-level payload
(move messages, provider, model, temperature, max_tokens, top_p,
frequency_penalty, presence_penalty, etc. out of payload.config and
payload.config.llm_config into payload) or simply reuse buildPayload(patch) from
buildPayload() to construct the request body so update() and create() share the
same flat contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9402e111-d7da-48b5-917a-7a8b41e2a6e1

📥 Commits

Reviewing files that changed from the base of the PR and between 85eacd9 and 97b92f7.

📒 Files selected for processing (5)
  • packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts
  • packages/traceloop-sdk/src/lib/interfaces/evaluator.interface.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts
  • packages/traceloop-sdk/test/standalone-evaluators.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts

Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
Comment thread packages/traceloop-sdk/src/lib/client/evaluator/evaluator.ts
@lenatraceloop lenatraceloop force-pushed the lena/standalone-evaluator-sdk-methods branch 2 times, most recently from ee0d65c to 8a5e1e7 Compare March 26, 2026 21:02
@lenatraceloop lenatraceloop force-pushed the lena/standalone-evaluator-sdk-methods branch from 349c861 to c7f199d Compare March 29, 2026 12:13
@lenatraceloop lenatraceloop merged commit dd89e8d into main Mar 29, 2026
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants