feat(sdk): add synonym field mapping for experiment evaluator inputs by lenatraceloop · Pull Request #895 · traceloop/openllmetry-js

lenatraceloop · 2026-03-17T10:10:16Z

Task output fields are now normalized before evaluator invocation.
If a task returns a synonym (e.g. 'answer', 'response') instead of
the canonical field name (e.g. 'completion', 'text'), the value is
remapped and the original key is removed — matching Python SDK behavior.
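
As a rough illustration of the remapping described above, the TypeScript sketch below normalizes a task output against a list of required fields. The synonym groups and the names `SYNONYM_GROUPS`, `getSynonyms`, and `normalizeOutput` are assumptions made for this example, not the SDK's actual definitions. Only the behavior follows the PR: an exact match wins, a synonym is remapped and its original key removed, and unrelated keys pass through.

```typescript
// Hypothetical synonym groups for illustration; the SDK's real groups may differ.
const SYNONYM_GROUPS: string[][] = [
  ["completion", "text", "answer", "response"],
  ["question", "prompt", "query"],
];

// Own-property check, so inherited keys such as "toString" are never matched.
function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Returns the other members of the group that contains `field`, if any.
function getSynonyms(field: string): string[] {
  const group = SYNONYM_GROUPS.find((g) => g.includes(field));
  return group ? group.filter((f) => f !== field) : [];
}

function normalizeOutput(
  taskOutput: Record<string, unknown>,
  requiredFields: string[],
): Record<string, unknown> {
  const normalized: Record<string, unknown> = {};
  const consumed = new Set<string>();

  // First pass: satisfy each required field, preferring an exact match.
  for (const required of requiredFields) {
    if (hasOwn(taskOutput, required)) {
      normalized[required] = taskOutput[required];
      consumed.add(required);
      continue;
    }
    const synonym = getSynonyms(required).find((s) => hasOwn(taskOutput, s));
    if (synonym !== undefined) {
      // Remap the value to the required name; the synonym key is dropped.
      normalized[required] = taskOutput[synonym];
      consumed.add(synonym);
    }
  }

  // Second pass: keep every other key unchanged.
  for (const [key, value] of Object.entries(taskOutput)) {
    if (!consumed.has(key) && !hasOwn(normalized, key)) {
      normalized[key] = value;
    }
  }
  return normalized;
}
```

Under these assumed groups, a task that returns `{ answer: "Paris", latency: 12 }` for an evaluator requiring `completion` would come out with `completion` and `latency` keys and no `answer` key.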

Also adds stopOnError support in runLocally, which was defined in the
interface but never consumed.

validateEvaluatorInput (in generated/evaluators/mbt-evaluators.ts) was intentionally left untouched. It is auto-generated code and would be overwritten on the next codegen run. It also serves a different purpose: it is a public low-level utility for exact-field presence checks. Synonym normalization is applied upstream in the experiment flow (experiment.ts → validateAndNormalizeTaskOutput) before task output reaches any evaluator, so validateEvaluatorInput always receives already-normalized fields.

Summary by CodeRabbit

New Features
- Field synonym support: evaluator inputs accept multiple field name variants (e.g., answer, completion, response) with automatic normalization.
- Task output validation: required fields are validated and normalized with actionable suggestions when missing.
- Local run control: new stopOnError option to halt on first per-row error.
Tests
- Comprehensive unit tests added for synonym mapping, normalization, validation, suggestions, and help formatting.

CLAassistant · 2026-03-17T10:10:25Z

All committers have signed the CLA.

coderabbitai · 2026-03-17T10:10:36Z

Walkthrough

Adds a field synonym mapping and normalization module for evaluator inputs, integrates validation/normalization into the experiment execution flow (local and GitHub), introduces per-row error handling and a stopOnError option, and adds unit tests covering synonym resolution, normalization, suggestions, and errors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Field Mapping System `packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts` | New module: defines synonym groups, provides `getFieldSynonyms`, `normalizeTaskOutput`, `getFieldSuggestions`, `formatFieldHelp`, `validateAndNormalizeTaskOutput`, and `EvaluatorWithRequiredFields` interface; builds normalized outputs and actionable missing-field errors. |
| Experiment Integration `packages/traceloop-sdk/src/lib/client/experiment/experiment.ts` | Integrates `validateAndNormalizeTaskOutput` into task execution paths (local & GitHub), updates `executeTasksLocally` signature to accept evaluators for validation, adds per-row try/catch error handling, and introduces `stopOnError` local-run option. |
| Field Mapping Tests `packages/traceloop-sdk/test/field-mapping.test.ts` | New comprehensive tests for synonym lookup, normalization behavior (including exact-match precedence), suggestion/help formatting, and `validateAndNormalizeTaskOutput` error and success cases. |

Sequence Diagram

sequenceDiagram
    participant Exp as Experiment
    participant FM as FieldMapping
    participant Eval as Evaluators

    Exp->>Exp: For each task row
    Exp->>FM: validateAndNormalizeTaskOutput(taskOutput, evaluators)
    FM->>FM: Aggregate requiredInputFields
    FM->>FM: normalizeTaskOutput(taskOutput, requiredFields)
    FM->>FM: Validate per-evaluator requirements

    alt Missing required fields
        FM-->>Exp: Throw detailed error with suggestions & help
    else All required fields satisfied
        FM-->>Exp: Return normalized output
        Exp->>Eval: Run evaluators with normalized output
        Eval-->>Exp: Return evaluation results
    end
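
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the module's real code: `EvaluatorSpec`, `Normalizer`, and `validateAndNormalize` are hypothetical names, the normalization step is passed in as a function so the sketch stays self-contained, and the exact wording of the SDK's error message is only partly known from this thread.

```typescript
// Hypothetical shapes for illustration only.
interface EvaluatorSpec {
  slug: string;
  requiredInputFields: string[];
}

type Normalizer = (
  output: Record<string, unknown>,
  requiredFields: string[],
) => Record<string, unknown>;

function validateAndNormalize(
  taskOutput: Record<string, unknown>,
  evaluators: EvaluatorSpec[],
  normalize: Normalizer,
): Record<string, unknown> {
  // 1. Aggregate required fields across all evaluators, without duplicates.
  const required = new Set<string>();
  for (const evaluator of evaluators) {
    for (const field of evaluator.requiredInputFields) {
      required.add(field);
    }
  }

  // 2. Normalize once against the combined list.
  const normalized = normalize(taskOutput, Array.from(required));

  // 3. Check each evaluator's own requirements against the normalized output.
  const problems: string[] = [];
  for (const evaluator of evaluators) {
    const missing = evaluator.requiredInputFields.filter(
      (field) => !Object.prototype.hasOwnProperty.call(normalized, field),
    );
    if (missing.length > 0) {
      problems.push(`${evaluator.slug} is missing: ${missing.join(", ")}`);
    }
  }

  // 4. Fail with one actionable error, or return the normalized output.
  if (problems.length > 0) {
    const lines = [
      "Task output is missing required fields:",
      ...problems,
      "",
      `Task output contains: ${JSON.stringify(Object.keys(taskOutput))}`,
      "",
      "Hint: Update your task function to return an object with the required fields.",
    ];
    throw new Error(lines.join("\n"));
  }
  return normalized;
}
```

Validating per evaluator after a single normalization pass lets the error name which evaluator is unsatisfied, rather than only listing missing fields.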

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through synonym lanes today,
Mapped answers, prompts, and text in play,
Normalized outputs, tidy and neat,
Per-row checks with errors kept sweet,
A rabbit's cheer for fields aligned — hooray! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(sdk): add synonym field mapping for experiment evaluator inputs' accurately and specifically describes the main change: adding a synonym field mapping system for evaluator inputs in the SDK. |

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/sample-app/src/sample_synonym_experiment.ts`:
- Around line 105-109: The forEach callback on the errors array uses an
expression-bodied arrow that returns console.log and triggers the lint rule;
change the callback passed to errors.forEach in sample_synonym_experiment.ts to
a block-bodied function (e.g., errors.forEach((e: string, i: number) => {
console.log(`  ${i + 1}. ${e}`); });) so the callback does not implicitly return
the console.log call and the lint warning is avoided.

In `@packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts`:
- Around line 44-76: The membership checks using the `in` operator on taskOutput
and normalized can capture inherited prototype properties and drop legitimate
passthrough fields; update the checks in the first pass and second pass to use
own-property checks (e.g., replace `requiredField in taskOutput`, `synonym in
taskOutput`, and `key in normalized` with
Object.prototype.hasOwnProperty.call(taskOutput, requiredField),
Object.prototype.hasOwnProperty.call(taskOutput, synonym), and
Object.prototype.hasOwnProperty.call(normalized, key) respectively) so mapping
in requiredFields, synonym lookup in getSynonyms, and passthrough preservation
into normalized only consider own properties; keep mappedKeys logic unchanged.
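
The difference between `in` and an own-property check can be shown with a small sketch. The helper names `hasOwn` and `copyPassthrough` are made up for this example, and the input is a contrived task output whose key happens to collide with a member of `Object.prototype`.

```typescript
// Own-property check that ignores anything inherited from Object.prototype.
function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Copies keys from taskOutput into a copy of `normalized`, skipping keys that
// the supplied `exists` predicate reports as already present.
function copyPassthrough(
  taskOutput: Record<string, unknown>,
  normalized: Record<string, unknown>,
  exists: (obj: Record<string, unknown>, key: string) => boolean,
): Record<string, unknown> {
  const result: Record<string, unknown> = { ...normalized };
  for (const key of Object.keys(taskOutput)) {
    if (!exists(result, key)) {
      result[key] = taskOutput[key];
    }
  }
  return result;
}

// A task output parsed from JSON, with an own key named "constructor".
const taskOutput = JSON.parse(
  '{"answer": "Paris", "constructor": "custom"}',
) as Record<string, unknown>;

// `in` also sees the inherited Object.prototype.constructor, so the
// legitimate passthrough field is silently dropped.
const withIn = copyPassthrough(taskOutput, {}, (obj, key) => key in obj);

// The own-property check keeps it.
const withOwn = copyPassthrough(taskOutput, {}, hasOwn);

console.log(Object.keys(withIn)); // [ 'answer' ]
console.log(Object.keys(withOwn)); // [ 'answer', 'constructor' ]
```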

In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 144-198: The catch block currently only pushes error messages to
taskErrors so failed rows are omitted from taskResults; update the catch to
construct and push a TaskResponse for the failed row (set input to row, output
to null/empty, include metadata.rowId and timestamps, and set an error field to
the derived errorMsg) before pushing to taskErrors, and only rethrow if
stopOnError is true; this mirrors executeTasksLocally() behavior and makes the
TaskResponse.error branch reachable for local runs (refer to TaskResponse,
taskResults, taskErrors, stopOnError, and executeTasksLocally).
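
One way to read this suggestion is the per-row loop below. It is a simplified sketch under stated assumptions: `RowResult` and `runRows` are hypothetical stand-ins, and the SDK's real `TaskResponse` carries more fields (the comment mentions `metadata.rowId` and timestamps) that are omitted here.

```typescript
// Hypothetical, trimmed-down stand-in for the SDK's TaskResponse.
interface RowResult {
  input: Record<string, unknown>;
  output: Record<string, unknown> | null;
  error?: string;
}

type Task = (row: Record<string, unknown>) => Promise<Record<string, unknown>>;

async function runRows(
  rows: Record<string, unknown>[],
  task: Task,
  stopOnError = false,
): Promise<{ results: RowResult[]; errors: string[] }> {
  const results: RowResult[] = [];
  const errors: string[] = [];

  for (const row of rows) {
    try {
      const output = await task(row);
      results.push({ input: row, output });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Record the failed row so it still appears in the results...
      results.push({ input: row, output: null, error: message });
      // ...and keep the flat error list as before.
      errors.push(message);
      // Halt on the first failure only when the caller asked for it.
      if (stopOnError) {
        throw err;
      }
    }
  }
  return { results, errors };
}
```

With `stopOnError` left at `false`, a failing row yields a result entry with `output: null` and an `error` string while later rows still run; with `true`, the returned promise rejects with the first error.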

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 863ad48b-c499-45fb-a541-2ec680cd7a9a

📥 Commits

Reviewing files that changed from the base of the PR and between 9df89a0 and 2646577.

⛔ Files ignored due to path filters (1)

packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts is excluded by !**/generated/**

📒 Files selected for processing (5)

packages/sample-app/package.json
packages/sample-app/src/sample_synonym_experiment.ts
packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/test/field-mapping.test.ts

coderabbitai

🧹 Nitpick comments (1)

packages/sample-app/src/sample_synonym_experiment.ts (1)

34-43: Consider validating TRACELOOP_API_KEY for consistency.

datasetSlug is validated with a helpful error message, but TRACELOOP_API_KEY uses a non-null assertion without validation. If the key is missing, the SDK may throw a less informative error downstream.

💡 Suggested improvement

 if (!datasetSlug) {
   console.error("TRACELOOP_DATASET_SLUG is required");
   process.exit(1);
 }

+if (!process.env.TRACELOOP_API_KEY) {
+  console.error("TRACELOOP_API_KEY is required");
+  process.exit(1);
+}
+
 const traceloop = new TraceloopClient({
-  apiKey: process.env.TRACELOOP_API_KEY!,
+  apiKey: process.env.TRACELOOP_API_KEY,
   appName: "synonym-field-mapping-sample",
   baseUrl: process.env.TRACELOOP_API_ENDPOINT,
 });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/sample-app/src/sample_synonym_experiment.ts` around lines 34 - 43,
Validate the TRACELOOP_API_KEY before constructing TraceloopClient: check
process.env.TRACELOOP_API_KEY (the symbol referenced in the TraceloopClient
instantiation) and if missing, log a clear error (similar to the datasetSlug
check) and exit process with code 1; then pass the validated key into new
TraceloopClient so the non-null assertion is removed and downstream errors are
avoided.

ℹ️ Review info

Run ID: 3c4521a0-474b-418f-86af-c460e1219899

📥 Commits

Reviewing files that changed from the base of the PR and between 2646577 and aa4f476.

📒 Files selected for processing (2)

packages/sample-app/src/sample_synonym_experiment.ts
packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)

134-142: Consider extracting evaluator-validation filtering into one helper.

The same filter logic is duplicated in local and GitHub flows; a shared helper would reduce drift risk.

Also applies to: 507-512

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts` around lines
134 - 142, Extract the duplicated filter into a single helper like
`getEvaluatorsForValidation(evaluators: unknown[]):
EvaluatorWithRequiredFields[]` and replace both occurrences that currently
compute `evaluatorsForValidation` (the block using the type guard `(e): e is
EvaluatorWithConfig => typeof e === "object" && !!(e as
EvaluatorWithConfig).requiredInputFields?.length`) with calls to that helper;
ensure the helper returns the same `EvaluatorWithRequiredFields[]` type and
lives in a shared module so both the local and GitHub flows can import it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 507-512: In runInGithub, ensure evaluators are validated before
any filter/map that assumes objects: guard against malformed entries (null,
non-object) by invoking validateRunOptions or adding an explicit check on the
evaluators array and each evaluator (e.g., confirm typeof evaluator === "object"
&& evaluator !== null) before using
EvaluatorWithConfig/EvaluatorWithRequiredFields logic; if validation fails,
throw a clear validation error instead of letting lines that reference (e as
EvaluatorWithConfig).requiredInputFields or similar throw a runtime exception.
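
A guard of the kind described might look like the sketch below. It assumes evaluators are given either as slug strings or as config objects, which is inferred from the `typeof e === "object"` check quoted in this thread rather than confirmed; `EvaluatorInput` and `validateEvaluators` are hypothetical names.

```typescript
// Assumed input shape: a slug string or a config object.
type EvaluatorInput = string | { name: string; requiredInputFields?: string[] };

function validateEvaluators(evaluators: unknown): EvaluatorInput[] {
  if (!Array.isArray(evaluators)) {
    throw new Error("evaluators must be an array");
  }
  evaluators.forEach((evaluator: unknown, index: number) => {
    const isSlug = typeof evaluator === "string" && evaluator.length > 0;
    // typeof null is "object", so null has to be excluded explicitly;
    // otherwise reading `.requiredInputFields` from it throws a TypeError.
    const isObject = typeof evaluator === "object" && evaluator !== null;
    if (!isSlug && !isObject) {
      throw new Error(
        `Invalid evaluator at index ${index}: expected a non-empty string or an object`,
      );
    }
  });
  return evaluators as EvaluatorInput[];
}
```

Running this check before any `filter` or `map` over the array turns a malformed entry into a clear validation error instead of a runtime exception deep in the flow.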

ℹ️ Review info

Run ID: 079efb6a-871b-4f72-af71-67d15a62df0d

📥 Commits

Reviewing files that changed from the base of the PR and between aa4f476 and 841af9c.

📒 Files selected for processing (2)

packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)

135-142: Extract normalization prep into one helper to avoid logic drift.

The evaluator filtering + normalization block is duplicated in two execution paths. A shared helper will keep behavior aligned and reduce maintenance risk.

♻️ Proposed refactor

+  private getEvaluatorsForValidation(
+    evaluators: EvaluatorDetails[],
+  ): EvaluatorWithRequiredFields[] {
+    return evaluators.filter(
+      (e): e is EvaluatorWithConfig =>
+        typeof e === "object" &&
+        !!(e as EvaluatorWithConfig).requiredInputFields?.length,
+    );
+  }
+
+  private normalizeTaskOutputIfNeeded(
+    rawOutput: unknown,
+    evaluatorsForValidation: EvaluatorWithRequiredFields[],
+  ): Record<string, any> {
+    const output = rawOutput as Record<string, any>;
+    return evaluatorsForValidation.length > 0
+      ? (validateAndNormalizeTaskOutput(
+          output,
+          evaluatorsForValidation,
+        ) as Record<string, any>)
+      : output;
+  }
@@
-      const evaluatorsForValidation: EvaluatorWithRequiredFields[] =
-        evaluators.filter(
-          (e): e is EvaluatorWithConfig =>
-            typeof e === "object" &&
-            !!(e as EvaluatorWithConfig).requiredInputFields?.length,
-        );
+      const evaluatorsForValidation =
+        this.getEvaluatorsForValidation(evaluators);
@@
-          let taskOutput = rawOutput as Record<string, any>;
-          if (evaluatorsForValidation.length > 0) {
-            taskOutput = validateAndNormalizeTaskOutput(
-              taskOutput,
-              evaluatorsForValidation,
-            ) as Record<string, any>;
-          }
+          const taskOutput = this.normalizeTaskOutputIfNeeded(
+            rawOutput,
+            evaluatorsForValidation,
+          );
@@
-    const evaluatorsForValidation = evaluators.filter(
-      (e): e is EvaluatorWithConfig =>
-        typeof e === "object" &&
-        !!(e as EvaluatorWithConfig).requiredInputFields?.length,
-    );
+    const evaluatorsForValidation =
+      this.getEvaluatorsForValidation(evaluators);
@@
-          let output = rawOutput as Record<string, any>;
-          if (evaluatorsForValidation.length > 0) {
-            output = validateAndNormalizeTaskOutput(
-              output,
-              evaluatorsForValidation,
-            ) as Record<string, any>;
-          }
+          const output = this.normalizeTaskOutputIfNeeded(
+            rawOutput,
+            evaluatorsForValidation,
+          );

Also applies to: 150-157, 436-440, 447-454

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts` around lines
135 - 142, The evaluator filtering + normalization logic is duplicated; extract
it into a single helper (e.g., prepareEvaluatorsForValidation or
filterAndNormalizeEvaluators) that accepts the evaluators array and returns
EvaluatorWithRequiredFields[] by applying the current type guard (typeof e ===
"object" && !!(e as EvaluatorWithConfig).requiredInputFields?.length) and any
normalization steps; replace the inline blocks that build
evaluatorsForValidation (the occurrences around the variable
evaluatorsForValidation and the duplicated blocks at the other noted locations)
with calls to this helper so all paths use the same logic and keep behavior
aligned.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts`:
- Around line 177-178: Prettier is failing due to mixed string
concatenation/quotes for the message assembly in field-mapping.ts; replace the
two separate additions to the variable message with a single properly formatted
template literal that includes both the Task output and Hint text (e.g., use
message += `\n\nTask output contains:
${JSON.stringify(Object.keys(taskOutput))}\n\nHint: Update your task function to
return an object with the required fields.`) so the block is consistently
formatted and passes `pnpm prettier --check .`.

ℹ️ Review info

Run ID: 37bd8558-eb0d-46c6-98c5-bc4d5f775439

📥 Commits

Reviewing files that changed from the base of the PR and between 841af9c and 88a9dd9.

📒 Files selected for processing (3)

packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/test/field-mapping.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

packages/traceloop-sdk/test/field-mapping.test.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 137-142: The filter predicate around evaluators should validate
that requiredInputFields is an actual string[] before treating the evaluator as
an EvaluatorWithConfig: replace the inline check in the evaluators.filter calls
with a reusable type guard (e.g., isEvaluatorWithRequiredFields or
isEvaluatorWithConfig) that returns true only when typeof e === "object" &&
Array.isArray((e as any).requiredInputFields) && (e as
any).requiredInputFields.every(item => typeof item === "string"); update both
call sites (the current filter at the top and the similar one around lines
436-440) to use that type guard so normalization/validation only runs on
safely-typed requiredInputFields.
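
A reusable type guard along these lines could be written as in the sketch below. The interface is redeclared locally so the example is self-contained, and the non-empty check is kept from the original `?.length` predicate quoted earlier in the thread; treat the exact shape as an assumption.

```typescript
// Local stand-in for the interface exported by field-mapping.ts.
interface EvaluatorWithRequiredFields {
  requiredInputFields: string[];
}

function isEvaluatorWithRequiredFields(
  e: unknown,
): e is EvaluatorWithRequiredFields {
  // Reject strings, numbers, undefined, and null up front.
  if (typeof e !== "object" || e === null) {
    return false;
  }
  const fields = (e as { requiredInputFields?: unknown }).requiredInputFields;
  // Accept only a non-empty array whose items are all strings.
  return (
    Array.isArray(fields) &&
    fields.length > 0 &&
    fields.every((item) => typeof item === "string")
  );
}
```

Because it is a type predicate, it can be passed directly to `filter`, for example `evaluators.filter(isEvaluatorWithRequiredFields)`, and the result is typed as `EvaluatorWithRequiredFields[]` at both call sites.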

ℹ️ Review info

Run ID: 16a19fee-ea0a-406a-9933-0c14e3ba888e

📥 Commits

Reviewing files that changed from the base of the PR and between 88a9dd9 and 11a2212.

⛔ Files ignored due to path filters (1)

packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.ts is excluded by !**/generated/**

📒 Files selected for processing (3)

packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts
packages/traceloop-sdk/test/field-mapping.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

packages/traceloop-sdk/test/field-mapping.test.ts
packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts

nina-kollman

I can also see that the require

…895)

coderabbitai bot reviewed Mar 17, 2026

Comment thread packages/sample-app/src/sample_synonym_experiment.ts Outdated

Comment thread packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts

Comment thread packages/traceloop-sdk/src/lib/client/experiment/experiment.ts

lenatraceloop added a commit that referenced this pull request Mar 17, 2026

fix(sdk): address CodeRabbit review comments on PR #895

aa4f476

coderabbitai bot reviewed Mar 17, 2026

lenatraceloop added a commit that referenced this pull request Mar 17, 2026

fix(sdk): address CodeRabbit review comments 2 on PR #895

841af9c

coderabbitai bot reviewed Mar 17, 2026

Comment thread packages/traceloop-sdk/src/lib/client/experiment/experiment.ts Outdated

nina-kollman reviewed Mar 17, 2026

Comment thread packages/sample-app/src/sample_synonym_experiment.ts Outdated

nina-kollman reviewed Mar 17, 2026

Comment thread packages/sample-app/src/sample_synonym_experiment.ts Outdated

nina-kollman reviewed Mar 17, 2026

Comment thread packages/sample-app/package.json

nina-kollman reviewed Mar 17, 2026

Comment thread packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts

nina-kollman reviewed Mar 17, 2026

Comment thread packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts Outdated