feat(sdk): add synonym field mapping for experiment evaluator inputs#895
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a field synonym mapping and normalization module for evaluator inputs, integrates validation/normalization into the experiment execution flow (local and GitHub), introduces per-row error handling and a Changes
Sequence DiagramsequenceDiagram
participant Exp as Experiment
participant FM as FieldMapping
participant Eval as Evaluators
Exp->>Exp: For each task row
Exp->>FM: validateAndNormalizeTaskOutput(taskOutput, evaluators)
FM->>FM: Aggregate requiredInputFields
FM->>FM: normalizeTaskOutput(taskOutput, requiredFields)
FM->>FM: Validate per-evaluator requirements
alt Missing required fields
FM-->>Exp: Throw detailed error with suggestions & help
else All required fields satisfied
FM-->>Exp: Return normalized output
Exp->>Eval: Run evaluators with normalized output
Eval-->>Exp: Return evaluation results
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/sample-app/src/sample_synonym_experiment.ts`:
- Around line 105-109: The forEach callback on the errors array uses an
expression-bodied arrow that returns console.log and triggers the lint rule;
change the callback passed to errors.forEach in sample_synonym_experiment.ts to
a block-bodied function (e.g., errors.forEach((e: string, i: number) => {
console.log(` ${i + 1}. ${e}`); });) so the callback does not implicitly return
the console.log call and the lint warning is avoided.
In `@packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts`:
- Around line 44-76: The membership checks using the `in` operator on taskOutput
and normalized can capture inherited prototype properties and drop legitimate
passthrough fields; update the checks in the first pass and second pass to use
own-property checks (e.g., replace `requiredField in taskOutput`, `synonym in
taskOutput`, and `key in normalized` with
Object.prototype.hasOwnProperty.call(taskOutput, requiredField),
Object.prototype.hasOwnProperty.call(taskOutput, synonym), and
Object.prototype.hasOwnProperty.call(normalized, key) respectively) so mapping
in requiredFields, synonym lookup in getSynonyms, and passthrough preservation
into normalized only consider own properties; keep mappedKeys logic unchanged.
In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 144-198: The catch block currently only pushes error messages to
taskErrors so failed rows are omitted from taskResults; update the catch to
construct and push a TaskResponse for the failed row (set input to row, output
to null/empty, include metadata.rowId and timestamps, and set an error field to
the derived errorMsg) before pushing to taskErrors, and only rethrow if
stopOnError is true; this mirrors executeTasksLocally() behavior and makes the
TaskResponse.error branch reachable for local runs (refer to TaskResponse,
taskResults, taskErrors, stopOnError, and executeTasksLocally).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 863ad48b-c499-45fb-a541-2ec680cd7a9a
⛔ Files ignored due to path filters (1)
packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.tsis excluded by!**/generated/**
📒 Files selected for processing (5)
packages/sample-app/package.jsonpackages/sample-app/src/sample_synonym_experiment.tspackages/traceloop-sdk/src/lib/client/evaluator/field-mapping.tspackages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/test/field-mapping.test.ts
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/sample-app/src/sample_synonym_experiment.ts (1)
34-43: Consider validatingTRACELOOP_API_KEYfor consistency.
datasetSlugis validated with a helpful error message, butTRACELOOP_API_KEYuses a non-null assertion without validation. If the key is missing, the SDK may throw a less informative error downstream.💡 Suggested improvement
if (!datasetSlug) { console.error("TRACELOOP_DATASET_SLUG is required"); process.exit(1); } +if (!process.env.TRACELOOP_API_KEY) { + console.error("TRACELOOP_API_KEY is required"); + process.exit(1); +} + const traceloop = new TraceloopClient({ - apiKey: process.env.TRACELOOP_API_KEY!, + apiKey: process.env.TRACELOOP_API_KEY, appName: "synonym-field-mapping-sample", baseUrl: process.env.TRACELOOP_API_ENDPOINT, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sample-app/src/sample_synonym_experiment.ts` around lines 34 - 43, Validate the TRACELOOP_API_KEY before constructing TraceloopClient: check process.env.TRACELOOP_API_KEY (the symbol referenced in the TraceloopClient instantiation) and if missing, log a clear error (similar to the datasetSlug check) and exit process with code 1; then pass the validated key into new TraceloopClient so the non-null assertion is removed and downstream errors are avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/sample-app/src/sample_synonym_experiment.ts`:
- Around line 34-43: Validate the TRACELOOP_API_KEY before constructing
TraceloopClient: check process.env.TRACELOOP_API_KEY (the symbol referenced in
the TraceloopClient instantiation) and if missing, log a clear error (similar to
the datasetSlug check) and exit process with code 1; then pass the validated key
into new TraceloopClient so the non-null assertion is removed and downstream
errors are avoided.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3c4521a0-474b-418f-86af-c460e1219899
📒 Files selected for processing (2)
packages/sample-app/src/sample_synonym_experiment.tspackages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
134-142: Consider extracting evaluator-validation filtering into one helper.The same filter logic is duplicated in local and GitHub flows; a shared helper would reduce drift risk.
Also applies to: 507-512
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts` around lines 134 - 142, Extract the duplicated filter into a single helper like `getEvaluatorsForValidation(evaluators: unknown[]): EvaluatorWithRequiredFields[]` and replace both occurrences that currently compute `evaluatorsForValidation` (the block using the type guard `(e): e is EvaluatorWithConfig => typeof e === "object" && !!(e as EvaluatorWithConfig).requiredInputFields?.length`) with calls to that helper; ensure the helper returns the same `EvaluatorWithRequiredFields[]` type and lives in a shared module so both the local and GitHub flows can import it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 507-512: In runInGithub, ensure evaluators are validated before
any filter/map that assumes objects: guard against malformed entries (null,
non-object) by invoking validateRunOptions or adding an explicit check on the
evaluators array and each evaluator (e.g., confirm typeof evaluator === "object"
&& evaluator !== null) before using
EvaluatorWithConfig/EvaluatorWithRequiredFields logic; if validation fails,
throw a clear validation error instead of letting lines that reference (e as
EvaluatorWithConfig).requiredInputFields or similar throw a runtime exception.
---
Nitpick comments:
In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 134-142: Extract the duplicated filter into a single helper like
`getEvaluatorsForValidation(evaluators: unknown[]):
EvaluatorWithRequiredFields[]` and replace both occurrences that currently
compute `evaluatorsForValidation` (the block using the type guard `(e): e is
EvaluatorWithConfig => typeof e === "object" && !!(e as
EvaluatorWithConfig).requiredInputFields?.length`) with calls to that helper;
ensure the helper returns the same `EvaluatorWithRequiredFields[]` type and
lives in a shared module so both the local and GitHub flows can import it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 079efb6a-871b-4f72-af71-67d15a62df0d
📒 Files selected for processing (2)
packages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/src/lib/interfaces/experiment.interface.ts
8a6cb52 to
50cfab3
Compare
50cfab3 to
88a9dd9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/traceloop-sdk/src/lib/client/experiment/experiment.ts (1)
135-142: Extract normalization prep into one helper to avoid logic drift.The evaluator filtering + normalization block is duplicated in two execution paths. A shared helper will keep behavior aligned and reduce maintenance risk.
♻️ Proposed refactor
+ private getEvaluatorsForValidation( + evaluators: EvaluatorDetails[], + ): EvaluatorWithRequiredFields[] { + return evaluators.filter( + (e): e is EvaluatorWithConfig => + typeof e === "object" && + !!(e as EvaluatorWithConfig).requiredInputFields?.length, + ); + } + + private normalizeTaskOutputIfNeeded( + rawOutput: unknown, + evaluatorsForValidation: EvaluatorWithRequiredFields[], + ): Record<string, any> { + const output = rawOutput as Record<string, any>; + return evaluatorsForValidation.length > 0 + ? (validateAndNormalizeTaskOutput( + output, + evaluatorsForValidation, + ) as Record<string, any>) + : output; + } @@ - const evaluatorsForValidation: EvaluatorWithRequiredFields[] = - evaluators.filter( - (e): e is EvaluatorWithConfig => - typeof e === "object" && - !!(e as EvaluatorWithConfig).requiredInputFields?.length, - ); + const evaluatorsForValidation = + this.getEvaluatorsForValidation(evaluators); @@ - let taskOutput = rawOutput as Record<string, any>; - if (evaluatorsForValidation.length > 0) { - taskOutput = validateAndNormalizeTaskOutput( - taskOutput, - evaluatorsForValidation, - ) as Record<string, any>; - } + const taskOutput = this.normalizeTaskOutputIfNeeded( + rawOutput, + evaluatorsForValidation, + ); @@ - const evaluatorsForValidation = evaluators.filter( - (e): e is EvaluatorWithConfig => - typeof e === "object" && - !!(e as EvaluatorWithConfig).requiredInputFields?.length, - ); + const evaluatorsForValidation = + this.getEvaluatorsForValidation(evaluators); @@ - let output = rawOutput as Record<string, any>; - if (evaluatorsForValidation.length > 0) { - output = validateAndNormalizeTaskOutput( - output, - evaluatorsForValidation, - ) as Record<string, any>; - } + const output = this.normalizeTaskOutputIfNeeded( + rawOutput, + evaluatorsForValidation, + );Also applies to: 150-157, 436-440, 447-454
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts` around lines 135 - 142, The evaluator filtering + normalization logic is duplicated; extract it into a single helper (e.g., prepareEvaluatorsForValidation or filterAndNormalizeEvaluators) that accepts the evaluators array and returns EvaluatorWithRequiredFields[] by applying the current type guard (typeof e === "object" && !!(e as EvaluatorWithConfig).requiredInputFields?.length) and any normalization steps; replace the inline blocks that build evaluatorsForValidation (the occurrences around the variable evaluatorsForValidation and the duplicated blocks at the other noted locations) with calls to this helper so all paths use the same logic and keep behavior aligned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts`:
- Around line 177-178: Prettier is failing due to mixed string
concatenation/quotes for the message assembly in field-mapping.ts; replace the
two separate additions to the variable message with a single properly formatted
template literal that includes both the Task output and Hint text (e.g., use
message += `\n\nTask output contains:
${JSON.stringify(Object.keys(taskOutput))}\n\nHint: Update your task function to
return an object with the required fields.`) so the block is consistently
formatted and passes pnpm prettier --check ..
---
Nitpick comments:
In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 135-142: The evaluator filtering + normalization logic is
duplicated; extract it into a single helper (e.g.,
prepareEvaluatorsForValidation or filterAndNormalizeEvaluators) that accepts the
evaluators array and returns EvaluatorWithRequiredFields[] by applying the
current type guard (typeof e === "object" && !!(e as
EvaluatorWithConfig).requiredInputFields?.length) and any normalization steps;
replace the inline blocks that build evaluatorsForValidation (the occurrences
around the variable evaluatorsForValidation and the duplicated blocks at the
other noted locations) with calls to this helper so all paths use the same logic
and keep behavior aligned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 37bd8558-eb0d-46c6-98c5-bc4d5f775439
📒 Files selected for processing (3)
packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.tspackages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/test/field-mapping.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/traceloop-sdk/test/field-mapping.test.ts
eb1d8c6 to
11a2212
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/traceloop-sdk/src/lib/client/experiment/experiment.ts`:
- Around line 137-142: The filter predicate around evaluators should validate
that requiredInputFields is an actual string[] before treating the evaluator as
an EvaluatorWithConfig: replace the inline check in the evaluators.filter calls
with a reusable type guard (e.g., isEvaluatorWithRequiredFields or
isEvaluatorWithConfig) that returns true only when typeof e === "object" &&
Array.isArray((e as any).requiredInputFields) && (e as
any).requiredInputFields.every(item => typeof item === "string"); update both
call sites (the current filter at the top and the similar one around lines
436-440) to use that type guard so normalization/validation only runs on
safely-typed requiredInputFields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 16a19fee-ea0a-406a-9933-0c14e3ba888e
⛔ Files ignored due to path filters (1)
packages/traceloop-sdk/src/lib/generated/evaluators/mbt-evaluators.tsis excluded by!**/generated/**
📒 Files selected for processing (3)
packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.tspackages/traceloop-sdk/src/lib/client/experiment/experiment.tspackages/traceloop-sdk/test/field-mapping.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/traceloop-sdk/test/field-mapping.test.ts
- packages/traceloop-sdk/src/lib/client/evaluator/field-mapping.ts
cc56053 to
720e31b
Compare
80e9acd to
11643bd
Compare
23df3b5 to
c062cc5
Compare
nina-kollman
left a comment
There was a problem hiding this comment.
I can also see that the require
c062cc5 to
a9afe53
Compare
Task output fields are now normalized before evaluator invocation.
If a task returns a synonym (e.g. 'answer', 'response') instead of
the canonical field name (e.g. 'completion', 'text'), the value is
remapped and the original key is removed — matching Python SDK behavior.
Also adds stopOnError support in runLocally, which was defined in the
interface but never consumed.
validateEvaluatorInput (in generated/evaluators/mbt-evaluators.ts) was intentionally left untouched. It is auto-generated code and would be overwritten on the next codegen run. It also serves a different purpose — a public low-level
utility for exact-field presence checks. Synonym normalization is applied upstream in the experiment flow (experiment.ts → validateAndNormalizeTaskOutput) before task output reaches any evaluator, so validateEvaluatorInput always
receives already-normalized fields.
Summary by CodeRabbit
New Features
Tests