feat(experiment): Add experiment capabilities #672
Merged
Commits
28 commits
b8bd068
init project
29d7670
add feature
25d3cce
wip
c57ec7a
wip
485e6cb
jsonl works
d7e460d
rows works
649d704
wip
31253a6
wip
3dcd132
trigger works
980acc3
with response
09babfd
wip
627be48
wip
b1b12a1
result is shown
20d23b9
git push
9b14fc8
correct
81bc188
fix
711962a
evaluator slug optional
75678eb
support evaluator name + verison
75c9ce2
add evaluator ids
a13783a
pr commq
caa5b1d
added taskInput
e95d21a
no stream client
ce184ea
comm
7f42a35
pretty
d345f30
change
0a38764
no logs
169ba47
with loader
8ae47c2
pretty
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| # CLAUDE.md | ||
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Project Overview | ||
|
|
||
| OpenLLMetry-JS is a JavaScript/TypeScript observability framework for LLM applications, built on OpenTelemetry. It provides instrumentation for major LLM providers (OpenAI, Anthropic, etc.) and vector databases, with a unified SDK for easy integration. | ||
|
|
||
| ## Development Commands | ||
|
|
||
| ### Building | ||
| ```bash | ||
| # Build all packages | ||
| pnpm nx run-many -t build | ||
| # or | ||
| pnpm nx run-many --targets=build | ||
|
|
||
| # Build affected packages only | ||
| pnpm nx affected -t build | ||
| ``` | ||
|
|
||
| ### Testing | ||
| Each package has its own test command: | ||
| ```bash | ||
| # Test individual packages | ||
| cd packages/traceloop-sdk | ||
| pnpm test | ||
|
|
||
| # Test specific instrumentation | ||
| cd packages/instrumentation-openai | ||
| pnpm test | ||
| ``` | ||
|
|
||
| ### Linting | ||
| ```bash | ||
| # Lint individual packages | ||
| cd packages/[package-name] | ||
| pnpm lint | ||
|
|
||
| # Fix lint issues | ||
| pnpm lint:fix | ||
| ``` | ||
|
|
||
| ### Publishing | ||
|
|
||
| ```bash | ||
| pnpm nx run-many --targets=build | ||
| pnpm lerna publish --no-private | ||
| ``` | ||
|
|
||
| ## Architecture | ||
|
|
||
| ### Monorepo Structure | ||
| - **Lerna + Nx**: Manages multiple packages with shared tooling | ||
| - **packages/**: Contains all publishable packages and internal tooling | ||
| - **Rollup**: Used for building packages with TypeScript compilation | ||
|
|
||
| ### Core Packages | ||
|
|
||
| #### `traceloop-sdk` (Main SDK) | ||
| - **Path**: `packages/traceloop-sdk/` | ||
| - **Exports**: `@traceloop/node-server-sdk` | ||
| - **Purpose**: Primary entry point that orchestrates all instrumentations | ||
| - **Key Files**: | ||
| - `src/lib/tracing/decorators.ts`: Workflow and task decorators (`@workflow`, `@task`, `@agent`) | ||
| - `src/lib/tracing/tracing.ts`: Core tracing utilities and span management | ||
| - `src/lib/node-server-sdk.ts`: Main initialization logic | ||
|
|
||
| #### Instrumentation Packages | ||
| Each follows the pattern: `packages/instrumentation-[provider]/` | ||
| - **OpenAI**: `@traceloop/instrumentation-openai` | ||
| - **Anthropic**: `@traceloop/instrumentation-anthropic` | ||
| - **Bedrock**: `@traceloop/instrumentation-bedrock` | ||
| - **Vector DBs**: Pinecone, Chroma, Qdrant packages | ||
| - **Frameworks**: LangChain, LlamaIndex packages | ||
|
|
||
| #### `ai-semantic-conventions` | ||
| - **Path**: `packages/ai-semantic-conventions/` | ||
| - **Purpose**: OpenTelemetry semantic conventions for AI/LLM spans | ||
| - **Key File**: `src/SemanticAttributes.ts` - defines all span attribute constants | ||
|
|
||
| ### Instrumentation Pattern | ||
| All instrumentations extend `InstrumentationBase` from `@opentelemetry/instrumentation`: | ||
| 1. **Hook Registration**: Wrap target library functions using `InstrumentationModuleDefinition` | ||
| 2. **Span Creation**: Create spans with appropriate semantic attributes | ||
| 3. **Data Extraction**: Extract request/response data and token usage | ||
| 4. **Error Handling**: Capture and record errors appropriately | ||
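The four steps above can be sketched without any dependencies. This is not the `InstrumentationBase` API: `RecordedSpan`, `wrapWithSpan`, and the `usage.totalTokens` field are hypothetical names used only to illustrate wrapping a function, opening a span, extracting data, and recording errors before rethrowing.

```typescript
// Hypothetical, dependency-free sketch of the wrap/record pattern.
// Real instrumentations use OpenTelemetry spans instead of RecordedSpan.
interface RecordedSpan {
  name: string;
  attributes: Record<string, string | number>;
  error?: string;
  ended: boolean;
}

type LLMCall<Req, Res> = (request: Req) => Res;

function wrapWithSpan<Req, Res extends { usage?: { totalTokens: number } }>(
  name: string,
  original: LLMCall<Req, Res>,
  sink: RecordedSpan[],
): LLMCall<Req, Res> {
  // 1. Hook registration: return a wrapper that stands in for the original.
  return (request: Req): Res => {
    // 2. Span creation: one span per call.
    const span: RecordedSpan = { name, attributes: {}, ended: false };
    sink.push(span);
    try {
      const response = original(request);
      // 3. Data extraction: request payload and token usage, when present.
      span.attributes["request"] = JSON.stringify(request);
      if (response.usage) {
        span.attributes["usage.total_tokens"] = response.usage.totalTokens;
      }
      return response;
    } catch (err) {
      // 4. Error handling: record the failure, then rethrow unchanged.
      span.error = err instanceof Error ? err.message : String(err);
      throw err;
    } finally {
      span.ended = true;
    }
  };
}
```

Calling `wrapWithSpan("chat", fn, spans)` returns a function with the same signature as `fn`, so callers are unaffected while each call appends one entry to `spans`.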
|
|
||
| ### Testing Strategy | ||
| - **Polly.js**: Records HTTP interactions for consistent test execution | ||
| - **ts-mocha**: TypeScript test runner | ||
| - **Recordings**: Stored in `recordings/` folders for replay testing | ||
|
|
||
| ## Key Patterns | ||
|
|
||
| ### Workspace Dependencies | ||
| Packages reference each other using `workspace:*` in package.json, managed by pnpm workspaces. | ||
|
|
||
| ### Decorator Usage | ||
| ```typescript | ||
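| import { workflow, task } from "@traceloop/node-server-sdk"; | ||
| // TypeScript decorators apply to class methods, not standalone functions | ||
| class MyService { | ||
|   // Workflow span | ||
|   @workflow({ name: "my-workflow" }) | ||
|   async myWorkflow() { } | ||
|   // Task span | ||
|   @task({ name: "my-task" }) | ||
|   async myTask() { } | ||
| } | ||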
| ``` | ||
|
|
||
| ### Manual Instrumentation | ||
| ```typescript | ||
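| import * as traceloop from "@traceloop/node-server-sdk"; | ||
| // Assumes the SDK's withLLMCall helper; verify the signature in src/lib/tracing/ | ||
| const result = await traceloop.withLLMCall( | ||
|   { vendor: "openai", type: "chat" }, | ||
|   async ({ span }) => { | ||
|     // LLM operations; report the request and response on `span` | ||
|   }, | ||
| ); | ||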
| ``` | ||
|
|
||
| ### Telemetry Configuration | ||
| - Anonymous telemetry enabled by default | ||
| - Opt-out via `TRACELOOP_TELEMETRY=FALSE` environment variable | ||
| - Collected only in the SDK, not in individual instrumentations | ||
|
|
||
| ## Common Development Tasks | ||
|
|
||
| ### Adding a New LLM Provider | ||
| 1. Create new instrumentation package in `packages/instrumentation-[provider]/` | ||
| 2. Implement instrumentation extending `InstrumentationBase` | ||
| 3. Add to main SDK dependencies in `packages/traceloop-sdk/package.json` | ||
| 4. Register in SDK initialization | ||
|
|
||
| ### Running a Single Test | ||
| ```bash | ||
| cd packages/[package-name] | ||
| pnpm test -- --grep "test name pattern" | ||
| ``` | ||
|
|
||
| ### Debugging Instrumentations | ||
| Enable OpenTelemetry debug logging: | ||
| ```bash | ||
| export OTEL_LOG_LEVEL=debug | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| # Experiment Sample Application | ||
|
|
||
| This sample app demonstrates the new experiment functionality in the OpenLLMetry-JS SDK. | ||
|
|
||
| ## 🚀 Quick Start | ||
|
|
||
| ### Prerequisites | ||
| - Node.js >= 14 | ||
| - pnpm | ||
|
|
||
| ### Setup | ||
| 1. Copy environment variables: | ||
| ```bash | ||
| cp .env.example .env | ||
| ``` | ||
|
|
||
| 2. Fill in your API keys in `.env`: | ||
| ```bash | ||
| TRACELOOP_API_KEY=your_traceloop_api_key_here | ||
| OPENAI_API_KEY=your_openai_api_key_here | ||
| ``` | ||
|
|
||
| 3. Build and run: | ||
| ```bash | ||
| # Run JSONL parsing tests (no API keys required) | ||
| npm run run:experiment_test | ||
|
|
||
| # Run experiment example (requires API keys) | ||
| npm run run:experiment | ||
| ``` | ||
|
|
||
| ## 🧪 Experiment Features Implemented | ||
|
|
||
| ### Core Components | ||
| - **Experiment Client**: `client.experiment.run(taskFunction, options)` | ||
| - **Evaluator Client**: `client.evaluator.runExperimentEvaluator(...)` | ||
| - **SSE Streaming**: Real-time progress updates via Server-Sent Events | ||
| - **Dataset Integration**: Works with existing dataset functionality | ||
| - **JSONL Support**: Parse dataset versions in JSONL format | ||
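For the JSONL support listed above, the sketch below shows one way to turn a JSONL payload into row objects. `parseJsonl` and `Row` are hypothetical names, not the SDK's parser; the sketch assumes one JSON object per line and skips blank lines.

```typescript
// Hypothetical helper for illustration; not the SDK's implementation.
type Row = Record<string, unknown>;

function parseJsonl(text: string): Row[] {
  const rows: Row[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // tolerate blank lines and a trailing newline
    let value: unknown;
    try {
      value = JSON.parse(trimmed);
    } catch {
      throw new Error(`Invalid JSON on line ${index + 1}`);
    }
    // Each line must hold a JSON object, not a scalar or an array.
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`Line ${index + 1} is not a JSON object`);
    }
    rows.push(value as Row);
  });
  return rows;
}
```

Reporting the 1-based line number in the error makes a malformed dataset row easy to locate.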
|
|
||
| ### Example Usage | ||
| ```typescript | ||
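| import * as traceloop from "@traceloop/node-server-sdk"; | ||
| // Assumes the SDK exports this type alongside the client | ||
| import type { ExperimentTaskFunction } from "@traceloop/node-server-sdk"; | ||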
|
|
||
| traceloop.initialize({ apiKey: "your-key", appName: "my-app" }); | ||
| const client = traceloop.getClient(); | ||
|
|
||
| // Define your experiment task | ||
| const myTask: ExperimentTaskFunction = async (input) => { | ||
| // Your task logic here | ||
| return { result: "processed", input }; | ||
| }; | ||
|
|
||
| // Run experiment | ||
| const results = await client.experiment.run(myTask, { | ||
| datasetSlug: "my-dataset", | ||
| datasetVersion: "v1", | ||
| evaluators: [{ name: "accuracy" }], | ||
| experimentSlug: "my-experiment", | ||
| stopOnError: false, | ||
| waitForResults: true, | ||
| concurrency: 3 | ||
| }); | ||
|
|
||
| console.log(`Results: ${results.results.length}`); | ||
| console.log(`Errors: ${results.errors.length}`); | ||
| ``` | ||
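The `concurrency` and `stopOnError` options suggest a bounded worker pool over the dataset rows. The sketch below is one way such a pool could behave; `runPool`, `PoolResult`, and the choice to collect error messages as strings are assumptions for illustration, not the SDK's implementation.

```typescript
// Hypothetical sketch of a bounded worker pool; not the SDK's implementation.
interface PoolResult<O> {
  results: O[]; // in completion order, not input order
  errors: string[];
}

async function runPool<I, O>(
  inputs: I[],
  task: (input: I) => Promise<O>,
  options: { concurrency: number; stopOnError: boolean },
): Promise<PoolResult<O>> {
  const results: O[] = [];
  const errors: string[] = [];
  let next = 0; // index of the next unclaimed input
  let stopped = false;

  // Each worker claims inputs one at a time until none are left.
  const worker = async (): Promise<void> => {
    while (!stopped && next < inputs.length) {
      const input = inputs[next++] as I;
      try {
        results.push(await task(input));
      } catch (err) {
        errors.push(err instanceof Error ? err.message : String(err));
        // With stopOnError, no new inputs are claimed; in-flight tasks still finish.
        if (options.stopOnError) stopped = true;
      }
    }
  };

  const workerCount = Math.max(1, Math.min(options.concurrency, inputs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, errors };
}
```

Because JavaScript runs these workers on a single thread, claiming an input with `next++` needs no lock; the pool only bounds how many awaited tasks are in flight at once.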
|
|
||
| ## 🐛 Debug Configuration | ||
|
|
||
| ### VS Code Debugging | ||
| Three debug configurations are available: | ||
|
|
||
| 1. **Debug Experiment Example**: Full debug with real API calls | ||
| 2. **Debug Simple Experiment Test**: Debug the JSONL parsing tests | ||
| 3. **Debug Experiment Example (Mock Mode)**: Debug with mocked API responses | ||
|
|
||
| ### Environment Variables | ||
| - `MOCK_MODE=true`: Enable mock responses for debugging without API keys | ||
| - `DEBUG=traceloop:*`: Enable debug logging | ||
| - `OTEL_LOG_LEVEL=debug`: Enable OpenTelemetry debug logs | ||
|
|
||
| ### Running in Mock Mode | ||
| ```bash | ||
| MOCK_MODE=true npm run run:experiment | ||
| ``` | ||
|
|
||
| ## 📁 File Structure | ||
|
|
||
| - `src/experiment_example.ts`: Main experiment demonstration | ||
| - `src/medical_prompts.ts`: Example prompt templates for healthcare experiments | ||
| - `src/simple_experiment_test.ts`: JSONL parsing tests | ||
| - `.vscode/launch.json`: Debug configurations | ||
| - `.vscode/tasks.json`: Build and run tasks | ||
|
|
||
| ## 🔧 Available Scripts | ||
|
|
||
| - `npm run build`: Build TypeScript | ||
| - `npm run run:experiment`: Run experiment example | ||
| - `npm run run:experiment_test`: Run JSONL parsing tests | ||
| - `npm run lint`: Run ESLint | ||
| - `npm run lint:fix`: Fix ESLint issues | ||
|
|
||
| ## 📊 Experiment Types Demonstrated | ||
|
|
||
| ### Medical Question Experiments | ||
| - **Refuse Advice Strategy**: Redirects users to medical professionals | ||
| - **Provide Info Strategy**: Educational responses with disclaimers | ||
| - **Comparison**: Side-by-side evaluation of different approaches | ||
|
|
||
| ### Sentiment Analysis Experiment | ||
| - **Task**: Analyze text sentiment | ||
| - **Evaluators**: Accuracy and confidence calibration | ||
| - **Concurrency**: Demonstrates parallel processing | ||
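As an illustration of what an accuracy evaluator measures in the sentiment experiment, the sketch below scores predicted labels against expected ones. The `Sentiment` labels, the `ScoredRow` shape, and `accuracy` are hypothetical; this README does not show how the configured evaluators compute their scores.

```typescript
// Hypothetical local scoring function for illustration only.
type Sentiment = "positive" | "negative" | "neutral";

interface ScoredRow {
  expected: Sentiment;
  predicted: Sentiment;
}

// Fraction of rows whose predicted label matches the expected label.
function accuracy(rows: ScoredRow[]): number {
  if (rows.length === 0) return 0; // avoid dividing by zero on an empty dataset
  const correct = rows.filter((row) => row.expected === row.predicted).length;
  return correct / rows.length;
}
```

Returning 0 for an empty dataset is a design choice in this sketch; throwing or returning `NaN` would be equally defensible.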
|
|
||
| ## 🔗 Integration Points | ||
|
|
||
| The experiment feature integrates with: | ||
| - **Existing Dataset API**: Uses `client.datasets.get()` | ||
| - **TraceloopClient**: Available as `client.experiment` | ||
| - **OpenTelemetry**: All experiments are traced | ||
| - **Error Handling**: Configurable error propagation | ||
| - **Type Safety**: Full TypeScript support |