traceloop · nina-kollman · Aug 22, 2025
diff --git a/.gitignore b/.gitignore
@@ -20,6 +20,7 @@ node_modules
 
 # IDE - VSCode
 .vscode/*
+.vscode/
 !.vscode/settings.json
 !.vscode/tasks.json
 !.vscode/launch.json
@@ -46,4 +47,7 @@ Thumbs.db
 # env
 .env*
 chroma.sqlite3
-chroma.log
+chroma.log
+
+# claude
+.claude
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,140 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+OpenLLMetry-JS is a JavaScript/TypeScript observability framework for LLM applications, built on OpenTelemetry. It provides instrumentation for major LLM providers (OpenAI, Anthropic, etc.) and vector databases, with a unified SDK for easy integration.
+
+## Development Commands
+
+### Building
+```bash
+# Build all packages
+pnpm nx run-many -t build
+# or
+pnpm nx run-many --targets=build
+
+# Build affected packages only
+pnpm nx affected -t build
+```
+
+### Testing
+Each package has its own test command:
+```bash
+# Test individual packages
+cd packages/traceloop-sdk
+pnpm test
+
+# Test specific instrumentation
+cd packages/instrumentation-openai
+pnpm test
+```
+
+### Linting
+```bash
+# Lint individual packages
+cd packages/[package-name]
+pnpm lint
+
+# Fix lint issues
+pnpm lint:fix
+```
+
+### Publishing
+```bash
+pnpm nx run-many --targets=build
+pnpm lerna publish --no-private
+```
+
+## Architecture
+
+### Monorepo Structure
+- **Lerna + Nx**: Manages multiple packages with shared tooling
+- **packages/**: Contains all publishable packages and internal tooling
+- **Rollup**: Used for building packages with TypeScript compilation
+
+### Core Packages
+
+#### `traceloop-sdk` (Main SDK)
+- **Path**: `packages/traceloop-sdk/`
+- **Exports**: `@traceloop/node-server-sdk`
+- **Purpose**: Primary entry point that orchestrates all instrumentations
+- **Key Files**:
+  - `src/lib/tracing/decorators.ts`: Workflow and task decorators (`@workflow`, `@task`, `@agent`)
+  - `src/lib/tracing/tracing.ts`: Core tracing utilities and span management
+  - `src/lib/node-server-sdk.ts`: Main initialization logic
+
+#### Instrumentation Packages
+Each follows the pattern: `packages/instrumentation-[provider]/`
+- **OpenAI**: `@traceloop/instrumentation-openai`
+- **Anthropic**: `@traceloop/instrumentation-anthropic`  
+- **Bedrock**: `@traceloop/instrumentation-bedrock`
+- **Vector DBs**: Pinecone, Chroma, Qdrant packages
+- **Frameworks**: LangChain, LlamaIndex packages
+
+#### `ai-semantic-conventions`
+- **Path**: `packages/ai-semantic-conventions/`
+- **Purpose**: OpenTelemetry semantic conventions for AI/LLM spans
+- **Key File**: `src/SemanticAttributes.ts` - defines all span attribute constants
+
+### Instrumentation Pattern
+All instrumentations extend `InstrumentationBase` from `@opentelemetry/instrumentation`:
+1. **Hook Registration**: Wrap target library functions using `InstrumentationModuleDefinition`
+2. **Span Creation**: Create spans with appropriate semantic attributes
+3. **Data Extraction**: Extract request/response data and token usage
+4. **Error Handling**: Capture and record errors appropriately
+
+### Testing Strategy
+- **Polly.js**: Records HTTP interactions for consistent test execution
+- **ts-mocha**: TypeScript test runner
+- **Recordings**: Stored in `recordings/` folders for replay testing
+
+## Key Patterns
+
+### Workspace Dependencies
+Packages reference each other using `workspace:*` in package.json, managed by pnpm workspaces.
+
+### Decorator Usage
+```typescript
+class MyService {
+  @workflow({ name: "my-workflow" }) // workflow span
+  async myWorkflow() { }
+
+  @task({ name: "my-task" }) // task span
+  async myTask() { }
+}
+```
+
+### Manual Instrumentation
+```typescript
+import { trace } from "@traceloop/node-server-sdk";
+const span = trace.withLLMSpan("my-llm-call", () => {
+  // LLM operations
+});
+```
+
+### Telemetry Configuration
+- Anonymous telemetry enabled by default
+- Opt-out via `TRACELOOP_TELEMETRY=FALSE` environment variable
+- Only collected in SDK, not individual instrumentations
+
+## Common Development Tasks
+
+### Adding a New LLM Provider
+1. Create new instrumentation package in `packages/instrumentation-[provider]/`
+2. Implement instrumentation extending `InstrumentationBase`
+3. Add to main SDK dependencies in `packages/traceloop-sdk/package.json`
+4. Register in SDK initialization
+
+### Running a Single Test
+```bash
+cd packages/[package-name]
+pnpm test -- --grep "test name pattern"
+```
+
+### Debugging Instrumentations
+Enable OpenTelemetry debug logging:
+```bash
+export OTEL_LOG_LEVEL=debug
+```
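Note on CLAUDE.md, "Instrumentation Pattern": the four steps listed there (hook registration, span creation, data extraction, error handling) can be illustrated without any dependencies. The sketch below is not the repository's code and does not use `InstrumentationBase` or the OpenTelemetry API; `SketchSpan`, `wrapMethod`, `fakeClient`, and the attribute keys are hypothetical names chosen only to show the control flow.

```typescript
// Dependency-free sketch of the wrap -> span -> extract -> record-error flow.
// All names here are hypothetical; real instrumentations use OpenTelemetry spans.
interface SketchSpan {
  name: string;
  attributes: Record<string, string | number>;
  error?: string;
  ended: boolean;
}

const finishedSpans: SketchSpan[] = [];

// 1. Hook registration: replace target[method] with a wrapper around the original.
function wrapMethod(
  target: Record<string, any>,
  method: string,
  extract: (args: unknown[], result: unknown) => Record<string, string | number>,
): void {
  const original = target[method] as (...args: unknown[]) => unknown;
  target[method] = function (this: unknown, ...args: unknown[]): unknown {
    // 2. Span creation: one span per call, named after the wrapped method.
    const span: SketchSpan = { name: method, attributes: {}, ended: false };
    try {
      const result = original.apply(this, args);
      // 3. Data extraction: copy request/response details onto the span.
      Object.assign(span.attributes, extract(args, result));
      return result;
    } catch (err) {
      // 4. Error handling: record the failure, then rethrow unchanged.
      span.error = err instanceof Error ? err.message : String(err);
      throw err;
    } finally {
      span.ended = true;
      finishedSpans.push(span);
    }
  };
}

// A stand-in for a provider client (assumption: synchronous for brevity).
const fakeClient = {
  complete(prompt: string): { text: string; totalTokens: number } {
    if (prompt.length === 0) throw new Error("empty prompt");
    return { text: prompt.toUpperCase(), totalTokens: prompt.length };
  },
};

wrapMethod(fakeClient, "complete", (args, result) => ({
  "sketch.prompt": String(args[0]),
  "sketch.total_tokens": (result as { totalTokens: number }).totalTokens,
}));
```

The wrapper rethrows after recording the error so that callers see the same behavior as the unwrapped method; only the span bookkeeping is added.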
diff --git a/packages/sample-app/README.md b/packages/sample-app/README.md
@@ -0,0 +1,123 @@
+# Experiment Sample Application
+
+This sample app demonstrates the new experiment functionality in the OpenLLMetry-JS SDK.
+
+## 🚀 Quick Start
+
+### Prerequisites
+- Node.js >= 14
+- pnpm
+
+### Setup
+1. Copy environment variables:
+   ```bash
+   cp .env.example .env
+   ```
+
+2. Fill in your API keys in `.env`:
+   ```bash
+   TRACELOOP_API_KEY=your_traceloop_api_key_here
+   OPENAI_API_KEY=your_openai_api_key_here
+   ```
+
+3. Build and run:
+   ```bash
+   # Run JSONL parsing tests (no API keys required)
+   npm run run:experiment_test
+
+   # Run experiment example (requires API keys)
+   npm run run:experiment
+   ```
+
+## 🧪 Experiment Features Implemented
+
+### Core Components
+- **Experiment Client**: `client.experiment.run(taskFunction, options)`
+- **Evaluator Client**: `client.evaluator.runExperimentEvaluator(...)`
+- **SSE Streaming**: Real-time progress updates via Server-Sent Events
+- **Dataset Integration**: Works with existing dataset functionality
+- **JSONL Support**: Parse dataset versions in JSONL format
+
+### Example Usage
+```typescript
+import * as traceloop from "@traceloop/node-server-sdk";
+
+traceloop.initialize({ apiKey: "your-key", appName: "my-app" });
+const client = traceloop.getClient();
+
+// Define your experiment task
+const myTask: traceloop.ExperimentTaskFunction = async (input) => {
+  // Your task logic here
+  return { result: "processed", input };
+};
+
+// Run experiment
+const results = await client.experiment.run(myTask, {
+  datasetSlug: "my-dataset",
+  datasetVersion: "v1",
+  evaluators: [{ name: "accuracy" }],
+  experimentSlug: "my-experiment",
+  stopOnError: false,
+  waitForResults: true,
+  concurrency: 3
+});
+
+console.log(`Results: ${results.results.length}`);
+console.log(`Errors: ${results.errors.length}`);
+```
+
+## 🐛 Debug Configuration
+
+### VS Code Debugging
+Three debug configurations are available:
+
+1. **Debug Experiment Example**: Full debug with real API calls
+2. **Debug Simple Experiment Test**: Debug the JSONL parsing tests
+3. **Debug Experiment Example (Mock Mode)**: Debug with mocked API responses
+
+### Environment Variables
+- `MOCK_MODE=true`: Enable mock responses for debugging without API keys
+- `DEBUG=traceloop:*`: Enable debug logging
+- `OTEL_LOG_LEVEL=debug`: Enable OpenTelemetry debug logs
+
+### Running in Mock Mode
+```bash
+MOCK_MODE=true npm run run:experiment
+```
+
+## 📁 File Structure
+
+- `src/experiment_example.ts`: Main experiment demonstration
+- `src/medical_prompts.ts`: Example prompt templates for healthcare experiments
+- `src/simple_experiment_test.ts`: JSONL parsing tests
+- `.vscode/launch.json`: Debug configurations
+- `.vscode/tasks.json`: Build and run tasks
+
+## 🔧 Available Scripts
+
+- `npm run build`: Build TypeScript
+- `npm run run:experiment`: Run experiment example
+- `npm run run:experiment_test`: Run JSONL parsing tests
+- `npm run lint`: Run ESLint
+- `npm run lint:fix`: Fix ESLint issues
+
+## 📊 Experiment Types Demonstrated
+
+### Medical Question Experiments
+- **Refuse Advice Strategy**: Redirects users to medical professionals
+- **Provide Info Strategy**: Educational responses with disclaimers
+- **Comparison**: Side-by-side evaluation of different approaches
+
+### Sentiment Analysis Experiment  
+- **Task**: Analyze text sentiment
+- **Evaluators**: Accuracy and confidence calibration
+- **Concurrency**: Demonstrates parallel processing
+
+## 🔗 Integration Points
+
+The experiment feature integrates with:
+- **Existing Dataset API**: Uses `client.datasets.get()`
+- **TraceloopClient**: Available as `client.experiment`
+- **OpenTelemetry**: All experiments are traced
+- **Error Handling**: Configurable error propagation
+- **Type Safety**: Full TypeScript support
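Note on the sample-app README, "JSONL Support": the README says dataset versions are parsed from JSONL but does not show the format. As a rough illustration only, the sketch below parses JSONL text into one object per non-empty line and collects per-line errors instead of throwing. `parseJsonl`, `Row`, and the sample rows are hypothetical; the SDK's actual parser and row shape may differ.

```typescript
// Hypothetical helper, not the SDK's parser: one JSON object per non-empty line.
type Row = Record<string, unknown>;

function parseJsonl(text: string): { rows: Row[]; errors: string[] } {
  const rows: Row[] = [];
  const errors: string[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // skip blank lines, including a trailing newline
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        errors.push(`line ${index + 1}: expected a JSON object`);
        return;
      }
      rows.push(value as Row);
    } catch (err) {
      // Keep going so one malformed line does not discard the whole dataset.
      errors.push(`line ${index + 1}: ${err instanceof Error ? err.message : String(err)}`);
    }
  });
  return { rows, errors };
}

// Made-up sample input: two valid rows, one blank line, one malformed line.
const sample =
  '{"question":"What is 2+2?"}\n\nnot json\n{"question":"Capital of France?"}\n';
const parsed = parseJsonl(sample);
console.log(parsed.rows.length, parsed.errors.length); // prints "2 1"
```

Collecting errors per line mirrors the README's `stopOnError: false` idea of continuing past individual failures, though whether the SDK treats malformed rows this way is an assumption.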
diff --git a/packages/sample-app/package.json b/packages/sample-app/package.json
@@ -33,6 +33,8 @@
     "run:image_generation": "npm run build && node dist/src/sample_openai_image_generation.js",
     "run:sample_edit": "npm run build && node dist/src/test_edit_only.js",
     "run:sample_generate": "npm run build && node dist/src/test_generate_only.js",
+    "run:experiment": "npm run build && node dist/src/experiment_example.js",
+    "run:experiment_test": "npm run build && node dist/src/simple_experiment_test.js",
     "dev:image_generation": "pnpm --filter @traceloop/instrumentation-openai build && pnpm --filter @traceloop/node-server-sdk build && npm run build && node dist/src/sample_openai_image_generation.js",
     "lint": "eslint .",
     "lint:fix": "eslint . --fix"
@@ -71,6 +73,7 @@
     "langchain": "^0.3.30",
     "llamaindex": "^0.11.19",
     "openai": "^5.12.2",
+    "eventsource": "^3.0.2",
     "zod": "^3.25.76"
   },
   "private": true,
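Note on the new `eventsource` dependency: the README lists real-time progress updates via Server-Sent Events. For readers unfamiliar with the wire format, the sketch below parses a `text/event-stream` payload into events by hand. It is only an illustration of the format; in practice the `eventsource` package does this parsing, and `parseSse`, `SseEvent`, the `progress` event name, and the payload fields are hypothetical.

```typescript
// Hand-rolled illustration of the SSE wire format; not how the SDK consumes events.
interface SseEvent {
  event: string;
  data: string;
}

function parseSse(stream: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of stream.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // Blocks without data (for example keep-alive comments) produce no event.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}

// Made-up stream: a named progress event, a keep-alive comment, a default event.
const wire =
  'event: progress\ndata: {"completed":1,"total":2}\n\n: keep-alive\n\ndata: done\n\n';
const sseEvents = parseSse(wire);
console.log(sseEvents.map((e) => e.event).join(",")); // prints "progress,message"
```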