feat(experiment): Add experiment capabilities #672
Merged
Changes from 17 commits
28 commits
b8bd068 init project
29d7670 add feature
25d3cce wip
c57ec7a wip
485e6cb jsonl works
d7e460d rows works
649d704 wip
31253a6 wip
3dcd132 trigger works
980acc3 with response
09babfd wip
627be48 wip
b1b12a1 result is shown
20d23b9 git push
9b14fc8 correct
81bc188 fix
711962a evaluator slug optional
75678eb support evaluator name + verison
75c9ce2 add evaluator ids
a13783a pr commq
caa5b1d added taskInput
e95d21a no stream client
ce184ea comm
7f42a35 pretty
d345f30 change
0a38764 no logs
169ba47 with loader
8ae47c2 pretty
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| # CLAUDE.md | ||
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Project Overview | ||
|
|
||
| OpenLLMetry-JS is a JavaScript/TypeScript observability framework for LLM applications, built on OpenTelemetry. It provides instrumentation for major LLM providers (OpenAI, Anthropic, etc.) and vector databases, with a unified SDK for easy integration. | ||
|
|
||
| ## Development Commands | ||
|
|
||
| ### Building | ||
| ```bash | ||
| # Build all packages | ||
| pnpm nx run-many -t build | ||
| # or | ||
| pnpm nx run-many --targets=build | ||
|
|
||
| # Build affected packages only | ||
| pnpm nx affected -t build | ||
| ``` | ||
|
|
||
| ### Testing | ||
| Each package has its own test command: | ||
| ```bash | ||
| # Test individual packages | ||
| cd packages/traceloop-sdk | ||
| pnpm test | ||
|
|
||
| # Test specific instrumentation | ||
| cd packages/instrumentation-openai | ||
| pnpm test | ||
| ``` | ||
|
|
||
| ### Linting | ||
| ```bash | ||
| # Lint individual packages | ||
| cd packages/[package-name] | ||
| pnpm lint | ||
|
|
||
| # Fix lint issues | ||
| pnpm lint:fix | ||
| ``` | ||
|
|
||
| ### Publishing | ||
|
|
||
| ```bash | ||
| pnpm nx run-many --targets=build | ||
| pnpm lerna publish --no-private | ||
| ``` | ||
|
|
||
| ## Architecture | ||
|
|
||
| ### Monorepo Structure | ||
| - **Lerna + Nx**: Manages multiple packages with shared tooling | ||
| - **packages/**: Contains all publishable packages and internal tooling | ||
| - **Rollup**: Used for building packages with TypeScript compilation | ||
|
|
||
| ### Core Packages | ||
|
|
||
| #### `traceloop-sdk` (Main SDK) | ||
| - **Path**: `packages/traceloop-sdk/` | ||
| - **Exports**: `@traceloop/node-server-sdk` | ||
| - **Purpose**: Primary entry point that orchestrates all instrumentations | ||
| - **Key Files**: | ||
| - `src/lib/tracing/decorators.ts`: Workflow and task decorators (`@workflow`, `@task`, `@agent`) | ||
| - `src/lib/tracing/tracing.ts`: Core tracing utilities and span management | ||
| - `src/lib/node-server-sdk.ts`: Main initialization logic | ||
|
|
||
| #### Instrumentation Packages | ||
| Each follows the pattern: `packages/instrumentation-[provider]/` | ||
| - **OpenAI**: `@traceloop/instrumentation-openai` | ||
| - **Anthropic**: `@traceloop/instrumentation-anthropic` | ||
| - **Bedrock**: `@traceloop/instrumentation-bedrock` | ||
| - **Vector DBs**: Pinecone, Chroma, Qdrant packages | ||
| - **Frameworks**: LangChain, LlamaIndex packages | ||
|
|
||
| #### `ai-semantic-conventions` | ||
| - **Path**: `packages/ai-semantic-conventions/` | ||
| - **Purpose**: OpenTelemetry semantic conventions for AI/LLM spans | ||
| - **Key File**: `src/SemanticAttributes.ts` - defines all span attribute constants | ||
|
|
||
| ### Instrumentation Pattern | ||
| All instrumentations extend `InstrumentationBase` from `@opentelemetry/instrumentation`: | ||
| 1. **Hook Registration**: Wrap target library functions using `InstrumentationModuleDefinition` | ||
| 2. **Span Creation**: Create spans with appropriate semantic attributes | ||
| 3. **Data Extraction**: Extract request/response data and token usage | ||
| 4. **Error Handling**: Capture and record errors appropriately | ||
|
|
||
| ### Testing Strategy | ||
| - **Polly.js**: Records HTTP interactions for consistent test execution | ||
| - **ts-mocha**: TypeScript test runner | ||
| - **Recordings**: Stored in `recordings/` folders for replay testing | ||
|
|
||
| ## Key Patterns | ||
|
|
||
| ### Workspace Dependencies | ||
| Packages reference each other using `workspace:*` in package.json, managed by pnpm workspaces. | ||
|
|
||
| ### Decorator Usage | ||
| ```typescript | ||
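| // Decorators apply to class methods, not standalone functions | ||
| class MyService { | ||
| // Workflow spans | ||
| @workflow({ name: "my-workflow" }) | ||
| async myWorkflow() { } | ||
| | ||
| // Task spans | ||
| @task({ name: "my-task" }) | ||
| async myTask() { } | ||
| } | ||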
| ``` | ||
|
|
||
| ### Manual Instrumentation | ||
| ```typescript | ||
| import { trace } from "@traceloop/node-server-sdk"; | ||
| const span = trace.withLLMSpan("my-llm-call", () => { | ||
| // LLM operations | ||
| }); | ||
| ``` | ||
|
|
||
| ### Telemetry Configuration | ||
| - Anonymous telemetry enabled by default | ||
| - Opt-out via `TRACELOOP_TELEMETRY=FALSE` environment variable | ||
| - Only collected in SDK, not individual instrumentations | ||
|
|
||
| ## Common Development Tasks | ||
|
|
||
| ### Adding New LLM Provider | ||
| 1. Create new instrumentation package in `packages/instrumentation-[provider]/` | ||
| 2. Implement instrumentation extending `InstrumentationBase` | ||
| 3. Add to main SDK dependencies in `packages/traceloop-sdk/package.json` | ||
| 4. Register in SDK initialization | ||
|
|
||
| ### Running Single Test | ||
| ```bash | ||
| cd packages/[package-name] | ||
| pnpm test -- --grep "test name pattern" | ||
| ``` | ||
|
|
||
| ### Debugging Instrumentations | ||
| Enable OpenTelemetry debug logging: | ||
| ```bash | ||
| export OTEL_LOG_LEVEL=debug | ||
| ``` | ||
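
The four-step instrumentation pattern listed in the CLAUDE.md diff above (hook registration, span creation, data extraction, error handling) can be sketched without any dependencies. This is only an illustration under stated assumptions: `RecordedSpan`, `ChatResponse`, `ChatFn`, `recordedSpans`, `wrapChat`, and the attribute names are all hypothetical stand-ins. It does not use `InstrumentationBase` or any real OpenTelemetry or Traceloop API, and it is not how the packages in this repository are implemented.

```typescript
// Hypothetical stand-ins for illustration only; not the OpenTelemetry or Traceloop API.
interface RecordedSpan {
  name: string;
  attributes: Record<string, string | number>;
  status: "unset" | "ok" | "error";
  errorMessage?: string;
  ended: boolean;
}

// Assumed shape of a provider response that reports token usage.
interface ChatResponse {
  text: string;
  usage: { promptTokens: number; completionTokens: number };
}

type ChatFn = (prompt: string) => Promise<ChatResponse>;

// In-memory sink so the sketch can be inspected without an exporter.
const recordedSpans: RecordedSpan[] = [];

// Step 1 (hook registration): build a wrapper that stands in for the target function.
function wrapChat(original: ChatFn, spanName: string): ChatFn {
  return async (prompt: string): Promise<ChatResponse> => {
    // Step 2 (span creation): open a span and attach request attributes.
    const span: RecordedSpan = {
      name: spanName,
      attributes: { "request.prompt_length": prompt.length },
      status: "unset",
      ended: false,
    };
    recordedSpans.push(span);
    try {
      const response = await original(prompt);
      // Step 3 (data extraction): copy token usage from the response onto the span.
      span.attributes["usage.prompt_tokens"] = response.usage.promptTokens;
      span.attributes["usage.completion_tokens"] = response.usage.completionTokens;
      span.status = "ok";
      return response;
    } catch (err) {
      // Step 4 (error handling): record the failure, then rethrow so callers still see it.
      span.status = "error";
      span.errorMessage = err instanceof Error ? err.message : String(err);
      throw err;
    } finally {
      // The span is closed on both the success and the failure path.
      span.ended = true;
    }
  };
}
```

Calling `wrapChat(someChatFn, "chat")` returns a function with the same signature as `someChatFn`, so callers are unaffected. The `finally` block is the design choice worth noting: it closes the span whether the wrapped call resolves or rejects.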
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| /** | ||
| * Medical prompt templates for experiment examples | ||
| * These templates demonstrate different approaches to handling medical questions | ||
| */ | ||
|
|
||
| /** | ||
| * Prompt template that provides comprehensive medical information | ||
| * This approach gives detailed, educational responses about health topics | ||
| */ | ||
| export function provideMedicalInfoPrompt(question: string): string { | ||
| return `You are a health educator providing comprehensive medical information. | ||
|
|
||
| Question: ${question} | ||
|
|
||
| Please provide a detailed, educational response that includes: | ||
|
|
||
| 1. **Clear, factual explanation** of the medical concept or condition | ||
| 2. **Key benefits and considerations** related to the topic | ||
| 3. **Specific recommendations** based on current medical knowledge | ||
| 4. **Important disclaimers** about consulting healthcare professionals | ||
| 5. **Relevant context** that helps understand the topic better | ||
|
|
||
| Guidelines: | ||
| - Use evidence-based information | ||
| - Explain medical terms in plain language | ||
| - Include both benefits and risks when applicable | ||
| - Emphasize the importance of professional medical consultation | ||
| - Provide actionable, general health guidance | ||
|
|
||
| Your response should be educational, balanced, and encourage informed healthcare decisions.`; | ||
| } | ||
|
|
||
| /** | ||
| * Prompt template that refuses to give medical advice | ||
| * This approach redirects users to appropriate medical professionals | ||
| */ | ||
| export function refuseMedicalAdvicePrompt(question: string): string { | ||
| return `You are a helpful AI assistant with a strict policy about medical advice. | ||
|
|
||
| Question: ${question} | ||
|
|
||
| I understand you're seeking information about a health-related topic, but I cannot provide medical advice, diagnosis, or treatment recommendations. | ||
|
|
||
| Instead, I'd like to: | ||
|
|
||
| 1. **Acknowledge your concern** - Your health questions are important and valid | ||
| 2. **Explain why I can't advise** - Medical situations require professional evaluation | ||
| 3. **Suggest appropriate resources**: | ||
| - Consult your primary care physician | ||
| - Contact a relevant medical specialist | ||
| - Call a nurse hotline if available | ||
| - Visit an urgent care or emergency room if urgent | ||
|
|
||
| 4. **Provide general wellness information** if applicable (without specific medical advice) | ||
| 5. **Encourage professional consultation** for personalized care | ||
|
|
||
| Your health is important, and qualified medical professionals are best equipped to provide the specific guidance you need. | ||
|
|
||
| Is there anything else I can help you with that doesn't involve medical advice?`; | ||
| } | ||
|
|
||
|
|
||
| /** | ||
| * Example prompt categories for experiment testing | ||
| */ | ||
| export const PROMPT_CATEGORIES = { | ||
| PROVIDE_INFO: 'provide' as const, | ||
| REFUSE_ADVICE: 'refuse' as const, | ||
| MENTAL_HEALTH: 'mental-health' as const, | ||
| FITNESS: 'fitness' as const, | ||
| NUTRITION: 'nutrition' as const, | ||
| } as const; | ||
|
|
||
| export type PromptCategory = typeof PROMPT_CATEGORIES[keyof typeof PROMPT_CATEGORIES]; |
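
The diff above defines five categories but only two templates. One way an experiment might pick a template from a category is sketched below. The helpers `isPromptCategory` and `buildPrompt` are hypothetical and not part of this PR. The two template functions are shortened stand-ins for the ones in the diff so that the block is self-contained. Routing `mental-health`, `fitness`, and `nutrition` to the informational template is an assumption made for illustration.

```typescript
// Shortened stand-ins for the two templates in the diff, so this block runs on its own.
function provideMedicalInfoPrompt(question: string): string {
  return `You are a health educator providing comprehensive medical information.\n\nQuestion: ${question}`;
}

function refuseMedicalAdvicePrompt(question: string): string {
  return `You are a helpful AI assistant with a strict policy about medical advice.\n\nQuestion: ${question}`;
}

// Same values as in the diff; the outer `as const` already makes each value a literal type.
const PROMPT_CATEGORIES = {
  PROVIDE_INFO: "provide",
  REFUSE_ADVICE: "refuse",
  MENTAL_HEALTH: "mental-health",
  FITNESS: "fitness",
  NUTRITION: "nutrition",
} as const;

type PromptCategory = (typeof PROMPT_CATEGORIES)[keyof typeof PROMPT_CATEGORIES];

// Hypothetical type guard: narrows an arbitrary string (for example a dataset field)
// to PromptCategory.
function isPromptCategory(value: string): value is PromptCategory {
  const known: readonly string[] = Object.keys(PROMPT_CATEGORIES).map(
    (key) => PROMPT_CATEGORIES[key as keyof typeof PROMPT_CATEGORIES],
  );
  return known.indexOf(value) !== -1;
}

// Hypothetical dispatcher. Only "provide" and "refuse" have templates in the diff;
// sending the other categories to the informational template is an assumption.
function buildPrompt(category: PromptCategory, question: string): string {
  switch (category) {
    case "refuse":
      return refuseMedicalAdvicePrompt(question);
    case "provide":
    case "mental-health":
    case "fitness":
    case "nutrition":
      return provideMedicalInfoPrompt(question);
    default: {
      // Compile-time exhaustiveness check: adding a category without a case fails to type-check.
      const unreachable: never = category;
      throw new Error("Unhandled category: " + String(unreachable));
    }
  }
}
```

A caller holding an untyped string would first check `isPromptCategory(raw)` and then call `buildPrompt(raw, question)`. The `never` assignment in the `default` branch means a sixth category added to `PROMPT_CATEGORIES` produces a compile error here until it is given a case.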