#!/usr/bin/env ruby
# frozen_string_literal: true

# Dev Server - Basic Example
#
# This example demonstrates how to set up a dev server for remote evals
# that receives evaluation requests from the Braintrust web UI.
#
# 1. Define evaluators (subclass or inline)
# 2. Initialize Braintrust tracing
# 3. Pass the evaluators to the Rack app and start serving
#
# Usage:
#   # Start the server (requires rack and a Rack-compatible server like puma):
#   bundle exec appraisal server rackup examples/server/eval.ru -p 8300 -o 0.0.0.0

require "bundler/setup"
require "braintrust"
require "braintrust/server"

# --- Step 1: Define evaluators ---
#
# Evaluators define the task (the code under evaluation) and local scorers.
# They can reference any application code — models, services, database queries, etc.

# Subclass pattern: override #task and #scorers methods.
class FoodClassifier < Braintrust::Eval::Evaluator
  def task
    ->(input:) {
      case input.to_s.downcase
      when /apple|banana|orange|grape/ then "fruit"
      when /carrot|broccoli|spinach/ then "vegetable"
      else "unknown"
      end
    }
  end

  def scorers
    [
      Braintrust::Scorer.new("exact_match") { |expected:, output:|
        (output == expected) ? 1.0 : 0.0
      }
    ]
  end
end
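
# A minimal sketch, not part of the original example: the classification and
# scoring rules above, restated as plain lambdas so they can be spot-checked
# without the Braintrust SDK or a running server. `classify_food` and
# `score_exact_match` are hypothetical names introduced here for illustration;
# they are not Braintrust APIs and the server does not use them.
classify_food = ->(input:) {
  case input.to_s.downcase
  when /apple|banana|orange|grape/ then "fruit"
  when /carrot|broccoli|spinach/ then "vegetable"
  else "unknown"
  end
}
score_exact_match = ->(expected:, output:) { (output == expected) ? 1.0 : 0.0 }

# Example calls (inputs are made up for illustration):
#   classify_food.call(input: "Banana bread")                    # => "fruit"
#   classify_food.call(input: "steak")                           # => "unknown"
#   score_exact_match.call(expected: "fruit", output: "fruit")   # => 1.0
# The patterns are substring matches, so an input such as "pineapple" is also
# classified as "fruit" because it contains "apple".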

# Inline pattern: pass task and scorers as constructor arguments.
# The task declares `parameters:` to receive runtime values from the Playground UI.
# When run remotely, users can override max_length in the Playground.
text_summarizer = Braintrust::Eval::Evaluator.new(
  task: ->(input:, parameters:) {
    max_length = parameters["max_length"] || 100
    words = input.to_s.split
    summary = words.first(max_length).join(" ")
    summary += "..." if words.length > max_length
    summary
  },
  scorers: [
    Braintrust::Scorer.new("length_check") { |input:, output:|
      (output.to_s.length < input.to_s.length) ? 1.0 : 0.0
    }
  ],
  parameters: {
    "max_length" => {type: "number", default: 100, description: "Maximum number of words in summary"}
  }
)
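
# A minimal sketch, not part of the original example: the truncation rule used
# by the summarizer task, restated as a plain lambda to show how `max_length`
# changes the output. `truncate_words` is a hypothetical helper for
# illustration only; it is not a Braintrust API and the server does not use it.
truncate_words = ->(input:, parameters: {}) {
  max_length = parameters["max_length"] || 100
  words = input.to_s.split
  summary = words.first(max_length).join(" ")
  summary += "..." if words.length > max_length
  summary
}

# Example calls (inputs are made up for illustration):
#   truncate_words.call(input: "one two three four", parameters: {"max_length" => 2})
#   # => "one two..."
#   truncate_words.call(input: "one two")
#   # => "one two" (under the default limit of 100 words)
# Because the length check compares character counts, a very short input can
# score 0.0 after truncation: "a b" with max_length 1 becomes "a...", which is
# longer than the input.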

# --- Step 2: Initialize Braintrust tracing ---
#
# Call Braintrust.init with blocking_login: true so that login finishes and
# the SDK has reached a ready state before the server starts serving.
#
# Requires the BRAINTRUST_API_KEY env var (or pass api_key: directly).
Braintrust.init(blocking_login: true)

# --- Step 3: Start the server ---
#
# Mount the Rack app. The server handles:
#   - GET  /      → health check
#   - POST /list  → list evaluators
#   - POST /eval  → execute an evaluation (SSE streaming response)
#   - OPTIONS *   → CORS preflight
run Braintrust::Server::Rack.app(
  evaluators: {
    "food-classifier" => FoodClassifier.new,
    "text-summarizer" => text_summarizer
  }
)