A load testing suite for running benchmarks on self-hosted Braintrust data planes.
This suite currently supports three types of tests:

- **Load Test**: Spawns simulated users to bombard the data plane with logs, simulating production traffic.
- **Large Eval Test**: Generates a large synthetic dataset and runs an eval against it.
- **Functional Test**: Exercises core API create/read/delete flows across key Braintrust resources.
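The Load Test's fan-out pattern can be sketched roughly as below. This is a minimal illustration assuming an asyncio-based design, not the suite's actual implementation; `send_log` is a stand-in for whatever HTTP call the real test makes to the data plane:

```python
import asyncio


async def simulated_user(user_id, send_log, n_logs, delay_s=0.01):
    """One simulated user emitting n_logs log records at a fixed rate."""
    for i in range(n_logs):
        await send_log({"user": user_id, "seq": i, "message": f"log {i}"})
        await asyncio.sleep(delay_s)


async def run_load(num_users, logs_per_user, send_log):
    """Spawn num_users concurrent simulated users and wait for all of them."""
    await asyncio.gather(
        *(simulated_user(u, send_log, logs_per_user) for u in range(num_users))
    )


if __name__ == "__main__":
    sent = []

    async def record(payload):  # stand-in for an HTTP POST to the data plane
        sent.append(payload)

    asyncio.run(run_load(num_users=5, logs_per_user=10, send_log=record))
    print(len(sent))  # 50
```

In a real run, `record` would be replaced by an authenticated request to the data plane's logging endpoint, and the user count and rate would come from the config file.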
Extending the suite to support additional test types is an explicit goal.
Each test is highly configurable via the `braintest.yaml` config file. The tests should be configured to simulate a customer's expected load and usage patterns. We want to ensure that the infra Braintrust is hosted on can handle the customer's use case, and size up components accordingly if the tests fail.
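For illustration, a `braintest.yaml` might look roughly like the sketch below. Every key and value here is a hypothetical placeholder, not the suite's actual schema; consult the config file shipped in the repo for the real options:

```yaml
# Hypothetical sketch of a braintest.yaml -- field names are illustrative.
load_test:
  num_users: 50          # concurrent simulated users
  logs_per_second: 20    # per-user log rate
  duration_minutes: 10

large_eval_test:
  dataset_rows: 100000   # size of the synthetic dataset

functional_test:
  enabled: true
```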
- Install uv if you don't have it:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install dependencies:

  ```bash
  uv sync
  ```

- Activate the virtual env uv creates if it isn't already activated:

  ```bash
  source .venv/bin/activate
  ```

- Create a `.env` file (see `example.env` for reference).
- Configure `braintest.yaml` with your environment details and test parameters.
- Execute the test suite:

  ```bash
  python main.py
  ```
- If you are running over SSH on a remote server, use `nohup` so the test keeps running if your session disconnects:

  ```bash
  nohup python main.py &
  ```

  This will write output to a default log file (`nohup.out`). To write `nohup` output to a specific file:

  ```bash
  nohup python main.py > loadtest.out 2>&1 &
  ```
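For reference, a `.env` file for a suite like this might look like the fragment below. The variable names here are purely illustrative; `example.env` in the repo is the authoritative list:

```
# Hypothetical .env contents -- check example.env for the real variable names.
BRAINTRUST_API_KEY=sk-...
BRAINTRUST_API_URL=https://your-dataplane.example.com
```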
- No actual LLM calls are made in any of these tests. Everything is mocked. The purpose is to load test Braintrust infra, not the LLM provider.
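Conceptually, the mocking approach described above can be sketched like this. This is a generic `unittest.mock` illustration, not the suite's actual code; the client and its `complete` method are hypothetical stand-ins for a real LLM provider SDK:

```python
from unittest.mock import MagicMock

# Stub out the "LLM client" so every completion returns instantly with
# canned text -- no network call ever reaches a real provider.
llm_client = MagicMock()
llm_client.complete.return_value = {"choices": [{"text": "mocked response"}]}


def run_eval_case(client, prompt):
    """A toy eval step that would normally hit an LLM provider."""
    resp = client.complete(prompt=prompt)
    return resp["choices"][0]["text"]


print(run_eval_case(llm_client, "score this log"))  # mocked response
```

Because the mock responds with near-zero latency, the measured load lands entirely on the Braintrust data plane rather than on provider round-trips.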