Dev: ASV Benchmarks
ASV is a benchmarking tool that we use to benchmark and compare the performance of the library over time. Other users of ASV include NumPy, Arrow, and SciPy.
The benchmarks get run automatically in the following cases:
- nightly on the master branch and on every push to master - this updates the performance graphs
- on every push to a branch with an open PR - this benchmarks the PR branch against master; the benchmarks fail if there is a regression of more than 15%
Normally, ASV keeps track of results in JSON files, but we transform them into data frames and store them in an ArcticDB database. There is a special script that helps with this.
The benchmarking for PRs runs benchmarks using the code in the PR and looks up baseline measurements from the most recent master commit in our ASV database. The logic for this is in `transform_asv_results.py --mode extract-recent`. If there are no results that are at most two days old, the benchmarking step will fail, and we need to trigger a manual benchmarking run of master to populate the results.
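The extract-recent lookup can be pictured roughly as follows. This is a hypothetical sketch, not the actual code in transform_asv_results.py: it assumes each stored result row carries a commit timestamp, and that results older than two days are rejected.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the extract-recent logic; the real implementation
# lives in transform_asv_results.py and queries the ArcticDB results database.
def extract_recent(results, now, max_age_days=2):
    """Return the newest master result, or None if all results are too old."""
    cutoff = now - timedelta(days=max_age_days)
    recent = [r for r in results if r["timestamp"] >= cutoff]
    if not recent:
        return None  # caller fails the benchmarking step in this case
    return max(recent, key=lambda r: r["timestamp"])
```

When `None` is returned, the PR benchmarking step fails and a manual master run is needed to repopulate the database.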
All of the code is located in the `benchmarks` folder.
If you have made any changes to the benchmarks, you need to update and push the updated `benchmarks.json` file. To do this, run the following from the project root directory:

```
python python/utils/asv_checks.py
```
It is best to make benchmark changes in a standalone PR, merge it, and then wait for the database to be populated with master results. Since we look up master results from the database on PR builds, changing benchmarks and logic in the same PR prevents a meaningful comparison.
There is a workflow that automatically benchmarks the latest master commit every night and on push to master. If you need to run it manually, you can issue a manual build from here and click on the Run workflow menu. This will start a build that will benchmark only the latest version.
If you have made changes to the benchmarks, you might need to regenerate all of the benchmarks. You will need to start a new build manually on master and select the benchmark_all_tags option.
To run ASV locally, you first need to make sure that you have some prerequisites installed:
- asv
- virtualenv
Some ASV benchmarks use files stored in Git LFS. In order to run all benchmarks, you also need to install git-lfs, either via `sudo apt-get install git-lfs` or by following the instructions here.
After git-lfs is installed, you must pull the files stored in LFS:

```
cd <arcticdb-root>
git lfs pull
```

After that, to benchmark only the latest commit, you can simply run:

```
python -m asv run -v --show-stderr HEAD^!
```

To run a subset of benchmarks, use `--bench <regex>`.
After running this once, if you are just changing the benchmarks and not ArcticDB code itself, you can run the updated benchmarks without committing and rebuilding with:

```
python3 -m asv run --python=python/.asv/env/<some hash>/bin/python -v --show-stderr
```

where the path should be obvious from the first ASV run from `HEAD^!`.
During development you might want to run only some tests and not always compile from the HEAD of the branch. To do that, you can use this line:

```
asv run --python=same -v --show-stderr --bench .*myTest.*
```

This will run the benchmark in the same venv you are currently running.
If you want to benchmark more than one commit (e.g. if you have added new benchmarks), it might be better to run them on a GH Runner instead of locally.
You will again need to change the `asv.conf.json` file to point to your branch instead of master (e.g. `"branches": ["some_branch"],`).
Then push your changes and start a manual build from here. Make sure to select your branch.
Many of our benchmarks parameterize over storages, with a parameter like `storages = [Storage.LMDB, Storage.AMAZON]`.
It is important that this parameter is a constant, so that our `benchmarks.json` file contains entries for each storage.
The `is_storage_enabled()` function controls which storages run:
| Environment Variable | Default | Description |
|---|---|---|
| `ARCTICDB_STORAGE_LMDB` | `1` (enabled) | Run benchmarks against LMDB |
| `ARCTICDB_STORAGE_AWS_S3` | `0` (disabled) | Run benchmarks against AWS S3 |
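A minimal sketch of how `is_storage_enabled()` might consult these variables. This is an illustration, not the real implementation; the defaults mirror the table above.

```python
import os

# Hypothetical sketch of is_storage_enabled(); env var names and defaults
# mirror the table above, but the real function lives in the benchmarks code.
_STORAGE_ENV = {
    "LMDB": ("ARCTICDB_STORAGE_LMDB", "1"),      # enabled by default
    "AWS_S3": ("ARCTICDB_STORAGE_AWS_S3", "0"),  # disabled by default
}


def is_storage_enabled(storage: str) -> bool:
    env_var, default = _STORAGE_ENV[storage]
    return os.environ.get(env_var, default) == "1"
```

With no environment variables set, LMDB benchmarks run and AWS S3 benchmarks are skipped.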
There are various functions to create a library on a given storage:

- `create_library(storage, library_options)` - Creates a single library for the given storage. Returns `None` if the storage is not enabled.
- `create_libraries(storage, library_names, library_options)` - Creates multiple named libraries.
- `create_libraries_across_storages(storages, library_options)` - Creates one library per storage, returning a `Dict[Storage, Optional[Library]]`.
Note that these return `None` instead of a library object if the storage is not enabled. The benchmark should detect this at setup time and raise `SkipNotImplemented`. Here is an example:
```python
from asv_runner.benchmarks.mark import SkipNotImplemented

from benchmarks.environment_setup import Storage, create_libraries_across_storages


class ModificationFunctions:
    # 1. Define the storage parameter list
    storages = [Storage.LMDB, Storage.AMAZON]

    # 2. Define all parameters and their names
    rows_and_cols = [(1_000_000, 2), (10_000_000, 2)]
    params = [rows_and_cols, storages]
    param_names = ["rows_and_cols", "storage"]

    # 3. setup_cache runs ONCE per benchmark class (shared across all param combos)
    def setup_cache(self):
        # Create libraries for all enabled storages
        lib_for_storage = create_libraries_across_storages(ModificationFunctions.storages)
        # Optionally pre-populate data
        for storage in ModificationFunctions.storages:
            lib = lib_for_storage[storage]
            if lib is None:
                continue  # Storage not enabled
            lib.write("sym", some_dataframe)
        return lib_for_storage  # Pickled and passed to setup by ASV

    # 4. setup runs BEFORE each benchmark method for each parameter combination
    def setup(self, libs_for_storage, rows_and_cols, storage):
        self.lib = libs_for_storage[storage]
        if self.lib is None:
            raise SkipNotImplemented  # Crucial: skip the benchmark if the storage is not enabled
        # Prepare test data...

    # 5. Write a benchmark
    def time_write(self, *args):
        self.lib.write("sym", self.df)
```

To run S3 benchmarks locally:
- Create an S3 bucket:

  ```
  aws s3 mb s3://<bucket-name> --region eu-west-2
  ```

- Create an `aws.env` file:

  ```
  ARCTICDB_STORAGE_AWS_S3=1
  ARCTICDB_REAL_S3_ACCESS_KEY=<access-key>
  ARCTICDB_REAL_S3_SECRET_KEY=<secret-key>
  ARCTICDB_REAL_S3_BUCKET=<bucket-name>
  ARCTICDB_REAL_S3_ENDPOINT=https://s3.eu-west-2.amazonaws.com
  ARCTICDB_REAL_S3_REGION=eu-west-2
  ARCTICDB_REAL_S3_CLEAR=0
  ```

- Export the variables:

  ```
  export $(cat aws.env | xargs)
  ```

- Run benchmarks:

  ```
  cd python
  python -m asv run --show-stderr --bench ModificationFunctions HEAD^!
  ```

- Clean up:

  ```
  aws s3 rb s3://<bucket-name> --region eu-west-2 --force
  ```
These run by default on the master benchmarking runs. They do not run by default on PRs, which use only LMDB. The flow is:

```
1. Create unique bucket -> arcticdb-asv-data-<timestamp>-<random>
2. Set ARCTICDB_STORAGE_AWS_S3=1 (if storage=REAL or ALL)
3. Run ASV benchmarks
   |-- Benchmarks use create_libraries_across_storages()
   |-- Uses real_s3_from_environment_variables() for S3
4. Cleanup: Delete bucket with all contents (always runs)
```
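The unique bucket name in step 1 can be generated along these lines. This is a sketch matching the `arcticdb-asv-data-<timestamp>-<random>` pattern above; the CI workflow's actual naming code may differ.

```python
import random
import string
import time


def unique_bucket_name(prefix="arcticdb-asv-data"):
    # S3 bucket names must be lowercase, so restrict the random suffix accordingly.
    timestamp = int(time.time())
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"{prefix}-{timestamp}-{suffix}"
```

Using a fresh name per run keeps concurrent benchmark jobs from colliding, and the unconditional cleanup step deletes the bucket afterwards.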
It's important that ASV benchmarks are neither flaky nor too slow. This section describes how to investigate these problems by repeating ASV benchmarks to check that they are stable.
- Create an m6i.4xlarge EC2 runner. This is what the CI uses.
- Log in to it.
- Install deps:

  ```
  sudo apt update
  sudo apt-get install build-essential gcc-11 cmake gdb
  sudo apt-get install zip pkg-config flex bison libkrb5-dev libsasl2-dev libcurl4-openssl-dev
  ```
- Clone ArcticDB and run `git submodule update --init --recursive`.
- Install Python and install ASV (docs: https://github.com/man-group/ArcticDB/wiki/Dev:-Building#setup-for-linux-build-using-wsl).
- Run some benchmarks:

  ```
  python -m asv run --bench "resample.Resample.time_resample" -v --show-stderr HEAD^!
  ```
It will log the environment that ASV created; you can use that env in future runs to skip the build step:

```
python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr
```
We need to get ASV to save results to its database, or it won't report regressions. We can use `--set-commit-hash` to do this. These need to be real hashes in the history. So that leaves us with commands like:

```
python -m asv run --set-commit-hash $(git rev-parse HEAD~42) --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr
```
When developing, remember the `-q` option to run benchmarks without repeats.
You can then check the comparison across a few benchmark runs to check for any large differences.
You can then run benchmarks repeatedly:

```bash
#!/bin/bash
for i in {1..3}; do
  commit=$(git rev-parse HEAD~$i)
  echo "Running benchmark and storing results under $commit"
  /root/miniforge3/bin/python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr --set-commit-hash $commit
done
```
and then compare them:

```bash
#!/bin/bash
for i in {2..3}; do
  /root/miniforge3/bin/python -m asv compare -s $(git rev-parse HEAD~1 HEAD~$i) > comparison_$i.txt
done
```
Then look for comparisons with a large ratio between the repeated runs of the same benchmark. For example, this will find rows with a ratio less than 0.95 or greater than 1.05:

```
awk -F'|' 'gsub(/[[:space:]]/,"",$5) && ($5 < 0.95 || $5 > 1.05) && $5 != "Ratio" && $5 != ""' comparison*.txt | sort -t'|' -k5 -n
```
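The same filter can also be sketched in Python, which may be easier to tweak. This assumes the pipe-separated `asv compare` table layout, with the ratio in the fifth pipe-separated field, matching what the awk command above reads as `$5`.

```python
import glob


def flaky_rows(pattern="comparison*.txt", low=0.95, high=1.05):
    """Yield asv-compare table rows whose ratio falls outside [low, high]."""
    for path in glob.glob(pattern):
        with open(path) as f:
            for line in f:
                cols = [c.strip() for c in line.split("|")]
                if len(cols) < 5:
                    continue  # not a table row
                try:
                    ratio = float(cols[4])  # fifth pipe-separated field
                except ValueError:
                    continue  # header or separator row ("Ratio", "---", etc.)
                if ratio < low or ratio > high:
                    yield line.rstrip()
```

Rows yielded here are the candidates for tuning; rerun the repeated-benchmark loop after changes to see whether they stabilize.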
You can tune any suspicious benchmarks, then repeat this analysis to see whether they appear to be more stable.
`transform_asv_results.py` includes `--mode analyze` to check where time is spent in a saved ASV run. See that file for docs. It also runs at the end of the benchmarking CI step, so you can check its printout there.