
Dev: ASV Benchmarks

Alex Seaton edited this page Jan 23, 2026 · 6 revisions

What are ASV Benchmarks and how do they work?

ASV (Airspeed Velocity) is a tool we use to benchmark the library and compare its performance over time.

Other projects that use it include NumPy, Arrow, and SciPy.

The benchmarks get run automatically in the following cases:

  • nightly on the master branch and on every push to master - this updates the performance graphs
  • on every push to a branch with an open PR - this benchmarks the PR branch against master; if any benchmark regresses by more than 15%, the check fails

Normally, ASV keeps track of the results in JSON files, but we transform them into data frames and store them in an ArcticDB database. A dedicated script, transform_asv_results.py, handles this.

The benchmarking for PRs runs the benchmarks using the code in the PR and looks up baseline measurements from the most recent master commit in our ASV database. The logic for this is in transform_asv_results.py --mode extract-recent. If no stored results are at most two days old, the benchmarking step fails, and we need to trigger a manual benchmarking run of master to populate the results.
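The two-day freshness rule can be sketched as follows (illustrative only; the actual logic lives in transform_asv_results.py):

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch only: the real check is implemented in
# transform_asv_results.py --mode extract-recent.
MAX_BASELINE_AGE = timedelta(days=2)

def baseline_is_fresh(result_timestamp: datetime, now: datetime) -> bool:
    """Return True if a stored master result is recent enough to use as a baseline."""
    return now - result_timestamp <= MAX_BASELINE_AGE

now = datetime(2026, 1, 23, tzinfo=timezone.utc)
print(baseline_is_fresh(now - timedelta(days=1), now))  # a one-day-old result is usable
print(baseline_is_fresh(now - timedelta(days=3), now))  # too old: trigger a manual master run
```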

Adding new benchmarks

All of the code is located in the benchmarks folder.

If you have made any changes to the benchmarks, you need to regenerate and push the benchmarks.json file.

To do this, run python python/utils/asv_checks.py from the project root directory.

It is best to make benchmark changes in a standalone PR, merge it, and then wait for the database to be populated with master results. Because PR builds look up master results from the database, changing benchmarks in the same PR as logic changes means we cannot make a meaningful comparison.

Running the benchmarks on master

There is a workflow that automatically benchmarks the latest master commit every night and on push to master. If you need to run it manually, you can issue a manual build from here and click on the Run workflow menu. This will start a build that will benchmark only the latest version.

If you have made changes to the benchmarks, you might need to regenerate all of the benchmarks. You will need to start a new build manually on master and select the benchmark_all_tags option.

Running the benchmarks on a non-master branch

Local Run

To run ASV locally, you first need to make sure that you have some prerequisites installed:

  • asv
  • virtualenv

Some ASV benchmarks use files stored in git lfs. In order to be able to run all benchmarks you also need to install git-lfs. Either via sudo apt-get install git-lfs or by following the instructions here.

After git-lfs is installed you must pull the files stored in lfs.

cd <arcticdb-root>
git lfs pull

After that, to benchmark only the latest commit, simply run:

python -m asv run -v --show-stderr HEAD^!

To run a subset of benchmarks, use --bench <regex>, e.g. --bench .*Resample.*.

After running this once, if you are just changing the benchmarks, and not ArcticDB code itself, you can run the updated benchmarks without committing and rebuilding with:

python3 -m asv run --python=python/.asv/env/<some hash>/bin/python -v --show-stderr

where the environment path is logged by the first ASV run from HEAD^!.

Running specific tests

During development you might want to run only some tests and not always compile from the HEAD of the branch. To do that you can use this line:

asv run --python=same -v --show-stderr --bench .*myTest.*

This runs the matching benchmarks in your current Python environment, without building a new one.

GitHub Actions Run

If you want to benchmark more than one commit (e.g. if you have added new benchmarks), it might be better to run them on a GH Runner instead of locally.

You will need to change the asv.conf.json file to point to your branch instead of master (e.g. "branches": ["some_branch"],).

Then push your changes and start a manual build from here. Make sure to select your branch.

Benchmarks against multiple storages

Many of our benchmarks parameterize over storages, with a parameter like storages = [Storage.LMDB, Storage.AMAZON].

It is important that this parameter is a constant, so that our benchmarks.json file contains entries for each storage.

The is_storage_enabled() function controls which storages run:

Environment Variable    | Default      | Description
ARCTICDB_STORAGE_LMDB   | 1 (enabled)  | Run benchmarks against LMDB
ARCTICDB_STORAGE_AWS_S3 | 0 (disabled) | Run benchmarks against AWS S3
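A minimal sketch of how is_storage_enabled() might consult these variables (illustrative only; the real implementation lives in the benchmarks package, but the env var names and defaults mirror the table above):

```python
import os

# Illustrative sketch: env var names and defaults mirror the table above.
_STORAGE_ENV = {
    "LMDB": ("ARCTICDB_STORAGE_LMDB", "1"),      # enabled by default
    "AMAZON": ("ARCTICDB_STORAGE_AWS_S3", "0"),  # disabled by default
}

def is_storage_enabled(storage: str) -> bool:
    """Return True if benchmarks should run against the given storage."""
    env_var, default = _STORAGE_ENV[storage]
    return os.environ.get(env_var, default) == "1"
```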

There are various functions to create a library on a given storage:

  • create_library(storage, library_options) - Creates a single library for the given storage. Returns None if the storage is not enabled.
  • create_libraries(storage, library_names, library_options) - Creates multiple named libraries.
  • create_libraries_across_storages(storages, library_options) - Creates one library per storage, returning a Dict[Storage, Optional[Library]].

Note that these return None instead of a Library object if the storage is not enabled. The benchmark should detect this at setup time and raise SkipNotImplemented. Here is an example:

from asv_runner.benchmarks.mark import SkipNotImplemented
from benchmarks.environment_setup import Storage, create_libraries_across_storages

class ModificationFunctions:
    # 1. Define the storage parameter list
    storages = [Storage.LMDB, Storage.AMAZON]

    # 2. Define all parameters and their names
    rows_and_cols = [(1_000_000, 2), (10_000_000, 2)]
    params = [rows_and_cols, storages]
    param_names = ["rows_and_cols", "storage"]

    # 3. setup_cache runs ONCE per benchmark class (shared across all param combos)
    def setup_cache(self):
        # Create libraries for all enabled storages
        lib_for_storage = create_libraries_across_storages(ModificationFunctions.storages)

        # Optionally pre-populate data
        for storage in ModificationFunctions.storages:
            lib = lib_for_storage[storage]
            if lib is None:
                continue  # Storage not enabled
            lib.write("sym", some_dataframe)

        return lib_for_storage  # Pickled and passed to setup by ASV

    # 4. setup runs BEFORE each benchmark method for each parameter combination
    def setup(self, libs_for_storage, rows_and_cols, storage):
        self.lib = libs_for_storage[storage]
        if self.lib is None:
            raise SkipNotImplemented  # Crucial: Skip the benchmark if the storage is not enabled

        # Prepare the data used by the benchmark methods
        self.df = some_dataframe

    # 5. Write a benchmark
    def time_write(self, *args):
        self.lib.write("sym", self.df)

Running real storage benchmarks locally

To run S3 benchmarks locally:

  1. Create an S3 bucket:

    aws s3 mb s3://<bucket-name> --region eu-west-2
  2. Create an aws.env file:

    ARCTICDB_STORAGE_AWS_S3=1
    ARCTICDB_REAL_S3_ACCESS_KEY=<access-key>
    ARCTICDB_REAL_S3_SECRET_KEY=<secret-key>
    ARCTICDB_REAL_S3_BUCKET=<bucket-name>
    ARCTICDB_REAL_S3_ENDPOINT=https://s3.eu-west-2.amazonaws.com
    ARCTICDB_REAL_S3_REGION=eu-west-2
    ARCTICDB_REAL_S3_CLEAR=0
  3. Export the variables:

    export $(cat aws.env | xargs)
  4. Run benchmarks:

    cd python
    python -m asv run --show-stderr --bench ModificationFunctions HEAD^!
  5. Clean up:

    aws s3 rb s3://<bucket-name> --region eu-west-2 --force

Real storage benchmarks in the CI

These run by default on the master benchmarking runs. They do not run by default on PRs, which just use LMDB. The flow is:

1. Create unique bucket -> arcticdb-asv-data-<timestamp>-<random>
2. Set ARCTICDB_STORAGE_AWS_S3=1 (if storage=REAL or ALL)
3. Run ASV benchmarks
   |-- Benchmarks use create_libraries_across_storages()
       |-- Uses real_s3_from_environment_variables() for S3
4. Cleanup: Delete bucket with all contents (always runs)

Flaky or slow ASV benchmarks

It's important that ASV benchmarks are not flaky and not too slow. This section describes how to investigate these problems.

Flakiness

You may want to repeat ASV benchmarks to check that they are stable.

  • Create an m6i.4xlarge EC2 runner. This is what the CI uses.
  • Log in to it.
  • Install deps:

sudo apt update
sudo apt-get install build-essential gcc-11 cmake gdb
sudo apt-get install zip pkg-config flex bison libkrb5-dev libsasl2-dev libcurl4-openssl-dev

  • Run a benchmark:

python -m asv run --bench "resample.Resample.time_resample" -v --show-stderr HEAD^!

ASV logs the environment it created; you can reuse that environment in future runs to skip the build step:

python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr 

ASV must save results to its database or it will not report regressions. Use

--set-commit-hash

to do this. The hashes must be real commits in the history.

So that leaves us with commands like:

python -m asv run --set-commit-hash $(git rev-parse HEAD~42) --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr 

When developing, remember the -q option to run benchmarks without repeats.

You can then compare a few benchmark runs and look for any large differences. First, run the benchmarks repeatedly:

#!/bin/bash

for i in {1..3}; do
  commit=$(git rev-parse HEAD~$i)
  echo "Running benchmark and storing results under $commit"
  /root/miniforge3/bin/python -m asv run --python=/root/ArcticDB/python/.asv/env/28ce2c79fdbca74891d3623705fc0783/bin/python --bench "resample.Resample.time_resample" -v --show-stderr --set-commit-hash $commit
done

and then compare them:

#!/bin/bash

for i in {2..3}; do
  /root/miniforge3/bin/python -m asv compare -s $(git rev-parse HEAD~1 HEAD~$i) > comparison_$i.txt
done

Then look for benchmarks with a large ratio between repeated runs. For example, this finds ratios below 0.95 or above 1.05:

awk -F'|' 'gsub(/[[:space:]]/,"",$5) && ($5 < 0.95 || $5 > 1.05) && $5 != "Ratio" && $5 != ""' comparison*.txt | sort -t'|' -k5 -n
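The same filter can also be expressed in Python, assuming asv compare's pipe-delimited table layout with the ratio in the fifth '|'-separated field (mirroring the awk one-liner above):

```python
def suspicious_rows(lines, low=0.95, high=1.05):
    """Yield table rows whose ratio column falls outside [low, high].

    Assumes asv compare's pipe-delimited output, with the ratio as the
    fifth '|'-separated field (the same field the awk one-liner reads).
    """
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue
        try:
            ratio = float(fields[4])
        except ValueError:
            continue  # header or separator row
        if ratio < low or ratio > high:
            yield line

# Hypothetical sample rows, for illustration only
sample = [
    "| Change | Before | After  | Ratio | Benchmark |",
    "| +      | 1.20ms | 1.30ms | 1.08  | resample.Resample.time_resample |",
    "|        | 1.20ms | 1.20ms | 1.00  | version.time_write |",
]
print(list(suspicious_rows(sample)))  # only the 1.08 row
```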

You can tune any suspicious benchmarks, then repeat this analysis to see whether they appear to be more stable.

Slowness

transform_asv_results.py includes --mode analyze, which checks where time is spent in a saved ASV run. See that file for documentation. It also runs at the end of the benchmarking CI step, so you can check its output there.
