Airspeed Velocity

Benchmarking with asv

Airspeed Velocity (asv) seems to be the best existing way to run a few benchmarks and track how their results change over time. It’s used by numpy, pandas, scikit-image, xarray, and a few others. Pandas even seems to have spent time/money to set up a “collection” of benchmarks here:


  1. Install with pip, then run asv quickstart to make a template asv.conf.json (which you need to fill out)
  2. Run asv machine or asv machine --yes to have it save info about the computer you are about to run things on
  3. (After writing your benchmarks) Run benchmarks with asv run
  4. Compile the results with asv publish
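As a rough idea of what "filling out" the template in step 1 involves, here is a minimal sketch that builds the handful of fields I would expect to edit. asv quickstart writes the real (much longer, commented) template. The helper name make_asv_config, the project name, the branch name, and the directory choices are all assumptions for illustration.

```python
import json


def make_asv_config(project, repo=".", branch="main"):
    """Return a small dict with commonly edited asv.conf.json fields.

    All values here are placeholders; adjust them for your project.
    """
    return {
        "version": 1,                      # config format version used by asv
        "project": project,                # name shown in the published report
        "repo": repo,                      # path or URL of the repo to benchmark
        "branches": [branch],              # branch(es) to benchmark
        "environment_type": "virtualenv",  # how asv builds its environments
        "benchmark_dir": "benchmarks",     # where the benchmark files live
        "results_dir": ".asv/results",     # raw results written by asv run
        "html_dir": ".asv/html",           # output of asv publish
    }


def to_json(config):
    """Serialize the config the way it would appear in asv.conf.json."""
    return json.dumps(config, indent=4)
```

Printing to_json(make_asv_config("myproject")) gives a block you can compare against the generated template rather than a replacement for it.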

Naming the benchmark tests

The benchmark files look almost like normal pytest or unittest files, except that every function name starts with time_ or mem_. That’s because asv uses the prefix to figure out what it’s tracking.
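A minimal sketch of what such a file might look like. The class name, the data, and the file location (something like benchmarks/benchmarks.py) are made up for illustration; only the setup hook and the time_ / mem_ prefixes are asv conventions.

```python
class SortSuite:
    """Hypothetical benchmark suite; asv picks up methods by name prefix."""

    def setup(self):
        # setup() runs before each benchmark and is not part of the measurement
        self.data = list(range(10_000, 0, -1))

    def time_sorted(self):
        # "time_" prefix: asv measures how long this call takes
        sorted(self.data)

    def mem_copy(self):
        # "mem_" prefix: asv measures the memory size of the returned object
        return list(self.data)
```

Nothing here imports asv; the tool discovers the class from the benchmark directory and decides what to measure from the method names alone.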

Using it in CI

You can run against a specific git commit, or a tag, like

asv run v0.1^!

Getting some version of the benchmarks to run in GitHub Actions seems not too hard, and there’s an even nicer way to do this on GitHub using PR labels ([1])

    types: [labeled]

    if: ${{ github.event.label.name == 'run-benchmark' && github.event_name == 'pull_request' }}

This lets you label a PR with run-benchmark and have it trigger the workflow, which I think is very nice.

Also during this investigation, I saw that you can run GitHub Actions on a timer, like a cron job. The trigger for this is called the schedule event.

Running the benchmarks on every tagged release

asv run accepts multiple ways to specify which commits to benchmark. The common one people use is a range like HEAD~1...HEAD, but you can also pass a file listing specific hashes. So we can use the command git show-ref --tags to list all the tagged commits

git show-ref --tags --hash > hashlist.txt

then pass this to asv run:

asv run HASHFILE:hashlist.txt
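If you would rather drive this from a script (for example in CI), the same two steps can be sketched in Python. The helper names list_tag_hashes and write_hashfile are hypothetical; the only external command used is the git show-ref call from above, and the asv command is returned as a list rather than executed.

```python
import subprocess
from pathlib import Path


def list_tag_hashes(repo="."):
    """Return the hashes printed by `git show-ref --tags --hash`."""
    out = subprocess.run(
        ["git", "show-ref", "--tags", "--hash"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def write_hashfile(hashes, path="hashlist.txt"):
    """Write one hash per line (duplicates dropped, order kept).

    Returns the asv command that would consume the file.
    """
    unique = list(dict.fromkeys(hashes))
    Path(path).write_text("\n".join(unique) + "\n")
    return ["asv", "run", f"HASHFILE:{path}"]
```

Calling write_hashfile(list_tag_hashes()) from the repository root and passing the returned list to subprocess.run should behave like the two shell commands.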

Skipping commits that we’ve already benchmarked

Using the --skip-existing-commits flag, you could theoretically run

asv run --skip-existing-commits ALL

every time, and it would only take a really long time once.

Using it on a giant server with many CPUs

If you don’t want the benchmarks to use all the available cores, you can specify --cpu-affinity 0-7 to restrict them to 8 CPUs (CPUs 0 through 7). This is key for running the benchmarks on our 224 CPU cluster when the production code may only have 8 or 16 cores.

How do we save the results?

There seem to be many ways to save the resulting data points, but scikit-image seems to use GitHub Artifacts to store the data. They actually made a custom Jupyter notebook to download and parse the artifacts, and used it to write up a report on whether it was useful to benchmark their code on the GitHub Actions machines.

More examples in the wild
