Airspeed Velocity
Benchmarking with asv
Airspeed Velocity (asv) seems to be the best existing way to run a few benchmarks and get progress over time. It’s used by numpy, pandas, scikit-image, xarray, and a few others. Pandas even seems to have spent time/money to set up a “collection” of benchmarks here: https://asv-runner.github.io/asv-collection/pandas/
Basics
- Install with `pip`, then run `asv quickstart` to make a template `asv.conf.json` (which you need to fill out)
- Run `asv machine` or `asv machine --yes` to have it save info about the computer you are about to run things on
- (After writing your benchmarks) Run the benchmarks with `asv run`. This will install the environment that you specified, which will be reused.
- Compile the results with `asv publish`
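After `asv quickstart`, the generated `asv.conf.json` mostly needs a few fields filled in. A minimal sketch (the project name and branch are hypothetical, not from any project mentioned here):

```json
{
    "version": 1,
    "project": "myproject",
    "repo": ".",
    "branches": ["main"],
    "environment_type": "virtualenv"
}
```

`repo` can be a relative path to the repository the benchmarks live in (`.` is common when they live in the same repo) or a clone URL.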
Naming the benchmark tests
The benchmarks look almost like normal pytest or unittest files, except the function names all start with time_ or mem_… And that’s because asv uses the prefix to figure out what it’s tracking
`time_<blah>` will run a function, measure the time, and report the result under whatever you’ve named the function. (`mem_<blah>` instead reports the memory footprint of the object the function returns.)
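As a concrete sketch (the file and method names here are made up, not from any of the projects above), a benchmark file might look like:

```python
# benchmarks/bench_sorting.py -- a minimal sketch; all names are hypothetical
class TimeSuite:
    """asv collects any method whose name starts with time_ or mem_."""

    def setup(self):
        # asv calls setup() before each benchmark runs
        self.data = list(range(10_000, 0, -1))

    def time_builtin_sorted(self):
        # asv times this call; the chart series is named after the method
        sorted(self.data)

    def mem_reversed_list(self):
        # mem_ benchmarks report the memory footprint of the returned object
        return self.data[::-1]
```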
Using it in CI
You can run against a specific git commit or tag, like
`asv run v0.1^!`
(`^!` is git revision syntax for "just this commit, excluding its ancestors".)
Getting some version of this to run in GitHub Actions seems not too hard, and there’s an even nicer way to do it using PR labels ([1]):
```yaml
on:
  pull_request:
    types: [labeled]

jobs:
  benchmark:
    if: ${{ github.event.label.name == 'run-benchmark' && github.event_name == 'pull_request' }}
```
This lets you label a PR with run-benchmark and have it trigger the workflow, which I think is very nice.
Also during this investigation, I saw that you can run GitHub Actions on a timer, like a cronjob. It’s called the `schedule` event to trigger a workflow.
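A minimal sketch of a `schedule` trigger (the cron expression is an arbitrary weekly example, not taken from any of the projects above):

```yaml
on:
  schedule:
    # POSIX cron syntax, evaluated in UTC: minute hour day-of-month month day-of-week
    - cron: "0 6 * * 0"   # every Sunday at 06:00 UTC
```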
Running the benchmarks on every tagged release
`asv run` accepts multiple ways to specify which commits to benchmark. The common ones are ranges like `HEAD~1...HEAD`, but you can also pass a file listing specific hashes. So we can use `git show-ref --tags --hash` to list all the tagged commits:
`git show-ref --tags --hash > tagged_commits.txt`
then pass this file to `asv run`:
`asv run HASHFILE:tagged_commits.txt`
Skipping commits that we’ve already benchmarked
Using the `--skip-existing-commits` flag, you could theoretically run
`asv run --skip-existing-commits ALL`
every time, and it would only take a really long time once.
Using it on a giant server with many CPUs
If you don’t want the benchmarks to use every available core, you can pin them with `--cpu-affinity`: for example, `--cpu-affinity 0-7` restricts asv to the first 8 CPUs. This is key for running the benchmarks on our 224-CPU cluster when the production code may only get 8 or 16 cores.
How do we save the results?
There seem to be many ways to save the resulting data points, but scikit-image uses GitHub Actions artifacts to store theirs. They even made a custom Jupyter notebook to download and parse the artifacts, which they used to write up a report on whether benchmarking their code on the GitHub Actions machines was useful.
More examples in the wild
- tqdm has a longer workflow with it
- scikit-image has a regular version, but also a cron version
- SciPy has docs about it here, and their graphs are up here
- scikit-learn has a separate repo setup
- Sarsen has a simple setup, but they’ve put semi-real data into Git LFS… which is probably why it’s in a separate repo
- napari has a complicated GHA setup
References
- https://labs.quansight.org/blog/2021/08/github-actions-benchmarks