asv
Airspeed Velocity (`asv`) seems to be the best existing way to run a few benchmarks and get progress over time. It's used by numpy, pandas, scikit-image, xarray, and a few others. Pandas even seems to have spent time/money to set up a "collection" of benchmarks here: https://asv-runner.github.io/asv-collection/pandas/
Getting started is just a few commands (sketched together right after this list):

- Install it with `pip`.
- Run `asv quickstart` to make a template `asv.conf.json` (which you need to fill out).
- Run `asv machine` or `asv machine --yes` to have it save info about the computer you are about to run things on.
- `asv run` to run the benchmarks.
- `asv publish` to build the report.
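Put together, that initial setup is roughly the following (a sketch; `asv run` in particular takes more options, covered below):

```sh
pip install asv      # install the tool
asv quickstart       # writes a template asv.conf.json that you then fill out
asv machine --yes    # record info about the machine running the benchmarks
asv run              # run the benchmark suite
asv publish          # build the static HTML report from the results
```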
The benchmarks almost look like normal pytest or unittest files, except all the function names start with `time_` or `mem_`… And that's because `asv` uses that prefix to figure out what it's tracking: a `time_<blah>` function gets run and timed, and the measurement is reported under whatever you've named the function.
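For example, a minimal benchmark file might look like this (my own sketch — the file, class, and function names are made up, not from any real project):

```python
# benchmarks/bench_example.py
import random


class SortSuite:
    """asv discovers benchmarks by the time_/mem_ name prefixes."""

    def setup(self):
        # setup() runs before each benchmark and is not counted in the timing
        self.data = [random.random() for _ in range(10_000)]

    def time_sort(self):
        # timed benchmark, reported as "SortSuite.time_sort"
        sorted(self.data)

    def mem_sorted_copy(self):
        # mem_ benchmarks report the size of the object they return
        return sorted(self.data)
```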
You can run against a specific git commit, or a tag, like `asv run v0.1^!`.
Getting some version of this to run in Github Actions seems not too hard, and there's an even nicer way to do it on Github using PR labels ([1]):
```yaml
on:
  pull_request:
    types: [labeled]

jobs:
  benchmark:
    if: ${{ github.event.label.name == 'run-benchmark' && github.event_name == 'pull_request' }}
```
This lets you label a PR with `run-benchmark` and have it trigger the workflow, which I think is very nice.
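The snippet stops at the `if:` condition; the rest of the `benchmark` job is the usual boilerplate plus the asv commands. A sketch of what it might contain (the runner, Python version, and commit range here are my assumptions, not from the workflow above):

```yaml
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0            # asv needs git history to benchmark older commits
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install asv
      - run: asv machine --yes      # record machine info non-interactively
      - run: asv run HEAD~1...HEAD  # or whatever commit range you care about
```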
Also during this investigation, I saw that you can run Github Actions on a timer, like a cronjob: there's a `schedule` event for triggering a workflow on a cron expression.
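For reference, that looks something like this in the workflow file (the particular cron expression is just an example):

```yaml
on:
  schedule:
    # minute hour day-of-month month day-of-week, evaluated in UTC
    - cron: "0 3 * * 1"   # every Monday at 03:00 UTC
```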
`asv run` accepts multiple ways to specify commits to run the benchmarks on. The common one people use is a range like `HEAD~1...HEAD`, but you can also pass a list of specific hashes… So we can use `git show-ref --tags` to list all the tagged commits:

`git show-ref --tags --hash > tagged_commits.txt`

then pass this file to `asv run`:

`asv run HASHFILE:tagged_commits.txt`
Using the `--skip-existing-commits` flag, you could theoretically run `asv run --skip-existing-commits ALL` every time, and it would only take a really long time once.
If you don't want the benchmarks to use all the cores available, you can pass something like `--cpu-affinity 0-7` to pin them to the first 8 CPUs. This is key for running the benchmarks on our 224-CPU cluster, where the production code may only get 8 or 16 cores.
There seem to be many ways to save the resulting data points, but scikit-image seems to use Github Actions artifacts to store the data. They actually made a custom Jupyter notebook to download and parse the artifacts when writing up their report on whether it was useful to benchmark their code on the Github Actions machines.
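I haven't looked at their workflow in detail, but stashing the raw results as an artifact at the end of a benchmark job would look roughly like this (the artifact name and path are my assumptions — the path should match `results_dir` in your `asv.conf.json`):

```yaml
      - uses: actions/upload-artifact@v4
        with:
          name: asv-results
          path: .asv/results   # wherever results_dir in asv.conf.json points
```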