Add scheduled benchmark history workflow #38

Description

@bachase

Problem or motivation

Clifft has C++ Catch2 benchmarks in tests/test_benchmarks.cc and Python pytest-benchmark cases in tools/bench/, but we do not currently keep a historical record of benchmark results. This makes performance regressions harder to spot during normal maintenance and release prep.

We should add a lightweight scheduled benchmark workflow that records benchmark results over time.

Proposed solution

Add a GitHub Actions workflow that can be run manually and on a nightly schedule. The workflow should run the existing benchmark suites and store their results in a way maintainers can inspect over time.

Suggested direction:

  • Add a new workflow under .github/workflows/.
  • Trigger it with workflow_dispatch and a nightly schedule.
  • Use ubuntu-24.04 for the first version.
  • Run existing C++ and/or Python benchmarks without adding new benchmark fixtures.
  • Store benchmark history outside the docs deployment branch.
  • Add a short docs/development page describing where maintainers can find the benchmark history.

One possible implementation is benchmark-action/github-action-benchmark, which understands both Catch2 and pytest-benchmark output formats, but contributors are welcome to propose a different lightweight approach that satisfies the requirements below.
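As a rough illustration of the trigger and runner setup above, the skeleton below shows one way the workflow could start out. The file name, cron time, Python version, and the exact install/benchmark commands are placeholders chosen for this sketch, not decisions made in this issue.

```yaml
# .github/workflows/bench-history.yml — sketch only; names, times, and commands are illustrative
name: Benchmark history

on:
  workflow_dispatch:          # manual runs for maintainers
  schedule:
    - cron: '0 3 * * *'       # nightly; the exact time is arbitrary here

permissions:
  contents: write             # required if results are pushed to a data branch

jobs:
  bench:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      # Run the existing pytest-benchmark cases and emit machine-readable JSON.
      # The install command is a placeholder; it would need to match how the
      # project and its benchmark dependencies are actually set up.
      - run: |
          python -m pip install -e . pytest-benchmark
          pytest tools/bench/ --benchmark-json=bench.json
```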

Acceptance criteria

  • A scheduled/manual GitHub Actions workflow runs existing Clifft benchmarks.
  • Benchmark history is persisted somewhere maintainers can inspect across runs.
  • The implementation does not write to or depend on the existing gh-pages docs deployment branch.
  • The workflow does not run on every pull request.
  • A short docs page explains how to find and interpret the benchmark history.
  • The PR explains any benchmark output format/tooling choices.

Out of scope

  • PR-time benchmark comparison comments.
  • Failing CI on benchmark regressions.
  • Alerting integrations.
  • Self-hosted runners.
  • Adding new benchmark workloads.
  • Reworking the existing benchmark suite.

Open questions

  • Should the first version track both Catch2 and pytest-benchmark results, or start with one suite?
  • What storage/viewing mechanism should we use for the history?
  • Should the history live on a dedicated branch such as bench-data?
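On the last question: github-action-benchmark can push its data files to a branch other than gh-pages, so a dedicated bench-data branch is one concrete option. The step below sketches that configuration; the branch name, data directory, and the choice of the pytest tool are assumptions for illustration, not requirements.

```yaml
# Sketch of a storage step; 'bench-data' and the paths are assumptions.
- name: Store benchmark history
  uses: benchmark-action/github-action-benchmark@v1
  with:
    name: Python benchmarks
    tool: 'pytest'                     # pytest-benchmark JSON; 'catch2' is also supported
    output-file-path: bench.json
    gh-pages-branch: bench-data        # keeps history off the gh-pages docs branch
    benchmark-data-dir-path: dev/bench
    github-token: ${{ secrets.GITHUB_TOKEN }}
    auto-push: true                    # commit the updated data files to bench-data
```

If that route is taken, the short docs/development page from the proposed solution would only need to link to wherever the data on that branch is viewable.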
