feat(benchmark/alloc): introduce minimal allocation benchmark #266

Open
s1rk0 wants to merge 9 commits into alpaka-group:dev from s1rk0:feature/alloc-benchmark

@s1rk0 s1rk0 commented Sep 16, 2025

Summary

This PR adds a microbenchmark for host memory allocation under benchmark/alloc/.
It measures the cost of:

  1. malloc/new (allocation),
  2. first-use/touch (memset over the whole buffer), and
  3. a trivial read (to prevent dead-store elimination),

so results reflect realistic “allocate + initialize + first access” latency rather than raw allocator timing.
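For readers who want the three phases in isolation, here is a minimal host-only sketch. It is not the code in alloc_benchmark.cpp: `AllocTiming` and `measureHostAlloc` are hypothetical names, and it uses plain `std::malloc` and `std::memset` instead of any alpaka API.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical result type for illustration; not part of the PR.
struct AllocTiming
{
    double allocNs = 0.0;       // time spent in std::malloc
    double touchNs = 0.0;       // time spent in std::memset over the whole buffer
    double readNs = 0.0;        // time spent reading one byte back
    std::uint8_t firstByte = 0; // value read back, returned so the read is observable
    bool ok = false;            // false if the size was 0 or the allocation failed
};

// Times allocate, first touch, and a trivial read as three separate phases.
inline AllocTiming measureHostAlloc(std::size_t bytes)
{
    using Clock = std::chrono::steady_clock;
    auto elapsedNs = [](Clock::time_point a, Clock::time_point b)
    { return std::chrono::duration<double, std::nano>(b - a).count(); };

    AllocTiming r;
    if(bytes == 0)
        return r; // nothing to touch or read

    auto const t0 = Clock::now();
    void* p = std::malloc(bytes); // phase 1: allocation only
    auto const t1 = Clock::now();
    if(p == nullptr)
        return r;

    std::memset(p, 0, bytes); // phase 2: first touch of every byte
    auto const t2 = Clock::now();

    // phase 3: volatile read so the compiler cannot drop the preceding stores
    r.firstByte = *static_cast<volatile std::uint8_t*>(p);
    auto const t3 = Clock::now();

    std::free(p);

    r.allocNs = elapsedNs(t0, t1);
    r.touchNs = elapsedNs(t1, t2);
    r.readNs = elapsedNs(t2, t3);
    r.ok = true;
    return r;
}
```

Calling `measureHostAlloc(1u << 20)` yields the three timings for a 1 MiB buffer. On systems that commit pages lazily, the touch phase is where most of the cost is expected to show up.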

What’s included

  • benchmark/alloc/CMakeLists.txt
  • benchmark/alloc/src/alloc_benchmark.cpp
  • benchmark/CMakeLists.txt: add_subdirectory(alloc/)

All new files carry SPDX headers.

How to build & run

cmake -S . -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
  -Dalpaka_BENCHMARKS=ON -Dalpaka_TESTING=OFF
cmake --build build -j
./build/benchmark/alloc/alloc

@s1rk0 s1rk0 marked this pull request as draft September 16, 2025 09:05
Member

@psychocoderHPC psychocoderHPC left a comment

Only small changes are required.

Comment thread benchmark/alloc/CMakeLists.txt Outdated
Comment thread benchmark/alloc/src/alloc_benchmark.cpp Outdated
Comment thread benchmark/alloc/src/alloc_benchmark.cpp Outdated
Comment thread benchmark/alloc/src/alloc_benchmark.cpp Outdated
s1rk0 added 8 commits October 3, 2025 21:51
* new file `benchmark/alloc/src/alloc_benchmark.cpp`
* new file `benchmark/alloc/CMakeLists.txt`
* updated root `benchmark/CMakeLists.txt`
  – `add_subdirectory(alloc/)`
…free

Switch benchmark to touch buffers after allocation:
- zero-initialize via queue memset
- read first byte to enforce observable use
This removes lazy-commit effects and makes results reflect
“allocate & start using” latency/bandwidth, not just allocator call overhead.
Discussed with Jiri on 2025-08-11.
Drop accidental duplicate 'uint8_t retval' line in measureAlloc().
- run clang-format to enforce style
@s1rk0 s1rk0 force-pushed the feature/alloc-benchmark branch from 599eb93 to 3783525 Compare October 6, 2025 16:56
Author

s1rk0 commented Oct 6, 2025

Rebased on latest upstream/dev. Addressed review: switched to add_executable + alpaka_finalize, removed host deref of device buffers, moved helpers to alloc_helpers.hpp, renamed templates to T_*. Added cold/hot handling (first run reported separately and excluded from stats).
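As an illustration of the cold/hot split described above, a minimal sketch follows. `RunStats` and `summarizeRuns` are hypothetical names, and the choice of mean/min/max as the reported statistics is an assumption; the PR may compute different ones.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary type for illustration; not part of the PR.
struct RunStats
{
    double coldNs = 0.0;      // first (cold) run, reported on its own
    double hotMeanNs = 0.0;   // mean over the remaining (hot) runs
    double hotMinNs = 0.0;    // fastest hot run
    double hotMaxNs = 0.0;    // slowest hot run
    std::size_t hotCount = 0; // number of hot runs that entered the stats
};

// Treats samplesNs[0] as the cold run and computes stats over the rest.
inline RunStats summarizeRuns(std::vector<double> const& samplesNs)
{
    RunStats s;
    if(samplesNs.empty())
        return s;

    s.coldNs = samplesNs.front();

    auto const first = samplesNs.begin() + 1; // skip the cold run
    auto const last = samplesNs.end();
    s.hotCount = static_cast<std::size_t>(last - first);
    if(s.hotCount == 0)
        return s; // only a cold run was recorded

    s.hotMeanNs = std::accumulate(first, last, 0.0) / static_cast<double>(s.hotCount);
    auto const minMax = std::minmax_element(first, last);
    s.hotMinNs = *minMax.first;
    s.hotMaxNs = *minMax.second;
    return s;
}
```

Keeping the cold sample out of the mean prevents one-time costs such as runtime initialization from skewing the steady-state numbers.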

@s1rk0 s1rk0 requested a review from psychocoderHPC October 6, 2025 17:00
@psychocoderHPC
Member

@s1rk0 I reformatted your PR following https://alpaka.readthedocs.io/en/latest/dev/style.html#pre-commit so that the style checker is happy too. I will review your code asap.

@s1rk0 s1rk0 marked this pull request as ready for review October 27, 2025 15:30