
[Perf] Optimize kernel launch overhead #250

Merged
hughperkins merged 3 commits into main from duburcqa/faster_launch_kernel on Oct 27, 2025

@duburcqa duburcqa commented Oct 26, 2025

Contributor

This brings a major simulation speed-up for Genesis, with both fields and ndarrays. It was benchmarked by running the following command on a CoreWeave compute node (NVIDIA H100):

py-spy record -r 85 -o benchmark.svg -- pytest -m "benchmarks" -n 0 \
    "tests/test_rigid_benchmarks.py::test_speed[False-30000-CG-batched_franka]"

| data type | BEFORE | AFTER | RATIO |
|-----------|--------|-------|-------|
| field     | 18.4M  | 22.0M | +20%  |
| ndarray   | 1.86M  | 3.3M  | +77%  |
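
As a quick sanity check on the RATIO column, the relative change can be recomputed from the BEFORE and AFTER figures. This is only an illustrative sketch: the helper name `relative_change` is hypothetical and not part of the PR, and the "M" suffix is dropped since the ratio does not depend on the unit.

```python
def relative_change(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the table above (the "M" suffix is dropped; it cancels out).
results = {"field": (18.4, 22.0), "ndarray": (1.86, 3.3)}

for name, (before, after) in results.items():
    print(f"{name}: {relative_change(before, after):+.0f}%")
# prints:
# field: +20%
# ndarray: +77%
```

Both values agree with the reported column after rounding to the nearest percent.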

[images: ndarrays, BEFORE and AFTER]

[images: fields, BEFORE and AFTER]

Comment thread python/gstaichi/lang/util.py
Comment thread python/gstaichi/lang/_template_mapper.py Outdated
@duburcqa duburcqa force-pushed the duburcqa/faster_launch_kernel branch 11 times, most recently from a72f96a to f29eea7 Compare October 27, 2025 13:18
Comment thread python/gstaichi/lang/_template_mapper.py Outdated
Comment thread python/gstaichi/lang/_template_mapper.py Outdated
Comment thread python/gstaichi/lang/_template_mapper.py Outdated
Comment thread python/gstaichi/lang/_template_mapper.py Outdated
Comment thread python/gstaichi/lang/_template_mapper.py Outdated
Comment thread python/gstaichi/lang/_template_mapper.py Outdated
@duburcqa duburcqa force-pushed the duburcqa/faster_launch_kernel branch from f29eea7 to e1f3f92 Compare October 27, 2025 13:40
@duburcqa duburcqa force-pushed the duburcqa/faster_launch_kernel branch from e4537c7 to a1a5bb3 Compare October 27, 2025 14:02

@hughperkins hughperkins left a comment

Collaborator

Great! Thank you 🙌

@duburcqa duburcqa force-pushed the duburcqa/faster_launch_kernel branch 3 times, most recently from a602a5f to 591f094 Compare October 27, 2025 17:05

@hughperkins hughperkins left a comment

Collaborator

See most recent comment.

Comment thread python/gstaichi/lang/_template_mapper_hotpath.py
@duburcqa duburcqa force-pushed the duburcqa/faster_launch_kernel branch from 591f094 to 2ae1b3e Compare October 27, 2025 18:45

@hughperkins hughperkins left a comment

Collaborator

Great! 🙌

@duburcqa duburcqa force-pushed the duburcqa/faster_launch_kernel branch from 2ae1b3e to d474ed6 Compare October 27, 2025 19:25
@hughperkins hughperkins merged commit f2f3d04 into main Oct 27, 2025
47 checks passed
@hughperkins hughperkins deleted the duburcqa/faster_launch_kernel branch October 27, 2025 22:39
@hughperkins hughperkins changed the title from "Optimize kernel launch overhead." to "[Perf] Optimize kernel launch overhead" Nov 7, 2025
3 participants