Epilogue subtiling: store indexing fix, example, and tuple output support in run_example #1907
Conversation
Force-pushed from 4fbdeac to bafff2a
@ethche I'm having trouble getting the autotuner to find the subtiling config. If I run it as is, I often see a slowdown. If I seed quick autotuning with some epilogue subtiling configs and compare against plain quick autotuning without epilogue subtiling, I see pretty substantial speedups. The speedups are probably a bit inflated here, since the default quick autotuning baseline lands on a bad config, but I consistently see 30%+ speedups with epilogue subtiling for these kernels.
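The seeding comparison described above can be illustrated with a toy random-search autotuner. This is a hypothetical sketch on a synthetic cost surface, not Helion's actual autotuner or API: `cost`, `quick_autotune`, and the config representation are all made up for illustration.

```python
# Toy illustration of seeding quick autotuning with known-good starting
# configs vs. a cold start. The cost model and search are synthetic and
# unrelated to Helion's real autotuner.
import random


def cost(config: int) -> float:
    # Synthetic cost surface with a narrow optimum at config == 7,
    # standing in for "the epilogue-subtiling config is hard to find".
    return abs(config - 7) + 1.0


def quick_autotune(seeds: list[int], budget: int = 5, rng_seed: int = 0) -> int:
    # Evaluate the seed configs plus a small random sample, keep the best.
    rng = random.Random(rng_seed)
    candidates = list(seeds) + [rng.randrange(100) for _ in range(budget)]
    return min(candidates, key=cost)


cold = quick_autotune(seeds=[])        # cold start: may land on a bad config
seeded = quick_autotune(seeds=[7, 8])  # seeded with known-good configs
print(cost(seeded) <= cost(cold))  # -> True
```

With a small search budget, the seeded run is never worse than the cold start, which mirrors the benchmarking setup described in the comment.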
Thank you for letting me know. I'll look into this. Two usual issues are:
Force-pushed from bafff2a to 70a7cb0
Why do we need an example file here? Shouldn't the autotuner find it automatically?
The examples are just to show users that epilogue subtiling is supported and to demonstrate cases where it can provide benefits.
helion/_compiler/device_ir.py (outdated)

```python
    EnumFragment(choices=env.config_spec.valid_indexing_types()),
    length=total_count,
)
env.config_spec.store_indexing_start = total_load_count
```
Are we assuming all loads come before all stores? Will this always be the case? Maybe we should split loads and stores in the config, or find a cleaner way to represent this?
You're right. I've added tracking for the indices of stores so that we're not assuming all loads come before all stores. This is a smaller change than separating loads and stores in the config.
I do think separate load and store lists would be a bit clearer to users, but that change is BC-breaking. Happy to take that route if you think it is better in the long term.
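The difference between the two approaches can be sketched as follows. This is a hypothetical illustration, not Helion's real internals: `ConfigSpec`, `add_load`, `add_store`, and `store_indices` are made-up names standing in for a single flat indexing list plus a record of which slots belong to stores.

```python
# Hypothetical sketch of tracking store indices within one shared
# load/store indexing list, instead of assuming all loads come first.
from dataclasses import dataclass, field


@dataclass
class ConfigSpec:
    # One flat list of indexing choices covering both loads and stores,
    # in program order (loads and stores may interleave).
    indexing: list[str] = field(default_factory=list)
    # Slots in `indexing` that belong to stores, recorded as each op is
    # visited, so no "all loads before all stores" assumption is needed.
    store_indices: list[int] = field(default_factory=list)

    def add_load(self) -> int:
        self.indexing.append("pointer")
        return len(self.indexing) - 1

    def add_store(self) -> int:
        idx = len(self.indexing)
        self.indexing.append("pointer")
        self.store_indices.append(idx)
        return idx


spec = ConfigSpec()
spec.add_load()   # load  -> slot 0
spec.add_store()  # store -> slot 1
spec.add_load()   # load  -> slot 2 (a load *after* a store)
spec.add_store()  # store -> slot 3
print(spec.store_indices)  # -> [1, 3]
```

A single `store_indexing_start` offset would mislabel slot 2 here, whereas the explicit index list stays correct when loads and stores interleave; splitting the config into separate load and store lists would also work but changes the config format.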
Force-pushed from 70a7cb0 to 37769ed
Force-pushed from 1902fa6 to 9325329
Force-pushed from 9325329 to eec168a
No description provided.