[ROCm] Optimize kgemm_4bit_inference_naive for ROCm, use it for batch sizes other than 1 #228
| Job | Run time |
|---|---|
| | 20s |
| | 13s |
| | 18s |
| | 18s |
| | 15s |
| | 22s |
| | 25s |
| | 18s |
| | 3m 4s |
| | 3m 10s |
| | 3m 51s |
| | 2m 59s |
| | 3m 47s |
| | 4m 5s |
| | 2m 39s |
| | 3m 45s |
| | 6m 2s |
| | 2m 37s |
| | 15m 33s |
| | 26m 50s |
| | 18m 22s |
| | 22m 18s |
| | 22m 48s |
| | 23m 46s |
| | 15m 14s |
| | 19m 6s |
| | 9m 25s |
| | 4m 34s |
| | 7m 56s |
| | 3m 49s |
| | 7m 30s |
| | 29m 41s |
| | 6m 42s |
| | 6m 25s |
| | 5m 47s |
| | 9m 39s |
| Total | 4h 53m 53s |
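The final value of 4h 53m 53s is the sum of the 36 per-job run times above. The following is a minimal Python sketch that checks this by parsing and summing the durations. It assumes the GitHub-style `Xh Ym Zs` duration format shown in the table. `to_seconds` and `format_duration` are hypothetical helpers written for this check and are not part of any CI tooling.

```python
import re

# Per-job run times copied from the table (job names were not captured).
RUN_TIMES = [
    "20s", "13s", "18s", "18s", "15s", "22s", "25s", "18s",
    "3m 4s", "3m 10s", "3m 51s", "2m 59s", "3m 47s",
    "4m 5s", "2m 39s", "3m 45s", "6m 2s", "2m 37s",
    "15m 33s", "26m 50s", "18m 22s", "22m 18s", "22m 48s",
    "23m 46s", "15m 14s", "19m 6s", "9m 25s",
    "4m 34s", "7m 56s", "3m 49s", "7m 30s", "29m 41s",
    "6m 42s", "6m 25s", "5m 47s", "9m 39s",
]

# Seconds per unit suffix used in the duration strings.
UNITS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '4h 53m 53s' to a number of seconds."""
    return sum(int(value) * UNITS[unit]
               for value, unit in re.findall(r"(\d+)([hms])", duration))


def format_duration(seconds: int) -> str:
    """Format seconds back into the 'Xh Ym Zs' style, omitting leading zero units."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


total = sum(to_seconds(t) for t in RUN_TIMES)
print(total, format_duration(total))  # prints: 17633 4h 53m 53s
```

Parsing with a regular expression over `(\d+)([hms])` pairs lets the same helper handle entries that have only seconds, minutes and seconds, or all three units.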