Let the LightGBM OpenCL module support the "quantized gradient" function #7154
Summary
It is hoped that the "quantized gradient" feature can be supported in the OpenCL-related code.
Motivation
I am using an RTX 4060 Ti 16 GB graphics card and trained a model with LightGBM 4.6 on a set of private data on Windows. Repeating the training with the same parameters yields different results each time. XGBoost, by contrast, quantizes FP32 gradients into integer (int64_t) accumulators by default; integer addition is order-independent, so it does not exhibit this non-determinism. LightGBM is an important library for me, so I hope it can provide a similar feature.
Description
The LightGBM project offers the "quantized gradient" feature in both the CPU and CUDA modules, but the OpenCL module does not provide it. Reviewing the relevant material shows that OpenCL 3.0 already offers primitives comparable to CUDA's, so a similar effect can be achieved to a reasonable extent.
On RTX-series graphics cards, FP64 throughput is only 1/64 of FP32, which makes an integer-based quantized-gradient path especially attractive.
It is hoped that the corresponding "quantized gradient" function can be implemented in the OpenCL module.
References
| CUDA primitive | OpenCL equivalent | Availability |
|---|---|---|
| `__shfl_down_sync(mask, val, offset)` | `sub_group_shuffle_down(val, offset)` | Requires `cl_khr_subgroup_shuffle_relative` |
| `atomicAdd(addr, val)` (int32) | `atom_add(addr, val)` | Core feature |
| `__syncthreads()` | `barrier(CLK_LOCAL_MEM_FENCE)` | Core feature |
| `__shared__` | `__local` | Core feature |
| `blockIdx.x` / `threadIdx.x` | `get_group_id(0)` / `get_local_id(0)` | Core feature |
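As a rough illustration (a hypothetical sketch, not LightGBM's actual kernel), the OpenCL primitives above are enough to express the core of a quantized-histogram kernel: pre-quantized integer gradients are accumulated with integer atomics into `__local` memory, then flushed to the global histogram after a barrier. All names below (`quantized_hist`, `bin_idx`, `quant_grad`) are made up for the example.

```c
/* Sketch only: hypothetical OpenCL C kernel accumulating pre-quantized
 * int gradients into a work-group-local histogram with integer atomics.
 * Integer atomic_add is order-independent, so results are reproducible. */
__kernel void quantized_hist(__global const uchar* bin_idx,     /* bin per sample   */
                             __global const int*   quant_grad,  /* quantized grads  */
                             __global int*         global_hist, /* 256-bin output   */
                             const int n) {
    __local int local_hist[256];              /* CUDA __shared__  -> __local       */
    int lid = get_local_id(0);                /* CUDA threadIdx.x -> get_local_id  */
    int gid = get_global_id(0);
    for (int b = lid; b < 256; b += get_local_size(0))
        local_hist[b] = 0;
    barrier(CLK_LOCAL_MEM_FENCE);             /* CUDA __syncthreads() -> barrier   */
    if (gid < n)
        atomic_add(&local_hist[bin_idx[gid]], quant_grad[gid]);
    barrier(CLK_LOCAL_MEM_FENCE);
    for (int b = lid; b < 256; b += get_local_size(0))
        atomic_add(&global_hist[b], local_hist[b]);
}
```

A sub-group reduction via `sub_group_shuffle_down` could further reduce atomic traffic where `cl_khr_subgroup_shuffle_relative` is available, analogous to CUDA's `__shfl_down_sync` warp reductions.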