MSVC v143 is required to build the package with the LLVM from oaitriton.blob.core.windows.net. However, a binary built by a newer MSVC may not work with an older vcredist on the user's computer (see https://learn.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-170#restrictions ; this is one cause of `ImportError: DLL load failed while importing libtriton`). So the user needs to install the latest vcredist.
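As a quick diagnostic, you can check whether the loader can find the VC++ runtime before importing Triton. This is a sketch of my own, not something the package does; probing `vcruntime140_1` is an assumption (that DLL ships with the newer x64 redistributables):

```python
import ctypes.util

def find_vc_runtime():
    """Probe for a VC++ runtime DLL that libtriton needs at import time.

    On Windows, a None result suggests the latest vcredist is not installed,
    which is one cause of 'ImportError: DLL load failed while importing
    libtriton'. On non-Windows systems this always returns None.
    """
    return ctypes.util.find_library("vcruntime140_1")
```

If this returns `None` on Windows, install the latest vcredist before debugging further.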
Set the binary, include, and library paths of Python, MSVC, Windows SDK, and CUDA in PowerShell (help wanted to automatically find these in CMake):

```pwsh
$Env:Path =
"C:\Windows\System32;" +
"C:\Python312;" +
"C:\Python312\Scripts;" +
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin;" +
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\bin\Hostx64\x64;" +
"C:\Program Files (x86)\Windows Kits\10\bin\10.0.26100.0\x64;" +
"C:\Program Files\Git\cmd"
$Env:INCLUDE =
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\include;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.26100.0\shared;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.26100.0\ucrt;" +
"C:\Program Files (x86)\Windows Kits\10\Include\10.0.26100.0\um;" +
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\include;" +
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\extras\CUPTI\include"
$Env:LIB =
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\lib\x64;" +
"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\ucrt\x64;" +
"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\um\x64"
```

- CUDA toolkit is only required when building LLVM in the offline build (`TRITON_OFFLINE_BUILD=1`)
- Git is only required when building C++ unit tests (`TRITON_BUILD_UT=1`)
- cibuildwheel requires the binaries in `C:\Windows\System32\`
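The versioned directories above (MSVC 14.43.34808, SDK 10.0.26100.0) change with every toolchain update. As a sketch of how they could be discovered automatically (a helper of my own for illustration, not part of the build), you can glob for the newest versioned subdirectory:

```python
import glob

def newest_subdir(pattern):
    """Return the lexicographically newest path matching the glob, or None.

    Note: lexicographic order only approximates version order (it would
    mis-sort e.g. 14.9 vs 14.10), so this is a rough heuristic.
    """
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None

# Hypothetical discovery of the versioned directories hard-coded above;
# both return None on machines without these tools installed.
msvc = newest_subdir(
    r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\*"
)
sdk_include = newest_subdir(r"C:\Program Files (x86)\Windows Kits\10\Include\10.0.*")
```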
Then you can either download some dependencies online, or set up an offline build. (When switching between online and offline builds, remember to delete `CMakeCache.txt`.)
### Download dependencies online
`setup.py` will download LLVM and JSON into the cache folder set by `TRITON_HOME` (by default `C:\Users\<your username>\.triton\`) and link against them. The LLVM is built by https://github.com/triton-lang/triton/blob/main/.github/workflows/llvm-build.yml
A minimal CUDA toolchain (`ptxas.exe`, `cuda.h`, `cuda.lib`) and TinyCC will be downloaded and bundled in the wheel.
If you're in China, make sure to have a good Internet connection.
(For Triton <= 3.1, the pre-built LLVM is not provided. You still need to build LLVM and set `LLVM_SYSPATH`. Other dependencies can be automatically downloaded.)
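The cache location described above can be sketched as follows (my guess at the lookup order, with `TRITON_HOME` taking precedence over the per-user default):

```python
import os

def triton_cache_dir():
    """Cache folder for downloaded LLVM/JSON: TRITON_HOME if set, else ~/.triton."""
    return os.environ.get("TRITON_HOME") or os.path.join(
        os.path.expanduser("~"), ".triton"
    )
```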
### Offline build
Enable offline build:

```pwsh
$Env:TRITON_OFFLINE_BUILD = "1"
```

Build LLVM using MSVC according to the instructions of the official Triton:
```pwsh
# Check out the commit according to cmake/llvm-hash.txt (Sadly, you need to rebuild LLVM every week if you want to keep up to date)
cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="mlir;llvm" -DLLVM_TARGETS_TO_BUILD="host;NVPTX;AMDGPU" -DLLVM_BUILD_TOOLS=OFF -DLLVM_CCACHE_BUILD=ON -DLLVM_ENABLE_DIA_SDK=OFF llvm
cmake --build build -j 8 --config Release
```

- See https://github.com/triton-lang/triton?tab=readme-ov-file#building-with-a-custom-llvm and https://github.com/triton-lang/triton/blob/main/.github/workflows/llvm-build.yml
- When cloning LLVM, use `git clone --filter=blob:none https://github.com/llvm/llvm-project.git`. You don't want to clone the whole history, as it's too large
- The official Triton enables `-DLLVM_ENABLE_ASSERTIONS=ON` when compiling LLVM, which increases the binary size of Triton
- You may need to add the following compiler options to make MSVC happy, see https://reviews.llvm.org/D90116 and llvm/llvm-project#65255:
```diff
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index c06e661573ed..80b31843f45d 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -821,6 +821,8 @@ if(MSVC)
   if (BUILD_SHARED_LIBS)
     message(FATAL_ERROR "BUILD_SHARED_LIBS options is not supported on Windows.")
   endif()
+  add_compile_options("/utf-8")
+  add_compile_options("/D_SILENCE_NONFLOATING_COMPLEX_DEPRECATION_WARNING")
 else()
   option(LLVM_LINK_LLVM_DYLIB "Link tools against the libllvm dynamic library" OFF)
   option(LLVM_BUILD_LLVM_C_DYLIB "Build libllvm-c re-export library (Darwin only)" OFF)
```

Download JSON according to `setup.py`:
Set their paths:

```pwsh
$Env:LLVM_SYSPATH = "C:\llvm-project\build"
$Env:JSON_SYSPATH = "C:\json"
```

(For Triton <= 3.1, you also need to download pybind11 and set `PYBIND11_SYSPATH` according to `setup.py`.)
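Before building, it may help to verify these directories actually exist; here is a small pre-flight helper of my own (not part of `setup.py`):

```python
import os

def missing_syspaths(env=os.environ):
    """Return the offline-build path variables that don't point to a directory."""
    names = ("LLVM_SYSPATH", "JSON_SYSPATH")
    return [n for n in names if not os.path.isdir(env.get(n, ""))]
```

An empty result means both paths are set and exist; anything else names the variables to fix before running the build.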
The CUDA toolchain and TinyCC are not bundled by default in the offline build.
You can disable these if you don't need them. (`TRITON_BUILD_BINARY` is added in my fork; it can be enabled only if `TRITON_BUILD_UT` is enabled.)

```pwsh
$Env:TRITON_BUILD_BINARY = "0"
$Env:TRITON_BUILD_PROTON = "0"
$Env:TRITON_BUILD_UT = "0"
```

I recommend using ccache if you have it installed:

```pwsh
$Env:TRITON_BUILD_WITH_CCACHE = "1"
```

Clone this repo, check out the release/3.6.x-windows branch, and make an editable build using pip:
```pwsh
pip install --no-build-isolation --verbose -e .
# Or `pip install --no-build-isolation --verbose -e python` for Triton <= 3.3
```

Build the wheels: (This is for distributing the wheels to others. You don't need this if you only use Triton on your own computer.)
```pwsh
git clean -dfX
$Env:CIBW_BUILD = "{cp310-win_amd64,cp311-win_amd64,cp312-win_amd64,cp313-win_amd64,cp314-win_amd64}"
$Env:CIBW_BUILD_VERBOSITY = "1"
$Env:TRITON_WHEEL_VERSION_SUFFIX = "+windows"
cibuildwheel .
# Or `cibuildwheel python` for Triton <= 3.3
```

If you see errors about defining `llvmGetPassPluginInfo` when building `lib/Instrumentation/PrintLoadStoreMemSpaces.cpp`, replace `LLVM_ATTRIBUTE_WEAK` with `__declspec(dllexport)` in `include/llvm/Passes/PassPlugin.h` of your LLVM, see llvm/llvm-project#115431
GPU is not required to build the package, but is required to run the unit tests.
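A rough way to check whether the current machine can run the GPU unit tests is to look for the driver's `nvidia-smi` tool. This is a heuristic of my own, not what Triton itself does:

```python
import shutil

def can_run_gpu_tests():
    """Heuristic: the NVIDIA driver ships nvidia-smi; if it's absent from
    PATH, the GPU unit tests most likely cannot run on this machine."""
    return shutil.which("nvidia-smi") is not None
```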
- Disable Windows Defender. This greatly reduces the time to run everything. See https://github.com/ionuttbara/windows-defender-remover
- Enable Developer Mode of Windows. This allows the runner to create symlinks
- Install environments:
- Nvidia driver (if the machine has a GPU; no need to install the CUDA toolkit)
- Visual Studio Build Tools (MSVC, Windows SDK)
- Python (disable path length limit when installing)
- Git
- Install the runner: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners
- Create a tag for the runner, and change the value of `runs-on:` in the workflow yml to this tag
- Start the runner service after setting PATH of Python and Git
Then build the wheel and run the unit tests using https://github.com/woct0rdho/triton-windows/blob/readme/.github/workflows/build-and-test-triton.yml
- To implement `dlopen`:
  - For building the package, dlfcn-win32 is added to `thirdparty/` and linked in CMake, so I don't need to rewrite it every time
  - For JIT compilation, in `third_party/nvidia/backend/driver.c` and `driver.py` it's rewritten with `LoadLibrary`
- `python/triton/windows_utils.py` contains many ways to find the paths of Python, MSVC, Windows SDK, and CUDA
- In `lib/Analysis/Utility.cpp` and `lib/Dialect/TritonGPU/Transforms/Utility.cpp`, explicit namespaces are added to support the resolution behaviors of MSVC (This is no longer needed since Triton 3.3)
- In `python/src/interpreter.cc`, the GCC built-in `__ATOMIC` memory orders are replaced with `std::memory_order` (Upstreamed, see triton-lang#4976)
- In `third_party/nvidia/backend/driver.py`, function `make_launcher`, `int64_t` should map to `L` in `PyArg_ParseTuple`. This fixes the error `Python int too large to convert to C long` (Upstreamed, see triton-lang#5351)
- How TorchInductor is designed to support Windows: pytorch/pytorch#124245
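The `PyArg_ParseTuple` fix above comes down to C integer widths: on 64-bit Windows, `long` is 32 bits while `long long` is 64 bits, so the `l` format unit truncates `int64_t` values while `L` does not. This can be checked from Python:

```python
import ctypes

# 'l' in PyArg_ParseTuple parses a C long; 'L' parses a C long long.
# On 64-bit Windows sizeof(long) == 4, so int64_t values passed via 'l'
# raise "Python int too large to convert to C long"; on 64-bit Linux
# sizeof(long) == 8, which is why the bug only shows up on Windows.
print("sizeof(long) =", ctypes.sizeof(ctypes.c_long))
print("sizeof(long long) =", ctypes.sizeof(ctypes.c_longlong))
```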