A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
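The core trick behind any Flash Attention implementation is the online softmax: scores are processed in blocks while running statistics keep the result exact, so the full score matrix is never materialized. A minimal single-query sketch in JAX (illustrative only, not this repo's API):

```python
import jax.numpy as jnp

def flash_attn_row(q, k, v, block=128):
    # FlashAttention-2-style online softmax for one query row: keep a
    # running max m, normalizer l, and unnormalized output o, rescaling
    # the accumulators whenever a new block raises the max.
    d = q.shape[-1]
    m, l, o = -jnp.inf, 0.0, jnp.zeros(d)
    for i in range(0, k.shape[0], block):
        s = (k[i:i + block] @ q) / jnp.sqrt(d)   # scores for this K block
        m_new = jnp.maximum(m, s.max())
        scale = jnp.exp(m - m_new)               # rescale old accumulators
        p = jnp.exp(s - m_new)
        l = l * scale + p.sum()
        o = o * scale + p @ v[i:i + block]
        m = m_new
    return o / l
```

For small inputs this matches the reference `jax.nn.softmax((k @ q) / jnp.sqrt(d)) @ v`; production kernels additionally tile the query dimension and run the loop inside a Triton or Pallas kernel.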
A FlashAttention backward-over-backward pass ⚡🔙🔙
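Supporting backward-over-backward means the attention VJP must itself be differentiable, e.g. for gradient-penalty losses or influence functions. In JAX the pattern is simply nested `jax.grad`; a sketch against a reference attention (not this repo's kernel):

```python
import jax
import jax.numpy as jnp

def attn(q, k, v):
    # Reference softmax attention; a FlashAttention kernel computes the
    # same function blockwise, so its custom VJP must itself be
    # differentiable for the nested grad below to work.
    s = (q @ k.T) / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(s, axis=-1) @ v

k = v = jnp.ones((8, 16))
q0 = jnp.ones((8, 16))
loss = lambda q: jnp.sum(attn(q, k, v) ** 2)

g  = jax.grad(loss)(q0)                            # first backward pass
# Backward-over-backward: differentiate a scalar function of the gradient.
gg = jax.grad(lambda q: jnp.sum(jax.grad(loss)(q) ** 2))(q0)
```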
dLLM training implementation in pure JAX/Flax (no PyTorch) for Google TPUs (v4/v5e/v6e). #TPUSprint #TRC
Compute ZK-friendly hashes (MiMC & Poseidon) of arbitrary inputs over a variety of elliptic curves.
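For reference, MiMC is just a cubing round function iterated over a prime field. A toy Python sketch with hypothetical parameters (real ZK deployments use a ~255-bit curve field and round constants derived from a seed):

```python
# Toy MiMC sketch. P is chosen so gcd(3, P - 1) = 1, which makes
# x -> x^3 a permutation of the field; ~log3(P) rounds per the MiMC paper.
P = 10**9 + 7
ROUNDS = 19                                       # ceil(log3(P))
CONSTS = [pow(7, i, P) for i in range(ROUNDS)]    # hypothetical round constants

def mimc(x: int, key: int = 0) -> int:
    """One MiMC pass: x <- (x + key + c_i)^3 mod P each round, then add key."""
    for c in CONSTS:
        x = pow((x + key + c) % P, 3, P)
    return (x + key) % P
```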
PDF files for alchi: https://github.com/milahu/alchi
Variance-stable routing for 2-bit quantized MoE models. Features dynamic phase correction (Armen Guard), a syntactic stabilization layer, and recursive residual quantization for efficient inference.
Benchmarking the JAX Pallas implementation of a custom RNN against alternatives
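Benchmarking JAX code needs care because dispatch is asynchronous and the first call pays compilation cost. A typical harness looks like this (toy cell shown, not the repo's RNN):

```python
import time
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
W = jax.random.normal(key, (512, 512))

@jax.jit
def cell(h):
    # Hypothetical stand-in for the custom RNN cell under test.
    return jnp.tanh(h @ W)

h = jnp.ones((64, 512))
cell(h).block_until_ready()        # warm-up: exclude compile time
t0 = time.perf_counter()
for _ in range(100):
    h = cell(h)
h.block_until_ready()              # wait for async dispatch to finish
print(f"{(time.perf_counter() - t0) / 100 * 1e3:.3f} ms/step")
```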
Flash Attention from first principles on TPU using JAX Pallas.
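For readers new to Pallas, the programming model such a tutorial builds on is a kernel that reads and writes `Ref`s, launched through `pl.pallas_call`. A minimal sketch (not the tutorial's attention kernel):

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Kernels receive Refs; reads and writes use array-style indexing.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    # (Pass interpret=True to pallas_call to run off-TPU for debugging.)
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

print(add(jnp.arange(8.0), jnp.arange(8.0)))   # [0. 2. 4. ...]
```

A real flash-attention kernel adds a grid over query/key blocks and carries the online-softmax statistics between grid steps, but the calling convention is the same.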
术 (Shu) — The first GPU-accelerated MSM for the Pallas curve. Part of the HanFei 韩非 series.
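For context (the standard definition, not specific to this repo): given scalars $s_i$ and curve points $P_i$, MSM computes

```latex
\mathrm{MSM}(\mathbf{s}, \mathbf{P}) = \sum_{i=1}^{n} [s_i]\, P_i
```

It dominates proof-generation time, and GPU implementations typically use Pippenger-style bucketing so group additions are amortized across windows of scalar bits.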
Repository holding core components for building a Pallas Systems website.
Lean 4 formalization of the Pasta curves (Pallas and Vesta) for Zcash's Halo 2 — primality proofs and IsElliptic instances
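A minimal Lean 4 sketch of the objects being formalized, assuming the standard Pasta parameters from Zcash's pasta_curves (hypothetical definition names, not the repo's):

```lean
-- The two Pasta primes. Pallas is y² = x³ + 5 over 𝔽_p; Vesta uses the
-- same equation over 𝔽_q, and each curve's scalar field is the other's
-- base field (the "cycle" that Halo 2 exploits).
def pallasP : Nat :=
  0x40000000000000000000000000000000224698fc094cf91b992d30ed00000001

def vestaQ : Nat :=
  0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001
```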
SuperNova (Pasta) proof generator & verifier with CI and frozen fixtures