Pure Elixir parser for the safetensors file format.
This library reads .safetensors files and loads tensors directly into Nx tensors. It's designed for loading ML model weights in Elixir applications.
The safetensors format is the standard for storing ML model weights. It's:
- Safe: No arbitrary code execution (unlike pickle)
- Fast: Memory-mapped access, zero-copy when possible
- Simple: JSON header + raw tensor data
To run ML models in Elixir, we need to load these weights. This library provides that capability without requiring Python.
```
┌─────────────────────────────────────────────────────────────┐
│                   Model Loading Pipeline                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │ .safetensors │ -> │ Safetensors  │ -> │  Nx Tensors  │   │
│  │     file     │    │   (parser)   │    │    (EMLX)    │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                             │
│  File Format:                                               │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ 8 bytes │ N bytes (JSON)      │ tensor data...         │ │
│  │ header  │ {"tensor_name":     │ [raw bytes]            │ │
│  │ size    │   {dtype, shape,    │                        │ │
│  │         │    offsets}, ...}   │                        │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
Add to your mix.exs:

```elixir
def deps do
  [
    {:safetensors, "~> 0.1"}
  ]
end
```

Or from GitHub:

```elixir
def deps do
  [
    {:safetensors, github: "notactuallytreyanastasio/safetensors_ex"}
  ]
end
```

```elixir
# Read all tensors from a file
{:ok, tensors} = Safetensors.read("model.safetensors")

# tensors is a map: %{"tensor_name" => %Nx.Tensor{}}
weight = tensors["model.embed_tokens.weight"]

# Only load specific tensors (more memory efficient)
{:ok, tensors} = Safetensors.read("model.safetensors",
  only: ["model.embed_tokens.weight", "lm_head.weight"]
)

# Get tensor info without loading data
{:ok, metadata} = Safetensors.metadata("model.safetensors")
# metadata: %{
#   "tensor_name" => %{
#     dtype: :f16,
#     shape: [32000, 4096],
#     data_offsets: [0, 262144000]
#   },
#   ...
# }
```

For models that don't fit in memory, stream tensors:
```elixir
Safetensors.stream("model.safetensors", fn name, tensor ->
  # Process each tensor as it's loaded
  process_tensor(name, tensor)
end)
```

| Safetensors dtype | Nx type | Notes |
|---|---|---|
| F32 | :f32 | 32-bit float |
| F16 | :f16 | 16-bit float |
| BF16 | :bf16 | Brain float 16 |
| I64 | :s64 | Signed 64-bit int |
| I32 | :s32 | Signed 32-bit int |
| I16 | :s16 | Signed 16-bit int |
| I8 | :s8 | Signed 8-bit int |
| U8 | :u8 | Unsigned 8-bit int |
| U32 | :u32 | Unsigned 32-bit int (for quantized weights) |
| BOOL | :u8 | Boolean as u8 |
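The table above is a plain lookup. A minimal sketch of it — module and function names here are illustrative, not the library's API:

```elixir
defmodule DtypeSketch do
  # Illustrative lookup mirroring the dtype table above.
  @mapping %{
    "F32" => :f32, "F16" => :f16, "BF16" => :bf16,
    "I64" => :s64, "I32" => :s32, "I16" => :s16, "I8" => :s8,
    "U8" => :u8, "U32" => :u32, "BOOL" => :u8
  }

  # Returns {:ok, nx_type} or {:error, {:unsupported_dtype, dtype}},
  # matching the error shape shown in the error-handling example below.
  def to_nx_type(dtype) do
    case Map.fetch(@mapping, dtype) do
      {:ok, type} -> {:ok, type}
      :error -> {:error, {:unsupported_dtype, dtype}}
    end
  end
end
```

For example, `DtypeSketch.to_nx_type("BF16")` returns `{:ok, :bf16}`, while an unknown name like `"F64"` produces the tagged error tuple.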
Tensors are created with the current Nx default backend:

```elixir
# Load directly to GPU
Nx.default_backend({EMLX.Backend, device: :gpu})
{:ok, tensors} = Safetensors.read("model.safetensors")
# All tensors now on GPU
```

For 4-bit quantized models (like Qwen3-8B-4bit), weights are stored as `:u32` with packed int4 values:

```elixir
{:ok, tensors} = Safetensors.read("model.safetensors")

# Quantized weights come in triplets
w = tensors["model.layers.0.self_attn.q_proj.weight"]       # u32, packed int4
scales = tensors["model.layers.0.self_attn.q_proj.scales"]  # f16 or bf16
biases = tensors["model.layers.0.self_attn.q_proj.biases"]  # f16 or bf16

# Use with EMLX.quantized_matmul
output = EMLX.quantized_matmul(input, w, scales, biases, true, 64, 4)
```

The safetensors format is simple:
- Header size (8 bytes, little-endian u64): Size of JSON header
- Header (N bytes, UTF-8 JSON): Tensor metadata
- Data (remaining bytes): Raw tensor data, concatenated

Header structure:

```json
{
  "__metadata__": {"format": "pt"},
  "tensor_name": {
    "dtype": "F16",
    "shape": [4096, 4096],
    "data_offsets": [0, 33554432]
  }
}
```

Data offsets are relative to the start of the data section.
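The three sections can be split apart with plain binary pattern matching. A minimal sketch, not the library's actual implementation — `FormatSketch` is a name invented here, and JSON decoding is left out:

```elixir
defmodule FormatSketch do
  # Split a .safetensors binary into {raw_json_header, data_section}.
  def parse(<<header_size::little-unsigned-integer-size(64), rest::binary>>) do
    <<header_json::binary-size(header_size), data::binary>> = rest
    {header_json, data}
  end

  # data_offsets are relative to the data section, so a tensor's bytes
  # are simply a slice of it.
  def tensor_bytes(data, [begin_off, end_off]) do
    binary_part(data, begin_off, end_off - begin_off)
  end
end
```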
```elixir
case Safetensors.read("model.safetensors") do
  {:ok, tensors} ->
    # Use tensors
    tensors

  {:error, :file_not_found} ->
    # File doesn't exist
    {:error, :file_not_found}

  {:error, :invalid_header} ->
    # Corrupted or invalid file
    {:error, :invalid_header}

  {:error, {:unsupported_dtype, dtype}} ->
    # Unknown data type
    {:error, {:unsupported_dtype, dtype}}
end
```
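If you'd rather raise than match on every error, a bang-style wrapper is easy to build on top. `read!/2` below is a hypothetical helper, not part of this library; the read function is injectable only so the sketch is self-contained:

```elixir
defmodule MyApp.Weights do
  # Hypothetical wrapper: raise on any error instead of returning a
  # tagged tuple. Defaults to the library's Safetensors.read/1.
  def read!(path, read_fun \\ &Safetensors.read/1) do
    case read_fun.(path) do
      {:ok, tensors} -> tensors
      {:error, reason} -> raise "failed to read #{path}: #{inspect(reason)}"
    end
  end
end
```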
- Memory-mapped: Large files are memory-mapped for efficiency
- Lazy loading: Tensors loaded on-demand when streaming
- Zero-copy: When possible, data is used directly without copying
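On-demand reads can be done with Erlang's stdlib file primitives. A sketch of the idea — this is an assumption about the approach, not the library's actual internals, and `LazyRead` is a name invented here:

```elixir
defmodule LazyRead do
  # Read a single tensor's bytes without loading the whole file.
  # data_offsets are relative to the data section, as described above.
  def read_tensor_bytes(path, [begin_off, end_off]) do
    {:ok, fd} = :file.open(path, [:read, :raw, :binary])

    # 8-byte little-endian header-size prefix, then the JSON header
    {:ok, <<header_size::little-unsigned-integer-size(64)>>} = :file.pread(fd, 0, 8)
    data_start = 8 + header_size

    {:ok, bytes} = :file.pread(fd, data_start + begin_off, end_off - begin_off)
    :ok = :file.close(fd)
    bytes
  end
end
```

Opening the file in `:raw` mode and using `:file.pread/3` reads only the requested byte range, which is what keeps streaming memory-bounded.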
This is used by:
- bobby_posts: Loads Qwen3-8B-4bit weights
- bumblebee: Model weight loading (Bumblebee has its own loader, but this works standalone)
License: MIT