notactuallytreyanastasio/safetensors_ex
Safetensors

Pure Elixir parser for the safetensors file format.

This library reads .safetensors files and loads their contents directly into Nx tensors. It's designed for loading ML model weights in Elixir applications.

Why This Exists

The safetensors format is the standard for storing ML model weights. It's:

  • Safe: No arbitrary code execution (unlike pickle)
  • Fast: Memory-mapped access, zero-copy when possible
  • Simple: JSON header + raw tensor data

To run ML models in Elixir, we need to load these weights. This library provides that capability without requiring Python.

The Big Picture

┌─────────────────────────────────────────────────────────────┐
│                  Model Loading Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │ .safetensors │ -> │  Safetensors │ -> │  Nx Tensors  │   │
│  │    file      │    │   (parser)   │    │   (EMLX)     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                              │
│  File Format:                                                │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ 8 bytes  │  N bytes (JSON)  │  tensor data...          │ │
│  │ header   │  {"tensor_name": │  [raw bytes]             │ │
│  │  size    │   {dtype, shape, │                          │ │
│  │          │    offsets}, ...}│                          │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Installation

Add to your mix.exs:

def deps do
  [
    {:safetensors, "~> 0.1"}
  ]
end

Or from GitHub:

def deps do
  [
    {:safetensors, github: "notactuallytreyanastasio/safetensors_ex"}
  ]
end

Usage

Reading a File

# Read all tensors from a file
{:ok, tensors} = Safetensors.read("model.safetensors")

# tensors is a map: %{"tensor_name" => %Nx.Tensor{}}
weight = tensors["model.embed_tokens.weight"]

Reading Specific Tensors

# Only load specific tensors (more memory efficient)
{:ok, tensors} = Safetensors.read("model.safetensors",
  only: ["model.embed_tokens.weight", "lm_head.weight"]
)

Inspecting Metadata

# Get tensor info without loading data
{:ok, metadata} = Safetensors.metadata("model.safetensors")

# metadata: %{
#   "tensor_name" => %{
#     dtype: :f16,
#     shape: [32000, 4096],
#     data_offsets: [0, 262144000]
#   },
#   ...
# }

Streaming Large Models

# For models that don't fit in memory, stream tensors
Safetensors.stream("model.safetensors", fn name, tensor ->
  # Process each tensor as it's loaded
  process_tensor(name, tensor)
end)

Supported Data Types

| Safetensors dtype | Nx type | Notes |
| --- | --- | --- |
| F32 | :f32 | 32-bit float |
| F16 | :f16 | 16-bit float |
| BF16 | :bf16 | Brain float 16 |
| I64 | :s64 | Signed 64-bit int |
| I32 | :s32 | Signed 32-bit int |
| I16 | :s16 | Signed 16-bit int |
| I8 | :s8 | Signed 8-bit int |
| U8 | :u8 | Unsigned 8-bit int |
| U32 | :u32 | Unsigned 32-bit int (for quantized weights) |
| BOOL | :u8 | Boolean as u8 |
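The mapping in the table can be sketched as a plain lookup; `DtypeSketch` and `to_nx!/1` are hypothetical names for illustration, not this library's API:

```elixir
defmodule DtypeSketch do
  # Mirrors the table above: safetensors dtype string -> Nx type atom.
  @mapping %{
    "F32" => :f32,
    "F16" => :f16,
    "BF16" => :bf16,
    "I64" => :s64,
    "I32" => :s32,
    "I16" => :s16,
    "I8" => :s8,
    "U8" => :u8,
    "U32" => :u32,
    # BOOL is widened to one unsigned byte per element
    "BOOL" => :u8
  }

  def to_nx!(dtype) do
    case Map.fetch(@mapping, dtype) do
      {:ok, type} -> type
      :error -> raise ArgumentError, "unsupported dtype: #{dtype}"
    end
  end
end

IO.inspect(DtypeSketch.to_nx!("BF16"))
# :bf16
```

An unknown dtype raising (rather than being silently skipped) matches the `{:error, {:unsupported_dtype, dtype}}` return shown under Error Handling below.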

Backend Support

Tensors are created with the current Nx default backend:

# Load directly to GPU
Nx.default_backend({EMLX.Backend, device: :gpu})
{:ok, tensors} = Safetensors.read("model.safetensors")
# All tensors now on GPU

Quantized Models

For 4-bit quantized models (like Qwen3-8B-4bit), weights are stored as :u32 with packed int4 values:

{:ok, tensors} = Safetensors.read("model.safetensors")

# Quantized weights come in triplets
w = tensors["model.layers.0.self_attn.q_proj.weight"]      # u32, packed int4
scales = tensors["model.layers.0.self_attn.q_proj.scales"]  # f16 or bf16
biases = tensors["model.layers.0.self_attn.q_proj.biases"]  # f16 or bf16

# Use with EMLX.quantized_matmul
output = EMLX.quantized_matmul(input, w, scales, biases, true, 64, 4)
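Each u32 word holds eight packed 4-bit values. As a rough sketch of the unpacking (assuming low-nibble-first packing, which is an assumption here, not something this library exposes), using only the standard library:

```elixir
import Bitwise

# Hypothetical helper: split one u32 word into its eight 4-bit values,
# lowest nibble first (assumed packing order).
unpack_u32 = fn word ->
  for i <- 0..7, do: (word >>> (4 * i)) &&& 0xF
end

IO.inspect(unpack_u32.(0x87654321))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

In practice you would not unpack manually: `EMLX.quantized_matmul/7` consumes the packed weights together with their scales and biases, with the trailing arguments (`64`, `4`) giving the group size and bit width.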

File Format Details

The safetensors format is simple:

  1. Header size (8 bytes, little-endian u64): Size of JSON header
  2. Header (N bytes, UTF-8 JSON): Tensor metadata
  3. Data (remaining bytes): Raw tensor data, concatenated

Header structure:

{
  "__metadata__": {"format": "pt"},
  "tensor_name": {
    "dtype": "F16",
    "shape": [4096, 4096],
    "data_offsets": [0, 33554432]
  }
}

Data offsets are relative to the start of the data section.
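The three-part layout above can be demonstrated with plain Elixir binaries, no library required; the tensor name "t" and its contents are made up for illustration:

```elixir
# Build a minimal in-memory safetensors file: header size (little-endian
# u64), JSON header, then the raw tensor data.
json = ~s({"t":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}})
data = <<1.0::float-32-little, 2.0::float-32-little>>
file = <<byte_size(json)::unsigned-little-64, json::binary, data::binary>>

# Parse it back: read the 8-byte size prefix, slice out the JSON header,
# and whatever remains is the data section.
<<header_size::unsigned-little-64, rest::binary>> = file
<<header_json::binary-size(header_size), data_section::binary>> = rest

IO.puts(header_json)
IO.inspect(byte_size(data_section))
# 8
```

The `data_offsets` pair `[0, 8]` then indexes into `data_section`, since offsets are relative to the start of the data section rather than the file.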

Error Handling

case Safetensors.read("model.safetensors") do
  {:ok, tensors} ->
    # Use tensors
  {:error, :file_not_found} ->
    # File doesn't exist
  {:error, :invalid_header} ->
    # Corrupted or invalid file
  {:error, {:unsupported_dtype, dtype}} ->
    # Unknown data type
end

Performance

  • Memory-mapped: Large files are memory-mapped for efficiency
  • Lazy loading: Tensors loaded on-demand when streaming
  • Zero-copy: When possible, data is used directly without copying

Relationship to Other Projects

This is used by:

  • bobby_posts: Loads Qwen3-8B-4bit weights
  • bumblebee: Model weight loading (Bumblebee ships its own loader; this library works standalone)

License

MIT
