This repository contains a 1-day Proof of Concept (PoC) exploitation chain for CVE-2026-7482, an unauthenticated Out-of-Bounds (OOB) Read vulnerability in Ollama's GGUF model loader (versions prior to 0.17.1).
Note: This is a 1-day research reproduction. I did not discover the original CVE. This PoC was engineered based on the public advisory details to demonstrate the mechanics of the vulnerability for educational and defensive research purposes.
By supplying a maliciously crafted, truncated GGUF file to the /api/create endpoint, an attacker can force the quantization parser in fs/ggml/gguf.go and server/quantization.go to read past the allocated heap buffer. The leaked memory is then exfiltrated by pushing the resulting model artifact to an attacker-controlled Docker registry via the /api/push endpoint.
During this 1-day research, reproducing the crash was trivial. Achieving stable exfiltration without crashing the server or tripping API validation required forging the payload in four specific ways:
- Frontend Validation Bypass: The payload must be tagged as `F16` (`general.file_type = 1`) to satisfy the Ollama API's strict pre-flight checks.
- Quantizer Coercion: We request a `Q4_K_M` down-quantization. Because the payload is seen as F16, the C++ `ggml` backend is forced to process the payload rather than performing a safe, 1:1 memory copy.
- Perfect Block Alignment: The target tensor (`token_embd.weight`) must be shaped as a 2D matrix where the innermost dimension is exactly `256` (e.g., `[num_rows, 256]`). This strictly aligns with `Q4_K_M` block requirements, preventing the backend from skipping the layer.
- Physical Truncation: The physical file is truncated to 32 bytes. When the matrix multiplication loop runs, it hits EOF and over-reads directly into the adjacent heap space.
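From the defensive side, the mismatch described in the list above (a header that declares a large F16 tensor while the file holds only 32 bytes) is the kind of condition a loader can reject before reading any tensor data. The following is a minimal sketch of such a bounds check, not Ollama's actual fix. The function names, the `data_offset` parameter, and the example shape `[4096, 256]` are hypothetical and chosen only for illustration.

```python
from math import prod

F16_BYTES_PER_ELEMENT = 2  # a float16 value is two bytes wide


def declared_tensor_bytes(shape, bytes_per_element=F16_BYTES_PER_ELEMENT):
    """Return how many bytes an unquantized tensor of this shape claims to occupy."""
    if any(dim <= 0 for dim in shape):
        raise ValueError(f"non-positive dimension in shape {shape!r}")
    return prod(shape) * bytes_per_element


def tensor_fits_in_file(file_size, data_offset, shape,
                        bytes_per_element=F16_BYTES_PER_ELEMENT):
    """Hypothetical pre-flight check: does the declared tensor lie inside the file?

    file_size   -- size of the file on disk, in bytes
    data_offset -- byte offset at which the tensor data is declared to start
    shape       -- declared tensor dimensions, e.g. [num_rows, 256]
    """
    if data_offset < 0 or data_offset > file_size:
        return False
    available = file_size - data_offset
    return declared_tensor_bytes(shape, bytes_per_element) <= available


# A header declaring a [4096, 256] F16 tensor claims 2 MiB of data,
# so a 32-byte file cannot possibly contain it.
print(declared_tensor_bytes([4096, 256]))
print(tensor_fits_in_file(32, 0, [4096, 256]))
```

A loader that applies a check of this kind to every tensor entry, and refuses the file when any entry fails, never reaches the read that runs past the end of the buffer.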
`pip install requests numpy gguf`

You also need a publicly reachable HTTP endpoint (for example, through an ngrok tunnel) so that your listener can receive the exfiltrated Docker layer pushes.
1. Start the Rogue Registry

Start the listener to catch the leaked memory blobs.

`sudo python3 registry.py`

2. Forge the Malicious Payload

Generate the truncated GGUF file. You can adjust `TARGET_LEAK_SIZE_MB` inside the script to control how much heap memory is scraped per request. (Recommended: 0.5 MB to 2.0 MB, to avoid segfaulting on unmapped pages.)

`python3 forge.py`

3. Fire the Exploit

Edit `exploit.py` to include your target IP and your rogue registry URL, then execute:

`python3 exploit.py`

4. Analyze the Artifact
The registry will drop the leaked heap dumps into the exfils/ directory.
Note on Data Integrity (The Quantization Trap): While the exploit successfully captures and exfiltrates up to several megabytes of server heap memory, the data is subjected to Ollama's Q4_K_M down-quantization algorithm during the OOB read. The backend casts the raw memory bytes to float16 and applies a lossy 4-bit block compression scheme. Consequently, the leaked memory is mathematically mangled. Standard ASCII extraction tools will yield binary garbage, making plaintext credential recovery practically unviable via this specific coercion path.
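To make the data-integrity point concrete, here is a toy illustration of why bytes that pass through a 4-bit block quantizer do not survive. It is deliberately not the real `Q4_K_M` algorithm. It is a simplified min/max quantizer with 16 levels per block, and the function name, block size, and sample string are assumptions made for the example. It interprets raw bytes as float16, quantizes each block to 4-bit codes, and converts back.

```python
import numpy as np


def toy_4bit_roundtrip(raw: bytes, block: int = 32) -> bytes:
    """Interpret raw bytes as float16, quantize each block to 16 levels, restore.

    Simplified stand-in for a 4-bit block quantizer; NOT the Q4_K_M format.
    len(raw) must be a multiple of 2 * block bytes.
    """
    values = np.frombuffer(raw, dtype=np.float16).astype(np.float32)
    values = values.reshape(-1, block)
    lo = values.min(axis=1, keepdims=True)
    hi = values.max(axis=1, keepdims=True)
    # One scale per block; guard against a constant block (hi == lo).
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0)
    codes = np.clip(np.rint((values - lo) / scale), 0, 15)  # 4-bit codes 0..15
    restored = codes * scale + lo
    return restored.astype(np.float16).tobytes()


# Made-up 64-byte sample standing in for a plaintext secret in memory.
raw = b"API_KEY=abcdef0123456789ABCDEF__" * 2
restored = toy_4bit_roundtrip(raw)

print(raw[:16])
print(restored[:16])
```

Each block of 32 values collapses onto at most 16 distinct levels, so the original byte patterns cannot be reconstructed from the output. The real scheme uses different block sizes and scale encodings, but the information loss is of the same nature.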
This project is for educational and authorized vulnerability research purposes only. Do not use this tool against systems you do not own or have explicit permission to test.