Llama() silently accepts and discards embedding kwarg; .embed() then raises confusingly #2210

@emptyngton

Description

Prerequisites

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When constructing Llama with the older spelling embedding=True (singular — the parameter name in 0.2.x), one of two things should happen:

  1. The kwarg is accepted as a deprecated alias of embeddings and a DeprecationWarning is emitted, OR
  2. A TypeError is raised at construction time, surfacing the issue at the call site rather than swallowing it silently.

Current Behavior

Neither happens. embedding=True is silently swallowed via **kwargs, context_params.embeddings stays at its default False, and the failure surfaces much later — deep inside .embed() — with a misleading error message that suggests the user didn't pass the flag, when in fact they did (just under the historical name).

RuntimeError: Llama model must be created with embeddings=True to call this method

This is especially painful for users integrating older libraries that haven't migrated to the new spelling yet — the error points at the wrong thing.

Environment and Context

  • Hardware: x86_64, NVIDIA GeForce RTX 4090
  • OS: Windows 10 22H2
  • Python 3.12.9
  • llama-cpp-python 0.3.36 (CUDA 12.8 prebuilt wheel)
$ python --version
Python 3.12.9

$ pip show llama-cpp-python | findstr Version
Version: 0.3.36

Failure Information (for bugs)

The constructor's **kwargs swallows unknown keyword arguments with no warning, so a typo or stale parameter name produces a delayed, confusing failure rather than an immediate error.
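The mechanism is easy to reproduce in miniature: any signature ending in `**kwargs` quietly absorbs misspelled keyword arguments instead of raising. A toy example (not the library's code):

```python
def ctor(model_path: str, embeddings: bool = False, **kwargs):
    # Unrecognized keywords (e.g. the legacy `embedding`) land in
    # `kwargs` and are never looked at again.
    return {"embeddings": embeddings, "ignored": kwargs}

state = ctor("model.gguf", embedding=True)
# state["embeddings"] is still False; the flag the caller
# passed was silently discarded into state["ignored"].
```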

Steps to Reproduce

from llama_cpp import Llama

# Pass the older `embedding` (singular) instead of `embeddings` (plural).
m = Llama(model_path="path/to/model.gguf", embedding=True)
m.embed("hello")

Result:

RuntimeError: Llama model must be created with embeddings=True to call this method

Even though embedding=True was passed at construction. The fix is to either accept embedding as a deprecated alias or to validate kwargs strictly.

Failure Logs

Traceback (most recent call last):
  File "...\Lib\site-packages\llama_cpp\llama.py", line 1602, in embed
    raise RuntimeError(
RuntimeError: Llama model must be created with embeddings=True to call this method

Hit while integrating Tencent's HY-Motion text-to-motion model — the hymotion package's text encoder still uses the older embedding= spelling, so anyone running it against llama-cpp-python 0.3.x sees this confusing failure at first inference instead of at construction. The workaround in our case is a runtime monkey-patch that translates embedding → embeddings in Llama.__init__, but that doesn't help anyone else.
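For reference, that workaround can be sketched as follows. This is a stopgap, not a fix; it assumes only that the target constructor accepts an `embeddings` keyword, and the function name is ours:

```python
from functools import wraps

def patch_embedding_alias(llama_cls):
    """Monkey-patch sketch: wrap `llama_cls.__init__` so the legacy
    `embedding` kwarg is forwarded as `embeddings`."""
    original_init = llama_cls.__init__

    @wraps(original_init)
    def patched_init(self, *args, **kwargs):
        if "embedding" in kwargs and "embeddings" not in kwargs:
            kwargs["embeddings"] = kwargs.pop("embedding")
        original_init(self, *args, **kwargs)

    llama_cls.__init__ = patched_init
    return llama_cls
```

Applied once at import time (e.g. `patch_embedding_alias(Llama)`) before hymotion constructs its encoder, the legacy spelling then reaches context_params.embeddings as intended.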
