Llama() silently accepts and discards embedding kwarg; .embed() then raises confusingly #2210

@emptyngton

Description

Prerequisites

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When constructing Llama with the older spelling embedding=True (singular — the parameter name in 0.2.x), one of two things should happen:

  1. The kwarg is accepted as a deprecated alias of embeddings and a DeprecationWarning is emitted, OR
  2. A TypeError is raised at construction time, surfacing the issue at the call site rather than swallowing it silently.

Current Behavior

Neither happens. embedding=True is silently swallowed via **kwargs, context_params.embeddings stays at its default False, and the failure surfaces much later — deep inside .embed() — with a misleading error message that suggests the user didn't pass the flag, when in fact they did (just under the historical name).

RuntimeError: Llama model must be created with embeddings=True to call this method

This is especially painful for users integrating older libraries that haven't migrated to the new spelling yet — the error points at the wrong thing.

Environment and Context

  • Hardware: x86_64, NVIDIA GeForce RTX 4090
  • OS: Windows 10 22H2
  • Python 3.12.9
  • llama-cpp-python 0.3.36 (CUDA 12.8 prebuilt wheel)
$ python --version
Python 3.12.9

$ pip show llama-cpp-python | findstr Version
Version: 0.3.36

Failure Information (for bugs)

The constructor's **kwargs swallows unknown keyword arguments with no warning, so a typo or stale parameter name produces a delayed, confusing failure rather than an immediate error.
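The mechanism is easy to reproduce in miniature: any signature ending in `**kwargs` quietly absorbs misspelled keyword arguments instead of raising. A toy example (not the library's code):

```python
def ctor(model_path: str, embeddings: bool = False, **kwargs):
    # Unrecognized keywords (e.g. the legacy `embedding`) land in
    # `kwargs` and are never looked at again.
    return {"embeddings": embeddings, "ignored": kwargs}

state = ctor("model.gguf", embedding=True)
# state["embeddings"] is still False; the flag the caller
# passed was silently discarded into state["ignored"].
```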

Steps to Reproduce

from llama_cpp import Llama

# Pass the older `embedding` (singular) instead of `embeddings` (plural).
m = Llama(model_path="path/to/model.gguf", embedding=True)
m.embed("hello")

Result:

RuntimeError: Llama model must be created with embeddings=True to call this method

Even though embedding=True was passed at construction. The fix is to either accept embedding as a deprecated alias or to validate kwargs strictly.

Failure Logs

Traceback (most recent call last):
  File "...\Lib\site-packages\llama_cpp\llama.py", line 1602, in embed
    raise RuntimeError(
RuntimeError: Llama model must be created with embeddings=True to call this method

Hit while integrating Tencent's HY-Motion text-to-motion model — the hymotion package's text encoder still uses the older embedding= spelling, so anyone running it against llama-cpp-python 0.3.x sees this confusing failure at first inference instead of at construction. The workaround in our case is a runtime monkey-patch that translates embedding → embeddings in Llama.__init__, but that doesn't help anyone else.
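For reference, that workaround can be sketched as follows. This is a stopgap, not a fix; it assumes only that the target constructor accepts an `embeddings` keyword, and the function name is ours:

```python
from functools import wraps

def patch_embedding_alias(llama_cls):
    """Monkey-patch sketch: wrap `llama_cls.__init__` so the legacy
    `embedding` kwarg is forwarded as `embeddings`."""
    original_init = llama_cls.__init__

    @wraps(original_init)
    def patched_init(self, *args, **kwargs):
        if "embedding" in kwargs and "embeddings" not in kwargs:
            kwargs["embeddings"] = kwargs.pop("embedding")
        original_init(self, *args, **kwargs)

    llama_cls.__init__ = patched_init
    return llama_cls
```

Applied once at import time (e.g. `patch_embedding_alias(Llama)`) before hymotion constructs its encoder, the legacy spelling then reaches context_params.embeddings as intended.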
