
Commit 1b5fade

fix(embed): mark all tokens as output to suppress llama.cpp "overriding" INFO
Force logits_all=True in Llama.embed() so the per-token batch.logits[i] flags are all set, regardless of pooling type. Previously, when pooling != NONE, add_sequence flipped most tokens to logits[i]=False, and llama.cpp printed "init: embeddings required but some input tokens were not marked as outputs -> overriding" once per embed input and silently overrode the flags.

Pooling type only changes how per-token outputs are read back in decode_batch (llama_get_embeddings vs llama_get_embeddings_seq), not whether they are produced, so this aligns the per-token flags with what llama.cpp already needed and removes the noisy per-input override message.

Fixes #2208.
1 parent f774690 commit 1b5fade
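
A minimal sketch of the flag change described above. mark_outputs is a hypothetical helper, not the library's API; it simplifies the add_sequence behavior the commit message refers to (the final token is assumed to always be flagged, with logits_all controlling the rest):

# Hypothetical helper illustrating the per-token output flags; a simplified
# model of _LlamaBatch.add_sequence, not the actual bindings.
def mark_outputs(n_tokens: int, logits_all: bool) -> list:
    # The last token is flagged unconditionally; logits_all controls whether
    # the preceding tokens are also marked as outputs.
    return [logits_all or i == n_tokens - 1 for i in range(n_tokens)]

# Before this commit, pooled embedding (pooling != NONE) passed logits_all=False:
print(mark_outputs(4, logits_all=False))  # [False, False, False, True] -> llama.cpp overrides
# After, embed() always passes logits_all=True, matching what llama.cpp needs:
print(mark_outputs(4, logits_all=True))   # [True, True, True, True]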

1 file changed: llama_cpp/llama.py (7 additions & 1 deletion)
@@ -1040,7 +1040,13 @@ def embed(
 
         # get pooling information
         pooling_type = self.pooling_type()
-        logits_all = pooling_type == llama_cpp.LLAMA_POOLING_TYPE_NONE
+        # In embedding mode every input token must be marked as an output, regardless of
+        # pooling type. llama.cpp would otherwise override per-token `logits[i]` and emit
+        # "embeddings required but some input tokens were not marked as outputs ->
+        # overriding" once per input. Pooling NONE vs MEAN/CLS only changes how the
+        # per-token outputs are read back (see decode_batch below), not whether they are
+        # produced. See abetlen/llama-cpp-python#2208.
+        logits_all = True
 
         if self.context_params.embeddings is False:
             raise RuntimeError(
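
A hedged usage sketch of the high-level embedding API after this fix; the model path is a placeholder, and the pooling behavior follows the commit message:

# Usage sketch ("model.gguf" is a placeholder; embedding=True enables embedding mode).
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", embedding=True)

# With a pooled model (MEAN/CLS), embed() returns one vector per input string;
# with LLAMA_POOLING_TYPE_NONE it returns one vector per token. Either way the
# per-token output flags are now set up front, so llama.cpp no longer prints
# the "... not marked as outputs -> overriding" INFO line for each input.
vec = llm.embed("hello world")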

0 commit comments