Commit 1b5fade
fix(embed): mark all tokens as output to suppress llama.cpp "overriding" INFO
Force logits_all=True in Llama.embed() so per-token batch.logits[i] flags are
all set, regardless of pooling type. Previously, when pooling != NONE,
add_sequence flipped most tokens to logits[i]=False, and llama.cpp printed
init: embeddings required but some input tokens were not marked as outputs -> overriding
once per embed input and silently overrode the flags.
Pooling type only changes how per-token outputs are read back in decode_batch
(llama_get_embeddings vs llama_get_embeddings_seq), not whether they are
produced — so this aligns the per-token flags with what llama.cpp already
needed and removes the noisy per-input override message.
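A minimal sketch of the flag behavior being fixed. This is an illustrative model, not code from llama-cpp-python: the `add_sequence` helper and its `logits_all` parameter here are hypothetical stand-ins for the batching logic the message describes, where pre-fix only the last token kept `logits[i]=True` when pooling != NONE.

```python
def add_sequence(tokens, logits_all):
    """Model of per-token output flags (hypothetical helper, not the real API).

    Pre-fix behavior (pooling != NONE): logits_all=False, so only the final
    token is marked as an output and llama.cpp overrides the flags itself,
    printing the "overriding" INFO line. The fix forces logits_all=True so
    every flag is already set and there is nothing to override.
    """
    return [logits_all or i == len(tokens) - 1 for i in range(len(tokens))]

# Three example token ids; values are arbitrary.
before = add_sequence([101, 2009, 102], logits_all=False)  # pre-fix, pooling != NONE
after = add_sequence([101, 2009, 102], logits_all=True)    # post-fix

print(before)  # [False, False, True]
print(after)   # [True, True, True]
```

Either way llama.cpp ends up computing per-token outputs; the fix just makes the flags state that up front instead of relying on the runtime override.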
Fixes #2208.

1 parent f774690
1 file changed: 7 additions & 1 deletion