FinchPress Scorer #59
Description
This proposal aims to implement the Finch Press Scorer, following the approach described in FINCH: Prompt-guided Key-Value Cache Compression. The Finch Press scorer computes attention scores to determine which key-value states to retain, leveraging the cross-attention between the question and the context to guide the compression.
Motivation
The Finch Press Scorer is conceptually similar to SnapKV Press, but with a key difference:
- SnapKV Press computes the cross-attention between the last *k* tokens and the context.
- Finch Press, on the other hand, computes the cross-attention between the question and the context.
This fundamental distinction raises an important design question: how should we clearly separate the context from the question within the compression mechanism?
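To make the distinction concrete, here is a minimal sketch of the two scoring strategies over a single layer's attention weights. This is illustrative only, not the kvpress implementation: the function names and the `question_start` boundary index are hypothetical.

```python
import torch

def snapkv_scores(attn: torch.Tensor, window_size: int) -> torch.Tensor:
    # attn: (heads, q_len, kv_len) attention weights from one layer.
    # SnapKV-style: average the attention that the last `window_size`
    # query tokens (the observation window) pay to every key position.
    return attn[:, -window_size:, :].mean(dim=1)

def finch_scores(attn: torch.Tensor, question_start: int) -> torch.Tensor:
    # Finch-style: average the attention that the *question* tokens
    # (positions >= question_start) pay to the *context* keys only.
    return attn[:, question_start:, :question_start].mean(dim=1)

# Toy example: 4 heads, 10 tokens, question assumed to start at position 7.
attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(snapkv_scores(attn, window_size=3).shape)   # (4, 10): one score per key
print(finch_scores(attn, question_start=7).shape) # (4, 7): scores for context keys only
```

The only structural difference is which query rows are aggregated, which is why Finch needs an explicit context/question boundary while SnapKV does not.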
We propose introducing a separator token (`[SEP]`) to distinguish the context from the question, with API usage similar to:
```python
from transformers import pipeline
from kvpress import FinchPress

device = "cuda:0"
model = "meta-llama/Llama-3.1-8B-Instruct"
model_kwargs = {"attn_implementation": "flash_attention_2"}
pipe = pipeline("kv-press-text-generation", model=model, device=device, model_kwargs=model_kwargs)

context = "A very long text you want to compress once and for all"
question = "\nA question about the compressed context"

# Introduce a separator token
# (in practice the model's embeddings would also need resizing to cover it)
tokenizer = pipe.tokenizer
tokenizer.add_tokens(["[SEP]"])
sep_token_id = tokenizer.convert_tokens_to_ids("[SEP]")

press = FinchPress(compression_ratio=0.5, sep_token_id=sep_token_id)
concatenated_context = context + "[SEP]" + question
answer = pipe(concatenated_context, question="", press=press)["answer"]
```

Internally, FinchPress will split the context from the question using the provided `sep_token_id` and apply its scoring mechanism accordingly.
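One way this internal split could work is sketched below; the helper name and its behavior are assumptions for illustration, not the proposed implementation.

```python
import torch

def split_on_sep(input_ids: torch.Tensor, sep_token_id: int):
    # Locate the separator token and split a 1D sequence of token ids
    # into context ids (before [SEP]) and question ids (after [SEP]).
    sep_pos = (input_ids == sep_token_id).nonzero(as_tuple=True)[0]
    if len(sep_pos) == 0:
        raise ValueError("sep_token_id not found in input_ids")
    idx = sep_pos[0].item()
    # The separator itself is dropped so it never competes for cache slots.
    return input_ids[:idx], input_ids[idx + 1:]

# Toy example where 99 plays the role of the [SEP] token id.
ids = torch.tensor([5, 8, 2, 99, 3, 7])
context_ids, question_ids = split_on_sep(ids, sep_token_id=99)
print(context_ids.tolist(), question_ids.tolist())  # [5, 8, 2] [3, 7]
```

The boundary index recovered here is exactly the `question_start` that a Finch-style scorer needs to restrict aggregation to question-to-context attention.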
We are open to suggestions on alternative ways to handle context-question separation efficiently and in a way that remains compliant with the philosophy of KVPress.
Contributors
Implementation will be handled by:
- myself
- @miriam-16
- @eliaFaure