FinchPress Scorer #59

@giulio98

Press

This proposal aims to implement the Finch Press Scorer, following the approach described in FINCH: Prompt-guided Key-Value Cache Compression. The Finch Press scorer computes attention scores to determine which key-value states to retain, leveraging the cross-attention between the question and the context to guide the compression.

Motivation

The Finch Press Scorer is conceptually similar to SnapKV Press, but with a key difference:

SnapKV Press calculates the cross-attention between the last k tokens and the context.
Finch Press, on the other hand, calculates the cross-attention between the question and the context.
This fundamental distinction raises an important design question: how should we clearly separate the context from the question within the compression mechanism?
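To make the distinction concrete, here is a minimal, framework-free sketch of the two scoring windows on a toy attention matrix. The helper names (`context_scores`, `keep_indices`) and the toy values are purely illustrative assumptions; an actual press would work on per-head attention tensors and may add steps such as pooling or normalization that are omitted here.

```python
def context_scores(attn, query_rows, n_context):
    """Sum the attention each context position receives from the given query rows."""
    return [sum(attn[q][c] for q in query_rows) for c in range(n_context)]


def keep_indices(scores, compression_ratio):
    """Return the positions of the highest-scoring entries, in original order."""
    n_keep = int(len(scores) * (1 - compression_ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return sorted(top)


# Toy setup (hypothetical values): tokens 0-3 are context, tokens 4-5 are the question.
# attn[q][c] is the attention that query token q pays to key token c.
n_context = 4
attn = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25, 0.0, 0.0],
    [0.6, 0.1, 0.05, 0.05, 0.2, 0.0],
    [0.0, 0.1, 0.4, 0.3, 0.1, 0.1],
]

# SnapKV-style window: the last k tokens (k = 1 here), regardless of where the question starts.
snapkv_scores = context_scores(attn, query_rows=[5], n_context=n_context)

# Finch-style window: every question token, which requires knowing the context/question boundary.
finch_scores = context_scores(attn, query_rows=[4, 5], n_context=n_context)

snapkv_kept = keep_indices(snapkv_scores, compression_ratio=0.5)
finch_kept = keep_indices(finch_scores, compression_ratio=0.5)
```

In this toy case the two windows retain different context positions, which is why the question boundary has to be known to the press.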

We propose introducing a separator token ([SEP]) to distinguish the context from the question, with API usage similar to:

from transformers import pipeline
from kvpress import FinchPress

device = "cuda:0"
model = "meta-llama/Llama-3.1-8B-Instruct"
model_kwargs = {"attn_implementation": "flash_attention_2"}
pipe = pipeline("kv-press-text-generation", model=model, device=device, model_kwargs=model_kwargs)

context = "A very long text you want to compress once and for all"
question = "\nA question about the compressed context"

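# Introduce a separator token and look up the id the tokenizer assigned to it
tokenizer = pipe.tokenizer
tokenizer.add_tokens(["[SEP]"], special_tokens=True)
sep_token_id = tokenizer.convert_tokens_to_ids("[SEP]")
# Grow the embedding matrix so the new id has an embedding row
pipe.model.resize_token_embeddings(len(tokenizer))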

press = FinchPress(compression_ratio=0.5, sep_token_id=sep_token_id)
concatenated_context = context + "[SEP]" + question
answer = pipe(concatenated_context, question="", press=press)["answer"]

Internally, FinchPress will split the context from the question using the provided sep_token_id and apply its scoring mechanism accordingly.
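As a rough illustration of that split, the sketch below separates a flat list of token ids at the separator. The function name `split_at_separator` is hypothetical, and both the choice to drop the separator and the requirement of exactly one separator are assumptions rather than settled behavior of the proposed press.

```python
def split_at_separator(input_ids, sep_token_id):
    """Return (context_ids, question_ids), dropping the separator itself."""
    if input_ids.count(sep_token_id) != 1:
        raise ValueError("expected exactly one separator token in the input")
    sep_pos = input_ids.index(sep_token_id)
    return input_ids[:sep_pos], input_ids[sep_pos + 1:]


# Hypothetical ids: 99 stands in for the id assigned to [SEP].
context_ids, question_ids = split_at_separator([11, 12, 13, 99, 21, 22], sep_token_id=99)
```

Raising on a missing or repeated separator keeps a malformed prompt from being silently treated as all context or all question.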
We are open to suggestions on alternative ways to handle context-question separation efficiently and in a way that remains compliant with the philosophy of KVPress.

Contributors
Implementation will be handled by:
myself
@miriam-16
@eliaFaure
