🕵 LLM Fingerprint

Identify LLMs by their response fingerprints

Research Question

Is it possible to identify which LLM generated a response by analyzing its semantic patterns across multiple standardized prompts?

Approach

LLM Fingerprint uses semantic similarity patterns across multiple prompts to create model-specific "fingerprints":

Fingerprint Creation: Generate multiple responses from known LLMs using standardized prompts
- Fixed sampling parameters to ensure consistent sampling behavior
- Multiple response samples per prompt to account for sampling variance
- Several distinct prompts to capture model characteristics
Similarity Analysis: Measure semantic similarity within and between prompt response groups
- Within-prompt similarity reveals consistency characteristics
- Cross-prompt similarity patterns create a unique model signature
Model Identification: Match patterns from unknown models against the fingerprint database
- Generate responses from the unknown model using the same standardized prompts
- Compare similarity patterns with known models
- Identify the closest matching fingerprint

Usage

Set required environment variables. See .envrc.example for more details.

Creating Model Fingerprints

# Generate samples for known model responses
llm-fingerprint generate \
  --language-model "model-1" "model-2" "model-3" \
  --prompts-path "./data/prompts/prompts_general_v1.jsonl" \
  --samples-path "samples.jsonl" \
  --samples-num 4

# Upload samples to ChromaDB
llm-fingerprint upload \
  --language-model "embedding-model" \
  --samples-path "samples.jsonl" \
  --collection-name "samples"

Identifying Unknown Models

# Generate samples for unknown model (or use an external service)
# Let's suppose the we don't know we are using model-2
llm-fingerprint generate \
  --language-model "model-2" \
  --prompts-path "./data/prompts/prompts_single_v1.jsonl" \
  --samples-path "unk-samples.jsonl" \
  --samples-num 1

# Query ChromaDB for model identification
llm-fingerprint query \
  --language-model "embedding-model" \
  --samples-path "unk-samples.jsonl" \
  --results-path "results.jsonl" \
  --results-num 2

# matches.jsonl will contain the results
# {"model": "model-2", "score": ... }
# {"model": "model-1", "score": ... }

Installation

The preferred way to install llm-fingerprint is using uv (although you can also use pip).

# Clone the repository
git clone https://github.com/S1M0N38/llm-fingerprint.git
cd llm-fingerprint

# Create a virtual environment
uv venv

# Install the package
uv sync # --all-groups # for installing ml and dev groups

Requirements

Python 3.11+
OpenAI-compatible API endpoints (/chat/completions and /embeddings)
Access to ChromaDB (locally or hosted)

Contributing

This toy/research project is still in its early stages, and I welcome any feedback, suggestions, and contributions! If you're interested in discussing ideas or have questions about the approach, please start a conversation in GitHub Discussions.

For detailed information on setting up your development environment, understanding the project structure, and the contribution workflow, please refer to CONTRIBUTING.md.

Name		Name	Last commit message	Last commit date
Latest commit History 172 Commits
.github/workflows		.github/workflows
config		config
data/prompts		data/prompts
src/llm_fingerprint		src/llm_fingerprint
tests		tests
.envrc.example		.envrc.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
justfile		justfile
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🕵 LLM Fingerprint

Research Question

Approach

Usage

Creating Model Fingerprints

Identifying Unknown Models

Installation

Requirements

Contributing

About

Uh oh!

Releases 10

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🕵 LLM Fingerprint

Research Question

Approach

Usage

Creating Model Fingerprints

Identifying Unknown Models

Installation

Requirements

Contributing

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 10

Uh oh!

Contributors

Uh oh!

Languages