Is it possible to identify which LLM generated a response by analyzing its semantic patterns across multiple standardized prompts?
LLM Fingerprint uses semantic similarity patterns across multiple prompts to create model-specific "fingerprints":
-
Fingerprint Creation: Generate multiple responses from known LLMs using standardized prompts
- Fixed sampling parameters to ensure consistent sampling behavior
- Multiple response samples per prompt to account for sampling variance
- Several distinct prompts to capture model characteristics
-
Similarity Analysis: Measure semantic similarity within and between prompt response groups
- Within-prompt similarity reveals consistency characteristics
- Cross-prompt similarity patterns create a unique model signature
-
Model Identification: Match patterns from unknown models against the fingerprint database
- Generate responses from the unknown model using the same standardized prompts
- Compare similarity patterns with known models
- Identify the closest matching fingerprint
Set required environment variables. See .envrc.example for more details.
# Generate samples for known model responses
llm-fingerprint generate \
--language-model "model-1" "model-2" "model-3" \
--prompts-path "./data/prompts/prompts_general_v1.jsonl" \
--samples-path "samples.jsonl" \
--samples-num 4
# Upload samples to ChromaDB
llm-fingerprint upload \
--language-model "embedding-model" \
--samples-path "samples.jsonl" \
--collection-name "samples"# Generate samples for unknown model (or use an external service)
# Let's suppose the we don't know we are using model-2
llm-fingerprint generate \
--language-model "model-2" \
--prompts-path "./data/prompts/prompts_single_v1.jsonl" \
--samples-path "unk-samples.jsonl" \
--samples-num 1
# Query ChromaDB for model identification
llm-fingerprint query \
--language-model "embedding-model" \
--samples-path "unk-samples.jsonl" \
--results-path "results.jsonl" \
--results-num 2
# matches.jsonl will contain the results
# {"model": "model-2", "score": ... }
# {"model": "model-1", "score": ... }The preferred way to install llm-fingerprint is using uv (although you can also use pip).
# Clone the repository
git clone https://github.com/S1M0N38/llm-fingerprint.git
cd llm-fingerprint
# Create a virtual environment
uv venv
# Install the package
uv sync # --all-groups # for installing ml and dev groups- Python 3.11+
- OpenAI-compatible API endpoints (
/chat/completionsand/embeddings) - Access to ChromaDB (locally or hosted)
This toy/research project is still in its early stages, and I welcome any feedback, suggestions, and contributions! If you're interested in discussing ideas or have questions about the approach, please start a conversation in GitHub Discussions.
For detailed information on setting up your development environment, understanding the project structure, and the contribution workflow, please refer to CONTRIBUTING.md.