PromptCheck

Pytest for your LLM prompts. Catch regressions before production.

PromptCheck lets you write test suites for LLM prompts in YAML, run them against any model, and get deterministic pass/fail results with rich terminal output. Think of it as unit testing for AI — define expected behaviors, assert on outputs, and catch regressions before they ship.

Features

  • YAML test definitions — readable, version-controllable test files
  • Multi-provider support — OpenAI, Anthropic, and Ollama out of the box
  • Rich assertion library — contains, equals, regex, length, JSON schema, JSON path, semantic similarity, LLM-as-judge, latency, and cost assertions
  • Jinja2 prompt templates — parameterize prompts with {{variables}}
  • Detailed failure reports — see exactly what failed and why
  • Multiple output formats — terminal (Rich), JSON, and HTML reports
  • Cost and latency tracking — built-in performance assertions
  • Tag-based filtering — run specific test subsets with --tag

Installation

# From source
git clone https://github.com/aymenhmaidiwastaken/promptcheck.git
cd promptcheck
pip install -e .

# With semantic similarity support
pip install -e ".[semantic]"

Quick Start

1. Create a prompt template

# prompts/sentiment.txt
Classify the sentiment of the following text as positive, negative, or neutral.
Respond with a single word.

Text: {{input}}
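
Template rendering happens before the prompt is sent to the model: each {{variable}} is replaced with the test case's value. As a dependency-free illustration, the sketch below mimics Jinja2's behavior for simple variable substitution only (the real tool uses Jinja2, which also supports filters, conditionals, and loops):

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Substitute {{ name }} placeholders; a stdlib stand-in that mimics
    Jinja2's behavior for simple variable substitution only."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables[m.group(1)]),
        template,
    )

template = "Classify the sentiment of the following text.\n\nText: {{input}}"
print(render_prompt(template, {"input": "I love this product!"}))
```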

2. Write a test file

# tests/sentiment.test.yaml
name: Sentiment Analysis
prompt: prompts/sentiment.txt
model: openai:gpt-4o-mini

tests:
  - name: Positive sentiment
    input: "I love this product! Best purchase ever!"
    assert:
      - type: contains
        value: "positive"

  - name: Negative sentiment
    input: "Terrible experience. Complete waste of money."
    assert:
      - type: contains
        value: "negative"

  - name: Response is concise
    input: "Pretty good overall, would recommend."
    assert:
      - type: length
        max: 20
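
A test file like the one above parses into plain dictionaries. As a minimal sketch of the kind of validation a loader might do, assuming that parsed shape (field names follow the YAML above; this is not PromptCheck's actual core/loader.py):

```python
# A suite dict as it would look after YAML parsing (abbreviated to one test).
suite = {
    "name": "Sentiment Analysis",
    "prompt": "prompts/sentiment.txt",
    "model": "openai:gpt-4o-mini",
    "tests": [
        {"name": "Positive sentiment",
         "input": "I love this product! Best purchase ever!",
         "assert": [{"type": "contains", "value": "positive"}]},
    ],
}

def validate_suite(suite: dict) -> list[str]:
    """Return human-readable problems; an empty list means the suite looks valid."""
    errors = []
    for field in ("name", "prompt", "model", "tests"):
        if field not in suite:
            errors.append(f"missing top-level field: {field}")
    for i, test in enumerate(suite.get("tests", [])):
        if not test.get("assert"):
            errors.append(f"test {i} ({test.get('name', '?')}) has no assertions")
    return errors

print(validate_suite(suite))  # [] — no problems
```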

3. Run the tests

promptcheck run tests/

Assertions

Type         | Description                         | Config
contains     | Output contains substring           | value, case_insensitive
not_contains | Output does not contain substring   | value
equals       | Output exactly matches              | value, strip
regex        | Output matches pattern              | value
length       | Output length within bounds         | min, max
json_schema  | Output validates against schema     | schema
json_path    | JSON path returns expected value    | path, value
semantic     | Semantic similarity above threshold | value, threshold
llm_judge    | LLM evaluates output quality        | criteria, model
latency      | Response time under limit           | max_ms
cost         | API cost under limit                | max_cost
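
To make the semantics of the simpler string assertions concrete, here is an illustrative sketch. Parameter names follow the table above, but this is not PromptCheck's internal code (assertions/string.py):

```python
import re
from typing import Optional

def check_contains(output: str, value: str, case_insensitive: bool = False) -> bool:
    """True if the substring appears in the model output."""
    if case_insensitive:
        return value.lower() in output.lower()
    return value in output

def check_regex(output: str, value: str) -> bool:
    """True if the pattern matches anywhere in the output."""
    return re.search(value, output) is not None

def check_length(output: str, min: Optional[int] = None, max: Optional[int] = None) -> bool:
    """True if the output length is within the optional bounds."""
    n = len(output)
    return (min is None or n >= min) and (max is None or n <= max)

print(check_contains("Positive", "positive", case_insensitive=True))  # True
print(check_length("positive", max=20))                               # True
```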

Providers

Configure which LLM to test against using the model field:

model: openai:gpt-4o-mini      # OpenAI
model: anthropic:claude-sonnet-4-20250514  # Anthropic
model: ollama:llama3            # Ollama (local)

Set API keys as environment variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
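
The provider:model string suggests a small registry keyed by the prefix. The sketch below shows that general pattern with an offline stand-in provider; the names here are illustrative, not PromptCheck's actual providers/registry.py API:

```python
from typing import Callable, Dict, Tuple

# Maps a provider name ("openai", "anthropic", ...) to its completion function.
PROVIDERS: Dict[str, Callable[[str, str], str]] = {}

def register_provider(name: str):
    """Decorator that registers a completion function under a provider name."""
    def wrap(fn: Callable[[str, str], str]):
        PROVIDERS[name] = fn
        return fn
    return wrap

def parse_model(spec: str) -> Tuple[str, str]:
    """Split 'openai:gpt-4o-mini' into ('openai', 'gpt-4o-mini')."""
    provider, _, model = spec.partition(":")
    return provider, model

@register_provider("echo")
def echo_provider(model: str, prompt: str) -> str:
    # Stand-in for offline testing; real providers would call OpenAI/Anthropic/Ollama.
    return prompt

provider, model = parse_model("openai:gpt-4o-mini")
print(provider, model)  # openai gpt-4o-mini
```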

CLI Commands

# Run all tests in a directory
promptcheck run tests/

# Run with a different model
promptcheck run tests/ --model openai:gpt-4o

# Filter by tags
promptcheck run tests/ --tag sentiment

# Output as JSON
promptcheck run tests/ --output json

# Generate HTML report
promptcheck run tests/ --output html

# Initialize a new test file
promptcheck init

# Show version
promptcheck version

Project Structure

promptcheck/
  cli.py               # Typer CLI commands
  config.py            # Configuration loading
  core/
    loader.py          # YAML test file parser
    executor.py        # Test case execution engine
    runner.py          # Test suite orchestrator
    result.py          # Result data structures
  assertions/
    registry.py        # Assertion type registry
    string.py          # contains, equals, regex, length
    json_assertions.py # json_schema, json_path
    semantic.py        # Semantic similarity
    llm_judge.py       # LLM-as-judge evaluation
    performance.py     # latency, cost
  providers/
    registry.py        # Provider registry
    openai.py          # OpenAI provider
    anthropic.py       # Anthropic provider
    ollama.py          # Ollama provider
  reporters/
    terminal.py        # Rich terminal output
    json_reporter.py   # JSON file output
    html.py            # HTML report generation

License

MIT
