# PromptCheck

Pytest for your LLM prompts. Catch regressions before production.

PromptCheck lets you write test suites for LLM prompts in YAML, run them against any model, and get deterministic pass/fail results with rich terminal output. Think of it as unit testing for AI — define expected behaviors, assert on outputs, and catch regressions before they ship.

## Features
- YAML test definitions — readable, version-controllable test files
- Multi-provider support — OpenAI, Anthropic, and Ollama out of the box
- Rich assertion library — contains, equals, regex, length, JSON schema, JSON path, semantic similarity, LLM-as-judge, latency, and cost assertions
- Jinja2 prompt templates — parameterize prompts with `{{variables}}`
- Detailed failure reports — see exactly what failed and why
- Multiple output formats — terminal (Rich), JSON, and HTML reports
- Cost and latency tracking — built-in performance assertions
- Tag-based filtering — run specific test subsets with `--tag`
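Prompt templates are rendered before each test case runs. As a rough illustration of what `{{variables}}` substitution does — a toy re-implementation for clarity, not PromptCheck's internals, which use Jinja2 and therefore also support filters, loops, and conditionals:

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Toy stand-in for Jinja2 {{variable}} substitution.

    Handles only plain {{name}} placeholders, with optional
    surrounding whitespace inside the braces.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables[m.group(1)]),
        template,
    )

prompt = render_prompt(
    "Classify the sentiment.\nText: {{input}}",
    {"input": "I love this product!"},
)
print(prompt)
```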
## Installation

```bash
# From source
git clone https://github.com/aymenhmaidiwastaken/promptcheck.git
cd promptcheck
pip install -e .

# With semantic similarity support
pip install -e ".[semantic]"
```

## Quick Start

Create a prompt template:

```text
# prompts/sentiment.txt
Classify the sentiment of the following text as positive, negative, or neutral.
Respond with a single word.

Text: {{input}}
```
Write a test file:

```yaml
# tests/sentiment.test.yaml
name: Sentiment Analysis
prompt: prompts/sentiment.txt
model: openai:gpt-4o-mini

tests:
  - name: Positive sentiment
    input: "I love this product! Best purchase ever!"
    assert:
      - type: contains
        value: "positive"

  - name: Negative sentiment
    input: "Terrible experience. Complete waste of money."
    assert:
      - type: contains
        value: "negative"

  - name: Response is concise
    input: "Pretty good overall, would recommend."
    assert:
      - type: length
        max: 20
```

Run the suite:

```bash
promptcheck run tests/
```

## Assertion Types

| Type | Description | Config |
|---|---|---|
| `contains` | Output contains substring | `value`, `case_insensitive` |
| `not_contains` | Output does not contain substring | `value` |
| `equals` | Output exactly matches | `value`, `strip` |
| `regex` | Output matches pattern | `value` |
| `length` | Output length within bounds | `min`, `max` |
| `json_schema` | Output validates against schema | `schema` |
| `json_path` | JSON path returns expected value | `path`, `value` |
| `semantic` | Semantic similarity above threshold | `value`, `threshold` |
| `llm_judge` | LLM evaluates output quality | `criteria`, `model` |
| `latency` | Response time under limit | `max_ms` |
| `cost` | API cost under limit | `max_cost` |
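Multiple assertions can be attached to one test case, and all of them must pass. A hypothetical test case combining structural, content, and performance checks (the config keys come from the table above; the field values are purely illustrative):

```yaml
- name: Returns valid JSON quickly
  input: "Summarize: the product arrived late but works well."
  assert:
    - type: json_schema
      schema:
        type: object
        required: [sentiment]
    - type: json_path
      path: "$.sentiment"
      value: "neutral"
    - type: latency
      max_ms: 2000
```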
## Models

Configure which LLM to test against using the `model` field:

```yaml
model: openai:gpt-4o-mini                  # OpenAI
model: anthropic:claude-sonnet-4-20250514  # Anthropic
model: ollama:llama3                       # Ollama (local)
```

Set API keys as environment variables:

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```

## CLI Usage

```bash
# Run all tests in a directory
promptcheck run tests/

# Run with a different model
promptcheck run tests/ --model openai:gpt-4o

# Filter by tags
promptcheck run tests/ --tag sentiment

# Output as JSON
promptcheck run tests/ --output json

# Generate HTML report
promptcheck run tests/ --output html

# Initialize a new test file
promptcheck init

# Show version
promptcheck version
```

## Project Structure

```text
promptcheck/
  cli.py                 # Typer CLI commands
  config.py              # Configuration loading
  core/
    loader.py            # YAML test file parser
    executor.py          # Test case execution engine
    runner.py            # Test suite orchestrator
    result.py            # Result data structures
  assertions/
    registry.py          # Assertion type registry
    string.py            # contains, equals, regex, length
    json_assertions.py   # json_schema, json_path
    semantic.py          # Semantic similarity
    llm_judge.py         # LLM-as-judge evaluation
    performance.py       # latency, cost
  providers/
    registry.py          # Provider registry
    openai.py            # OpenAI provider
    anthropic.py         # Anthropic provider
    ollama.py            # Ollama provider
  reporters/
    terminal.py          # Rich terminal output
    json_reporter.py     # JSON file output
    html.py              # HTML report generation
```
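Given the `assertions/registry.py` layout, the assertion registry is presumably a mapping from assertion type names (as written in the YAML `type` field) to checker functions. A hypothetical sketch of that pattern — names and signatures are illustrative, not PromptCheck's actual API:

```python
from typing import Callable

# Hypothetical registry sketch; PromptCheck's real code may differ.
ASSERTIONS: dict[str, Callable[..., bool]] = {}

def register(name: str) -> Callable:
    """Decorator that files a checker under its assertion type name."""
    def decorator(fn: Callable) -> Callable:
        ASSERTIONS[name] = fn
        return fn
    return decorator

@register("contains")
def check_contains(output: str, value: str, case_insensitive: bool = False) -> bool:
    if case_insensitive:
        return value.lower() in output.lower()
    return value in output

@register("length")
def check_length(output: str, min: int = 0, max: int = 10**9) -> bool:
    return min <= len(output) <= max

# Dispatch an assertion from a parsed YAML config dict.
cfg = {"type": "contains", "value": "positive"}
passed = ASSERTIONS[cfg["type"]]("The sentiment is positive.", value=cfg["value"])
```

A registry like this keeps the executor decoupled from individual assertion types: adding a new assertion is just another `@register(...)` function, with no changes to the dispatch code.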
## License

MIT
