TokenWise

TokenWise is a token usage optimization toolkit: count tokens, compress prompts, estimate API costs, and track LLM token budgets. It covers GPT-4, Claude, Llama, Gemini, Mistral, and more.

Why TokenWise

Token cost and context-window limits show up everywhere in modern AI systems, but most teams still handle them with scattered scripts, rough estimates, and provider-specific logic.

TokenWise is designed to make those concerns easier to manage in one place:

estimate token usage before a request goes out
compare cost across model families
compress prompts when budgets are tight
track spend over time instead of treating cost as an afterthought

What It Covers

token counting heuristics across major model families
prompt optimization and budget-aware trimming
cost estimation for input and output tokens
usage tracking with alerts and reporting
batch prompt cleanup workflows
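
To make the first item concrete, here is a minimal sketch of a character-based counting heuristic. The function names and the ratio of four characters per token are assumptions for illustration, not TokenWise's actual heuristic; real tokenizers vary by model family and by language.

```python
import math


def approx_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from character length (assumed ratio, not a real tokenizer)."""
    if not text:
        return 0
    # Round up so short non-empty strings never count as zero tokens.
    return max(1, math.ceil(len(text) / chars_per_token))


def fits_budget(text: str, max_tokens: int) -> bool:
    """Check an estimated count against a token budget."""
    return approx_token_count(text) <= max_tokens


prompt = "Hello, how can I help you today?"
print(approx_token_count(prompt), fits_budget(prompt, max_tokens=16))
```

An estimate like this is cheap enough to run before every request, which is the point of a heuristic counter; exact counts still require the provider's tokenizer.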

Architecture

graph LR
    A[Your Code] --> B[TokenWise]
    B --> C[TokenCounter]
    B --> D[TokenOptimizer]
    B --> E[CostEstimator]
    B --> F[UsageTracker]
    B --> G[BatchOptimizer]
    C --> C1[count]
    C --> C2[count_messages]
    C --> C3[fits_context]
    D --> D1[optimize]
    D --> D2[optimize_to_budget]
    D --> D3[savings_report]
    E --> E1[estimate]
    E --> E2[compare_models]
    F --> F1[track]
    F --> F2[check_budget]
    F --> F3[get_report]
    G --> G1[optimize_batch]
    G --> G2[deduplicate_prompts]
    B --> H[Config]
    H --> H1[Model Pricing]
    H --> H2[Budget Settings]

Quickstart

Installation

pip install tokenwise

Or install from source:

git clone https://github.com/MukundaKatta/TokenWise.git
cd TokenWise
pip install -e .

Basic Usage

from tokenwise import TokenCounter, TokenOptimizer, CostEstimator, UsageTracker

# Count tokens
counter = TokenCounter()
tokens = counter.count("Hello, how can I help you today?", model="gpt-4")
print(f"Token count: {tokens}")

# Estimate cost
estimator = CostEstimator()
cost = estimator.estimate(tokens, model="gpt-4")
print(f"Estimated cost: ${cost:.6f}")

# Optimize a prompt
optimizer = TokenOptimizer()
report = optimizer.savings_report(
    "Please kindly just basically explain what AI actually is in my opinion."
)
print(f"Saved {report['tokens_saved']} tokens ({report['savings_pct']}%)")

# Track usage with budget alerts
tracker = UsageTracker()
tracker.track(
    request="Explain quantum computing in simple terms.",
    response="Quantum computing uses qubits instead of classical bits..."
)
print(f"Total spend: ${tracker.total_cost():.6f}")

Multi-step Budget Breakdown

from tokenwise import BudgetTracker

tracker = BudgetTracker()
tracker.add_step("draft", request="Write a landing page headline", response="Fast AI workflows for teams.")
tracker.add_step("review", request="Critique the headline", response="Shorten the second clause.")

report = tracker.get_report(warning_threshold_usd=0.01)
print(report.total_cost)
print(report.pricing_version)
for step in report.steps:
    print(step.name, step.total_tokens, step.total_cost)
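
For intuition about what a per-step report aggregates, the following standalone sketch sums step costs and checks them against a warning threshold. `StepUsage` and `summarize_steps` are hypothetical names, and the token and cost figures are made up for illustration; this is not how `BudgetTracker` is implemented.

```python
from dataclasses import dataclass


@dataclass
class StepUsage:
    name: str
    total_tokens: int
    total_cost: float  # USD, assumed to be computed elsewhere for this step


def summarize_steps(steps, warning_threshold_usd):
    """Aggregate per-step usage and flag totals at or above the threshold."""
    total_cost = sum(step.total_cost for step in steps)
    costliest = max(steps, key=lambda s: s.total_cost).name if steps else None
    return {
        "total_tokens": sum(step.total_tokens for step in steps),
        "total_cost": total_cost,
        "costliest_step": costliest,
        "over_threshold": total_cost >= warning_threshold_usd,
    }


# Illustrative numbers only.
steps = [StepUsage("draft", 120, 0.004), StepUsage("review", 300, 0.009)]
summary = summarize_steps(steps, warning_threshold_usd=0.01)
print(summary["costliest_step"], summary["over_threshold"])
```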

CLI

# Count tokens
tokenwise count "Hello, how can I help you today?"

# Estimate cost
tokenwise cost "Hello, how can I help you today?" --model gpt-4

# Compare costs across all models
tokenwise cost "Hello, how can I help you today?" --compare

# Optimize a prompt
tokenwise optimize "Please kindly just basically explain what AI is."

Batch Optimization

from tokenwise import BatchOptimizer

batch = BatchOptimizer()
prompts = [
    "Please kindly explain AI.",
    "In order to understand, basically describe ML.",
    "Please kindly explain AI.",  # duplicate
]

# Deduplicate
unique = batch.deduplicate_prompts(prompts)

# Optimize and get summary
summary = batch.batch_summary(unique)
print(f"Saved {summary['total_tokens_saved']} tokens across {summary['prompt_count']} prompts")
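
Deduplication itself is simple to reason about. The sketch below keeps the first occurrence of each prompt and treats prompts as equal when they differ only in case or whitespace; that normalization rule and the function name are assumptions, and `deduplicate_prompts` may compare prompts differently.

```python
def dedupe_preserving_order(prompts):
    """Drop repeated prompts, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for prompt in prompts:
        # Assumed normalization: collapse whitespace and ignore case.
        key = " ".join(prompt.split()).casefold()
        if key not in seen:
            seen.add(key)
            unique.append(prompt)
    return unique


prompts = [
    "Please kindly explain AI.",
    "In order to understand, basically describe ML.",
    "Please kindly explain AI.",
]
print(len(dedupe_preserving_order(prompts)))
```

Deduplicating before optimizing means each distinct prompt is processed once, which matters when a batch contains many repeats.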

Pricing Data

Model pricing now lives in a versioned package data file at src/tokenwise/data/model_pricing.v1.json.

That gives TokenWise a safer update workflow:

pricing changes are separated from estimator logic
the catalog carries an explicit version
historical reports can point back to the pricing version used at the time

To update pricing, edit the JSON catalog, keep the schema consistent, and run the test suite before publishing.
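
A small validation step can catch schema drift before the tests run. The sketch below assumes a hypothetical catalog layout with a top-level `version` and a `models` mapping whose entries carry `input_per_1k` and `output_per_1k`; the real model_pricing.v1.json schema may differ, so treat the key names as placeholders.

```python
import json

# Hypothetical schema: the real model_pricing.v1.json layout may differ.
REQUIRED_RATE_KEYS = ("input_per_1k", "output_per_1k")


def load_catalog(raw: str) -> dict:
    """Parse a pricing catalog and reject entries that break the assumed schema."""
    catalog = json.loads(raw)
    if not isinstance(catalog, dict):
        raise ValueError("catalog must be a JSON object")
    for top_key in ("version", "models"):
        if top_key not in catalog:
            raise ValueError(f"catalog is missing '{top_key}'")
    for model, rates in catalog["models"].items():
        for key in REQUIRED_RATE_KEYS:
            value = rates.get(key)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
                raise ValueError(f"{model}: '{key}' must be a non-negative number")
    return catalog


SAMPLE = '{"version": "v1", "models": {"gpt-4": {"input_per_1k": 0.03, "output_per_1k": 0.06}}}'
catalog = load_catalog(SAMPLE)
print(catalog["version"], sorted(catalog["models"]))
```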

Pricing Table

Model	Input (per 1K tokens)	Output (per 1K tokens)
GPT-4	$0.0300	$0.0600
GPT-4 Turbo	$0.0100	$0.0300
GPT-4o	$0.0050	$0.0150
GPT-3.5 Turbo	$0.0005	$0.0015
Claude 3 Opus	$0.0150	$0.0750
Claude 3.5 Sonnet	$0.0030	$0.0150
Claude 3 Haiku	$0.00025	$0.00125
Claude 4 Opus	$0.0150	$0.0750
Claude 4 Sonnet	$0.0030	$0.0150
Gemini 1.5 Pro	$0.00125	$0.0050
Gemini 1.5 Flash	$0.000075	$0.0003
Llama 3 70B	$0.00059	$0.00079
Llama 3 8B	$0.00005	$0.00008
Mistral Large	$0.0040	$0.0120
Mistral Small	$0.0010	$0.0030
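
Because rates are quoted per 1K tokens, a request's cost is (input tokens / 1000) × input rate + (output tokens / 1000) × output rate. The sketch below applies that formula to three rows of the table and ranks them; the dictionary keys and function names are illustrative and are not TokenWise identifiers.

```python
# USD per 1K tokens as (input, output), copied from three rows of the table.
PRICING = {
    "gpt-4": (0.03, 0.06),
    "gpt-4o": (0.005, 0.015),
    "claude-3-haiku": (0.00025, 0.00125),
}


def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Cost in USD for one request at per-1K-token rates."""
    input_rate, output_rate = PRICING[model]
    return input_tokens / 1000 * input_rate + output_tokens / 1000 * output_rate


def cheapest_first(input_tokens: int, output_tokens: int) -> list:
    """Model keys ordered from lowest to highest estimated cost."""
    return sorted(PRICING, key=lambda m: estimate_cost(input_tokens, output_tokens, m))


# 1,200 input + 300 output tokens on GPT-4: 1.2 * 0.03 + 0.3 * 0.06 = 0.054
print(f"{estimate_cost(1200, 300, 'gpt-4'):.4f}")
print(cheapest_first(1000, 1000))
```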

Features

Token Counting: Heuristic token estimation for major LLM families
Prompt Optimization: Compress prompts by removing filler words and shortening verbose phrases
Cost Estimation: Per-1K-token pricing from the versioned catalog for GPT-4, Claude, Gemini, Llama, Mistral, and more
Usage Tracking: Track token usage with daily/monthly budgets and threshold alerts
Batch Optimization: Optimize and deduplicate lists of prompts in bulk
CLI: Built-in command-line interface powered by Typer and Rich
Model Comparison: Compare token counts and costs across models side by side
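
As a rough illustration of the prompt optimization feature, the sketch below strips a handful of filler words and shortens one verbose phrase with regular expressions. The word list, the phrase map, and the function name are assumptions chosen to match the README's example prompts; TokenWise's own rules are likely broader.

```python
import re

# Assumed rules for illustration; not TokenWise's actual lists.
FILLERS = ("please", "kindly", "just", "basically", "actually")
PHRASES = {"in order to": "to"}


def compress_prompt(prompt: str) -> str:
    """Remove filler words and shorten verbose phrases, then tidy whitespace."""
    text = prompt
    for long_form, short_form in PHRASES.items():
        text = re.sub(rf"\b{re.escape(long_form)}\b", short_form, text, flags=re.IGNORECASE)
    # Match each filler as a whole word, plus an optional comma and trailing space.
    filler_pattern = r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?\s*"
    text = re.sub(filler_pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    # Restore sentence-style capitalization after leading words were removed.
    return text[:1].upper() + text[1:]


print(compress_prompt("Please kindly just basically explain what AI actually is."))
```

Word-level removal like this can change meaning in edge cases ("just" as "fair", for example), so savings should be reviewed before they are applied to production prompts.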

Who This Is For

developers building AI products with real token budgets
teams comparing providers and model cost tradeoffs
prompt engineers trying to reduce waste without losing clarity
anyone who wants lightweight token tooling without a larger framework

Configuration

Set defaults via environment variables or a .env file:

TOKENWISE_DEFAULT_MODEL=gpt-4
TOKENWISE_LOG_LEVEL=INFO
TOKENWISE_COST_MULTIPLIER=1.0

See .env.example for all available options.
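
If you need the same settings in your own scripts, they can be read with the standard library. This is a sketch: the `Settings` class and `load_settings` function are hypothetical, and the fallback values simply mirror the example values shown above rather than TokenWise's documented defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    default_model: str
    log_level: str
    cost_multiplier: float


def load_settings(env=None) -> Settings:
    """Read TOKENWISE_* variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        default_model=env.get("TOKENWISE_DEFAULT_MODEL", "gpt-4"),
        log_level=env.get("TOKENWISE_LOG_LEVEL", "INFO").upper(),
        # Environment values are strings, so convert the multiplier explicitly.
        cost_multiplier=float(env.get("TOKENWISE_COST_MULTIPLIER", "1.0")),
    )


settings = load_settings({"TOKENWISE_COST_MULTIPLIER": "1.5"})
print(settings)
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the process environment.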

Development

make dev       # Install with dev dependencies
make test      # Run tests
make lint      # Lint with ruff
make format    # Auto-format code
make run       # Show CLI help

Project Direction

TokenWise is best when it stays practical: easy to script, easy to embed in existing apps, and focused on the real questions developers ask when shipping LLM-powered systems.

Future improvements can build on that foundation with:

better model-pricing refresh workflows
more benchmark-style prompt comparisons
richer reporting and budget policy options
stronger integration patterns for production AI pipelines

License

MIT License. See LICENSE for details.

Inspired by LLM cost optimization trends and the need for better token management

Built by Officethree Technologies | Made with ❤️ and AI
