Cache-aware routing: factor prompt caching into cost decisions #1598
SebConejo started this conversation in Feature request
The problem
Several users have asked about this, so writing it up properly.
Right now, Manifest picks the cheapest provider for a given model based on list price per token. But list price tells only part of the story. Providers like DeepSeek, Anthropic, and OpenAI offer prompt caching: repeated prefixes (system prompts, long context) get cached, and later requests that reuse them are billed at a much lower rate for the cached tokens.
DeepSeek, for example, gives a large discount on cached input tokens. So a model that looks more expensive on paper might actually be cheaper in practice if the user's requests hit the cache.
Manifest has no visibility into this today. It picks the cheapest option by sticker price and moves on.
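One way to make "cheaper in practice" concrete, as a simplified model: let $p_{\text{in}}$ be a provider's list price for uncached input tokens, $p_{\text{cache}}$ its price for cached input tokens, and $h \in [0, 1]$ the fraction of a request's input tokens served from cache. The blended input price is then

$$
p_{\text{eff}}(h) = (1 - h)\,p_{\text{in}} + h\,p_{\text{cache}}
$$

Routing on list price alone amounts to assuming $h = 0$ for every provider. This sketch ignores output tokens and any extra charge a provider may apply when first writing tokens to the cache.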
Why it matters
Two concrete scenarios where this breaks down:
1. The router picks the wrong provider. Say Model X costs $1/M input tokens on Provider A and $1.20/M on Provider B. Manifest routes to A. But the user has been sending the same long system prompt repeatedly, and Provider B caches it, billing cached tokens at $0.30/M. Once enough of that prompt is served from cache, Provider B is actually cheaper for this workload, but Manifest can't see that.
2. Users bypass Manifest to get cache benefits. If someone knows their workload benefits from caching on a specific provider, they skip the router and go direct. Same problem as batch: Manifest loses relevance for a chunk of real-world usage.
As one user put it: with BYOK, their requests were getting routed to another paid model, breaking the cache they'd built up on a different provider.
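Working through scenario 1's numbers shows where the crossover sits. Assume, as the scenario implies, that Provider A offers no cached-token discount, and let $h$ be the fraction of input tokens served from Provider B's cache (prices in dollars per million input tokens). Provider B is cheaper when

$$
1.20\,(1-h) + 0.30\,h < 1.00
\iff 1.20 - 0.90\,h < 1.00
\iff h > \frac{0.20}{0.90} = \frac{2}{9} \approx 0.22
$$

So if more than roughly 22% of the input tokens are cache hits, routing to Provider A costs more. At a 50% hit rate, for example, Provider B's blended price is $0.5 \times 1.20 + 0.5 \times 0.30 = 0.75$, a quarter below Provider A. This only compares input pricing; any cache-write surcharge and output tokens are left out.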
What this could look like
A few options, from simple to involved:
x-cache-hint: true that tells the router "my system prompt is stable, prefer providers with caching."
None of these are trivial, but even the simplest version (just including cache pricing in the cost model) would be a real improvement.
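A minimal Python sketch of the simplest idea (cache pricing in the cost model), combined with the proposed x-cache-hint header, might look like the following. Everything here is hypothetical and not Manifest's actual code or API: the `ProviderQuote`, `cache_hint_enabled`, and `pick_provider` names, and especially `ASSUMED_HIT_RATE_WITH_HINT`, a made-up fixed cache hit rate the router assumes when the hint is present. Prices reuse scenario 1's illustrative numbers.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

# Hypothetical tuning knob: the share of input tokens we *assume* will hit the
# cache when a client sends x-cache-hint: true. Not an existing Manifest setting.
ASSUMED_HIT_RATE_WITH_HINT = 0.8


@dataclass(frozen=True)
class ProviderQuote:
    """Hypothetical per-provider pricing for one model, in $ per 1M input tokens."""
    name: str
    input_price: float                   # uncached input tokens
    cached_input_price: Optional[float]  # cached input tokens; None = no caching

    def effective_input_price(self, hit_rate: float) -> float:
        """Blended input price when `hit_rate` of input tokens come from cache."""
        if self.cached_input_price is None:
            return self.input_price
        return (1 - hit_rate) * self.input_price + hit_rate * self.cached_input_price


def cache_hint_enabled(headers: Mapping[str, str]) -> bool:
    """True if the (proposed) x-cache-hint header is set to "true"."""
    for key, value in headers.items():
        if key.lower() == "x-cache-hint":  # HTTP header names are case-insensitive
            return value.strip().lower() == "true"
    return False


def pick_provider(quotes: Sequence[ProviderQuote],
                  headers: Mapping[str, str]) -> ProviderQuote:
    """Cheapest provider by effective input price.

    Without the hint the assumed hit rate is 0, so this reduces to a plain
    comparison of input list prices.
    """
    hit_rate = ASSUMED_HIT_RATE_WITH_HINT if cache_hint_enabled(headers) else 0.0
    return min(quotes, key=lambda q: q.effective_input_price(hit_rate))


if __name__ == "__main__":
    quotes = [
        ProviderQuote("Provider A", input_price=1.00, cached_input_price=None),
        ProviderQuote("Provider B", input_price=1.20, cached_input_price=0.30),
    ]
    print(pick_provider(quotes, {}).name)                        # Provider A
    print(pick_provider(quotes, {"X-Cache-Hint": "true"}).name)  # Provider B
```

Without the hint, Provider A wins on list price (1.00 vs 1.20); with the hint, Provider B's blended price drops to about 0.48 and it wins instead. A single fixed hit rate is the crudest possible estimate; a router could instead learn it per workload from the cached-token counts that providers such as OpenAI and Anthropic report in their usage data.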
👍 if this would help your workflow. Curious how others are handling this today.