Cache-aware routing: factor prompt caching into cost decisions #1598

SebConejo · 2026-04-17T01:21:11Z

Maintainer

The problem

Several users have asked about this, so writing it up properly.

Right now, Manifest picks the cheapest provider for a given model based on list price per token. But list price tells only part of the story. Providers like DeepSeek, Anthropic, and OpenAI offer prompt caching, where repeated prefixes (system prompts, long context) get cached and subsequent requests cost significantly less.

DeepSeek, for example, gives a large discount on cached input tokens. So a model that looks more expensive on paper might actually be cheaper in practice if the user's requests hit the cache.

Manifest has no visibility into this today. It picks the cheapest option by sticker price and moves on.

Why it matters

Two concrete scenarios where this breaks down:

1. The router picks the wrong provider. Say Model X costs $1/M input tokens on Provider A and $1.20/M on Provider B. Manifest routes to A. But the user has been sending the same long system prompt repeatedly, and Provider B caches it at $0.30/M for cached tokens. Provider B is actually cheaper for this workload, but Manifest can't see that.

2. Users bypass Manifest to get cache benefits. If someone knows their workload benefits from caching on a specific provider, they skip the router and go direct. Same problem as batch: Manifest loses relevance for a chunk of real-world usage.
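To put numbers on scenario 1, here is a rough sketch of the arithmetic. It assumes Provider A offers no cache discount (the example above doesn't say either way) and that 80% of input tokens hit the cache on Provider B; both are illustrative assumptions, and the function names are made up for this post, not anything in Manifest.

```python
def effective_input_price(list_price: float, cached_price: float, hit_ratio: float) -> float:
    """Blended $/M input tokens when `hit_ratio` of input tokens are served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached_price + (1.0 - hit_ratio) * list_price


def break_even_hit_ratio(cheap_list: float, pricey_list: float, pricey_cached: float) -> float:
    """Hit ratio above which the pricier-but-caching provider becomes cheaper.

    Assumes the cheaper provider has no cache discount at all.
    """
    return (pricey_list - cheap_list) / (pricey_list - pricey_cached)


# Scenario 1 prices, in $/M input tokens.
HIT_RATIO = 0.8  # assumed share of cached input tokens, for illustration only

provider_a = effective_input_price(1.00, 1.00, HIT_RATIO)  # no discount assumed: stays at ~$1.00/M
provider_b = effective_input_price(1.20, 0.30, HIT_RATIO)  # ~$0.48/M
threshold = break_even_hit_ratio(1.00, 1.20, 0.30)         # ~0.22

print(f"A: ${provider_a:.2f}/M  B: ${provider_b:.2f}/M  break-even hit ratio: {threshold:.2f}")
```

Under those assumptions, Provider B wins as soon as a bit more than a fifth of the input tokens are cache hits, which a long, stable system prompt clears easily.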

As one user put it: with BYOK, their requests were getting routed to another paid model, breaking the cache they'd built up on a different provider.

What this could look like

A few options, from simple to involved:

1. Expose cache pricing in the routing model. If Manifest knows that Provider B charges less for cached tokens, and the user signals (or Manifest detects) that their prefix is cacheable, factor that into the cost comparison.

2. Let users hint at cacheability. A header or parameter like x-cache-hint: true that tells the router "my system prompt is stable, prefer providers with caching."

3. Provider affinity for cache continuity. Once a user's requests start hitting a cache on a given provider, keep routing there instead of bouncing between providers and breaking the cache.

None of these are trivial, but even the simplest version (just including cache pricing in the cost model) would be a real improvement.
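For discussion, here is a minimal sketch of how the three ideas could fit together. Everything in it is hypothetical: `ProviderPrice`, `pick_provider`, the assumed 80% hit ratio, and the 10% switching margin are placeholders, not Manifest's actual routing code or any provider's API. The `cache_hint` argument stands in for something like the x-cache-hint header.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProviderPrice:
    name: str
    input_price: float                          # $/M uncached input tokens
    cached_input_price: Optional[float] = None  # $/M cached tokens; None means no caching


def blended_price(p: ProviderPrice, hit_ratio: float) -> float:
    """Expected $/M input tokens for a given cache hit ratio."""
    cached = p.cached_input_price if p.cached_input_price is not None else p.input_price
    return hit_ratio * cached + (1.0 - hit_ratio) * p.input_price


def pick_provider(
    providers: Sequence[ProviderPrice],
    cache_hint: bool = False,
    assumed_hit_ratio: float = 0.8,   # assumption: how cacheable a hinted workload is
    last_provider: Optional[str] = None,
    switch_margin: float = 0.10,      # assumption: only switch for a >10% saving
) -> str:
    # Without a hint, fall back to today's behavior: compare list prices only.
    hit_ratio = assumed_hit_ratio if cache_hint else 0.0
    costs = {p.name: blended_price(p, hit_ratio) for p in providers}
    best = min(costs, key=costs.get)

    # Provider affinity: stay put unless switching is clearly cheaper,
    # since moving throws away the cache built up on the current provider.
    if cache_hint and last_provider in costs:
        if costs[best] >= costs[last_provider] * (1.0 - switch_margin):
            return last_provider
    return best


providers = [
    ProviderPrice("A", 1.00),
    ProviderPrice("B", 1.20, cached_input_price=0.30),
    ProviderPrice("C", 1.15, cached_input_price=0.28),
]

print(pick_provider(providers))                                       # list price only
print(pick_provider(providers, cache_hint=True))                      # cache-aware
print(pick_provider(providers, cache_hint=True, last_provider="B"))   # with affinity
```

The margin is a crude stand-in for the real cost of a cold cache on the new provider. A proper version would probably track observed hit rates per user and provider rather than assume a fixed ratio.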

👍 if this would help your workflow. Curious how others are handling this today.

Replies: 0 comments
