Prompt caches are per-model, so retrying a refused request on a fallback model normally re-writes the whole cached prefix at the write rate. Anthropic's fallback credit re-bills it as a cache read instead.
Stop Paying Twice for Your Cache When a Refusal Forces a Model Switch
Unlock this tip — and 50 more
This is one of 51 advanced, fact-checked tactics reserved for Pro. Get the full 73-tip library, a searchable archive, and a new tip every morning. Free for 7 days, then $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Prompt Caching & Reuse
Freeze the Prefix: One Stray Timestamp Kills Your Whole Cache
Prompt caching is a prefix match. A single dynamic byte near the top of your prompt silently invalidates everything after it, so you pay full price every call without realizing it.
Share One Cached System Prompt Across All Your Users
A single per-user byte (name, ID, locale) in the system prompt forks the cache into one entry per user. Strip personalization out of the prefix so every user reads the same cached block.
Cache Your Tool Definitions, Not Just the System Prompt
Tool schemas render before the system prompt, so a non-deterministic tool list silently blocks the cache for everything after it. Sort and freeze the tool array to make tools cacheable.