Provider prompt caches match on an exact prefix: the cached span must be identical, byte for byte, to the start of the new request. The instant anything changes near the start of the prompt, the cache for everything after that point is unusable. The classic mistake is interpolating a timestamp or session ID into the system prompt.
Before (cache never hits):
SYSTEM: You are a support agent. Current time: 2026-06-13T14:22:09Z.
<8,000 tokens of policies, FAQs, examples>
Every request has a new timestamp at the very top, so the 8,000-token block below it re-processes at full price every single call.
After (cache hits):
SYSTEM: You are a support agent.
<8,000 tokens of policies, FAQs, examples> <-- cache_control breakpoint here
USER: Current time is 2026-06-13T14:22:09Z. <question>
Move volatile content (timestamps, UUIDs, the user's actual question) below the stable block. The 8,000 stable tokens now form a reusable prefix.
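A minimal sketch of the "after" layout as a request payload, assuming the Anthropic Messages API shape, where `system` is a list of text blocks and the last stable block carries `cache_control`. The helper name `build_request`, the placeholder policy text, and the `MODEL_NAME` placeholder are illustrative and not from the original; no network call is made.

```python
from datetime import datetime, timezone

# Placeholder for the large, unchanging block (policies, FAQs, examples).
STABLE_POLICY_TEXT = "<8,000 tokens of policies, FAQs, examples>"


def build_request(question: str, now: datetime) -> dict:
    """Build a request body with the stable prefix first and volatile data last."""
    return {
        "model": "MODEL_NAME",  # placeholder, substitute a real model ID
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "You are a support agent."},
            {
                "type": "text",
                "text": STABLE_POLICY_TEXT,
                # Breakpoint: everything up to and including this block is cacheable.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [
            # Volatile content (timestamp, question) lives below the breakpoint.
            {
                "role": "user",
                "content": f"Current time is {now.isoformat()}. {question}",
            },
        ],
    }


if __name__ == "__main__":
    req = build_request("Where is my order?", datetime.now(timezone.utc))
    print(req["messages"][0]["content"])
```

Because the timestamp only appears in the user message, two requests built at different times share an identical `system` list, which is what allows the prefix to be reused.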
Why it saves tokens: On Anthropic, cache reads cost roughly 0.1x the base input rate. Reading 8,000 cached tokens instead of re-billing them at full price is about a 90% saving on that span. The catch is that nothing is saved unless the prefix is identical: the cache either matches exactly or misses, and there is no "mostly" matching.
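The arithmetic can be sketched as follows, in units of "full-price input tokens". The 0.1x read multiplier comes from the text above; the 1.25x multiplier for the first (cache-writing) call is an assumption added here for illustration, so check your provider's current pricing. The function names are hypothetical.

```python
READ_MULTIPLIER = 0.1    # cached read cost relative to base input (from the text)
WRITE_MULTIPLIER = 1.25  # ASSUMPTION: cost of the first call that writes the cache


def cost_without_cache(prefix_tokens: int, calls: int) -> float:
    """Every call re-bills the whole prefix at the base rate."""
    return float(prefix_tokens * calls)


def cost_with_cache(prefix_tokens: int, calls: int) -> float:
    """First call writes the cache; every later call reads it."""
    if calls <= 0:
        return 0.0
    write = prefix_tokens * WRITE_MULTIPLIER
    reads = prefix_tokens * READ_MULTIPLIER * (calls - 1)
    return write + reads


if __name__ == "__main__":
    for calls in (1, 2, 10):
        full = cost_without_cache(8000, calls)
        cached = cost_with_cache(8000, calls)
        print(f"{calls} calls: {full:.0f} vs {cached:.0f} token-equivalents")
```

Under these assumptions a single call is slightly more expensive with caching, and the saving appears from the second call onward, which is why a prefix that never repeats gains nothing.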
How to verify: Check usage.cache_read_input_tokens on the response. If it stays 0 across repeated calls, a silent invalidator is in your prefix. Common culprits: datetime.now(), uuid4(), or json.dumps() without sort_keys=True (Python serializes dict keys in insertion order, so the same data assembled in a different order produces different bytes). Diff the raw rendered prompt of two consecutive requests and the offending field jumps out. Fix it once and that block keeps hitting the cache for as long as requests arrive within the cache lifetime (5 minutes by default on Anthropic, refreshed on each hit).
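One way to hunt for a silent invalidator, sketched with the standard library only. `render_stable` and `first_divergence` are hypothetical helper names: the first serializes structured data deterministically, the second reports where two rendered prompts stop matching.

```python
import json
from typing import Optional


def render_stable(obj) -> str:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def first_divergence(a: str, b: str) -> Optional[int]:
    """Return the index of the first differing character, or None if identical."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one string is a strict prefix of the other
    return None


if __name__ == "__main__":
    prompt_a = "SYSTEM: You are a support agent. Current time: 2026-06-13T14:22:09Z."
    prompt_b = "SYSTEM: You are a support agent. Current time: 2026-06-13T14:25:00Z."
    idx = first_divergence(prompt_a, prompt_b)
    if idx is not None:
        # Show a little context around the first mismatch.
        print(f"prompts diverge at index {idx}: ...{prompt_a[max(0, idx - 20):idx + 10]!r}")
```

Anything the divergence index lands on above your cache breakpoint is a candidate to move below it or to make deterministic.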