Freeze the Prefix: One Stray Timestamp Kills Your Whole Cache

Up to 90% on the cached portion | Beginner | 1 min read

Every provider's prompt cache keys on an exact byte-for-byte prefix match. The instant a single byte changes, everything from that point onward misses the cache, so a change near the start of the prompt throws away nearly all of it. The classic mistake is interpolating a timestamp or session ID into the system prompt.

Before (cache never hits):

SYSTEM: You are a support agent. Current time: 2026-06-13T14:22:09Z.
<8,000 tokens of policies, FAQs, examples>

Every request has a new timestamp at the very top, so the 8,000-token block below it re-processes at full price every single call.

After (cache hits):

SYSTEM: You are a support agent.
<8,000 tokens of policies, FAQs, examples>   <-- cache_control breakpoint here
USER: Current time is 2026-06-13T14:22:09Z. <question>

Move volatile content (timestamps, UUIDs, the user's actual question) below the stable block. The 8,000 stable tokens now form a reusable prefix.
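A minimal sketch of the "after" layout as a request payload, assuming the Anthropic Messages API shape (a list of system blocks, with cache_control on the last stable block). The STABLE_POLICIES text, the build_request helper, and the model placeholder are hypothetical names used only for illustration; nothing is sent over the network.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the 8,000 tokens of policies, FAQs, and examples.
STABLE_POLICIES = "Refund policy: ...\nShipping FAQ: ...\n"


def build_request(question: str, now: datetime) -> dict:
    """Build a Messages-style payload with volatile data kept out of the prefix."""
    return {
        "model": "your-model-id",  # placeholder, not a real model ID
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "You are a support agent."},
            {
                "type": "text",
                "text": STABLE_POLICIES,
                # Breakpoint: the prefix up to and including this block is cacheable.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [
            {
                "role": "user",
                # Timestamp and question sit below the breakpoint.
                "content": f"Current time is {now.isoformat()}. {question}",
            }
        ],
    }


first = build_request(
    "Where is my order?", datetime(2026, 6, 13, 14, 22, 9, tzinfo=timezone.utc)
)
second = build_request(
    "Can I get a refund?", datetime(2026, 6, 13, 14, 25, 0, tzinfo=timezone.utc)
)

# The cacheable prefix is identical even though time and question differ.
print(first["system"] == second["system"])  # True
```

Passing the timestamp in as an argument, rather than calling datetime.now() inside the prompt template, also makes it easy to check in a unit test that the prefix never changes.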

Why it saves tokens: On Anthropic, cache reads cost roughly 0.1x the base input rate. Reading 8,000 cached tokens instead of re-billing them at full price is about a 90% saving on that span. The first request that writes the cache is billed at a premium over the base rate, so the saving shows up from the second call onward. The catch is that nothing is saved unless the prefix is identical: the cache doesn't "mostly" match.
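The arithmetic can be sketched as follows. The 0.1x read multiplier comes from the text above; the 1.25x write multiplier (Anthropic's default 5-minute cache) and the base price of 3.00 per million input tokens are assumptions for illustration, so substitute your provider's current rates.

```python
def prefix_cost(
    tokens: int,
    calls: int,
    base_price_per_mtok: float,
    read_mult: float = 0.1,    # cache-read multiplier (from the text above)
    write_mult: float = 1.25,  # cache-write multiplier (assumed; check your provider)
) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) for the stable prefix over `calls` requests."""
    per_call = tokens / 1_000_000 * base_price_per_mtok
    uncached = per_call * calls
    # First call writes the cache; every later call reads it.
    cached = per_call * (write_mult + read_mult * (calls - 1))
    return uncached, cached


# Hypothetical workload: 8,000-token prefix, 100 calls, base price 3.00 per million tokens.
uncached, cached = prefix_cost(tokens=8_000, calls=100, base_price_per_mtok=3.00)
saving = 1 - cached / uncached
print(f"uncached={uncached:.4f} cached={cached:.4f} saving={saving:.1%}")
```

Because of the one-time write premium, the overall saving approaches 90% as the number of calls grows, rather than reaching it on every request.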

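How to verify: On Anthropic, check usage.cache_read_input_tokens on the response (other providers report cached tokens under their own usage fields). If it stays 0 across repeated calls, a silent invalidator is in your prefix. Common culprits: datetime.now(), uuid4(), or json.dumps() over data whose key order is not fixed. Python dicts keep insertion order, so two code paths that build the same dict in a different order serialize to different bytes; sort_keys=True removes that variance. Diff the raw rendered prompt of two consecutive requests and the offending field jumps out. Fix it once and that block keeps hitting the cache for as long as requests arrive within the cache lifetime (5 minutes by default on Anthropic, refreshed on each hit).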
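One way to run that diff, as a minimal sketch: first_divergence and stable_json are hypothetical helper names, and the two prompts are toy strings standing in for the raw rendered requests you would capture from your own logging.

```python
import json


def first_divergence(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the shared part: identical, or one is a prefix of the other.
    return -1 if len(a) == len(b) else min(len(a), len(b))


def stable_json(obj) -> str:
    """Serialize with a fixed key order and separators so the bytes are reproducible."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


# Toy rendered prompts from two consecutive requests.
prompt_a = "You are a support agent. Current time: 2026-06-13T14:22:09Z.\n<policies>"
prompt_b = "You are a support agent. Current time: 2026-06-13T14:22:41Z.\n<policies>"

idx = first_divergence(prompt_a, prompt_b)
if idx >= 0:
    # Show some context around the first mismatch to identify the field.
    start = max(0, idx - 30)
    print(f"diverges at char {idx}: {prompt_a[start:idx + 10]!r}")

# Same data, different insertion order, identical output once keys are sorted.
print(stable_json({"b": 1, "a": 2}) == stable_json({"a": 2, "b": 1}))  # True
```

Everything before the reported index is still a valid shared prefix, so the position of the first mismatch also tells you how much of the prompt could be cached as it stands.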

Applies to: Anthropic API (Claude), OpenAI API, Gemini API

