The library

128 ways to spend fewer tokens

The 22 Beginner tips are free to read. The 106 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.

All ⚙️ Batching & Automation (14) 💻 Coding Assistants (16) 🧠 Context Management (34) 📊 Measurement & Budgeting (14) 🎚️ Model Selection (8) 📐 Output Control (10) ♻️ Prompt Caching & Reuse (17) ✍️ Prompt Engineering (7) 🔎 Retrieval & RAG (8)

♻️Prompt Caching & Reuse up to 90% on the cached portion

Freeze the Prefix: One Stray Timestamp Kills Your Whole Cache

Prompt caching is a prefix match. A single dynamic byte near the top of your prompt silently invalidates everything after it, so you pay full price every call without realizing it.

Beginner 1 min Read →

♻️Prompt Caching & Reuse Cache reads run roughly 0.1x of base input price; the more users hit the same prefix, the closer your shared instructions get to free

Share One Cached System Prompt Across All Your Users

A single per-user byte (name, ID, locale) in the system prompt forks the cache into one entry per user. Strip personalization out of the prefix so every user reads the same cached block.

Beginner 1 min Read →

♻️Prompt Caching & Reuse 🔒 Pro

Cache Your Tool Definitions, Not Just the System Prompt

Tool schemas render before the system prompt, so a non-deterministic tool list silently blocks the cache for everything after it. Sort and freeze the tool array to make tools cacheable.

Intermediate 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Carry Gemini Conversation State Server-Side Instead of Resending the Transcript

A multi-turn Gemini agent that rebuilds its contents list re-sends the whole transcript at full input price every turn. The Interactions API holds history server-side; pass previous_interaction_id, send only the new turn, and let the reused prefix hit implicit caching.

Intermediate 2 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Know Your Minimum: Short Prompts Silently Refuse to Cache

Below a model-specific token floor, a cache_control marker does nothing — no error, just a full-price bill. Know the floor before you rely on caching.

Intermediate 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Order Your Prompt by Volatility: Tools, then System, then the Question

The model renders tools, then system, then messages. Put your most stable content first and your most volatile content last, or your breakpoints cache nothing reusable.

Intermediate 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Compact When the Cache Is Already Cold, Not When the Transcript Gets Big

Most harnesses compact when the transcript crosses a token count, which destroys a warm prefix you already paid to write. Compact instead at the moment a message arrives after the cache TTL has expired — you have to rewrite the whole prefix at write rates anyway, so the forced write simply gets smaller.

Advanced 5 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Content-Hash Your Wire Blocks to Find the Exact One That Breaks the Cache

When your cache silently stops hitting, the usage block proves it broke but not WHERE. Hash every block you put on the wire and diff turn-to-turn: the single MODIFIED block above your breakpoint is the byte that cost you the cache.

Advanced 3 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

On GPT-5.6, Caching Is No Longer Free: Place the Breakpoint Yourself and Name the Key

GPT-5.6 charges 1.25x to write a cache entry — the first generation where caching a prefix can cost more than not caching it. Explicit breakpoints keep per-request text out of the write, and prompt_cache_key keeps matching reliable.

Advanced 4 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Steer a running agent with a mid-conversation system message, not a rewritten system prompt

On Claude Opus 4.8 you can append a role:"system" message to the messages array to change instructions, permissions, or budgets mid-run. It sits after the cached history, so the whole prefix still hits the cache instead of being re-billed at full input price.

Advanced 3 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Rewrite Volatile Prefix Fields to Placeholders So Cold Starts Hit the Cache

A fresh timestamp, request ID, or path at the top of your prefix is never byte-identical twice, so every new session misses the cache. Rewrite those volatile fields to stable placeholders and restore them out of band, turning cold pre-fills into cache hits.

Advanced 3 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Match TTL to Traffic, and Pre-Warm Before the First User Hits

The default 5-minute cache evaporates between bursts. Choose 5-minute vs 1-hour TTL by your traffic gaps, and pre-warm at startup to kill first-request latency.

Advanced 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Read the Usage Block to Prove Your Cache Actually Hits

A cache_control marker that silently never hits looks identical to one that works — until you read the three usage token fields and compute your real hit rate.

Advanced 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Stop Paying Twice for Your Cache When a Refusal Forces a Model Switch

Prompt caches are per-model, so retrying a refused request on a fallback model normally re-writes the whole cached prefix at the write rate. Anthropic's fallback credit re-bills it as a cache read instead.

Advanced 2 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Reanchor the Cache in Long Agent Turns to Beat the 20-Block Lookback

In long agentic runs, accumulating tool blocks push your cache breakpoint out of Anthropic's 20-block lookback window, so caching silently stops firing. Add a second, trailing breakpoint to keep it alive.

Advanced 2 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Add a Semantic Cache to Skip the Call on Near-Duplicate Queries

Prompt caching only discounts the repeated prefix of an exact request — you still pay for the call. A semantic cache embeds each incoming query and returns a stored answer when a past query is close enough, eliminating the LLM call entirely for paraphrased duplicates.

Advanced 2 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Swap an Agent's Toolset Mid-Conversation Without Busting the Cache

A new Anthropic beta lets an agent add or remove tools between turns without invalidating the cached prefix, so phase-switching agents stop paying a full cold read every time the toolset changes.

Advanced 2 min Unlock →

Like what you see?

Get a fresh one in your inbox — weekly free, daily on Pro.

Subscribe free Go Pro — $9/mo