
Add a Semantic Cache to Skip the Call on Near-Duplicate Queries

~15-40% of total LLM spend at a typical ~40% hit rate, net of embedding cost; varies sharply with query repetitiveness · Advanced · 2 min read

Prompt caching only discounts the repeated prefix of a request, and only when that prefix matches exactly; you still pay for the call. A semantic cache embeds each incoming query and, when a past query is close enough (for example, cosine similarity above a tuned threshold), returns the stored answer instead, eliminating the LLM call entirely for paraphrased duplicates.
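The lookup-then-call flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `toy_embed` is a hypothetical stand-in (a hashed bag of words) for a real embedding model, `SemanticCache`, `answer_query`, and `call_llm` are illustrative names, the 0.9 threshold is a placeholder to tune on your own traffic, and the linear scan would normally be replaced by a vector index.

```python
import hashlib
import math
import re
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def toy_embed(text: str, dims: int = 256) -> Vector:
    """Hypothetical stand-in for a real embedding model:
    a hashed bag of words, L2-normalized."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: Vector, b: Vector) -> float:
    # Inputs are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


class SemanticCache:
    def __init__(self, embed: Callable[[str], Vector] = toy_embed,
                 threshold: float = 0.9) -> None:
        self.embed = embed
        self.threshold = threshold  # assumed value; tune per workload
        self.entries: List[Tuple[Vector, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        """Return the stored answer of the closest past query,
        or None if nothing clears the similarity threshold."""
        q = self.embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:  # linear scan: fine for a sketch
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))


def answer_query(query: str, cache: SemanticCache,
                 call_llm: Callable[[str], str]) -> str:
    cached = cache.lookup(query)
    if cached is not None:
        return cached           # cache hit: no LLM call at all
    answer = call_llm(query)    # cache miss: pay for one call
    cache.store(query, answer)
    return answer
```

Calling `answer_query` with your own `call_llm` function routes every request through the cache first. With the toy embedding, only queries that share nearly all their words will hit; a real embedding model is what lets genuine paraphrases match. A threshold set too low returns wrong answers for merely related questions, so it is worth checking hits against a labeled sample before trusting the hit rate.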


More in Prompt Caching & Reuse

♻️ Prompt Caching & Reuse · Up to 90% on the cached portion

Freeze the Prefix: One Stray Timestamp Kills Your Whole Cache

Prompt caching is a prefix match. A single dynamic byte near the top of your prompt silently invalidates everything after it, so you pay full price every call without realizing it.

Beginner

♻️ Prompt Caching & Reuse · Cache reads run roughly 0.1x of base input price; the more users hit the same prefix, the closer your shared instructions get to free

Share One Cached System Prompt Across All Your Users

A single per-user byte (name, ID, locale) in the system prompt forks the cache into one entry per user. Strip personalization out of the prefix so every user reads the same cached block.

Beginner

♻️ Prompt Caching & Reuse · Often 20-40% off input cost on tool-heavy agents, depending on tool-schema size and call volume

Cache Your Tool Definitions, Not Just the System Prompt

Tool schemas render before the system prompt, so a non-deterministic tool list silently blocks the cache for everything after it. Sort and freeze the tool array to make tools cacheable.

Intermediate