60 ways to spend fewer tokens
The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.
Cache the Context, Not Just the Answer
Cache the retrieved chunk set keyed by a normalized query, so popular or repeated questions skip the embedding call and vector search and reuse the same context block instead of rebuilding it every time.
Stop Pasting Whole Documents: Retrieve the 3 Chunks That Actually Answer the Question
Dumping a full PDF or knowledge base into every prompt bills you for thousands of tokens the model never needed. Retrieve only the passages relevant to the question instead.
Chunk on Structure, Not Character Count, So You Retrieve Fewer (and Smaller) Chunks
Naive fixed-length chunking splits ideas mid-sentence, forcing you to retrieve more chunks (and more overlap) to capture one answer. Chunk on semantic boundaries to send fewer tokens per query.
Filter by Metadata Before You Search
Attach structured metadata to chunks and apply WHERE-style filters before the vector search runs, so you embed and rank a smaller candidate set and stuff fewer off-topic chunks into the prompt.
Add a Reranker and a Hard Token Budget: Retrieve 20 Candidates, Send Only the Best 3
Vector similarity is approximate, so people inflate top-k to avoid missing the answer. A cheap reranking pass lets you fetch many candidates but send only the few that matter to the expensive LLM.
Embed and Summarize Once: Stop Re-Tokenizing the Same Documents on Every Query
Re-embedding unchanged documents and re-summarizing the same sources on every run quietly burns tokens. Compute these artifacts once, persist them, and reuse provider-side prompt caching for stable context.
Go Hybrid to Retrieve Less
Combine a keyword (BM25) score with the vector score so exact-term matches rank first, letting you lower top-k because the right chunk lands near the top instead of being padded around with semantic near-misses.
Like what you see?
Get a fresh one in your inbox — weekly free, daily on Pro.