Batch requests support prompt caching. When every request shares a big instruction block or document, cache it once and the per-request cost collapses.
Stack Prompt Caching on Top of Your Batch Jobs for Compounding Savings
🔒 Pro tip · Intermediate
Unlock this tip — and 37 more
This is one of 38 advanced, fact-checked tactics reserved for Pro. Get the full 60-tip library, a searchable archive, and a new tip every morning for $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Batching & Automation
⚙️Batching & Automation
~50% on input + output tokens
Move Every Non-Urgent Job to the Batch API and Pay Half Price
If a job doesn't need an answer in the next few seconds, send it through the Batch API instead of the live endpoint. The exact same request costs half as much.
⚙️Batching & Automation
Often 40-70% fewer input tokens on bulk classification, varies with prompt size
Classify a Whole List in One Call, Not One Row at a Time
Send 20-50 items as a numbered list and get back a JSON array of labels, instead of paying for the same instruction prompt on every single row.
⚙️Batching & Automation
Varies; commonly 20-60% on duplicate-heavy workloads
Deduplicate and Cache Identical Requests Before They Ever Hit the API
Real-world batches are full of repeats. Hash each request, send each unique prompt once, and fan the answer back out to every duplicate.