Stack Prompt Caching on Top of Your Batch Jobs for Compounding Savings

Up to ~90% on the shared prefix, on top of the 50% batch discount · Intermediate · 1 min read

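Batch requests support prompt caching. When every request in a batch shares a large instruction block or document, cache that shared prefix once, and each later request that hits the cache pays the reduced cache-read rate for it rather than the full input price.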
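To see how the two discounts compound, here is a rough cost model. It is a sketch under stated assumptions, not any provider's billing formula: the function names, the `hit_rate` and `cache_write_multiplier` parameters, and the example price are all hypothetical, and it assumes the batch and cache discounts multiply. Only the 50% batch discount and the ~90% cache-read discount come from the figures above.

```python
def live_uncached_cost(n_requests, prefix_tokens, unique_tokens, price_per_mtok):
    """Input cost in dollars when every request pays full price for everything."""
    return n_requests * (prefix_tokens + unique_tokens) * price_per_mtok / 1_000_000


def batch_cached_cost(
    n_requests,
    prefix_tokens,
    unique_tokens,
    price_per_mtok,
    batch_discount=0.50,         # from the tip: batch jobs cost half
    cache_read_discount=0.90,    # from the tip: up to ~90% off the shared prefix
    cache_write_multiplier=1.0,  # assumption: set above 1.0 if cache writes carry a surcharge
    hit_rate=1.0,                # assumption: share of follow-up requests that hit the cache
):
    """Estimated input cost in dollars for a batch that shares one cached prefix."""
    if n_requests <= 0:
        return 0.0
    per_token = price_per_mtok * (1.0 - batch_discount) / 1_000_000

    # The first request always writes the cache; later ones hit it at `hit_rate`.
    hits = (n_requests - 1) * hit_rate
    misses = n_requests - hits

    prefix_cost = prefix_tokens * per_token * (
        misses * cache_write_multiplier + hits * (1.0 - cache_read_discount)
    )
    # The per-request unique part is never cached, so it only gets the batch discount.
    unique_cost = n_requests * unique_tokens * per_token
    return prefix_cost + unique_cost


if __name__ == "__main__":
    # Hypothetical workload: 1,000 requests, a 10,000-token shared prefix,
    # 200 unique tokens each, at an illustrative $3 per million input tokens.
    baseline = live_uncached_cost(1000, 10_000, 200, 3.0)
    stacked = batch_cached_cost(1000, 10_000, 200, 3.0)
    print(f"live, uncached: ${baseline:.2f}")
    print(f"batch + cache:  ${stacked:.2f}")
```

Lowering `hit_rate` is a quick way to check how sensitive the estimate is when not every request in the batch lands on a warm cache.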

More in Batching & Automation

⚙️ Batching & Automation · ~50% on input + output tokens · Beginner

Move Every Non-Urgent Job to the Batch API and Pay Half Price

If a job doesn't need an answer in the next few seconds, send it through the Batch API instead of the live endpoint. The exact same request costs half as much.

⚙️ Batching & Automation · Often 40-70% fewer input tokens on bulk classification, varies with prompt size · Beginner

Classify a Whole List in One Call, Not One Row at a Time

Send 20-50 items as a numbered list and get back a JSON array of labels, instead of paying for the same instruction prompt on every single row.

⚙️ Batching & Automation · Varies; commonly 20-60% on duplicate-heavy workloads · Beginner

Deduplicate and Cache Identical Requests Before They Ever Hit the API

Real-world batches are full of repeats. Hash each request, send each unique prompt once, and fan the answer back out to every duplicate.