Deduplicate and Cache Identical Requests Before They Ever Hit the API

Batch inputs are rarely unique. Support tickets repeat boilerplate, product feeds re-list the same SKU, scraped pages share templates. Every duplicate you send is a token bill you didn't need to pay — the model recomputes an answer you already have.

Before (wasteful):

results = []
for text in items:  # 10,000 items, but only 6,200 are distinct
    results.append(classify(text))  # pays for all 10,000

After (lean):

import hashlib

def key(t): return hashlib.sha256(t.encode()).hexdigest()

unique = {key(t): t for t in items}          # collapse to 6,200 distinct
answers = {k: classify(v) for k, v in unique.items()}  # pay once each
results = [answers[key(t)] for t in items]    # fan back out, free

For work that recurs across runs (a nightly job over a slowly-changing catalog), persist answers to disk or Redis keyed by the hash, and skip anything you've already resolved. A normalization step before hashing — lowercasing, trimming whitespace, stripping volatile fields like timestamps — turns near-duplicates into exact ones and raises your hit rate.

Why it saves tokens: the API is stateless and prices every request independently; it has no idea two prompts are identical. A hash lookup is effectively free and short-circuits the entire round trip — no input or output tokens billed for the duplicate. Savings scale directly with your duplicate rate: a feed that's 40% repeats costs 40% less.

Note this is application-level dedup and is distinct from prompt caching (which discounts a shared prefix across differing requests). The two compose: dedup eliminates identical calls; caching discounts the common context in the calls that remain. Always dedup first — it's cheaper to not send a request than to send a discounted one.

Deduplicate and Cache Identical Requests Before They Ever Hit the API

Get a fresh tip every morning

More in Batching & Automation

Move Every Non-Urgent Job to the Batch API and Pay Half Price

Classify a Whole List in One Call, Not One Row at a Time

Fetch the Data the Agent Always Needs Before the Loop Starts