Batch inputs are rarely unique. Support tickets repeat boilerplate, product feeds re-list the same SKU, scraped pages share templates. Every duplicate you send is a token bill you didn't need to pay — the model recomputes an answer you already have.
Before (wasteful):
results = []
for text in items: # 10,000 items, but only 6,200 are distinct
results.append(classify(text)) # pays for all 10,000
After (lean):
import hashlib
def key(t): return hashlib.sha256(t.encode()).hexdigest()
unique = {key(t): t for t in items} # collapse to 6,200 distinct
answers = {k: classify(v) for k, v in unique.items()} # pay once each
results = [answers[key(t)] for t in items] # fan back out, free
For work that recurs across runs (a nightly job over a slowly-changing catalog), persist answers to disk or Redis keyed by the hash, and skip anything you've already resolved. A normalization step before hashing — lowercasing, trimming whitespace, stripping volatile fields like timestamps — turns near-duplicates into exact ones and raises your hit rate.
Why it saves tokens: the API is stateless and prices every request independently; it has no idea two prompts are identical. A hash lookup is effectively free and short-circuits the entire round trip — no input or output tokens billed for the duplicate. Savings scale directly with your duplicate rate: a feed that's 40% repeats costs 40% less.
Note this is application-level dedup and is distinct from prompt caching (which discounts a shared prefix across differing requests). The two compose: dedup eliminates identical calls; caching discounts the common context in the calls that remain. Always dedup first — it's cheaper to not send a request than to send a discounted one.