Add a single parameter to your OpenAI Responses or Chat Completions calls to pay Batch-API rates without restructuring anything into async batch jobs. You keep a normal synchronous request/response flow and give up only guaranteed speed.
Set service_tier="flex" for Batch Prices on the Sync Endpoint
Pro tip · Intermediate
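As a minimal sketch of the one-parameter change described above, the snippet below builds the arguments for a Responses API call with service_tier set to "flex" and sends it with the official openai Python SDK. The helper names (flex_request_kwargs, send_flex), the example model "o4-mini", and the 15-minute client timeout are illustrative assumptions, not part of the original tip; check which models your account can run on the flex tier before relying on it.

```python
from typing import Any


def flex_request_kwargs(model: str, prompt: str, timeout_s: float = 900.0) -> dict[str, Any]:
    """Build keyword arguments for a Responses API call on the flex tier.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    return {
        "model": model,
        "input": prompt,
        "service_tier": "flex",  # the single added parameter
        "timeout": timeout_s,    # flex gives up guaranteed speed, so allow a long wait
    }


def send_flex(prompt: str, model: str = "o4-mini") -> str:
    """Send one synchronous flex-tier request (requires OPENAI_API_KEY)."""
    # Imported lazily so flex_request_kwargs works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()
    response = client.responses.create(**flex_request_kwargs(model, prompt))
    return response.output_text
```

Calling send_flex("Summarize this changelog: ...") keeps the normal request/response flow. Because speed is no longer guaranteed, it is worth wrapping the call in your own retry or fallback logic for requests that time out.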
More in Model Selection
Model Selection
60-80% on routed traffic
Stop Paying Frontier Prices for Boilerplate Work
Most of your token spend is on tasks a small model handles perfectly. Match the model to the job instead of defaulting to your most expensive option for everything.
Model Selection
40-70% when most queries are easy
Cascade: Try the Cheap Model First, Escalate Only When It Fails
Send every request to a small model first, programmatically check the answer, and only escalate to a frontier model when the cheap one falls short.
Model Selection
30-60% of output tokens on simple tasks
Don't Burn Reasoning Tokens on Tasks That Don't Reason
Model selection isn't just which model, it's which reasoning mode. Turn thinking down or off for straightforward work and reserve deep reasoning for genuinely hard problems.