
Set service_tier flex for Batch Prices on the Sync Endpoint

~50% savings on input and output tokens for latency-tolerant workloads; in exchange you accept variable latency and occasional 429s that are free to retry.

Intermediate · 2 min read

Add a single parameter to your OpenAI Responses or Chat Completions calls to pay Batch-API rates without restructuring anything into async batch jobs. You keep a normal synchronous request/response flow and give up only guaranteed speed.
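As a rough illustration of that one-parameter change, here is a minimal Python sketch. It assumes the request is sent through some callable such as `client.responses.create` from the OpenAI Python SDK, and that rate-limit errors expose `status_code == 429`. The helper names `flex_params` and `call_with_flex`, the backoff values, and the model name in the usage comment are illustrative assumptions, not part of any official API.

```python
import time
from typing import Any, Callable, Dict


def flex_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request parameters with the flex tier requested."""
    return {**params, "service_tier": "flex"}


def call_with_flex(
    send: Callable[..., Any],
    params: Dict[str, Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Send a flex-tier request, retrying with exponential backoff on HTTP 429.

    `send` is any callable that takes the request as keyword arguments.
    An error is treated as retryable only if it has `status_code == 429`
    (an assumption about how the client library reports rate limits).
    """
    request = flex_params(params)
    for attempt in range(max_attempts):
        try:
            return send(**request)
        except Exception as exc:
            retryable = getattr(exc, "status_code", None) == 429
            if not retryable or attempt == max_attempts - 1:
                raise
            # Back off 1s, 2s, 4s, ... before trying again.
            sleep(base_delay * 2 ** attempt)


# Usage sketch (hypothetical model name; check which models support flex):
#   from openai import OpenAI
#   client = OpenAI(timeout=900.0)  # generous timeout, since flex can be slow
#   resp = call_with_flex(client.responses.create,
#                         {"model": "o4-mini", "input": "Summarize this log."})
```

The retry loop is kept separate from the parameter change on purpose: the only thing flex itself requires is the extra `service_tier` field, while the backoff is one possible way to absorb the occasional 429.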


More in Model Selection

🎚️ Model Selection · 60-80% savings on routed traffic

Stop Paying Frontier Prices for Boilerplate Work

Most of your token spend is on tasks a small model handles perfectly. Match the model to the job instead of defaulting to your most expensive option for everything.

Beginner

🎚️ Model Selection · Often cheaper than escalating a tier on reasoning-limited hard tasks (qualitative; depends on the price gap and how many extra reasoning tokens the cheaper model spends)

Buy More Reasoning on the Cheap Model Before You Upgrade the Tier

When a cheap model stumbles on a hard task, the reflex is to jump to the frontier tier. Often the cheaper move is to keep the small model and turn its reasoning effort up: its per-token rate is so low that it can brute-reason through the problem and still cost far less.

Intermediate

🎚️ Model Selection · 40-70% savings when most queries are easy

Cascade: Try the Cheap Model First, Escalate Only When It Fails

Send every request to a small model first, programmatically check the answer, and only escalate to a frontier model when the cheap one falls short.

Intermediate