Model selection isn't just which model โ it's which reasoning mode. Turn thinking down or off for straightforward work and reserve deep reasoning for genuinely hard problems.
Don't Burn Reasoning Tokens on Tasks That Don't Reason
๐ Pro tip ยท Intermediate
Unlock this tip โ and 37 more
This is one of 38 advanced, fact-checked tactics reserved for Pro. Get the full 60-tip library, a searchable archive, and a new tip every morning for $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Model Selection
๐๏ธModel Selection
60-80% on routed traffic
Stop Paying Frontier Prices for Boilerplate Work
Most of your token spend is on tasks a small model handles perfectly. Match the model to the job instead of defaulting to your most expensive option for everything.
๐๏ธModel Selection
40-70% when most queries are easy
Cascade: Try the Cheap Model First, Escalate Only When It Fails
Send every request to a small model first, programmatically check the answer, and only escalate to a frontier model when the cheap one falls short.
๐๏ธModel Selection
up to 50-70% on multi-step agent runs
Use a Big Model as the Planner, Small Models as the Workers
In agentic and multi-step pipelines, reserve the frontier model for orchestration and hard reasoning, and delegate bulk subtasks (search, read, extract) to a cheaper model.