Don't Burn Reasoning Tokens on Tasks That Don't Reason

30-60% of output tokens on simple tasks · Intermediate · 2 min read

Model selection isn't just which model — it's which reasoning mode. Turn thinking down or off for straightforward work and reserve deep reasoning for genuinely hard problems.
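As a rough illustration of the idea, the sketch below maps a task type to a reasoning level before a request is built. Everything named here is an assumption for illustration: the task categories, the `reasoning_effort` field, and the `"off"` / `"low"` / `"high"` values are hypothetical and do not correspond to any particular provider's API. Check your provider's documentation for the real parameter.

```python
# Hypothetical task categories; replace with labels that match your workload.
SIMPLE_TASKS = {"classification", "extraction", "formatting", "translation"}
HARD_TASKS = {"multi_step_math", "debugging", "planning"}


def pick_reasoning_effort(task_type: str) -> str:
    """Return a hypothetical reasoning level for the given task type."""
    if task_type in SIMPLE_TASKS:
        return "off"   # straightforward work: skip the thinking tokens
    if task_type in HARD_TASKS:
        return "high"  # genuinely hard problems: pay for deep reasoning
    return "low"       # unknown tasks: a cautious middle setting


def build_request(task_type: str, prompt: str) -> dict:
    """Build a provider-agnostic request dict (field names are illustrative)."""
    return {
        "prompt": prompt,
        "reasoning_effort": pick_reasoning_effort(task_type),
    }


if __name__ == "__main__":
    print(build_request("extraction", "Pull the invoice number from this email."))
    print(build_request("debugging", "Why does this deadlock under load?"))
```

The default of `"low"` for unrecognized task types is a design choice, not a rule: it avoids both paying for deep reasoning on everything and silently switching it off for work that might need it.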

More in Model Selection

🎚️ Model Selection · 60-80% on routed traffic

Stop Paying Frontier Prices for Boilerplate Work

Most of your token spend is on tasks a small model handles perfectly. Match the model to the job instead of defaulting to your most expensive option for everything.

Beginner

🎚️ Model Selection · Often cheaper than escalating a tier on reasoning-limited hard tasks (qualitative; depends on the price gap and how many extra reasoning tokens the cheaper model spends)

Buy More Reasoning on the Cheap Model Before You Upgrade the Tier

When a cheap model stumbles on a hard task, the reflex is to jump to the frontier tier. Often the cheaper move is to keep the small model and turn its reasoning effort up — its per-token rate is so low it can brute-reason through the problem and still cost far less.

Intermediate

🎚️ Model Selection · 40-70% when most queries are easy

Cascade: Try the Cheap Model First, Escalate Only When It Fails

Send every request to a small model first, programmatically check the answer, and only escalate to a frontier model when the cheap one falls short.

Intermediate
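The cascade pattern described in the card above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `cheap_model` and `frontier_model` are hypothetical callables standing in for whatever client you use, and the JSON validity check is just one example of a programmatic acceptance test.

```python
import json
from typing import Callable, Tuple


def is_valid_json(answer: str) -> bool:
    """Example acceptance check: the answer must parse as JSON."""
    try:
        json.loads(answer)
        return True
    except (ValueError, TypeError):
        return False


def cascade(
    prompt: str,
    cheap_model: Callable[[str], str],
    frontier_model: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the cheap model first; escalate only if its answer fails the check.

    Returns the answer and the name of the tier that produced it.
    """
    answer = cheap_model(prompt)
    if is_acceptable(answer):
        return answer, "cheap"
    # The cheap answer failed the programmatic check, so pay for the frontier call.
    return frontier_model(prompt), "frontier"


if __name__ == "__main__":
    # Stub models for demonstration only; swap in real client calls.
    def stub_cheap(prompt: str) -> str:
        return "Sorry, here is some prose instead of JSON."

    def stub_frontier(prompt: str) -> str:
        return '{"status": "ok"}'

    print(cascade("Return a JSON status object.", stub_cheap, stub_frontier, is_valid_json))
```

The savings depend on how cheap and reliable the acceptance check is: a check that needs a model call of its own, or that passes bad answers, erodes the benefit.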