Abort the Stream the Moment You Have Enough

Cuts output tokens on long generations, often 20-50% when you only need the top of the response · Intermediate · 1 min read

When streaming, close the connection as soon as the part you care about arrives instead of letting the model run to its natural stop. You typically pay only for the tokens generated before the abort, though billing for cancelled streams varies by provider.
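
A minimal sketch of the early-abort pattern in Python, assuming the stream is exposed as an iterator of text chunks. The names `read_until` and `fake_stream` are hypothetical, and `fake_stream` only stands in for a provider's streaming response so the example runs without a network call. With a real SDK, the equivalent step is closing the stream or HTTP response object, and the server may still generate a few tokens before it notices the disconnect.

```python
from typing import Callable, Iterable, Iterator


def read_until(chunks: Iterable[str], is_enough: Callable[[str], bool]) -> str:
    """Accumulate streamed text and stop as soon as is_enough(text) is true."""
    stream = iter(chunks)
    text = ""
    try:
        for chunk in stream:
            text += chunk
            if is_enough(text):
                break  # stop reading; later chunks are never requested
    finally:
        # Closing the iterator is what aborts the stream. For a generator this
        # runs its cleanup code; a real SDK stream would close the connection.
        close = getattr(stream, "close", None)
        if close is not None:
            close()
    return text


def fake_stream(pieces: list[str], log: list[str]) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming response."""
    try:
        for piece in pieces:
            log.append(piece)  # records what the "model" actually produced
            yield piece
    finally:
        log.append("<closed>")


produced: list[str] = []
pieces = ["Verdict: ", "APPROVE", "\n", "Reasoning: ", "the change is ", "small."]

# Only the first line matters here, so stop at the first newline.
result = read_until(fake_stream(pieces, produced), lambda t: "\n" in t)
first_line = result.split("\n", 1)[0]
```

In this sketch the "Reasoning" chunks are never produced, because the generator is closed right after the newline arrives. The stop condition is the design choice that matters: pick something cheap to check on every chunk, such as a delimiter, a closing bracket, or a character count.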

More in Output Control

📐 Output Control · Shrinks classification and routing outputs substantially, frequently 5-15x fewer output tokens per call

Return IDs and Enums, Not Sentences

For classification, routing, and selection tasks, have the model emit a short code, ID, or enum value instead of a polite sentence. The downstream code only needs the token, not the prose around it.

Beginner

📐 Output Control · Often 30-60% fewer output tokens on short tasks

Strip the Preamble: Ask for the Answer Only

Chat models love to restate your question, add caveats, and offer follow-ups. On high-volume tasks those wrapper tokens dominate the bill. Tell the model to return only the payload.

Beginner

📐 Output Control · Caps runaway costs; output tokens are typically priced at 3-5x the input rate

Set max_tokens as a Hard Cost Ceiling, Not an Afterthought

Output tokens are the expensive half of most API bills. Setting an explicit max_tokens on every API call turns an open-ended cost into a known maximum.

Beginner