When streaming, close the connection as soon as the part you care about arrives instead of letting the model run to its natural stop. You only pay for tokens actually generated before the abort.
Abort the Stream the Moment You Have Enough
๐ Pro tip ยท Intermediate
Unlock this tip โ and 37 more
This is one of 38 advanced, fact-checked tactics reserved for Pro. Get the full 60-tip library, a searchable archive, and a new tip every morning for $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Output Control
๐Output Control
Shrinks classification and routing outputs substantially, frequently 5-15x fewer output tokens per call
Return IDs and Enums, Not Sentences
For classification, routing, and selection tasks, have the model emit a short code, ID, or enum value instead of a polite sentence. The downstream code only needs the token, not the prose around it.
๐Output Control
Often 30-60% fewer output tokens on short tasks
Strip the Preamble: Ask for the Answer Only
Chat models love to restate your question, add caveats, and offer follow-ups. On high-volume tasks those wrapper tokens dominate the bill. Tell the model to return only the payload.
๐Output Control
Caps runaway costs; output tokens are typically 3-5x the input price
Set max_tokens as a Hard Cost Ceiling, Not an Afterthought
Output tokens are the expensive half of most API bills. Setting an explicit max_tokens on every API call turns an open-ended cost into a known maximum.