When you need structured data, the shape you ask for directly determines token count. Short keys, no markdown scaffolding, and minified output cut tokens on every response โ and the input echo if you loop.
Design a Compact Output Schema (and Skip the Pretty-Printing)
๐ Pro tip ยท Advanced
Unlock this tip โ and 37 more
This is one of 38 advanced, fact-checked tactics reserved for Pro. Get the full 60-tip library, a searchable archive, and a new tip every morning for $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Output Control
๐Output Control
Shrinks classification and routing outputs substantially, frequently 5-15x fewer output tokens per call
Return IDs and Enums, Not Sentences
For classification, routing, and selection tasks, have the model emit a short code, ID, or enum value instead of a polite sentence. The downstream code only needs the token, not the prose around it.
๐Output Control
Often 30-60% fewer output tokens on short tasks
Strip the Preamble: Ask for the Answer Only
Chat models love to restate your question, add caveats, and offer follow-ups. On high-volume tasks those wrapper tokens dominate the bill. Tell the model to return only the payload.
๐Output Control
Caps runaway costs; output tokens are typically 3-5x the input price
Set max_tokens as a Hard Cost Ceiling, Not an Afterthought
Output tokens are the expensive half of most API bills. Setting an explicit max_tokens on every API call turns an open-ended cost into a known maximum.