When Claude uses dynamic filtering to process web-search results in code, set response_inclusion: excluded so the raw search blocks it already consumed are dropped from the API response instead of echoed back as output tokens.
Drop Search-Result Blocks the Agent Already Digested in Code
Unlock this tip โ and 56 more
This is one of 57 advanced, fact-checked tactics reserved for Pro. Get the full 79-tip library, a searchable archive, and a new tip every morning. Free for 7 days, then $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Output Control
Return IDs and Enums, Not Sentences
For classification, routing, and selection tasks, have the model emit a short code, ID, or enum value instead of a polite sentence. The downstream code only needs the token, not the prose around it.
Strip the Preamble: Ask for the Answer Only
Chat models love to restate your question, add caveats, and offer follow-ups. On high-volume tasks those wrapper tokens dominate the bill. Tell the model to return only the payload.
Set max_tokens as a Hard Cost Ceiling, Not an Afterthought
Output tokens are the expensive half of most API bills. Setting an explicit max_tokens on every API call turns an open-ended cost into a known maximum.