When you ask an AI to fix a bug, you rarely need the entire file in context. Code is token-dense: a loose rule of thumb is ~3-4 characters per token for prose, and code often runs denser than that because symbols, indentation, and long identifiers split into multiple tokens. So an 800-line file can easily be 6,000-10,000+ tokens. On the stateless chat/Messages API, the full conversation is re-sent every turn, so uncached context gets billed again on each request it stays in the thread.
Before (wasteful):
Here's my whole file, fix the date parsing bug:
(pastes all 800 lines oforders.py— ~9,000 tokens)
After (lean):
Bug:
parse_order_datethrows on ISO strings with a trailingZ. Here's the function; the rest of the file is unrelated CRUD handlers.
python def parse_order_date(s): return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S") # ~40 tokens
Same answer, a fraction of the input. The model only needs the failing function plus enough surrounding contract (inputs, any helper it calls, the import that defines key types) to reason correctly. The other 760 lines don't improve the answer — they just inflate input cost.
Why it works:
- Less to bill. Fewer input tokens per request, and you avoid re-paying for dead weight on every follow-up turn.
- Better answers. Long irrelevant code dilutes attention and raises the odds the model edits the wrong section. Lean context is both cheaper and more accurate.
- It can ask. If the AI genuinely needs more, it will say so. Letting it ask beats pre-loading everything.
Practical tips:
- Include the function in question, its direct callers/callees if relevant, and the import that defines key types.
- Replace omitted regions with a marker like
# ... 40 lines of validation, unrelated ...so the model knows they exist. - If you must keep a large file in context across many turns (e.g. an agent loop), reach for prompt caching instead of re-pasting — cached prefixes are billed at roughly a tenth of the input rate, so repeated context stops being expensive. Trimming and caching are complementary, not either/or.
Start trimming and you'll typically cut 50-90% off file-attachment-heavy prompts with no loss in answer quality.