A multi-hour OpenAI agent re-sends its entire swelling transcript on every turn. Set a compaction threshold and the Responses API summarizes old turns server-side into one encrypted item that carries state forward in far fewer tokens.
Turn On OpenAI's Server-Side Compaction for Hour-Long Agent Runs
Unlock this tip โ and 54 more
This is one of 55 advanced, fact-checked tactics reserved for Pro. Get the full 77-tip library, a searchable archive, and a new tip every morning. Free for 7 days, then $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Context Management
Drop the Screenshot Once the Model Has Read It
Images, PDFs, and attachments are charged as tokens and re-sent every turn in a multimodal thread. After the model has described or transcribed one, you usually don't need to keep sending the pixels.
Paste the Function, Not the Whole File
Most coding questions need 20-40 lines, not your 800-line file. Send the relevant slice plus a one-line note about the rest, and your input shrinks dramatically without hurting the answer.
Start a New Chat When the Topic Changes
Chat apps re-send your whole conversation with every message. When you switch tasks, the old turns become dead weight you keep paying to re-transmit โ even with caching discounts.