
Turn On OpenAI's Server-Side Compaction for Hour-Long Agent Runs

Caps context growth on long agent loops so each turn re-bills a compacted prefix instead of the full transcript; the win scales with run length and is near-zero on short chats.

Advanced · 2 min read

A multi-hour OpenAI agent re-sends its entire swelling transcript on every turn. Set a compaction threshold and the Responses API summarizes old turns server-side into one encrypted item that carries state forward in far fewer tokens.
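To see why the saving grows with run length, here is a rough sketch of the billing arithmetic only. It does not call the OpenAI API. The function name, the per-turn token count, the threshold, and the summary size are all hypothetical, and the model ignores caching discounts and any cost of producing the summary itself.

```python
from typing import Optional


def cumulative_input_tokens(
    turns: int,
    tokens_per_turn: int,
    threshold: Optional[int] = None,
    summary_tokens: int = 0,
) -> int:
    """Total input tokens billed over a run in which every turn re-sends the context.

    Simplified model (an assumption, not OpenAI's actual accounting):
    each turn appends `tokens_per_turn` to the context and bills the whole
    context as input. If `threshold` is set and the context exceeds it,
    the context collapses to `summary_tokens` before the next turn.
    """
    context = 0
    billed = 0
    for _ in range(turns):
        context += tokens_per_turn  # new messages and tool output join the transcript
        billed += context           # the full current context is re-sent as input
        if threshold is not None and context > threshold:
            context = summary_tokens  # old turns replaced by one compacted item
    return billed


if __name__ == "__main__":
    # Hypothetical workloads: a short chat and a long agent loop.
    for turns in (5, 200):
        full = cumulative_input_tokens(turns, tokens_per_turn=1_000)
        compacted = cumulative_input_tokens(
            turns, tokens_per_turn=1_000, threshold=50_000, summary_tokens=2_000
        )
        saved = 1 - compacted / full
        print(f"{turns} turns: full={full:,} compacted={compacted:,} saved={saved:.0%}")
```

Under these assumptions the short chat never reaches the threshold, so both totals match, while the long loop is billed for a small fraction of the uncompacted total. Real savings depend on the threshold you set, how large the compacted item is, and how much of the prefix is already discounted by caching.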


More in Context Management

🧠 Context Management · Images can be 1,000-2,000+ tokens each; removing stale ones cuts that per turn

Drop the Screenshot Once the Model Has Read It

Images, PDFs, and attachments are charged as tokens and re-sent every turn in a multimodal thread. After the model has described or transcribed one, you usually don't need to keep sending the pixels.

Beginner

🧠 Context Management · 50-90% smaller input on file-heavy prompts

Paste the Function, Not the Whole File

Most coding questions need 20-40 lines, not your 800-line file. Send the relevant slice plus a one-line note about the rest, and your input shrinks dramatically without hurting the answer.

Beginner

🧠 Context Management · Trims re-sent history; often 20-60% fewer input tokens per turn after a topic switch

Start a New Chat When the Topic Changes

Chat apps re-send your whole conversation with every message. When you switch tasks, the old turns become dead weight you keep paying to re-transmit, even with caching discounts.

Beginner