The library

128 ways to spend fewer tokens

The 22 Beginner tips are free to read. The 106 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.

All ⚙️ Batching & Automation (14) 💻 Coding Assistants (16) 🧠 Context Management (34) 📊 Measurement & Budgeting (14) 🎚️ Model Selection (8) 📐 Output Control (10) ♻️ Prompt Caching & Reuse (17) ✍️ Prompt Engineering (7) 🔎 Retrieval & RAG (8)

📊Measurement & Budgeting 10-30% on prompts you would have sent bloated

Count Tokens Before You Hit Send, Not After the Bill Arrives

Measure a prompt's token count before sending it, so you catch oversized context while trimming it is still free.

Beginner 1 min Read →

📊Measurement & Budgeting Prevents runaway-loop and leaked-key blowups; bounds worst-case spend rather than reducing normal usage

Set Hard Spend Caps in the Provider Console

Configure provider-side usage limits, budgets, and alerts so a bug, a retry storm, or a leaked key cannot quietly run your bill far past a ceiling you set in advance.

Beginner 3 min Read →

📊Measurement & Budgeting 🔒 Pro

Cap the loops by count, not tokens: spawn and search limits in Claude Code

Claude Code 2.1.212 added session caps on subagent spawns and WebSearch calls, both defaulting to a non-binding 200. Lower them to match your actual workload so a mis-planned agent hits a hard stop at 12 searches instead of quietly burning a full context window and the spend behind it.

Intermediate 3 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Log input_tokens and output_tokens on Every Call to Find Your Real Waste

Persist the usage object from every API response with a feature tag, so you can see exactly which feature and which token type is draining your budget.

Intermediate 1 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Give Each Feature a Token Budget and Enforce It with max_tokens

Set an explicit per-feature output ceiling instead of leaving max_tokens at a huge default, and reserve big budgets only for features that truly need them.

Intermediate 1 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Keep agent threads under the long-context price cliff

GPT-5.6 switches to a higher long-context rate once a single request crosses ~272k tokens. Agent harnesses that quietly raised their window push routine threads past that line and double the per-token bill. Cap each thread below the cliff — compact or start fresh before you cross — and you keep paying the base rate.

Intermediate 3 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Tag Every Call So You Know Which Feature Burns Tokens

Attach feature, user, and environment labels to every API call so your bill breaks down by what actually drives cost instead of one undifferentiated total.

Intermediate 1 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Measure Cost-Per-Success, Not Cost-Per-Call

Compare two prompts or models on cost divided by successful outcomes, including retries and rework, so you stop chasing cheap calls that quietly fail and get redone.

Advanced 2 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Wire Up Spend Alerts and a Token Circuit Breaker Before You Need Them

Combine provider budget alerts with an in-app token meter that hard-stops a feature once it blows past its expected per-period budget.

Advanced 2 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Give Your Agent Fleet a Real Dollar Ceiling: --max-budget-usd Now Halts Background Subagents

Before Claude Code 2.1.216, hitting --max-budget-usd denied new subagent spawns but let already-running background subagents keep burning past the cap. 2.1.216 halts them at the ceiling and adds CLAUDE_CODE_MAX_CONCURRENT_SUBAGENTS (default 20) to bound how many contexts draw tokens in parallel. Upgrade and set both to give a long fan-out a true spend ceiling plus a parallelism throttle.

Advanced 3 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Score Agent Runs in Effective Tokens, So Your "20% Cut" Is a Real 20%

Raw token counts treat a cheap cache-read token and an expensive flagship output token as equal — across token classes and tiers they can differ by 50x or more in price. Roll every run up into one price-weighted Effective Token number so a reported X% reduction maps to a real X% spend cut.

Advanced 4 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Optimize the Bill, Not the Token Count — Caching Can Make Compression Backfire

With prompt caching on, a stable prefix bills at a fraction of base input, so the tokens you change cost far more than the tokens you keep. Trimming context can raise the actual bill and tank task success — measure success-adjusted billed cost instead of raw token count.

Advanced 3 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Cap Each Agent's Spend and Degrade the Model at the Soft Limit

Autonomous agents fan out into tool calls, sub-agents, and retries, so after-the-fact dashboards never actually bound spend. Put a pre-call gate in front of every request that enforces a per-agent hard ceiling and quietly drops to a cheaper model at a soft threshold instead of failing hard.

Advanced 3 min Unlock →

📊Measurement & Budgeting 🔒 Pro

Run Every Agent Thread Off One Shared Token Ledger That Aborts at the Budget Line

Per-agent caps still let a multi-agent run overspend in aggregate: ten threads each "under budget" blow past the total, and a sub-agent or compaction call drawing context after the work is done is invisible to any single counter. Settle every thread, sub-agent, and compaction call against ONE shared ledger, abort each turn at its next usage-accounting boundary when the ledger hits zero, and feed the remaining balance back into the model's context so it steers its own spend.

Advanced 4 min Unlock →

Like what you see?

Get a fresh one in your inbox — weekly free, daily on Pro.

Subscribe free Go Pro — $9/mo