Move stable rules into a reusable, cacheable system prompt once, and constrain the response so the model can't ramble — output tokens usually cost more per token than input.
Factor Your System Prompt and Cap the Output
🔒 Pro tip · Advanced
Unlock this tip — and 37 more
This is one of 38 advanced, fact-checked tactics reserved for Pro. Get the full 60-tip library, a searchable archive, and a new tip every morning for $9/mo.
Prefer to browse? The 22 Beginner tips are free forever.
More in Prompt Engineering
✍️Prompt Engineering
10-25% on short prompts
Stop Paying for 'Please' and 'I Was Wondering If'
Conversational filler and apologetic framing get tokenized and billed like any other text. Strip the social padding and lead with the instruction.
✍️Prompt Engineering
Roughly 20-40% fewer follow-up turns on formatting-sensitive tasks, in our experience
Fence the Output Before It Wanders
State the constraints that usually trigger a do-over up front, so you don't pay for a second generation just to strip the preamble.
✍️Prompt Engineering
On large documents, output tokens on edit turns can drop substantially (often the bulk of an edit's cost is the re-emitted unchanged text)
Ask for the Diff, Not the Director's Cut
When revising a long artifact, request only the changed lines as a patch instead of having the model reprint the whole thing.