Factor Your System Prompt and Cap the Output

up to 50% on high-volume pipelines Advanced 2 min read

Move stable rules into a reusable, cacheable system prompt once, and constrain the response so the model can't ramble — output tokens usually cost more per token than input.

🔒 Pro tip · Advanced

Unlock this tip — and 105 more

This is one of 106 advanced, fact-checked tactics reserved for Pro. Get the full 128-tip library, a searchable archive, and a new tip every morning. Free for 7 days, then $9/mo.

Start your 7-day free trial Already Pro? Sign in

Prefer to browse? The 22 Beginner tips are free forever.

More in Prompt Engineering

✍️Prompt Engineering 10-25% on short prompts

Stop Paying for 'Please' and 'I Was Wondering If'

Conversational filler and apologetic framing get tokenized and billed like any other text. Strip the social padding and lead with the instruction.

Beginner Read →

✍️Prompt Engineering Roughly 20-40% fewer follow-up turns on formatting-sensitive tasks, in our experience

Fence the Output Before It Wanders

State the constraints that usually trigger a do-over up front, so you don't pay for a second generation just to strip the preamble.

Beginner Read →

✍️Prompt Engineering On large documents, output tokens on edit turns can drop substantially (often the bulk of an edit's cost is the re-emitted unchanged text)

Ask for the Diff, Not the Director's Cut

When revising a long artifact, request only the changed lines as a patch instead of having the model reprint the whole thing.

Intermediate Read →