Models don't need warm-ups. Every word of pleasantry, hedging, and meta-commentary is tokenized and counted toward your input cost, and on short prompts that padding can be a large fraction of the total.
Before (~47 tokens):
Hi there! I hope you're doing well. I was wondering if you could possibly help me out with something. If it's not too much trouble, I'd really love it if you could maybe summarize the following article for me. Thank you so much in advance!
After (~9 tokens):
Summarize the article below in 3 bullet points.
Both get you a usable summary; the second costs roughly a fifth of the input tokens. Across hundreds of daily calls, that adds up fast.
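To put rough numbers on the per-call saving, here is a back-of-the-envelope sketch using the approximate counts from the example (47 and 9 tokens). The call volume and the price per million input tokens are made-up placeholders, not any provider's actual rate.

```python
# Rough daily savings from trimming a prompt.
# Token counts are the approximate figures from the example above;
# CALLS_PER_DAY and PRICE_PER_MILLION_USD are hypothetical placeholders.
BEFORE_TOKENS = 47
AFTER_TOKENS = 9
CALLS_PER_DAY = 500            # assumed volume
PRICE_PER_MILLION_USD = 1.00   # assumed input price, not a real quote


def daily_savings(before: int, after: int, calls: int) -> int:
    """Input tokens saved per day by using the shorter prompt."""
    return (before - after) * calls


def tokens_to_usd(tokens: int, price_per_million: float) -> float:
    """Convert a token count to cost at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


saved = daily_savings(BEFORE_TOKENS, AFTER_TOKENS, CALLS_PER_DAY)
print(f"tokens saved per day: {saved}")
print(f"approx. USD saved per day: {tokens_to_usd(saved, PRICE_PER_MILLION_USD):.4f}")
```

The saving scales linearly with call volume and price, so plug in your own figures before deciding how much effort the trimming deserves.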
Why it saves tokens: Tokenizers built on encodings such as cl100k_base and o200k_base split text into sub-word units that average roughly 3-4 characters of English. A phrase like "I was wondering if you could possibly" is pure overhead: it adds tokens without adding any instruction. Removing it shrinks the input directly, and because you pay per input token on every call, the saving recurs every time.
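For a quick sanity check without a tokenizer installed, the sketch below applies a 4-characters-per-token rule of thumb to the two prompts. `estimate_tokens` is a hypothetical helper, not a real tokenizer, so its numbers will not match actual counts exactly; use your provider's tokenizer when the figure matters.

```python
import math

# Assumed average for English text; real tokenizers vary with the content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Crude token estimate: character count divided by an assumed average."""
    return math.ceil(len(text) / chars_per_token)


polite = (
    "Hi there! I hope you're doing well. I was wondering if you could "
    "possibly help me out with something. If it's not too much trouble, "
    "I'd really love it if you could maybe summarize the following article "
    "for me. Thank you so much in advance!"
)
direct = "Summarize the article below in 3 bullet points."

for label, prompt in (("polite", polite), ("direct", direct)):
    print(f"{label}: {len(prompt)} chars, ~{estimate_tokens(prompt)} tokens")
```

The heuristic tends to overestimate prompts made of short common words, but the gap between the two prompts is large enough that it shows up either way.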
A few honest caveats:
- The savings are largest on short prompts. If you're pasting a 4,000-token document, trimming 30 tokens of greeting is negligible.
- Politeness has not been shown to reliably improve quality on instruction-following tasks, so dropping it is unlikely to cost you anything in output.
- Keep enough words to stay unambiguous. "Summarize" alone is leaner, but "Summarize in 3 bullets" heads off a re-prompt, and a re-prompt costs far more than the few tokens you'd save.
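The first and last caveats come down to simple arithmetic. The sketch below reuses the figures from the text (a 30-token greeting, a 4,000-token document, the 47- and 9-token prompts). The token sizes for the two "Summarize" variants and the retry model, in which a failed attempt means resending the whole input, are assumptions for illustration, and output tokens are ignored.

```python
def saving_fraction(trimmed: int, total_before: int) -> float:
    """Share of the original input tokens removed by trimming."""
    return trimmed / total_before


def input_tokens_with_retries(prompt: int, document: int, attempts: int) -> int:
    """Total input tokens when the full input is resent on every attempt."""
    return (prompt + document) * attempts


DOCUMENT = 4000  # document size from the first caveat

# Same kind of trim, very different payoff.
short_case = saving_fraction(47 - 9, 47)           # bare prompt, no document
long_case = saving_fraction(30, DOCUMENT + 30)     # greeting next to a long document
print(f"short prompt: {short_case:.0%} saved")
print(f"long prompt:  {long_case:.1%} saved")

# Assumed sizes: ~2 tokens for "Summarize", ~6 for "Summarize in 3 bullets".
terse_twice = input_tokens_with_retries(2, DOCUMENT, attempts=2)
clear_once = input_tokens_with_retries(6, DOCUMENT, attempts=1)
print(f"terse prompt, two attempts: {terse_twice} input tokens")
print(f"clear prompt, one attempt:  {clear_once} input tokens")
```

Under these assumptions a single retry roughly doubles the input bill, which dwarfs the handful of tokens the terser wording saved.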
Rule of thumb: write prompts the way you'd write a function call, not an email. Lead with the verb, state the constraint, point at the input. Save the courtesy for humans.
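One way to make the function-call habit concrete is a tiny prompt builder. `build_prompt` and its parameter names are hypothetical: a sketch of the verb, constraint, input ordering rather than any library's API.

```python
def build_prompt(
    verb: str,
    constraint: str,
    input_text: str,
    target: str = "the text below",
) -> str:
    """Assemble a prompt as: verb, target, constraint, then the input itself."""
    verb = verb.strip()
    verb = verb[:1].upper() + verb[1:]  # capitalize only the first letter
    instruction = f"{verb} {target} {constraint.strip()}".rstrip()
    if not instruction.endswith("."):
        instruction += "."
    # Blank line separates the instruction from the material it points at.
    return f"{instruction}\n\n{input_text.strip()}"


article = "Placeholder article text goes here."
print(build_prompt("summarize", "in 3 bullet points", article, target="the article below"))
```

Forcing every prompt through arguments like these leaves no slot for a greeting, which is the point of the exercise.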