Tokens are the units language models read and write. They are not exactly words. A token can be a whole word, part of a word, punctuation, whitespace, code syntax, or a special symbol. That matters because AI pricing, context windows, and output limits are usually measured in tokens.
This guide was verified on April 27, 2026. Model names, prices, and context windows change quickly, so treat provider documentation as the source of truth before making budget or architecture decisions.
What Is a Token?
When you send text to a model, the system converts it into tokens using a tokenizer. For typical English prose:
- 1 token is roughly 4 characters.
- 100 words is roughly 130 tokens.
- 1,000 words is roughly 1,300 tokens.
- Code, tables, non-English text, and unusual symbols can tokenize very differently.
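The rules of thumb above can be sketched as a quick estimator. This is a rough heuristic only, the function name is ours, and the constants (4 characters per token, 1.3 tokens per word) come straight from the approximations listed here; real counts require the provider's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for typical English prose."""
    by_chars = len(text) / 4            # ~4 characters per token
    by_words = len(text.split()) * 1.3  # ~1.3 tokens per word
    # Average the two heuristics; code, tables, and non-English
    # text can diverge from both.
    return round((by_chars + by_words) / 2)
```

Expect the estimate to drift for anything that is not plain English prose.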
Example: “ChatGPT is useful” may be split into several pieces depending on the tokenizer. OpenAI, Anthropic, Google, xAI, and open models can count the same text differently.
Why Tokens Matter
Cost
Most APIs charge separately for input tokens and output tokens. Input is what you send. Output is what the model generates. Long prompts, pasted documents, repeated conversation history, and verbose outputs all increase cost.
The exact prices change by model, provider, cache behavior, and product surface. Check current pricing pages from OpenAI, Anthropic, Google, xAI, Mistral, or DeepSeek before estimating production costs.
Context Windows
The context window is the maximum amount of input and output a model can handle in one request. Current frontier products offer much larger windows than early GPT-3-era systems, including million-token products from several providers, but larger context is not automatically better. It can be slower, more expensive, and harder to verify.
Output Limits
The context window and output limit are related but not identical. A model may accept a very long input while still limiting the number of tokens it can generate in one answer.
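One way to see the relationship: the input consumes part of the shared window, and the response is capped by whichever is smaller, the remaining headroom or the model's per-response output limit. A minimal sketch, with hypothetical numbers (the function and figures are illustrative, not any specific model's limits):

```python
def max_output_tokens(context_window: int, input_tokens: int,
                      output_limit: int) -> int:
    """Largest completion that fits, given a shared context window
    and a separate per-response output cap."""
    headroom = max(context_window - input_tokens, 0)
    return min(headroom, output_limit)

# e.g. a 128k window, a 120k-token input, and a 16k output cap
# leave room for only an 8k-token answer.
```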
Practical Token Budgeting
Use this simple formula:
total cost = (input tokens × input price) + (output tokens × output price)
For a production workflow, estimate:
- Average prompt size.
- Average retrieved context size.
- Average output length.
- Number of retries or tool calls.
- Whether conversation history is resent each turn.
- Cache hit rate, if the provider supports cached input pricing.
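The formula and checklist above can be combined into one cost estimator. A sketch, assuming per-million-token pricing and a single cached-input rate; the function name and the prices in the example are placeholders, not real provider rates:

```python
def estimate_request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,    # USD per million fresh input tokens
    output_price_per_mtok: float,   # USD per million output tokens
    cache_hit_rate: float = 0.0,    # fraction of input billed at the cached rate
    cached_price_per_mtok: float = 0.0,
) -> float:
    """total cost = (input tokens x input price) + (output tokens x output price),
    with cached input billed separately when the provider supports it."""
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    return (fresh * input_price_per_mtok
            + cached * cached_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
```

Multiply the per-request figure by expected request volume, including retries and tool calls, to get a workflow-level estimate.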
How to Reduce Token Waste
- Remove repeated instructions.
- Put stable instructions in a short system prompt.
- Retrieve only relevant document sections.
- Ask for concise outputs when you do not need long prose.
- Summarize previous conversation turns instead of resending everything.
- Use smaller models for simple classification and routing.
- Use RAG for changing facts instead of stuffing every document into the prompt.
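The history-summarization tactic above can be sketched as follows. This is a structural sketch only: a real pipeline would generate the summary with a cheap model call, whereas here older turns are just truncated and concatenated to show the shape of the technique. The function name and message format are assumptions.

```python
def compact_history(turns: list[dict], keep_last: int = 4) -> list[dict]:
    """Keep the most recent turns verbatim and collapse older ones
    into a single summary message, so the full history is not resent
    every turn."""
    if len(turns) <= keep_last:
        return turns
    old, recent = turns[:-keep_last], turns[-keep_last:]
    # Placeholder summary: truncated snippets of the old turns.
    summary = " | ".join(t["content"][:40] for t in old)
    return [{"role": "system",
             "content": f"Summary of earlier turns: {summary}"}] + recent
```

The trade-off is lossy memory for a bounded prompt size; tune `keep_last` to how much verbatim recency the task needs.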
Common Mistakes
Treating Words and Tokens as the Same
A 2,000-word article is not 2,000 tokens. In English it is usually closer to 2,600 tokens, and the count varies with formatting and language.
Forgetting Output Tokens
If you ask for a long report, the output can cost more than the input. Many providers price output tokens higher than input tokens.
Pasting Everything
Long-context models are useful, but pasting everything is rarely the best retrieval strategy. The model still has to find the relevant part, and you still pay for the context.
FAQ
How many tokens are in a book?
A typical 80,000-word novel is roughly 100,000 tokens. It can fit inside many current long-context products, but analysis can still be expensive and slow.
Do all models count tokens the same way?
No. Tokenizers differ. Use the provider's tokenizer or token-counting API for exact counts.
Why is code sometimes expensive in tokens?
Code contains punctuation, indentation, short identifiers, symbols, and repeated syntax. It often tokenizes differently from prose.
Are token limits per conversation or per request?
In APIs, they are usually per request. Chat products may resend conversation history behind the scenes, so long chats can consume more context over time.
Verified Sources
- OpenAI API pricing, accessed April 27, 2026: https://openai.com/api/pricing/
- OpenAI, “Introducing GPT-5.5,” April 23, 2026: https://openai.com/index/introducing-gpt-5-5/
- Anthropic pricing, accessed April 27, 2026: https://www.anthropic.com/pricing
- Anthropic Claude Opus page, accessed April 27, 2026: https://www.anthropic.com/claude/opus
- Google AI for Developers, “Gemini models,” accessed April 27, 2026: https://ai.google.dev/gemini-api/docs/models
- xAI Docs, “Models and Pricing,” accessed April 27, 2026: https://docs.x.ai/developers/models