Skip to main content

Tokenization

Tokenization is the process of breaking text into small units (tokens) that an AI model can process. It is the first step before a language model understands a prompt.

In tokenization, text is split into common word building blocks according to a fixed vocabulary, so that even unknown words can be represented. Different models use different tokenizers, which is why the same text can yield different token counts. Tokenization directly affects cost, speed, and the usage of the context window. Understanding tokenization helps design prompts that are more efficient and cheaper.

Related terms

From term to practice

Save, version, and share your best prompts with Prompt2Love.

Get started free