Inference

Inference is the process by which a trained AI model produces an output from an input, for example generating an answer to a prompt. It differs from training, the earlier phase in which the model's parameters are adjusted to fit data.

During inference the model applies what it has already learned without changing its internal parameters. It is the step that runs every time you interact with ChatGPT or Claude. Latency (how quickly a response arrives) shapes the user experience, and the number of tokens processed largely determines cost. Optimizations such as quantization (storing weights at lower numeric precision) and caching (reusing previously computed results) make inference faster and cheaper.
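The distinction between inference and training, and the effect of caching, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy linear model, the `PARAMS` values, and the function names `infer`, `train_step`, and `cached_infer` are hypothetical and do not correspond to how any production model or API is implemented.

```python
from functools import lru_cache

# Hypothetical "trained" parameters of a toy linear model y = w * x + b.
PARAMS = {"w": 2.0, "b": 1.0}


def infer(x: float) -> float:
    """Inference: read the parameters to compute an output, never modify them."""
    return PARAMS["w"] * x + PARAMS["b"]


def train_step(x: float, target: float, lr: float = 0.1) -> None:
    """Training: one gradient-descent step on squared error, which updates PARAMS."""
    error = infer(x) - target
    PARAMS["w"] -= lr * 2 * error * x
    PARAMS["b"] -= lr * 2 * error


@lru_cache(maxsize=128)
def cached_infer(x: float) -> float:
    """Caching: a repeated input is answered from the cache instead of recomputed.

    The cache assumes PARAMS stay fixed; call cached_infer.cache_clear()
    after any training step so stale results are not returned.
    """
    return infer(x)
```

Calling `infer(3.0)` any number of times leaves `PARAMS` untouched, while a single `train_step(3.0, 10.0)` changes both `w` and `b`. Calling `cached_infer` twice with the same input computes the result only once, which is the same idea, at a much smaller scale, as the caching optimizations mentioned above.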
