Top-p is an alternative or complement to temperature for steering output diversity. Instead of sampling from all possible tokens, the model restricts itself to the smallest set of most probable tokens whose cumulative probability reaches the threshold p, then samples from that set. Low top-p values make the output more focused and predictable; high values make it more varied. In practice you usually adjust either temperature or top-p, not both strongly at once.
Top-p (Nucleus Sampling)
Top-p (nucleus sampling) is a sampling parameter that limits the model's choice to the most probable tokens whose cumulative probability reaches p. At top-p 0.9 it only considers the smallest set of top-ranked tokens that together make up at least 90 percent of the probability mass.
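
The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular model or library implements it: the function names `top_p_filter` and `sample_top_p` and the four-token distribution are hypothetical, and real systems operate on full vocabularies of logits rather than a small dictionary.

```python
import random


def top_p_filter(probs, p):
    """Return the nucleus: the smallest set of top-ranked tokens whose
    cumulative probability reaches p, renormalized to sum to 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # threshold reached; drop the remaining tail
    # Renormalize so the kept probabilities form a valid distribution.
    return {token: prob / cumulative for token, prob in nucleus}


def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = list(nucleus.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution (made-up numbers for illustration).
probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}
print(top_p_filter(probs, 0.9))
print(sample_top_p(probs, 0.9))
```

With these made-up numbers, a top-p of 0.9 keeps "cat", "dog", and "bird" and drops "fish", because the first two tokens only cover 80 percent of the mass. Lowering top-p to 0.5 leaves only "cat", which is why low values make the output more predictable.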
