The max tokens parameter sets the maximum number of tokens the model may generate before it stops. Once the limit is reached, generation ends immediately, even mid-sentence. The value must fit within the context window, because input and output tokens together cannot exceed it. A sensible cap reduces cost and latency, but it should be large enough for the model to finish a complete answer.
Max Tokens
Max tokens is a parameter that caps the length of the model's output, measured in tokens. It prevents overly long answers and helps control cost and latency.
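
The relationship between the cap and the context window can be illustrated with a small sketch. It is not tied to any particular provider's API: the function names `clamp_max_tokens` and `generate_capped`, the 8192-token window, and the use of whole words as stand-ins for tokens are all assumptions made for illustration.

```python
from typing import List, Tuple


def clamp_max_tokens(requested: int, prompt_tokens: int, context_window: int) -> int:
    """Return an output cap that fits alongside the prompt in the context window.

    Hypothetical helper: input and output tokens together must not exceed
    the context window, so the cap is limited to the space the prompt leaves.
    """
    if requested < 1:
        raise ValueError("requested cap must be at least 1 token")
    if prompt_tokens >= context_window:
        raise ValueError("prompt alone fills the context window")
    available = context_window - prompt_tokens
    return min(requested, available)


def generate_capped(tokens: List[str], max_tokens: int) -> Tuple[List[str], bool]:
    """Simulate a hard output cap on an already-known token sequence.

    Returns the tokens actually emitted and a flag telling whether the
    output was cut off by the cap (possibly mid-sentence).
    """
    emitted = tokens[:max_tokens]
    hit_limit = len(tokens) > max_tokens
    return emitted, hit_limit


# Assumed numbers: an 8192-token window with a 7500-token prompt leaves 692.
cap = clamp_max_tokens(requested=1000, prompt_tokens=7500, context_window=8192)

# Words stand in for tokens here; real tokenizers split text differently.
answer = ["The", "answer", "is", "forty", "two", "."]
emitted, hit_limit = generate_capped(answer, max_tokens=3)
```

In this sketch `cap` comes out smaller than the requested 1000 because the prompt leaves less room than that, and `emitted` stops after three tokens with `hit_limit` set, which is the mid-sentence cutoff described above. Checking such a flag lets a caller decide whether to retry with a larger cap.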
