A Guide to Controlling LLM Model Output: Exploring Top-k, Top-p, and Temperature Parameters

Vibudh Singh
Sep 22, 2023


You might have used ChatGPT or another major LLM to build a system, perform classification, answer questions, or assist with various creative and informative tasks. However, controlling the output of these models to meet specific requirements or match a desired style is crucial. In this article, we will focus on three essential parameters that influence the output of a language model: top-k, top-p, and temperature.

Before we dive into these parameters, we need to understand the difference between greedy sampling and random sampling. Greedy sampling always picks the highest-probability token, ensuring a focused but repetitive output, while random (weighted) sampling draws the next token in proportion to its probability, resulting in more varied and creative output. Most LLMs these days (such as GPT, Llama-2, Claude, etc.) use random sampling by default, and hence we need the top-k and top-p parameters to control this randomness.
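To make the distinction concrete, here is a minimal Python sketch using NumPy (the four-word distribution is made up, and deliberately matches the worked example later in this article):

```python
import numpy as np

# A toy next-token distribution (made-up values, reused in the example later on)
tokens = ["blue", "limit", "clear", "overcast"]
probs = np.array([0.3, 0.4, 0.2, 0.1])

# Greedy sampling: always pick the single most probable token
greedy_choice = tokens[int(np.argmax(probs))]  # -> "limit", every time

# Random (weighted) sampling: draw a token in proportion to its probability
rng = np.random.default_rng(seed=0)
random_choice = rng.choice(tokens, p=probs)  # "limit" ~40% of the time, "blue" ~30%, ...
```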

Controlling the Randomness: Top-k and Top-p

1. Top-k

The top-k parameter limits the model’s predictions to the k most probable tokens at each step of generation. By setting a value for k, you instruct the model to sample only from the k most likely tokens, which helps keep the generated output within specific patterns or constraints.
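As an illustration, here is a minimal NumPy sketch of top-k filtering, not the exact code of any particular library (the helper name top_k_filter is ours):

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k most probable tokens and renormalize."""
    filtered = np.zeros_like(probs)
    top_indices = np.argsort(probs)[-k:]  # indices of the k largest probabilities
    filtered[top_indices] = probs[top_indices]
    return filtered / filtered.sum()  # renormalize so the kept probabilities sum to 1

probs = np.array([0.3, 0.4, 0.2, 0.1])  # blue, limit, clear, overcast
print(top_k_filter(probs, k=2))  # [0.43 0.57 0.   0.  ] -> only blue and limit survive
```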

2. Top-p (Nucleus Sampling)

Top-p, also known as nucleus sampling, restricts sampling to the smallest set of most probable tokens whose cumulative probability exceeds the chosen threshold (p). Rather than fixing the number of candidates, this approach adapts the candidate pool to the shape of the distribution, including less probable tokens when the model is uncertain and encouraging diversity in the output.
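A matching sketch for nucleus sampling, under the same assumptions (the helper name top_p_filter is ours): sort the tokens by probability and keep the smallest prefix whose cumulative probability reaches p.

```python
import numpy as np

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = np.argsort(probs)[::-1]  # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # tokens needed to reach p
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()  # renormalize over the nucleus

probs = np.array([0.3, 0.4, 0.2, 0.1])  # blue, limit, clear, overcast
print(top_p_filter(probs, p=0.6))  # keeps limit (0.4) and blue (0.3), renormalized
```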

Note: Top-k provides controlled randomness by considering a fixed number of the most probable tokens, while top-p dynamically adjusts how many tokens are considered, leading to different levels of diversity in the generated text.

(Image: Three Sisters, Canmore, Alberta, by Dean McLeod Photography)

A Worked Example: Controlling Model Randomness

Let’s consider a simplified vocabulary of only 4 words with associated probabilities:

  • Position 1: blue (probability 0.3)
  • Position 2: limit (probability 0.4)
  • Position 3: clear (probability 0.2)
  • Position 4: overcast (probability 0.1)

Task: complete the sentence with one word: “The sky is…”

Note that the generative configuration for the LLM in our scenario is random sampling, not greedy. Hence, even though there is a 40% chance of selecting the word limit, the model can instead choose the word blue (position 1) because of random (weighted) sampling.

We’ll explore how adjusting the top-k and top-p (nucleus sampling) parameters will affect the model’s response in this constrained vocabulary.

Top-k Parameter:

Setting: top-k = 2 (consider only the 2 most probable words at each step)
With top-k set to 2, the model samples from only the 2 most probable words, limit (0.4) and blue (0.3), with their probabilities renormalized.
So the generated output will always be either limit or blue.

Top-p (Nucleus Sampling) Parameter:

Setting: top-p = 0.2 (consider tokens, in decreasing order of probability, until their cumulative probability reaches 0.2)
Sorting by probability, the single word limit (0.4) already reaches the 0.2 threshold on its own, so the nucleus contains just that one word. In this case the generated output will be limit every time.
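Running the two helpers sketched earlier on this toy vocabulary reproduces both results:

```python
# Assumes top_k_filter and top_p_filter from the sketches above
probs = np.array([0.3, 0.4, 0.2, 0.1])  # blue, limit, clear, overcast

print(top_k_filter(probs, k=2))    # [0.43 0.57 0.   0.  ] -> samples "blue" or "limit"
print(top_p_filter(probs, p=0.2))  # [0.   1.   0.   0.  ] -> "limit" every time
```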

3. Temperature

The temperature parameter also controls the randomness of the output, but it reshapes the probability distribution over the next token rather than truncating the candidate list. Temperature is a scaling factor applied before the model’s final softmax layer: each logit is divided by the temperature. A higher temperature (>1) flattens the distribution, producing more randomness and diversity, as the model is more likely to explore a wider range of possible tokens. Conversely, a lower temperature (<1) sharpens the distribution, producing more focused and near-deterministic output that emphasizes the most probable tokens. A temperature of 1 leaves the distribution unchanged.
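A small sketch makes the scaling visible (the logits here are made-up scores, not taken from a real model):

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Divide the logits by the temperature before applying softmax."""
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.5, 1.0, 0.5])  # made-up raw scores for our 4 tokens
print(softmax_with_temperature(logits, 0.5))  # sharper: the top token dominates
print(softmax_with_temperature(logits, 1.0))  # the model's unmodified distribution
print(softmax_with_temperature(logits, 2.0))  # flatter: closer to uniform
```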

Summary

  • For creative tasks, higher values of top-p and temperature encourage diversity, aiding creativity. A moderate top-k value can balance creativity and coherence.
  • For deterministic output, a low top-k (k = 1 or 2), a very low top-p (close to 0), and a temperature close to 0 yield the most probable, near-deterministic responses. The sketch below shows how these knobs are set in practice.
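As a practical note, here is how these knobs map onto one common interface, the Hugging Face transformers generate API (the model name and the specific values below are arbitrary illustrative choices):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # "gpt2" is just a stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The sky is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,     # random (weighted) sampling instead of greedy decoding
    top_k=50,           # keep only the 50 most probable tokens
    top_p=0.95,         # ...then trim to a 0.95 nucleus
    temperature=0.8,    # mildly sharpen the distribution
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```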

Thank you!

Hope you enjoyed reading. Consider following. My Medium blog features concise yet insightful articles exploring the latest topics on Artificial Intelligence (AI), Large Language Models (LLM), Generative AI, and Natural Language Processing (NLP). Stay updated by subscribing for a regular dose of cutting-edge knowledge.
