What Is the Softmax Function?

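The softmax function converts a vector of raw scores, or logits, into a probability distribution: it exponentiates each score and divides by the sum of the exponentials, so every output is positive and the outputs sum to one. In language models it is applied to the output logits to produce a probability for each token in the vocabulary. It is also used inside attention, where it turns similarity scores into weights that determine how much each token attends to the others.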
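
As a rough illustration of the definition, here is a minimal Python sketch. The function name `softmax` and the example scores are assumptions made for illustration, not taken from any particular library. It subtracts the largest score before exponentiating, a common trick that leaves the result unchanged but avoids overflow on large logits.

```python
import math


def softmax(logits):
    """Map a list of raw scores to probabilities that sum to one."""
    if not logits:
        raise ValueError("logits must be non-empty")
    # Shifting every score by the maximum does not change the output,
    # but it keeps math.exp from overflowing on large inputs.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The highest score receives the largest probability, and because the exponential grows quickly, gaps between scores are amplified rather than preserved proportionally.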
