#concept

The softmax function turns a vector of real values (model predictions) into probabilities that sum to 1.

It is often used as the last activation function of a neural network, normalizing the network's output to a probability distribution over the predicted output classes.

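standard (unit) softmax equation:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K$$

where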

  • sigma (σ) is the output: a vector of K real numbers, each between 0 and 1, that together sum to 1
  • z is the input vector
  • K is the length of z (the number of output classes)
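A minimal Python sketch of these definitions, using only the standard library. The function name `softmax` and the sample `logits` values are illustrative assumptions, not taken from any particular framework. Subtracting the maximum before exponentiating is a common numerical-stability trick; it does not change the result.

```python
import math

def softmax(z):
    """Map a list of K real numbers to K probabilities that sum to 1."""
    # Shifting by the max leaves the output unchanged but avoids overflow in exp().
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical model outputs for K = 3 classes
probs = softmax(logits)

print(probs)       # three values, each strictly between 0 and 1
print(sum(probs))  # approximately 1.0
```

The largest input always gets the largest probability, so the ordering of the classes is preserved.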

intuitively, softmax works well as an output activation because it is non-linear: the exponential amplifies the differences between the inputs (often dot products between vectors), and the denominator then normalizes across all of those dot products.
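One way to see the amplification is to compare softmax against plain linear normalization (dividing each score by the sum). The sketch below is illustrative only; `linear_normalize` and the sample `scores` are hypothetical names and values chosen for the comparison.

```python
import math

def softmax(z):
    # Exponentiate (shifted by the max for stability), then normalize.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def linear_normalize(z):
    # Only meaningful for positive scores; used here purely for comparison.
    total = sum(z)
    return [v / total for v in z]

scores = [1.0, 2.0, 3.0]  # hypothetical dot-product scores

lin = linear_normalize(scores)
soft = softmax(scores)

# Ratio between the largest and smallest probability under each scheme.
lin_ratio = lin[-1] / lin[0]     # about 3, the ratio of the raw scores
soft_ratio = soft[-1] / soft[0]  # exp(3 - 1) = exp(2), roughly 7.39

print(lin_ratio, soft_ratio)
```

Under softmax the ratio between two probabilities depends on the difference of the inputs through the exponential, so a gap of 2 between scores becomes a factor of about 7.39 rather than 3.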

References

Notes