#concept
The softmax function turns a vector of real values (model predictions) into… (visualize equation)
?
probabilities that sum to 1.
It is used as an activation function. In a neural network, softmax is often the last activation function, where it normalizes the network's output into a probability distribution over the predicted output classes.
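standard (unit) softmax equation:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K$$

where
- σ(z) is the output: a vector of real numbers, each between 0 and 1, that together sum to 1
- z is the input vector
- K is the length of the input vector (the number of output classes)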
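A minimal sketch of the standard softmax in plain Python, assuming the input is a non-empty list of floats. The function name `softmax` and the sample scores are illustrative. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the note above; it does not change the result.

```python
import math

def softmax(z):
    """Map a list of K real numbers to K probabilities that sum to 1."""
    # Assumption: shift by max(z) so exp() cannot overflow; the shift
    # cancels out in the ratio, so the output is unchanged.
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical model scores, K = 3
print(probs)       # roughly [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0, up to floating-point rounding
```

Each output lies strictly between 0 and 1, and the largest input gets the largest probability.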
intuitively… softmax is great as an activation function because it is non-linear, so… >> the exponential exaggerates the differences between the dot products (the inputs z), and the denominator normalizes across all of them.
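To make the intuition concrete, here is a small sketch comparing softmax with a plain linear normalization (dividing each score by the sum). The scores stand in for dot products and are made up for illustration; `linear_normalize` is a hypothetical helper that only makes sense for positive scores.

```python
import math

def softmax(z):
    # Shift by the max for numerical stability (does not change the output).
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def linear_normalize(z):
    # Hypothetical baseline: divide by the sum (positive scores only).
    total = sum(z)
    return [x / total for x in z]

scores = [1.0, 2.0, 3.0]            # stand-ins for dot products
doubled = [2 * s for s in scores]   # same ratios, larger gaps

print(linear_normalize(scores))   # top score gets exactly 0.5
print(softmax(scores))            # top score gets roughly 0.665
print(linear_normalize(doubled))  # unchanged: linear scaling ignores gap size
print(softmax(doubled))           # top score gets roughly 0.867
```

Linear normalization only sees ratios, so doubling every score changes nothing. Softmax depends on the differences between scores, so widening the gaps pushes more probability onto the largest one.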