Loss functions

For a classifier, the output is an array of unnormalized scores (logits) $x_i$, $i=0\ldots K-1$, one for each of the $K$ classes. Assuming the target class is $i=y$, the cross-entropy loss for each sample is given by \begin{equation} l_\text{CE}(\mathbf{x}) = -\ln\frac{e^{x_y}}{\sum_ie^{x_i}} = -x_y+\ln\sum_ie^{x_i}, \end{equation} while \begin{equation} L_\text{CE} = \langle l_\text{CE}\rangle \end{equation} is the average over a given dataset.
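As a minimal sketch of the per-sample loss, the snippet below evaluates $-x_y+\ln\sum_ie^{x_i}$ with NumPy. The function name `cross_entropy`, the `(N, K)` array layout, and the example scores are assumptions made for illustration. Subtracting the row maximum before exponentiating is a common numerical-stability trick that leaves the value of the loss unchanged.

```python
import numpy as np

def cross_entropy(scores, targets):
    """Per-sample cross-entropy loss from unnormalized scores.

    scores:  array of shape (N, K), one row of K class scores per sample
             (layout assumed for this sketch).
    targets: integer array of shape (N,), the target class y of each sample.
    """
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets)
    # Shift each row by its maximum; the shift cancels between the two terms.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1))
    # l_CE = -x_y + ln(sum_i exp(x_i)), evaluated on the shifted scores.
    return -shifted[np.arange(len(targets)), targets] + log_sum_exp

# Hypothetical example: two samples, three classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])

losses = cross_entropy(scores, targets)  # l_CE for each sample
mean_loss = losses.mean()                # L_CE, the dataset average
```

For the second sample all scores are equal, so the loss reduces to $\ln K = \ln 3$, which is a quick sanity check on the formula.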

The prediction is inferred from the output $x_i$ with \begin{equation}r = \mathop{\text{argmax}}_ix_i,\end{equation} and the prediction error over a dataset is given by \begin{equation}E_\text{pred} = \langle[y\neq\mathop{\text{argmax}}_ix_i]\rangle = \langle 1-\delta_{r,y}\rangle\end{equation} where $[\cdot]$ is the Iverson bracket and $\delta$ is the Kronecker delta.
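The prediction error can be sketched the same way: take the argmax of each row of scores and average the mismatches against the targets. The name `prediction_error` and the example data are assumptions for illustration. Note that `np.argmax` resolves ties by returning the lowest index, a convention the definition above does not specify.

```python
import numpy as np

def prediction_error(scores, targets):
    """Fraction of samples whose argmax prediction differs from the target."""
    predictions = np.argmax(scores, axis=1)  # r = argmax_i x_i for each sample
    # The boolean comparison plays the role of the Iverson bracket [y != r].
    return np.mean(predictions != np.asarray(targets))

# Hypothetical example: four samples, three classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.2, 0.3, 0.1],
                   [0.0, 1.0, 3.0],
                   [1.5, 0.5, 0.2]])
targets = np.array([0, 0, 2, 1])

error = prediction_error(scores, targets)
```

Here the predictions are classes 0, 1, 2, 0, so two of the four samples are misclassified.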