The soft-max regression model can be used in the $K$-class classification problem. The model composes the class activations into a probability distribution over the classes, so the activation function is given by:

$$ y_k = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)}, \qquad a_k = \mathbf{w}_k^{\top}\mathbf{x}. $$
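As a minimal sketch, the activation above can be written as follows (the function name `softmax` and the max-subtraction trick for numerical stability are my additions, not part of the derivation):

```python
import numpy as np

def softmax(a):
    """Soft-max activation: y_k = exp(a_k) / sum_j exp(a_j)."""
    # Subtracting the max before exponentiating avoids overflow;
    # the constant cancels in the ratio, so the result is unchanged.
    z = np.exp(a - np.max(a))
    return z / np.sum(z)

y = softmax(np.array([2.0, 1.0, 0.1]))
# The outputs form a probability distribution over the classes.
```

The outputs are positive and sum to one, which is what lets us read them as class probabilities.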
The error function is the cross-entropy over the $N$ training samples:

$$ E = -\sum_{n=1}^{N}\sum_{k=1}^{K} t_{nk} \ln y_{nk}. $$
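A direct transcription of this error, assuming the outputs `Y` and one-hot targets `T` are stored as $N \times K$ arrays (the small `eps` guard against $\ln 0$ is my addition):

```python
import numpy as np

def cross_entropy(Y, T):
    """E = -sum_n sum_k t_nk * ln(y_nk), for one-hot targets T."""
    eps = 1e-12  # guard against log(0) for numerically zero outputs
    return -np.sum(T * np.log(Y + eps))
```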
Notice the Kronecker delta in the targets: $t_{nk} = \delta_{k c_n}$, with value 1 when $k = c_n$, the class of training sample $n$, and zero otherwise. This is needed to enforce that only the activation for the appropriate training class contributes to the error.
We want to know $\frac{\partial E}{\partial a_j}$, and in order to do so we first compute $\frac{\partial y_k}{\partial a_j}$ for two cases. For $j = k$, the quotient rule gives

$$ \frac{\partial y_k}{\partial a_k} = \frac{e^{a_k}\sum_i e^{a_i} - e^{a_k}e^{a_k}}{\left(\sum_i e^{a_i}\right)^2} = y_k(1 - y_k), $$

and for $j \neq k$,

$$ \frac{\partial y_k}{\partial a_j} = \frac{-e^{a_k}e^{a_j}}{\left(\sum_i e^{a_i}\right)^2} = -y_k y_j. $$
From these results we observe that the two cases differ by a single term, present only when $j = k$, so we can write both in a single formula using the Kronecker delta:

$$ \frac{\partial y_k}{\partial a_j} = y_k(\delta_{kj} - y_j). $$
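The combined formula can be sanity-checked numerically: below, a sketch that builds the Jacobian $J_{kj} = y_k(\delta_{kj} - y_j)$ and compares it against central finite differences (the helper names and the test point `a` are illustrative assumptions):

```python
import numpy as np

def softmax(a):
    z = np.exp(a - np.max(a))
    return z / np.sum(z)

def softmax_jacobian(a):
    """J[k, j] = dy_k/da_j = y_k * (delta_kj - y_j)."""
    y = softmax(a)
    return np.diag(y) - np.outer(y, y)

# Central finite-difference check of the closed-form Jacobian.
a = np.array([0.5, -1.0, 2.0])
eps = 1e-6
J_num = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3)
    d[j] = eps
    J_num[:, j] = (softmax(a + d) - softmax(a - d)) / (2 * eps)

assert np.allclose(softmax_jacobian(a), J_num, atol=1e-8)
```

Note that each row of the Jacobian sums to zero, as it must: the outputs always sum to one, so any perturbation of the inputs leaves that total unchanged.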
With this result at hand, we are ready to finish the computation of the gradient of the error. For a single sample, using $\sum_k t_k = 1$:

$$ \frac{\partial E}{\partial a_j} = -\sum_{k=1}^{K} \frac{t_k}{y_k}\, y_k(\delta_{kj} - y_j) = -t_j + y_j \sum_{k=1}^{K} t_k = y_j - t_j. $$

Summing over the training set and applying the chain rule through $a_j = \mathbf{w}_j^{\top}\mathbf{x}$:

$$ \nabla_{\mathbf{w}_j} E = \sum_{n=1}^{N} (y_{nj} - t_{nj})\,\mathbf{x}_n. $$
This formula is ready to be used in a gradient-descent algorithm capable of learning the (locally) optimal parameters of the soft-max regression model.
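As a sketch of that algorithm, the following puts the pieces together on a hypothetical toy dataset; the data shapes, learning rate, and iteration count are illustrative assumptions, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: N samples, D features, K classes.
# Labels come from a hidden linear model so they are learnable.
N, D, K = 200, 2, 3
X = rng.normal(size=(N, D))
W_true = rng.normal(size=(D, K))
labels = np.argmax(X @ W_true, axis=1)
T = np.eye(K)[labels]                        # one-hot targets t_nk

W = np.zeros((D, K))
lr = 0.1
for _ in range(100):
    A = X @ W                                # activations a_nk
    Y = np.exp(A - A.max(axis=1, keepdims=True))
    Y /= Y.sum(axis=1, keepdims=True)        # row-wise soft-max
    grad = X.T @ (Y - T)                     # gradient sum_n (y_n - t_n) x_n
    W -= lr * grad / N                       # gradient-descent step
```

Batch gradient descent is used here for clarity; in practice a stochastic or mini-batch variant of the same update is more common on large training sets.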