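The softmax regression model can be used for classification problems with $k$ classes. The model is composed of one probability distribution per class. So, for an input $x$ and parameters $\theta = (\theta_1, \dots, \theta_k)$, the activation function is given by:

$$h_\theta(x) = \begin{bmatrix} P(y = 1 \mid x; \theta) \\ \vdots \\ P(y = k \mid x; \theta) \end{bmatrix}$$

And

$$P(y = j \mid x; \theta) = \frac{\exp(\theta_j^T x)}{\sum_{l=1}^{k} \exp(\theta_l^T x)}$$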

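The error function is given by:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} \delta_{y^{(i)} j} \, \log P(y^{(i)} = j \mid x^{(i)}; \theta)$$

Here $(x^{(i)}, y^{(i)})$, for $i = 1, \dots, m$, are the training examples and $P(y = j \mid x; \theta)$ is the softmax probability of class $j$. Notice the Kronecker delta $\delta_{y^{(i)} j}$, with value 1 when $y^{(i)} = j$, and zero otherwise. This is needed to ensure that each activation function contributes only for the training examples of its own class.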
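As a minimal sketch of the two definitions above, the following Python code evaluates the softmax probabilities and the error on a tiny data set. It assumes the usual softmax form $\exp(\theta_j^T x) / \sum_l \exp(\theta_l^T x)$ and an error averaged over the $m$ examples; the function names `softmax` and `error`, the list-of-lists layout for the parameters, and the toy data are illustrative choices, and classes are numbered from 0 rather than 1.

```python
import math

def softmax(theta, x):
    """Return [P(y = j | x; theta) for each class j].

    theta: list of k parameter vectors (one per class); x: one input vector.
    """
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)                      # shift scores for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def error(theta, X, y):
    """Average negative log-likelihood over the training set.

    X: list of input vectors; y: list of integer labels in 0..k-1.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        probs = softmax(theta, x_i)
        # The Kronecker delta keeps only the term whose class matches the label.
        total += sum((1.0 if y_i == j else 0.0) * math.log(p)
                     for j, p in enumerate(probs))
    return -total / len(X)

# Toy example (hypothetical data): k = 3 classes, 2 features.
theta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
X = [[1.0, 2.0], [0.5, -1.0]]
y = [0, 2]
print(softmax(theta, X[0]))
print(error(theta, X, y))
```

With all parameters at zero every class receives probability $1/k$, so the error equals $\log k$, which is a handy sanity check before training.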

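We want to know $\nabla_{\theta_n} J(\theta)$, the gradient of the error with respect to the parameters $\theta_n$ of class $n$, and in order to do so we will first compute $\nabla_{\theta_n} P(y = j \mid x; \theta)$ for two cases. Writing $p_j = P(y = j \mid x; \theta) = \exp(\theta_j^T x) / \sum_{l} \exp(\theta_l^T x)$, for $j = n$ we have:

$$\nabla_{\theta_n} p_n = \frac{x \exp(\theta_n^T x)}{\sum_{l} \exp(\theta_l^T x)} - \frac{\exp(\theta_n^T x) \, x \exp(\theta_n^T x)}{\left(\sum_{l} \exp(\theta_l^T x)\right)^2} = x \, p_n - x \, p_n \, p_n$$

and for $j \neq n$:

$$\nabla_{\theta_n} p_j = -\frac{\exp(\theta_j^T x) \, x \exp(\theta_n^T x)}{\left(\sum_{l} \exp(\theta_l^T x)\right)^2} = -x \, p_j \, p_n$$

From the results we may observe that the difference between the two is a single term, $x \, p_j$, that is present only when $j = n$. So we can write both in a single formula:

$$\nabla_{\theta_n} p_j = x \, p_j \, (\delta_{jn} - p_n)$$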
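One way to gain confidence in the single formula is a numerical check. The sketch below assumes the softmax form $p_j = \exp(\theta_j^T x) / \sum_l \exp(\theta_l^T x)$ and compares the closed form $x \, p_j \, (\delta_{jn} - p_n)$ against central finite differences for every pair $(j, n)$. The helper names `analytic_grad` and `numeric_grad` and the sample values are hypothetical.

```python
import math

def softmax(theta, x):
    """Return [p_j for each class j] for one input vector x."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)                      # shift scores for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def analytic_grad(theta, x, j, n):
    """Gradient of p_j with respect to theta_n: x * p_j * (delta_jn - p_n)."""
    p = softmax(theta, x)
    delta = 1.0 if j == n else 0.0
    return [xi * p[j] * (delta - p[n]) for xi in x]

def numeric_grad(theta, x, j, n, eps=1e-6):
    """Central finite differences on each component of theta_n."""
    grad = []
    for d in range(len(x)):
        plus = [row[:] for row in theta]
        minus = [row[:] for row in theta]
        plus[n][d] += eps
        minus[n][d] -= eps
        grad.append((softmax(plus, x)[j] - softmax(minus, x)[j]) / (2 * eps))
    return grad

# Arbitrary sample point: k = 3 classes, 2 features.
theta = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
x = [1.5, -0.5]

# Largest disagreement over all (j, n) pairs and all components.
max_gap = max(abs(a - b)
              for j in range(3) for n in range(3)
              for a, b in zip(analytic_grad(theta, x, j, n),
                              numeric_grad(theta, x, j, n)))
print(max_gap)
```

The printed gap should be tiny, limited only by the finite-difference step and floating-point rounding, for both the $j = n$ and $j \neq n$ cases.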

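With this result at hand, we are ready to finish the computation of the gradient of the error. Writing $p_j^{(i)} = P(y^{(i)} = j \mid x^{(i)}; \theta)$, applying the chain rule to the logarithm and substituting $\nabla_{\theta_n} p_j^{(i)} = x^{(i)} \, p_j^{(i)} \, (\delta_{jn} - p_n^{(i)})$ gives:

$$\nabla_{\theta_n} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} \delta_{y^{(i)} j} \, \frac{\nabla_{\theta_n} p_j^{(i)}}{p_j^{(i)}} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} \delta_{y^{(i)} j} \, x^{(i)} \, (\delta_{jn} - p_n^{(i)})$$

Since $\sum_{j} \delta_{y^{(i)} j} \, \delta_{jn} = \delta_{y^{(i)} n}$ and $\sum_{j} \delta_{y^{(i)} j} = 1$, this simplifies to:

$$\nabla_{\theta_n} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left( \delta_{y^{(i)} n} - P(y^{(i)} = n \mid x^{(i)}; \theta) \right)$$

This formula is ready to be used in a gradient descent algorithm capable of learning the (local) optimal parameters of the softmax activation function.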
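To illustrate how the gradient can drive learning, here is a minimal batch gradient descent sketch in plain Python. It assumes the gradient takes the form $-\frac{1}{m} \sum_i x^{(i)} (\delta_{y^{(i)} n} - p_n^{(i)})$ for class $n$. The function names, the learning rate, the number of steps, and the toy data (a constant bias feature plus one real feature) are all illustrative assumptions, not tuned values.

```python
import math

def softmax(theta, x):
    """Return [P(y = j | x; theta) for each class j]."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)                      # shift scores for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def error(theta, X, y):
    """Average negative log-probability of the correct class."""
    return -sum(math.log(softmax(theta, x_i)[y_i])
                for x_i, y_i in zip(X, y)) / len(X)

def gradient(theta, X, y):
    """Row n holds -(1/m) * sum_i x_i * (delta_{y_i n} - p_n)."""
    k, d, m = len(theta), len(theta[0]), len(X)
    grad = [[0.0] * d for _ in range(k)]
    for x_i, y_i in zip(X, y):
        p = softmax(theta, x_i)
        for n in range(k):
            coeff = (1.0 if y_i == n else 0.0) - p[n]
            for c in range(d):
                grad[n][c] -= coeff * x_i[c] / m
    return grad

def gradient_descent(X, y, k, learning_rate=0.5, steps=500):
    """Start from zero parameters and repeatedly step against the gradient."""
    theta = [[0.0] * len(X[0]) for _ in range(k)]
    for _ in range(steps):
        grad = gradient(theta, X, y)
        theta = [[t - learning_rate * g for t, g in zip(t_row, g_row)]
                 for t_row, g_row in zip(theta, grad)]
    return theta

# Hypothetical toy data: [bias, feature], three classes numbered 0..2.
X = [[1.0, -2.0], [1.0, -1.5], [1.0, 0.0], [1.0, 0.2], [1.0, 1.8], [1.0, 2.2]]
y = [0, 0, 1, 1, 2, 2]

theta = gradient_descent(X, y, k=3)
print("error at zero parameters:", math.log(3))
print("error after training:   ", error(theta, X, y))
print("predicted classes:      ",
      [max(range(3), key=lambda j: softmax(theta, x)[j]) for x in X])
```

The error after training should be lower than the starting value of $\log 3$. In practice the learning rate and number of steps would need adjusting to the data, and a vectorized library would replace the explicit loops.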

## Published by Eliezer Silva (zehsilva)

Researcher and engineer. Interested in Machine Learning, probabilistic models, mathematical modeling and many other applications of computational thinking. Also sometimes like to get a bit into politics and philosophy.