Description
1. (20 pts.) Consider the multiclass logistic regression model.
(a) (5 pts.) Show that the derivatives of the softmax activation function are given by
\[
\frac{\partial y_k}{\partial a_j} = y_k (\delta_{kj} - y_j).
\]
(b) (15 pts.) Derive the batch and single-sample gradient descent weight update rules for
minimizing the cross-entropy error function.
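
The following is a minimal Python sketch for numerically checking one's answers to both parts. It is not a substitute for the derivations. It makes several assumptions the problem does not state: activations are linear, a_k = w_k . x, targets t are one-hot, and eta is a learning rate. All function names are hypothetical. The gradient used in `gradient` is the standard result for softmax with cross-entropy and is exactly what part (b) asks you to derive, so treat it as something to verify rather than as given.

```python
import math

def softmax(a):
    """Softmax of a list of activations, shifted by max(a) for stability."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_jacobian(a):
    """Analytic Jacobian from part (a): J[k][j] = y_k * (delta_kj - y_j)."""
    y = softmax(a)
    K = len(y)
    return [[y[k] * ((1.0 if k == j else 0.0) - y[j]) for j in range(K)]
            for k in range(K)]

def numeric_jacobian(a, eps=1e-6):
    """Central finite-difference estimate of dy_k/da_j."""
    K = len(a)
    J = [[0.0] * K for _ in range(K)]
    for j in range(K):
        up = list(a)
        up[j] += eps
        dn = list(a)
        dn[j] -= eps
        y_up, y_dn = softmax(up), softmax(dn)
        for k in range(K):
            J[k][j] = (y_up[k] - y_dn[k]) / (2 * eps)
    return J

def predict(W, x):
    """Assumed model: a_k = w_k . x, y = softmax(a). W holds one row per class."""
    return softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in W])

def cross_entropy(W, X, T):
    """E(W) = -sum_n sum_k t_nk * log(y_nk)."""
    total = 0.0
    for x, t in zip(X, T):
        y = predict(W, x)
        total -= sum(tk * math.log(yk) for tk, yk in zip(t, y))
    return total

def gradient(W, X, T):
    """dE/dw_j = sum_n (y_nj - t_nj) * x_n (the result to verify in part (b))."""
    G = [[0.0] * len(W[0]) for _ in W]
    for x, t in zip(X, T):
        y = predict(W, x)
        for j in range(len(W)):
            for i in range(len(x)):
                G[j][i] += (y[j] - t[j]) * x[i]
    return G

def batch_step(W, X, T, eta):
    """Batch rule: one update using the gradient summed over all samples."""
    G = gradient(W, X, T)
    return [[w - eta * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, G)]

def single_sample_step(W, x, t, eta):
    """Single-sample rule: the same update using one (x, t) pair only."""
    return batch_step(W, [x], [t], eta)
```

Comparing `softmax_jacobian(a)` with `numeric_jacobian(a)` at a few random points is a quick sanity check for part (a). For part (b), calling `batch_step` repeatedly on a small data set and watching `cross_entropy` decrease gives a similar check, provided eta is small enough.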