Description
1. Cross-validation is a useful strategy for model selection, especially when the training data is small. However, it cannot be used for early stopping (in other words, you cannot pick the best fold). Why is this the case?
2. In multiclass classification, given the definitions in the lecture notes, derive the following distance function, defined as $D(y^*, M, x) = -\log p_M(y^* \mid x) = -a_{y^*} + \log \sum_{k=1}^{K} \exp(a_k)$.
3. Given the definition of the distance function above, derive a learning rule step by step for each column vector wc of the weight matrix W (Equation 1.28 in the lecture notes).
4. Multiclass Classification on MNIST
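
As a numerical sanity check for the distance function in item 2, the following is a minimal sketch, not the lecture notes' implementation. It assumes the activations are computed as a = W^T x with no bias term, so that a_k is the inner product of column w_k of W with x. That convention, the function names `distance` and `softmax`, and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def distance(y_star, W, x):
    """Compute D(y*, M, x) = -a_{y*} + log sum_k exp(a_k).

    Assumption: a = W^T x, so column w_k of W produces activation a_k.
    """
    a = W.T @ x
    # Subtract the maximum before exponentiating to avoid overflow;
    # this leaves the value of log-sum-exp unchanged.
    a_max = np.max(a)
    log_sum_exp = a_max + np.log(np.sum(np.exp(a - a_max)))
    return -a[y_star] + log_sum_exp


def softmax(a):
    """Turn activations into class probabilities."""
    e = np.exp(a - np.max(a))
    return e / e.sum()


# Toy data (hypothetical): 4 input features, K = 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x = rng.normal(size=4)
y_star = 1

d = distance(y_star, W, x)
p = softmax(W.T @ x)

# The distance should match the negative log-probability of the true class.
print(np.isclose(d, -np.log(p[y_star])))
```

Comparing `d` against `-np.log(p[y_star])` checks that the two forms of the distance function agree on one example. It does not replace the derivation asked for above.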