Description
1. Equivalence of negative log probability and logistic loss (10 points)
After replacing the label set from $\{0,1\}$ with $\{-1,1\}$, we introduced the log loss
$$D_{\log}(y, x; M) = \frac{1}{\log 2} \log\left(1 + \exp(-s(y, x; M))\right)$$
as an alternative to the logistic regression distance function above. Show that these two are equivalent up to a multiplicative constant for logistic regression.
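As a quick numerical sanity check (not a substitute for the required proof), the sketch below assumes that $s(y, x; M)$ is the signed score of the correct label and that the logistic regression distance function is the negative base-2 log probability of that label; the variable names are illustrative.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Assumed: s is the signed score of the correct label (an assumption for illustration).
s = np.linspace(-5.0, 5.0, 11)

# Negative log (base 2) probability of the correct label under logistic regression.
neg_log2_prob = -np.log2(sigmoid(s))

# The log loss D_log as defined above, including the 1/log(2) factor.
d_log = np.log1p(np.exp(-s)) / np.log(2)

# The two curves coincide here; in general they agree up to a multiplicative
# constant determined by the base of the logarithm used in the distance function.
print(np.allclose(neg_log2_prob, d_log))  # True
```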
2. Hinge loss gradients (10 points)
Unlike the log loss, the hinge loss, defined below, is not differentiable everywhere:
$$D_{\text{hinge}}(y, x; M) = \max(0, 1 - s(y, x; M)).$$
Does this mean that we cannot use a gradient-based optimization algorithm to find a solution that minimizes the hinge loss? If not, what can we do about it?
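As an illustration of how optimization can still proceed despite the kink at $s = 1$, the sketch below uses a subgradient of the hinge loss for an assumed linear score $s(y, x; M) = y\,(w^\top x + b)$; the function name, the linear-score assumption, and the choice of value at the kink are all illustrative, not part of the problem statement.

```python
import numpy as np

def hinge_subgradient(w, b, x, y):
    """Return a subgradient of max(0, 1 - y*(w.x + b)) w.r.t. (w, b).

    Assumes a linear score s(y, x; M) = y * (w.x + b). At the kink
    (margin exactly 1) any value in the subdifferential is valid; we pick 0.
    """
    margin = y * (np.dot(w, x) + b)
    if margin >= 1.0:               # loss is 0 and flat in this region
        return np.zeros_like(w), 0.0
    return -y * x, -float(y)        # gradient of 1 - y*(w.x + b)

# One illustrative subgradient-descent step (step size chosen arbitrarily).
w, b = np.zeros(2), 0.0
x, y = np.array([1.0, -2.0]), 1
gw, gb = hinge_subgradient(w, b, x, y)
w, b = w - 0.1 * gw, b - 0.1 * gb
```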
3. Model Selection (10 points)
Consider that we are learning a logistic regression $M^{(1)}$ and a perceptron $M^{(2)}$, and we have three dataset partitions: a training set $D_{\text{train}}$, a validation set $D_{\text{val}}$, and a test set $D_{\text{test}}$. The two models are iteratively optimized on $D_{\text{train}}$ over $T$ steps, and now we have $T$ logistic regression parameter configurations (i.e. weights and biases) $M^{(1)}_1, M^{(1)}_2, \ldots, M^{(1)}_T$ and $T$ perceptron configurations $M^{(2)}_1, M^{(2)}_2, \ldots, M^{(2)}_T$, all with different parameters. We now evaluate the expected cost for all $2T$ models on the training set, validation set, and test set, so we have $6T$ quantities $\tilde{R}^{(i)}_{\text{train},t}$, $\tilde{R}^{(i)}_{\text{val},t}$, $\tilde{R}^{(i)}_{\text{test},t}$, where $i = 1,2$ and $t = 1, \ldots, T$.
(a) Which $i$ and $t$ should we pick as the best model? (5 points)
(b) How should we report the generalization error? (5 points)
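A minimal sketch of one standard selection recipe (the questions above ask you to state and justify your own choice), assuming the $6T$ quantities are stored in arrays `R_train`, `R_val`, `R_test` of shape `(2, T)`; these names, shapes, and the placeholder values are assumptions for illustration.

```python
import numpy as np

T = 5  # illustrative number of optimization steps

# Assumed containers for the 6T quantities; row 0 = logistic regression M^(1),
# row 1 = perceptron M^(2), column t = configuration after step t+1.
R_train = np.random.rand(2, T)   # placeholder values
R_val   = np.random.rand(2, T)
R_test  = np.random.rand(2, T)

# Pick the configuration (i, t) with the lowest validation cost ...
i_best, t_best = np.unravel_index(np.argmin(R_val), R_val.shape)

# ... and report the test cost of that single configuration as the
# estimate of its generalization error.
generalization_estimate = R_test[i_best, t_best]
print(i_best + 1, t_best + 1, generalization_estimate)
```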
4. Image Recovery & Numerical Stability (20 points)