Description
1. When we use the empirical cost to approximate the expected cost,
\[
\mathbb{E}_{x \sim X}\left[ D(M^*(x), M, x) \right] \approx \frac{1}{N} \sum_{n=1}^{N} D(M^*(x_n), M, x_n),
\]
is it okay to weigh each per-example cost equally? Given that we established that not every data point $x$ is equally likely, is taking the sum of all per-example costs and dividing by $N$ reasonable? Should we weigh each per-example cost differently, depending on how likely each $x$ is? Justify your answer.
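For reference, a minimal sketch (not part of the assignment) of how the empirical cost on the right-hand side is computed; the function name `per_example_cost` and the arrays below are hypothetical placeholders, not names from the lecture.

```python
import numpy as np

def empirical_cost(per_example_cost, xs, ys, model):
    """Average the per-example cost D(M*(x_n), M, x_n) over N samples."""
    costs = [per_example_cost(y, model, x) for x, y in zip(xs, ys)]
    return np.mean(costs)
```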
2. A perceptron is defined as follows: $M(x) = \mathrm{sign}(w^\top x + b)$, where $w \in \mathbb{R}^d$, $x \in \mathbb{R}^d$, and $b \in \mathbb{R}$. Why is the bias $b$ necessary? Provide an example where it is necessary.
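As a concrete reference for the definition above, here is a minimal sketch of the perceptron's decision rule; the weights and input are illustrative values only.

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Perceptron output: sign of the affine score w^T x + b."""
    return np.sign(np.dot(w, x) + b)

# Illustrative call with d = 2 (hypothetical values).
w = np.array([1.0, -2.0])
b = 0.5
x = np.array([0.3, 0.1])
print(perceptron_predict(w, b, x))  # sign(0.3 - 0.2 + 0.5) = sign(0.6) = 1.0
```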
3. We used the following distance function for the perceptron in the lecture: $D(M^*(x), M, x) = -(M^*(x) - M(x))(w^\top x + b)$. This distance function admits a trivial solution. What is the trivial solution? Propose a solution to this problem.
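A minimal sketch of this distance function, to make the question concrete; the variable names are illustrative, not from the lecture code.

```python
import numpy as np

def perceptron_distance(y_true, w, b, x):
    """Lecture distance: -(y* - M(x)) * (w^T x + b); zero when the prediction is correct."""
    score = np.dot(w, x) + b
    y_pred = np.sign(score)
    return -(y_true - y_pred) * score
```

Evaluating this for a few choices of $w$ and $b$ may help you spot the trivial solution.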
4. The distance function of logistic regression was defined as $D(y^*, w, x) = -(y^* \log M(x) + (1 - y^*) \log(1 - M(x)))$. Derive its gradient with respect to the weight vector $w$ step by step.
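A minimal sketch of this distance function, assuming the logistic model $M(x) = \sigma(w^\top x + b)$ from the lecture; the finite-difference routine below can be used to check your hand-derived gradient numerically. All names here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_distance(y_true, w, b, x):
    """Cross-entropy distance: -(y* log M(x) + (1 - y*) log(1 - M(x)))."""
    p = sigmoid(np.dot(w, x) + b)
    return -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def numerical_gradient(y_true, w, b, x, eps=1e-6):
    """Central finite-difference approximation of dD/dw, for checking a derivation."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (logistic_distance(y_true, w_plus, b, x)
                   - logistic_distance(y_true, w_minus, b, x)) / (2 * eps)
    return grad
```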
5. (Programming Assignment) Complete the implementation of the perceptron and logistic regression using Python and scikit-learn. The completed notebooks must be submitted together with the answers to the questions above. When submitting Jupyter notebooks, make sure the printed outputs are saved as well.
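For reference, a minimal sketch of fitting scikit-learn's `Perceptron` and `LogisticRegression` on a toy dataset; the assignment notebooks define their own data and structure, so treat this only as an API reminder.

```python
import numpy as np
from sklearn.linear_model import Perceptron, LogisticRegression

# Toy binary classification data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

perceptron = Perceptron().fit(X, y)
logreg = LogisticRegression().fit(X, y)

print("Perceptron accuracy:", perceptron.score(X, y))
print("Logistic regression accuracy:", logreg.score(X, y))
```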