ECE 421 Homework Problems – Tutorial #3
Theme: Gradients and Logistic Regression


Question 1 (Gradient Computation)
For a scalar-valued function $f : \mathbb{R}^d \to \mathbb{R}$, the gradient evaluated at $w \in \mathbb{R}^d$ is

$$\nabla f(w) = \begin{bmatrix} \dfrac{\partial f(w)}{\partial w_1} & \cdots & \dfrac{\partial f(w)}{\partial w_d} \end{bmatrix}^\top \in \mathbb{R}^d.$$

Using this definition, compute the gradients of the following functions, where $A \in \mathbb{R}^{d \times d}$ is not necessarily a symmetric matrix.

(i) $f(w) = w^\top A v + w^\top A^\top v + v^\top A w + v^\top A^\top w$, where $v \in \mathbb{R}^d$

(ii) $f(w) = w^\top A w$

Compute the gradients of the following functions using the above definition and the chain rule.

(iii) $f(w) = \sum_{i=1}^d \log(1 + \exp(w_i))$

(iv) $f(w) = \sqrt{1 + \|w\|_2^2}$
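As an aside (not part of the assignment), an analytic gradient such as the one in part (ii) can be sanity-checked numerically. For $f(w) = w^\top A w$ the gradient works out to $(A + A^\top)w$, and a central finite-difference approximation should agree with it. The sketch below assumes this closed form; the random test instance is our own construction.

```python
import numpy as np

# Compare the analytic gradient of f(w) = w^T A w, namely (A + A^T) w,
# against a central finite-difference approximation of the gradient.
rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d))  # deliberately non-symmetric
w = rng.standard_normal(d)

f = lambda w: w @ A @ w
analytic = (A + A.T) @ w

eps = 1e-6
numeric = np.array([
    (f(w + eps * e) - f(w - eps * e)) / (2 * eps)  # central difference
    for e in np.eye(d)                             # one basis vector per coordinate
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

The same finite-difference check applies unchanged to parts (iii) and (iv): only the lambda `f` and the candidate gradient change.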
Homework Problems – Tutorial #3 Due: February 6, 2022 11:59 PM
Question 2 (Logistic Regression)
You are given a dataset $D = \{(x_n, y_n)\}_{n=1}^N$, where $x_n \in \mathbb{R}^d$, $d \ge 1$, and $y_n \in \{+1, -1\}$. For $w \in \mathbb{R}^{d+1}$ and $x \in \mathbb{R}^{d+1}$, we wish to train a logistic regression model

$$h(x) = \theta\Big(b + \sum_{i=1}^d w_i x_i\Big) = \theta(w^\top x), \tag{1}$$

where $\theta(z) = \dfrac{e^z}{1 + e^z}$, $z \in \mathbb{R}$, is the logistic function. Following the arguments on page 91 of LFD, the in-sample error can be written as

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^N \log \frac{1}{P_w(y_n \mid x_n)}, \tag{2}$$

where

$$P_w(y \mid x) = \begin{cases} h(x) & y = +1 \\ 1 - h(x) & y = -1 \end{cases}. \tag{3}$$
(a) Show that $E_{\text{in}}(w)$ can be expressed as

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^N \left( [\![y_n = +1]\!] \log \frac{1}{h(x_n)} + [\![y_n = -1]\!] \log \frac{1}{1 - h(x_n)} \right), \tag{4}$$

where $[\![\text{argument}]\!]$ evaluates to 1 if the argument is true and 0 if it is false.
(b) Show that $E_{\text{in}}(w)$ can also be expressed as

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^N \log\big(1 + \exp(-y_n w^\top x_n)\big). \tag{5}$$
(c) Use (5) to show that $\nabla E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^N -y_n x_n\, \theta(-y_n w^\top x_n)$, and argue that a “misclassified” example contributes more to the gradient than a correctly classified one.
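As with Question 1, the gradient formula in part (c) can be checked against a finite-difference approximation of (5). The sketch below uses a synthetic dataset of our own making; it is a verification aid, not part of the required derivation.

```python
import numpy as np

# Verify that the gradient formula from part (c),
#   grad E_in(w) = (1/N) * sum_n -y_n * x_n * theta(-y_n w^T x_n),
# matches a central finite-difference approximation of E_in from (5).
theta = lambda z: 1.0 / (1.0 + np.exp(-z))  # logistic function

rng = np.random.default_rng(1)
N, d = 20, 3
X = rng.standard_normal((N, d))             # rows are the x_n
y = rng.choice([-1.0, 1.0], size=N)         # labels in {+1, -1}
w = rng.standard_normal(d)

E_in = lambda w: np.mean(np.log(1.0 + np.exp(-y * (X @ w))))
grad = np.mean((-y * theta(-y * (X @ w)))[:, None] * X, axis=0)

eps = 1e-6
numeric = np.array([(E_in(w + eps * e) - E_in(w - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
print(np.allclose(grad, numeric, atol=1e-5))
```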
(d) Show that $\nabla E_{\text{in}}(w)$ can be expressed as

$$\nabla E_{\text{in}}(w) = \frac{1}{N} X^\top p, \tag{6}$$

for some expression $p$, where $X$ is the data matrix you are familiar with from linear regression. What is $p$, and how does it compare with the gradient of the in-sample error of linear regression?
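One way to see how a matrix form like (6) arises is to collapse the per-sample sum from part (c) into a single matrix–vector product. The identification of $p$ used below is our reading of part (c), not something stated in the problem; treat it as a hypothesis to verify against the loop form.

```python
import numpy as np

# Sketch: the per-sample gradient sum from part (c) collapses to
# (1/N) X^T p, with X the N x (d+1) data matrix and p a vector whose
# n-th entry is -y_n * theta(-y_n w^T x_n)  (assumed, per part (c)).
theta = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
N, d = 10, 4
X = rng.standard_normal((N, d))
y = rng.choice([-1.0, 1.0], size=N)
w = rng.standard_normal(d)

# per-sample sum from part (c)
loop_grad = sum(-y[n] * X[n] * theta(-y[n] * (X[n] @ w)) for n in range(N)) / N

# vectorized form (6)
p = -y * theta(-y * (X @ w))
matrix_grad = X.T @ p / N

print(np.allclose(loop_grad, matrix_grad))
```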
Question 3 (Problem 4, Midterm 2017)
Consider the logistic regression setup of the previous question. Suppose we are given a dataset $D = \{(x_1, y_1), (x_2, y_2)\}$ with

$$x_1 = \begin{bmatrix} 1 & 1 \end{bmatrix}^\top,\ y_1 = 1 \quad \text{and} \quad x_2 = \begin{bmatrix} 1 & 0 \end{bmatrix}^\top,\ y_2 = -1.$$

We consider the $\ell_2$-regularized error

$$E_{\text{in}}(w) = -\sum_{n=1}^N \log\big[P_w(y_n \mid x_n)\big] + \lambda \|w\|_2^2, \quad \lambda > 0, \tag{7}$$

where

$$P_w(y \mid x) = \begin{cases} h(x) & y = +1 \\ 1 - h(x) & y = -1 \end{cases}, \tag{8}$$

and $h(x) = \dfrac{e^{w^\top x}}{1 + e^{w^\top x}} = \dfrac{1}{1 + e^{-w^\top x}}$.
(a) For $\lambda = 0$, find the optimal $w$ that minimizes $E_{\text{in}}(w)$ and the minimum value of $E_{\text{in}}(w)$. (Hint: you are given $x_n, y_n$, so plug those values into the expression for the in-sample error.)

(b) Suppose $\lambda$ is a very large constant, so that it suffices to consider weights satisfying $\|w\|_2 \ll 1$. Since $w$ has a small magnitude, we may use the Taylor series approximation

$$\log\big(1 + \exp(-y_n w^\top x_n)\big) \approx \log(2) - \frac{1}{2} y_n w^\top x_n. \tag{9}$$

Assuming the above approximation is exact, find the $w$ that minimizes $E_{\text{in}}(w)$ (it should be expressed in terms of $\lambda$).
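A candidate answer to part (b) can be checked numerically. Under approximation (9), the objective becomes $\sum_n \big[\log 2 - \tfrac{1}{2} y_n w^\top x_n\big] + \lambda \|w\|_2^2$; setting its gradient to zero suggests $w^\star = \tfrac{1}{4\lambda} \sum_n y_n x_n$. That closed form is our derivation, not given in the problem, so the sketch below verifies stationarity rather than asserting it.

```python
import numpy as np

# Check that the candidate minimizer w* = (1/(4*lambda)) * sum_n y_n x_n
# (our derivation) makes the gradient of the approximated objective
#   sum_n [log 2 - (1/2) y_n w^T x_n] + lambda * ||w||^2
# vanish, using the dataset from Question 3.
lam = 10.0
X = np.array([[1.0, 1.0],   # x_1
              [1.0, 0.0]])  # x_2
y = np.array([1.0, -1.0])

w_star = (y @ X) / (4.0 * lam)   # here: [0, 1/(4*lam)]

# gradient of the approximated objective at w_star
grad = -0.5 * (y @ X) + 2.0 * lam * w_star
print(w_star, np.allclose(grad, 0.0))
```

Note that the approximated objective is convex in $w$ (linear term plus $\lambda \|w\|_2^2$), so a vanishing gradient is sufficient for a global minimum.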