CSC321 Homework 2 solution


1. Perceptron Algorithm. Suppose we have the following training examples in a 2-dimensional
input space, and no bias term. Following our discussion of the perceptron algorithm, we use
t = −1 rather than t = 0 to denote the negative class.
x1   x2    t
 1   -2    1
 0   -1   -1
Recall the perceptron algorithm from lecture. It repeatedly applies the following procedure,
stopping when the weights are strictly within the feasible region (i.e. not on the boundary):
For each training case (x^(i), t^(i)),
    z^(i) ← w^T x^(i)
    If z^(i) t^(i) ≤ 0,
        w ← w + t^(i) x^(i)
Here we work through the algorithm by hand.
(a) [2pts] Sketch out the problem in weight space. In particular: draw and label the axes,
draw the half-space corresponding to each of the two training examples, and shade the
feasible region. (Remember that there is no bias term.)
(b) [2pts] Simulate the perceptron algorithm by hand, starting from the initial weights
w1 = 0, w2 = −2. On the plot from part (a), mark an X on every point in weight space
visited by the algorithm until it stops. You do not need to provide anything for this part
other than the X’s on the plot. Hint: the weights are updated 12 times. It will be tedious
if you compute all the updates using arithmetic; instead, plot the first few updates as you
go along, and you should start to notice a visual pattern.
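If you want to double-check your hand simulation for part (b), a minimal NumPy sketch of the update loop is given below. It follows the pseudocode above directly, looping until no training case has z t ≤ 0 (which is the same as the weights being strictly inside the feasible region); the epoch cap is only a safety guard and not part of the algorithm.

import numpy as np

def perceptron(X, t, w, max_epochs=100):
    """Run perceptron updates until every case satisfies z * t > 0.

    X: (N, D) array of inputs, t: (N,) array of +1/-1 targets,
    w: (D,) initial weight vector. Returns the weight vectors visited.
    """
    visited = [w.copy()]
    for _ in range(max_epochs):          # safety cap; separable data stops earlier
        updated = False
        for x_i, t_i in zip(X, t):
            z_i = w @ x_i                # z = w^T x
            if z_i * t_i <= 0:           # misclassified or on the boundary
                w = w + t_i * x_i        # perceptron update
                visited.append(w.copy())
                updated = True
        if not updated:                  # all cases strictly correct: stop
            break
    return visited

# The two training cases from Question 1 and the initial weights from part (b).
X = np.array([[1.0, -2.0], [0.0, -1.0]])
t = np.array([1.0, -1.0])
print(perceptron(X, t, w=np.array([0.0, -2.0])))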
2. Feature Maps. Suppose we have the following 1-D dataset for binary classification:
  x   t
 -1   1
  1   0
  3   1
(a) [1pt] Argue briefly (at most a few sentences) that this dataset is not linearly separable.
Your answer should mention convexity.
Now suppose we apply the feature map

    ψ(x) = (ψ1(x), ψ2(x))^T = (x, x^2)^T.

Assume we have no bias term, so that the parameters are w1 and w2.
(b) [2pts] Write down the constraint on w1 and w2 corresponding to each training example, and then find a pair of values (w1, w2) that correctly classifies all the examples.
Remember that there is no bias term.
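One way to sanity-check part (b) numerically is to map each input through ψ and test the sign of w^T ψ(x) against its target, taking t = 1 as the positive class and t = 0 as the negative class. The sketch below does exactly that; the candidate weight pair at the bottom is only an illustration of how the check is run, not the unique answer.

import numpy as np

def psi(x):
    """Feature map psi(x) = (x, x^2)."""
    return np.array([x, x**2])

def classifies_all(w, xs, ts):
    """Check whether w separates the mapped data with no bias term.
    A point counts as positive when w^T psi(x) > 0; targets use t = 1
    for the positive class and t = 0 for the negative class."""
    for x, t in zip(xs, ts):
        z = w @ psi(x)
        if (z > 0) != (t == 1):
            return False
    return True

xs = [-1, 1, 3]        # the 1-D inputs from Question 2
ts = [1, 0, 1]         # their targets

# Illustrative candidate; any (w1, w2) can be checked the same way.
print(classifies_all(np.array([-2.0, 1.0]), xs, ts))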
3. Loss Functions. [3pts] Suppose we have a prediction problem where the target t corresponds to an angle, measured in radians. A reasonable loss function we might use is
L(y, t) = 1 − cos(y − t).
Suppose we make predictions using a linear model,
    y = w^T x + b.
As usual, the cost is the average loss over the training set:
    E = (1/N) ∑_{i=1}^{N} L(y^(i), t^(i)).
Derive a sequence of vectorized mathematical expressions for the gradients of the cost with
respect to w and b. As usual, the inputs are organized into a design matrix X with one row
per training example. The expressions should be something you can translate into a Python
program without requiring a for-loop. Your answer should look like:
    y = · · ·
    ∂E/∂y = · · ·
    ∂E/∂w = · · ·
    ∂E/∂b = · · ·
You can use sin(A) to denote the sin function applied elementwise to A. Remember that
∂E/∂w denotes the gradient vector,
    ∂E/∂w = (∂E/∂w1, . . . , ∂E/∂wD)^T.
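For a sense of what the final vectorized expressions look like in practice, here is one possible NumPy sketch. It assumes the elementwise chain rule dL/dy = sin(y − t), which follows from differentiating L(y, t) = 1 − cos(y − t) with respect to y; the function and variable names are illustrative, not prescribed by the assignment.

import numpy as np

def cost_gradients(X, t, w, b):
    """Vectorized gradients of E = (1/N) * sum_i (1 - cos(y_i - t_i))
    for the linear model y = Xw + b.

    X: (N, D) design matrix, t: (N,) targets in radians,
    w: (D,) weights, b: scalar bias.
    """
    N = X.shape[0]
    y = X @ w + b                 # predictions, one per row of X
    dE_dy = np.sin(y - t) / N     # dL/dy = sin(y - t), averaged over N
    dE_dw = X.T @ dE_dy           # chain rule through y = Xw + b
    dE_db = np.sum(dE_dy)
    return dE_dw, dE_db

# e.g. cost_gradients(np.random.randn(5, 3), np.zeros(5), np.zeros(3), 0.0)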