COMPSCI 589 Classification & Model Selection Assignment: 3 solution

Code:
• For this assignment you may use the methods in sklearn.tree,
sklearn.linear_model, sklearn.svm, sklearn.metrics and sklearn.neighbors.
• When using methods from sklearn.linear_model and sklearn.svm, after
training them you can call them via decision_function() only. Do not use
predict or score or predict_proba.
• You may also use sklearn.model_selection.KFold — but not any other methods in
sklearn.model_selection .
5/16/2021 3 Classification & Model Selection
https://www.notion.so/justindomke/3-Classification-Model-Selection-5202709ce7ca4440ba6e40df61cd455b 3/12
• If the assignment asks you to implement a particular function, you are expected to
implement it yourself. If you find that the function is implemented somewhere
within sklearn or np but not specifically banned above, your implementation
should not consist of a call to that function.
Preliminaries
Dataset
In this assignment you are given a set of 32×32 RGB images. There are four possible
labels. Your goal will be to train a predictor to recognize what is in the image. Here are
the first few elements of the training data, shown as images:
You are given a file data.npz (30013.1 KB).
The data can be loaded as follows:
    import numpy as np

    stuff = np.load("data.npz")
    X_trn = stuff["X_trn"]
    y_trn = stuff["y_trn"]
    X_tst = stuff["X_tst"]  # no y_tst!
There are a total of 6000 training examples and 1200 test examples, each with 3072
dimensions. Those dimensions correspond to 32×32×3 RGB images (32*32*3=3072). If
you like, you can plot an example with the following code:
    from matplotlib import pyplot as plt

    def show(x):
        img = x.reshape((3, 32, 32)).transpose(1, 2, 0)
        plt.imshow(img)
        plt.axis('off')
        plt.draw()
        plt.pause(0.01)

    show(X_trn[7])
Kaggle
Kaggle is a very popular platform for creating and running machine learning
competitions. It allows the creators of competitions to evaluate submissions against a
secret test set, ensuring that competitors cannot “cheat” by fine-tuning their model
against the test set.
The makers of Kaggle also created a set of features for creating an “InClass”
competition, perfect for classes such as 589! We created a competition for
Assignment 3 in which you are required to participate. However, don’t worry! It’s not truly a
“competition” as much as it is a way to automatically evaluate your submissions and to
familiarize you with the Kaggle platform. Again, to make it easier for us to grade, please
create a Kaggle account using your umass.edu email address.
The “competition” has two leaderboards: a public leaderboard and a private leaderboard.
The test set is split into two sets: a “public” set containing about 30% of the data and a
“private” set containing the remaining 70%. In a normal competition, you can see how
well your submission is performing against the public set. In theory, one could use
repeated submissions to brute-force all of the correct answers on that set. For this
reason, in most competitions final submissions are scored against the “private” set.
This assignment will, at various points, ask you to report the performance of various
solutions according to the public leaderboard.
To submit solutions to Kaggle, you will be required to submit a .csv file with two
columns: an “Id” column and a “Category” column giving the integer prediction for
each element in X_tst . A sample solution using randomly predicted outputs can be
generated as follows:
    import csv

    import numpy as np

    def write_csv(y_pred, filename):
        """Write a 1d numpy array to a Kaggle-compatible .csv file"""
        with open(filename, 'w', newline='') as csv_file:
            csv_writer = csv.writer(csv_file)
            csv_writer.writerow(['Id', 'Category'])
            for idx, y in enumerate(y_pred):
                csv_writer.writerow([idx, y])

    data = np.load('data.npz')
    X_tst = data['X_tst']
    # random predictions; the upper bound is exclusive, so 4 covers labels 0..3
    y_pred = np.random.randint(0, 4, size=len(X_tst))
    write_csv(y_pred, 'sample_predictions.csv')
You can use the write_csv helper function in your code if you find it helpful to ensure
that your solution is in the correct format.
Note that the leaderboard shows accuracy whereas the assignment in some places asks
for classification error. The two are related by
Classification Error = 1 − Accuracy,
so it is easy to translate between the two.
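As a small illustration of that relationship, the following sketch computes the classification error from a pair of label arrays and converts it to accuracy. The labels are made up for the example and the function name classification_error is hypothetical, not part of the assignment.

```python
import numpy as np

def classification_error(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# Hypothetical labels, for illustration only.
y_true = np.array([0, 1, 2, 3, 1])
y_pred = np.array([0, 1, 1, 3, 0])

error = classification_error(y_true, y_pred)  # 2 of 5 predictions are wrong
accuracy = 1.0 - error                        # what the leaderboard would show
```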
Simple Classifiers
Question 1 (5 points) Take a very small dataset with four scalar inputs:
$$x^{(1)} = 1.0, \quad x^{(2)} = 2.0, \quad x^{(3)} = 3.0, \quad x^{(4)} = 4.0$$
There are two possible labels, as shown below:
$$y^{(1)} = 1, \quad y^{(2)} = 0, \quad y^{(3)} = 1, \quad y^{(4)} = 1$$
For each of the following split points, what is the information gain? Show your work.
• Split at x = 0.5
• Split at x = 1.5
• Split at x = 2.5
• Split at x = 3.5
• Split at x = 4.5
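One way to check hand calculations of this kind is a short helper like the sketch below. It assumes entropy is measured in bits (log base 2) and that points with x at or below the threshold go to the left child; the course may use a different log base or convention. The toy labels here are deliberately different from the ones in the question, and the function names are hypothetical.

```python
import numpy as np

def entropy(y):
    """Shannon entropy, in bits, of a 1D array of labels (0 for an empty array)."""
    y = np.asarray(y)
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(x, y, threshold):
    """Entropy of y minus the size-weighted entropy of the two children."""
    x = np.asarray(x)
    y = np.asarray(y)
    left = y[x <= threshold]
    right = y[x > threshold]
    n = y.size
    children = (left.size / n) * entropy(left) + (right.size / n) * entropy(right)
    return entropy(y) - children

# Toy data with a different labeling than the question, for illustration only.
x_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_toy = np.array([0, 0, 1, 1])

gain_mid = information_gain(x_toy, y_toy, 2.5)   # split separates the classes
gain_none = information_gain(x_toy, y_toy, 0.5)  # split leaves one child empty
```

A split that puts every point on one side leaves the entropy unchanged, so its gain is zero under this convention.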
Question 2 (5 points) Consider a classification tree with a maximum depth of M,
trained on data with D dimensions. What is the time complexity to evaluate that
classification tree on a single new input? Give an answer (something like “order of
log(M) D”) and explain in at most 3 sentences why your answer is correct.
Question 3 (5 points) Take a dataset with N elements each with D dimensions. What is
the time complexity to train a classification stump? Give an answer and explain why it’s
correct in at most 3 sentences.
Question 4 (6 points) Train 6 different classification trees on the image data, with each
of the following maximum depths: {1,3,6,9,12,14}. (Do not apply any other restriction
when growing the tree.) Using 5-fold cross validation, estimate the mean out-of-sample
(generalization) classification error, and report this as a table. You should have one row
for each possible depth and one number, which is the mean estimated error.
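The cross-validation loop itself can be sketched independently of the model. The version below is a NumPy-only stand-in for sklearn.model_selection.KFold (which the assignment also permits), and it takes a hypothetical fit_predict callable so that any classifier can be plugged in. The majority-class baseline and the demo data are made up for illustration.

```python
import numpy as np

def kfold_indices(n, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n) into folds."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, n_splits)
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, val_idx

def cv_error(fit_predict, X, y, n_splits=5):
    """Mean 0-1 error over folds; fit_predict(X_trn, y_trn, X_val) returns labels."""
    errors = []
    for train_idx, val_idx in kfold_indices(len(X), n_splits):
        y_hat = fit_predict(X[train_idx], y[train_idx], X[val_idx])
        errors.append(np.mean(y_hat != y[val_idx]))
    return float(np.mean(errors))

def majority_class(X_trn, y_trn, X_val):
    """Hypothetical baseline: always predict the most common training label."""
    labels, counts = np.unique(y_trn, return_counts=True)
    return np.full(len(X_val), labels[np.argmax(counts)])

# Made-up data: 15 examples of class 0 and 5 of class 1.
X_demo = np.zeros((20, 3))
y_demo = np.array([0] * 15 + [1] * 5)
baseline_error = cv_error(majority_class, X_demo, y_demo)
```

For the trees in this question, fit_predict could wrap a sklearn.tree.DecisionTreeClassifier with the chosen max_depth, fitting on the training fold and predicting on the held-out fold.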
Question 5 (6 points) What depth performs best in the previous question? Using that
depth, make predictions on the test data, and upload your predictions to Kaggle. For
this question, you need to report:
• What depth you chose.
• What was your estimated generalization error using 5-fold cross-validation.
• What accuracy you observed on the public part of the leaderboard.
Question 6 (5 points) Consider a dataset with N elements, each with D dimensions.
What is the time complexity to evaluate a K-nearest neighbors classifier? Give an
answer and explain why it’s correct in at most 3 sentences.
Question 7 (6 points) Do nearest-neighbor prediction for each of the following possible
values of K: {1, 3, 5, 7, 9, 11}. Using 5-fold cross-validation, estimate the out of sample
classification error, and report this as a table. (Warning: This question might take a
significant amount of computational time. You may consider using the n_jobs option.)
Question 8 (6 points) What K performs best in the previous question? Using that K,
make predictions on the test data, and upload your predictions to Kaggle. Report:
• What value of K you chose.
• What was your estimated generalization error using 5-fold cross validation.
• What accuracy you observed on the public part of the leaderboard.
Question 9 (10 points) For both hinge loss and logistic loss, train linear models with
ridge regularization. That is, find w to minimize
$$\sum_{n=1}^{N} L\left(y^{(n)}, w^\top x^{(n)}\right) + \lambda \|w\|^2,$$
where L is the loss. For each loss and each of the regularization constants
$\lambda \in \{10^{-4}, 10^{-2}, 1, 10, 100\}$, train a model and estimate the mean out-of-sample
loss/error using 5-fold cross-validation. Organize your errors as a 5×2 table, with one
row for each value of λ and one column for each training loss.
Give 3 tables: one where you estimate 0-1 classification error, one where you estimate
logistic loss, and one where you estimate hinge loss. (You will report a total of 30
numbers.)
(Hint: You should be aware of sklearn.svm.LinearSVC . Again, you are not permitted to
use predict() or predict_proba() . But decision_function() is OK.)
(Hint: There has been some confusion about this question. To clarify, you have 10
different training methods, corresponding to each combination of regularization
constant (5 options) and training loss (2 options). For each of these training methods,
you should estimate the generalization error using 5-fold cross-validation. But you
should estimate that generalization error in three ways, for 0-1, logistic, and hinge loss.
Since there are 10 training methods and 3 measures of generalization error, you report
a total of 30 numbers. That is all that you report for this question.)
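Since predict() is off limits for the linear models, class predictions have to be recovered from the decision_function() scores. The sketch below shows one way to do that, assuming the usual sklearn convention: a 1D score array for binary problems (positive means the second class) and one column of scores per class otherwise, with class labels taken from the fitted model's classes_ attribute. The scores here are made up, and predict_from_scores is a hypothetical helper name.

```python
import numpy as np

def predict_from_scores(scores, classes):
    """Map decision_function output to class labels without calling predict()."""
    scores = np.asarray(scores)
    classes = np.asarray(classes)
    if scores.ndim == 1:
        # Binary case: a positive score means the second class.
        return classes[(scores > 0).astype(int)]
    # Multiclass case: one column of scores per class; take the largest.
    return classes[np.argmax(scores, axis=1)]

# Hypothetical scores for three examples and four classes.
scores_demo = np.array([[ 0.2, -1.0,  0.5, -0.3],
                        [-0.7,  1.1, -0.2, -0.9],
                        [-0.1, -0.4, -0.6,  0.3]])
classes_demo = np.array([0, 1, 2, 3])
y_hat_demo = predict_from_scores(scores_demo, classes_demo)
```

The 0-1 error then follows by comparing these labels with the held-out labels, while the logistic and hinge losses are computed from the raw scores.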
Question 10 (6 points) Choose the training loss and λ that you think will perform best
on the public leaderboard. Make predictions for the test data and upload your
predictions to Kaggle. Report:
1. What training loss and λ you chose
2. What was your estimated generalization error using 5-fold cross validation.
3. What accuracy you observed on the public part of the leaderboard.
Neural Networks
You will train several neural networks, each with a single hidden layer. These neural
networks can be written as
f(x) = c + Vσ(b + Wx).
Here:
• x is the input, a vector of length D
• W is a matrix of size M × D that maps input features to a hidden space
• b is the bias term for the hidden layer, a vector of length M
• $\sigma(a) = \tanh(a)$ is the activation function. You will need that the derivative is
$\frac{d\sigma(a)}{da} = 1 - \tanh(a)^2$.
• V is a matrix of size O × M that maps the hidden space to the output space
• c is the bias term for the output space, a vector of length O
Note that $f(x) : \mathbb{R}^D \to \mathbb{R}^O$ is a function that maps a vector to a vector. We will refer
to the i-th component of the output as $f(x)_i$.
For this problem, we will use the logistic loss, defined as
$$L(y, f) = -f_y + \log \sum_{i=0}^{3} \exp(f_i),$$
where $y \in \{0, 1, 2, 3\}$ is the label for the input x, and $f \in \mathbb{R}^O$ is the output vector.
Note that $f_y$ is therefore the y-th component of the output vector f. Also be careful to
note that here, we are indexing from 0 instead of 1.
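To make the definitions concrete, here is a minimal NumPy sketch of the network output and the logistic loss. It is one possible way to write them, not a reference solution: the helper names forward and logistic_loss are hypothetical, and subtracting the maximum before exponentiating is an added numerical-stability step that does not change the value of the loss.

```python
import numpy as np

def forward(x, W, V, b, c):
    """Network output f(x) = c + V tanh(b + W x)."""
    h = np.tanh(b + W @ x)  # hidden activations, length M
    return c + V @ h        # output vector, length O

def logistic_loss(y, f):
    """L(y, f) = -f[y] + log(sum_i exp(f[i])), computed stably."""
    f_max = np.max(f)
    log_sum_exp = f_max + np.log(np.sum(np.exp(f - f_max)))
    return float(-f[y] + log_sum_exp)

# Sanity check with all-zero parameters: every output is 0, so the loss is log(O).
D, M, O = 3, 5, 4
x0 = np.ones(D)
W0, b0 = np.zeros((M, D)), np.zeros(M)
V0, c0 = np.zeros((O, M)), np.zeros(O)
loss0 = logistic_loss(2, forward(x0, W0, V0, b0, c0))
```

With all-zero weights the network is indifferent between the four classes, which is why the loss equals log(4) in this check.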
Question 11 (5 points) Write a function to evaluate the neural network and loss. Your
function should have the following signature:
    def prediction_loss(x,y,W,V,b,c):
        # do stuff here
        return L
This should return a scalar. Give your function directly in your report.
Question 12 (10 points) Write a function to evaluate the gradient of the neural network.
Your function should have the following signature. Do not use any packages outside of
numpy.
    def prediction_grad(x,y,W,V,b,c):
        # do stuff here
        return dLdW, dLdV, dLdb, dLdc
Each returned array should be the same shape as the corresponding input parameter, and
contain the corresponding gradient. So, for example, dLdW is the derivative
$\nabla_W L(y, f(x))$. Give your function directly in your report.
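Question 13 (10 points) Take the following inputs, where there are 3 hidden units and 2
outputs (y = 0 or y = 1):
$$
\begin{aligned}
x &= [1, 2] \\
y &= 1 \\
W &= \begin{pmatrix} 0.5 & -1 \\ -0.5 & 1 \\ 1 & 0.5 \end{pmatrix} \\
V &= \begin{pmatrix} -1 & -1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \\
b &= [0, 0, 0] \\
c &= [0, 0]
\end{aligned}
$$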
Run the function from the previous question to compute the gradient with respect to
W, V , b, and c. Give the results directly in your report, organized as you see above.
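A common way to gain confidence in a hand-written gradient is to compare it against a finite-difference estimate. The sketch below is a generic central-difference checker, shown on a toy function whose gradient is known; numerical_grad and toy are hypothetical names, and the step size eps is an assumed default that may need tuning.

```python
import numpy as np

def numerical_grad(func, w, eps=1e-6):
    """Central-difference estimate of the gradient of func at the array w."""
    w = np.array(w, dtype=float)  # work on a copy
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = func(w)
        w[idx] = old - eps
        down = func(w)
        w[idx] = old              # restore the entry before moving on
        grad[idx] = (up - down) / (2 * eps)
    return grad

def toy(w):
    """Toy objective with known gradient 2 * w."""
    return np.sum(w ** 2)

w_demo = np.array([[1.0, -2.0], [0.5, 3.0]])
grad_demo = numerical_grad(toy, w_demo)
```

To check a gradient with respect to W, for example, func would be a function of W alone that returns the loss with all other inputs held fixed.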
Autograd
The next several questions use the autograd toolbox, which you can install via
pip install autograd. A short demo of autograd can be found here:
Autograd Demo
Question 14 (5 points) Write a function to evaluate the same gradient as in Question 12
using the autograd toolbox. (Hint: You will need to import the NumPy wrapper, import
autograd.numpy as np, and the grad higher-order function, from autograd import grad.)
    def prediction_grad_autograd(x,y,W,V,b,c):
        # do stuff here
        return dLdW, dLdV, dLdb, dLdc
Give your function directly in your report. (You do not need to give any outputs from
your function, but it is suggested to check the results against Q13 since if they are
different, one must be wrong!)
Question 15 (5 points) Update your function from Question 11. Instead of taking a
single input x and a single output y, take a 2D array of inputs X (where the first dimension
indexes the different examples) and a 1D array of outputs Y. Also, take a regularization
constant λ and apply squared regularization to W and V. Do not regularize b or c.
Your function should be the sum of the logistic losses for each example in the dataset,
plus the regularizer loss applied to W and V. (To be explicit, the regularizer could be
written as $\lambda \left( \sum_{mi} W_{mi}^2 + \sum_{vm} V_{vm}^2 \right)$.)
    def prediction_loss_full(X,Y,W,V,b,c,λ):
        # do stuff here
        return L  # include regularization
Question 16 (5 points) Update your gradient function to work on a full dataset and
include regularization, as in the previous question. Again, you should use autograd.
    def prediction_grad_full(X,Y,W,V,b,c,λ):
        # do stuff here
        return dLdW, dLdV, dLdb, dLdc
Question 17 (15 points) Here is pseudo-code to optimize a function h(w) by gradient
descent with momentum.
    ave_grad = 0
    for iter = 1, 2, … max_iters:
        ave_grad = (1 - momentum) * ave_grad + momentum * ∇h(w)
        w = w - stepsize * ave_grad
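The pseudo-code above translates almost line for line into NumPy. The following sketch applies it to a toy quadratic rather than the neural network; gd_momentum and grad_quadratic are hypothetical names, and the step size used in the demo is chosen for the toy problem, not taken from the assignment.

```python
import numpy as np

def gd_momentum(grad_h, w0, stepsize, momentum, max_iters):
    """Gradient descent with momentum, following the update rule given above."""
    w = np.array(w0, dtype=float)
    ave_grad = np.zeros_like(w)
    for _ in range(max_iters):
        ave_grad = (1 - momentum) * ave_grad + momentum * grad_h(w)
        w = w - stepsize * ave_grad
    return w

# Toy objective h(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([3.0, -1.0])

def grad_quadratic(w):
    return w - target

w_final = gd_momentum(grad_quadratic, np.zeros(2),
                      stepsize=0.1, momentum=0.1, max_iters=1000)
```

For the network, w would stand for all four parameter arrays, each with its own running average updated in the same way.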
For each size of the hidden layer, $M \in \{5, 40, 70\}$, train your neural network on the
main data for this homework. Weights for layers W and V should be initialized by
sampling from $\mathcal{N}(0,1)/\sqrt{D}$, where $\mathcal{N}(0,1)$ is the standard normal distribution and D is the
number of input dimensions for that layer. Weights for b and c should be initialized as
zeros.
Use gradient descent with momentum, with 1000 iterations, a step size of 0.0001, a
momentum of 0.1, and λ = 1.
Report the following:
1. For each value of M, what is the total training time (in ms) for all iterations. (Give a
table with 3 entries.)
2. Make a plot of the training objective (regularized loss) as a function of iterations.
This should be a single plot with 3 curves, one for each value of M. Include the plot
in your report.
Question 18 (10 points) Make a single train-validation split of the data with 50% used
for training and 50% for validation. Train your neural network using the parameters above
for each value of M and give the estimated generalization error. Again, use the same
initial weights generated using the scheme above. Then, retrain your network on all the
data, make predictions for the Kaggle data, and upload to Kaggle. Report your
accuracy on the public leaderboard. Report:
• What value of M you chose.
• What accuracy you expected.
• What accuracy you observed on the leaderboard.