IEOR E4525
Christian Kroer
Assignment 4
Due: Nov 19th, at 11:59pm

Description
1 SVMs
1.1 Scaling the Inputs
True or false: in training an SVM it is generally a good idea to scale all input variables so that, for
example, they all lie in some fixed interval, or so that they all have the same mean µ and variance σ²,
e.g. (µ, σ²) = (0, 1). Justify your answer.
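For concreteness, both scalings mentioned in the question are one-liners in scikit-learn. A minimal sketch (the array X is made-up toy data):

    # Two common scalings: map each column to [0, 1], or to mean 0 / variance 1.
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])

    X_minmax = MinMaxScaler().fit_transform(X)  # each column mapped to [0, 1]
    X_std = StandardScaler().fit_transform(X)   # each column to mean 0, variance 1
    print(X_std.mean(axis=0), X_std.std(axis=0))  # ~[0 0] and [1 1]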
1.2 Classifying Tumors
1. Load the breast cancer dataset using sklearn.datasets. Construct an SVM classifier for this data.
You should randomly assign t% of your data to the training set and the remainder of your data to
the test set. Then use cross-validation on your training set to build your classifier. You can take
t = 70% initially.
2. Repeat part (1) N = 50 times to get N samples of the performance of the trained classifier on the
test set. (Note that each of the N samples will have different training and test sets.) Compute the
mean and standard deviation of the test-set performance.
3. Repeat part (2) for values of t = 50%, 55%, ..., 95% and plot the mean test-set performance together
with 95% confidence intervals for this performance against t. What conclusions can you draw? (One
possible scaffold for this experiment is sketched below.)
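One way to organize parts (1)-(3); the RBF kernel, the C grid, and accuracy as the performance metric are illustrative choices, not prescribed:

    # One full run: split the data, cross-validate C on the training set only,
    # then evaluate the selected model on the held-out test set.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    def one_run(t, seed):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=t, random_state=seed)
        grid = GridSearchCV(
            make_pipeline(StandardScaler(), SVC(kernel="rbf")),
            param_grid={"svc__C": [0.1, 1, 10, 100]},
            cv=5)
        grid.fit(X_tr, y_tr)
        return grid.score(X_te, y_te)  # test-set accuracy

    N = 50
    scores = [one_run(0.70, seed) for seed in range(N)]
    print(np.mean(scores), np.std(scores))

For part (3), wrap this in a loop over t ∈ {0.50, 0.55, ..., 0.95} and plot the mean score with error bars of ±1.96 · std/√N.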
1.3 SVMs and Cross-Validation
Suppose you have successfully trained an SVM with 10,000 training points and a Gaussian kernel where
the values of C and σ were selected via cross-validation. Recall that the Gaussian kernel has the form
K(x, x′) = exp(−‖x − x′‖² / (2σ²)).
You are then given an additional 40,000 training points and so you wish to retrain your SVM using the
entire 50,000 training points that you now have. However, you wish to avoid the heavy computational
expense associated with repeating the cross-validation exercise that you previously used to pick C and
σ. Instead, you simply use the C and σ that you found using the first 10,000 training points, and then
retrain your SVM using those hyperparameters, but on the new set of 50,000 data points. Do you see
any potentially major problem with this? If so, what is it?
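A note if you try this in scikit-learn: SVC parameterizes the RBF kernel as exp(−γ‖x − x′‖²), so the σ above corresponds to γ = 1/(2σ²). The retraining step the question describes then looks like this (C_star and sigma_star stand in for the values cross-validated on the first 10,000 points):

    from sklearn.svm import SVC

    C_star, sigma_star = 1.0, 2.0  # placeholders for the cross-validated values
    clf = SVC(kernel="rbf", C=C_star, gamma=1.0 / (2 * sigma_star**2))
    # clf.fit(X_50k, y_50k)  # refit on all 50,000 points, hyperparameters fixed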
2 PyTorch Practice
1. Install PyTorch. I recommend using Anaconda, in which case you can install it with the conda package
manager: conda install pytorch torchvision -c pytorch
2. Do the PyTorch 60-minute blitz tutorial.
3. Your Jupyter notebook should have a section where you run each command from the PyTorch
60-minute blitz (this will only be lightly graded). You do not need to run the GPU commands.
4. Create a neural network with two hidden layers (the notebook shows how to create one with one
hidden layer); both hidden layers should be ReLU layers. You may simply take the network from the
notebook and add a second layer with 256 output features, but feel free to get more creative. (A
sketch of one possible setup appears after this list.)
5. Try SGD, Adam, and at least one other optimization algorithm from torch.optim. Try at least 3
different stepsizes for each algorithm (for Adam you should also try the default stepsize). Report
on your experience with finding a reasonable stepsize for each algorithm (e.g. how sensitive is each
algorithm to stepsize), and how the algorithms compare on loss minimization, training accuracy,
and test accuracy.
6. If you pick the best setup from all your experiments above, based on either loss or training accuracy
performance, do you get the best algorithm on test accuracy?
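A minimal sketch for parts 4 and 5. The layer sizes (784 inputs, 256 hidden units, 10 outputs) are placeholders; match them to whatever dataset you use from the blitz. For a fair comparison in part 5, re-instantiate the model for every (algorithm, stepsize) pair so each run starts from fresh weights:

    import torch
    import torch.nn as nn
    import torch.optim as optim

    class TwoHiddenLayerNet(nn.Module):
        def __init__(self, in_features=784, hidden=256, out_features=10):
            super().__init__()
            self.fc1 = nn.Linear(in_features, hidden)
            self.fc2 = nn.Linear(hidden, hidden)  # the added second hidden layer
            self.fc3 = nn.Linear(hidden, out_features)

        def forward(self, x):
            x = torch.relu(self.fc1(x))
            x = torch.relu(self.fc2(x))
            return self.fc3(x)

    model = TwoHiddenLayerNet()

    # One optimizer per (algorithm, stepsize) pair; 1e-3 is Adam's default lr.
    # RMSprop is one possible choice for the "at least one other" algorithm.
    optimizers = {
        ("SGD", lr): optim.SGD(model.parameters(), lr=lr)
        for lr in (1e-1, 1e-2, 1e-3)
    }
    optimizers.update({
        ("Adam", lr): optim.Adam(model.parameters(), lr=lr)
        for lr in (1e-2, 1e-3, 1e-4)
    })
    optimizers[("RMSprop", 1e-3)] = optim.RMSprop(model.parameters(), lr=1e-3)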
3 Function Approximation
1. Consider a ReLU network with a single hidden layer, with W^(1) ∈ R^(2×2); x, b^(1) ∈ R^2; and
W^(2) ∈ R^(1×2); b^(2), ŷ ∈ R:
• h^(1) = σ(W^(1)x + b^(1))
• ŷ = W^(2)h^(1) + b^(2)
Show that this network is a piecewise linear function. Specify the set of pieces and the value on
each piece.
2. Consider a continuous piecewise-linear function

   f(x) = x + 3     if x < 5
          2x − 2    if 5 ≤ x < 10
          −x + 28   if 10 ≤ x
Show how to represent it with a ReLU network that uses a single hidden layer.
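A numerical check can save algebra mistakes in part 2. The helper below (the names f and relu_net are mine) evaluates the target function and a generic single-hidden-layer ReLU network; the weights shown only encode the identity trick x = ReLU(x) − ReLU(−x), which is a useful building block, not a complete answer:

    import numpy as np

    def f(x):
        # The target piecewise-linear function from part 2.
        return np.where(x < 5, x + 3, np.where(x < 10, 2 * x - 2, -x + 28))

    def relu_net(x, W1, b1, W2, b2):
        # yhat = W2 @ relu(W1 x + b1) + b2, evaluated over a grid of scalar x.
        h = np.maximum(W1[:, None] * x[None, :] + b1[:, None], 0.0)
        return W2 @ h + b2

    # Illustrative (incomplete) candidate: two hidden units realizing yhat = x.
    W1, b1 = np.array([1.0, -1.0]), np.array([0.0, 0.0])
    W2, b2 = np.array([1.0, -1.0]), 0.0

    xs = np.linspace(-5, 20, 501)
    print(np.max(np.abs(relu_net(xs, W1, b1, W2, b2) - xs)))     # ~0: identity holds
    print(np.max(np.abs(relu_net(xs, W1, b1, W2, b2) - f(xs))))  # large: not yet f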