IEOR E4525 Assignments 1 to 4 solution


IEOR E4525 Assignment 1
Due: Oct 1st, at 11:59pm

1 Lab 3.6 from ISLR
Go through the lab exercise in Section 3.6 of ISLR. The book is written to use the programming language
R for these exercises. If you like, you can use R to complete these exercises (in this case, I highly
recommend using the IDE rstudio). Alternatively, since we will later use python for other assignments,
I have provided a corresponding set of python commands that you can use. These are given in the form of
a Jupyter notebook. I recommend that you open this notebook for reference (run the command jupyter
notebook in a terminal from the directory containing the notebook, and open the notebook from there),
but type all the commands into a new notebook of your own, to make sure that you pay attention to
what each cell is doing.
If you use python, you will need the various packages listed at the top of the notebook. (These packages are extremely common at companies; becoming proficient with pandas, numpy, and sklearn is highly recommended.) The easiest way to get them is to install the anaconda distribution from here:
https://www.anaconda.com/products/individual.
You do not need to turn in your code. The goal of this exercise is to practice your data skills.
Questions
1. Compare the plots of the residuals vs. the fitted values for the regression medv ~ lstat + np.square(lstat)
and the regression using only lstat as a predictor (a plotting sketch is given below). What's the qualitative difference?
2. Does the fifth-order polynomial from your python regression correspond to the one from the ISLR
book? If not, why might this occur?
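For Question 1, a minimal plotting sketch, assuming the Boston data from the lab is available locally (the file name Boston.csv is an assumption):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.formula.api as smf

    # Assumed: a local CSV of the Boston data with columns medv and lstat.
    boston = pd.read_csv("Boston.csv")

    # Fit the linear and quadratic regressions from the lab.
    fit_lin = smf.ols("medv ~ lstat", data=boston).fit()
    fit_quad = smf.ols("medv ~ lstat + np.square(lstat)", data=boston).fit()

    # Residuals vs. fitted values, side by side.
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    for ax, fit, title in [(axes[0], fit_lin, "medv ~ lstat"),
                           (axes[1], fit_quad, "medv ~ lstat + lstat^2")]:
        ax.scatter(fit.fittedvalues, fit.resid, s=10, alpha=0.5)
        ax.axhline(0, color="gray", linewidth=1)
        ax.set_xlabel("fitted values")
        ax.set_title(title)
    axes[0].set_ylabel("residuals")
    plt.show()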
2 EDA with the Spam Filtering Data Set
The csv file spam.csv contains a data set for emails that were categorized as spam or not spam. The
documentation for this data set is in the file spam-info.pdf.
1. Look at the documentation. What is the variable of interest, i.e. the dependent variable?
2. For each of the independent variables, report something about it. Specifically, you should report
on each variable's relationship with the response (i.e. dependent) variable. Pay special attention to
variable type (binary, ordinal, real) when doing this. Your comments should contain at least some
tables and graphs.
3. Investigate the variable ’spampct’.
(a) How many missing values does it have?
(b) Compare graphically the distribution for time.of.day for the cases where spampct is missing
against the distribution of time.of.day when spampct is present. Do you see any differences?
(c) Plot a scatter plot of time.of.day vs. spampct. How many unique points ((x, y) coordinates) are
plotted? Explain a technique you might use to deal with the overplotting.
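A hedged sketch for Question 3, assuming the column names spampct and time.of.day from the question text (check spam-info.pdf):

    import pandas as pd
    import matplotlib.pyplot as plt

    spam = pd.read_csv("spam.csv")

    # (a) number of missing values in spampct
    print(spam["spampct"].isna().sum())

    # (b) distribution of time.of.day, split by whether spampct is missing
    missing = spam["spampct"].isna()
    plt.hist(spam.loc[missing, "time.of.day"], alpha=0.5, density=True, label="spampct missing")
    plt.hist(spam.loc[~missing, "time.of.day"], alpha=0.5, density=True, label="spampct present")
    plt.xlabel("time.of.day")
    plt.legend()
    plt.show()

    # (c) count distinct (x, y) pairs, then plot with transparency to expose overplotting
    sub = spam.dropna(subset=["spampct"])
    print(sub[["time.of.day", "spampct"]].drop_duplicates().shape[0])
    plt.scatter(sub["time.of.day"], sub["spampct"], s=10, alpha=0.2)
    plt.xlabel("time.of.day")
    plt.ylabel("spampct")
    plt.show()

Adding small random jitter to time.of.day, or using alpha transparency as above, are standard ways to deal with overplotting.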
3 Exploring the Relationship Between Overfitting and Noise
Do exercise 13 from Section 3.7 of ISLR. The example code is in R, but below I provide a table of
translations to python. You will need to use the numpy documentation to look up how to use the various
commands. Make sure you look up the documentation for your version of numpy.

R command      python command
set.seed(1)    np.random.seed(1)
rnorm()        np.random.randn()
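As a sketch of these translations in use, parts (a)-(c) of the exercise might start like this (the coefficients and noise variance below are as stated in ISLR; double-check them against your copy):

    import numpy as np

    np.random.seed(1)                            # set.seed(1)
    x = np.random.randn(100)                     # rnorm(100)
    eps = np.sqrt(0.25) * np.random.randn(100)   # noise, scaled to the variance the exercise gives
    y = -1 + 0.5 * x + eps                       # the linear model from part (c)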
4 Naive Bayes and Spam Filtering
1. Use the spam data from Question 2 and Naive Bayes to build a classifier that distinguishes spam
from non-spam. You can use Naive Bayes from sklearn for this. Your code should split the data
into training and test sets and then estimate the generalization error of your classifier.
2. Randomly assign 80% of your data to the training set, 20% to the test set and now estimate the
test error, $E_{\text{test}}$, of your classifier. Repeat this 10 times. How much variability do you see in $E_{\text{test}}$?
What conclusions can you draw from this? (A sketch of this setup is given after this list.)
3. There are two types of error that a spam classifier can make. Should these errors be treated equally
when constructing a classifier? Can we adapt our naive Bayes classifier to reflect this?
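A hedged sketch for parts 1 and 2 (the response column name is an assumption; check spam-info.pdf):

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    spam = pd.read_csv("spam.csv").dropna()
    X = spam.drop(columns=["yesno"])   # "yesno" is a hypothetical name for the response column
    y = spam["yesno"]

    test_errors = []
    for rep in range(10):   # 10 random 80/20 splits
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=rep)
        clf = GaussianNB().fit(X_tr, y_tr)
        test_errors.append(1 - clf.score(X_te, y_te))

    print(np.mean(test_errors), np.std(test_errors))

For part 3, one lever worth considering is the priors argument of GaussianNB, or thresholding the predicted probabilities, either of which lets you penalize the two error types unequally.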
5 Least Squares Linear Regression is MLE for Gaussian noise
Consider the linear regression model

    $Y = X^\top \beta + \epsilon,$

where $\beta, X \in \mathbb{R}^d$ are fixed, and the error $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is distributed according to a Gaussian distribution.
In class we saw how to derive the least squares estimator. In this exercise, you must prove that
the least squares estimator is also the maximum-likelihood estimator, given that the error is Gaussian.
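As a starting point, the log-likelihood of $n$ i.i.d. observations $(x_i, y_i)$ under this model is

    $\ell(\beta) = \sum_{i=1}^{n} \log\left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2} \right) \right] = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2,$

so maximizing over $\beta$ comes down to minimizing the residual sum of squares.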
6 k Nearest Neighbors and the Curse of Dimensionality
Solve exercise 4 from Section 4.7 of ISLR.

IEOR E4525 Assignment 2
Due: Oct 15th, at 11:59pm

1 ISLR Classification Lab
Complete the lab from Section 4.6 of ISLR. Feel free to utilize the provided worked jupyter notebook as
inspiration.
You do not need to submit your code for this question; it is purely for practice.
2 Classification Models for Stock Market Data
Solve exercise 10 from Section 4.7 of ISLR.
The Weekly dataset can be found in the Data folder. A description of it can be found here, on page 14.
3 Reduced-Rank LDA
Let B and W be positive definite matrices and consider the following problem of maximizing the Rayleigh
quotient:

    $\max_a \frac{a^\top B a}{a^\top W a}$    (1)

1. Use the method of Lagrange multipliers to solve this problem. In particular, show that the optimal
solution $a^*$ is an eigenvector of a certain matrix related to B and W. What is this matrix, and
which eigenvector does $a^*$ correspond to?
Hint: Use the scale invariance of the Rayleigh quotient to rewrite the unconstrained maximization
as a constrained maximization problem where B appears in the objective, and W appears in the
constraint.
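Following the hint, one way to set this up (a sketch, in my notation):

    $\max_a \; a^\top B a \quad \text{subject to} \quad a^\top W a = 1, \qquad \mathcal{L}(a, \lambda) = a^\top B a - \lambda\,(a^\top W a - 1).$

Setting $\nabla_a \mathcal{L} = 0$ leads to a generalized eigenvalue problem relating B and W.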
2. By identifying B and W with the between-class and within-class covariance matrices, we can interpret the problem in (1) as the problem of finding the linear combination $a^{*\top} x$ so as to maximize the
between-class variance relative to the within-class variance. Show that $a^{*\top} x$ is the first discriminant
variable.
Hint: First note that $W = \Sigma$ from the lecture slides, and that $B^* = D^{-1/2} U^\top B U D^{-1/2}$.
4 Logistic Regression
1. Show that binary classification using logistic regression yields a linear classifier.
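A useful starting point: with the intercept folded into $\beta$, the logistic model $P(Y = 1 \mid X = x) = \sigma(\beta^\top x)$ has log-odds

    $\log \frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)} = \log \frac{\sigma(\beta^\top x)}{1 - \sigma(\beta^\top x)} = \beta^\top x, \qquad \sigma(t) = \frac{1}{1 + e^{-t}},$

so the decision boundary $P(Y = 1 \mid x) = 1/2$ is the hyperplane $\beta^\top x = 0$.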
Consider a naive Bayes classifier for a binary classification problem where all the class-conditional
distributions are assumed to be Gaussian with the variance of each feature $X_j$ being equal across the two
classes. That is, we assume $(X_j \mid Y = k) \sim \mathcal{N}(\mu_{jk}, \sigma_j^2)$.
2. Show that the decision boundary is a linear function of $X = (X_1, \ldots, X_d)$ and hence that it has the
same parametric form as the decision boundary given by logistic regression.
3. Does the result of part (2) imply that in this case, Gaussian naive Bayes and logistic regression will
find the same decision boundary? Justify your answer.
4. If indeed the class conditional distributions are Gaussian with $(X_j \mid Y = k) \sim \mathcal{N}(\mu_{jk}, \sigma_j^2)$ and the
assumptions of naive Bayes are true, which classifier do you think will be “better”: the naive Bayes
classifier of part (2) or logistic regression? Justify your answer.
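For part (2), a sketch of the key computation (with class priors $\pi_0, \pi_1$): the quadratic terms $x_j^2 / (2\sigma_j^2)$ cancel because $\sigma_j^2$ is shared across the two classes, leaving

    $\log \frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)} = \log \frac{\pi_1}{\pi_0} + \sum_{j=1}^{d} \frac{(\mu_{j1} - \mu_{j0})\, x_j}{\sigma_j^2} - \sum_{j=1}^{d} \frac{\mu_{j1}^2 - \mu_{j0}^2}{2\sigma_j^2},$

which is affine in $x$.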
5 Bootstrap Probabilities
Solve exercise 2 from Section 5.4 of ISLR.

IEOR E4525 Assignment 3
Due: Oct 29th, at 11:59pm

1 Bootstrap Description
Suppose that we have a dataset $(x_i, y_i)_{i=1}^{n}$, and we fix a value $\bar{x}$. Further suppose that we are going to
build a predictor for the response $\bar{y}$ associated to $\bar{x}$ using some statistical learning method. Describe how
we might estimate the standard deviation of our prediction. You must explicitly define every variable
and equation.
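For reference, one standard formulation of the kind of answer being asked for: if $Z^{*1}, \ldots, Z^{*B}$ are bootstrap samples drawn with replacement from the data and $\hat{f}^{*b}$ is the predictor fit on $Z^{*b}$, then

    $\widehat{\mathrm{SE}}\left(\hat{f}(\bar{x})\right) = \sqrt{ \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{f}^{*b}(\bar{x}) - \frac{1}{B} \sum_{b'=1}^{B} \hat{f}^{*b'}(\bar{x}) \right)^2 }.$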
2 Bootstrap for Estimating Standard Errors of Logistic Regression Coefficients
Solve exercise 6 from Chapter 5 of ISLR, in python.
You can use either the statsmodels or sklearn package for this. I recommend using statsmodels,
since it has better support for this kind of statistical analysis (note also that sklearn regularizes by
default, so you must turn that off if you use sklearn).
For statsmodels, after building a model m, you can use m.summary() to get the standard errors of
the coefficients.
For 6.b, write a function boot_fn that works as described in ISLR. Instead of the R library function
boot, you must write your own: write a function boot(data, fn, R) where data is a pandas dataframe,
fn is a function that computes a statistic, and R is the number of replicates. You can use resample from
sklearn to generate individual bootstrap samples. A sketch is given below.
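A minimal sketch of the requested helper (the signature is from the description above; how you summarize the replicates afterwards is up to you):

    import numpy as np
    from sklearn.utils import resample

    def boot(data, fn, R):
        """Apply fn to R bootstrap resamples of data and return the replicates."""
        stats = [fn(resample(data)) for _ in range(R)]
        return np.asarray(stats)

    # Usage: standard errors are the column-wise standard deviations of the replicates,
    # e.g. reps = boot(df, boot_fn, R=1000); se = reps.std(axis=0, ddof=1)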
3 Cross-Validation on Simulated Data
Solve exercise 8 from Chapter 5 of ISLR, in python.
4 Ridge Regression Effect of λ
Solve exercise 4 in Chapter 6 of ISLR.
5 Comparing Lasso, Ridge, and Least Squares
Solve exercise 9 from Chapter 6 of ISLR. You only need to complete questions (a),(b),(c),(d), and (g).
For question (g), you only need to compare the three approaches from (b), (c), and (d).
You can use scikit-learn's LinearRegression, Ridge, and Lasso.
If you prefer statsmodels, then you can use fit_regularized for lasso and ridge.

IEOR E4525 Assignment 4
Due: Nov 19th, at 11:59pm

1 SVMs
1.1 Scaling the Inputs
True or false: in training an SVM it is generally a good idea to scale all input variables so that, for
example, they all lie in some fixed interval, or so that they all have the same mean, $\mu$, and variance, $\sigma^2$,
e.g. $(\mu, \sigma^2) = (0, 1)$. Justify your answer.
1.2 Classifying Tumors
1. Load the breast cancer dataset using sklearn.datasets. Construct an SVM classifier for this data.
You should randomly assign t% of your data to the training set and the remainder of your data to
the test set. Then use cross-validation on your training set to build your classifier. You can take
t = 70% initially.
2. Repeat part (1) N = 50 times to get N samples of the performance of the trained classifier on the
test set. (Note that each of the N samples will have different training and test sets.) Compute the
mean and standard deviation of the test-set performance. (A sketch of this setup appears after this list.)
3. Repeat part (2) for values of t = 50%, 55%, . . . , 95% and plot the mean test-set performance together
with 95% confidence intervals for this performance against t. What conclusions can you draw?
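A hedged sketch of parts (1) and (2); the hyperparameter grid is an assumption, so widen it as needed:

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.001]}

    t = 0.70
    scores = []
    for rep in range(50):   # N = 50 random train/test splits
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=t, random_state=rep)
        search = GridSearchCV(make_pipeline(StandardScaler(), SVC()), param_grid, cv=5)
        search.fit(X_tr, y_tr)            # cross-validation on the training set only
        scores.append(search.score(X_te, y_te))

    print(np.mean(scores), np.std(scores))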
1.3 SVMs and Cross-Validation
Suppose you have successfully trained an SVM with 10,000 training points and a Gaussian kernel where
the values of C and σ were selected via cross-validation. Recall that the Gaussian kernel has the form
    $K(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right).$
You are then given an additional 40,000 training points and so you wish to retrain your SVM using the
entire 50,000 training points that you now have. However, you wish to avoid the heavy computational
expense associated with repeating the cross-validation exercise that you previously used to pick C and
σ. Instead, you simply use the C and σ that you found using the first 10,000 training points, and then
retrain your SVM using those hyperparameters, but on the new set of 50,000 data points. Do you see
any potentially major problem with this? If so, what is it?
2 PyTorch Practice
1. Install PyTorch. I recommend using anaconda, in which case you can do it with the conda package
manager: conda install pytorch torchvision -c pytorch
2. Do the PyTorch 60-minute blitz tutorial.
3. Your jupyter notebook should have a section where you run each command from the PyTorch
60-minute blitz (this will only be lightly graded). You do not need to run the GPU commands.
4. Create a neural network with two hidden layers (the notebook shows how to create one with one
hidden layer); both layers should be ReLU layers (you may simply take the one from the notebook
and add a second layer with 256 output features, but feel free to get more creative). A sketch is given after this list.
5. Try SGD, Adam, and at least one other optimization algorithm from torch.optim. Try at least 3
different stepsizes for each algorithm (for Adam you should also try the default stepsize). Report
on your experience with finding a reasonable stepsize for each algorithm (e.g. how sensitive is each
algorithm to stepsize), and how the algorithms compare on loss minimization, training accuracy,
and test accuracy.
6. If you pick the best setup from all your experiments above, based on either loss or training accuracy
performance, do you get the best algorithm on test accuracy?
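A minimal sketch of item 4 (the input and output sizes are placeholders; match them to the dataset used in the blitz):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoHiddenLayerNet(nn.Module):
        def __init__(self, in_features=784, hidden=256, num_classes=10):
            super().__init__()
            self.fc1 = nn.Linear(in_features, hidden)
            self.fc2 = nn.Linear(hidden, hidden)      # the added second hidden layer
            self.fc3 = nn.Linear(hidden, num_classes)

        def forward(self, x):
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)

    net = TwoHiddenLayerNet()
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)   # Adam's default stepsize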
3 Function Approximation
1. Consider a ReLU network with a single hidden layer, with $W^{(1)} \in \mathbb{R}^{2 \times 2}$, $x, b^{(1)} \in \mathbb{R}^2$ and $W^{(2)} \in \mathbb{R}^{1 \times 2}$, $b^{(2)}, y \in \mathbb{R}$:
• $h^1 = \sigma(W^{(1)} x + b^{(1)})$
• $\hat{y} = W^{(2)} h^1 + b^{(2)}$
(Note that the output layer must use $W^{(2)}$; the stated dimensions only make sense that way.)
Show that this network is a piecewise linear function. Specify the set of pieces and the value on
each piece.
2. Consider a continuous piecewise-linear function

    $f(x) = \begin{cases} x + 3 & \text{if } x < 5 \\ 2x - 2 & \text{if } 5 \le x < 10 \\ -x + 28 & \text{if } 10 \le x \end{cases}$
Show how to represent it with a ReLU network that uses a single hidden layer.
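One way to see the shape of the answer for part 2 (a sketch to verify against your own derivation): the slope changes by +1 at x = 5 and by −3 at x = 10, and the identity map can be written as $\mathrm{ReLU}(x) - \mathrm{ReLU}(-x)$, so

    $f(x) = \mathrm{ReLU}(x) - \mathrm{ReLU}(-x) + \mathrm{ReLU}(x - 5) - 3\,\mathrm{ReLU}(x - 10) + 3,$

i.e. a single hidden layer with four ReLU units, output weights $(1, -1, 1, -3)$, and output bias 3.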