40.319 STATISTICAL AND MACHINE LEARNING, SPRING 2021: HOMEWORK 2 (SOLVED)


1. Entropy [5 Points]
The entropy of a discrete probability distribution, which is always greater than or equal to zero, is given by
$$\mathrm{Ent}(p) = -\sum_{i=1}^{n} p_i \log p_i, \qquad \sum_{i=1}^{n} p_i = 1.$$
1.1. Use Lagrange multipliers to find the distribution which maximizes entropy.
Solution. The Lagrangian is given by
$$L(p_1, \ldots, p_n, \lambda) = -\sum_{i=1}^{n} p_i \log p_i + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right).$$
Setting the gradient to zero, we get
$$\frac{\partial L}{\partial p_i} = -\log p_i - 1 + \lambda = 0 \implies \lambda = 1 + \log p_i, \quad \forall i = 1, \ldots, n,$$
$$\frac{\partial L}{\partial \lambda} = \sum_{i=1}^{n} p_i - 1 = 0 \implies \sum_{i=1}^{n} p_i = 1.$$
The first equation implies $p_i = p_j$ for all $i, j$, and the second then gives $p_i = 1/n$ for all $i$. Since the entropy is concave in $p$ (each term $-p_i \log p_i$ has second derivative $-1/p_i < 0$), this stationary point is the global maximum: the uniform distribution maximizes entropy.
1.2. Which probability distribution minimizes entropy?
Solution. For any $i$, the distribution with $p_i = 1$ and $p_j = 0$ for all $j \neq i$ attains the minimum entropy $0$: with the convention $0 \log 0 = 0$, every term $-p_k \log p_k$ is nonnegative, and all of them vanish for such a point mass.
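As a quick numerical sanity check (an aside, not part of the original solution; it assumes NumPy and SciPy are available), one can compare the entropy of the uniform distribution against a skewed distribution and a point mass:

import numpy as np
from scipy.stats import entropy  # computes -sum(p * log(p)), treating 0 log 0 as 0

n = 4
print(entropy(np.full(n, 1 / n)))               # log(4) = 1.386..., the maximum
print(entropy(np.array([0.7, 0.1, 0.1, 0.1])))  # strictly smaller
print(entropy(np.array([1.0, 0.0, 0.0, 0.0])))  # 0.0, the minimum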
2. Schur Complement [5 Points]
Let $A$ be an $n \times n$ matrix, $B$ an $n \times p$ matrix, $C$ a $p \times n$ matrix, and $D$ a $p \times p$ matrix, with $D$ and $A - BD^{-1}C$ invertible. Show that
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{pmatrix},$$
where
$$M = \left( A - BD^{-1}C \right)^{-1}.$$
Solution. It suffices to verify that multiplying the block matrix by the claimed inverse yields the identity:
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} M & -MBD^{-1} \\ -D^{-1}CM & D^{-1} + D^{-1}CMBD^{-1} \end{pmatrix} = \begin{pmatrix} I_n & 0 \\ 0 & I_p \end{pmatrix}.$$
We simplify each of the four blocks of the product in turn.
Top left block:
$$AM - BD^{-1}CM = \left( A - BD^{-1}C \right) M = \left( A - BD^{-1}C \right) \left( A - BD^{-1}C \right)^{-1} = I_n.$$
Top right block:
$$A \left( -MBD^{-1} \right) + B \left( D^{-1} + D^{-1}CMBD^{-1} \right) = -\left( A - BD^{-1}C \right) MBD^{-1} + BD^{-1} = -BD^{-1} + BD^{-1} = 0.$$
Bottom left block:
$$CM - DD^{-1}CM = CM - CM = 0.$$
Bottom right block:
$$C \left( -MBD^{-1} \right) + D \left( D^{-1} + D^{-1}CMBD^{-1} \right) = -CMBD^{-1} + I_p + CMBD^{-1} = I_p.$$
Hence the product is the identity, and the claimed matrix is indeed the inverse.
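A quick numerical check of the formula with random matrices (an aside, not part of the original solution; a minimal sketch assuming NumPy):

import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, p))  # random Gaussian matrices are invertible almost surely

Dinv = np.linalg.inv(D)
M = np.linalg.inv(A - B @ Dinv @ C)  # inverse of the Schur complement of D

# Assemble the claimed inverse block by block and compare the product with the identity.
inv_claimed = np.block([
    [M,              -M @ B @ Dinv],
    [-Dinv @ C @ M,  Dinv + Dinv @ C @ M @ B @ Dinv],
])
full = np.block([[A, B], [C, D]])
print(np.allclose(full @ inv_claimed, np.eye(n + p)))  # True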
3. Convolutional Networks [15 Points]
We will use PyTorch to train a Convolutional Neural Network (CNN) to improve classification accuracy on the Fashion MNIST dataset. This dataset comprises 60,000 training examples and 10,000 test examples of 28×28-pixel monochrome images of various clothing items. Let us begin by importing the required packages:
import numpy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
There are a total of 10 classes enumerated in the following way:
labels = {
    0: "T-shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot"
}
3.1. Define your model by inheriting from nn.Module using the following format:

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # initialize layers here

    def forward(self, x):
        # invoke the layers here
        return …
3.2. Complete the main function below; the test and train functions will be defined later.

def main():
    N_EPOCH = # Complete here
    L_RATE = # Complete here
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_dataset = datasets.FashionMNIST('../data', train=True,
                                          download=True, transform=transforms.ToTensor())
    test_dataset = datasets.FashionMNIST('../data', train=False,
                                         download=True, transform=transforms.ToTensor())
    ##### Use dataloader to load the datasets
    train_loader = # Complete here
    test_loader = # Complete here
    model = CNN().to(device)
    optimizer = optim.SGD(model.parameters(), lr=L_RATE)
    for epoch in range(1, N_EPOCH + 1):
        test(model, device, test_loader)
        train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

if __name__ == '__main__':
    main()
3.3. Complete the training function by defining the model output and the loss function. Use the optimizer's step function to update the weights after backpropagating the gradients. (Remember to clear the gradients with each iteration.)

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        # Fill in here
        if batch_idx % 100 == 0:
            print('Epoch:', epoch, ', loss:', loss.item())
3.4. In the test function, define the variable pred which predicts the output, and update the variable correct to keep track of the number of correctly classified objects so as to compute the accuracy of the CNN.

def test(model, device, test_loader):
    model.eval()
    correct = 0
    exampleSet = False
    example_data = numpy.zeros([10, 28, 28])
    example_pred = numpy.zeros(10)
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            # fill in here
            if not exampleSet:
                for i in range(10):
                    example_data[i] = data[i][0].to("cpu").numpy()
                    example_pred[i] = pred[i].to("cpu").numpy()
                exampleSet = True
    print('Test set accuracy: ',
          100. * correct / len(test_loader.dataset), '%')
    for i in range(10):
        plt.subplot(2, 5, i + 1)
        plt.imshow(example_data[i], cmap='gray', interpolation='none')
        plt.title(labels[example_pred[i]])
        plt.xticks([])
        plt.yticks([])
    plt.show()
You must achieve more than 80% accuracy to get full credit.
Append the print-outs from your program (test accuracy and plots of images with their
predicted labels) to your PDF submission on Gradescope. Upload the final script as a file
named [student-id]-cnn.py using the Dropbox link at the start of this assignment.
Solution.

import numpy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

labels = {
    0: "T-shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot"
}

##### Define the CNN
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)   # 1 input channel -> 20 feature maps, 5x5 kernels
        self.conv2 = nn.Conv2d(20, 50, 5, 1)  # 20 -> 50 feature maps, 5x5 kernels
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)         # 10 output classes

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)    # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
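# Shape check (an aside, not part of the original solution): a 28x28 input
# becomes 24x24x20 after conv1 (5x5 kernel, no padding), 12x12x20 after the
# first 2x2 max-pool, 8x8x50 after conv2, and 4x4x50 after the second pool,
# which explains the 4*4*50 input size of fc1.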
##### Training model
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()              # clear gradients from the previous iteration
        output = model(data)               # forward pass: per-class log-probabilities
        loss = F.nll_loss(output, target)  # negative log-likelihood loss
        loss.backward()                    # backpropagate the gradients
        optimizer.step()                   # update the weights
        if batch_idx % 100 == 0:
            print('Epoch:', epoch, ', loss:', loss.item())
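# Note (an aside, not part of the original solution): because forward()
# returns F.log_softmax outputs, F.nll_loss(output, target) here is
# equivalent to applying nn.CrossEntropyLoss to the raw logits.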
##### Testing model
def test(model, device, test_loader):
    model.eval()
    correct = 0
    exampleSet = False
    example_data = numpy.zeros([10, 28, 28])
    example_pred = numpy.zeros(10)
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            ##### Defining 'pred' and updating 'correct'
            output = model(data)
            pred = output.argmax(dim=1, keepdim=True)  # index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
            #####
            if not exampleSet:
                for i in range(10):
                    example_data[i] = data[i][0].to("cpu").numpy()
                    example_pred[i] = pred[i].to("cpu").numpy()
                exampleSet = True
    print('Test set accuracy: ',
          100. * correct / len(test_loader.dataset), '%')
    for i in range(10):
        plt.subplot(2, 5, i + 1)
        plt.imshow(example_data[i], cmap='gray', interpolation='none')
        plt.title(labels[example_pred[i]])
        plt.xticks([])
        plt.yticks([])
    plt.show()
def main():
    ##### Choosing epochs and learning rate
    NUM_EPOCHS = 5
    LRATE = 0.01
    #####
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_dataset = datasets.FashionMNIST('../data', train=True,
                                          download=True, transform=transforms.ToTensor())
    test_dataset = datasets.FashionMNIST('../data', train=False,
                                         download=True, transform=transforms.ToTensor())
    ##### Load the dataset
    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=64, shuffle=True, num_workers=1)
    test_loader = torch.utils.data.DataLoader(test_dataset,
                                              batch_size=1000, shuffle=True, num_workers=1)
    #####
    model = CNN().to(device)
    optimizer = optim.SGD(model.parameters(), lr=LRATE)
    for epoch in range(1, NUM_EPOCHS + 1):
        test(model, device, test_loader)
        train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

if __name__ == '__main__':
    main()
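# Aside (optional, not part of the original solution): for reproducible runs,
# seed the random number generators at the start of main(), e.g.
# torch.manual_seed(0) and numpy.random.seed(0).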
4. Support Vector Machines [15 Points]
In this problem, we will implement Support Vector Machines (SVMs) for classifying two
datasets. We start by importing the required packages and modules.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.svm import SVC
from sklearn.datasets import make_blobs, make_circles  # the samples_generator module was removed in newer scikit-learn
The make_blobs and make_circles functions from sklearn.datasets can be invoked to generate data for the first and second example, respectively.
The following will be used to plot decision boundaries, margins and support vectors.
def plot_svc_decision(model, ax=None):
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    # plot decision boundary and margins
    ax.contour(X, Y, P, colors='k', levels=[-1, 0, 1],
               alpha=0.5, linestyles=['--', '-', '--'])
    # plot support vectors
    ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
               s=300, linewidth=1, edgecolors='black', facecolors='none')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
4.1. Use the following lines of code to plot the first dataset.

X, y = make_circles(100, factor=.1, noise=.1)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='seismic')

Use SVC to construct a support vector machine (you will have to specify a kernel and the regularization parameter C) to classify this dataset, then use fit(X, y) to feed in the data and labels. Show your results using the plot_svc_decision function. Provide one graph, labelled with your choice of kernel function and your value of C.
4.2. Now generate and plot the second dataset.

X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=1.0)
fig2 = plt.figure(figsize=(16, 6))
fig2.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)
ax2 = fig2.add_subplot(121)
ax2.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='seismic')
ax3 = fig2.add_subplot(122)
ax3.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='seismic')

Your task here is to classify the dataset using different values of the regularization parameter C to understand soft margins in SVM. Indicate clearly what values of C you are using, and plot your results with plot_svc_decision using ax2 for one model and ax3 for the other.
Append the plots from your program to your PDF submission on Gradescope. For the programming exercises, attach the output to the PDF file you submit on Gradescope and upload your Python file (.py) to e-Dimension.
Solution.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.svm import SVC
from sklearn.datasets import make_blobs, make_circles  # the samples_generator module was removed in newer scikit-learn
def plot_svc_decision(model, ax=None):
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    # plot decision boundary and margins
    ax.contour(X, Y, P, colors='k', levels=[-1, 0, 1],
               alpha=0.5, linestyles=['--', '-', '--'])
    # plot support vectors
    ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
               s=300, linewidth=1, edgecolors='black', facecolors='none')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
# (a)
X, y = make_circles(100, factor=.1, noise=.1)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='seismic')
model1 = SVC(kernel='rbf', C=1E6)  # RBF kernel; very large C gives an effectively hard margin
model1.fit(X, y)
plot_svc_decision(model1, ax1)
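# Aside (not part of the original solution): the make_circles data is radially
# separable, so no linear boundary works in the original coordinates.  A
# hand-built radial feature illustrates what the RBF kernel computes implicitly:
r = np.exp(-(X ** 2).sum(axis=1))  # near 1 at the origin, near 0 on the outer ring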
# (b)
X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=1.0)
fig2 = plt.figure(figsize=(16, 6))
fig2.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)
ax2 = fig2.add_subplot(121)
ax2.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='seismic')
ax3 = fig2.add_subplot(122)
ax3.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='seismic')
ax2.set_title('C={0:.1f}'.format(10), size=14)
ax3.set_title('C={0:.1f}'.format(0.1), size=14)
model2 = SVC(kernel='linear', C=10)   # larger C: harder margin, fewer violations allowed
model2.fit(X, y)
model3 = SVC(kernel='linear', C=0.1)  # smaller C: softer margin, more violations allowed
model3.fit(X, y)
plot_svc_decision(model2, ax2)
plot_svc_decision(model3, ax3)
plt.show()
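# Aside (not part of the original solution): a smaller C permits more margin
# violations, so the margin widens and typically more points become support
# vectors.  A quick way to see this is to count them:
print('C=10.0:', len(model2.support_vectors_), 'support vectors')
print('C=0.1 :', len(model3.support_vectors_), 'support vectors')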