CSCI 5525: Machine Learning Homework 0
1. Have you read through the class syllabus, noted the important dates, and the class policies?
2. (i) Which of the following courses have you taken?
• CSci 5512 – Artificial Intelligence II
• CSci 5521 – Introduction to Machine Learning
• CSci 5523 – Introduction to Data Mining
(ii) Have you taken any course on Probability/Statistics? If yes, please write down the
course department and course name.
(iii) Have you taken any course on Linear Algebra? If yes, please write down the course
department and course name.
(iv) Have you taken any course on Optimization? If yes, please write down the course
department and course name.
3. Let $X \in \mathbb{R}^{n \times p}$ and $y \in \mathbb{R}^n$ be given. The goal is to find a $w^* \in \mathbb{R}^p$ which solves the following
problem:
$$\min_{w \in \mathbb{R}^p} \; \frac{1}{2}\|y - Xw\|^2 + \frac{c}{2}\|w\|^2,$$
where $c > 0$ is a constant. Give a closed form expression for $w^*$ in terms of X, y and c. (Consult
the Matrix Cookbook if you want to look up expressions for derivatives in matrix/vector form.)
4. Let A be an n × n positive definite matrix. The solutions to the following problems
$$\max_{w \in \mathbb{R}^n :\, w^T w \le 1} w^T A w \quad \text{and} \quad \min_{w \in \mathbb{R}^n :\, w^T w \le 1} w^T A w \tag{1}$$
have well known names—do you know what the solutions to these problems are called? (You
can refer back to your Linear Algebra course if needed)
5. What is the probability density function p(x; µ, Σ) of a multivariate Gaussian distribution
with mean µ and covariance Σ? Please provide an expression in terms of x, µ, Σ, and clearly
define any special function you use in the expression.
Let $\Theta = \Sigma^{-1}$ be the precision or inverse covariance matrix. What is the expression of the
probability density function $p(x; \mu, \Theta^{-1})$ of a multivariate Gaussian distribution in terms of
the mean µ and precision matrix Θ?
CSCI 5525: Machine Learning Homework 1
1. (15 points) The expected loss of a function f(x) in modeling y using loss function $\ell(f(x), y)$
is given by
$$E_{(x,y)}[\ell(f(x), y)] = \int_x \int_y \ell(f(x), y)\, p(x, y)\, dy\, dx = \int_x \left\{ \int_y \ell(f(x), y)\, p(y|x)\, dy \right\} p(x)\, dx.$$
(a) (7 points) What is the optimal f(x) when $\ell(f(x), y) = (f(x) - y)^2$?
(b) (8 points) What is the optimal f(x) when $\ell(f(x), y) = |f(x) - y|$, where | · | represents
absolute value?
2. (10 points) A generalization of the least squares problem adds an affine function to the least
squares objective,
$$\min_{w} \; \|Aw - b\|_2^2 + c^\top w + d$$
where $A \in \mathbb{R}^{m \times n}$, $w \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$, $d \in \mathbb{R}$. Assume the columns of A are linearly
independent. This generalized problem can be solved by reducing it to a standard least
squares problem, using a trick called completing the square.
Show that the objective of the problem above can be expressed in the form
$$\|Aw - b\|_2^2 + c^\top w + d = \|Aw - b + f\|_2^2 + g$$
where $f \in \mathbb{R}^m$, $g \in \mathbb{R}$. Then solve the generalized least squares problem by finding the w that
minimizes $\|Aw - (b - f)\|_2^2$.
Programming assignments: The next two problems involve programming. We will be considering two datasets for these assignments:
(a) Boston: The Boston housing dataset comes prepackaged with scikit-learn. The dataset has
506 data points, 13 features, and 1 target (response) variable. You can find more information
about the dataset here: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html.
While the original dataset is for a regression problem, we will create two classification datasets
for the homework. Note that you only need to work with the target t to create these classification datasets; the data X should not be changed.
First, load the dataset in with the following commands:
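import sklearn as sk
import sklearn.datasets  # submodules such as sk.datasets must be imported explicitly
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older version.
X, t = sk.datasets.load_boston(return_X_y=True)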
Then, create the two following data sets.
i. Boston50: Let $\tau_{50}$ be the median (50th percentile) over all t (response) values. Create a
2-class classification problem such that one class corresponds to label y = 1 if $t \ge \tau_{50}$ and
the other class corresponds to label y = 0 if $t < \tau_{50}$. By construction, note that the class
priors will be $p(y = 1) \approx \frac{1}{2}$, $p(y = 0) \approx \frac{1}{2}$.
ii. Boston75: Let $\tau_{75}$ be the 75th percentile over all t (response) values. Create a 2-class
classification problem such that one class corresponds to label y = 1 if $t \ge \tau_{75}$ and the
other class corresponds to label y = 0 if $t < \tau_{75}$. By construction, note that the class priors
will be $p(y = 1) \approx \frac{1}{4}$, $p(y = 0) \approx \frac{3}{4}$.
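The label construction for Boston50 and Boston75 can be sketched with NumPy as follows. This is only an illustration under stated assumptions: `make_binary_labels` is a hypothetical helper name, and `t_demo` is a made-up toy vector standing in for the real response t.

```python
import numpy as np

def make_binary_labels(t, percentile):
    """Return y with y = 1 where t >= the given percentile of t, else y = 0."""
    t = np.asarray(t, dtype=float)
    tau = np.percentile(t, percentile)  # threshold tau_50 or tau_75
    return (t >= tau).astype(int)

# Toy targets standing in for the Boston response t (hypothetical values).
t_demo = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])
y50 = make_binary_labels(t_demo, 50)  # Boston50-style labels
y75 = make_binary_labels(t_demo, 75)  # Boston75-style labels
```

Only the target is touched here; the feature matrix X is left unchanged, as required above.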
(b) Digits: The digits dataset comes prepackaged with scikit-learn. The dataset has 1797 data
points, 64 features, and 10 classes corresponding to ten numbers 0, 1, . . . , 9. You can find more
information about the dataset here: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html.
3. (35 points) In this problem, we consider Fisher’s linear discriminant analysis (LDA).
Implement [1], train, and evaluate the following classifiers using 10-fold cross-validation:
(i) (15 points) For the Boston50 dataset, apply LDA in the general case, i.e., compute
both the between-class and within-class covariance matrices $S_B$ and $S_W$, respectively,
from the training data, project the data onto $\mathbb{R}$ (one dimension), and then find a suitable threshold (one that minimizes classification error) to classify the training samples
correctly.
(ii) (20 points) For the Digits dataset, apply LDA in the general case, i.e., compute $S_B$
and $S_W$ from the data, project the data to $\mathbb{R}^2$ (two dimensions), then use bi-variate
Gaussian generative modeling to do 10-class classification, i.e., estimate and use class
priors $\pi_k$ and parameters $(\mu_k, \Sigma_k)$, $k = 1, \ldots, 10$.
You will have to submit (a) summary of methods and results report and (b) code for
each algorithm:
(a) Summary of methods and results: Briefly describe the approaches in (i) and (ii)
above, along with relevant equations. Also, report the training and test set error rates
and standard deviations from 10-fold cross validation for the methods on the datasets.
(b) Code: For part (i), you will have to submit code for LDA1dThres(num_crossval) (main
file). This main file has input: the number of folds for cross-validation, and output: the
training and test set error rates and standard deviations printed to the terminal (stdout).
For part (ii), you will have to submit code for LDA2dGaussGM(num_crossval), with all
other guidelines staying the same.
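Since scikit-learn may not be used for cross-validation in this homework, the fold bookkeeping has to be written by hand. The following is a minimal sketch of one way to do it, not a prescribed implementation: `kfold_indices` and `summarize` are hypothetical helper names, shuffling with a fixed seed is an assumption, and the standard deviation uses NumPy's default (population) convention.

```python
import numpy as np

def kfold_indices(n_samples, num_crossval, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)            # shuffle the sample indices once
    folds = np.array_split(perm, num_crossval)   # nearly equal-sized folds
    for i in range(num_crossval):
        test_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(num_crossval) if j != i]
        )
        yield train_idx, test_idx

def summarize(errors):
    """Mean and standard deviation (ddof=0) of per-fold error rates."""
    errors = np.asarray(errors, dtype=float)
    return errors.mean(), errors.std()
```

Each classifier would then be trained on `train_idx` and evaluated on both index sets inside the loop, with `summarize` applied to the collected train and test error rates.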
[1] You must implement all algorithms in this homework from scratch; you cannot use toolboxes like scikit-learn.
4. (40 points) In this problem, the goal is to evaluate the results reported in the paper “On
Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes”
by A. Ng and M. Jordan [2], using the Boston50, Boston75, and Digits datasets. Implement,
train, and evaluate two classifiers:
(i) (20 points) Logistic regression (LR), and
(ii) (20 points) Naive-Bayes with marginal Gaussian distributions (GNB)
on all three datasets. Evaluation will be done using 10 random class-specific 80-20 train-test splits, i.e., for each class, pick 80% of the data at random for training, train a classifier
using training data from all classes, use the remaining 20% of the data from each class as
testing, and repeat this process 10 times. We will be creating a learning curve, similar to the
Ng-Jordan paper—please see guidelines below.
You will have to submit (a) summary of methods and results report and (b) code for
each algorithm:
(a) Summary of methods and results: Briefly describe the approaches in (i) and (ii)
above, along with (iterative) equations for parameter estimation. Clearly state which
method you are using for logistic regression. For each dataset and method, create a plot
of the test set error rate illustrating the relative performance of the two methods with
increasing number of training points (see instructions below). The plots will be similar in
spirit to Figure 1 in the Ng-Jordan paper, along with error-bars with standard deviation
of the errors.
Instructions for plots: Your plots will be based on 10 random 80-20 train-test splits.
For each split, we will always evaluate results on the same test set (20% of the data),
while using increasing percentages of the training set (80% of the data) for training. In
particular, we will use the following training set percentages: [10 25 50 75 100], so
that for each 80-20 split, we use 10%, 25%, all the way up to 100% of the training set for
training, and always report results on the same test set. We will repeat the process 10
times, and plot the mean and standard deviation (as error bars) of the test set errors for
different training set percentages.
(b) Code: For logistic regression, you will have to submit code for logisticRegression(num_splits,
train_percent). This main file has input: (1) the number of 80-20 train-test splits for evaluation, and (2) a vector containing percentages of training data to be used for training
(use [10 25 50 75 100] for the plots), and output: test set error rates for each training
set percent printed to the terminal (stdout). The test set error rates should include both
the error rates for each split for each training set percentage as well as the mean of the
test set error rates across all splits for each training set percentage (print the mean error
rates at the end).
For naive Bayes, you will have to submit code for naiveBayesGaussian(num_splits,
train_percent), with all other guidelines staying the same.
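The evaluation protocol described above (class-specific 80-20 splits, then increasing percentages of the training portion against a fixed test set) can be sketched as below. This is an illustrative sketch only: `stratified_split` and `training_subsets` are hypothetical helper names, and the choice to make the smaller training subsets nested prefixes of one shuffled ordering is an assumption that the text does not specify.

```python
import numpy as np

def stratified_split(y, train_frac=0.8, rng=None):
    """Split indices so that each class contributes train_frac of its points to training."""
    rng = np.random.default_rng() if rng is None else rng
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))  # shuffle within the class
        n_train = int(round(train_frac * len(idx)))
        train_idx.append(idx[:n_train])
        test_idx.append(idx[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)

def training_subsets(train_idx, train_percent, rng):
    """Yield (percent, subset of train_idx) for each requested training percentage."""
    shuffled = rng.permutation(train_idx)  # assumption: subsets are nested prefixes
    for pct in train_percent:
        n = max(1, int(round(pct / 100.0 * len(shuffled))))
        yield pct, shuffled[:n]
```

For each of the `num_splits` repetitions, the same `test_idx` is reused for every percentage, so the resulting error rates can be averaged per percentage to form the learning curve.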
Additional instructions: Code can only be written in Python 3.6+; no other programming
languages will be accepted. One should be able to execute all programs from the Python command
prompt or terminal. Please specify instructions on how to run your program in the README file.
[2] https://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf
Each function must take the inputs in the order specified in the problem and display the textual
output via the terminal and plots/figures should be included in the report.
For each part, you can submit additional files/functions (as needed) which will be used by the
main file. In your code, you cannot use machine learning libraries such as those available from
scikit-learn for learning the models or for cross-validation. However, you may use libraries for basic
matrix computations. Put comments in your code so that one can follow the key parts and steps
in your code.
Your code must be runnable on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu).
One option is to SSH into a machine. Learn about SSH at these links:
https://cseit.umn.edu/knowledge-help/learn-about-ssh,
https://cseit.umn.edu/knowledge-help/choose-ssh-tool,
and https://cseit.umn.edu/knowledge-help/remote-linux-applications-over-ssh.
Instructions
Follow the rules strictly. If we cannot run your code, you will not get any credit.
• Things to submit
1. hw1.pdf: A document which contains the solutions to Problems 1, 2, 3, and 4,
including the summary of methods and results.
2. LDA1dThres and LDA2dGaussGM: Code for Problem 3.
3. logisticRegression and naiveBayesGaussian: Code for Problem 4.
4. README.txt: README file that contains your name, student ID, email, instructions
on how to run your code, any assumptions you are making, and any other necessary
details.
5. Any other files, except the data, which are necessary for your code.
Homework Policy. (1) You are encouraged to collaborate with your classmates on homework
problems, but each person must write up the final solutions individually. You need to list in the
README.txt which problems were a collaborative effort and with whom. (2) Regarding online
resources, you should not:
• Google around for solutions to homework problems,
• Ask for help online,
• Look up things/post on sites like Quora, StackExchange, etc.
CSCI 5525: Machine Learning Homework 2
1. (20 points) Recall that a function K : X × X ↦ R is a valid kernel function if it is
a symmetric and positive semi-definite function. For the current problem, we assume that the
domain X = R.
(a) (10 points) Let $K_1, \ldots, K_m$ be a set of valid kernel functions. Show that for any
$w_j \ge 0$, $j = 1, \ldots, m$, and $x, x' \in \mathbb{R}^p$, the function $K(x, x') = \sum_{j=1}^{m} w_j K_j(x, x')$ is a
valid kernel function.
(b) (10 points) Consider the function $K(x, x') = K_1(x, x') + K_2(x, x')$ where $K_1$ and $K_2$
are valid kernel functions. Show that K is a valid kernel function.
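As a numerical sanity check of the definition recalled in this problem (it is not a proof), one can build the Gram matrix of a kernel on a finite sample and test that it is symmetric with non-negative eigenvalues. In this sketch, `gram`, `is_valid_gram`, the tolerance, and the random sample `X_demo` are all hypothetical choices made for illustration.

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix G[i, j] = kernel(X[i], X[j]) for a sample X."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def is_valid_gram(K, tol=1e-8):
    """True if K is symmetric and positive semi-definite up to a tolerance."""
    K = np.asarray(K, dtype=float)
    symmetric = np.allclose(K, K.T, atol=tol)
    eigvals = np.linalg.eigvalsh((K + K.T) / 2.0)  # eigenvalues of the symmetric part
    return bool(symmetric and eigvals.min() >= -tol)

def linear(a, b):
    """Linear kernel, used here only as a known valid example."""
    return float(np.dot(a, b))

X_demo = np.random.default_rng(0).normal(size=(6, 3))  # arbitrary sample points
K_demo = gram(linear, X_demo)
```

A passing check on one sample does not establish validity in general, while a failing check on any sample does show that a candidate function is not a valid kernel.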
2. (20 points) The SVM classifier can be implemented in either primal or dual form. In this
problem, implement a linear SVM in dual form using slack variables. Note, this will be a
quadratic program with linear constraints. For this you need an optimizer. Use the optimizer
cvxopt which can be installed in your environment either through pip or conda. Refer to
the cvxopt document for more details about quadratic programming:
https://cvxopt.org/userguide/coneprog.html#quadratic-programming.
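The cvxopt solver expects a quadratic program in the form: minimize (1/2) a^T P a + q^T a subject to G a <= h and A a = b. The sketch below shows one common way of writing the soft-margin SVM dual in that form; it is an illustration under assumptions, not the required solution. It assumes labels coded as -1/+1 (the dataset's labels may need recoding), and `dual_svm_qp_matrices` and `solve_with_cvxopt` are hypothetical helper names.

```python
import numpy as np

def dual_svm_qp_matrices(X, y, C):
    """Build (P, q, G, h, A, b) for the soft-margin linear SVM dual."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)          # assumption: labels are in {-1, +1}
    n = X.shape[0]
    P = np.outer(y, y) * (X @ X.T)          # P[i, j] = y_i y_j <x_i, x_j>
    q = -np.ones(n)                         # maximizing sum(a) == minimizing -sum(a)
    G = np.vstack([-np.eye(n), np.eye(n)])  # encodes -a <= 0 and a <= C
    h = np.concatenate([np.zeros(n), C * np.ones(n)])
    A = y.reshape(1, n)                     # equality constraint sum_i a_i y_i = 0
    b = np.zeros(1)
    return P, q, G, h, A, b

def solve_with_cvxopt(P, q, G, h, A, b):
    """Pass the NumPy arrays to cvxopt.solvers.qp and return the dual variables."""
    from cvxopt import matrix, solvers      # imported lazily; requires cvxopt
    solvers.options["show_progress"] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h), matrix(A), matrix(b))
    return np.array(sol["x"]).ravel()
```

With cvxopt installed, the dual variables could be obtained with `solve_with_cvxopt(*dual_svm_qp_matrices(X, y, C))`.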
Apply your SVM to the dataset “hw2_data_2020.csv”. This dataset consists of samples from
2 classes making this a 2-class classification problem. The rows are the samples and the first
p columns are the features and the last column is the label. Split this dataset into 80% train
data and 20% test data. Apply k = 10 fold cross validation on the train data to choose the
optimal value of C (see (a) below).
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. Also, calculate the train and validation error rates over the 10
folds for each value of $C \in \{10^{-4}, 10^{-3}, 10^{-2}, 0.1, 1, 10, 100, 1000\}$. Report the average
train error rate and its associated standard deviation (over the 10 train error rates)
and the average validation error rate and its associated standard deviation (over the 10
validation error rates). After running cross validation, choose the value of C which gives
the lowest average validation error. Apply the learned model with that value of C to
the held out test set and report the error rate on the test set (1 number). Make sure to
explain why you chose that value of C beyond that it has the lowest validation error rate.
(b) Code: Submit the file SVM_dual.py which contains the function def SVM_dual(dataset:
str) -> None: where dataset is a string consisting of the name of the dataset and the
function does not return anything but must print out to the terminal (stdout) the average
train and validation error rates and standard deviations, the optimal value of C, and the
test set error rate for the model with the lowest validation error rate.
3. (30 points) In this problem, we consider Kernel SVM. Implement a Kernel SVM for a generic
kernel. Apply your Kernel SVM to the dataset “hw2_data_2020.csv”. Split the dataset in
the same way as in Problem 2 (80% train, 20% test) and apply k = 10 fold cross validation
on the train data to choose the optimal hyperparameters (you must decide on reasonable
hyperparameter ranges) for the following kernels:
(i) Linear kernel,
(ii) RBF kernel.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. Also, for both (i) and (ii), report the average train and validation error rates and standard deviations (over the 10 folds) for each combination of the
hyperparameter values (you choose the values to experiment with – they must be reasonable and you must be able to explain why they are reasonable). After running cross
validation, choose the optimal hyperparameter values and apply the learned model with
those values to the held out test set and report the error rate on the test set. Make sure
to explain why you chose those hyperparameter values.
(b) Code: Submit the file kernel_SVM.py which contains the function def kernel_SVM(dataset:
str) -> None: where dataset is a string consisting of the name of the dataset and the
function does not return anything but must print out to the terminal (stdout) the average train and validation error rates and standard deviations, the optimal hyperparameter
values, and the test set error rate for the best model.
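The two kernels named in this problem can be computed as full Gram matrices with NumPy. This is a minimal sketch: the function names are hypothetical, and parameterizing the RBF kernel as exp(-gamma * ||x - x'||^2) is an assumption (a bandwidth sigma is an equally common choice).

```python
import numpy as np

def linear_kernel(X1, X2):
    """Gram matrix of inner products between rows of X1 and rows of X2."""
    return X1 @ X2.T

def rbf_kernel(X1, X2, gamma=1.0):
    """Gram matrix with entries exp(-gamma * ||x - x'||^2)."""
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * (X1 @ X2.T)
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

A generic kernel SVM can then accept any such function as an argument, with gamma (together with C) forming the hyperparameter grid for the RBF case.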
4. (30 points) In this problem, we consider multi-class classification using SVM. Implement
a multi-class SVM using the one vs all strategy. Apply your SVM to the “mfeat” dataset [1],
which contains descriptors from MNIST for reducing the data dimensionality.
Split the dataset in the same way as in Problem 2 (80% train, 20% test) and apply k = 10
fold cross validation on the train data to choose the optimal hyperparameters (you must decide
on reasonable hyperparameter ranges) for the following kernels:
(i) Linear kernel,
(ii) RBF kernel.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. Also, for both (i) and (ii), report the average train and validation error rates and standard deviations (over the 10 folds) for each combination of the
hyperparameter values (you choose the values to experiment with – they must be reasonable and you must be able to explain why they are reasonable). After running cross
validation, choose the optimal hyperparameter values and apply the learned model with
those values to the held out test set and report the error rate on the test set. Make sure
to explain why you chose those hyperparameter values.
[1] Download the dataset here: https://archive.ics.uci.edu/ml/datasets/Multiple+Features
(b) Code: Submit the file multi_SVM.py which contains the function def multi_SVM(dataset:
str) -> None: where dataset is a string consisting of the name of the dataset and the
function does not return anything but must print out to the terminal (stdout) the average train and validation error rates and standard deviations, the optimal hyperparameter
values, and the test set error rate for the best model.
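The one vs all strategy reduces a K-class problem to K binary problems and then picks the class whose binary classifier gives the largest decision value. The sketch below shows only that bookkeeping, assuming -1/+1 binary targets and one real-valued score per class and sample; `one_vs_all_targets` and `one_vs_all_predict` are hypothetical helper names.

```python
import numpy as np

def one_vs_all_targets(y, classes):
    """Return a (K, n) array whose row k is +1 where y == classes[k] and -1 elsewhere."""
    y = np.asarray(y)
    return np.array([np.where(y == c, 1.0, -1.0) for c in classes])

def one_vs_all_predict(scores, classes):
    """Pick, for every sample, the class with the largest decision value.

    scores has shape (K, n): scores[k, i] is the output of binary SVM k on sample i.
    """
    return np.asarray(classes)[np.argmax(scores, axis=0)]
```

Each row of the target array would be used to train one binary SVM, and the decision values of those K models on new data would be stacked into `scores`.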
Additional instructions: Code can only be written in Python 3.6+; no other programming
languages will be accepted. One should be able to execute all programs from the Python command
prompt or terminal. Please specify instructions on how to run your program in the README file.
Each function must take the inputs in the order specified in the problem and display the textual
output via the terminal and plots/figures should be included in the report.
For each part, you can submit additional files/functions (as needed) which will be used by
the main file. In your code, you cannot use machine learning libraries such as those available
from scikit-learn for learning the models. However, you may now use scikit-learn for cross validation – consider the function sklearn.model_selection.KFold and see details here:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html. You
may also use libraries for basic matrix computations and plotting such as numpy, pandas, and
matplotlib. Put comments in your code so that one can follow the key parts and steps in your
code.
Your code must be runnable on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu).
One option is to SSH into a machine. Learn about SSH at these links:
https://cseit.umn.edu/knowledge-help/learn-about-ssh,
https://cseit.umn.edu/knowledge-help/choose-ssh-tool,
and https://cseit.umn.edu/knowledge-help/remote-linux-applications-over-ssh.
Instructions
Follow the rules strictly. If we cannot run your code, you will not get any credit.
• Things to submit
1. hw2.pdf: The report that contains the solutions to Problems 1, 2, 3, and 4 including the
summary of methods and results.
2. dual_SVM.py: Code for Problem 2.
3. kernel_SVM.py: Code for Problem 3.
4. multi_SVM.py: Code for Problem 4.
5. README.txt: README file that contains your name, student ID, email, instructions
on how to run your code, any assumptions you are making, and any other necessary
details.
6. Any other files, except the data, which are necessary for your code.
Homework Policy. (1) You are encouraged to collaborate with your classmates on homework
problems, but each person must write up the final solutions individually. You need to list in the
README.txt which problems were a collaborative effort and with whom. (2) Regarding online
resources, you should not:
• Google around for solutions to homework problems,
• Ask for help online,
• Look up things/post on sites like Quora, StackExchange, etc.
CSCI 5525: Machine Learning Homework 3
1. (10 points) Consider the convolutional neural network architecture given in Figure 1 for
classifying MNIST digits. Assume that the input images are reduced to size 10 × 10 with
only 1 channel (represented as a matrix in $\mathbb{R}^{10 \times 10}$). In this architecture, the convolutional
layer uses a 3 × 3 filter, $W_{conv}$, with stride 3 and zero padding of
size 1. The dimensions of the outputs of each layer are shown below.
Figure 1: A toy CNN
(a) (5 points) What are the values of n, m, k in the graph?
(b) (5 points) What are the sizes of Wconv, Wfc, and b?
Programming Problems: The next two problems are programming problems and will focus on
implementing neural networks for handwritten digits classification. We will use the MNIST dataset
where each sample is an image of a hand-written digit and has a corresponding label indicating the
value of the digit written (0, 1, . . . , 9). This makes it a multi-class classification problem.
You must use Tensorflow 2 to implement your neural networks. The implementation must be
for the CPU version only (no GPUs or MPI parallel programming is required for this assignment).
Follow the installation instructions at https://www.tensorflow.org/install in case you want to
use your local machine (we recommend using Anaconda); you may also use Colab for this assignment.
You can load the MNIST dataset directly in Tensorflow with the following code:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
2. (40 points) Implement a multi-layer fully connected neural network:
• Input: 1-channel input, size 28×28
• Fully connected layer 1: input with bias; output – 128 nodes
• ReLU activation function
• Fully connected layer 2: input – 128 nodes; output – 10 nodes
• Softmax activation function
• Use cross entropy as the loss function
• Use SGD as optimizer
• Set mini batch size as 32
Train using mini batches of the given batch size. Plot the cumulative training loss and
accuracy for every epoch. Once training is complete, apply the learned model to the test set
and report the testing accuracy.
Epoch: An epoch is a single pass through all the training data. Typically many epochs will
be run when training a neural network before it converges.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. Report the cumulative training loss and accuracy for every epoch
as plots. Also report the testing accuracy (a single number).
(b) Code: Submit the file neural_net.py which contains the function def neural_net()
-> None:. The function does not have any inputs and does not return anything but must
print out to the terminal (stdout) the cumulative training loss and accuracy per epoch
as well as the testing accuracy.
3. (50 points) Implement a convolutional neural network with the following specifications.
• Input: 1-channel input, size 28×28
• Convolution layer: Convolution kernel size is (3, 3) with stride as 1. Input channels – 1;
Output channels – 20 nodes
• ReLU activation function
• Max-pool: 2×2 max pool
• Dropout layer with probability p = 0.50
• Flatten input for feed to fully connected layers
• Fully connected layer 1: flattened input with bias; output – 128 nodes
• ReLU activation function
• Dropout layer with probability p = 0.50
• Fully connected layer 2: input – 128 nodes; output – 10 nodes
• Softmax activation function
• Use cross entropy as loss function
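To keep track of tensor shapes through the convolution and pooling layers listed above, the standard output-size formula floor((n + 2p - k) / s) + 1 can be evaluated directly. The sketch below makes two assumptions that the specification does not state: the convolution uses no padding, and the max-pool stride equals the pool size. `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(n_in, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (n_in + 2 * padding - kernel) // stride + 1

# 28 x 28 input, 3 x 3 convolution with stride 1 (assumed: no padding).
conv_size = conv_output_size(28, kernel=3, stride=1)
# 2 x 2 max pool (assumed: stride 2).
pool_size = conv_output_size(conv_size, kernel=2, stride=2)
# Length of the flattened vector fed to fully connected layer 1 (20 output channels).
flat_features = pool_size * pool_size * 20
```

Under these assumptions the flattened input to the first fully connected layer has 13 × 13 × 20 = 3380 entries; a different padding or pooling stride changes this number.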
For this problem, we will be experimenting with a variety of parameters.
First, train using SGD as the optimizer and mini batches of size 32. Plot the cumulative
training loss and accuracy for every epoch. Once training is complete, apply the learned
model to the test set and report the testing accuracy.
Second, train your network using mini batch sizes of [32, 64, 96, 128] and plot the convergence
run time vs mini batch sizes for each of the following optimizers: SGD, Adagrad, and Adam.
You should report 3 figures, one for each optimizer where each figure has mini batch size on
the x-axis and the convergence run time on the y-axis.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. Report the cumulative training loss and accuracy for every epoch
as plots. Also report the testing accuracy (a single number). Also report the convergence
run time vs mini batch sizes for each mini batch size and optimizer above (3 plots).
(b) Code: Submit the file cnn.py which contains the function def cnn() -> None:. The
function does not have any inputs and does not return anything but must print out to
the terminal (stdout) the cumulative training loss and accuracy as well as the testing
accuracy (for the first part above). It must also print out the batch size and convergence
run time for each mini batch size and optimizer (for the second part above).
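Measuring convergence run time requires a stopping rule, which is not fixed by the text above. The sketch below assumes one possible rule (stop when the epoch loss changes by less than a tolerance) and times the loop with `time.perf_counter`. `time_until_converged`, `run_epoch`, and `make_fake_epoch` are hypothetical names, and the fake epoch merely stands in for one pass of real training.

```python
import time

def time_until_converged(run_epoch, tol=1e-3, max_epochs=100):
    """Run epochs until the loss changes by less than tol; return (seconds, epochs)."""
    start = time.perf_counter()
    prev_loss = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        loss = run_epoch()                 # one full pass over the training data
        if abs(prev_loss - loss) < tol:    # assumed convergence criterion
            break
        prev_loss = loss
    return time.perf_counter() - start, epoch

def make_fake_epoch():
    """Stand-in for training: each call halves a stored loss value."""
    state = {"loss": 1.0}
    def run_epoch():
        state["loss"] *= 0.5
        return state["loss"]
    return run_epoch
```

In the actual experiment, `run_epoch` would train the network for one epoch with a given optimizer and mini batch size and return the cumulative training loss.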
Additional instructions: Code can only be written in Python 3.6+; no other programming
languages will be accepted. One should be able to execute all programs from the Python command
prompt or terminal. Please specify instructions on how to run your program in the README file.
Each function must take the inputs in the order specified in the problem and display the textual
output via the terminal and plots/figures should be included in the report.
For each part, you can submit additional files/functions (as needed) which will be used by the
main file. In your code, you cannot use machine learning libraries such as those available from
scikit-learn for learning the models, the exception being that you must use TensorFlow 2 for your neural
network implementations. You may also use libraries for basic matrix computations and plotting
such as numpy, pandas, and matplotlib. Put comments in your code so that one can follow the key
parts and steps in your code.
Your code must be runnable on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu).
One option is to SSH into a machine. Learn about SSH at these links:
https://cseit.umn.edu/knowledge-help/learn-about-ssh,
https://cseit.umn.edu/knowledge-help/choose-ssh-tool,
and https://cseit.umn.edu/knowledge-help/remote-linux-applications-over-ssh.
Instructions
Follow the rules strictly. If we cannot run your code, you will not get any credit.
• Things to submit
1. hw3.pdf: The report that contains the solutions to Problems 1, 2, and 3 including the
summary of methods and results.
2. neural_net.py: Code for Problem 2.
3. cnn.py: Code for Problem 3.
4. README.txt: README file that contains your name, student ID, email, instructions
on how to run your code, any assumptions you are making, and any other necessary
details.
5. Any other files, except the data, which are necessary for your code.
Homework Policy. (1) You are encouraged to collaborate with your classmates on homework
problems, but each person must write up the final solutions individually. You need to list in the
README.txt which problems were a collaborative effort and with whom. (2) Regarding online
resources, you should not:
• Google around for solutions to homework problems,
• Ask for help online,
• Look up things/post on sites like Quora, StackExchange, etc.
CSCI 5525: Machine Learning Homework 4
1. (35 points) In this problem, we consider Adaboost. Implement the Adaboost algorithm with
100 weak learners and apply it to the cancer dataset described above. For the weak learners,
use decision stumps (1-level decision trees). You must implement the decision stumps from
scratch. Use information gain as the splitting measure.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. Report a plot of the classification error on both the train and
test sets as the number of weak learners increases. (One plot where the x-axis is the
number of weak learners from 1 to 100 and the y-axis is the classification error.)
(b) Code: Submit the file adaboost.py which contains the function def adaboost(dataset:
str) -> None:. The function takes in a string of the dataset filename and does not return anything but must print out to the terminal (stdout) the train and test classification
error rates as the number of weak learners increases.
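The splitting measure for a decision stump can be sketched as follows. This is an illustration under assumptions: the stump splits a single numeric feature at a threshold, and the entropy is computed with optional sample weights so that the same function can be reused when AdaBoost reweights the data (the text above does not say how the weights should enter the stump). `entropy` and `information_gain` are hypothetical helper names.

```python
import numpy as np

def entropy(y, w):
    """Weighted entropy (in bits) of the labels y with non-negative weights w."""
    total = w.sum()
    if total == 0:
        return 0.0
    h = 0.0
    for c in np.unique(y):
        p = w[y == c].sum() / total
        if p > 0:
            h -= p * np.log2(p)
    return h

def information_gain(x, y, threshold, w=None):
    """Information gain of splitting feature x at threshold (x <= threshold goes left)."""
    x, y = np.asarray(x), np.asarray(y)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    left = x <= threshold
    right = ~left
    w_total = w.sum()
    children = (
        (w[left].sum() / w_total) * entropy(y[left], w[left])
        + (w[right].sum() / w_total) * entropy(y[right], w[right])
    )
    return entropy(y, w) - children
```

A stump would be fit by scanning candidate features and thresholds and keeping the pair with the largest gain.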
2. (35 points) In this problem, we consider Random Forests. Implement the Random Forest
algorithm with 100 decision stumps. You must implement the decision stumps from scratch.
Use information gain as the splitting measure. Apply your Random Forest implementation
to the cancer dataset described above and do the following:
(i) Use m = 3 random attributes to determine the split of your decision stumps. Learn a
model for an increasing number of decision stumps in the ensemble. Compute the train
and test set classification error as the number of decision stumps increases.
[1] https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28original%29
(ii) Vary the number of random attributes m from {2, . . . , p = 10} and fit a model using
100 decision stumps. Compute the train and test set classification error as the number
of random features m increases.
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. Report a plot of the classification error on both the train and
test sets for both (i) and (ii) above. (A total of 2 plots where for (i) the x-axis is the
number of decision trees and y-axis is the classification error, and (ii) the x-axis is the
number of random features m and y-axis is the classification error.)
(b) Code: Submit the file rf.py which contains the function def rf(dataset: str) ->
None:. The function takes in a string of the dataset filename and does not return anything
but must print out to the terminal (stdout) the train and test classification error rates
for (i) and (ii) above.
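The ensemble bookkeeping around the stumps can be sketched as below: drawing m random attributes, drawing a bootstrap sample, and tracking the majority-vote error as stumps are added. This is a sketch under assumptions: labels are coded as 0/1, ties are broken in favor of class 1, and bootstrap resampling is used (the text above only specifies the random attributes). All function names are hypothetical.

```python
import numpy as np

def sample_attributes(p, m, rng):
    """Choose m distinct attribute indices out of p for one stump."""
    return rng.choice(p, size=m, replace=False)

def bootstrap_indices(n, rng):
    """Sample n row indices with replacement (assumed resampling scheme)."""
    return rng.integers(0, n, size=n)

def majority_vote(predictions):
    """Combine a (n_stumps, n_samples) array of 0/1 predictions; ties go to class 1."""
    predictions = np.asarray(predictions)
    return (predictions.mean(axis=0) >= 0.5).astype(int)

def ensemble_error_curve(predictions, y):
    """Error rate of the majority vote over the first T stumps, for T = 1..n_stumps."""
    predictions, y = np.asarray(predictions), np.asarray(y)
    return [
        float(np.mean(majority_vote(predictions[:T]) != y))
        for T in range(1, len(predictions) + 1)
    ]
```

Calling `ensemble_error_curve` on the stored per-stump predictions for the train and test sets gives the two curves requested in part (i).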
3. (30 points) In this problem, we consider k-means for image segmentation. We can use
k-means to cluster pixels with similar (color) values together to generate a segmented or
compressed version of the original image. Implement the k-means algorithm and apply it to
the provided image “umn_csci.png”. For each k = {3, 5, 7}, generate a segmented image and
compute the cumulative loss (i.e., distortion measure from the lecture notes). (Note, it may
be helpful to test on a smaller version of the image “umn_csci.png” to ensure your code works
but report final results on the full version.)
Please submit (a) summary of methods and results report and (b) code:
(a) Summary of methods and results: Briefly describe the approaches used above, along
with relevant equations. For each value of k = {3, 5, 7}, report the final (i.e., after k-means has converged) segmented image and a plot of the cumulative loss during training.
(This will be 3 segmented images and 3 plots of the loss where the x-axis is the training
iteration number and y-axis is the loss value.)
(b) Code: Submit the file kmeans.py which contains the function def kmeans(image:
str) -> None:. The function takes in a string of the image to segment and does not
return anything but must print out to the terminal (stdout) the cumulative loss at each
iteration during training.
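The clustering loop on pixel values can be sketched as below. This is a minimal sketch under assumptions: the image has already been reshaped to an (N, channels) array, the loss is taken to be the sum of squared distances from each pixel to its assigned center (the lecture notes' distortion measure is not reproduced here), and centers are initialized from randomly chosen pixels. `kmeans_pixels` is a hypothetical helper name.

```python
import numpy as np

def kmeans_pixels(pixels, k, max_iter=100, seed=0):
    """Cluster rows of pixels into k groups; return (centers, labels, losses)."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]  # random init
    losses = []
    for _ in range(max_iter):
        # Squared distance from every pixel to every center, shape (N, k).
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                         # assignment step
        losses.append(float(d2[np.arange(len(pixels)), labels].sum()))
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)                              # update step
        ])
        if np.allclose(new_centers, centers):              # centers stopped moving
            break
        centers = new_centers
    return centers, labels, losses
```

The segmented image is obtained by replacing each pixel with its center, `centers[labels]`, and reshaping back to the original image dimensions. The (N, k) distance array can be large for the full image, which is one reason to test on a smaller version first.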
Additional instructions: Code can only be written in Python 3.6+; no other programming
languages will be accepted. One should be able to execute all programs from the Python command
prompt or terminal. Please specify instructions on how to run your program in the README file.
Each function must take the inputs in the order specified in the problem and display the textual
output via the terminal and plots/figures should be included in the report.
For each part, you can submit additional files/functions (as needed) which will be used by the
main file. In your code, you cannot use machine learning libraries such as those available from
scikit-learn for learning the models. However, you may now use scikit-learn for cross validation
and computing misclassification errors. You may also use libraries for basic matrix computations
and plotting such as numpy, pandas, and matplotlib. Put comments in your code so that one can
follow the key parts and steps in your code.
Your code must be runnable on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu).
One option is to SSH into a machine. Learn about SSH at these links:
https://cseit.umn.edu/knowledge-help/learn-about-ssh,
https://cseit.umn.edu/knowledge-help/choose-ssh-tool,
and https://cseit.umn.edu/knowledge-help/remote-linux-applications-over-ssh.
Instructions
Follow the rules strictly. If we cannot run your code, you will not get any credit.
• Things to submit
1. hw4.pdf: The report that contains the solutions to Problems 1, 2, and 3 including the
summary of methods and results.
2. adaboost.py: Code for Problem 1.
3. rf.py: Code for Problem 2.
4. kmeans.py: Code for Problem 3.
5. README.txt: README file that contains your name, student ID, email, instructions
on how to run your code, any assumptions you are making, and any other necessary
details.
6. Any other files, except the data, which are necessary for your code.
Homework Policy. (1) You are encouraged to collaborate with your classmates on homework
problems, but each person must write up the final solutions individually. You need to list in the
README.txt which problems were a collaborative effort and with whom. (2) Regarding online
resources, you should not:
• Google around for solutions to homework problems,
• Ask for help online,
• Look up things/post on sites like Quora, StackExchange, etc.