CSCI 5521: Introduction to Machine Learning Homework 0 solution


Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment


5/5 - (5 votes)

1. (12 points) Answer the following questions:
(a) Which of the following courses have you taken?
i. Artificial Intelligence II
ii. Introduction to Data Mining
iii. (Advanced) Algorithms
(b) Have you taken any course on Probability/Statistics? If yes, please write down
the course title (not number) and department. 2
(c) Have you taken any course on Numerical Methods/Linear Algebra/Multivar Calculus? If yes, please write down the course title (not number) and department.2
2. (a) (9 points) Given a full rank matrix A ∈ R
m×n where m > n and B ∈ R
show how to solve the following system of equations:
AX = B
It is sufficient to outline the method at a high level.
(b) (9 points) What happens if A is not full rank? Briefly explain your answer.
3. (10 points) Given two vectors a and b, explain one way to calculate the distance
between the vectors (you can give an example). Show the same for two matrices A
and B.
Note: your distance should be one scalar number that represents the distance between
the two vectors/matrices.
Instructor: Catherine Qi Zhao. TA: Prithvi Raj Botcha, Shi Chen, Suzie Hoops, James Yang, Yifeng
Zhang. Email:
2For example, for the current course, the information will be: ‘Machine Learning Fundamentals, Department of Computer Science & Engineering.’ It is ok if you have taken the course at a different University, we
do not need that information.
4. (a) (15 points) Let D be a random variable for a given disease, assume that the
probability a person has the disease is 0.1. Based on this information, researchers
developed a new method to say if a person has the disease: for each 10 people that
do the test, they randomly report that 1 of them has the disease. Will the method
correctly identify if the person has the disease? Briefly explain your answer.
(b) (15 points) Another group of researchers developed a new blood test to identify
the same disease. The test result is given by a random variable X, with sensitivity
and specificity given by 0.7 and 0.8, respectively (that means p(X = 1|D = 1) =
0.7 and p(X = 0|D = 0) = 0.8). If a patient did the blood test and the result is
positive, what is the probability that the person has the disease?
Hint: you might want to use the Bayes Rule: p(b|a) = p(a|b)p(b)
5. (30 points) Below is the pseudo-code of perceptron algorithm for binary classification,
where (x
, yt
) is the t-th data sample: x
is the vector of attribute values (real numbers)
and y
t = ±1 is the class label for the t-th sample:
1. w = w0.
2. Do Iterate until convergence
3. For each sample (x
, yt
), t = 1, 2, · · ·
4. If y
⟨w, x
⟩ ≤ 0
5. w = w + y
Here “convergence” means w does not change at all over one pass through the entire
training dataset in the loop starting in step 3. A note on notation: x
t denotes the t-th
sample in the training data, which is found in the t-th row of the matrix X. This is
the notation used in the textbook. The transpose of a vector or matrix M is denoted
MT with an upper case T.
(a) Implement the perceptron algorithm ( and test it on the data
provided on the class web site. X ∈ R
is the feature matrix of N samples in 2
dimensions and y ∈ R
is the label vector (±1). Use initial value w0 = [1; −1]T
Now, run your perceptron algorithm on the given data. How many iterations does
it take to converge? What is the error rate of the resulting fit (i.e., how many
points are misclassified by this classifier)? The prediction on a single data point
can be computed by comparing ⟨w, x
⟩ with the threshold 0 (e.g., ⟨w, x
⟩ >= 0
then the predicted label is +1).
(b) Visualize all the samples (use 2 different colors for the 2 different classes), and
plot the decision boundary defined by the initial w0 (before training) and w
returned by the perceptron program (after training). The code for plotting
is included in, you can use it to verify your implementation
of the perceptron algorithm. It will save the plots under the name
“initial.png” and “perceptron.png”. Note that you do not need to
modify the file.
− −   


− −  


before iteration after convergence
• Things to submit:
1. hw0 sol.pdf: a document containing all your answers of Problem 1-4 and the two
plots asked by Problem 5.
2. a text file containing the python function for Problem 5 with
header def MyPerceptron(X, y, w0), where X is the feature matrix, y is a label
vector (±1) and w0 is the initial value for the parameter vector w. Use the skeleton
file found with the data on the class web site, and fill in the
missing parts. The output of the function should be a tuple consisting of the final
weight vector w, the number of iterations k, and the fraction of training samples
that are misclassified. Include the number of iterations, error rate, and the plots
of decision boundary on your PDF submission.
• Submit: All material must be submitted electronically via Gradescope. Note that
There are two entries for the assignment, i.e., Hw0-Written (for hw0 sol.pdf)
and Hw0-Programming (for a zipped file containing the Python code),
please submit your files accordingly. We will grade the assignment with vanilla
Python, and code submitted as iPython notebooks will not be graded. This homework
will not be graded but required as a proof of satisfying the prerequisites for taking the