## Description

1. (12 points) Answer the following questions:

(a) Which of the following courses have you taken?

i. Artificial Intelligence II

ii. Introduction to Data Mining

iii. (Advanced) Algorithms

(b) Have you taken any course on Probability/Statistics? If yes, please write down

the course title (not number) and department. 2

(c) Have you taken any course on Numerical Methods/Linear Algebra/Multivar Calculus? If yes, please write down the course title (not number) and department.2

2. (a) (9 points) Given a full rank matrix A ∈ R

m×n where m > n and B ∈ R

m×k

,

show how to solve the following system of equations:

AX = B

It is sufficient to outline the method at a high level.

(b) (9 points) What happens if A is not full rank? Briefly explain your answer.

3. (10 points) Given two vectors a and b, explain one way to calculate the distance

between the vectors (you can give an example). Show the same for two matrices A

and B.

Note: your distance should be one scalar number that represents the distance between

the two vectors/matrices.

1

Instructor: Catherine Qi Zhao. TA: Prithvi Raj Botcha, Shi Chen, Suzie Hoops, James Yang, Yifeng

Zhang. Email: csci5521.s2022@gmail.com

2For example, for the current course, the information will be: ‘Machine Learning Fundamentals, Department of Computer Science & Engineering.’ It is ok if you have taken the course at a different University, we

do not need that information.

1

4. (a) (15 points) Let D be a random variable for a given disease, assume that the

probability a person has the disease is 0.1. Based on this information, researchers

developed a new method to say if a person has the disease: for each 10 people that

do the test, they randomly report that 1 of them has the disease. Will the method

correctly identify if the person has the disease? Briefly explain your answer.

(b) (15 points) Another group of researchers developed a new blood test to identify

the same disease. The test result is given by a random variable X, with sensitivity

and specificity given by 0.7 and 0.8, respectively (that means p(X = 1|D = 1) =

0.7 and p(X = 0|D = 0) = 0.8). If a patient did the blood test and the result is

positive, what is the probability that the person has the disease?

Hint: you might want to use the Bayes Rule: p(b|a) = p(a|b)p(b)

p(a)

5. (30 points) Below is the pseudo-code of perceptron algorithm for binary classification,

where (x

t

, yt

) is the t-th data sample: x

t

is the vector of attribute values (real numbers)

and y

t = ±1 is the class label for the t-th sample:

1. w = w0.

2. Do Iterate until convergence

3. For each sample (x

t

, yt

), t = 1, 2, · · ·

4. If y

t

⟨w, x

t

⟩ ≤ 0

5. w = w + y

tx

t

Here “convergence” means w does not change at all over one pass through the entire

training dataset in the loop starting in step 3. A note on notation: x

t denotes the t-th

sample in the training data, which is found in the t-th row of the matrix X. This is

the notation used in the textbook. The transpose of a vector or matrix M is denoted

MT with an upper case T.

(a) Implement the perceptron algorithm (MyPerceptron.py) and test it on the data

provided on the class web site. X ∈ R

N×2

is the feature matrix of N samples in 2

dimensions and y ∈ R

N×1

is the label vector (±1). Use initial value w0 = [1; −1]T

.

Now, run your perceptron algorithm on the given data. How many iterations does

it take to converge? What is the error rate of the resulting fit (i.e., how many

points are misclassified by this classifier)? The prediction on a single data point

x

t

can be computed by comparing ⟨w, x

t

⟩ with the threshold 0 (e.g., ⟨w, x

t

⟩ >= 0

then the predicted label is +1).

(b) Visualize all the samples (use 2 different colors for the 2 different classes), and

plot the decision boundary defined by the initial w0 (before training) and w

returned by the perceptron program (after training). The code for plotting

is included in hw0.py, you can use it to verify your implementation

of the perceptron algorithm. It will save the plots under the name

“initial.png” and “perceptron.png”. Note that you do not need to

modify the file.

2

− −

−

−

−

−

− −

−

−

−

−

before iteration after convergence

Submission

• Things to submit:

1. hw0 sol.pdf: a document containing all your answers of Problem 1-4 and the two

plots asked by Problem 5.

2. MyPerceptron.py: a text file containing the python function for Problem 5 with

header def MyPerceptron(X, y, w0), where X is the feature matrix, y is a label

vector (±1) and w0 is the initial value for the parameter vector w. Use the skeleton

file MyPerceptron.py found with the data on the class web site, and fill in the

missing parts. The output of the function should be a tuple consisting of the final

weight vector w, the number of iterations k, and the fraction of training samples

that are misclassified. Include the number of iterations, error rate, and the plots

of decision boundary on your PDF submission.

• Submit: All material must be submitted electronically via Gradescope. Note that

There are two entries for the assignment, i.e., Hw0-Written (for hw0 sol.pdf)

and Hw0-Programming (for a zipped file containing the Python code),

please submit your files accordingly. We will grade the assignment with vanilla

Python, and code submitted as iPython notebooks will not be graded. This homework

will not be graded but required as a proof of satisfying the prerequisites for taking the

clas