CSCI 5521: Introduction to Machine Learning Homework 1 solution


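1. (30 points) Find the maximum likelihood estimate (MLE) of θ for each of the following
probability density functions. In each case, consider a random sample of size n. Show
your calculations:
(a) $f(x \mid \theta) = \frac{x}{\theta^2} \exp\left\{-\frac{x^2}{2\theta^2}\right\}$, $x \ge 0$
(b) $f(x \mid \alpha, \beta, \theta) = \alpha \theta^{-\alpha\beta} x^{\beta} \exp\left\{-\left(\frac{x}{\theta}\right)^{\beta}\right\}$, $x \ge 0$, $\alpha > 0$, $\beta > 0$, $\theta > 0$
(c) $f(x \mid \theta) = \frac{1}{\theta}$, $0 \le x \le \theta$, $\theta > 0$ (Hint: You can draw the likelihood function)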
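The hint in part (c) can be explored numerically before attempting the derivation. The following is a minimal sketch, assuming NumPy is available; the sample values, the grid of candidate θ values, and the function name `uniform_likelihood` are hypothetical and chosen only for illustration. It evaluates the likelihood of the uniform density on a grid so that its shape can be inspected or plotted.

```python
import numpy as np

def uniform_likelihood(theta, sample):
    """Likelihood of theta for f(x | theta) = 1/theta on 0 <= x <= theta."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # The density is zero for any observation outside [0, theta],
    # which makes the whole product zero.
    if theta <= 0 or sample.min() < 0 or sample.max() > theta:
        return 0.0
    return theta ** (-n)

# Hypothetical sample, not taken from the assignment.
sample = [0.8, 2.1, 1.4, 3.0, 0.3]

# Candidate values of theta on a grid with step 0.1.
thetas = np.linspace(0.5, 6.0, 56)
likelihoods = np.array([uniform_likelihood(t, sample) for t in thetas])

# Grid point with the largest likelihood.
best_theta = thetas[np.argmax(likelihoods)]
print(best_theta, max(sample))
```

Plotting `likelihoods` against `thetas` (for example with matplotlib) gives the picture the hint refers to; a grid search like this only suggests where the maximum lies and does not replace the written argument.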
2. (30 points) We want to build a pattern classifier with a continuous attribute using Bayes' Theorem. The object to be classified has one feature, x, in the range 0 ≤ x < 6. The class-conditional probability density functions are listed below:

$$P(x \mid C_1) = \begin{cases} \frac{1}{6} & \text{if } 0 \le x < 6 \\ 0 & \text{otherwise} \end{cases}$$

$$P(x \mid C_2) = \begin{cases} \frac{1}{4}(x - 1) & \text{if } 1 \le x < 3 \\ \frac{1}{4}(5 - x) & \text{if } 3 \le x < 5 \\ 0 & \text{otherwise} \end{cases}$$

[Figure: P(x|C1) and P(x|C2) plotted against x for 0 ≤ x ≤ 6]

(a) Assuming equal priors, P(C1) = P(C2) = 0.5, classify an object with the attribute value x = 2.5.
(b) Assuming unequal priors, P(C1) = 0.7, P(C2) = 0.3, classify an object with the attribute value x = 4.
(c) Consider a decision function ϕ(x) of the form ϕ(x) = |x − 3| − α with one free parameter α in the range 0 ≤ α ≤ 2. Classify a given input x as class 2 if and only if ϕ(x) < 0, or equivalently 3 − α < x < 3 + α; otherwise classify x as class 1. Assuming equal priors, P(C1) = P(C2) = 0.5, what is the optimal decision boundary, that is, which value of α minimizes the probability of misclassification? What is the resulting probability of misclassification with this optimal value of α? (Hint: take advantage of the symmetry around x = 3.)

Instructor: Catherine Qi Zhao. TA: Prithvi Raj Botcha, Shi Chen, Suzie Hoops, James Yang, Yifeng Zhang. Email: csci5521.s2022@gmail.com

3. (40 points) In this programming exercise you will implement three multivariate Gaussian classifiers, with different assumptions as follows:
• Assume S1 and S2 are learned independently (learned from the data from each class).
• Assume S1 = S2 (learned from the data from both classes).
• Assume S1 = S2 (learned from the data from both classes), and the covariance is a diagonal matrix.
What is the discriminant function in each case? Show in your report and briefly explain.

For each assumption, your program should fit two Gaussian distributions to the 2-class training data in training_data.txt to learn m1, m2, S1 and S2 (S1 and S2 refer to the same variable for the second assumption).
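The parameter estimates described above can be sketched as follows. This is a rough illustration under stated assumptions, not the interface of the provided skeleton: the function name `fit_gaussians`, its `mode` argument, and the synthetic data are hypothetical. It also assumes that the shared covariance is formed as a prior-weighted sum of the per-class covariances; other pooling choices exist, and the course may expect a different one.

```python
import numpy as np

def fit_gaussians(X, r, priors=(0.3, 0.7), mode="independent"):
    """Estimate class means and covariances for labels r in {1, 2}.

    mode: "independent", "shared", or "shared_diagonal" (hypothetical names).
    """
    if mode not in ("independent", "shared", "shared_diagonal"):
        raise ValueError(f"unknown mode: {mode}")
    X = np.asarray(X, dtype=float)
    r = np.asarray(r)
    means, covs = [], []
    for label in (1, 2):
        Xc = X[r == label]
        means.append(Xc.mean(axis=0))
        # Maximum likelihood covariance: bias=True divides by N_c, not N_c - 1.
        covs.append(np.cov(Xc, rowvar=False, bias=True))
    if mode != "independent":
        # Assumption: pool the class covariances weighted by the priors.
        pooled = priors[0] * covs[0] + priors[1] * covs[1]
        if mode == "shared_diagonal":
            # Keep only the variances; zero out the off-diagonal terms.
            pooled = np.diag(np.diag(pooled))
        covs = [pooled, pooled]
    return means, covs

# Synthetic data for illustration only (not the assignment's data files).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
r_demo = np.array([1] * 15 + [2] * 25)
means_demo, covs_demo = fit_gaussians(X_demo, r_demo, mode="shared_diagonal")
print(covs_demo[0])
```

Returning the same pooled matrix for both classes mirrors the note that S1 and S2 refer to the same variable when the covariance is shared.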
Then, use this model to classify the test data in test_data.txt by comparing log P(Ci|x) for each class Ci, with P(C1) = 0.3 and P(C2) = 0.7. Each data file contains a matrix $M \in \mathbb{R}^{N \times 9}$ with N samples: the first 8 columns hold the features (i.e., $x \in \mathbb{R}^{8}$) used to classify the samples, and the last column stores the corresponding class labels (i.e., r ∈ {1, 2}). Report the confusion matrix on the test set for each assumption. Briefly explain the results.

We have provided the skeleton code MyDiscriminant.py for implementing the classifiers. It follows the scikit-learn convention, with a fit function for model training and a predict function for generating predictions on given samples. Use the Python class GaussianDiscriminant to implement the multivariate Gaussian classifiers under the first two assumptions, and GaussianDiscriminant_Diagonal for the third. To verify your implementation, run the main script hw1.py, which automatically generates the confusion matrix for each classifier. Note that you do not need to modify this file.

Submission
• Things to submit:
1. hw1_sol.pdf: a document containing all your answers to the written questions (including those in Problem 3).
2. MyDiscriminant.py: a Python source file containing two Python classes for Problem 3, i.e., GaussianDiscriminant and GaussianDiscriminant_Diagonal. Use the skeleton file MyDiscriminant.py found with the data on the class web site, and fill in the missing parts. For each class, the fit function should take the training features and labels as inputs and update the model parameters. The predict function should take the test features as inputs and return the predictions.
• Submit: All material must be submitted electronically via Gradescope. Note that there are two entries for the assignment, Hw1-Written (for hw1_sol.pdf) and Hw1-Programming (for a zipped file containing the Python code); please submit your files accordingly.
We will grade the assignment with vanilla Python; code submitted as IPython notebooks will not be graded.
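For the classification step in Problem 3, comparing log P(Ci|x) across classes and tabulating a confusion matrix can be sketched as below. This is a minimal sketch assuming NumPy; the function names, the toy means, covariances, and test points are hypothetical and are not taken from MyDiscriminant.py or hw1.py. The score used is the general Gaussian log-discriminant with the constant term common to both classes dropped, which equals log P(Ci|x) up to an additive term that does not depend on the class.

```python
import numpy as np

def log_discriminant(X, mean, cov, prior):
    """-0.5 log|S| - 0.5 (x - m)^T S^{-1} (x - m) + log P(C) for each row of X."""
    diff = np.asarray(X, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    # Solve S z = diff^T rather than forming the explicit inverse of S.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * logdet - 0.5 * maha + np.log(prior)

def predict(X, means, covs, priors):
    """Assign each sample to the class with the larger discriminant value."""
    scores = np.column_stack(
        [log_discriminant(X, m, S, p) for m, S, p in zip(means, covs, priors)]
    )
    return np.argmax(scores, axis=1) + 1  # class labels are 1 and 2

def confusion_matrix(r_true, r_pred, labels=(1, 2)):
    """Rows index the true class, columns the predicted class."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for i, t in enumerate(labels):
        for j, p in enumerate(labels):
            cm[i, j] = np.sum((r_true == t) & (r_pred == p))
    return cm

# Toy parameters and points for illustration only.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
priors = (0.3, 0.7)
X_test = np.array([[0.0, 0.0], [3.0, 3.0], [0.2, -0.1], [2.8, 3.3]])
r_test = np.array([1, 2, 1, 2])
r_pred = predict(X_test, means, covs, priors)
print(confusion_matrix(r_test, r_pred))
```

Using `slogdet` and `solve` avoids computing a determinant and an inverse directly, which tends to be more stable when a covariance matrix is close to singular.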