Description
Question 1 (12 points):
Purpose: To observe the effect of missing data on classifier performance.
Competency Level: Basic
On the Assignment 4 Moodle page, you’ll find a zip file named A4Q1.ZIP, containing 5 versions of the Iris
dataset.
• iris.csv: The original dataset
• iris01.csv: The original dataset, with 1% of the features deleted.
• iris05.csv: The original dataset, with 5% of the features deleted.
• iris10.csv: The original dataset, with 10% of the features deleted.
• iris20.csv: The original dataset, with 20% of the features deleted.
The deletions were made by randomly choosing a row and a column, and deleting the value stored there.
Only feature values were deleted, not class labels. If you look at the CSV file, you’ll see commas with
nothing between them.
There are two problems to address: the practical problem of getting new data into the dataset, and then
the choice of which data to put there. For the first problem, we can look to Pandas methods. Here’s a good
place to start: https://jakevdp.github.io/PythonDataScienceHandbook/03.04-missing-values.html
Your main task is to explore how different approaches to dealing with missing data affect classifier performance.
1. Remove any sample (row) with missing data.
2. Fill in any missing data with a random value. Use the range (min, max) of the feature as the range for
your random value.
3. Fill in the missing data with the mean value for the feature, regardless of class label.
4. Fill in the missing data with the mean value for the feature, given the class. For example, if the missing value appears on a row with class label L, then use the mean value of that feature, limiting the
calculation to rows having label L.
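The four approaches can be sketched with Pandas as follows. This is a minimal sketch on a toy frame; the column names here are assumptions, so match them to the actual CSV files:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for one of the iris CSVs (column names are
# illustrative, not taken from the actual files).
df = pd.DataFrame({
    "sepal_length": [5.1, np.nan, 4.7, 5.0],
    "petal_length": [1.4, 1.3, np.nan, 1.5],
    "label": ["A", "A", "B", "B"],
})
features = ["sepal_length", "petal_length"]

# 1. Remove any row with a missing feature value.
dropped = df.dropna(subset=features)

# 2. Fill with a random value drawn from the feature's (min, max) range.
rng = np.random.default_rng(0)
randomly_filled = df.copy()
for col in features:
    lo, hi = df[col].min(), df[col].max()
    mask = randomly_filled[col].isna()
    randomly_filled.loc[mask, col] = rng.uniform(lo, hi, mask.sum())

# 3. Fill with the feature mean, ignoring class labels.
mean_filled = df.copy()
mean_filled[features] = mean_filled[features].fillna(df[features].mean())

# 4. Fill with the per-class feature mean.
class_mean_filled = df.copy()
class_mean_filled[features] = (
    class_mean_filled.groupby("label")[features]
    .transform(lambda s: s.fillna(s.mean()))
)
```

The `groupby(...).transform(...)` idiom in approach 4 fills each missing value using only rows that share its class label, which is exactly the restriction described above.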
It might be fun to add to this list by building a model to predict missing data, applying Expectation-Maximization, or trying a few different classifiers, but we don’t have time for that kind of fun.
You are free to use any classifier you wish, as long as you use the same classifier for all approaches. If you
choose one we haven’t studied yet, that’s fine, but you may need to read up about how it works so you can
understand the results.
To answer this question, plot average classification accuracy (using 5-fold cross-validation, for example)
against the proportion of missing data, for each of the approaches above. Also make a comment about
the results, as if you were explaining them to someone who has not done this experiment. There’s nothing
deep here, just express in words what you discovered.
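A skeleton of the experiment might look like the following. Here the missing values are simulated from Scikit-Learn’s built-in iris data rather than read from the provided CSVs, only approach 1 (drop rows) is shown, and the decision-tree classifier is an arbitrary choice:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # draw off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Simulate the provided files by deleting feature values at random; in the
# assignment you would instead read iris01.csv etc. with pd.read_csv.
iris = load_iris(as_frame=True)
rng = np.random.default_rng(1)

mean_accuracy = {}
for prop in [0.0, 0.01, 0.05, 0.10, 0.20]:
    values = iris.data.to_numpy().copy()
    values[rng.random(values.shape) < prop] = np.nan
    X = pd.DataFrame(values, columns=iris.data.columns)
    kept = X.dropna()                      # approach 1: remove rows
    scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                             kept, iris.target[kept.index], cv=5)
    mean_accuracy[prop] = scores.mean()

# One line per approach on the same axes; repeat the loop for the others.
plt.plot(list(mean_accuracy), list(mean_accuracy.values()),
         marker="o", label="drop rows")
plt.xlabel("proportion of missing data")
plt.ylabel("mean 5-fold CV accuracy")
plt.legend()
```

Repeating the inner loop with each imputation strategy and calling `plt.plot` once per strategy puts all four curves on the same plot, as required.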
What to Hand In
• A PDF exported from a Jupyter Notebook that plots the average classification accuracy (y-axis) vs the
proportion of missing data (x-axis). All 4 approaches will be on the same plot. An explanation of your
findings is included.
Department of Computer Science
176 Thorvaldson Building
110 Science Place, Saskatoon, SK, S7N 5C9, Canada
Telephone: (306) 966-4886, Facsimile: (306) 966-4884
CMPT 423/820
Machine Learning
Evaluation
• 8 marks. You made a plot of classification accuracy vs the proportion of missing data for all 4 approaches.
• 4 marks. You commented on the plots, demonstrating understanding of the results.
Question 2 (24 points):
Purpose: To explore the use of Support Vector Machines for Classification.
Competency Level: Basic
On the Assignment 4 Moodle page, you’ll find a Jupyter Notebook named A4Q2. Complete the tasks in the
notebook! The data for the notebook is generated procedurally by Scikit-Learn functions.
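As a warm-up for the notebook, here is a minimal sketch of the three SVC kernels on a `make_circles` dataset. The parameter values are illustrative defaults, not recommendations:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: a dataset a linear boundary cannot separate.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The three kernels the notebook asks about. C trades training error
# against margin width; degree matters only for 'poly', gamma for 'rbf'.
models = {
    "linear": SVC(kernel="linear", C=1.0),
    "poly": SVC(kernel="poly", degree=3, C=1.0),
    "rbf": SVC(kernel="rbf", gamma="scale", C=1.0),
}
test_accuracy = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
                 for name, m in models.items()}
```

Comparing `test_accuracy` across kernels already hints at the story the notebook asks you to tell: only a kernel that can bend the decision boundary handles the rings well.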
What to Hand In
• A PDF exported from your Jupyter Notebook with the answers to the questions.
Evaluation
1. 6 marks. Kernels!
• (2 marks) Your description of the term *kernel* demonstrated understanding.
• (2 marks) Your description of the polynomial kernel demonstrated understanding.
• (2 marks) Your description of the Radial Basis function kernel demonstrated understanding.
2. 6 marks. Balance between error and model complexity.
• (2 marks) Your discussion on the behaviour of the linear kernel as C changes reflects an understanding of SVM.
• (2 marks) Your discussion on the behaviour of the poly kernel as C changes reflects an understanding of SVM.
• (2 marks) Your discussion on the behaviour of the rbf kernel as C changes reflects an understanding of SVM.
3. 3 marks. Explore the Polynomial kernel
• (3 marks) Your explanation of the behaviour of the polynomial kernel with different degrees reflects an understanding of SVM.
4. 3 marks. Explore the Radial Basis Function kernel
• (3 marks) Your explanation of the behaviour of the Radial Basis Function kernel with different values of gamma reflects an understanding of SVM.
5. 6 marks. Other datasets
• (3 marks) Your explanation for your choice of parameters for all three kernels reflects an understanding of the application of SVM to the circ dataset created by make_circles.
• (3 marks) Your explanation for your choice of parameters for all three kernels reflects an understanding of the application of SVM to the moon dataset created by make_moons.
Question 3 (12 points):
Purpose: To explore unsupervised clustering methods with KMeans and Gaussian Mixture Models.
Competency Level: Basic
On the Assignment 4 Moodle page, you’ll find a Jupyter Notebook named A4Q3. Complete the tasks in
the notebook! The data for the notebook is in the file a4q3.csv.
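For orientation, here is a minimal sketch of the two constructors on synthetic blobs. The real arguments should be chosen for a4q3.csv; `n_clusters=3` here just suits the toy data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for a4q3.csv; in the notebook, load the CSV instead.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# KMeans: n_clusters is the key argument; n_init restarts guard against
# bad initializations.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# GaussianMixture: n_components plays the same role; covariance_type
# controls the shape each component may take.
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(X)

km_labels = km.labels_
gmm_labels = gmm.predict(X)
```

The key conceptual difference to look for: KMeans assigns each point to exactly one cluster, while `gmm.predict_proba(X)` gives soft, probabilistic memberships.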
What to Hand In
• A PDF exported from your Jupyter Notebook with tasks completed and answers to the questions.
Evaluation
• 3 marks. Step 3.
– You called the KMeans constructor with appropriate arguments and options.
• 3 marks. Step 4.
– You called the GMM constructor with appropriate arguments and options.
• 3 marks. Answer to question 1
• 3 marks. Answer to question 2
Question 4 (12 points):
Purpose: To explore the use of Principal Components Analysis for dimensionality reduction.
Competency Level: Basic
On the Assignment 4 Moodle page, you’ll find a Jupyter Notebook named A4Q4. Complete the tasks in
the notebook! The data for the notebook is in the file a4q4.csv.
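The workflow might be sketched as follows, with `load_wine` standing in for a4q4.csv and logistic regression as an arbitrary classifier choice:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; in the notebook you would load a4q4.csv instead.
X, y = load_wine(return_X_y=True)

# Baseline: a classifier on all features.
full = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
full_acc = cross_val_score(full, X, y, cv=5).mean()

# Reduced: keep only the first two principal components. Fitting PCA
# inside the pipeline keeps each cross-validation fold honest.
reduced = make_pipeline(StandardScaler(), PCA(n_components=2),
                        LogisticRegression(max_iter=1000))
reduced_acc = cross_val_score(reduced, X, y, cv=5).mean()
```

For the visualization step, fitting `PCA(n_components=2)` separately and passing its transform to `plt.scatter`, coloured by class label, shows how much class structure survives in two dimensions.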
What to Hand In
• A PDF exported from your Jupyter Notebook with tasks completed and answers to the questions.
Evaluation
• 3 marks: You built a classifier using the whole dataset, and reported an accuracy value.
• 3 marks: You applied PCA, and visualized the first two principal components.
• 3 marks: You built a classifier using the first two principal components, and reported an accuracy value.
• 3 marks: You reported on the difference in accuracy between your two classifiers.
Question 5 (10 points):
Purpose: To clarify the HMM equations on a very small example. Don’t refer to the HMM formulae in your
notes for this. Just use the notation and probability basics you learned in A3. We’ll see how to derive
those formulae, so that they are less abstract.
Competency Level: Extended
The following diagram represents a simple, 3-step HMM represented as a 6-node Bayesian Network.
The Z_i are the state variables; the X_i are the observable “evidence.” The state transition model is the CPD P(Z_i | Z_{i-1}). The sensor model, or emission probabilities, is the CPD P(X_i | Z_i). The initial state is described by the CPD P(Z_0). You don’t have numeric probabilities, so just do the algebra, as in the previous question.
Hint: use the notions of relevance and independence developed for Bayesian networks.
(a) Show that

    P(Z_0 | X_0) = \frac{P(X_0 | Z_0) P(Z_0)}{P(X_0)}.

Note: This is a simple form of the state estimation problem: given one observation, predict state Z_0.

(b) Show that

    P(X_0) = \sum_{Z_0} P(X_0 | Z_0) P(Z_0).

Note: This is the denominator of the previous derivation. Not particularly interesting, because it is a
recapitulation of the numerator. See it there?
(c) Show that

    P(Z_1 | X_1, X_0) = \frac{P(X_1 | Z_1) P(X_0) \sum_{Z_0} P(Z_1 | Z_0) P(Z_0 | X_0)}{P(X_1, X_0)}.

Note: This is a simple form of the state estimation problem. Given two observations, predict the state
Z_1. Especially, note the factor P(Z_0 | X_0) within the formula. We already calculated that, above!
(d) Show that

    P(X_0, X_1) = P(X_0) \sum_{Z_1} P(X_1 | Z_1) \sum_{Z_0} P(Z_1 | Z_0) P(Z_0 | X_0).

Note: This is the denominator of the previous derivation. Not particularly interesting, because it is a
recapitulation of the numerator.

Clarification: Notice that the factor P(X_0) appears in the numerator (part (c)) and in the denominator
(part (d)). This factor is constant relative to the query variable Z_1 of part (c). Mathematically it cancels (assuming that we’re not using impossible observations as evidence). As a result, we might give the
formula for part (c) as follows:

    P(Z_1 | X_1, X_0) = \alpha P(X_1 | Z_1) \sum_{Z_0} P(Z_1 | Z_0) P(Z_0 | X_0).
(e) Show that

    P(Z_2 | X_2, X_1, X_0) = \frac{P(X_2 | Z_2) P(X_1, X_0) \sum_{Z_1} P(Z_2 | Z_1) P(Z_1 | X_1, X_0)}{P(X_2, X_1, X_0)}.

Note: You’ve done this 3 times now, deriving basically the same formula. This is the forward phase
of the forward-backward algorithm for HMMs, expressed recursively: P(Z_2 | X_2, X_1, X_0) is expressed in
terms of P(Z_1 | X_1, X_0). The steps (a) through (e) reveal how the forward algorithm for HMMs works.

Clarification: The factor P(X_1, X_0) in the numerator is constant relative to the query variable Z_2. Because the denominator is derived from the expression in the numerator, it will appear there too. Mathematically it cancels (assuming that we’re not using impossible observations as evidence). As a result,
we might rewrite the formula for part (e) as follows:

    P(Z_2 | X_2, X_1, X_0) = \alpha P(X_2 | Z_2) \sum_{Z_1} P(Z_2 | Z_1) P(Z_1 | X_1, X_0),

where \alpha represents the quantity P(X_1, X_0) / P(X_2, X_1, X_0). Its sole purpose is to ensure that the numerator is normalized, and we don’t even need a formula to calculate it. When we have the numerator calculated
for every possible value of Z_2, we add up all those values, and divide each numerator by that sum.
Hence, \alpha is called a normalization factor, and usually no formula for it is given.
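The recursion in parts (a) through (e) is exactly the forward algorithm, and it is easy to check numerically. A minimal sketch with made-up CPDs (none of these numbers come from the assignment; two hidden states, two observation symbols):

```python
import numpy as np

# Illustrative CPDs for a tiny two-state HMM.
prior = np.array([0.6, 0.4])                 # P(Z0)
trans = np.array([[0.7, 0.3],                # trans[i, j] = P(Z_{t+1}=j | Z_t=i)
                  [0.2, 0.8]])
emit = np.array([[0.9, 0.1],                 # emit[i, k] = P(X=k | Z=i)
                 [0.3, 0.7]])

def forward_step(belief, obs):
    """P(Z_t | X_0..X_t)  ->  P(Z_{t+1} | X_0..X_{t+1})."""
    predicted = trans.T @ belief             # sum_{Z_t} P(Z_{t+1}|Z_t) P(Z_t|X_0..X_t)
    unnormalized = emit[:, obs] * predicted  # multiply in the sensor model
    return unnormalized / unnormalized.sum() # the normalization factor alpha

# Part (a): P(Z0 | X0 = 0), then the recursion of parts (c) and (e).
belief = emit[:, 0] * prior
belief = belief / belief.sum()
for obs in (0, 1):                           # observations X1 = 0, X2 = 1
    belief = forward_step(obs=obs, belief=belief)
```

Notice that `forward_step` never computes the denominators from parts (b) and (d) explicitly; dividing by the sum over states plays the role of α, exactly as described above.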
What to Hand In
• Five calculations. Add these answers to a document for questions 5-6, named A4.PDF. You can use
Jupyter notebooks or, if you prefer, a LaTeX document. If necessary, you can submit a scanned document.
Evaluation
• 2 marks each.
Question 6 (6 points):
Purpose: To clarify the HMM equations on a very small example. Just use the notation and probability
basics you’ve been practicing so far.
Competency Level: Advanced
The following diagram represents a simple, 3-step HMM represented as a 6-node Bayesian Network, the
same one as before.
The Z_i are the state variables; the X_i are the observable “evidence.” The state transition model is the CPD P(Z_i | Z_{i-1}). The sensor model, or emission probabilities, is the CPD P(X_i | Z_i). The initial state is described by the CPD P(Z_0). You don’t have any numeric probabilities, so just work on the algebra, as in the previous
question.
(a) Show that

    P(X_2 | Z_1) = \sum_{Z_2} P(X_2 | Z_2) P(Z_2 | Z_1).

Hint: Use conditional independence given Z_2. Don’t sum over all nuisance variables!

(b) Show that

    P(X_2, X_1 | Z_0) = \sum_{Z_1} P(X_2 | Z_1) P(X_1 | Z_1) P(Z_1 | Z_0).

Hint: Use conditional independence given Z_1.
Note: This might be the first tricky derivation so far.
(c) Show that

    P(Z_1 | X_0, X_1, X_2) = \frac{P(X_2 | Z_1) P(X_1, X_0) P(Z_1 | X_1, X_0)}{P(X_0, X_1, X_2)}.

Hint: Use conditional independence given Z_1.
Note: The factor P(X_2 | Z_1) was derived directly above, and the factor P(Z_1 | X_1, X_0) earlier in the assignment. This formula tells us how all the data affects the state at a single time point. Observable
evidence X_2 affects Z_1 from one direction using the backward algorithm; observable evidence X_0, X_1
affects Z_1 from the other direction using the forward algorithm. To see this more clearly, we have to
use a larger HMM, but the derivations are essentially the same. See the next question.

Clarification: As before, the factor P(X_1, X_0) in the numerator is constant relative to the query variable
Z_1. As a result, we might rewrite the formula for part (c) as follows:

    P(Z_1 | X_0, X_1, X_2) = \alpha P(X_2 | Z_1) P(Z_1 | X_1, X_0).
What to Hand In
• Three small calculations. Add these answers to a document for questions 5-6, named A4.PDF. You
can use Jupyter notebooks or, if you prefer, a LaTeX document. If necessary, you can submit a scanned
document.
Evaluation
• 2 marks each.