CMPT 423/820 Machine Learning Assignment 2: Simple Classifiers

Question 1 (15 points):
Learning Objectives: • Practical experience building Naive Bayes Classifiers
• Practice critically evaluating the performance of classifiers.
Competency Level: Basic
The IRIS dataset (used in A1 and also in lecture) has 4 continuous features/attributes, and the class label.
We saw in class that we can get pretty good accuracy using one Gaussian Naive Bayes Classifier (GNBC)
when all 4 features/attributes are used.
In this question, we will build 4 different GNBCs, each classifier using only one of the features/attributes
to fit the model. In other words, the first classifier will use feature/attribute/column 1, the second classifier
will use feature/attribute/column 2, etc.
1. Build the four 1-feature classifiers, and calculate the accuracy of each.
2. Build the 4-feature classifier (as we saw in class), and calculate the accuracy.
3. Reproduce the density plots from A1Q7 Task 4 that show the class density for each feature, and
compare the density plots to the accuracy scores you obtained. In a few sentences, discuss how the
density plot relates to the accuracy score.
4. Compare the best 1-feature classifier to the 4-feature classifier, in terms of accuracy. Briefly discuss
your results.
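The first two tasks can be sketched as follows. This is a minimal illustration, not a model answer: the 70/30 stratified train/test split, the `random_state`, and the names `accuracies`, `X_train`, and `X_test` are assumptions, since the assignment does not prescribe how accuracy should be estimated.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X, y = iris.data, iris.target

# Assumed evaluation protocol: a stratified 70/30 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

accuracies = {}

# One Gaussian Naive Bayes classifier per feature (column).
for j, name in enumerate(iris.feature_names):
    model = GaussianNB().fit(X_train[:, [j]], y_train)
    accuracies[name] = model.score(X_test[:, [j]], y_test)

# The 4-feature classifier, fitted on all columns at once.
full_model = GaussianNB().fit(X_train, y_train)
accuracies["all 4 features"] = full_model.score(X_test, y_test)

for name, acc in accuracies.items():
    print(f"{name:>20s}: {acc:.3f}")
```

Indexing with `[:, [j]]` rather than `[:, j]` keeps the input two-dimensional, which is the shape Scikit-Learn estimators expect even for a single feature.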
Errata
1. None so far!
What to Hand In
A PDF document exported from Jupyter Notebook, containing the tasks and discussions above, with your
name and student number at the top of the document, as in Assignment 1.
• Make sure that your document is well-structured, using headings and providing discussion in Markdown cells.
• Make sure that the markers can read your document and grade it easily.
Evaluation
• You constructed the four 1-feature classifiers, and calculated their accuracies.
– 3 marks. Your Python scripting was neat and presentable. You made good use of Python comments, and Markdown cells to explain your method to a reader.
– 2 marks. You calculated the accuracies correctly, and presented them neatly.
• You discussed the relation between accuracy of each 1-feature classifier, and the graphical visualization provided by the class density for each feature.
– 4 marks. Your discussion highlighted the visual clues that might indicate differences in accuracy.
– 2 marks. Your discussion was not too long! Seriously, keep it to the point.
• You compared the best 1-feature classifier to the 4-feature classifier in terms of accuracy.
– 2 marks. Your discussion was relevant.
– 2 marks. Your discussion was not too long.
Department of Computer Science
176 Thorvaldson Building
110 Science Place, Saskatoon, SK, S7N 5C9, Canada
Telephone: (306) 966-4886, Facsimile: (306) 966-4884
CMPT 423/820
Machine Learning
Question 2 (10 points):
Learning Objectives: • Critically assess a dataset based on visualization.
Competency Level: Basic
This question is preparation for Question 3. It’s a separate question to prevent your answer for Q3 from
being too cluttered.
On the Assignment Moodle page, you’ll find a dataset named A2Q2.cvs. This dataset has 14 columns. The
first column is the class label, using the integers 1-3 as labels. The remaining columns are continuous
features.
Plot the class densities for all 13 features, similar to A1Q7 Task 4. Comment on each feature, relating the
visualization to its potential utility in a classifier (based on your experience from Q1).
Answer the following questions:
• Which, if any, of the 13 features, would you pick as the single feature in a 1-feature classifier? Briefly
explain your answer.
• Prior to building a classifier, do you think a classifier based on this data will have high accuracy? Briefly
explain your answer.
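One way to back up the visual reading of the density plots is to compute the per-class density curves and measure how much they overlap. The sketch below is only an illustration under several assumptions: it uses Scikit-Learn's bundled wine dataset as a stand-in (it also has 13 continuous features and 3 classes, but it is not claimed to be the A2Q2 file), a rule-of-thumb kernel bandwidth, and two hypothetical helpers, `class_density_curves` and `mean_pairwise_overlap`.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_wine
from sklearn.neighbors import KernelDensity

wine = load_wine()  # stand-in data, NOT the A2Q2 file
X, y = wine.data, wine.target


def class_density_curves(x, y, n_grid=200):
    """Return a grid and one Gaussian-KDE density curve per class for one feature."""
    grid = np.linspace(x.min(), x.max(), n_grid)
    # Normal-reference rule-of-thumb bandwidth; an illustrative choice only.
    bandwidth = 1.06 * x.std() * len(x) ** (-1 / 5)
    curves = {}
    for label in np.unique(y):
        kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
        kde.fit(x[y == label].reshape(-1, 1))
        # score_samples returns log-density, so exponentiate.
        curves[label] = np.exp(kde.score_samples(grid.reshape(-1, 1)))
    return grid, curves


def mean_pairwise_overlap(grid, curves):
    """Average, over class pairs, of the area under the minimum of two curves."""
    step = grid[1] - grid[0]
    overlaps = [
        np.sum(np.minimum(curves[a], curves[b])) * step
        for a, b in combinations(curves, 2)
    ]
    return float(np.mean(overlaps))


overlap = {}
for j, name in enumerate(wine.feature_names):
    grid, curves = class_density_curves(X[:, j], y)
    overlap[name] = mean_pairwise_overlap(grid, curves)

# Features listed from least to most class overlap.
for name, value in sorted(overlap.items(), key=lambda item: item[1]):
    print(f"{name:>30s}: {value:.3f}")
```

Each entry of `curves` is the array one would pass to a plotting call to draw that class's density. A low overlap score suggests the classes are well separated along that feature, which is the same visual clue the density plots are meant to reveal.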
Errata
1. None so far!
What to Hand In
A PDF document exported from Jupyter Notebook, containing 13 density plots, and brief discussion, with
your name and student number at the top of the document, as in Assignment 1.
• Make sure that your document is well-structured, using headings and providing discussion in Markdown cells.
• Make sure the answers to the questions are easy to find!
• Make sure that the markers can read your document and grade it easily.
Evaluation
• 4 marks: Your density plots are correct, and neatly presented.
• 6 marks: Your answers to the questions demonstrate you’ve assessed the features critically.
Question 3 (15 points):
Learning Objectives: • To critically compare different models on the same dataset.
Competency Level: Basic
On the Assignment Moodle page, you’ll find a dataset named A2Q3.cvs. This dataset has 14 columns. The
first column is the class label, using the integers 1-3 as labels. The remaining columns are continuous
features. Use this dataset to compare three classifiers:
1. K-Nearest Neighbours Classifier. Remember that you’ll have to choose K.
2. Naive Bayes Classifier
3. Decision Tree Classifier.
To keep things interesting, use f1 as the metric for comparison.
Discuss the performance of the three classifiers. Which, if any, would you choose as the best model for
the data? Explain your answer.
To complete this question, you’ll have to research the Scikit-Learn User Manual to use KNN and Decision
Trees.
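A comparison along these lines could be set up as sketched below. Several choices here are assumptions rather than requirements of the assignment: the bundled wine dataset stands in for the A2Q3 file, the multi-class F1 score is macro-averaged (`"f1_macro"`), the features are standardized before KNN, and the parameter grids for K and tree depth are arbitrary illustrative values.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)  # stand-in data, NOT the A2Q3 file
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# KNN is distance-based, so scale the features first; K is chosen by grid search.
knn = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    param_grid={"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 11]},
    scoring="f1_macro",
    cv=cv,
)

# The decision tree's depth is tuned the same way.
tree = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4, 5, None]},
    scoring="f1_macro",
    cv=cv,
)

models = {"KNN": knn, "Gaussian NB": GaussianNB(), "Decision tree": tree}

# Outer cross-validation gives a score for each (tuned) model.
scores = {
    name: cross_val_score(model, X, y, scoring="f1_macro", cv=cv).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name:>15s}: mean macro-F1 = {score:.3f}")
```

Wrapping the grid search inside `cross_val_score` means K and the tree depth are selected only on training folds, so the reported score is not inflated by tuning on the data used for evaluation.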
Errata
1. None so far!
What to Hand In
A PDF document exported from Jupyter Notebook, with your name and student number at the top of the
document, as in Assignment 1.
• Make sure that your document is well-structured, using headings.
• Make sure that you’ve used the Scikit-Learn models correctly.
• Document any decisions about parameter choices, etc, in Markdown cells close to your scripts.
• Address the discussion comparing classifiers in Markdown cells at the end of your document.
• Make sure the different parts of your solution to the question are easy to find!
• Make sure that the markers can read your document and grade it easily.
Evaluation
• 3 marks: You fitted the KNN classifier appropriately by choosing k, and other parameters to the model.
• 3 marks: You fitted the Decision Tree classifier appropriately by choosing appropriate parameters to
the model.
• 1 mark: You fitted the Naive Bayes classifier appropriately.
• 8 marks: Your discussion of the performance of the classifiers arrived at a well-reasoned conclusion.
Question 4 (15 points):
Learning Objectives: • To use theoretical principles to adapt software implementations of Naive Bayes
to handle mixed data.
Competency Level: Advanced
Currently, Scikit-Learn has 4 implementations of Naive Bayes. Each implementation assumes that all the
features have the same kind of feature distribution. For example, the Scikit-Learn implementation of the Gaussian Naive Bayes Classifier assumes that all features are numeric, and that the histogram of each feature, given
the class label, is more or less bell-shaped around a mean value. On the other hand, the Scikit-Learn
implementation of the Categorical Naive Bayes Classifier assumes that all features are categorical. This is
a limitation of the software, not a theoretical limitation of Naive Bayes.
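To see why the limitation is not theoretical, it helps to write out the Naive Bayes factorization with the features split into two groups. The notation below is an assumption introduced for illustration: $N$ indexes the numeric features, $C$ indexes the categorical features, and $\mu_{jy}$, $\sigma^2_{jy}$, $\theta_{kyv}$ are the per-class parameters of each feature's distribution.

```latex
% Naive Bayes assumes the features are conditionally independent given the class y,
% but places no constraint on the form of each per-feature distribution.
P(y \mid x) \;\propto\; P(y)
    \prod_{j \in N} p(x_j \mid y)
    \prod_{k \in C} P(x_k \mid y)

% Numeric features can be modelled with a Gaussian per class:
p(x_j \mid y) = \mathcal{N}\!\left(x_j ;\, \mu_{jy},\, \sigma^2_{jy}\right)

% Categorical features can be modelled with a categorical distribution per class:
P(x_k = v \mid y) = \theta_{kyv}
```

Each factor in the product may come from a different family of distributions; only the conditional independence assumption ties them together.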
In this question, you are invited to explain, or describe, how we could use these two classifiers to handle
mixed data.
This is an open-ended question. You could address it at a number of different levels of detail.
1. Informally. You can describe what you would do without going into a lot of math or Python scripting. This
could show that you have some ideas, and that the ideas are well-considered.
2. Formally. You can start with the Naive Bayes Formula, and produce a revised version that shows how
two different Naive Bayes classifiers can be combined. This would show that you were able to take
your informal ideas and derive a formula that correctly shows how it can be done.
3. Practically: You can take your informal or formal basis, and give a demonstration of the basis in terms
of the software provided by Scikit-Learn. This would show that you have substantiated your ideas
empirically.
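One possible practical demonstration is sketched below; it is a single approach under stated assumptions, not the expected answer. A toy mixed dataset is built from IRIS by keeping the two sepal features numeric and binning the two petal features into three categories each (an artificial construction for illustration). The helper `mixed_predict_log_proba` is hypothetical. It relies on the fact that each fitted model's posterior already includes the class prior once, so adding the two log-posteriors counts the prior twice and it must be subtracted once.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import CategoricalNB, GaussianNB

X, y = load_iris(return_X_y=True)

# Toy mixed dataset (an assumption for illustration):
# numeric part = sepal features, categorical part = binned petal features.
n_bins = 3
X_num = X[:, :2]
X_cat = np.column_stack([
    np.digitize(X[:, j], np.quantile(X[:, j], [1 / 3, 2 / 3]))
    for j in (2, 3)
])

# Fit one model per feature type on the same labels, so both estimate
# the same class priors from the class frequencies.
gnb = GaussianNB().fit(X_num, y)
cnb = CategoricalNB(min_categories=n_bins).fit(X_cat, y)


def mixed_predict_log_proba(X_num, X_cat):
    """Combine the two posteriors into one Naive Bayes log-posterior."""
    # log P(y | x) = log P_G(y | x_num) + log P_C(y | x_cat) - log P(y) + const
    joint = (
        gnb.predict_log_proba(X_num)
        + cnb.predict_log_proba(X_cat)
        - cnb.class_log_prior_
    )
    # Renormalize so that each row sums to 1 in probability space.
    return joint - np.logaddexp.reduce(joint, axis=1, keepdims=True)


log_proba = mixed_predict_log_proba(X_num, X_cat)
y_pred = gnb.classes_[np.argmax(log_proba, axis=1)]

# Accuracy on the training data, shown only to confirm the combination runs.
train_accuracy = float(np.mean(y_pred == y))
print(f"training accuracy of the combined model: {train_accuracy:.3f}")
```

The subtraction of `class_log_prior_` is only valid because both models were fitted on the same labels with their default, frequency-based priors. If the priors were set differently in the two models, the correction term would need to be reconsidered.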
You can decide how you want to answer this question. If you have time, or interest, you can do more work,
for more marks. If you have less time, then address it informally, for fewer marks.
Note: An informal description is valuable, even though it is worth fewer marks. Your choice here is a
compromise between time, and effort. It’s not a failure to give an informal answer.
Errata
1. None so far!
What to Hand In
A PDF document exported from Jupyter Notebook, with your name and student number at the top of the
document, as in Assignment 1.
• Make sure that your document is well-structured, using headings, LaTeX math as appropriate to your
answer, and Python demonstrations, if you get to that.
Evaluation
For this question, we will apply the following rubric.
• Zero marks: You submitted nothing.
• 10 marks: Your approach.
– You submitted an informal description of an approach.
∗ 6/10 marks: Your description is clear, and your approach will lead to a correct combination.
∗ 3/10 marks: Your description was vague, or does not lead to a correct combination.
– You submitted a formal derivation.
∗ 10/10 marks: Your formal derivation is completely correct.
∗ 7/10 marks: Your derivation didn’t get to the right answer, or made mathematical errors along
the way.
• 5 marks: You were able to demonstrate your approach in software, using a mixed dataset.