Description
Question 1 (1 point):
Purpose: Jupyter Notebook
Degree of Difficulty: Basic
References: https://jupyter-notebook.readthedocs.io/en/stable/
Open the document A1Q1.ipynb using Jupyter Notebook, and complete the given task.
Errata
1. None so far!
What to Hand In
• A PDF document exported from Jupyter Notebook, containing the original document, with your name
and student number.
Evaluation
• 1 mark: Your version of the notebook is named A1Q1.pdf, and it contains your name and student number at the top.
Department of Computer Science
176 Thorvaldson Building
110 Science Place, Saskatoon, SK, S7N 5C9, Canada
Telephone: (306) 966-4886, Facsimile: (306) 966-4884
CMPT 423/820
Machine Learning
Question 2 (4 points):
Purpose: To demonstrate the level of Python programming needed.
Degree of Difficulty: Basic
References: https://docs.python.org/3/tutorial/
Open the document A1Q2.ipynb using Jupyter Notebook, and complete the given task.
Errata
1. None so far!
What to Hand In
• Your PDF version of this notebook named A1Q2.pdf, containing solutions to Tasks 1 and 2, and your
name and student number at the top.
Evaluation
• 2 marks: Your code cell for Task 1 uses Python (only – no imported modules) to show the given output.
• 2 marks: Your code cell for Task 2 uses Python (only – no imported modules) to show the given output.
Question 3 (10 points):
Purpose: Python and Numpy
Degree of Difficulty: Basic
References: https://numpy.org/devdocs/user/quickstart.html
Open the document A1Q3.ipynb using Jupyter Notebook, and complete the given task.
Errata
1. None so far!
What to Hand In
• Your PDF version of this notebook named A1Q3.pdf, containing solutions to Tasks 1 and 2, and your
name and student number at the top.
Be sure to include your name, NSID, student number, and course number at the top of all documents.
Evaluation
• 1 mark: Your answer to Task 1 Part 1 was correct.
• 2 marks: Your answer to Task 1 Part 2 was correct.
• 4 marks: Your code cell for Task 2 Part uses Python and Numpy to calculate the 30th Fibonacci number correctly.
• 3 marks: Your explanation for the behaviour of this script for large N is correct.
Question 4 (10 points):
Purpose: Python and Numpy
Degree of Difficulty: Basic
References: https://numpy.org/devdocs/user/quickstart.html
Open the document A1Q4.ipynb using Jupyter Notebook, and complete the given task.
Errata
1. None so far!
What to Hand In
• Your PDF version of this notebook named A1Q4.pdf, containing solutions to the 10 parts of the question, and your name and student number at the top.
Be sure to include your name, NSID, student number, and course number at the top of all documents.
Evaluation
• 10 marks: Your answers show that you have basic mastery of Numpy.
Question 5 (9 points):
Purpose: Python and MatPlotLib
Degree of Difficulty: Basic
References: https://matplotlib.org/tutorials/index.html
Open the document A1Q5.ipynb using Jupyter Notebook, and complete the given task.
Errata
1. None so far!
What to Hand In
• Your PDF version of this notebook named A1Q5.pdf, containing solutions to the 3 tasks in the question,
and your name and student number at the top.
Evaluation
• 3 marks: Your plot for task 1 shows the 3 data sets on a single set of axes. The plot has labels, a title,
and a legend.
• 3 marks: Your plot for task 2 shows the 3 data sets on individual axes arranged horizontally. The plots
have labels, and there is a title. It looks good.
• 3 marks: Your plot for task 3 shows the 3 data sets on individual axes arranged vertically. The plots
have labels, and there is a title. It looks good.
Question 6 (5 points):
Purpose: Python and Pandas
Degree of Difficulty: Basic
References: https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html
Open the document A1Q6.ipynb using Jupyter Notebook, and complete the given task.
Errata
1. None so far!
What to Hand In
• Your PDF version of this notebook named A1Q6.pdf, containing solutions to the 3 tasks in the question,
and your name and student number at the top.
Evaluation
• 1 mark. For Task 1, you used read_csv() to load a data file into the notebook.
• 1 mark. For Task 2, you used describe() to display some information about the DataFrames.
• 1 mark. For Task 3, you used cov() to display some covariance information about the dataframe.
• 1 mark. For Task 4, you used hist() to display histograms for the dataframe.
• 1 mark. For Task 5, you used scatter_matrix() to display a visualization of the dataframe.
Question 7 (12 points):
Purpose: Python and Pandas
Degree of Difficulty: One or two parts might be a little challenging.
References: https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html
Open the document A1Q7.ipynb using Jupyter Notebook, and complete the given task.
Errata
1. None so far!
What to Hand In
• Your PDF version of this notebook named A1Q7.pdf, containing solutions to the 3 tasks in the question,
and your name and student number at the top.
Evaluation
• 1 mark. For Task 1, you used read_csv() to load a data file into the notebook.
• 1 mark. For Task 2, you used density() to display density estimation plots for the dataframe.
• 3 marks. For Task 3, you used Boolean array indexing to create separate DataFrames (one for each
label value) and then used density() to display density estimation for each dataframe.
• 4 marks. For Task 4, you plotted the density estimation for each of the columns in the original DataFrame,
and you have 3 densities allowing you to compare the distribution for each label in one plot. You have
4 such plots.
• 1 mark. For Task 5, you used Seaborn’s pairplot() to display visualization of the original dataframe.
• 2 marks. For Task 6, you commented on the two kinds of plots, and what they represent. You also
made some observations about some patterns in the data.
Question 8 (5 points):
Purpose: To exercise the derivation of formulae involving probability.
Degree of Difficulty: Easy. The actual derivation is easier than the explanation.
References: Lecture Notes 04, Math with LaTeX in Jupyter Notebook
There is a notion in Bayesian statistics that yesterday’s posterior probabilities are today’s prior probabilities.
It’s an idea that suggests that learning should naturally combine data collected over time. It also reassures
us that choosing a prior can be based on data seen previously. To understand this notion we need to do
some math.
Task: Suppose we collected data X1 yesterday, and used the data to calculate P(y|X1). Yesterday, the
prior that we assumed was P(µ) = Beta(µ|a, b). Today, we collected data X2, and we wish to calculate
P(y|X1, X2).
Derive an expression for P(µ|X1, X2) in terms of yesterday’s posterior P(µ|X1). This expression shows how
yesterday’s posterior can be used as if it were a prior.
Elaboration: In the following, we’ll start with a review of the lecture material. Then we’ll think about what
happens with data X1 collected yesterday, and then more data X2 today. We could just throw away the
model based on X1, and start over with all the data. But we can be cleverer than that here.
In class we derived the following equation using Bayes’ Rule:
$$P(\mu \mid X) = \frac{P(X \mid \mu)\,P(\mu)}{P(X)}$$
This was one of the steps in determining P(y|X) for a binary event Y. In this expression, P(µ) is the prior
distribution for µ, and P(µ|X) is the posterior. We assumed that P(µ) = Beta(µ|a, b), and we learned that
$$E[\mu] = \frac{a}{a + b}$$
We also saw (skipping some of the mathematical details) that:
$$E[\mu \mid X] = \frac{m + a}{N + a + b}$$
where m and N come from the data X. The posterior P(µ|X) turns out also to be a Beta distribution over µ,
but it’s Beta(a1, b1), where a1 = m + a and b1 = N − m + b are new hyper-parameters. In effect, Beta(a1, b1)
summarizes everything we learned about µ from X.
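The hyper-parameter update described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `beta_posterior` and `beta_mean` and the example counts are hypothetical, and the sketch assumes that m is the number of positive outcomes among the N observations in X.

```python
from fractions import Fraction

def beta_posterior(a, b, m, n):
    """Return (a1, b1) for the posterior Beta(a1, b1).

    Assumes a Beta(a, b) prior and that m is the number of positive
    outcomes among n observations (an assumption of this sketch).
    """
    return a + m, b + (n - m)

def beta_mean(a, b):
    """Expected value of mu under Beta(a, b): a / (a + b)."""
    return Fraction(a, a + b)

# Hypothetical numbers: prior Beta(2, 2), 7 positives in 10 observations.
a, b = 2, 2
m, n = 7, 10
a1, b1 = beta_posterior(a, b, m, n)
print(a1, b1)             # 9 5
print(beta_mean(a1, b1))  # 9/14, the same value as (m + a) / (N + a + b)
```

Exact fractions are used instead of floats so that the posterior mean can be compared with (m + a) / (N + a + b) without rounding error.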
Suppose we collected data X1 yesterday, and used the data to calculate P(y|X1). Yesterday, the prior that
we assumed was P(µ) = Beta(µ|a, b). Today, we collected data X2, and we wish to calculate P(y|X1, X2).
In class, we were able to show, by significant hand-waving, that
$$P(y \mid X_1, X_2) = \frac{m_1 + m_2 + a}{N_1 + N_2 + a + b}$$
This is exactly what we’d get if we combined X1, X2 together. But it also shows that we can update our
uncertainty about y, by counting m2, N2 in X2, and re-using the counts m1, N1 from yesterday. Your work
in this question replaces some of the handwaving with a more formal derivation.
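The equivalence claimed above can be checked numerically. The sketch below is an illustration under stated assumptions, not the requested derivation: the helper `update` and the counts for X1 and X2 are hypothetical, and m again denotes a count of positive outcomes. It updates the prior with yesterday's counts, reuses the result as the prior for today's counts, and compares that with a single update on the pooled data.

```python
from fractions import Fraction

def update(a, b, m, n):
    """Beta(a, b) prior plus m positives in n trials gives Beta(a + m, b + n - m)."""
    return a + m, b + n - m

a, b = 2, 2      # hypothetical prior hyper-parameters
m1, n1 = 7, 10   # hypothetical counts from X1 (yesterday)
m2, n2 = 3, 8    # hypothetical counts from X2 (today)

# Sequential: yesterday's posterior serves as today's prior.
a1, b1 = update(a, b, m1, n1)
a_seq, b_seq = update(a1, b1, m2, n2)

# Batch: pool both data sets and update the original prior once.
a_all, b_all = update(a, b, m1 + m2, n1 + n2)

assert (a_seq, b_seq) == (a_all, b_all)

# The posterior mean matches (m1 + m2 + a) / (N1 + N2 + a + b).
mean_seq = Fraction(a_seq, a_seq + b_seq)
assert mean_seq == Fraction(m1 + m2 + a, n1 + n2 + a + b)
print(a_seq, b_seq, mean_seq)  # 12 10 6/11
```

A numerical check of this kind only confirms the result for particular counts; the derivation asked for in this question establishes it in general.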
Hint: There’s no calculus involved, only the basic rules of probability; start with Bayes’ Rule. One possibly
confusing idea is that P(µ|X1) doesn’t look like a prior, because it is conditional. That’s true. But in this
context, we can look at the prior more in terms of its use, than by its notational syntax.
Errata
1. None so far!
What to Hand In
• A Jupyter Notebook named A1Q8.ipynb with your derivation encoded in markdown, using LaTeX for
the math.
• A PDF document A1Q8.pdf exported from Jupyter Notebook, containing the derivation above. The
marker will primarily look at this, so make it presentable, like a report.
Be sure to include your name, NSID, student number, and course number at the top of all documents.
Evaluation
• 5 marks: Your derivation has P(µ|X1, X2) on the left-hand side, P(µ|X1) on the right-hand side, and
your derivation applied valid rules of probability.