## Description

Part I – Short Answer (Show Points/Results) – 5 points each, 40 points total

1. Given the following feature vector $x = [4.4, 5.1, -3.7, 2.1, -1.9]$, what would a categorical representation of this feature vector be if we assumed discrete categories with values $x \le -2.5$ as $A$, $-2.5 < x < 2.5$ as $B$, and $x \ge 2.5$ as $C$?

2. Given a binary classification problem with classes $\{C_1, C_2\}$, draw a Confusion Matrix showing result counts $(f_{11}, f_{10}, f_{1+}, f_{0+}, \ldots)$ in terms of Predicted and Actual class. Provide calculations for Accuracy and Error Rate, highlighting False Positives and False Negatives $(FP, FN)$ as functions of these result counts.

3. For frequent itemsets $\{\{A, B\}, \{C\}\}$, show the difference between the Confidence $c$ vs. the Interest Factor (Lift) for the Association Rule $\{A, B\} \implies \{C\}$. What value does Lift take into account that Confidence does not?
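As an illustration of the discretization asked for in Question 1, a minimal Python sketch (the `categorize` helper is hypothetical, not part of the exam) maps each component of the feature vector to its category:

```python
# Hypothetical sketch of Question 1's discretization: map each component
# of x to category A (x <= -2.5), B (-2.5 < x < 2.5), or C (x >= 2.5).
x = [4.4, 5.1, -3.7, 2.1, -1.9]

def categorize(v):
    if v <= -2.5:
        return "A"
    if v < 2.5:
        return "B"
    return "C"

categories = [categorize(v) for v in x]
print(categories)  # ['C', 'C', 'A', 'B', 'B']
```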

4. Given a dataset with $n$ observations, what is the size of the training set if we choose to hold out $k$ records as a test set? If we allow for $k \to n$, what does the corresponding training set size approach?

5. With a data set containing $d = 15$ features and $N = 12{,}000$ observations, what is the dimensionality of the covariance matrix of the predictors? If we were to represent the predictors with a multivariate normal (Gaussian) distribution, how many distribution parameters would need to be estimated from the feature data?

6. Given the following point observations: $x_1 = [3, 4]$ and $x_2 = [5, 12]$, what would the length of each vector in terms of the Manhattan and Euclidean $(L_1, L_2)$ norms be defined as? Would the distance between the two points be larger under the $L_1$ or $L_2$ norm?
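The norm comparison in Question 6 can be sketched numerically; the `l1`/`l2` helpers below are illustrative, assuming the standard Manhattan and Euclidean definitions:

```python
# Sketch for Question 6: Manhattan (L1) and Euclidean (L2) norms of
# x1 = [3, 4] and x2 = [5, 12], and the distance between them under each.
import math

x1, x2 = [3, 4], [5, 12]

def l1(v):
    return sum(abs(c) for c in v)

def l2(v):
    return math.sqrt(sum(c * c for c in v))

print(l1(x1), l2(x1))  # 7 5.0  (the 3-4-5 right triangle)
print(l1(x2), l2(x2))  # 17 13.0  (the 5-12-13 right triangle)

diff = [a - b for a, b in zip(x1, x2)]
print(l1(diff), l2(diff))  # 10 vs. sqrt(68) ~ 8.246 -- L1 distance is larger
```

Since $L_1$ sums absolute component differences while $L_2$ takes the root of their squares, the $L_1$ distance is always at least as large as the $L_2$ distance.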

7. Draw the 2-way contingency table for a binary association rule $\{A\} \implies \{B\}$, containing presence/absence counts $(f_{11}, f_{10}, f_{1+}, f_{0+}, \ldots)$. Interest Factor (Lift) can be interpreted as the probability ratio $\frac{P(A, B)}{P(A)P(B)}$; show this ratio in terms of these counts.

8. For a binary association rule $\{a\} \implies \{b\}$, show that the $\phi$ coefficient for the rule's correlation measure is not invariant under null addition (unchanged with added unrelated data) in terms of changes to the relevant counts $(f_{11}, f_{10}, f_{1+}, f_{0+}, \ldots)$.
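The null-addition property in Question 8 can be demonstrated with a small sketch; the contingency counts below are illustrative (not taken from the exam), and the $\phi$ formula assumed is the standard $\phi = (N f_{11} - f_{1+} f_{+1}) / \sqrt{f_{1+} f_{+1} f_{0+} f_{+0}}$:

```python
# Sketch for Question 8: the phi coefficient from 2x2 contingency counts,
# showing it changes under null addition (extra transactions containing
# neither item inflate f00). The counts here are made-up examples.
import math

def phi(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00
    f1p, fp1 = f11 + f10, f11 + f01      # row/column margins for presence
    f0p, fp0 = f01 + f00, f10 + f00      # row/column margins for absence
    return (n * f11 - f1p * fp1) / math.sqrt(f1p * fp1 * f0p * fp0)

base = phi(60, 10, 10, 20)
nulled = phi(60, 10, 10, 520)  # 500 added null (neither-item) records
print(base, nulled)  # phi shifts, so it is not invariant under null addition
```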

Part II – Long Answer (Show Reasoning/Calculations) – 10 points each, 40 points total

1. Show the cosine similarity of the two vectors $x = [3, 4, 5]$ and $y = [5, 12, 13]$. Results can be kept in formula form in terms of the component values of $x$ and $y$ (calculation of final value not required).

2. Given a classifier with True Positives/Negatives $(TP, TN)$ and False Positives/Negatives $(FP, FN)$, what is the highest Recall value $r$ that a model can achieve? Define the Recall measure via $(TP, TN, FP, FN)$. How can one design a simple model which achieves the maximum value for Recall?
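The two computations in Part II, Questions 1 and 2 can be sketched as follows, assuming the usual definitions of cosine similarity and recall $r = TP / (TP + FN)$ (the example TP/FN counts are hypothetical):

```python
# Sketch for Part II, Questions 1-2: cosine similarity of x and y, and
# recall. A model that predicts every record positive drives FN to 0,
# so its recall reaches the maximum value of 1.
import math

x, y = [3, 4, 5], [5, 12, 13]

dot = sum(a * b for a, b in zip(x, y))          # 3*5 + 4*12 + 5*13 = 128
cos_sim = dot / (math.sqrt(sum(a * a for a in x)) *
                 math.sqrt(sum(b * b for b in y)))
print(cos_sim)  # 128 / (sqrt(50) * sqrt(338)) = 128 / 130

def recall(tp, fn):
    return tp / (tp + fn)

print(recall(tp=40, fn=0))  # 1.0 -- the "always predict positive" classifier
```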

3. Given the following transactions: $\{a, b, c\}, \{a, c\}, \{b, c\}, \{a\}, \{b\}, \{c\}$, with $minsup = 60\%$, what itemsets would be frequent? What would the support $s$ of the association rule $\{a\} \implies \{c\}$ be? What would the confidence $c$ of this rule be? Given the $minsup$ value, would this be a valid rule that is extracted via the Apriori Algorithm?

4. Given a data matrix $D$ with $d = 5$ features/columns and a total variance of 100, an analyst performs a PCA via eigenvalue decomposition, with the resulting eigenvalues as $[35, 25, 20, 15, 5]$. If the analyst wishes to reduce dimensionality with $80\%$ of variance explained, how many dimensions would the analyst be able to reduce their selection to? What would be the standard deviations $\sigma_i$ of the data for each of these selected dimensions?
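The support/confidence counting in Question 3 and the explained-variance cutoff in Question 4 can be sketched together, assuming the standard definitions (support as the fraction of transactions containing the itemset; component standard deviation as the square root of its eigenvalue):

```python
# Sketch for Part II, Questions 3-4: support and confidence over the
# listed transactions, and the number of principal components needed
# to reach 80% explained variance for the given eigenvalues.
import math

transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a"}, {"b"}, {"c"}]
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

s_rule = support({"a", "c"})        # support of {a} => {c}
c_rule = s_rule / support({"a"})    # confidence of {a} => {c}
print(s_rule, c_rule)               # 2/6 and 2/3

eigenvalues = [35, 25, 20, 15, 5]   # total variance = 100
target = 0.80 * sum(eigenvalues)
cum, k = 0, 0
for lam in eigenvalues:
    cum += lam
    k += 1
    if cum >= target:
        break
print(k, [math.sqrt(l) for l in eigenvalues[:k]])  # 3 components; sigma_i = sqrt(lambda_i)
```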

Part III – Essay Question (Show Argument/Proof) – 20 points each, 20 points total

1. Given a decision tree node containing 10 records, half of which belong to Class $C_A$ and the other half of which belong to Class $C_B$, show the impurity $I$ of the node under the Entropy, Gini, and Misclassification Error measures. What would the value of these measures be for the child nodes, assuming an optimal split is performed? (Hint: Assume $0 \log_2 0 = 0$.)
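The three impurity measures in the essay question can be sketched for a 50/50 node, using the hinted convention $0 \log_2 0 = 0$ (the helper names are illustrative):

```python
# Sketch for Part III, Question 1: impurity of a node with class
# probabilities p = [0.5, 0.5] under Entropy, Gini, and Misclassification
# Error; terms with pi = 0 are skipped, matching 0*log2(0) = 0.
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    return 1 - sum(pi * pi for pi in p)

def misclass(p):
    return 1 - max(p)

parent = [0.5, 0.5]
print(entropy(parent), gini(parent), misclass(parent))  # 1.0 0.5 0.5

pure_child = [1.0, 0.0]  # an optimal split yields pure children
print(entropy(pure_child), gini(pure_child), misclass(pure_child))  # 0.0 0.0 0.0
```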

Lucky 7 – Bonus Questions (Industry News, AI/ML Topics) – 1 point each, 7 points total

1. What model recently released by DeepMind allows for accurate prediction of 3-

dimensional shape of a protein molecule given input amino acids?

2. Which firm recently fired its head of AI ethics, shortly after the controversial

departure of one of its senior researchers?

3. What family of algorithms was recently developed that is able to solve classic treasure-hunting video games such as Pitfall on the Atari?

4. What disease was IBM able to predict the onset of based on changes in writing/

language via the use of machine learning models?

5. What category of modified videos did a consortium led by Facebook/Microsoft/

Cornell/MIT recently introduce a detection challenge for?

6. Which firm recently released a new image recognition algorithm that was trained on

over 1 billion images, but did not require manual labels?

7. What quantum computing goal was recently achieved by Google which was revealed to

the public via NASA?