## Description

Problem 1. (10 points). Design the decision tree for the tennis data using the Gini impurity measure. Compute the Gini measure for all attributes at each node, and continue until all the examples are correctly labeled by the tree.
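As a computational aid (not part of the assignment), the Gini computations for Problem 1 can be sketched in Python; the function names and data layout below are my own:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(rows, labels, attr_index):
    """Weighted Gini impurity after splitting on the attribute at attr_index."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    return sum(len(g) / n * gini(g) for g in groups.values())
```

At each node, one would compute `gini_split` for every remaining attribute and split on the attribute with the lowest weighted impurity.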

Problem 2 (20 points). Consider the training set given in Table 1. The attribute “Shirt Size Fine” is a refinement of the attribute “Shirt Size”, wherein the value “Medium” has been further categorized into two values, “Small-Medium” and “Large-Medium”. The goal of this problem is to verify the reasonably intuitive assertion that the information gain is higher for an attribute that is a refinement of another.

Note: when computing information gain, use the base-2 logarithm.
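A minimal sketch of the entropy and information-gain computations (with base-2 logarithms, as the note specifies); the function names are my own:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG = H(labels) - sum over attribute values v of (n_v / n) * H(labels | v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
```

Applying `information_gain` to the “Shirt Size” and “Shirt Size Fine” columns of Table 1 (with the label column) answers part 2.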

1. What is the entropy of the labels?

2. Compute the information gain for “Shirt Size” and “Shirt Size Fine”. Which is higher?

3. A function f is called concave if for all x, y, and any 0 ≤ λ ≤ 1,

f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y). (1)

The logarithm function is concave; you may assume this as a fact for all other parts of this assignment. For this part, you have to show that Equation (1) holds for λ = 1/2 with f the logarithm function.
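Part 3 asks for a proof, but the claim can also be spot-checked numerically. For f = log and λ = 1/2, Equation (1) says log((x + y)/2) ≥ (log x + log y)/2, which after exponentiating both sides is the AM-GM inequality (x + y)/2 ≥ √(xy):

```python
import math

# Spot-check Equation (1) for f = log and lambda = 1/2:
#   log((x + y) / 2) >= (log(x) + log(y)) / 2
# on a few positive pairs (this is a sanity check, not a proof).
for x, y in [(1.0, 4.0), (2.0, 2.0), (0.5, 9.0)]:
    lhs = math.log((x + y) / 2)
    rhs = (math.log(x) + math.log(y)) / 2
    assert lhs >= rhs
```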

4. The following inequality is called the log-sum inequality. For positive x1, x2, y1, y2,

x1 log(x1/y1) + x2 log(x2/y2) ≥ (x1 + x2) log((x1 + x2)/(y1 + y2)). (2)

Prove this using the concavity of the logarithm.
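Again as a sanity check rather than a proof, Equation (2) can be verified numerically on a few positive quadruples:

```python
import math

def log_sum_lhs(x1, x2, y1, y2):
    # Left-hand side of Equation (2).
    return x1 * math.log(x1 / y1) + x2 * math.log(x2 / y2)

def log_sum_rhs(x1, x2, y1, y2):
    # Right-hand side of Equation (2).
    return (x1 + x2) * math.log((x1 + x2) / (y1 + y2))

# Spot-check on a few positive quadruples; equality holds in the
# last case, where x1/y1 == x2/y2.
for x1, x2, y1, y2 in [(1, 2, 3, 4), (5, 1, 1, 5), (2, 2, 2, 2)]:
    assert log_sum_lhs(x1, x2, y1, y2) >= log_sum_rhs(x1, x2, y1, y2) - 1e-12
```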


| Gender | Car Type | Shirt Size | Shirt Size Fine | Label |
|--------|----------|------------|-----------------|-------|
| M | Family | Small | Small | + |
| M | Sports | Medium | Small-Medium | + |
| M | Family | Large | Large | − |
| F | Sports | Small | Small | + |
| M | Sports | Extra Large | Extra Large | + |
| F | Luxury | Small | Small | − |
| M | Family | Medium | Large-Medium | − |
| M | Sports | Extra Large | Extra Large | + |
| M | Sports | Large | Large | + |
| F | Luxury | Medium | Large-Medium | − |
| F | Sports | Medium | Large-Medium | + |
| F | Family | Small | Small | + |
| F | Luxury | Large | Large | − |
| M | Luxury | Medium | Small-Medium | − |
| F | Family | Medium | Small-Medium | + |

Table 1: Training Data

5. (∗) We will show that part 2 of this problem can be generalized as follows. Consider a training set of any size with the four features as in Table 1, again with the property that “Shirt Size Fine” is a refinement of the attribute “Shirt Size”. Show that the information gain for “Shirt Size Fine” is always at least that for “Shirt Size”. This is a heuristic justification for the fact that information gain tends to pick attributes that have more possible values.

(Hint: Suppose n_m is the number of Medium examples, and n_sm and n_lm are the numbers of Small-Medium and Large-Medium examples respectively; then n_sm + n_lm = n_m. You may also want to define terms such as n_m^+, the number of Medium examples with positive labels.) You may have to use Equation (2) carefully!

Problem 3. (10 points). Consider the training set given in Table 2. There are nine examples, each with three features. Feature 1 and Feature 2 are binary, and Feature 3 is continuous.

1. For Feature 1 and Feature 2, compute the information gain with respect to the examples.

2. To use a feature that takes continuous values, we fix a threshold and categorize the continuous feature depending on whether it is greater than the threshold or not. For example, if the threshold is fixed at 4.5, Feature 3 becomes the categorical feature {S, G, G, S, G, S, G, G, G}, where S and G mean that the value is smaller than and greater than the threshold, respectively.

For Feature 3, compute the information gain with respect to the threshold values 2.5, 3.5, 4.5, 5.5, 6.5, and 7.5. Which threshold has the highest information gain?

3. Which feature will be chosen as the root node according to the information gain criterion: Feature 1, 2, or 3? If Feature 3 is chosen, specify the threshold.


| Feature 1 | Feature 2 | Feature 3 | Label |
|-----------|-----------|-----------|-------|
| T | T | 1.0 | + |
| T | T | 6.0 | + |
| T | F | 5.0 | − |
| F | F | 4.0 | + |
| F | T | 7.0 | − |
| F | T | 3.0 | − |
| F | F | 8.0 | − |
| T | F | 7.0 | + |
| F | T | 5.0 | − |

Table 2: Training Data

4. Construct any decision tree that gives correct answers for all the training examples. You don’t need to follow the information gain or Gini criterion, and you can use different thresholds for Feature 3 in different branches of the tree.
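The thresholding scheme from part 2 can be sketched as follows; the helper names are my own, and the data is copied from Table 2 in row order:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Feature 3 values and labels from Table 2, in row order.
feature3 = [1.0, 6.0, 5.0, 4.0, 7.0, 3.0, 8.0, 7.0, 5.0]
labels = ['+', '+', '-', '+', '-', '-', '-', '+', '-']

def ig_at_threshold(t):
    """Information gain of the binary feature 'value greater than t'."""
    n = len(labels)
    sides = {'S': [], 'G': []}
    for v, y in zip(feature3, labels):
        sides['G' if v > t else 'S'].append(y)
    cond = sum(len(g) / n * entropy(g) for g in sides.values() if g)
    return entropy(labels) - cond

# Evaluate each candidate threshold from part 2, e.g.:
# for t in [2.5, 3.5, 4.5, 5.5, 6.5, 7.5]: print(t, ig_at_threshold(t))
```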

Problem 4. Decision Trees (30 points). See the attached Jupyter notebook for details.
