Description
1. Consult the AdaBoost algorithm given in Bishop, Chapter 14. Suppose there are two weak learners
h1 and h2, and a set of 17 points.
a) Suppose h1 makes one mistake and h2 makes four mistakes on the dataset. Which learner
will AdaBoost choose in the first iteration (namely m = 1)?
b) What is α1?
c) Calculate the data weighting coefficients w_n^(2) for the following two cases: (1) the points on
which the chosen learner made a mistake, and (2) the points on which the chosen learner did
not make a mistake. [10 Points]
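For reference, here is a minimal sketch of the quantities from Bishop's AdaBoost (the weighted error ε_m, the coefficient α_m, and the weight update). The example call uses placeholder data, not the 17 points of this problem.

```python
import numpy as np

def adaboost_step(w, mistakes):
    """One AdaBoost iteration in Bishop's notation.

    w        : current data weights w_n^(m), shape (N,)
    mistakes : boolean array, True where the chosen weak learner errs
    Returns (epsilon_m, alpha_m, w_n^(m+1)).
    """
    eps = np.sum(w[mistakes]) / np.sum(w)   # weighted error of the chosen learner
    alpha = np.log((1 - eps) / eps)         # learner coefficient
    w_new = w * np.exp(alpha * mistakes)    # only misclassified points are up-weighted
    return eps, alpha, w_new

# Placeholder example: 5 points, uniform initial weights, one mistake.
w0 = np.ones(5) / 5
eps1, alpha1, w1 = adaboost_step(w0, np.array([True, False, False, False, False]))
```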
2. The diagram shows training data for a binary concept where a heart denotes positive examples.
Also shown are three decision stumps (A, B and C) each of which consists of a linear decision
boundary. Suppose that AdaBoost chooses A as the first stump in an ensemble and it has to decide
between B and C as the next stump. Which will it choose? Explain. What will be the ε and α values
for the first iteration? [5 Points]
3. Consider clustering 1-D data with a mixture of 2 Gaussians using the EM algorithm. You are given the
1-D data points x = [1 10 20]. Suppose the output of the E step is the following matrix:
R = [ 1    0
      0.4  0.6
      0    1  ]
where entry r_ic is the probability of observation x_i belonging to cluster c (the responsibility of cluster c
for data point i). You have to compute the M step. You may state the equations for the maximum likelihood
estimates of these quantities (which you should know) without proof; you just have to apply the
equations to this data set. You may leave your answer in fractional form.
a. Write down the likelihood function you are trying to optimize.
b. After performing the M step for the mixing weights π1, π2, what are the new values?
c. After performing the M step for the means μ1, μ2, what are the new values?
[10 Points]
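For reference, a minimal sketch of the GMM M-step updates for the mixing weights and means given a responsibility matrix. The inputs in the example call are placeholders, not the data or responsibilities from this problem.

```python
import numpy as np

def m_step(x, R):
    """GMM M-step for mixing weights and means.

    x : data points, shape (N,)
    R : responsibilities, R[i, c] = p(cluster c | x_i), shape (N, K)
    Returns (pi, mu): pi[c] = N_c / N and mu[c] = responsibility-weighted mean of cluster c.
    """
    N_c = R.sum(axis=0)                        # effective number of points per cluster
    pi = N_c / len(x)                          # new mixing weights
    mu = (R * x[:, None]).sum(axis=0) / N_c    # new means
    return pi, mu

# Placeholder example with made-up data (not the values from the problem):
pi, mu = m_step(np.array([0.0, 5.0, 6.0]),
                np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]))
```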
4. In the following figure, some data points are shown which lie on an integer grid. Suppose we apply
the K-means algorithm to this data, using K = 2 and with the centers initialized at the two circled
data points. Draw the final clusters obtained after K-means converges. [5 Points]
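For reference, a minimal sketch of the K-means (Lloyd's algorithm) loop assumed here, using Euclidean distance. The points and initial centers below are made up; they are not the points shown in the figure.

```python
import numpy as np

def kmeans(X, centers, n_iter=100):
    """Plain K-means (Lloyd's algorithm) with Euclidean distance.

    X       : data points, shape (N, 2)
    centers : initial cluster centers, shape (K, 2)
    Returns (labels, centers) after convergence.
    """
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Made-up grid points, with the two "circled" initial centers taken as placeholders.
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
labels, centers = kmeans(X, X[[0, 3]].copy())
```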
5. Consider training a two-input perceptron. Give an upper bound on the number of training
examples sufficient to assure, with 90% confidence, that the learned perceptron will have true
error of at most 5%. [5 Points]
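For problems of this type, the standard VC-dimension sample-complexity bound for a consistent learner is the usual tool; a sketch follows. The ε and δ in the example call are placeholders, not the values asked about above.

```python
import math

def vc_sample_bound(vc_dim, epsilon, delta):
    """Sufficient sample size for a consistent learner (VC-based bound):
    m >= (1/epsilon) * (4*log2(2/delta) + 8*vc_dim*log2(13/epsilon))."""
    return (1.0 / epsilon) * (4 * math.log2(2 / delta)
                              + 8 * vc_dim * math.log2(13 / epsilon))

# Placeholder call; recall that an n-input perceptron has VC dimension n + 1.
m = vc_sample_bound(vc_dim=3, epsilon=0.25, delta=0.2)
```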
6. The VC dimension is always less than the size of the hypothesis space. True/False?
[5 Points]
7. Computational Learning Theory
[10 Points]
(a) Consider the class C of concepts of the form: (a ≤ x1 ≤ b) ∧ (c ≤ x2 ≤ d). Note that each
concept in this class corresponds to a rectangle in 2 dimensions. Let a, b be integers in the range
[0, 199] and c, d be integers in the range [0, 99]. Give an upper bound on the number of training
examples sufficient to assure that for any target concept c ∈ C, any consistent learner using H = C
will, with probability 0.99, output a hypothesis with error at most 0.05.
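For part (a) the hypothesis space is finite, so the usual tool is the bound m ≥ (1/ε)(ln|H| + ln(1/δ)) for a consistent learner; a sketch follows. The arguments in the example call are placeholders, not the |H|, ε, or δ of part (a).

```python
import math

def finite_h_sample_bound(h_size, epsilon, delta):
    """Sufficient sample size for a consistent learner over a finite
    hypothesis space: m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return (1.0 / epsilon) * (math.log(h_size) + math.log(1 / delta))

# Placeholder call: |H| here is made up, not the count of rectangles in part (a).
m = finite_h_sample_bound(h_size=10**6, epsilon=0.1, delta=0.05)
```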
(b) Consider the class C of concepts of the form: (a ≤ x1 ≤ b) ∧ (c ≤ x2 ≤ d) ∧ (e ≤ x3 ≤ f).
Note that each concept in this class corresponds to a hyper-rectangle in 3-d. Now suppose that a,
b, c, d, e, f take on real values instead of integers. Give an upper bound on the number of training
examples sufficient to assure that for any target concept c ∈ C, a learner will, with probability 0.95,
output a hypothesis with error at most 0.01.
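For part (b) the hypothesis space is infinite, so the VC-based bound sketched after problem 5 applies instead; recall that axis-aligned rectangles in d dimensions have VC dimension 2d. The ε and δ below are placeholders, not the values asked for in part (b).

```python
import math

# VC-based sufficient sample size, same bound as in the sketch after problem 5.
# Axis-aligned hyper-rectangles in 3-d have VC dimension 2 * 3 = 6.
vc_dim, epsilon, delta = 6, 0.2, 0.2   # epsilon and delta are placeholders
m_3d = (1.0 / epsilon) * (4 * math.log2(2 / delta) + 8 * vc_dim * math.log2(13 / epsilon))
```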