## Description

Problem 1. (10 points). Suppose AdaBoost is run on n training examples, and suppose on each

round that the weighted training error εt of the tth weak hypothesis is at most 1

2 − γ, for some

number γ > 0. Show that after T > ln n

2γ

2 rounds of AdaBoost the final combined classifier has zero

training error!

Problem 2. (10 points). Recall bagging. Starting from a training set S of size n, we created m

bootstrap training sets S1, . . . , Sm, each of size n each by sampling with replacement from S.

1. For a bootstrap sample Si

, what is the expected fraction of the training set that does not

appear at all in Si? As n → ∞, what does this fraction approach?

2. Let m > 2 ln n, and n → ∞. Show that the expected number of training examples in S that

appear in at least one Si

is more than n − 1.

Problem 3. (10 points). The tanh function is tanh(y) = (e

y − e

−y

)/(e

y + e

−y

). Consider the

function tanh(w0 + w1x1 + w2x2), with five inputs, and a scalar output.

1. Draw the computational graph of the function (you can use tanh in your computation graph).

2. What is the derivative of tanh(y) with respect to y.

3. Suppose (w0, w1, w2, x1, x2) = (−2, −3, 1, 2, 3). Compute the forward function values, and

back-propagation of gradients.

Problem 4. (30 points). Please see attached notebook for details.

1