## Description

Problem 1

Read Shannon’s 1948 paper ‘A Mathematical Theory of Communication’. Focus on pages 1-19 (up to Part II); the remaining part is more relevant to communication.

http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

Briefly summarize what you learned (e.g., about half a page).

Problem 2: Scraping, Entropy and ICML papers.

ICML is a top research conference in machine learning. Scrape the PDFs of all ICML 2017 papers from http://proceedings.mlr.press/v70/.
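Here is a minimal standard-library sketch of the scraping step. It rests on two assumptions about the page layout, neither verified here:

- the index page at http://proceedings.mlr.press/v70/ links each paper's PDF directly through an `<a href="...pdf">` tag;
- supplementary files carry `supp` in their file name.

`PdfLinkParser`, `extract_pdf_links` and `download_all` are hypothetical helper names. Converting the downloaded PDFs to text is a separate step, e.g. with pdfminer.six's `pdfminer.high_level.extract_text`.

```python
import time
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin

INDEX_URL = "http://proceedings.mlr.press/v70/"


class PdfLinkParser(HTMLParser):
    """Collects the absolute URL of every <a href=...> that points to a .pdf file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:  # the same PDF may be linked twice
                self.links.append(url)


def extract_pdf_links(html, base_url=INDEX_URL, skip_supplementary=True):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    links = parser.links
    if skip_supplementary:
        # Assumption: supplementary PDFs have "supp" in their file name.
        links = [u for u in links if "supp" not in u.rsplit("/", 1)[-1].lower()]
    return links


def download_all(out_dir="icml2017_pdfs", index_url=INDEX_URL, delay=0.5):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    with urllib.request.urlopen(index_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    for url in extract_pdf_links(html, index_url):
        target = out / url.rsplit("/", 1)[-1]
        if target.exists():  # skip files fetched on an earlier run
            continue
        urllib.request.urlretrieve(url, str(target))
        time.sleep(delay)  # be gentle with the server
```

Calling `download_all()` once fetches everything. Re-running it only downloads files that are not yet on disk.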

1. What are the 10 most common words in the ICML papers?

2. Let Z be a randomly selected word in a randomly selected ICML paper. Estimate the entropy of Z.

3. Synthesize a random paragraph using the marginal distribution over words.

4. (Extra credit) Synthesize a random paragraph using an n-gram model on words. Then synthesize a random paragraph using any model you want. The top five synthesized paragraphs win a bonus (+50 points for homeworks and labs).
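The four questions above can be sketched with the standard library alone. The sketch assumes each paper has already been converted to text and split into a list of lowercase words; `tokenize` below is a deliberately crude, hypothetical tokenizer.

Because Z first picks a paper uniformly and then a word uniformly within it, its distribution is the average of the per-paper word frequencies, not the pooled frequency: P(w) = (1/N) Σ_d c_d(w)/|d|. The plug-in entropy estimate is then H(Z) ≈ -Σ_w P(w) log2 P(w).

Without stop-word removal, the top-10 list will likely be dominated by words such as "the" and "of".

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Crude tokenizer: lowercase, keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(papers, k=10):
    """Question 1: the k most frequent words pooled over all papers."""
    counts = Counter()
    for words in papers:
        counts.update(words)
    return counts.most_common(k)


def word_distribution(papers):
    """P(Z = w): pick a paper uniformly, then a word uniformly inside it."""
    papers = [words for words in papers if words]  # drop empty documents
    probs = defaultdict(float)
    for words in papers:
        for w, c in Counter(words).items():
            probs[w] += c / (len(words) * len(papers))
    return dict(probs)


def entropy_bits(probs):
    """Question 2: plug-in entropy estimate, -sum P(w) log2 P(w), in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def sample_unigram(probs, n_words, rng):
    """Question 3: draw words i.i.d. from the marginal distribution."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=n_words)


def train_bigrams(papers):
    """Question 4: counts of which word follows which (an n-gram model with n = 2)."""
    followers = defaultdict(Counter)
    for words in papers:
        for prev, nxt in zip(words, words[1:]):
            followers[prev][nxt] += 1
    return followers


def sample_bigram(followers, n_words, rng):
    """Random walk on the bigram counts; jump to a random word at a dead end."""
    vocab = list(followers)
    out = [rng.choice(vocab)]
    while len(out) < n_words:
        nexts = followers.get(out[-1])
        if nexts:
            cands = list(nexts)
            out.append(rng.choices(cands, weights=[nexts[w] for w in cands])[0])
        else:
            out.append(rng.choice(vocab))
    return out


if __name__ == "__main__":
    # Toy stand-in for the extracted paper texts.
    texts = ["the model learns the policy", "the policy gradient is noisy"]
    papers = [tokenize(t) for t in texts]
    probs = word_distribution(papers)
    print(top_words(papers, k=3))
    print(f"H(Z) ~ {entropy_bits(probs):.3f} bits")
    rng = random.Random(0)
    print(" ".join(sample_unigram(probs, 12, rng)))
    print(" ".join(sample_bigram(train_bigrams(papers), 12, rng)))
```

The bigram sampler is the n = 2 case of the n-gram model. A longer context would key `followers` on tuples of the previous n-1 words. The plug-in entropy estimate is biased low when many words are rare, so treat it as a lower-leaning estimate.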

Problem 3: Getting Started on Kaggle.

1. Let's start with our first Kaggle submission in a playground regression competition. Create a Kaggle account and find https://www.kaggle.com/c/house-prices-advanced-regression-techniques/

2. Follow the data preprocessing steps from https://www.kaggle.com/apapiu/house-prices-advanced-regression-techniques/regularized-linear-models. Then run a ridge regression using α = 0.1. Submit this prediction: what RMSE do you get? (Hint: remember to exponentiate your predictions with np.expm1(ypred).)

3. Compare a ridge regression model and a lasso regression model. Optimize the alphas using cross-validation. What is the best score you can get from a single ridge regression model and from a single lasso model?

4. Plot the ℓ0 norm (number of nonzeros) of the coefficients that lasso produces as you vary the regularization strength alpha.


5. Add the outputs of your models as features and train a ridge regression on all the features plus the model outputs (this is called ensembling and stacking). Be careful not to overfit. What score can you get? (We will discuss ensembling more later in the class, but you can start playing with it now.)

6. Install XGBoost (gradient boosting) and train a gradient boosting regression model. What score can you get from a single XGB model? (You will need to optimize over its parameters.) We will discuss boosting and gradient boosting in more detail later. XGB is a great friend to all good Kagglers!

7. Do your best to get the most accurate model. Try feature engineering and stacking many models. You may use any public Python tool; non-Python tools are not allowed.

8. Read the Kaggle forums, tutorials, and kernels for this competition; this is an excellent way to learn. In your report, mention anything you found in the forums that you liked, and any post or code you contributed yourself, especially if other Kagglers liked or used it afterwards.

9. Be sure to read and learn the Kaggle rules! No sharing of code or data outside the Kaggle forums. Every student should have their own individual Kaggle account, and you may form a Kaggle submission team with your lab partner. This matters more for live competitions, of course.

10. As in the real in-class Kaggle competition (which will be next), you will be graded on your public score (include it in your report) and on the creativity of your solution. In your report (submitted as a PDF file), explain what worked and what did not. Many creative ideas will not work, but you will get partial credit for developing them. We will invite teams with interesting solutions to present them in class.
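Here is a compact scikit-learn sketch of items 2 to 5. It assumes `X_train` and `X_test` are the preprocessed feature matrices from the linked kernel and that `y_log = np.log1p(SalePrice)`, which is why predictions go through `np.expm1`. The following are illustrative choices, not values from the assignment:

- all function names;
- the shared `KFold(shuffle=True, random_state=0)` split;
- the stacker's default alpha.

The stacking step uses out-of-fold predictions from `cross_val_predict` as features. This way the stacker never sees a base model's prediction on a row that model was trained on, which is one standard way to follow the "be careful not to overfit" advice.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Shared 5-fold split (illustrative choice) so every model is scored on the same folds.
KFOLD = KFold(n_splits=5, shuffle=True, random_state=0)


def rmse_log(y_true_log, y_pred_log):
    """RMSE between two arrays that are both on the log1p(price) scale."""
    diff = np.asarray(y_true_log) - np.asarray(y_pred_log)
    return float(np.sqrt(np.mean(diff ** 2)))


def ridge_submission(X_train, y_log, X_test, alpha=0.1):
    """Item 2: ridge on log1p(price), predictions mapped back to prices with expm1."""
    model = Ridge(alpha=alpha).fit(X_train, y_log)
    return np.expm1(model.predict(X_test))


def cv_rmse(model, X, y_log, cv=KFOLD):
    """Mean cross-validated RMSE on the log scale."""
    mse = -cross_val_score(model, X, y_log, scoring="neg_mean_squared_error", cv=cv)
    return float(np.sqrt(mse).mean())


def best_alpha(make_model, alphas, X, y_log):
    """Item 3: pick the alpha with the lowest CV RMSE; make_model(alpha) builds a fresh model."""
    scores = {a: cv_rmse(make_model(a), X, y_log) for a in alphas}
    best = min(scores, key=scores.get)
    return best, scores[best]


def lasso_nonzeros(X, y_log, alphas):
    """Item 4: l0 norm (count of nonzero coefficients) of the lasso fit at each alpha."""
    counts = []
    for a in alphas:
        coef = Lasso(alpha=a, max_iter=50_000).fit(X, y_log).coef_
        counts.append(int(np.count_nonzero(coef)))
    return counts


def plot_nonzeros(alphas, counts):
    """Plot for item 4; matplotlib is imported here so the rest runs without it."""
    import matplotlib.pyplot as plt

    plt.semilogx(alphas, counts, marker="o")
    plt.xlabel("alpha")
    plt.ylabel("nonzero lasso coefficients")
    plt.show()


def stacked_predict(base_models, X_train, y_log, X_test, alpha=1.0, cv=KFOLD):
    """Item 5: ridge stacker on [features, base-model outputs]; alpha=1.0 is arbitrary, tune it."""
    # Out-of-fold predictions: each training row is predicted by a model that never saw it.
    oof = np.column_stack([cross_val_predict(m, X_train, y_log, cv=cv) for m in base_models])
    # For the test rows, refit each base model on the full training set.
    test_out = np.column_stack([m.fit(X_train, y_log).predict(X_test) for m in base_models])
    stacker = Ridge(alpha=alpha).fit(np.hstack([X_train, oof]), y_log)
    return np.expm1(stacker.predict(np.hstack([X_test, test_out])))
```

Usage examples:

- A ridge grid can be scanned with `best_alpha(lambda a: Ridge(alpha=a), [0.05, 0.1, 0.3, 1, 3, 10, 30], X_train, y_log)`.
- The same call with `Lasso(alpha=a, max_iter=50_000)` and small alphas such as `[1e-4, 5e-4, 1e-3]` does the same for lasso.
- `xgboost.XGBRegressor` follows the same fit/predict interface, so it can also be passed to `cv_rmse` or included in `base_models`.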
