Description
Assignment overview. This assignment reviews gradient descent regression with regularization and probability theory. It requires you to load a cleaned version of the House Sales dataset[1], fit and predict with linear models that you implement yourself, and evaluate those models. Follow the steps as indicated and complete the tasks. You are expected to figure out details of syntax by consulting Python’s help. Since the material builds from one question to the next, it will be easiest to do them in order. Please answer each question by copying, and commenting as needed on, the material that you produce in the Jupyter notebook.
Questions:
- [25 marks] House data: This assignment requires you to write a Python script for linear regression. You are not allowed to use any public Python libraries related to regression and metrics. You can compare the results of your program with sklearn, numpy, or scipy linear models, but the point of the exercise is to write the algorithm yourself.
- Load the House Sales dataset from the CSV file into a DataFrame df (hint: you can use the pandas.read_csv function). Then generate and show summary statistics using the pandas.DataFrame.describe method. Using pandas.DataFrame methods, split the dataset into the target value Y (price) and the feature matrix X (all feature columns). In addition, extract the sqft_living column into a feature vector named X_1.
- Write a function named linear_regression to implement linear regression without using public libraries related to regression. The inputs of this function should be the predictor values (X or X_1), the target values (Y), a learning rate (lr), and the number of iterations (repetition). The function must build a linear model using gradient descent and output the model (params) and the loss value per iteration (loss). Set the number of iterations to 10000, then calculate and show the mean squared error (MSE) for the models obtained from both the X and X_1 predictors (hint: you might write another function named predict to predict the values based on X or X_1 and params), and plot the learning curve (loss) for both models in one figure (hint: use a logarithmic scale). Try different learning rates and show the results.
- Visualize the best-obtained model for X_1 using a scatter plot to show price vs area and plot the linear model. Then, visualize the best-obtained model for all features (X) using a scatter plot to show the predicted vs actual target values.
- Modify the linear_regression function so that it applies Ridge regression, and name the new function linear_regression_Ridge. Then repeat tasks 1.2 and 1.3. You may use a fixed learning rate that you find appropriate, but you should try, and plot the results for, different values of the regularization penalty alpha.
- Use linear_regression_Ridge and write a function named linear_regression_Ridge_momentum in which you add a momentum term. Try different momenta and plot the learning curves with and without momentum for a fixed learning rate.
- Modify the linear_regression_Ridge_momentum function in a way that it fits the feature vector X_1 with a polynomial of order 2 and name it as Calculate the MSE, plot the learning curve and show the quadratic model on the scatter plot (price vs area).
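As a starting point for the gradient descent tasks above, here is a minimal NumPy sketch, not a reference solution. Several things in it are assumptions rather than requirements from the handout: the single helper name gd_sketch (the assignment asks for separately named functions), the penalty form MSE + alpha * ||w||^2 with an unpenalized intercept, the classical "heavy ball" form of momentum, and the synthetic data that stands in for the house dataset.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones so that params[0] acts as the intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return np.hstack([np.ones((X.shape[0], 1)), X])

def predict(X, params):
    """Return model predictions for predictors X."""
    return add_bias(X) @ params

def gd_sketch(X, Y, lr=0.1, repetition=1000, alpha=0.0, momentum=0.0):
    """Batch gradient descent on MSE + alpha * ||w||^2.

    Hypothetical helper: alpha=0 and momentum=0 give plain linear regression.
    The intercept is not penalized (an assumed convention).
    """
    Xb = add_bias(X)
    Y = np.asarray(Y, dtype=float)
    n, d = Xb.shape
    params = np.zeros(d)
    velocity = np.zeros(d)
    loss = np.empty(repetition)
    for i in range(repetition):
        residual = Xb @ params - Y
        loss[i] = np.mean(residual ** 2)          # plain MSE, penalty excluded
        grad = (2.0 / n) * (Xb.T @ residual)      # gradient of the MSE term
        grad[1:] += 2.0 * alpha * params[1:]      # gradient of the ridge term
        velocity = momentum * velocity - lr * grad
        params = params + velocity
    return params, loss

# Synthetic stand-in data (assumption): y = 3 + 2x + small noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 + 2.0 * x + 0.1 * rng.normal(size=200)

params_plain, loss_plain = gd_sketch(x, y, lr=0.1, repetition=500)
params_ridge, _ = gd_sketch(x, y, lr=0.1, repetition=500, alpha=1.0)
params_mom, loss_mom = gd_sketch(x, y, lr=0.1, repetition=500, momentum=0.9)

# A second-order polynomial fit only needs an extra x**2 column.
X_quad = np.column_stack([x, x ** 2])
params_quad, _ = gd_sketch(X_quad, y, lr=0.1, repetition=500)

print("plain:", params_plain)
print("ridge:", params_ridge)
print("momentum:", params_mom)
print("quadratic:", params_quad)
```

On real data with large raw values such as square footage, gradient descent with a fixed learning rate tends to diverge unless the features are rescaled first (for example, standardized to zero mean and unit variance) or the learning rate is made very small. The loss array returned here can be plotted on a logarithmic axis to compare learning rates, penalties, and momenta.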
- [15 marks] Working with random numbers:
- Write a program that rolls two dice (randomly selects a number between 1 and 6 inclusive for each die). Roll the dice 20 times. For each trial, add the two numbers shown on the dice and save the sum in a vector. Plot a histogram of the values you have gathered in the vector. Based on this histogram, what are your estimates for the probability of each sum?
- Repeat the experiment with 1000 rolls. What are your estimates now?
- Change your program and assume that you have one defective die and one fair die. The defective die has a 0 instead of a 2. This means that when you roll the defective die, a random number is chosen from (1, 0, 3, 4, 5, 6). Plot a histogram of the values you have gathered in the vector. Calculate the probability of getting a 7 as the sum of the numbers shown on the dice. What is the probability of getting a 3?
- [Grad student only] Try the same problem with seven dice and plot the distribution together with a Gaussian fit. Report the mean and variance as calculated directly from the data as well as the fitted parameters.
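The dice experiments above can be sketched with the standard library alone. This is an illustration under assumptions, not the expected submission: the helper names roll_sums, empirical_probs, and exact_probs are hypothetical, the seed is fixed only for reproducibility, and plotting is left out (the sums list can be passed to any histogram function).

```python
import random
from collections import Counter
from itertools import product

def roll_sums(n_trials, faces_a, faces_b, rng):
    """Roll two dice n_trials times and return the list of sums."""
    return [rng.choice(faces_a) + rng.choice(faces_b) for _ in range(n_trials)]

def empirical_probs(sums):
    """Estimate P(sum) as the relative frequency of each observed sum."""
    counts = Counter(sums)
    return {s: counts[s] / len(sums) for s in sorted(counts)}

def exact_probs(faces_a, faces_b):
    """Exact P(sum), enumerating all equally likely face pairs."""
    counts = Counter(a + b for a, b in product(faces_a, faces_b))
    total = len(faces_a) * len(faces_b)
    return {s: counts[s] / total for s in sorted(counts)}

rng = random.Random(0)            # fixed seed: an assumption for repeatability
fair = [1, 2, 3, 4, 5, 6]
defective = [1, 0, 3, 4, 5, 6]    # the die with 0 in place of 2

for n in (20, 1000):
    print(n, "rolls:", empirical_probs(roll_sums(n, fair, fair, rng)))

# The exact values give a reference against which to judge the estimates.
print("exact, fair + fair:", exact_probs(fair, fair))
print("exact, defective + fair:", exact_probs(defective, fair))
```

Comparing the 20-roll and 1000-roll estimates with the enumerated values shows how the relative frequencies settle as the number of trials grows.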
- [10 marks] Theoretical limit of Gaussian classification: Consider two classes, each with a single normally distributed attribute of variance 1. The first class is centered at m1 = 1 and the second class at m2 = 2. Calculate analytically the theoretical limit of the optimal classification accuracy. Provide your answer, with a brief outline of the calculation, as a separate PDF file.
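A numerical cross-check can help validate the analytical result for the Gaussian classification question. The sketch below assumes equal class priors, which the question does not state explicitly; under that assumption and equal variances, the optimal decision threshold is the midpoint between the two means. The function names are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_accuracy(m1, m2, sigma=1.0):
    """Optimal accuracy for two equal-variance Gaussians with equal priors.

    Each class is classified correctly when its sample falls on its own
    side of the midpoint, which happens with probability Phi(|m2 - m1| / (2 sigma)).
    """
    return normal_cdf(abs(m2 - m1) / (2.0 * sigma))

def simulated_accuracy(m1, m2, sigma=1.0, n=100_000, seed=0):
    """Monte Carlo estimate using the midpoint threshold (equal priors assumed)."""
    rng = random.Random(seed)
    lo, hi = min(m1, m2), max(m1, m2)
    threshold = 0.5 * (m1 + m2)
    correct = 0
    for _ in range(n):
        from_hi = rng.random() < 0.5              # pick a class with prior 1/2
        sample = rng.gauss(hi if from_hi else lo, sigma)
        correct += (sample > threshold) == from_hi
    return correct / n

print("analytic: ", bayes_accuracy(1.0, 2.0))
print("simulated:", simulated_accuracy(1.0, 2.0))
```

If the priors are unequal, the threshold shifts away from the midpoint and the closed form above no longer applies as written.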
[1] This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015.