## Description

Background (least squares regression):

Least squares regression is a popular method for finding the line of best fit. Although I wanted to go over how to do it in class, we don’t have time. I’ll do my best to explain it through the words and examples on this paper :(

The goal is to calculate the slope (m) and y-intercept (b) in the equation of the line:

$$y = mx + b$$

The steps to compute the line of best fit for $N$ ordered pairs:

1. For each point $(x, y)$, calculate $x^2$ and $xy$.

2. Find $\sum x$, $\sum y$, $\sum x^2$, and $\sum xy$.

3. Calculate the slope (N is the number of ordered pairs):

$$m = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - \left(\sum x\right)^2}$$

$$m = \frac{(20)(16718.5006) - (505.748847)(507.922204)}{20(16655.073) - 505.748847^2} = 1.00219$$

4. Calculate the y-intercept:

$$b = \frac{\sum y - m \sum x}{N}$$

$$b = \frac{507.922204 - 1.00219(505.748847)}{20} = 0.05327206$$

5. Make our equation $y = mx + b$:

$$y = 1.00219x + 0.05327206$$

The graph is shown below (I used Excel, not Python). It shows the line of best fit along with the points that we used to find it.
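The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the required least_sq implementation: the helper names `sums_from_points` and `line_from_sums` are hypothetical, the points are passed in directly instead of being read from a csv file, and rounding is left to the caller. The last call plugs in the sums from the worked example (N = 20).

```python
def sums_from_points(points):
    """Steps 1-2: return (N, sum x, sum y, sum x^2, sum xy) for (x, y) pairs."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    return n, sum_x, sum_y, sum_x2, sum_xy


def line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Steps 3-4: slope and y-intercept from the sums (unrounded)."""
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
toy_slope, toy_intercept = line_from_sums(
    *sums_from_points([(0, 1), (1, 3), (2, 5)])
)

# Sums copied from the worked example above.
doc_slope, doc_intercept = line_from_sums(
    20, 505.748847, 507.922204, 16655.073, 16718.5006
)
print(round(doc_slope, 4), round(doc_intercept, 4))
```

Keeping the sums and the final formulas in separate helpers mirrors the numbered steps, and nothing is rounded until the values are printed.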

So, now that you’ve seen the algebraic method, let’s see the linear algebra method!

The setup is based on this matrix equation:

$$y = X \begin{bmatrix} m \\ b \end{bmatrix}$$

$y$ is an $n \times 1$ matrix of y-coordinates.

$X$ is an $n \times 2$ matrix whose first column holds the x-coordinates. The second column is all 1’s, so that the matrix multiplication adds the y-intercept $b$ to every row.

To find the slope (m) and the y-intercept (b), use…

$$\begin{bmatrix} m \\ b \end{bmatrix} = (X^T X)^{-1} X^T y$$

Let’s use the same points as last time to find the best fit line with this method.

Note: X (not x) is a matrix and it looks like this:

The first column has all of the x’s (like in the previous example). The second column is full of

1’s. This is for the y-intercept.

The y is the same as in the last example.

The calculations are as follows:

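$$X^T X = \begin{bmatrix} 16655 & 505.749 \\ 505.75 & 20 \end{bmatrix}$$

$$(X^T X)^{-1} = \begin{bmatrix} 0.00025867 & -0.006541 \\ -0.006541 & 0.21540568 \end{bmatrix}$$

$$X^T y = \begin{bmatrix} 16718.5006 \\ 507.922204 \end{bmatrix}$$

$$\begin{bmatrix} m \\ b \end{bmatrix} = (X^T X)^{-1} X^T y = \begin{bmatrix} 1.0022 \\ 0.0533 \end{bmatrix}$$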

And we get the same results! :)
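A minimal NumPy sketch of the matrix method, assuming the points are already in arrays. The csv reading and final rounding required by mat_least_sq are left out, and the name `normal_equation_fit` is hypothetical. It uses only np.column_stack, np.ones, .T, and np.linalg.inv, then repeats the last step with $X^T X$ and $X^T y$ rebuilt from the unrounded sums in the algebraic example.

```python
import numpy as np


def normal_equation_fit(x, y):
    """Return [m, b] computed as (X^T X)^{-1} X^T y, unrounded."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First column: x-coordinates. Second column: ones for the y-intercept.
    X = np.column_stack((x, np.ones(len(x))))
    return np.linalg.inv(X.T @ X) @ X.T @ y


# Hypothetical points that lie exactly on y = 2x + 1.
toy_m, toy_b = normal_equation_fit([0, 1, 2], [1, 3, 5])

# X^T X and X^T y built from the sums in the algebraic example.
XtX = np.array([[16655.073, 505.748847],
                [505.748847, 20.0]])
Xty = np.array([16718.5006, 507.922204])
doc_m, doc_b = np.linalg.inv(XtX) @ Xty
print(round(float(doc_m), 4), round(float(doc_b), 4))
```

The entries of $X^T X$ are exactly the sums from the first method ($\sum x^2$, $\sum x$, and $N$), which is why both approaches give the same line.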

Task:

1. Take a close look at the lin_reg.py file. There are four empty functions: least_sq(file_name), mat_least_sq(file_name), predict(file_name, x), and plot_reg(file_name, using_matrix). Read through all of their descriptions carefully. Remember, you will lose points if you do not follow the instructions. We are using a grading script.

Summary of function tasks

least_sq(file_name):

Given the csv file_name, find the slope and y-intercept of the data using algebraic least squares (the first linear regression presented). You need to return the slope and y-intercept IN THAT ORDER. Round the slope and y-intercept to four decimal places.

mat_least_sq(file_name):

Given the csv file_name, find the slope and y-intercept of the data using linear algebraic least squares with matrices (the second linear regression presented). You need to return the slope and y-intercept IN THAT ORDER. Round the slope and y-intercept to four decimal places.

predict(file_name, x):

Given the csv file_name and an input value x, predict what the output would be using

the equation that is derived from mat_least_sq(). This means that you should be

calling mat_least_sq() in this function. Round the predicted output to four decimal

places before returning the value.

plot_reg(file_name, using_matrix):

Given the csv file_name and using_matrix, an indicator of which linear regression method to use, output a graph of the data points and the line of best fit.

• If using_matrix=False, then you should be plotting your results from least_sq. You should be using red for everything in the graph, with X markers for the data points.

• If using_matrix=True, then you should be plotting your results from mat_least_sq. You can use any color except the default blue and red. You can use any data point marker except the default dot and X.

plot_reg() should not return anything. Your graphs should also contain the

following:

• Labeled x axis

• Labeled y axis

• Graph Title

• Legend (see example for details)

Some important notes:

• For consistency’s sake, do not round until the very end: you should not round anything until you return your answers.

• Hint: to plot the best fit line, find the smallest and largest x-coordinates. Plug these x-coordinates into the linear equation and plot the resulting points.

• If you want to create extra functions/methods to assist you, feel free to do so. However, we will only be testing the four functions that are originally in the file.

• If you use any library’s linear regression or least squares method function, you will

get an automatic zero. You must implement this on your own!

2. Your job is to implement all four of these functions so that they pass all test cases. We provide one csv file for you to test on (data.csv), but we will be using other data sets and csv files to check if your work is correct.

3. By running the test case provided (data.csv), you should get the following

results:

Note: your “matrix using least squares” graph may have different colors and

markers from mine.

In NO CASE should your graphs have the dot marker or the blue color shown

above!

4. If you feel confident in your program so far, run your program after changing the test case’s csv_file from “data.csv” to “data2.csv”.

5. Take screenshots of the two graphs you obtain (one from using algebraic least squares and the other from matrix least squares). Put these two screenshots in a pdf or word file. You will be submitting this with your py and txt files.

6. After completing these functions, comment out the test cases (or delete them), or else the grading script will pick them up and mark your program as incorrect. Ensure that you have commented out or deleted ALL print statements. You risk losing points if your file prints anything.

7. Convert your lin_reg.py file to a .txt file. Submit your lin_reg.py file, your .txt file, AND YOUR PDF on BeachBoard. Do NOT submit them in a compressed folder. IN TOTAL, YOU SHOULD BE SUBMITTING THREE FILES!
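The plotting requirements from step 1 can be sketched as follows for the using_matrix=False case. This is an illustration under assumptions, not the required plot_reg: the function name `sketch_algebraic_plot`, the label, title, and legend strings, and the sample points are all hypothetical, and the slope and intercept are passed in rather than obtained by calling least_sq. Unlike plot_reg, it returns the Axes so the result can be inspected.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; remove this line to open a window
import matplotlib.pyplot as plt


def sketch_algebraic_plot(xs, ys, slope, intercept):
    """Red X markers for the data, a red best fit line, labels, title, legend."""
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, color="red", marker="x", label="Data points")
    # Per the hint: evaluate the line only at the smallest and largest x.
    x_ends = [min(xs), max(xs)]
    y_ends = [slope * x + intercept for x in x_ends]
    ax.plot(x_ends, y_ends, color="red", label="Line of best fit")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Algebraic least squares")
    ax.legend()
    return ax


# Hypothetical points on y = 2x + 1, with the matching slope and intercept.
ax = sketch_algebraic_plot([0, 1, 2], [1, 3, 5], 2.0, 1.0)
```

Two endpoints are enough because the fit is a straight line. For the using_matrix=True case, the color and marker arguments would change to something other than red, the default blue, the dot, and X.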

Some helpful functions

| Function name | What it does |
| --- | --- |
| round(x, y) | Rounds the value x to y decimal places. Example: round(1.23456, 3) => 1.235 |
| matrix_name.T | Transposes the matrix |
| np.ones(num) | Creates a vector full of ones. There will be num ones. Example: np.ones(3) => [1, 1, 1] |
| np.column_stack((col1, col2)) | Concatenates two 1d numpy arrays as columns to make a 2d numpy array. If x = [1, 2, 3] and b = [1, 1, 1], then np.column_stack((x, b)) => [[1, 1], [2, 1], [3, 1]] |
| np.linalg.inv(mat_name) | Finds the inverse of the matrix mat_name |
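The helpers in the table can be tried together on a tiny example. The arrays here are made up for illustration and are unrelated to data.csv.

```python
import numpy as np

x = np.array([1, 2, 3])
ones = np.ones(3)                  # array of three 1.0 values

# column_stack takes ONE argument: a tuple of the columns to join.
X = np.column_stack((x, ones))     # rows: [1, 1], [2, 1], [3, 1]
Xt = X.T                           # transpose: shape (3, 2) becomes (2, 3)

A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
A_inv = np.linalg.inv(A)           # inverse of a small diagonal matrix

rounded = round(1.23456, 3)
print(X.shape, Xt.shape, rounded)
```

Passing the two columns as separate arguments, as in np.column_stack(x, ones), raises a TypeError, so the inner parentheses matter.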

Grading rubric:

To achieve any points, your submission must have the following. Anything missing from

this list will result in an automatic zero. NO EXCEPTIONS!

• Submit everything: py file, txt file, and pdf file

• Program has no errors (infinite loops, syntax errors, logical errors, etc.) that terminate the program

Please note that if you change the function headers or if you do not return the proper

outputs according to the function requirements, you risk losing all points for those test

cases.

| Points | Requirement |
| --- | --- |
| 5 | Submission is correct. All three files are part of the submission (py file, txt file, and pdf file). All or nothing. |
| 4 | Graphs from pdf file (testing data2.csv) are correct. 2 points each. |
| 16 | Implemented least_sq correctly (four other cases not including data.csv and data2.csv) |
| 16 | Implemented mat_least_sq correctly (four other cases not including data.csv and data2.csv) |
| 8 | Implemented predict correctly (four other cases not including data.csv and data2.csv) |
| 8 | Implemented plot_reg correctly. Remember that least_sq and mat_least_sq should be called here. (four other cases not including data.csv and data2.csv) |
| 8 | Graphs have proper x-axis labels, y-axis labels, titles, and legends (1 point each) |
| 5 | Passes original test case (test cases in the Python file have been commented out too). All or nothing. |
| TOTAL: 70 | |