## Description

For this project we will apply both Logistic Regression and SVM to predict whether capacitors from a fabrication plant pass quality control (QC) based on two different tests. To train your system and determine its reliability you have a set of 118 examples. The plot of these examples is shown below, where a red x is a capacitor that failed QC and a green circle is a capacitor that passed QC.

I have already randomized the data into two data sets: a training set of 85 examples and a test set of 33 examples. Both are formatted as

• First line: m and n, tab separated

• Each line after that has two real numbers representing the results of the two tests, followed by a 1.0 if the capacitor passed QC and a 0.0 if it failed QC, all tab separated.
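The file layout described above can be read with a few lines of Python. This is a minimal sketch, not a required implementation: the function name `load_dataset` and the sample values are made up for illustration, and it assumes m is the number of data rows and n the number of features per row.

```python
import io


def load_dataset(stream):
    """Read a tab-separated data set from an open text stream.

    Assumed layout: a header line "m<TAB>n", then m rows of n feature
    values followed by the label (1.0 = passed QC, 0.0 = failed QC).
    """
    m, n = (int(tok) for tok in stream.readline().split())
    features, labels = [], []
    for _ in range(m):
        values = [float(tok) for tok in stream.readline().split()]
        features.append(values[:n])  # first n columns are the features
        labels.append(values[n])     # last column is the QC label
    return features, labels


# Made-up two-row sample (not the course data), used only to show the call.
sample = io.StringIO("2\t2\n0.05\t0.70\t1.0\n-0.75\t0.90\t0.0\n")
X, y = load_dataset(sample)
```

With a real file, the same function can be called as `load_dataset(open(filename))`, where `filename` comes from the prompt your program issues.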

Assignment: Your assignment is to use what you have learned from the class slides and homework to create (from scratch in Python, not by using a Logistic Regression library function!) a Logistic Regression and an SVM binary classifier to predict whether each capacitor in the test set will pass QC.

Logistic Regression: You are free to use any model variation and any testing or training approach we have discussed for logistic regression. In particular, since this data is not linearly separable, I assume you will want to add new features based on powers of the original two features to create a good decision boundary. w0 + w1x1 + w2x2 is not going to work!

One choice might be

w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6 + w7x7 + w8x8

where the new features x1 through x8 are products of powers of the original two features, as listed in the table "New Features From Original Features".

Note that it is easy to create a small Python program that reads in your original features, uses a nested loop to create the new features and then writes them to a file.

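    # x1, x2 and y are one example's original features and label;
    # fout1 is an output file that is already open for writing.
    thePower = 2
    for j in range(thePower + 1):
        for i in range(thePower + 1):
            # Skip i == j == 0: that term is the constant handled by w0.
            if i + j > 0:
                temp = (x1**i) * (x2**j)
                fout1.write(str(temp) + "\t")
    # After all new features are written, finish the line with the label.
    fout1.write(str(y) + "\n")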

With a few additions to the code, you can make a program to create combinations of any powers of x1 and x2!
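One way to generalize the nested loop is to wrap it in a function that takes the maximum power as a parameter. The sketch below is only an illustration: the name `polynomial_features` and its arguments are hypothetical, and it returns a list rather than writing to a file so that it is easy to check.

```python
def polynomial_features(x1, x2, max_power):
    """Return every x1**i * x2**j with 0 <= i, j <= max_power.

    The constant term (i == j == 0) is left out because w0 covers it.
    The loop order matches the nested loop above: j outer, i inner.
    """
    feats = []
    for j in range(max_power + 1):
        for i in range(max_power + 1):
            if i + j > 0:
                feats.append((x1 ** i) * (x2 ** j))
    return feats


# Example with easy-to-check inputs: x1 = 2, x2 = 3, powers up to 2.
row = polynomial_features(2.0, 3.0, 2)
# row == [2.0, 4.0, 3.0, 6.0, 12.0, 9.0, 18.0, 36.0]
```

A maximum power of p yields (p + 1)^2 - 1 features per example, so that is the value of n to write on the first line of the new data file.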

SVM: You need to use the original training and testing data files with kernel functions for SVM. You can use the SVM functions in the Scikit-learn library and don’t need to implement the algorithm from scratch. Please refer to https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html for details.
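As a rough illustration of the Scikit-learn workflow, the sketch below fits an `SVC` with an RBF kernel on a made-up toy set whose positive class sits near the origin. The data, the kernel choice, and the values of `C` and `gamma` are assumptions for demonstration only, not recommendations for the capacitor data.

```python
from sklearn.svm import SVC

# Made-up toy data (not the course files): label 1.0 near the origin,
# label 0.0 farther out, so no straight line separates the classes.
X_train = [[0.0, 0.0], [0.1, 0.1], [-0.1, 0.1], [0.1, -0.1], [-0.1, -0.1],
           [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0],
           [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y_train = [1.0] * 5 + [0.0] * 8

# kernel, C and gamma are illustrative choices, not assignment values.
clf = SVC(kernel="rbf", C=10.0, gamma=1.0)
clf.fit(X_train, y_train)

# Classify one point close to the origin and one close to a corner.
predictions = clf.predict([[0.05, 0.0], [0.9, 0.9]])
```

For the assignment, `X_train` and `y_train` would come from the original two-feature training file, and the kernel and its parameters are yours to choose and describe in your report.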

| New feature | From original features |
|-------------|------------------------|
| x1          | x1                     |
| x2          | x1^2                   |
| x3          | x2                     |
| x4          | x1 x2                  |
| x5          | x1 x2^2                |
| x6          | x2^2                   |
| x7          | x1^2 x2                |
| x8          | x1^2 x2^2              |

What to Upload to Canvas:

Logistic Regression:

1. A single py file (lastname_firstname_P3_LR.py) that prompts for a training file name, computes weights using gradient descent, prints out a plot of iterations vs. J, plots the decision boundary on the whole dataset, then prompts for a test file name and, using the computed weights, prints out final J, FP, FN, TP, TN, accuracy, precision, recall and F1 for the test set. All values should be clearly labelled.

2. Your training set file (lastname_firstname_P3Train.txt). The first line should contain the integers m and n, tab separated. Each line after that should have n real numbers representing the new feature data, followed by a 1 if the capacitor passed QC and a 0 if it failed QC, all tab separated.

3. Your test set file (lastname_firstname_P3Test.txt). The first line should contain the integers m and n, tab separated. Each line after that should have n real numbers representing the new feature data, followed by a 1 if the capacitor passed QC and a 0 if it failed QC, all tab separated.

4. A pdf file (lastname_firstname_P3_LR.pdf) that includes

• A description of your model and testing procedure, including

o Description of your model

o Initial values that you chose for your weights, learning rate, and the initial value for J.

o Final values for learning rate, your weights, how many iterations your learning algorithm went through and your final value of J on your training set.

o Include a plot of J (vertical axis) vs. number of iterations (horizontal axis).

o Include a plot of the decision boundary on the whole dataset

o Value of J on your test set.

o Your code

• A confusion matrix showing your results on your test set.

• A description of your final results that includes accuracy, precision, recall and F1 values.
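The training loop that the Logistic Regression deliverables describe (compute weights by gradient descent and track J per iteration) might look roughly like the sketch below. It is one possible arrangement under stated assumptions: batch gradient descent, the cross-entropy cost for J, zero initial weights, and a fixed learning rate. The function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prob(w, x):
    # w[0] is the bias weight w0; x holds one example's feature values.
    return sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))


def cost(w, X, y):
    # Cross-entropy cost J averaged over all examples (eps avoids log(0)).
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict_prob(w, xi)
        total += -yi * math.log(h + eps) - (1 - yi) * math.log(1 - h + eps)
    return total / len(X)


def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # Assumed starting point: all weights zero; alpha is the learning rate.
    w = [0.0] * (len(X[0]) + 1)
    m = len(X)
    history = []
    for _ in range(iterations):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_prob(w, xi) - yi
            grad[0] += err
            for k, xk in enumerate(xi, start=1):
                grad[k] += err * xk
        w = [wk - alpha * gk / m for wk, gk in zip(w, grad)]
        history.append(cost(w, X, y))  # J after this update
    return w, history


# Made-up one-feature toy data, only to show the call.
X_toy = [[-2.0], [-1.0], [1.0], [2.0]]
y_toy = [0.0, 0.0, 1.0, 1.0]
weights, J_history = gradient_descent(X_toy, y_toy, alpha=0.1, iterations=200)
```

`J_history` holds the values to plot against the iteration number. With all-zero initial weights every prediction is 0.5, so the initial J is ln 2, about 0.693, whatever the data.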

SVM:

1. A single py file (lastname_firstname_P3_SVM.py) that prompts for a training file name, plots the margin and hyperplane, then prompts for a test file name and, using the trained classifier, prints out final FP, FN, TP, TN, accuracy, precision, recall and F1 for the test set. All values should be clearly labelled.

2. A pdf file (lastname_firstname_P3_SVM.pdf) that includes

• A description of your model and testing procedure, including

o Description of your model

o Description of your kernel function

o Include a plot of the margin and hyperplane

o Your code

• A confusion matrix showing your results on your test set.

• A description of your final results that includes accuracy, precision, recall and F1 values.
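Both deliverable lists above ask for FP, FN, TP, TN, accuracy, precision, recall and F1 on the test set. The sketch below shows one way to compute them from true and predicted labels. It assumes label 1 (passed QC) is the positive class and returns 0.0 when a ratio would divide by zero. The function names and the sample labels are made up.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with label 1 (passed QC) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Made-up labels, only to show the calls.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy, precision, recall, f1 = classification_metrics(tp, fp, tn, fn)
```

For a logistic regression model, `y_pred` would typically come from thresholding the predicted probability at 0.5. For an SVM it is the output of the classifier's `predict` call.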

Note:

For undergraduates (CPSC 4430), the final accuracy of both algorithms on your test set should be higher than 70%.

For graduate students (CPSC 6430), the final accuracy of both algorithms on your test set should be higher than 85%.

Do not assume that any files are available to you besides files you turn in!

Zip your files into one zip file named lastname_firstname_P3_midterm.zip and upload it to Canvas.