CMSC 678 Project No. 1 to 4 solutions

Statistical Learning and Fuzzy Logic Algorithms – CMSC 678 Project No. 1

1) 10 points for part 1)
Create (in MATLAB) 20 two-dimensional, normally distributed data points with standard deviation 2, centered
at [0; 0], for the positive class, and 10 two-dimensional, normally distributed data points with standard deviation
2, centered at [5; 5], for the negative class. Create the data with seed = 1. Train a perceptron with
learning rate η = 0.1. Implement the perceptron update as given by Method 1 in the textbook. The initial
weight vector is w = [0 0 0]’.

a) Show data and separation boundary in the first graph. How many epochs are needed?
b) What is the final weight vector w?
c) Run experiments with various learning rates η, say [1e-4 1e-3 1e-2 1e-1 1e0 1e1 1e2 1e3
1e4]. Show in the second graph how the number of epochs depends on η.

d) Train a linear neuron in batch mode. Show its separation boundary in the first graph too.
Comment on all the results in 1).
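
The data generation and perceptron training described in part 1) can be sketched as follows. This is a minimal sketch under stated assumptions: "Method 1" from the textbook is not reproduced here, so the code assumes the common error-correction rule w = w + η(d − o)x with targets d = ±1; the cap maxEpochs is a hypothetical safeguard in case the sampled classes are not linearly separable.

```matlab
close all, format compact
rng(1);                                   % seed = 1, as the assignment asks
Xpos = 2*randn(20, 2);                    % positive class, centered at [0; 0]
Xneg = 2*randn(10, 2) + 5;                % negative class, centered at [5; 5]
X = [Xpos; Xneg];
X = [X ones(size(X, 1), 1)];              % append the constant bias input
d = [ones(20, 1); -ones(10, 1)];          % desired outputs (+1 / -1)

eta = 0.1;                                % learning rate
w = zeros(3, 1);                          % initial weight vector [0 0 0]'
maxEpochs = 1000;                         % hypothetical cap (assumption)
for epoch = 1:maxEpochs
    nErr = 0;
    for k = 1:size(X, 1)
        o = 2*(X(k, :)*w >= 0) - 1;       % hard-limiter output in {-1, +1}
        if o ~= d(k)
            w = w + eta*(d(k) - o)*X(k, :)';   % assumed error-correction update
            nErr = nErr + 1;
        end
    end
    if nErr == 0, break; end              % an error-free sweep means convergence
end
% Separation boundary: w(1)*x1 + w(2)*x2 + w(3) = 0
```

With this loop, epoch holds the number of sweeps performed, counting the final error-free sweep.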

2) 5 points for part 2)
Create an outlier at [20, 20] which belongs to the negative class.
a) Train the perceptron with η = 1. Show on the first graph from 1a) both the outlier and the perceptron’s
separation boundary.

b) Train the linear neuron in batch mode again without regularization. Show its separation
boundary too.
c) Train the linear neuron in batch mode again, now with penalty parameter λ = 1. The weight
vector must now be calculated as w = (X’X + λI)^{-1} X’Y. Show the new separation
boundary too.
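
The two batch solutions (without and with the penalty) can be compared in a few lines. This is only a sketch: the toy matrix below is a made-up stand-in for the real data, with the bias input as the last column and the outlier as the last row. The backslash operator is used instead of an explicit inverse, which is the usual MATLAB idiom for the same formula.

```matlab
% Toy inputs (hypothetical), last column is the bias input, last row the outlier
X = [0 0 1; 1 0 1; 0 1 1; 5 5 1; 6 5 1; 20 20 1];
Y = [1; 1; 1; -1; -1; -1];                          % desired outputs
lambda = 1;                                         % penalty parameter

w_ls    = (X'*X) \ (X'*Y);                          % no regularization
w_ridge = (X'*X + lambda*eye(size(X, 2))) \ (X'*Y); % w = (X'X + lambda*I)^(-1) X'Y
```

Note that, exactly as in the formula given, the penalty here also shrinks the bias weight.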

3) 10 points for part 3)
Run 10-fold cross-validation and find the best penalty parameter λbest.
Using λbest, design the linear neuron and show its (best) separation boundary. What is w now?
Comment on all the results in 2) & 3).
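
One possible shape of the 10-fold cross-validation loop is sketched below. The candidate grid lambdas and the use of the misclassification rate as the CV score are assumptions, since the assignment prescribes neither; ties are resolved in favor of the first (smallest) λ.

```matlab
rng(1);
X = [2*randn(20, 2); 2*randn(10, 2) + 5; 20 20];    % both classes plus the outlier
X = [X ones(size(X, 1), 1)];                        % bias input
Y = [ones(20, 1); -ones(11, 1)];
n = size(X, 1);

lambdas = 10.^(-3:3);                 % hypothetical candidate grid (assumption)
K = 10;
fold = mod(randperm(n), K) + 1;       % random fold label 1..K for every point
cvErr = zeros(size(lambdas));
for a = 1:numel(lambdas)
    nWrong = 0;
    for k = 1:K
        tr = (fold ~= k);  te = ~tr;  % training / validation masks
        w = (X(tr, :)'*X(tr, :) + lambdas(a)*eye(3)) \ (X(tr, :)'*Y(tr));
        nWrong = nWrong + sum(sign(X(te, :)*w) ~= Y(te));
    end
    cvErr(a) = nWrong / n;            % CV misclassification rate
end
[~, best] = min(cvErr);               % first minimum wins
lambda_best = lambdas(best);
w_best = (X'*X + lambda_best*eye(3)) \ (X'*Y);      % final design on all data
```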

Some hints:
SOFTWARE MUST BE USER FRIENDLY, so that I can run it easily too. At the top of your routine have the
commands: close all, format compact. None of the calculations should take longer than 5 seconds on my laptop.
Your report should STRICTLY be (in terms of everything, from the 2-column format down to font type and
font size) in the form of an IEEE journal (conference) paper. Use the template attached, but don’t send me your
Word file; send me the PDF file.

Submit both your written report and program to me by Email.
ZIP your report and programs into a single zip file (which will contain max 2 files: the report in PDF and
the MATLAB m routine), name it with your family name (say, lee.zip) and send it to me. The Subject field
in your Email MUST be CMSC 678, Family name, Project 1. Don’t hesitate to contact me in the case of
need. Use my office hours, Tuesday 11am-12pm (but you can always drop by for up to 7 minutes of
questions and discussion. Just knock on my door!)
FINALLY: Any copying, “copying” or use in any form of somebody else’s code or report will be
treated as cheating and treated according to the VCU Honors Code.

Statistical Learning and Fuzzy Logic Algorithms – CMSC 678 Project No. 2

Part 1) Hard margin linear SVM 5% of grade
Use dataset P2_data.mat and design the Linear Support Vector Machine i.e., the hard margin classifier.
a) What are the alpha values of support vectors? What is the bias? What is the size of margin M?
Calculate margin as M = 1/norm(w).

d) What are the values of the decision function for the test datapoints [3 4] and [6 6]?
Plot the data, separation boundary (solid blue) and both margins (dashed blue) in the input space.
Clearly show which data are support vectors.
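
The hard margin design can be sketched through the dual QP problem solved by quadprog (Optimization Toolbox). The six points below are a hypothetical toy set, not P2_data.mat, and the tiny diagonal term added to H is an assumed value for the conditioning remedy.

```matlab
X = [1 2; 0 0; 0 4; 3 2; 5 1; 4 4];     % toy inputs (hypothetical, not P2_data.mat)
Y = [-1; -1; -1; 1; 1; 1];              % class labels
l = size(X, 1);                         % number of training datapoints

H = (Y*Y') .* (X*X');                   % Hessian of the dual problem
H = H + 1e-7*eye(l);                    % tiny diagonal term (assumed value)
f = -ones(l, 1);
% Dual: min 0.5*a'*H*a + f'*a  subject to  Y'*a = 0,  a >= 0 (no upper bound)
alpha = quadprog(H, f, [], [], Y', 0, zeros(l, 1), []);

sv = find(alpha > 1e-5);                % indices of the support vectors
w  = X' * (alpha .* Y);                 % weight vector
b  = mean(Y(sv) - X(sv, :)*w);          % bias averaged over the support vectors
M  = 1 / norm(w);                       % margin, as defined in the assignment
dTest = [3 4; 6 6]*w + b;               % decision function at the two test points
```

For this toy set the support vectors are the two points closest to the boundary, [1 2] and [3 2].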

Part 2) Multiclass soft classification 20% of grade
For the given dataset glass, design the 1 vs All classifier by using both a polynomial kernel (you will write
the code for computing the kernel matrix for the polynomial kernel) and a Gaussian one (here, use my present to
you, the code grbf_fast.m). What is the accuracy of each classifier?
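
A polynomial kernel matrix can be computed in one line. The sketch below assumes the inhomogeneous form k(x, z) = (x’z + 1)^d; the course handouts may define the kernel differently (for example without the +1), and the helper name polyKernel is hypothetical.

```matlab
% Hypothetical helper: kernel matrix between the rows of A and the rows of B
polyKernel = @(A, B, d) (A*B' + 1).^d;  % assumed form (x'z + 1)^d

X = [1 0; 0 1; 1 1];                    % three toy datapoints
K = polyKernel(X, X, 2);                % 3-by-3 kernel matrix for degree d = 2
```

The same call with different first and second arguments gives the kernel matrix between test and training points.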

Some hints:
• In part 1 neither shuffle nor scale the data. Use them as given to you. In part 2 do both
shuffling and scaling.
• Define the Hessian matrix and all the other matrices and vectors needed for MATLAB’s routine
quadprog. Note that the Hessian matrix H may be badly conditioned. The remedy is to
add a small number to H’s diagonal elements with the line H = H + eye(l)*1e-7;, where l is the number of training datapoints.

• In identifying ALL support vectors, find alphas bigger than some accuracy value, say ε =
1e-5. In finding the FREE support vectors use the line
ind_Free = find(alpha >= ε & alpha <= C - ε);
• In calculating the bias you have to distinguish between the free and bounded support vectors.
• In part 2 the data is in sparse format. Read it in and change the format as follows:
[Y, X] = libsvmread('glass'); X = full(X);

• In part 2, for each classifier in the 1 vs All design, do 5-fold cross-validation (CV)
following the CV handouts. Use the following values for the cross-validation:
C0 = [1e-2 1e-1 1 1e1 1e2 1e3 1e4]
parameters = [1 2 3 4 5] for the polynomial kernel classifier, and
parameters = [1e-2 1e-1 1 1e1 1e2 1e3] for the Gaussian (i.e. RBF) kernel
classifier.

After finding the best hyperparameters (Cbest and the best polynomial degree dbest, or Cbest and
σbest), design each classifier by using ALL data.
• After designing all classifiers you will have 6 class prediction vectors Ypred. Your final,
single classifier will be obtained by using the MAX operator to decide on the class. This is
known as the Winner-takes-All approach.

• Write a single code for designing both classifiers. I mean don’t write two codes, the first
one for the polynomial kernel classifier and the other for the Gaussian one. Use the
variable kernel, and say if kernel = 1 use polynomial kernel and if kernel = 2 use the
Gaussian one. The difference between different kernels is only in calculation of kernel
matrix. All the other lines are the same. In designing the CV loops, work with these lines:
for i = 1:length(C0)
    C = C0(i);
    for j = 1:length(parameters)
        param = parameters(j);

    end
end

Where param for polynomial is its degree d and for the Gaussian is the σ of the kernel.
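
The Winner-takes-All step from the hints above can be sketched as follows. The decision values in F are made-up numbers for three samples, and the label set classes is an assumption about how the six glass classes are numbered; check it against unique(Y) on the real data.

```matlab
classes = [1 2 3 5 6 7];                % assumed class labels (verify with unique(Y))
% Rows = samples, columns = decision values of the 6 one-vs-all classifiers
F = [ 0.9 -0.2 -1.1 -0.8 -1.5 -0.7;
     -1.0 -0.4 -0.9  0.3 -0.6 -1.2;
     -0.8 -1.3 -0.5 -0.9 -0.1  0.6];    % made-up values for illustration
Ytrue = [1; 5; 6];                      % made-up true labels

[~, idx] = max(F, [], 2);               % MAX operator across the classifiers
Ypred = reshape(classes(idx), [], 1);   % winning class label for each sample
accuracy = 100 * mean(Ypred == Ytrue);  % accuracy in percent
```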
**********************************************************************************
Submit both your written report (in an IEEE format) and program to me by Email.
ZIP your report and programs into a single zip file. Name it with your family name (say, lee.zip) and
send it to me. A Subject field in your Email should be CMSC 678, Family name, Project 2. Don’t
hesitate to contact me in the case of need.

Statistical Learning and Fuzzy Logic Algorithms – CMSC 678 Project No. 3

1) 10% of grade

For the given dataset cancer, design a multilayer perceptron neural network. For the hidden layer neurons,
use the hyperbolic tangent activation function (AF), y = tanh(u). This is a problem with
two classes only, so we have only one output layer (OL) neuron; let its AF be linear. Split your data randomly
into training and test datasets: 75% for training and 25% for testing.
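
The forward pass of such a network can be sketched in a few lines. The hyperbolic tangent is written here as 2/(1 + e^{-2u}) − 1, which is identical to tanh(u). The inputs and the weights V and w are deterministic placeholders chosen only to make the sketch runnable; in the project they come from the cancer data and from training.

```matlab
af = @(u) 2 ./ (1 + exp(-2*u)) - 1;     % hyperbolic tangent, same as tanh(u)

X = [0.5 -1.0; 1.5 2.0; -0.3 0.8];      % 3 made-up samples with 2 features
n = size(X, 1);
N = 4;                                  % number of hidden layer neurons
V = 0.1 * reshape(1:N*3, N, 3);         % placeholder hidden weights (last column: bias)
w = 0.1 * (1:N+1)';                     % placeholder output weights (last entry: bias)

Hid = af([X ones(n, 1)] * V');          % n-by-N hidden layer outputs
y   = [Hid ones(n, 1)] * w;             % single OL neuron with linear AF
```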

Your tasks are:

1. Design an NN which will be the best for cancer data classification. This means that
you will have to play with various numbers N of centers (activation functions, neurons) and various numbers I of iteration steps while learning (1 iteration step is one
sweep through the training dataset, meaning one epoch).
• Play with the following numbers: N0 = [5 10 15 25 50 75 100] and I0 = [100 250 500
1000].

2. For each selection of N and I, after the training is over, calculate the output layer outputs for
all the test inputs and find the error in percent on the test data points. Save this error
as E = E(N, I). After the training, plot the E surface (use mesh here) and choose the best N and
I. Report the smallest E and the best N and I. Also report the smallest errors in percent achieved for each class. In the case of a tie, choose the smallest N and then the smallest I.
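
Picking the best (N, I) pair with the stated tie rule can be sketched as below. The error matrix E is filled with placeholder values, not real results, and contains three deliberately tied minima so that the tie-breaking is visible.

```matlab
N0 = [5 10 15 25 50 75 100];
I0 = [100 250 500 1000];
E = 10 * ones(numel(N0), numel(I0));    % placeholder errors in percent (not real results)
E(3, 2) = 4;  E(3, 4) = 4;  E(5, 1) = 4;   % three tied minima for illustration

[r, c] = find(E == min(E(:)));          % all positions of the smallest error
rc = sortrows([r c]);                   % smallest N index first, then smallest I index
Nbest = N0(rc(1, 1));
Ibest = I0(rc(1, 2));
Ebest = E(rc(1, 1), rc(1, 2));
% mesh(I0, N0, E), xlabel('I'), ylabel('N'), zlabel('E [%]')   % the E surface
```

Sorting the (row, column) pairs is needed because find alone scans column by column and would favor the smallest I rather than the smallest N.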

Some hints:
SOFTWARE MUST BE USER FRIENDLY, so that I can run it easily too. At the top of your routine have the
commands: close all, clear all, format compact. Now, the calculations can take some time. Be prepared.
Your report should be in the form of an IEEE journal (conference) paper. Use the template given.
Submit both your written report and program to me by Email.

ZIP your report and all programs and data needed into a single zip file, name it with your family name
(say, lee.zip) and send it to me. A Subject field in your Email should be CMSC 678, Family name, Project 3.
Don’t hesitate to contact me in the case of need. Use my office hours – Tuesday or Thursday 11am-12pm.

Hidden layer activation function: y = 2/(1 + e^{-2u}) − 1

Statistical Learning and Fuzzy Logic Algorithms – CMSC 678 Project No. 4

Design a Fuzzy Logic Model for Grading the Project 4 in Fuzzy Logic

In real life, while developing FL models, you are the one who transfers your knowledge into the algorithm and, consequently, you choose
everything: the domains (i.e., input variables), the number and type of fuzzy subsets, a.k.a. membership functions (MFs), for each domain (input),
as well as the number and type of output fuzzy subsets.

Finally, you are the one who will transform your structured knowledge about the problem into the
IF-THEN rules. In this project, I will try to make your job easier by suggesting to you some parts of the mentioned tasks.

The three relevant input
domains (meaning the inputs used to produce the output, which is the grade) are the Writing Skill (i.e. written Quality) of the Report (Q), the number of
Errors in the report (E) and the expressive power of Figures (F). By excellent Skill (Quality) we understand a well-written project that is grammatically
correct, stylistically nice, edited and formatted in a sophisticated way, and follows the prescribed format. As for the numerical values for Q, use 0 for the lowest
quality and 100 for the highest one.

For the number of errors, let’s say that they go from 0 to 10. As for the expressive power of Figures, use three
membership functions only: ugly and/or non-informative (placed at 50), medium (placed at 75) and very fine (placed at 100).

The output
domain is the grade (G) on the scale of percentages from 40% and less to 100%, where 40% is fail (F) and 100% is excellent, i.e. an A.
Hence, you have 3 input domains (a.k.a. antecedents) and 1 output variable (consequent). For the inputs’ fuzzy subsets use triangles (i.e., you
should develop your code for designing triangular MFs). Place the triangles symmetrically, with all of them having the same width. For the output G, use
singleton MFs.
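
Symmetric triangular MFs of equal width can be generated as sketched below for the input Q. The sketch reads the "overlap" as the membership degree at the midpoint between two neighboring triangle centers, which is an assumption about the intended meaning; under it, the half-width is w = Δ / (2(1 − overlap)), where Δ is the distance between centers. The helper name tri is hypothetical.

```matlab
tri = @(x, c, w) max(0, 1 - abs(x - c) / w);    % triangle with peak at c, half-width w

nMF = 3;  lo = 0;  hi = 100;                    % 3 MFs on the Q domain [0, 100]
ov  = 0.5;                                      % membership degree at the crossover
centers = linspace(lo, hi, nMF);                % symmetrically placed peaks
w = (centers(2) - centers(1)) / (2*(1 - ov));   % common half-width for that overlap

x  = linspace(lo, hi, 201)';
MU = zeros(numel(x), nMF);
for k = 1:nMF
    MU(:, k) = tri(x, centers(k), w);           % one column per MF
end
% plot(x, MU), xlabel('Q'), ylabel('\mu')
```

Changing nMF, the domain limits or ov is enough to produce the MFs for the other inputs and for the narrow and broad cases.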

Use the product for AND, and make a Fuzzy Additive Model.
a) 15 points
First design the fuzzy model without using the third domain, i.e. neglect the Power of Figures input here. Your model should work for any number of MFs
for each of the first two input variables. Run and show the results with different numbers of MFs for the two variables (say, between 2 and 6
MFs per input). Show two models only (meaning two choices of the numbers of MFs, say 3 & 4 and 4 & 3, or 6 & 2).

For each model show two figures:
1) Show all the MFs in a single graph having three subplots – inputs Q (subplot 131) & E (subplot 132), and the output G (subplot 133).
2) In addition, show the ‘surface of knowledge’ describing the dependency of grade G upon the two input variables Q and E.

However, show these two figures for three cases:
a) one for narrow MFs (overlapping 0.1 in the middle between the triangles),
b) one for medium size MFs (overlapping 0.5) and
c) one for broad MFs (overlapping 0.9).

In short, I expect 6 figures per model in your report.
If you are encoding your model correctly by using MATLAB’S kron function, YOUR MODEL MUST WORK FOR ANY NUMBER of MFs. In this case,
kron will take care of getting H values in eq. (6.16).
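
A two-input fuzzy additive model with product AND and singleton consequents can be sketched with kron as follows. Eq. (6.16) is not reproduced here, so the normalized weighted sum below is an assumed form of it. The rule table R holds placeholder grades chosen only for illustration, and the names tri and fam are hypothetical helpers.

```matlab
tri = @(x, c, w) max(0, 1 - abs(x - c) / w);    % triangular MF, peak c, half-width w

cQ = linspace(0, 100, 3);  wQ = cQ(2) - cQ(1);  % 3 MFs for Q, crossover at 0.5
cE = linspace(0, 10, 4);   wE = cE(2) - cE(1);  % 4 MFs for E, crossover at 0.5

% Placeholder rule table: R(i, j) = singleton grade if Q is MF i AND E is MF j
R = [ 55 50 45 40;
      80 70 60 50;
     100 90 75 60];
r = reshape(R', [], 1);                         % ordered to match kron(muQ, muE)

% kron yields all products muQ(i)*muE(j), i.e. the rule activations
H   = @(q, e) kron(tri(q, cQ, wQ), tri(e, cE, wE));
fam = @(q, e) (H(q, e) * r) / sum(H(q, e));     % normalized weighted sum of singletons

q = linspace(0, 100, 21);  e = linspace(0, 10, 21);
G = zeros(numel(e), numel(q));
for a = 1:numel(q)
    for b = 1:numel(e)
        G(b, a) = fam(q(a), e(b));              % surface of knowledge G(Q, E)
    end
end
% surf(q, e, G), xlabel('Q'), ylabel('E'), zlabel('G')
```

Because the rule activations come from kron, changing the number of MFs only changes the lengths of cQ and cE and the size of R.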

That’s all for Project four!
Your report doesn’t have to be in the IEEE journal (conference) paper format this time. Just the title and your name + 12 figures. No comments!
If you want some additional points, which will be counted toward your final grade in the course, expand your model by using the third antecedent,
namely, the Power of Figures input.

3) Now, show all the MFs in a single graph having four subplots – inputs Q (subplot 221), E (subplot 222) & F (subplot 223), and the output
G (subplot 224).

4) In addition, show the ‘surfaces of knowledge’ describing the dependency of grade G upon the two input variables.
Those will be two additional figures in your report.

SOFTWARE MUST BE USER FRIENDLY, so that I can run it easily too. Submit both your written report as a PDF document and your MATLAB routines to
me by Email. ZIP your report and programs into a single zip file (which will contain the report and the routines for the FLM), name it with your family name (say,
lee.zip) and send it to me.

A Subject field in your Email should be CMSC 678, Your family name, Project 4.
Don’t hesitate to contact me in the case of need.
Note: Late submission will shrink your points as follows: up to 1 hour late -2 points, 1 to 3 hours late -4 points, more than 3 hours late -10 points, but if you are late more than 6 hours don’t
bother with submitting, for there won’t be any points. This remark is here because the semester is ending soon and both you and I must have time for both closing it and the exam.