Description
Problem #1 (15 points)
FOR SECTIONS 590-12 and 590-53 (undergraduate) ONLY
- Explain the motivation for using an ensemble of classifiers. What are the advantages and disadvantages of this strategy?
- Identify and explain cases where an ensemble approach can result in performance that is:
- Better than the best individual classifier
- Worse than the best individual classifier
- Comparable to the best performing individual classifier
Problem #2 (15 points)
Create a 2-dimensional data set with 20 samples that has the following properties:
- Samples should belong to 2 clusters (10 samples per cluster)
- Data cannot be clustered correctly using the K-Means algorithm
- Data can be clustered correctly using Hierarchical Agglomerative clustering
Explain why K-Means cannot generate the correct clusters.
What kind of linkage is needed for the Agglomerative algorithm to cluster the data correctly?
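For checking your construction, here is a minimal scikit-learn sketch that runs both algorithms on a 2-dimensional dataset. The array `X` below is only a random placeholder, not a dataset that satisfies the requirements, and the `linkage` value is an arbitrary default; designing the data and choosing the linkage are the point of the problem.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Placeholder 20 x 2 array -- replace with the dataset you design
# (2 clusters, 10 samples each).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))

# K-Means with K=2.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical agglomerative clustering with K=2; linkage can be
# "ward", "complete", "average", or "single".
agglo_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

print("K-Means labels:      ", kmeans_labels)
print("Agglomerative labels:", agglo_labels)
```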
Problem #3 (15 points)
Create a 2-dimensional data set with 22 samples that has the following properties:
- 20 of the samples should belong to 2 clusters (10 samples per cluster)
- The remaining 2 samples are noise
- Data cannot be clustered correctly using the K-Means algorithm (with K=2)
- Data cannot be clustered correctly using Hierarchical Agglomerative clustering (with K=2)
- Data can be clustered correctly using DBSCAN
Explain why K-Means and Agglomerative clustering cannot generate the correct clusters.
Explain why DBSCAN is the appropriate algorithm for this dataset.
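As with Problem #2, a minimal sketch for running DBSCAN is given below; `X` is again a random placeholder, and the `eps` and `min_samples` values are illustrative only and must be tuned to the dataset you design.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Placeholder 22 x 2 array -- replace with your 20 clustered samples
# plus 2 noise points.
rng = np.random.default_rng(0)
X = rng.normal(size=(22, 2))

# eps (neighborhood radius) and min_samples must match the spacing
# of the dataset you construct; the values here are arbitrary.
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

# DBSCAN labels the points it treats as noise with -1.
print("DBSCAN labels:", db.labels_)
```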
Problem #4 (15 points)
List three different reasons for trying to reduce the number of features prior to applying a machine learning algorithm. Justify and explain each reason.
Problem #5 (15 points)
Identify two cases where accuracy may be an inadequate measure to evaluate the performance of a classification algorithm. Explain the reasons.
For each case, provide an alternative scoring measure and explain why it is more reliable than accuracy.
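As a reference for how accuracy is computed in scikit-learn (the labels below are made up for illustration; choosing and justifying an alternative measure is left to your answer):

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions, for illustration only.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

# Accuracy is simply the fraction of predictions that match the
# true labels: here 4 out of 6.
print(accuracy_score(y_true, y_pred))  # ~0.667

# Other scoring functions in sklearn.metrics take the same
# (y_true, y_pred) arguments and can be substituted here.
```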
Problem #6 (10 points)
Consider the following pseudo-code, which is supposed to train and test an SVM classifier on the Iris data:
- Load Iris data
- Normalize data to have zero mean and unit variance
- X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target)
- svm.fit(X_train, y_train)
- score = svm.score(X_test, y_test)
Is the above algorithm logically correct? If not, identify the problem and correct it.
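For reference, here is the pseudo-code rendered as runnable scikit-learn Python, keeping the steps in exactly the order listed above (whether that order is logically sound is what the problem asks you to judge):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load Iris data
iris = load_iris()

# Normalize data to have zero mean and unit variance
X = StandardScaler().fit_transform(iris.data)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, iris.target, random_state=0)

# Fit an SVM on the training set and score it on the test set
svm = SVC()
svm.fit(X_train, y_train)
score = svm.score(X_test, y_test)
print(score)
```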
Problem #7 (15 points)
Suppose that we have 3 classification algorithms. Each algorithm has two parameters: P1 and P2.
After performing a grid search for each algorithm (using {0.01, 0.1, 1, 10} for each parameter), we obtain the accuracy results below. In each table, one parameter varies along the rows and the other along the columns, and each cell gives the resulting accuracy; a sketch of how such a grid can be generated follows the tables.
- Algorithm 1
|      | 0.01 | 0.1  | 1    | 10   |
|------|------|------|------|------|
| 10   | 0.71 | 0.80 | 0.93 | 0.93 |
| 1    | 0.70 | 0.70 | 0.92 | 0.91 |
| 0.1  | 0.65 | 0.71 | 0.90 | 0.89 |
| 0.01 | 0.63 | 0.65 | 0.88 | 0.87 |
Did we use the correct range of values for each parameter? Justify your answer.
If the answer is no, then what other values for P1 and P2 do you recommend exploring?
- Algorithm 2
|      | 0.01 | 0.1  | 1    | 10   |
|------|------|------|------|------|
| 10   | 0.90 | 0.95 | 0.91 | 0.89 |
| 1    | 0.88 | 0.93 | 0.87 | 0.82 |
| 0.1  | 0.75 | 0.88 | 0.84 | 0.70 |
| 0.01 | 0.63 | 0.79 | 0.73 | 0.67 |
Did we use the correct range of values for each parameter? Justify your answer.
If the answer is no, then what other values for P1 and P2 do you recommend exploring?
- Algorithm 3
|      | 0.01 | 0.1  | 1    | 10   |
|------|------|------|------|------|
| 10   | 0.85 | 0.90 | 0.88 | 0.82 |
| 1    | 0.83 | 0.93 | 0.91 | 0.85 |
| 0.1  | 0.75 | 0.89 | 0.84 | 0.70 |
| 0.01 | 0.63 | 0.79 | 0.73 | 0.67 |
Did we use the correct range of values for each parameter? Justify your answer.
If the answer is no, then what other values for P1 and P2 do you recommend exploring?
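For reference, a grid of accuracies like the tables above can be produced with scikit-learn's GridSearchCV. The sketch below uses an SVC with parameters C and gamma purely as stand-ins for P1 and P2, since the three algorithms themselves are not specified here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Stand-ins for P1 and P2: here they are the SVC parameters C and gamma.
param_grid = {"C": [0.01, 0.1, 1, 10], "gamma": [0.01, 0.1, 1, 10]}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# Reshape the 16 mean cross-validation accuracies into a 4 x 4 grid,
# with one parameter along the rows and the other along the columns.
scores = np.array(grid.cv_results_["mean_test_score"]).reshape(4, 4)
print(scores)
print("Best parameters:", grid.best_params_)
```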