Description
P3-1. Revisit Text Documents Classification
Use the 20 newsgroups dataset embedded in scikit-learn:
from sklearn.datasets import fetch_20newsgroups
(See https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html#sklearn.datasets.fetch_20newsgroups)
(a) Load the following 4 categories from the 20 newsgroups dataset: categories = ['rec.autos',
'talk.religion.misc', 'comp.graphics', 'sci.space'].
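A minimal sketch of the loading step in (a) follows. The helper name load_newsgroups, the remove=... argument (which strips headers, footers and quoted replies) and the random_state value are assumptions for illustration; the assignment itself only fixes the four categories.

```python
from sklearn.datasets import fetch_20newsgroups

# The four categories named in part (a).
CATEGORIES = ["rec.autos", "talk.religion.misc", "comp.graphics", "sci.space"]


def load_newsgroups(subset):
    """Fetch one split ('train' or 'test') restricted to CATEGORIES.

    Hypothetical helper. Stripping headers, footers and quotes is an
    assumption: it makes the task harder but less prone to overfitting
    on metadata. Drop the `remove` argument to keep the raw posts.
    """
    return fetch_20newsgroups(
        subset=subset,
        categories=CATEGORIES,
        remove=("headers", "footers", "quotes"),
        shuffle=True,
        random_state=42,
    )
```

Calling load_newsgroups("train") returns an object whose .data attribute holds the raw texts and whose .target attribute holds the integer labels. The first call downloads the dataset, so it needs network access.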
(b) Build classifiers using the following methods:
Support Vector Machine (sklearn.svm.LinearSVC)
Naive Bayes classifiers (sklearn.naive_bayes.MultinomialNB)
K-nearest neighbors (sklearn.neighbors.KNeighborsClassifier)
Random forest (sklearn.ensemble.RandomForestClassifier)
AdaBoost classifier (sklearn.ensemble.AdaBoostClassifier)
Optimize the hyperparameters of each method and compare their results.
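One possible way to organize part (b) is sketched below: each classifier is placed in a pipeline behind a TF-IDF vectorizer and tuned with GridSearchCV. The TF-IDF step, the parameter grids, and the names CANDIDATES, build_search and compare are all assumptions for illustration, not requirements of the assignment.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Illustrative grids only; the value ranges are assumptions to be widened or refined.
CANDIDATES = {
    "LinearSVC": (LinearSVC(), {"clf__C": [0.1, 1, 10]}),
    "MultinomialNB": (MultinomialNB(), {"clf__alpha": [0.01, 0.1, 1.0]}),
    "KNeighborsClassifier": (
        KNeighborsClassifier(),
        {"clf__n_neighbors": [5, 15, 45]},
    ),
    "RandomForestClassifier": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [100, 300]},
    ),
    "AdaBoostClassifier": (
        AdaBoostClassifier(random_state=0),
        {"clf__n_estimators": [50, 200], "clf__learning_rate": [0.5, 1.0]},
    ),
}


def build_search(name, cv=5):
    """Return an unfitted grid search over a TF-IDF + classifier pipeline."""
    estimator, grid = CANDIDATES[name]
    pipe = Pipeline(
        [("tfidf", TfidfVectorizer(stop_words="english")), ("clf", estimator)]
    )
    return GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")


def compare(train_texts, train_labels, test_texts, test_labels, cv=5):
    """Tune every candidate on the training split and score it on the test split."""
    results = {}
    for name in CANDIDATES:
        search = build_search(name, cv=cv)
        search.fit(train_texts, train_labels)
        results[name] = {
            "best_params": search.best_params_,
            "cv_accuracy": search.best_score_,
            "test_accuracy": search.score(test_texts, test_labels),
        }
    return results
```

Passing the training and test texts and labels of the four categories to compare yields one entry per method, which can then be tabulated for the comparison. Cross-validation is run on the training split only, so the test accuracy stays an unbiased estimate.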
P3-2. Recognizing hand-written digits
Use the hand-written digits dataset embedded in scikit-learn:
from sklearn import datasets
digits = datasets.load_digits()
(a) Develop a multi-layer perceptron classifier to recognize images of hand-written digits. To
build your classifier, you can use:
sklearn.neural_network.MLPClassifier
(See https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)
Instructions: use sklearn.model_selection.train_test_split to split your dataset into random
train and test subsets, where you set test_size=0.5.
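The split and a baseline classifier for (a) might look like the following sketch. The hidden layer size, max_iter, the random_state values and the StandardScaler step are assumptions chosen for illustration; only test_size=0.5 comes from the instructions.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()

# Random 50/50 split as required; random_state is fixed only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.5, random_state=0
)

# Baseline network: one hidden layer of 64 units (an arbitrary starting point).
# Scaling the inputs is optional but usually helps the optimizer converge.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)

test_accuracy = mlp.score(X_test, y_test)
print(f"Baseline MLP test accuracy: {test_accuracy:.3f}")
```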
(b) Optimize the hyperparameters of your neural network to maximize the classification
accuracy. Show the confusion matrix of your neural network. Discuss and compare your results
with the results of a support vector classifier (see https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html#sphx-glr-auto-examples-classification-plot-digits-classification-py).
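A hedged sketch of part (b): a small grid search over the MLP, its confusion matrix on the test subset, and a support vector classifier fitted on the same split for comparison. The grid values are illustrative assumptions, and gamma=0.001 for the SVC is assumed to match the setting in the linked example.

```python
from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.5, random_state=0
)

# Deliberately small grid (an assumption); extend it with more layer sizes,
# learning rates or activations as time allows.
param_grid = {
    "hidden_layer_sizes": [(32,), (64,)],
    "alpha": [1e-4, 1e-2],
}
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0), param_grid, cv=3
)
search.fit(X_train, y_train)

# Rows are true digits, columns are predicted digits.
mlp_cm = confusion_matrix(y_test, search.predict(X_test))
mlp_accuracy = search.score(X_test, y_test)

# Support vector classifier on the same split, for the comparison.
svc = svm.SVC(gamma=0.001)
svc.fit(X_train, y_train)
svc_accuracy = svc.score(X_test, y_test)

print("Best MLP parameters:", search.best_params_)
print(mlp_cm)
print(f"MLP accuracy: {mlp_accuracy:.3f}  SVC accuracy: {svc_accuracy:.3f}")
```

Off-diagonal entries of the confusion matrix show which digits are confused with each other, which is a useful starting point for the discussion.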
P3-3. Nonlinear Support Vector Machine
(a) Randomly generate the following 2-class data points
import numpy as np
np.random.seed(0)
X = np.random.rand(300, 2)*10-5
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)
(b) Develop a nonlinear SVM binary classifier (sklearn.svm.NuSVC).
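As a starting point for (b), the sketch below fits a NuSVC with an RBF kernel to the XOR data from (a). The kernel, gamma and nu shown are the scikit-learn defaults written out explicitly; whether they are the best choice for this data is left open.

```python
import numpy as np
from sklearn.svm import NuSVC

# Data from part (a): the label is True when exactly one coordinate is positive.
np.random.seed(0)
X = np.random.rand(300, 2) * 10 - 5
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

# The XOR pattern is not linearly separable, so a nonlinear (RBF) kernel is used.
clf = NuSVC(kernel="rbf", gamma="scale", nu=0.5)
clf.fit(X, Y)

train_accuracy = clf.score(X, Y)
print(f"Training accuracy: {train_accuracy:.3f}")
```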
(c) Plot these data points and the corresponding decision boundaries, similar to the
figure on slide 131 of Chapter 4.
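One way to draw the plot in (c) is to evaluate the classifier's decision function on a grid and contour its zero level. The sketch below assumes matplotlib is available; the grid resolution, colormap and marker styling are arbitrary choices, and the slide's figure may differ.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import NuSVC

# Recreate the data from part (a) and fit a NuSVC with its default RBF kernel.
np.random.seed(0)
X = np.random.rand(300, 2) * 10 - 5
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)
clf = NuSVC().fit(X, Y)

# Evaluate the signed distance to the boundary on a 200 x 200 grid.
xx, yy = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(6, 6))
ax.contourf(xx, yy, Z, levels=20, cmap="RdBu", alpha=0.6)   # shaded score
ax.contour(xx, yy, Z, levels=[0], colors="k", linewidths=2)  # decision boundary
ax.scatter(X[:, 0], X[:, 1], c=Y.astype(int), cmap="RdBu", edgecolors="k", s=25)
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_title("NuSVC decision boundary on XOR data")
# Call plt.show() or fig.savefig(...) to view or store the figure.
```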