Description

5/5 - (6 votes)

Problem: Implement a fixed-depth decision tree algorithm, that is, the input to the ID3 algorithm
will include the training data and maximum depth of the tree to be learned. The code skeleton
as well as data sets for this assignment can be found on e-Learning.
Data Sets: The MONK’s Problems were the basis of a first international comparison of learning
algorithms [1]. The training and test files for the three problems are named monks-X.train
and monks-X.test. There are six attributes/features (columns 2–7), and binary class labels
(column 1). See monks.names for more details.
Visualization: The code skeleton provided contains a function render_dot_file(), which
can be used to generate .png images of the trees learned by both scikit-learn and your
code. See the documentation for render_dot_file() for additional details on usage.
a. (Autograder Score, 20 points) Your code will be auto-graded and cross-checked with other
submissions. The auto-grader will evaluate your code on several different data sets to perform a
sanity check. In order to ensure that your code passes the auto-grader, ensure that you do not
modify the function headers. In addition, do not hard code any values (such as y = 0 and 1) and
make your code as general as possible.
b. (Learning Curves, 20 points) For depth = 1, …, 10, learn decision trees and compute the average
training and test errors on each of the three MONK’s problems. Make three plots, one for each
of the MONK’s problem sets, plotting training and testing error curves together for each problem,
with tree depth on the x-axis and error on the y-axis.
c. (Weak Learners, 20 points) For monks-1, report the visualized learned decision tree and the
confusion matrix on the test set for depth = 1, 3, 5. You may use scikit-learns’s
confusion matrix() function [2].
Figure 1: Confusion matrix for binary Classification Problem
d. (scikit-learn, 20 points) For monks-1, use scikit-learn’s
DecisionTreeClassifier [3] to learn a decision tree using criterion=’entropy’ for
depth = 1, 3, 5. You may use scikit-learn’s confusion matrix() function [2].
e. (Other Data Sets, 20 points) Repeat steps (c) and (d) with your “own” data set and report the
confusion matrices. You can use other data sets in the UCI repository. If you encounter
continuous features, consider a simple discretization strategy to pre-process them into binary
features using the mean. For example, a continuous feature x can be discretized using its mean
µ as
Write a report with the solutions to the questions above, showing the plots, confusion matrices
and a brief discussion (4–5 lines) comparing your implementation to that of scikit-learn, which
is a widely-used, publicly-available, open-source implementation.
[1]https://archive.ics.uci.edu/ml/datasets/MONK’s+Problems
[2]https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
[3]https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

CS6375 Assignment II Fixed-Length Decision Tree solution

Download Details:

Description

CS6375 Assignment II Fixed-Length Decision Tree solution

Download Details:

Description

Related products

Problem Set 2 CS 6375 solution

CS6375 Assignment 4 solution

CS 6375 Final Exam solution