DSCI552 Programming Assignment 1 to 7 solutions


DSCI 552: Programming Assignment 1 [Decision Trees]

Part 1: Implementation [7 points]

Your job in this exercise is to predict whether you will have a good night out in Jerusalem for the coming
New Year’s Eve. Assume that you have kept a record of your previous nights out with the following
attributes.

• How densely the place is usually occupied {High, Moderate, Low}
• How the prices are {Expensive, Normal, Cheap}
• Volume of the music {Loud, Quiet}
• The location {Talpiot, City-Center, Mahane-Yehuda, Ein-Karem, German-Colony}
• Whether you are a frequent customer (VIP) {Yes, No}
• Whether this place has your favorite beer {Yes, No}
• Whether you enjoyed {Yes, No}

We have provided a data file (dt_data.txt) that contains the relevant records.
(a) Write a program to construct a decision tree based on the idea of splitting by Information Gain.
(b) Run your program on the data file.
(c) Make a prediction for (occupied = Moderate; price = Cheap; music = Loud; location = City-Center;
VIP = No; favorite beer = No).

You can write your program in any programming language. However, you will have to implement the
decision tree learning algorithm yourself instead of using library functions. Provide a description of the
data structures you use, any code-level optimizations you perform, any challenges you face, and of
course, the requested prediction.

Your code should print the decision tree that it produces. Please describe the format in which you
print the decision tree in your report.
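
The splitting criterion in (a) can be illustrated with a minimal sketch. It assumes records are stored as dictionaries keyed by attribute name; the toy records and the names `entropy` and `information_gain` are hypothetical and are not taken from dt_data.txt or from any provided code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy records, not taken from dt_data.txt.
records = [
    {"music": "Loud", "vip": "Yes", "enjoy": "Yes"},
    {"music": "Loud", "vip": "No", "enjoy": "Yes"},
    {"music": "Quiet", "vip": "Yes", "enjoy": "No"},
    {"music": "Quiet", "vip": "No", "enjoy": "No"},
]

# The attribute with the largest gain becomes the split at the current node.
best = max(["music", "vip"], key=lambda a: information_gain(records, a, "enjoy"))
```

A full tree builder would apply the same selection recursively to each subset until the labels are pure or no attributes remain.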

Part 2: Software Familiarization [Optional – No Credit]

Do your own research and find out about a library function that offers a good implementation of the
decision tree learning algorithm. Please learn how to use it. Compare it against your implementation and
suggest some ideas for how you can improve your code. Describe all this in your report.

Part 3: Applications [Optional – No Credit]

Do your own research and describe some interesting applications of decision trees.

Submission Guidelines

In your submission report, please include the names of all group members and mention their individual
contributions. The maximum number of the members in a team is 2. The report should be in PDF format.

Your submission should include the code as well as the report and is due before 02/05, 11:59pm in an
archive in the zip, tar.gz or tar.xz format. Your source code should have a comment line that contains the
names of all group members. Only one submission is required for each group by one of the group
members. Please submit your homework on D2L (do NOT email the homework to the instructor or the
TA).

DSCI 552: Programming Assignment 2

Part 1: Implementation [7 points]

Implement the K-means algorithm AND the Expectation Maximization algorithm for clustering using a
Gaussian Mixture Model (GMM). Run your algorithms on the data file “clusters.txt” using K, the number
of clusters, set to 3. Report the centroid of each cluster in K-means; and report the mean, amplitude and
covariance matrix of each Gaussian in GMM. Compare the results of the two algorithms. The data file
contains 150 2D points. Each row in the file contains the coordinates of a single point.

You can write your program in any programming language. However, you will have to implement the
algorithms yourself instead of using high-level library functions. Please provide a description of the data
structures you use, any code-level optimizations you perform, any challenges you face, and of course,
the requested output.
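
As a rough illustration of the K-means half of this task, the sketch below alternates the assignment and update steps until the assignments stop changing. It assumes the caller supplies the initial centroids (for example, K = 3 points sampled from the data); the function name `kmeans` and its parameters are hypothetical. The EM algorithm for the GMM is not covered here.

```python
import math


def kmeans(points, initial_centroids, max_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment
```

The result depends on the initial centroids, so running several random restarts and keeping the best clustering is a common precaution.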

Part 2: Software Familiarization [Optional – No Credit]

Do your own research and find out about library functions that offer good implementations of the two
algorithms. Learn how to use them. Compare them against your implementations and suggest some ideas
for how you can improve your code. Describe all this in your report.

Part 3: Applications [Optional – No Credit]

Do your own research and describe some interesting applications of the two algorithms.

Submission Guidelines

In your report, please include the names of all group members and mention their individual contributions.

The maximum number of the members in a team is 2. The report should be a PDF file. Your submission
should include the code as well as the report and is due before 02/16, 11:59pm in an archive in a zip,
tar.gz or tar.xz format.

Your source code should have a comment line that contains the names of all group
members. Only one submission is required for each group by one of the group members. Please submit
your homework assignment on D2L (do NOT email the homework to the instructor or the TA).

DSCI 552: Programming Assignment 3

Part 1: Implementation [7 points]

PCA (2 points)

Use PCA to reduce the dimensionality of the data points in pca-data.txt from 3D to 2D. Each line of the
data file represents the 3D coordinates of a single point. Please output the directions of the first two
principal components.
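
One possible shape for the PCA step is sketched below. The assignment permits a library routine for eigenvectors; to stay dependency-free, this sketch swaps in power iteration with deflation, which is an assumption of the example and not a required method. All function names are hypothetical, and the fixed start vector would fail in the unlikely case that it is orthogonal to the leading eigenvector.

```python
import math


def covariance_matrix(points):
    """Covariance matrix of the mean-centered points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]


def top_eigenvector(matrix, iters=1000):
    """Power iteration: repeatedly multiply and normalize to find the dominant eigenpair."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


def first_two_components(points):
    """Directions of the first two principal components."""
    cov = covariance_matrix(points)
    d = len(cov)
    lam1, v1 = top_eigenvector(cov)
    # Deflation: subtract the first component so the next run finds the second.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    _, v2 = top_eigenvector(deflated)
    return v1, v2
```

Projecting a centered point onto `v1` and `v2` (two dot products) gives its 2D coordinates. The sign of each direction is arbitrary.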

FastMap (5 points)

Use FastMap to embed the objects in fastmap-data.txt into a 2D space. The first two columns in each
line of the data file represent the IDs of the two objects; and the third column indicates the symmetric
distance between them. The objects listed in fastmap-data.txt are actually the words in
fastmap-wordlist.txt (the nth word in this list has an ID value of n). The distance between each pair of
objects is the Damerau–Levenshtein distance between them. Plot the words on a 2D plane using your
FastMap solution.
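
The FastMap recursion can be sketched as follows, under a few assumptions: objects are indexed from 0 (the data file uses 1-based IDs), `dist(i, j)` is a caller-supplied symmetric distance function, and the pivot pair is chosen with the usual "jump to the farthest object" heuristic. The names `fastmap`, `residual_sq` and `pivot_rounds` are hypothetical.

```python
import math


def fastmap(n, dist, k=2, pivot_rounds=5):
    """Embed objects 0..n-1 into k dimensions given a symmetric distance function."""
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, dim):
        # Squared distance left over after the first `dim` coordinates are projected out.
        d2 = dist(i, j) ** 2
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)

    for dim in range(k):
        # Pivot heuristic: start anywhere, then repeatedly jump to the farthest object.
        a, b = 0, 0
        for _ in range(pivot_rounds):
            b = max(range(n), key=lambda j: residual_sq(a, j, dim))
            a_new = max(range(n), key=lambda j: residual_sq(b, j, dim))
            if a_new == a:
                break
            a = a_new
        d_ab2 = residual_sq(a, b, dim)
        if d_ab2 == 0.0:
            break  # every remaining distance is zero
        d_ab = math.sqrt(d_ab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through the pivots.
            coords[i][dim] = (residual_sq(a, i, dim) + d_ab2 - residual_sq(b, i, dim)) / (2 * d_ab)
    return coords
```

For the word data, `dist` would look up the distance listed for each ID pair; the resulting coordinates can then be passed to any plotting tool.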

You can write your program in any programming language. However, you will have to implement the
algorithms yourself instead of using high-level library functions – except for computing eigenvectors and
eigenvalues. Provide a description of the data structures you use, any code-level optimizations you
perform, any challenges you face, and of course, the requested outputs.

Part 2: Software Familiarization [Optional – No Credit]

Do your own research and find out about library functions that offer good implementations of PCA and
FastMap. Learn how to use them. Compare them against your implementations and suggest some ideas
for how you can improve your code. Describe all this in your report.

Part 3: Applications [Optional – No Credit]

Do your own research and describe some interesting applications of PCA and FastMap.

Submission Guidelines

In your report, please include the names of all group members and mention their individual contributions.

The maximum number of the members in a team is 2. The report should be in PDF format. Your
submission should include the code as well as the report. It is due before 02/27, 11:59pm in an archive
in zip, tar.gz or tar.xz formats. Only one submission is required for each group by one of the group
members. Please submit your homework on D2L (do NOT email the homework to the instructor or the
TA).

DSCI 552: Programming Assignment 4

Part 1: Implementation [7 points]

[2 points] Implement the Perceptron Learning algorithm. Run it on the data file “classification.txt”
ignoring the 5th column. That is, consider only the first 4 columns in each row. The first 3 columns are
the coordinates of a point; and the 4th column is its classification label +1 or -1. Report your results
(weights and accuracy after the final iteration).
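
A minimal sketch of the Perceptron Learning rule, assuming points are tuples of coordinates and labels are +1 or -1. The bias is kept in `w[0]`; the function names, the learning rate, and the iteration cap are hypothetical choices for illustration.

```python
def predict(w, x):
    """Sign of the linear score; w[0] is the bias weight."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1


def perceptron(points, labels, max_iters=1000, rate=1.0):
    """Update the weights on every misclassified point until a full pass is clean."""
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(max_iters):
        updated = False
        for x, y in zip(points, labels):
            if predict(w, x) != y:
                w[0] += rate * y
                for j, xj in enumerate(x):
                    w[j + 1] += rate * y * xj
                updated = True
        if not updated:
            break  # every point is classified correctly
    return w


def accuracy(w, points, labels):
    """Fraction of points whose predicted label matches the given label."""
    return sum(predict(w, x) == y for x, y in zip(points, labels)) / len(points)
```

The loop only terminates early when the data is linearly separable; otherwise it stops at `max_iters`.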

[1 point] Implement the Pocket algorithm and run it on the data file “classification.txt” ignoring the 4th
column. That is, consider only the first 3 columns and the 5th column in each row. The first 3 columns
are the coordinates of a point; and the 5th column is its classification label +1 or -1. Plot the number of
misclassified points against the number of iterations of the algorithm. Run up to 7000 iterations. Also,
report your results (weights and accuracy after the final iteration).

[3 points] Implement Logistic Regression and run it on the points in the data file “classification.txt”
ignoring the 4th column. That is, consider only the first 3 columns and the 5th column in each row. The
first 3 columns are the coordinates of a point; and the 5th column is its classification label +1 or -1. Use
the sigmoid function $\theta(s) = e^{s} / (1 + e^{s})$. Run up to 7000 iterations. Report your results (weights and
accuracy after the final iteration).
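
One way to realize this is batch gradient descent on the average cross-entropy error ln(1 + e^(-y w·x)) for labels in {+1, -1}. The sketch below assumes that error function and a fixed learning rate of 0.1; both are assumptions of the example, as are the function names.

```python
import math


def sigmoid(s):
    """Numerically stable evaluation of e^s / (1 + e^s)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def logistic_regression(points, labels, rate=0.1, iters=7000):
    """Batch gradient descent; w[0] is the bias weight."""
    n, d = len(points), len(points[0])
    w = [0.0] * (d + 1)
    for _ in range(iters):
        grad = [0.0] * (d + 1)
        for x, y in zip(points, labels):
            xb = (1.0,) + tuple(x)
            s = sum(wi * xi for wi, xi in zip(w, xb))
            # Derivative of ln(1 + e^(-y*s)) with respect to s.
            coeff = -y * sigmoid(-y * s)
            for j in range(d + 1):
                grad[j] += coeff * xb[j] / n
        w = [wi - rate * g for wi, g in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Estimated probability that the label of x is +1."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
```

A point is classified as +1 when `predict_proba` exceeds 0.5, which is how the accuracy would be computed.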

[1 point] Implement Linear Regression and run it on the points in the data file “linear-regression.txt”.
The first 2 columns in each row represent the independent X and Y variables; and the 3rd column
represents the dependent Z variable. Report your results (weights after the final iteration).
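
Linear regression has a closed-form solution through the normal equations (XᵀX) w = Xᵀz. The sketch below builds that system and solves it with a small Gauss-Jordan routine written for the example; the assignment allows a library solver here, so the hand-written `solve` is only there to keep the sketch self-contained. All names are hypothetical.

```python
def solve(a, b):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def linear_regression(rows):
    """Fit z = w0 + w1*x + w2*y to rows of (x, y, z) via the normal equations."""
    xs = [(1.0, x, y) for x, y, _ in rows]
    zs = [z for _, _, z in rows]
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(xs, zs)) for i in range(3)]
    return solve(xtx, xtz)
```

The returned list holds the intercept first, followed by the weights for X and Y.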

You can write your programs in any programming language. However, you will have to implement the
algorithms yourself instead of using high-level library functions, except for solving a system of linear
equations. Please provide a description of the data structures you use, any code-level optimizations you
perform, any challenges you face, and of course, the requested outputs.

Part 2: Software Familiarization [Optional – No Credit]

Do your own research and find out about library functions that offer good implementations of linear
classification, linear regression, and logistic regression. Learn how to use them. Compare them against
your implementations and suggest some ideas for how you can improve your code. Describe all this in
your report.

Part 3: Applications [Optional – No Credit]

Do your own research and describe some interesting applications of linear classification, linear
regression, and logistic regression.

Submission Guidelines

In your report, please include the names of all group members and mention their individual contributions.
The maximum number of the members in a team is 2. The report should be in PDF format. Your
submission should include the code as well as the report and is due before 3/10, 11:59pm in an archive
in a zip, tar.gz or tar.xz format. Only one submission is required for each group by one of the group
members. Please submit your homework on D2L (do NOT email the homework to the instructor or the
TA).

DSCI 552: Programming Assignment 5 [Neural Networks]

Part 1: Implementation [7 points]

In the directory gestures, there is a set of images [1] that display “down” gestures (i.e., thumbs-down
images) or other gestures. In this assignment, you are required to implement the Back Propagation
algorithm for Feed Forward Neural Networks to learn the down gestures from training instances available
in downgesture_train.list. The label of an image is 1 if the word “down” is in its file name; otherwise
the label is 0.

The pixels of an image use the gray scale ranging from 0 to 1. In your network, use one
input layer, one hidden layer of size 100, and one output perceptron. Use the value 0.1 for the learning
rate. For each perceptron, use the sigmoid function $\theta(s) = 1 / (1 + e^{-s})$. Use 1000 training epochs; initialize
all the weights randomly between -0.01 and 0.01 (you can also choose your own initialization approach,
as long as it works); and then use the trained network to predict the labels for the gestures in the test
images available in downgesture_test.list. For the error function, use the standard squared error. Output
your predictions and accuracy.

The image file format is “pgm” <https://netpbm.sourceforge.net/doc/pgm.html>. Please follow the link
for the format details. You can either use a third-party library to read these image files or easily read them
yourself.

You can write your programs in any programming language. However, you will have to implement the
algorithms yourself instead of using library functions (except for reading “pgm” image files). In your
report, please provide a description of the data structures you use, any code-level optimizations you
perform, any challenges you face, and of course, the requested outputs.
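
The training loop described above can be sketched for a network with one hidden layer and one output unit. This is a scaled-down illustration, not the gesture classifier itself: inputs are plain lists of floats (reading the pgm files is left out), updates are applied after every sample, and the names `train` and `predict` and the default hidden size are hypothetical.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(w1, w2, x):
    """Return hidden activations and the output for one input vector."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w1]
    out = sigmoid(w2[0] + sum(wi * hi for wi, hi in zip(w2[1:], h)))
    return h, out


def train(samples, hidden=4, rate=0.1, epochs=1000, seed=0):
    """Back propagation with squared error; samples are (inputs, label) pairs."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight row stores the bias first, then one weight per incoming value.
    w1 = [[rng.uniform(-0.01, 0.01) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.01, 0.01) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in samples:
            h, out = forward(w1, w2, x)
            # Error (out - y)^2; each delta is the error derivative at a unit's input.
            delta_out = 2 * (out - y) * out * (1 - out)
            delta_h = [delta_out * w2[j + 1] * h[j] * (1 - h[j]) for j in range(hidden)]
            w2[0] -= rate * delta_out
            for j in range(hidden):
                w2[j + 1] -= rate * delta_out * h[j]
                w1[j][0] -= rate * delta_h[j]
                for i in range(n_in):
                    w1[j][i + 1] -= rate * delta_h[j] * x[i]
    return w1, w2


def predict(w1, w2, x):
    """Network output in (0, 1); threshold at 0.5 for a 0/1 label."""
    return forward(w1, w2, x)[1]
```

For the gesture task, each sample's input would be the flattened pixel values of one image, with `hidden=100` as specified above.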

Part 2: Software Familiarization [Optional – No Credit]

Do your own research and find out about library functions that offer good implementations of the Back
Propagation algorithm for Feed Forward Neural Networks. Learn how to use them. Compare them
against your implementations and suggest some ideas for how you can improve your code. Describe all
this in your report.

Part 3: Applications [Optional – No Credit]

Do your own research and describe some interesting applications of Neural Networks in general.

Submission Guidelines

In your report, please include the names of all group members and mention their individual contributions.

The maximum number of the members in a team is 2. The report should be in PDF format. Your
submission should include the code as well as the report and is due before 04/05, 11:59pm in an archive
in a zip, tar.gz or tar.xz format.

Only one submission is required for each group by one of the group
members. Please submit your homework on D2L (do NOT email the homework to the instructor or the
TA).

[1] Source: https://www.cs.cmu.edu/~tom/faces.html

DSCI 552: Programming Assignment 6 [Support Vector Machines]

Part 1: Implementation [7 points]

You are given two data files – linsep.txt and nonlinsep.txt – each of which contains 100 2D points with
classification labels +1 or -1. The first two columns in each file indicate the 2D coordinates of a point;
and the third column indicates its classification label. The points in linsep.txt are linearly separable. The
points in nonlinsep.txt are not linearly separable in the original space but are linearly separable in a
z-space that uses a simple nonlinear transformation.

Part (a) [3.5 points]: Find the fattest margin line that separates the points in linsep.txt. Please solve the
problem using a Quadratic Programming solver. Report the equation of the line as well as the support
vectors.

Part (b) [3.5 points]: Using a kernel function of your choice along with the same Quadratic Programming
solver, find the equation of a curve that separates the points in nonlinsep.txt. Report the kernel function
you use as well as the support vectors.

You can write your programs in any programming language. However, you will have to implement the
algorithms yourself instead of using library functions (except for the Quadratic Programming solver). In
your report, please provide a description of the data structures you use, any code-level optimizations you
perform, any challenges you face, and of course, the requested outputs.
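
Most of the work around the Quadratic Programming solver is building its input and interpreting its output. The sketch below assumes the standard hard-margin dual (minimize ½ αᵀQα − Σα subject to Σ αᵢyᵢ = 0 and α ≥ 0) and deliberately leaves the solver call out, since its interface depends on the package chosen. The function names and the tolerance are hypothetical.

```python
def linear_kernel(u, v):
    """Plain dot product; swap in another kernel for the nonlinear case."""
    return sum(a * b for a, b in zip(u, v))


def dual_qp_matrix(points, labels, kernel=linear_kernel):
    """Q[i][j] = y_i * y_j * K(x_i, x_j), the quadratic term of the SVM dual."""
    return [
        [yi * yj * kernel(xi, xj) for xj, yj in zip(points, labels)]
        for xi, yi in zip(points, labels)
    ]


def recover_line(points, labels, alphas, tol=1e-6):
    """Turn dual multipliers returned by a QP solver into w, b and the support vectors."""
    support = [i for i, a in enumerate(alphas) if a > tol]
    dims = len(points[0])
    # w is the alpha-weighted sum of the labeled support vectors.
    w = [sum(alphas[i] * labels[i] * points[i][d] for i in support) for d in range(dims)]
    # Any support vector lies on the margin, so w.x + b equals its label.
    i = support[0]
    b = labels[i] - linear_kernel(w, points[i])
    return w, b, [points[i] for i in support]
```

With a nonlinear kernel, `w` is not formed explicitly; the decision function is instead evaluated as a kernel sum over the support vectors.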

Part 2: Software Familiarization [Optional – No Credit]

Do your own research and find out about library functions relevant to Support Vector Machines. Learn
how to use them. Compare them against your implementations and suggest some ideas for how you can
improve your code. Describe all this in your report.

Part 3: Applications [Optional – No Credit]

Do your own research and describe some interesting applications of Support Vector Machines.

Submission Guidelines

In your report, please include the names of all group members and mention their individual contributions.

The maximum number of the members in a team is 2. The report should be in PDF format. Your
submission should include the code as well as the report and is due before 04/16, 11:59pm in an archive
in zip, tar.gz or tar.xz format. Only one submission is required for each group by one of the group
members. Please submit your homework on D2L (do NOT email the homework to the instructor or the
TA).

DSCI 552: Programming Assignment 7 [Hidden Markov Models]

Part 1: Implementation [7 points]

Consider a variable x with domain {1, 2, 3, …, 10}. Let $v_t$ be the value of x at timestep t. $v_{t+1}$ is equal to
$v_t - 1$ or $v_t + 1$ with probability 0.5 each, except when $v_t = 1$ or $v_t = 10$, in which case $v_{t+1} = 2$ or
$v_{t+1} = 9$, respectively. At each timestep t, we also get noisy measurements of $v_t$. That is, $v_t - 1$, $v_t$ or
$v_t + 1$ can be returned with equal probabilities. Your task is to use a Hidden Markov Model to figure out the
most likely sequence of values $v_1, v_2, \ldots, v_{10}$ when the observation sequence is 8, 6, 4, 6, 5, 4, 5, 5, 7, 9.
At timestep t = 1, $v_1$ can be any value in {1, 2, 3, …, 10} with equal prior probabilities.

You can write your program in any programming language. However, you will have to implement the
algorithms yourself instead of using library functions. In your report, please provide a description of the
data structures you use, any code-level optimizations you perform, any challenges you face, and of
course, the requested outputs.
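
The most likely state sequence of an HMM is usually found with the Viterbi algorithm, sketched below for this model. One modeling assumption is made explicit in the code: a measurement of v - 1, v, or v + 1 is given probability 1/3 each, even at the ends of the domain where the reading would fall outside {1, …, 10}. Under that assumption several sequences may tie for the highest probability, so the sequence returned can depend on tie-breaking. The function names are hypothetical.

```python
def transition(i, j):
    """P(next = j | current = i) for the random walk on 1..10."""
    if i == 1:
        return 1.0 if j == 2 else 0.0
    if i == 10:
        return 1.0 if j == 9 else 0.0
    return 0.5 if abs(i - j) == 1 else 0.0


def emission(state, obs):
    # Assumption: state-1, state and state+1 are each reported with probability 1/3,
    # including readings that fall outside the domain at the boundaries.
    return 1.0 / 3.0 if abs(state - obs) <= 1 else 0.0


def viterbi(observations, states=range(1, 11)):
    """Return one most likely state sequence for the given observations."""
    states = list(states)
    # Uniform prior over the first state.
    prob = {s: emission(s, observations[0]) / len(states) for s in states}
    back = []
    for obs in observations[1:]:
        new_prob, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: prob[p] * transition(p, s))
            new_prob[s] = prob[prev] * transition(prev, s) * emission(s, obs)
            pointers[s] = prev
        prob = new_prob
        back.append(pointers)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: prob[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

For longer observation sequences the products would underflow, in which case summing log-probabilities is the usual remedy.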

Submission Guidelines

In your report, please include the names of all group members and mention their individual contributions.

The maximum number of the members in a team is 2. The report should be in PDF format. Your
submission should include the code as well as the report. It is due before 04/27, 11:59pm in an archive
in zip, tar.gz or tar.xz format.

Only one submission is required for each group by one of the group
members.

Please submit your homework on D2L (do NOT email the homework to the instructor or the
TA).