Description

5/5 - (1 vote)

CSE 537 Project 1

This assignment covers the topic of AI search. It will reinforce your understanding of this topic.
In the assignment you will write a program to find a solution to the game of Peg Solitaire. A
description of this board game follows:
The board consists of peg holders. A peg holder can either be empty or hold exactly one peg. In
Figure 1 below “○” and “●” correspond to an empty and a filled peg holder respectively. The
other blank spaces in the board that are neither empty or filled are unusable – these are the 2 x 2
squares on the four corners of the board. The objective of the game is to remove all except one
peg from the board. The rule for removing a peg is this: If X, Y and Z are three consecutive peg
holders with X and Y holding a peg each and Z is empty then the peg in X can jump over Y and
fill Z. In the process Y is removed from the board. The peg holders X and Y become empty and
Z now holds a peg. Note that only horizontal and vertical moves are allowed.
There are several variants of the game with differing board sizes and shapes. For this assignment
we will use the 7 x 7 board as shown in Figure 1 below with the 2 x 2 squares on the four
corners unused, i.e. they are not peg holders and hence unusable by our definition. In our game
the objective is to remove all the pegs in the board except one. The final peg should be placed in
the center peg holder in order to win – see Figure 1(b).
Figure 1 (a) Figure 1(b)
○ ○ ○
○ ○ ○
○ ○ ○ ○ ○ ○ ○
○ ○ ○ ● ○ ○ ○
○ ○ ○ ○ ○ ○ ○
○ ○ ○
○ ○ ○
○ ○ ○
○ ● ○
○ ○ ● ● ● ○ ○
○ ○ ○ ● ○ ○ ○
○ ○ ○ ● ○ ○ ○
○ ○ ○
○ ○ ○
The board’s configuration can be represented by a vector of string characters, one string per row.
The configuration in Figure 1(a) above will be represented as:
< – – 0 0 0 – -, – – 0 X 0 – – , 0 0 X X X 0 0, 0 0 0 X 0 0 0, 0 0 0 X 0 0 0, – – 0 0 0 – -, – – 0 0 0 – – >
0 and X denote an empty and filled peg holder respectively and (–) denotes an unusable peg
holder. In this assignment you can use your own configuration (test set) to test your algorithm.
Use an array of strings to represent the test set (the input of your function).
You should number the peg holders as shown in Figure 2 below:
Figure 2
You are required to implement AI search algorithms to solve the peg solitaire game as specified
above. Your program will take a given initial configuration of the board and will output a
sequence of valid moves that will result in a solution, if one exists. A valid move is one that
results in removing a peg (every move must remove a peg from the board). A move will be a
pair (X,Y) where the peg in peg holder numbered X is moved to the peg holder numbered Y. For
the initial board configuration in Figure 1(a) the following sequence of valid moves will result in
the configuration in Figure 1(b) which is also the solution to the game with Figure 1(a) as the
initial configuration: <(9,7),(23,9),(10,8),(7,9),(4,16)>. You will use two search strategies:
1. Iterative Deepening Search (uninformed)
2. A* with two kinds of heuristics (your choice), one more informed than the other
0 1 2
3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29
30 31 32
For each search strategy – IDS and A* with the two heuristics – your program should generate
the number of nodes expanded, memory usage and running time.
Extra points (up to a maximum of 25) will be given if your program uses tricks to prune searches
and not revisiting expanded nodes.
Notes:
1. Your program should be written in Python3.
2. For more information about peg solitaire see: h%p://en.wikipedia.org/wiki/Peg_solitaire
Submission:
Submit by 11:59 pm on due date via Black Board. Put all the documents in a folder and zip it.
The file name should be LastName_FirstName_HW1.zip.
• Source code with good documentation. Make sure it compiles.
• A trace of the execution for each search strategy
• Test set
• A report with the required statistics generated, heuristics used, prune method (if any) and
any other detail you want to mention.

CSE537 Project 2

Sudoku is a number placement puzzle. In this puzzle you are given an N x N grid of cells. The
grid itself is composed of M x K sub-grids. You can place a number, drawn from 1 to N, in any
cell. Initially the grid will have some of its cells partially filled. The objective of the puzzle is to
complete the grid so that:
1. Every cell contains a number.
2. No number appears twice in any row, column of the N x N grid or in any row, column of
any of the M x K sub-grid.
Figure 1 (a) below is a 12 x 12 Sudoku puzzle made up of 3 x 4 sub-grids and Figure 1 (b) is the
solution to this puzzle.

Figure 1(a) Figure 1(b)
Your assignment is to write a program to solve the general Sudoku puzzle using finite domain
CSP methods discussed in the class.
The input to your program will be a text file formatted as follows:
1. The first line will have three integers that fix the values for N, M and K in that order.
2. The next N lines describe the board configuration – one per row. An empty cell will be
denoted by _.
So N, M, K and the rows in Figure 1(a) will be represented like this:
12, 3, 4
_,11,_,_,_,_,_,_,_,4,_,_
7,_,_,2,6,_,_,3,5,_,11,_
. (Input format)
.
_,_,3_,_,_,_,_,_,_,12,_
You should implement
1. Backtracking + MRV heuristic
2. Backtracking + MRV + Forward Checking
3. Backtracking + MRV + Constraint Propagation
Given an input instance you will generate an output for each of the 3 implementations. Each
output will have the solution to the puzzle and the statistics about the solution. The first N lines
generated by the output should describe the solved puzzle with the format introduced before. The
last line should be the number of consistency checks done to arrive at the solution for each
implementation.
You can find random sudoku examples from the web to test your program. Based on your test
file analyze the pros and cons of these 3 implementations and write your findings in a report.
Make sure that your input file follows the input format.
Notes:
1. Your program should be written in Python3
2. There are numerous sources on the Web for the Sudoku puzzle. You are encouraged to
consult them to gain a good understanding of the puzzle.
Submission:
Submit by 11:59 pm on due date via Black Board. Put all the documents in a folder and zip it.
The file name should be LastName_FirstName_HW2.zip.
• Source code with good documentation. Make sure it compiles.
• A trace of the execution for each strategy
• A report with the required statistics generated and any findings you discovered.

CSE 537 Project 03: Machine Learning

1. Clickstream Mining with Decision Trees [50 points]
The project is based on a task posed in KDD Cup 2000. It involves mining click-stream data
collected from Gazelle.com, which sells legware products. Your task is to determine: Given a set
of page views, will the visitor view another page on the site or will he leave?
The data set given to you is in a different form than the original. In particular it has discretized
numerical values obtained by partitioning them into 5 intervals of equal frequency. This way, we
get a data set where for each feature, we have a finite set of values. These values are mapped to
integers, so that the data is easier for you to handle
Dataset
The data set can be found on Blackboard.
You have 5 files in .csv format
1. trainfeat.csv: Contains 40000 examples, each with 274 features in the form of a
40000 x 274 matrix.
2. trainlabs.csv: Contains the labels (class) for each training example (did the visitor
view another page?)
3. testfeat.csv: Contains 25000 examples, each with 274 features in the form of a
25000 x 274 matrix.
4. testlabs.csv: Contains the labels (class) for each testing example.
5. featnames.csv: Contains the “names” of the features. These might useful if you
need to know what the features represent.
The format of the files is not complicated, just rows of integers separated by empty spaces.
Stopping Criterion: Chi-squared criterion
What about split stopping? Suppose that the attribute that we want to split on is irrelevant. Then
we would expect the positive and negative examples (labels 1 and 0) to be distributed according
to a specific distribution. Suppose that splitting on attribute T, will produce sets {T_i} where
i = 1 to m
Let p, n denote the number of positive and negative examples that we have in our dataset (not the
whole set, the remaining one that we work on at this node). Let (N is the total number of
examples in the current dataset):
be the expected number of positives and negatives in each partition, if the attribute is irrelevant
to the class. Then the statistic of interest is:
where p_i, n_i are the positives and negatives in partition T_i. The main idea is that S is
distributed (if the class is irrelevant) according to a chi-squared distribution with m-1 degrees of
freedom.
Now we must compute the p-value. This is the probability of observing a value X at least as
extreme as S coming from the distribution. To find that, we compute P(X >= S). The test is
passed if the p-value is smaller than some threshold. Remember, the smallest that probability is,
the more unlikely it is that S comes from a chi-squared distribution and consequently, that T is
irrelevant to the class.
Your Task:
Code
Implement the ID3 decision tree learner on Python. Your program should use the chi-squared
split stopping criterion with the p-value threshold given as a parameter. Use your implementation
with the threshold for the criterion set to 0.05, 0.01 and 1. Remember, 1 corresponds to growing
the full tree.
Report
1. For each value of threshold, what is your tree’s accuracy and size (size equals number of
internal nodes and leaves)? What do you observe? If all your accuracies are low, tell us what you
have tried to improve the accuracies and what you suspect is failing.
2. Explain which options work well and why?
Your submission file should be named ‘q1_classifier.py’ and it should run as follows:
Usage: python q1_classifier.py -p -f1 -f2 -o
Where:
• and are .csv files
• is a csv file containing the predicted labels for the test dataset (same
format as the original testlabs.csv)
• is the p-value threshold as described above
Note:
1. You may find the function scipy.stats.chisquare helpful
2. You should implement the ID3 algorithm yourself
3. You may use libraries such as pandas for parsing the data. You can also use standard
libraries such as scipy and numpy
2. Spam Filter [50 points]
The dataset we will be using is a subset of 2005 TREC Public Spam Corpus. It contains a
training set and a test set. Both files use the same format: each line represents the
space-delimited properties of an email, with the first one being the email ID, the second one
being whether it is a spam or ham (non-spam), and the rest are words and their occurrence
numbers in this email. In preprocessing, non-word characters have been removed, and features
selected similar to what Mehran Sahami did in his original paper using Naive Bayes to classify
spams.
Dataset
The data set can be found on Blackboard.
Your Task:
Code
Implement the Naive Bayes algorithm to classify spam.
Report
Use your algorithm to learn from the training set and report accuracy on the test set. Try various
smoothing parameters for the Naive Bayes learner.
Which parameters work best?
Extra Credit (10 points)
Features selected makes learning much easier, but it also throws out useful information. For
example, exclamation mark (!) often occurs in spams. Even the format of email sender matters:
in the case when an email address appears in the address book, a typical email client will replace
it with the contact name, which means that the email is unlikely to be a spam (unless, of course,
you are a friend of the spammer!). Sahami’s paper talked about a few such features he had used
in his classifier. For extra credit, you can play with the original files and come up with useful
features that improve your classifier. Index.zip (on Blackboard) contains the list of the files
used in train/test.
Your submission file should be named ‘q2_classifier.py’ and it should run as follows:
Usage: python q2_classifier.py -f1 -f2
-o
Where:
• and contain space delimited properties of
an email
• is a csv file containing the predicted labels for the test dataset
What to Submit: You should submit a zip file containing:
• Source code q1_classifer.py, q2_classifier.py (Python 3.x).
• A Report with the required explanations for each problem.
All Project submissions must be made through Blackboard.
Academic Dishonesty: We will be checking your code against other submissions in the class for
logical redundancy. If you copy someone else’s code and submit it with minor changes, we will
know. These cheat detectors are quite hard to fool, so please don’t try. We trust you all to submit
your own work only; please don’t let us down. If you do, we will pursue the strongest
consequences available to us.
Getting Help: Feel free to use the Piazza discussion board to discuss or get clarifications on
homework-related issues.

CSE 537 Projects 1 to 3 solutions

Download Details:

Description

CSE 537 Project 1

CSE537 Project 2

CSE 537 Project 03: Machine Learning

CSE 537 Projects 1 to 3 solutions

Download Details:

Description

CSE 537 Project 1

CSE537 Project 2

CSE 537 Project 03: Machine Learning

Related products

CSE 537 Project 03: Machine Learning solution

CSE 537 Project 1 solution

CSE537 Project 2 solution