Description

5/5 - (1 vote)

Program Goals

Get familiar with Python syntax
Get familiar with a Python development/execution environment
Implement a basic AI algorithm

Be advised:

Some (many? most?) of you will never have used Python before, so we’re going to keep the first assignment a bit conceptually boring and hold your hand a little bit more than is typical in a 500-level course. We’ll expect more of you pretty quickly, though.

We’ve created a resourceLinks to an external site. Download We’ve created a resourceLinks to an external site. for those of you transitioning from Java to Python, check it out!

Summary

This project includes two separate implementations:

State space generation for the water jug puzzle (code in a file called p1_statespace.py)
Weather prediction from historical data (code in a file called p1_weather.py)

We’ll walk you through the concepts and provide quite a bit of scaffolding for each of the implementations, since you will be expected to complete them both in Python, which many of you do not know (or haven’t used in a while).

Part 1: State Space Generation

The code for this portion should be written in a file called p1_statespace.py.

You began generating this state space manually in class on January 23 — the problem of the two water jugs of fixed integer size. We’ll be using this as an excuse to practice using Python lists.

For the purposes of this assignment, we will be representing each state as a list of two ints. Please also maintain a separate list of two ints with the respective capacities of the jugs.

There are three operations you can perform on a jug:

Fill jug to capacity from faucet
Empty jug onto ground
Pour into another jug until that jug is full or this jug is empty

For this component of implementation, please implement four (4) Python functions:

fill(state, max, which) — return a copy of state which fills the jug corresponding to the index in which (0 or 1) to its maximum capacity. Do not modify state.
empty(state, max, which) — return a copy of state which empties the jug corresponding to the index in which (0 or 1). Do not modify state.
xfer(state, max, source, dest) — return a copy of state which pours the contents of the jug at index source into the jug at index dest, until source is empty or dest is full. Do not modify state.
succ(state, max) — display the list of unique successor states of the current state in any order.

Each of these functions should take as an argument the current state of the two water jugs as a list of two ints, as well as the maximum capacities of each jug as a list of two ints.

These functions should work as follows (though be sure to test them more thoroughly yourself):

>>> max = [5,7]
>>> s0 = [0,0]
>>> fill(s0,max,1)
=> [0,7]
>>> s0
=> [0,0]
>>> s1 = fill(s0,max,1)
>>> xfer(s1,max,1,0)
=> [5,2]
>>> succ(s0,max)
[0,0]
[5,0]
[0,7]

You may assume our states and capacities will be valid, non-negative integers presented in a valid, two-element list, and indexes will be valid. Error-check if you want the practice, but it will not be part of your score.

Be sure to test your functions with capacities other than 5 and 7! We will 🙂

In the event that a function does not modify the state (e.g. you fill a jug which is already at capacity or empty a jug which is already empty), you should still return a list whose contents are identical to the original state.

>>> s0 == empty(s0,max,1)
=> True

Part 2: Weather Prediction

The code for this portion should be written in a file called p1_weather.py.

This implementation will require some basic math, file I/O, and use of some built-in Python data structures, and is a little more involved than the previous one.

You’ll be using a version of the k-Nearest Neighbors algorithmLinks to an external site. (we’ll look at it in much more depth later this semester) to predict whether we expect it to be raining in Seattle based on various weather conditions.

For this component of implementation, please implement four (4) Python functions:

euclidean_distance(data_point1, data_point2) – return the Euclidean distance between two dictionary data points from the data set.
read_dataset(filename) – return a list of data point dictionaries read from the specified file.
majority_vote(nearest_neighbors) – return a prediction of whether it is raining or not based on a majority vote of the list of neighbors.
k_nearest_neighbors(filename, test_point, k) – using the above functions, return the majority vote prediction for whether it’s raining or not on the provided test point.

1. Euclidean distance

This is what you probably think of as a “normal” distance function:

$d = \sqrt{{(x_{1} - x_{2})}^{2} + {(y_{1} - y_{2})}^{2}}$

except in this case we’re taking the distance between points in three dimensional space, where those dimensions are the precipitation amount, maximum temperature, and minimum temperature for the day. (You can ignore the date when doing this calculation.)

>>> euclidean_distance({'DATE': '1951-05-19', 'TMAX': 66.0, 'PRCP': 0.0, 'TMIN': 43.0, 'RAIN': 'FALSE'},{'DATE': '1951-01-27', 'TMAX': 33.0, 'PRCP': 0.0, 'TMIN': 19.0, 'RAIN': 'FALSE'}) 
=> 40.80441152620633
>>> euclidean_distance({'DATE': '2015-08-12', 'TMAX': 83.0, 'PRCP': 0.3, 'TMIN': 62.0, 'RAIN': 'TRUE'}, {'DATE': '2014-05-19', 'TMAX': 70.0, 'PRCP': 0.0, 'TMIN': 50.0, 'RAIN': 'FALSE'})
=> 17.694349380522585

2. Read dataset

Okay, so why do the data points in the previous example code look like that?

Because you’ll be reading them in from rain.txtLinks to an external site. Download rain.txtLinks to an external site..

Each line of the file looks something like this:

1948-01-01 0.47 51 42 TRUE

where the first entry is the DATE, the second entry is the PRCP (precipitation), the third entry is the TMAX (maximum temperature), the fourth entry is the TMIN (minimum temperature) and the last entry is a boolean representing RAIN (what we’ll be predicting for new data points).

In this function, you must read the file in line by line, split each line out with its spaces, and create a dictionary with the keys listed above and the values on the line, where the numeric values have been converted to floats. The function should return a list with one dictionary for each line in the file.

The sample line provided above, for example, should produce the following dictionary:

{'DATE': '1948-01-01', 'TMAX': 51.0, 'PRCP': 0.47, 'TMIN': 42.0, 'RAIN': 'TRUE'}

We won’t show you the whole result of testing this function, but here are some things you can try:

>>> dataset = read_dataset('rain.txt')
>>> len(dataset)
=> 25548
>>> dataset[0]
=> {'DATE': '1948-01-01', 'TMAX': 51.0, 'PRCP': 0.47, 'TMIN': 42.0, 'RAIN': 'TRUE'}

Again, here’s the file for you to read in: rain.txtLinks to an external site. Download rain.txtLinks to an external site.

3. Majority vote

Very simply, a k-Nearest Neighbor (or kNN) algorithm looks at the classes assigned to the k points closest to the query point, and says that the query point has the same label as most of its neighbors (in this case, whether it was raining or not). This function takes in a list of those neighbors, and returns the string representing whether it’s raining or not. (Note: this function does not return a boolean! Booleans in Python look like True or False. We want all-caps.)

If a tie occurs, default to ‘TRUE’ as your answer. This is Seattle, after all.

>>> majority_vote([{'DATE': '2015-08-12', 'TMAX': 83.0, 'PRCP': 0.3, 'TMIN': 62.0, 'RAIN': 'TRUE'},
{'DATE': '2014-05-19', 'TMAX': 70.0, 'PRCP': 0.0, 'TMIN': 50.0, 'RAIN': 'FALSE'},
{'DATE': '2014-12-05', 'TMAX': 55.0, 'PRCP': 0.12, 'TMIN': 44.0, 'RAIN': 'TRUE'},
{'DATE': '1954-09-08', 'TMAX': 71.0, 'PRCP': 0.02, 'TMIN': 55.0, 'RAIN': 'TRUE'},
{'DATE': '2014-08-27', 'TMAX': 84.0, 'PRCP': 0.0, 'TMIN': 61.0, 'RAIN': 'FALSE'}])
=> 'TRUE'

4. k-Nearest Neighbors

In this function, you’ll be given the path to a correctly-formatted file (to read using your function), a test point dictionary that contains all of the same keys (except the label ‘RAIN’), and the value of k indicating how many neighbors to select.

Find the closest k neighbors using your Euclidean distance, and return their majority vote on whether it’s raining or not in the test point.

>>> k_nearest_neighbors('rain.txt', {'DATE': '1948-01-01', 'TMAX': 51.0, 'PRCP': 0.47, 'TMIN': 42.0}, 2)
=> 'TRUE'
>>> k_nearest_neighbors('rain.txt', {'DATE': '1948-01-01', 'TMAX': 51.0, 'PRCP': 0.00, 'TMIN': 42.0}, 2)
=> 'FALSE'
>>> k_nearest_neighbors('rain.txt', {'DATE': '1948-01-01', 'TMAX': 51.0, 'PRCP': 0.00, 'TMIN': 42.0}, 10)
=> 'FALSE'
>>> k_nearest_neighbors('rain.txt', {'DATE': '1948-01-01', 'TMAX': 51.0, 'PRCP': 0.05, 'TMIN': 42.0}, 10)
=> 'TRUE'

You may notice that you could take the ‘PRCP’ element, check whether it’s nonzero, and simply make your prediction on that. Very good: you’re right! This is actually a pretty trivial classification task. In practice, it’s sometimes better (and more accurate) to be less clever with your implementation. For the purposes of this assignment, though, pretend you’re an absent-minded data scientist or just trying to be Very Impressive. 😉

Submission

Please submit both p1_statespace.py and p1_weather.py at the same time so the TAs can grade you easily. You may submit more than once; unless you specify otherwise, we will grade your last submission.

CS540 P1: Introduction to Python solved

Download Details:

Description

Program Goals

Be advised:

Summary

Part 1: State Space Generation

Part 2: Weather Prediction

1. Euclidean distance

2. Read dataset

3. Majority vote

4. k-Nearest Neighbors

Submission

CS540 P1: Introduction to Python solved

Download Details:

Description

Program Goals

Be advised:

Summary

Part 1: State Space Generation

Part 2: Weather Prediction

1. Euclidean distance

2. Read dataset

3. Majority vote

4. k-Nearest Neighbors

Submission

Related products

CS540 Homework 9 solution

CS 540 Homework Assignment # 3 solution

CS 540 Homework Assignment # 5 solution