CSE 578 Programming Assignments 1 to 6 solutions


CSE 578 Programming Assignment 1 Dino Fun World

You, in your role as a burgeoning data explorer and visualizer, have been asked by the
administrators of a small amusement park in your hometown to answer a couple of questions about
their park operations. In order to perform the requested analysis, they have provided you with a
database containing information about one day of the park’s operations.
Provided Database
The database provided by the park administration is formatted to be readable by any SQL database
library. The course staff recommends the sqlite3 library. The database contains three tables, named
‘checkins’, ‘attractions’, and ‘sequences’. The information contained in each of these tables is listed
below:
checkins:
- Description: check-in data for all visitors for the day in the park. The data includes two types of
check-ins, inferred and actual check-ins.
- Fields: visitorID, timestamp, attraction, duration, type
attraction:
- Description: the attractions in the park by their corresponding AttractionID, Name, Region, Category, and type.
Regions are from the VAST Challenge map, such as Coaster Alley, Tundra Land, etc. Categories
include Thrill rides, Kiddie Rides, etc. Type is broken into Outdoor Coaster, Other Ride, Carousel,
etc.
- Fields: AttractionID, Name, Region, Category, type
sequences:
- Description: the check-in sequences of visitors. These sequences list the position of each visitor to the park
every five minutes. If the visitor has not entered the park yet, the sequence has a value of 0 for that
time interval. If the visitor is in the park, the sequence lists the attraction they have most recently
checked in to until they check in to a new one or leave the park.
- Fields: visitorID, sequence
The database file is named ‘dinofunworld.db’ and is available in the readonly directory of the Jupyter
Notebook environment (i.e. readonly/dinofunworld.db).
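As a minimal sketch of how the database might be opened and inspected with sqlite3, the snippet below lists the tables in a connection. So that it runs anywhere, it builds a small in-memory stand-in; the helper names (`list_tables`, `build_demo_db`) and the column types are assumptions for illustration, not part of the assignment.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def build_demo_db():
    """Create an empty in-memory stand-in with the three tables described above.

    The column types are guesses; the real file may declare them differently.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE checkins (visitorID INTEGER, timestamp TEXT,
                               attraction INTEGER, duration TEXT, type TEXT);
        CREATE TABLE attractions (AttractionID INTEGER, Name TEXT, Region TEXT,
                                  Category TEXT, type TEXT);
        CREATE TABLE sequences (visitorID INTEGER, sequence TEXT);
        """
    )
    return conn


# In the notebook environment one would connect to the provided file instead:
#   conn = sqlite3.connect("readonly/dinofunworld.db")
demo_conn = build_demo_db()
table_names = list_tables(demo_conn)
print(table_names)
```

Listing the tables first is a cheap way to confirm the exact table names before writing queries against them.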
Assignment
The administrators would like you to answer four relatively simple questions about the park activities
on the day in question. These questions all deal with park operations and can be answered using the
data provided.
Question 1: What is the most popular attraction to visit in the park?
Question 2: What ride (note that not all attractions are rides) has the longest visit time?
Question 3: Which Fast Food offering has the fewest visitors?
Question 4: Compute the Skyline of number of visits and visit time for the park’s rides and report the
rides that appear in the Skyline.
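The Skyline in Question 4 can be read as the set of rides that no other ride beats on both measures at once. The sketch below assumes that larger values are better for both number of visits and visit time, and that one ride dominates another when it is at least as large on both and strictly larger on one. The ride names and numbers are hypothetical.

```python
def skyline(rides):
    """Return the names of rides that are not dominated by any other ride.

    `rides` maps a ride name to a (visits, visit_time) pair.
    """

    def dominates(a, b):
        # a dominates b: no worse on both measures, strictly better on one.
        return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

    return sorted(
        name
        for name, point in rides.items()
        if not any(dominates(other, point) for other in rides.values())
    )


# Hypothetical rides: (number of visits, average visit time in minutes).
demo_rides = {
    "Coaster": (120, 4.0),
    "Wheel": (90, 7.5),
    "Swing": (80, 6.0),
    "Drop": (50, 9.0),
}
demo_skyline = skyline(demo_rides)
print(demo_skyline)
```

Here "Swing" drops out because "Wheel" has both more visits and a longer visit time; the other three each win on at least one measure against every competitor.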
Administrative Notes
This assignment will be graded by Coursera’s grading system. In order for your answers to be
correctly registered in the system, you must place the code for your answers in the cell indicated for
each question. In addition, you should submit the assignment with the output of the code in the cell’s
display area. The display area should contain only your answer to the question with no extraneous
information, or else the answer may not be picked up correctly. Each cell that is going to be graded
has a set of comment lines at the beginning of the cell. These lines are extremely important and
must not be modified or removed. (The Graded Cell and PartID comments must be on the same line for
proper execution of the code.)
A correct submission would result in feedback such as: “Correct!”
An incorrect submission would look like: “Incorrect Response!”

CSE 578 Programming Assignment 2 Graphing Dino Fun World

Impressed by your previous work, the administrators of Dino Fun World have asked you to create
some charts that they can use in their next presentation to upper management. The data used for
this assignment will be the same as the data used for the previous assignment.
Provided Database
The database provided by the park administration is formatted to be readable by any SQL database
library. The course staff recommends the sqlite3 library. The database contains three tables, named
‘checkins’, ‘attractions’, and ‘sequences’. The information contained in each of these tables is listed
below:
checkins:
- Description: check-in data for all visitors for the day in the park. The data includes two types of
check-ins, inferred and actual check-ins.
- Fields: visitorID, timestamp, attraction, duration, type
attraction:
- Description: the attractions in the park by their corresponding AttractionID, Name, Region, Category, and type.
Regions are from the VAST Challenge map, such as Coaster Alley, Tundra Land, etc. Categories
include Thrill rides, Kiddie Rides, etc. Type is broken into Outdoor Coaster, Other Ride, Carousel,
etc.
- Fields: AttractionID, Name, Region, Category, type
sequences:
- Description: the check-in sequences of visitors. These sequences list the position of each visitor to the park
every five minutes. If the visitor has not entered the park yet, the sequence has a value of 0 for that
time interval. If the visitor is in the park, the sequence lists the attraction they have most recently
checked in to until they check in to a new one or leave the park.
- Fields: visitorID, sequence
The database file is named ‘dinofunworld.db’ and is available in the readonly directory of the Jupyter
Notebook environment (i.e. readonly/dinofunworld.db).
Assignment
The administrators would like you to create four graphs: a pie chart, a bar chart, a line chart, and a
box­and­whisker plot. All of these plots can be created with the data provided.
Chart 1: A Pie Chart depicting visits to thrill ride attractions.
Chart 2: A Bar Chart depicting total visits to food stalls.
Chart 3: A Line Chart depicting attendance at the newest ride, Atmosfear, over the course of the day.
Chart 4: A Box-and-Whisker Plot depicting total visits to the park’s Kiddie Rides.
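For the line chart, the attendance at one attraction over the day can be derived from the sequences table by counting, for each five-minute interval, how many visitors are listed at that attraction. The sketch below assumes each sequence is stored as a string of attraction IDs separated by "-", one entry per interval; that storage format, the attraction ID 8, and the sample sequences are all assumptions for illustration.

```python
def attendance_series(sequences, attraction_id):
    """Count visitors at `attraction_id` in each five-minute interval.

    `sequences` is an iterable of strings such as "0-8-8-3", where each token
    is the attraction a visitor is at during that interval (0 = not in park).
    """
    parsed = [[int(token) for token in seq.split("-")] for seq in sequences]
    length = max((len(steps) for steps in parsed), default=0)
    counts = [0] * length
    for steps in parsed:
        for interval, current in enumerate(steps):
            if current == attraction_id:
                counts[interval] += 1
    return counts


# Three hypothetical visitors over four intervals; 8 is a made-up ride ID.
demo_sequences = ["0-8-8-3", "8-8-0-0", "0-0-8-8"]
demo_attendance = attendance_series(demo_sequences, 8)
print(demo_attendance)
```

The resulting list has one value per interval, so it can be passed directly as the y-values of a line chart in whichever plotting library the notebook provides.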
Administrative Notes
This assignment will be graded by Coursera’s grading system. In order for your answers to be
correctly registered in the system, you must place the code for your answers in the cell indicated for
each question. In addition, you should submit the assignment with the output of the code in the cell’s
display area. The display area should contain only your answer to the question with no extraneous
information, or else the answer may not be picked up correctly. Each cell that is going to be graded
has a set of comment lines at the beginning of the cell. These lines are extremely important and
must not be modified or removed.
A correct submission would result in feedback such as: “Correct!”
An incorrect submission would look like: “Incorrect Response!”

CSE 578 Programming Assignment 3 Dino Fun World Analysis

The administrators of Dino Fun World, a local amusement park, have asked you, one of their data
analysts, to perform three data analysis tasks for their park. These tasks will involve understanding,
analyzing, and graphing attendance data for three days of the park’s operations that the park has
provided for you to use. They have provided the data in the form of a database, described below.
Provided Database
The database provided by the park administration is formatted to be readable by any SQL database
library. The course staff recommends the sqlite3 library. The database contains three tables, named
‘checkins’, ‘attractions’, and ‘sequences’. The information contained in each of these tables is listed
below:
checkin:
- Description: check-in data for all visitors for the day in the park. The data includes two types of
check-ins, inferred and actual check-ins.
- Fields: visitorID, timestamp, attraction, duration, type
attraction:
- Description: the attractions in the park by their corresponding AttractionID, Name, Region, Category, and type.
Regions are from the VAST Challenge map, such as Coaster Alley, Tundra Land, etc. Categories
include Thrill rides, Kiddie Rides, etc. Type is broken into Outdoor Coaster, Other Ride, Carousel,
etc.
- Fields: AttractionID, Name, Region, Category, type
sequences:
- Description: the check-in sequences of visitors. These sequences list the position of each visitor to the park
every five minutes. If the visitor has not entered the park yet, the sequence has a value of 0 for that
time interval. If the visitor is in the park, the sequence lists the attraction they have most recently
checked in to until they check in to a new one or leave the park.
- Fields: visitorID, sequence
The database is named ‘dinofunworld.db’ and is in the readonly directory of the Jupyter Notebook
environment.
Assignment
1: The park’s administrators would like you to help them understand the different paths visitors take
through the park and the different rides they visit. For this task, they have selected 5 visitors at random
whose check-in sequences they would like you to analyze. For now, they would like you to construct
a distance matrix for these 5 visitors. The five visitors have the IDs: 165316, 1835254, 296394,
404385, and 448990.
2: The park’s administrators would like to understand the attendance dynamics at each ride (note
that not all attractions are rides). They would like to see the minimum (non-zero) attendance at each
ride, the average attendance over the whole day, and the maximum attendance for each ride on a
Parallel Coordinate Plot (PCP).
3: In addition to a PCP, the administrators would like to see a Scatterplot Matrix depicting the minimum,
average, and maximum attendance for each ride as above.
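For the distance matrix in task 1, the text above does not name a metric. The sketch below uses one simple choice, counting the positions at which two equal-length sequences differ; this metric is an assumption, and the short sequences attached to the visitor IDs are made up for illustration.

```python
def sequence_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(sequences):
    """Build a symmetric matrix of pairwise distances.

    `sequences` maps a visitor ID to a list of attraction IDs; rows and
    columns follow the insertion order of the mapping.
    """
    ids = list(sequences)
    return [
        [sequence_distance(sequences[i], sequences[j]) for j in ids]
        for i in ids
    ]


# Hypothetical, very short sequences for three of the listed visitor IDs.
demo_visitors = {
    165316: [0, 1, 1, 2],
    1835254: [0, 1, 3, 2],
    296394: [4, 4, 3, 2],
}
demo_matrix = distance_matrix(demo_visitors)
for row in demo_matrix:
    print(row)
```

The matrix is symmetric with zeros on the diagonal, which is a quick sanity check whatever metric is finally used.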
Administrative Notes
This assignment will be graded by Coursera’s grading system. In order for your answers to be
correctly registered in the system, you must place the code for your answers in the cell indicated for
each question. In addition, you should submit the assignment with the output of the code in the cell’s
display area. The display area should contain only your answer to the question with no extraneous
information or else the answer may not be picked up correctly. Each cell that is going to be graded
has a set of comment lines at the beginning of the cell. These lines are extremely important and
must not be modified or removed.
A correct submission would result in feedback such as: “Correct!”
An incorrect submission would look like: “Incorrect Response!”

CSE 578 Programming Assignment 4 Dino Fun World Time Series Analysis

The administrators of Dino Fun World, a local amusement park, have asked you, one of their data
analysts, to perform three data analysis tasks for their park. These tasks will involve understanding,
analyzing, and graphing attendance data that the park has provided for you to use. They have
provided the data in the form of a database, described below.
Provided Database
The database provided by the park administration is formatted to be readable by any SQL database
library. The course staff recommends the sqlite3 library. The database contains three tables, named
‘checkins’, ‘attractions’, and ‘sequences’. The information contained in each of these tables is listed
below:
`checkin`:
- Description: check-in data for all visitors for the day in the park. The data includes two types of
check-ins, inferred and actual check-ins.
- Fields: visitorID, timestamp, attraction, duration, type
`attraction`:
- Description: the attractions in the park by their corresponding AttractionID, Name, Region, Category, and type.
Regions are from the VAST Challenge map, such as Coaster Alley, Tundra Land, etc. Categories
include Thrill rides, Kiddie Rides, etc. Type is broken into Outdoor Coaster, Other Ride, Carousel,
etc.
- Fields: AttractionID, Name, Region, Category, type
`sequences`:
- Description: the check-in sequences of visitors. These sequences list the position of each visitor to the park
every five minutes. If the visitor has not entered the park yet, the sequence has a value of 0 for that
time interval. If the visitor is in the park, the sequence lists the attraction they have most recently
checked in to until they check in to a new one or leave the park.
- Fields: visitorID, sequence
The database is named ‘dinofunworld.db’ and is available in the readonly directory of the Jupyter
Notebook environment (i.e. readonly/dinofunworld.db).
Assignment
1: The park’s administrators are worried about the attendance at the ride ‘Atmosfear’ in the data
window. To assuage their fears, they have asked you to create a control chart of the total attendance
at this ride. Using the data provided, create a control chart displaying the attendance, the mean, and
the standard deviation bands at one and two standard deviations.
2: Some of the park’s administrators are having trouble interpreting the control chart graph of
‘Atmosfear’ attendance, so they ask you to also provide a moving average chart of the attendance in
addition to the control chart created in the previous question. In this case, they request that you use
50 samples for the size of the moving average window.
3: In order to have options concerning the graphs presented, the park’s administrators also ask you
to provide a 50-sample moving average window with the average computed with exponential
weighting (i.e. an exponentially weighted moving average) over the same ‘Atmosfear’ attendance data.
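The three charts above rest on a few simple computations over the attendance series. The sketch below shows one way to compute them in plain Python. It assumes the population standard deviation for the bands, a trailing window for the moving average, and the recursive form of the exponentially weighted average with smoothing factor 2 / (span + 1); libraries may use slightly different conventions, and the short series is made up.

```python
from statistics import mean, pstdev


def control_limits(series):
    """Mean and the one- and two-standard-deviation bands of a series."""
    mu = mean(series)
    sigma = pstdev(series)  # population standard deviation (an assumption)
    return {
        "mean": mu,
        "upper1": mu + sigma,
        "lower1": mu - sigma,
        "upper2": mu + 2 * sigma,
        "lower2": mu - 2 * sigma,
    }


def moving_average(series, window):
    """Trailing simple moving average; None until a full window is available."""
    averages = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the value leaving the window
        averages.append(running / window if i >= window - 1 else None)
    return averages


def ewma(series, span):
    """Exponentially weighted moving average in its simple recursive form."""
    alpha = 2 / (span + 1)
    smoothed = []
    for value in series:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical attendance counts; the assignment uses a window/span of 50.
demo_series = [2, 4, 6, 8]
demo_limits = control_limits(demo_series)
demo_ma = moving_average(demo_series, 2)
demo_ewma = ewma(demo_series, 3)
print(demo_limits["mean"], demo_ma, demo_ewma)
```

With the real data, the same functions would be called with a window and span of 50, and the returned lists plotted against the interval index.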
Administrative Notes
This assignment will be graded by Coursera’s grading system. In order for your answers to be
correctly registered in the system, you must place the code for your answers in the cell indicated for
each question. In addition, you should submit the assignment with the output of the code in the cell’s
display area. The display area should contain only your answer to the question with no extraneous
information, or else the answer may not be picked up correctly. Each cell that is going to be graded
has a set of comment lines at the beginning of the cell. These lines are extremely important and
must not be modified or removed.
A correct submission would result in feedback such as: “Correct!”
An incorrect submission would look like: “Incorrect Response!”

CSE 578 Programming Assignment 5 Geographic Data Analysis

In this assignment, you will be using a dataset of geographic data provided for you in the PySAL library to
create two plots, a choropleth map and a proportional symbol map. In addition to these two plots,
you will compute the value of Moran’s I for this data.
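Moran’s I measures spatial autocorrelation: with n regions, values x, mean x̄, and spatial weights w, it is (n / sum of all w) times the weighted sum of cross-products of deviations, divided by the sum of squared deviations. The sketch below computes it directly from that definition in plain Python rather than through PySAL; the four-region chain and its binary adjacency weights are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for `values` given a square spatial weights matrix.

    weights[i][j] is the weight between regions i and j (0 on the diagonal).
    """
    n = len(values)
    x_bar = sum(values) / n
    deviations = [v - x_bar for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / total_weight) * (cross_products / squared_deviations)


# Four hypothetical regions in a row; neighbours share a weight of 1.
demo_values = [1, 2, 3, 4]
demo_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
demo_i = morans_i(demo_values, demo_weights)
print(round(demo_i, 4))
```

A positive result such as this one indicates that neighbouring regions tend to have similar values, which matches the steadily increasing demo data.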
Dataset
The dataset covers the lower 48 states of the United States. In addition to the state-by-state data, the dataset contains shape
files for each state that you can use to create the choropleth and proportional symbol maps.
Administrative Notes
Your assignment will be graded by Coursera’s grading system. In order for your answers to be
correctly registered in the system, you must place the code for your answers in the cell indicated for
each question. In addition, you should submit the assignment with the output of the code in the cell’s
display area. The display area should contain no extraneous information. Each cell that is going to be
graded has a set of comment lines at the beginning of the cell. These lines are extremely important and
must not be modified or removed.
A correct submission would result in feedback such as: “Correct!”
An incorrect submission would look like: “Incorrect Response!”

CSE 578 Programming Assignment 6 Hierarchical Clustering

As in your previous assignments, the administrators of the Dino Fun World theme park have asked you
to perform data analysis in order to help them administer the park. In this case, your task builds on an earlier one.
Previously, you were asked to find the distance between a set of visitor trajectories using a simple
edit distance algorithm and report the distances. For this task, you must construct and display a
dendrogram of those distances. Again, the administrators of the park have provided a database, described below.
Provided Database
The database provided by the park administration is formatted to be readable by any SQL database
library. The course staff recommends the sqlite3 library. The database contains three tables, named
‘checkins’, ‘attractions’, and ‘sequences’. The information contained in each of these tables is listed
below:
`checkin`:
- Description: check-in data for all visitors for the day in the park. The data includes two types of
check-ins, inferred and actual check-ins.
- Fields: visitorID, timestamp, attraction, duration, type
`attraction`:
- Description: the attractions in the park by their respective AttractionID, Name, Region, Category, and type.
Regions are from the VAST Challenge map, such as Coaster Alley, Tundra Land, etc. Categories
include Thrill rides, Kiddie Rides, etc. Type is broken into Outdoor Coaster, Other Ride, Carousel,
etc.
- Fields: AttractionID, Name, Region, Category, type
`sequences`:
- Description: the check-in sequences of visitors. These sequences list the position of each visitor to
the park every five minutes. If the visitor has not entered the park yet, the sequence has a value of 0 for that time
interval. If the visitor is in the park, the sequence lists the attraction they have most recently
checked in to until they check in to a new one or leave the park.
- Fields: visitorID, sequence
The database is named ‘dinofunworld.db’ and is available in the readonly directory of the Jupyter
Notebook environment (i.e. readonly/dinofunworld.db).
Assignment
This task consists of only one question, which will require you to generate a dendrogram graph.
Create this dendrogram using the trajectories of the visitors with the IDs: 165316, 1835254, 296394,
404385, and 448990. If you are unsure about how to create a dendrogram, please refer to the
Jupyter Notebook example that creates a dendrogram. When performing clustering over the
trajectories to inform the dendrogram, use an average distance over all points in the cluster.
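Using an average distance over all points in the cluster corresponds to what is usually called average linkage. The sketch below is a plain-Python illustration of that merging rule on a small precomputed distance matrix, returning the merges in the order a dendrogram would draw them. The three-visitor matrix is hypothetical, and in practice a library routine such as SciPy's `linkage` with `method="average"` followed by `dendrogram` would typically do this work.

```python
from itertools import combinations


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.

    `dist` is a symmetric matrix of pairwise distances and `labels` names
    each row. Returns a list of (cluster_a, cluster_b, distance) merges.
    """
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average of all pairwise distances between the two clusters.
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append(
            (
                sorted(labels[i] for i in clusters[a]),
                sorted(labels[i] for i in clusters[b]),
                d,
            )
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical distances between three of the listed visitors.
demo_labels = [165316, 1835254, 296394]
demo_dist = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
demo_merges = average_linkage(demo_dist, demo_labels)
for merge in demo_merges:
    print(merge)
```

Each merge distance becomes the height of the corresponding join in the dendrogram, so the first two visitors join at 1 and the third joins them at the average of 3 and 2.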
Administrative Notes
This assignment will be graded by Coursera’s grading system. In order for your answers to be
correctly registered in the system, you must place the code for your answers in the cell indicated for
each question. In addition, you should submit the assignment with the output of the code in the cell’s
display area. The display area should contain no extraneous information. Each cell that is going to be
graded has a set of comment lines at the beginning of the cell. These lines are extremely important
and must not be modified or removed.
A correct submission would result in feedback such as: “Correct!”
An incorrect submission would look like: “Incorrect Response!”