CSC 455 Assignment 1 to 4 solutions


CSC 455: Database Processing for Large-Scale Analytics Assignment 1

Part 1

Create a set of relational schemas with underlined (primary) keys and arrows connecting foreign keys and primary keys for a database containing the following information. If you have any difficulty drawing arrows, you can write foreign key information in a sentence instead.

  • Authors have Last Name, First Name, ID, and Birthdate (identified by ID)
  • Publishers have Name, ID, Address (identified by ID)
  • Books have ISBN, Title, Publisher (each book has exactly one publisher and is identified by ISBN).
  • Authors Write Books; since many authors can co-author a book, we need to know the rank of each author contributing to a book, stored in this table (i.e., a number 1, 2, 3; for single-author books, this number is 1).

NOTE: Part 2 has some sample data which may be helpful.

 

Part 2

 

  • Using your logical schema from Part 1, write the necessary SQL DDL script to create the tables. Be sure to specify every primary key and every foreign key. You may make reasonable assumptions about the attribute domains.

 

  • Write SQL INSERT statements to populate your database with the following data (NOTE: remember that strings need to be enclosed in single quotes, e.g., 'Asimov')

 

    • (King, Stephen, 2, September 9 1947)
    • (Asimov, Isaac, 4, January 2 1920)
    • (Verne, Jules, 7, February 8 1828)
    • (Rowling, Joanne, 37, July 31 1965)

 

    • (Bloomsbury Publishing, 17, London Borough of Camden)
    • (Arthur A. Levine Books, 18, New York City)

 

    • (1111-111, Databases from outer space, 17)
    • (2222-222, Dark SQL, 17)
    • (3333-333, The night of the living databases, 18)

 

    • (2, 1111-111, 1)
    • (4, 1111-111, 2)
    • (4, 2222-222, 2)
    • (7, 2222-222, 1)
    • (37, 3333-333, 1)
    • (2, 3333-333, 2)

 

  • Write a python function that is going to generate and return a SQL INSERT statement given a table name and a value list as parameters. For example, generateInsert('Students', ['1', 'Jane', 'A-']) should return "INSERT INTO Students VALUES (1, Jane, A-);". It would be even better, though not required, if your function returned the more proper "INSERT INTO Students VALUES (1, 'Jane', 'A-');" (i.e., put quotes around strings, but not numbers).

Another example: generateInsert('Phones', ['42', '312-555-1212']) would produce "INSERT INTO Phones VALUES (42, 312-555-1212);". For simplicity, you can assume that every entry in the list of values is given as a string, even if it is a number.
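
A minimal sketch of one way such a function could look; the float() test for "does this look like a number" is just one possible heuristic, and the basic, non-quoting version is simply ', '.join(valueList):

def generateInsert(tableName, valueList):
    # Quote entries that do not look like numbers; leave numeric strings unquoted.
    formatted = []
    for v in valueList:
        try:
            float(v)                       # numeric string, e.g. '1' or '42'
            formatted.append(v)
        except ValueError:                 # anything else gets single quotes
            formatted.append("'" + v + "'")
    return 'INSERT INTO ' + tableName + ' VALUES (' + ', '.join(formatted) + ');'

# generateInsert('Students', ['1', 'Jane', 'A-'])
# returns "INSERT INTO Students VALUES (1, 'Jane', 'A-');"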

 

 

You should submit your SQL and python code for this part (you may copy everything into your Word document submission, or submit it as separate files).

 

 

Part 3

 

Consider a MEETING table that records information about meetings between clients and executives in the company.  Each record contains the client's name and the executive's name, as well as the office number, floor, and building.  Finally, each record contains the city that the building is in and the date of the meeting.  The table is in First Normal Form and the primary key is (Client, Office).

(Date, Client, Office, Floor, Building, City, Executive)

 

You are given the following functional dependencies:

Building → City

Office → Floor, Building, City

Client → Executive

Client, Office → Date

 

  1. Remove any existing partial dependencies and convert the logical schema to the Second Normal Form.  Please remember that when performing schema decomposition you need to denote primary key for every new table as well as the foreign key that will allow us to reconstruct the original data.

 

  2. Remove any existing transitive dependencies to create a set of logical schemas in Third Normal Form.  Again, remember to denote primary keys and foreign keys (including which primary key those foreign keys point to).

 

 

Part 4

 

Consider a table that stores information about students: the student's name, GPA, honors status, and the credits the student has completed so far.

 

(First, Last, GPA, Honor, Credits)

 

You are given the following functional dependencies

 

First, Last → GPA, Honor, Credits

GPA → Honor

 

  1. Is this schema in Second Normal Form?  If not, please state which FDs violate 2NF and decompose the schema accordingly.

 

  2. Is this schema in Third Normal Form?  If not, please state which FDs violate 3NF and decompose the schema accordingly.

 

Be sure that your name and “Assignment 1” appear at the top of your submitted file.

CSC 455: Database Processing for Large-Scale Analytics Assignment 2

Supplemental reading:

  • SQL reference book: Oracle 11g SQL by Price, ISBN 9780071498500 (available as an eBook in Books 24×7, DePaul online library), Sections 2.1 – 2.3, 2.6, 2.9 – 2.15 (section names are included in the screenshot)
  • Python for Data Analysis, p. 174, "Interacting with Databases"
  • Chapter 6: Database Design Using Normalization in "Databases: A Beginner's Guide" by Andrew Oppel, ISBN 0071608478, available in the online library. Another good resource is in this book link.
Part 1
You are given the following schema in 1NF:

(License Number, Renewed, Status, Status Date, Driver Type, License Type, Original Issue Date, Name, Sex, Chauffeur City, Chauffeur State, Record Number)

and the following functional dependencies:

Chauffeur City → Chauffeur State (both of these are a single column, not two columns)

Record Number → License Number, Renewed, Status, Status Date, Driver Type, License Type, Original Issue Date, Name, Sex, Chauffeur City, Chauffeur State

The table is based on a real data set originally taken from the City of Chicago data portal (located here: https://data.cityofchicago.org/Community-Economic-Development/Public-Chauffeurs/97wa-y6ff). However, the data has been cleaned and reduced to approximately one thousand rows. We will revisit a non-clean version of that data later.
Decompose the schema to make sure it is in Third Normal Form (3NF).
Write the SQL DDL to create your 3NF tables. Remember to declare primary and foreign keys as necessary in your SQL code.
Part 2
Write a python script that is going to create your tables from Part 1 in SQLite and populate them with data automatically. The data file is posted in the Assignment 2 dropbox folder on D2L as Public_Chauffeurs_Short.csv.

Use a sqlite3 database as shown in class, but remember to make data type changes to your tables from Part 1 (i.e., NUMBER(5,0) → INTEGER, NUMBER(5,2) → REAL). SQLite is very forgiving regarding data types, but most databases are not.

I have some sample code that connects to a SQLite database, loads comma-separated student data and prints the contents of the loaded table. You can find it in the Assignment 2 dropbox folder as loadStudentData.py, along with a Students.txt file that goes with it. Naturally, you will have to populate however many tables you created in Part 1, not just one table.

For this assignment only, if you run into a primary key conflict when loading data (i.e., an "sqlite3.IntegrityError: column ID is not unique" error), you may use INSERT OR IGNORE instead of INSERT when loading data. This will cause INSERT to skip over duplicate inserts without raising an error.

Remember to load NULLs properly (i.e., as real NULLs, not as strings) and make sure you do not load the very first line, which contains the column names.
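
For reference, here is a rough sketch of what the create-and-load script could look like, assuming one particular 3NF decomposition (a Chauffeur table keyed by Record Number plus a CityState table keyed by Chauffeur City), the column order given in Part 1, and illustrative file names; adapt the table names, types, and column positions to your own Part 1 answer:

import sqlite3

conn = sqlite3.connect('chauffeurs.db')    # hypothetical database file name
cur = conn.cursor()

# One possible 3NF decomposition: Record Number determines everything else,
# and Chauffeur City determines Chauffeur State.
cur.execute('''CREATE TABLE IF NOT EXISTS CityState
               (ChauffeurCity TEXT PRIMARY KEY, ChauffeurState TEXT)''')
cur.execute('''CREATE TABLE IF NOT EXISTS Chauffeur
               (RecordNumber INTEGER PRIMARY KEY, LicenseNumber TEXT, Renewed TEXT,
                Status TEXT, StatusDate TEXT, DriverType TEXT, LicenseType TEXT,
                OriginalIssueDate TEXT, Name TEXT, Sex TEXT, ChauffeurCity TEXT,
                FOREIGN KEY (ChauffeurCity) REFERENCES CityState(ChauffeurCity))''')

with open('Public_Chauffeurs_Short.csv', 'r') as f:
    rows = f.readlines()[1:]               # skip the header line with the column names

for line in rows:
    vals = [v.strip() for v in line.split(',')]
    vals = [None if v in ('NULL', '') else v for v in vals]   # real NULLs, not strings
    # Assumed column order: License Number, Renewed, Status, Status Date, Driver Type,
    # License Type, Original Issue Date, Name, Sex, Chauffeur City, Chauffeur State, Record Number
    cur.execute('INSERT OR IGNORE INTO CityState VALUES (?, ?)', (vals[9], vals[10]))
    cur.execute('INSERT OR IGNORE INTO Chauffeur VALUES (?,?,?,?,?,?,?,?,?,?,?)',
                (vals[11], vals[0], vals[1], vals[2], vals[3], vals[4],
                 vals[5], vals[6], vals[7], vals[8], vals[9]))

conn.commit()
conn.close()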
Part 3
You were hired to do some data analysis for a local zoo. Below is the data table, including the necessary
constraints and all the insert statements to populate the database.
-- Drop all the tables to clean up
DROP TABLE Animal;

-- ACategory: Animal category 'common', 'rare', 'exotic'. May be NULL
-- TimeToFeed: Time it takes to feed the animal (hours)
CREATE TABLE Animal
(
    AID NUMBER(3, 0),
    AName VARCHAR2(30) NOT NULL,
    ACategory VARCHAR2(18),
    TimeToFeed NUMBER(4,2),

    CONSTRAINT Animal_PK
        PRIMARY KEY(AID)
);
INSERT INTO Animal VALUES(1, 'Galapagos Penguin', 'exotic', 0.5);
INSERT INTO Animal VALUES(2, 'Emperor Penguin', 'rare', 0.75);
INSERT INTO Animal VALUES(3, 'Sri Lankan sloth bear', 'exotic', 2.5);
INSERT INTO Animal VALUES(4, 'Grizzly bear', NULL, 2.5);
INSERT INTO Animal VALUES(5, 'Giant Panda bear', 'exotic', 1.5);
INSERT INTO Animal VALUES(6, 'Florida black bear', 'rare', 1.75);
INSERT INTO Animal VALUES(7, 'Siberian tiger', 'rare', 3.75);
INSERT INTO Animal VALUES(8, 'Bengal tiger', 'common', 2.75);
INSERT INTO Animal VALUES(9, 'South China tiger', 'exotic', 2.25);
INSERT INTO Animal VALUES(10, 'Alpaca', 'common', 0.25);
INSERT INTO Animal VALUES(11, 'Llama', NULL, 3.5);
Since none of the managers in the zoo know SQL, it is up to you to write the queries to answer the
following list of questions.
1. Find all the animals (their names) that take less than 1.5 hours to feed.
2. Find all the rare animals and sort the query output by feeding time (any direction)
3. Find the animal names and categories for animals related to a bear (hint: use the LIKE operator)
4. Return the listings for all animals whose rarity is not specified in the database
5. Find the rarity rating of all animals that require between 1 and 2.5 hours to be fed
6. Find the names of the animals that are related to the tiger and are not common
7. Find the minimum and maximum feeding time amongst all the animals in the zoo (single query)
8. Find the average feeding time for all the rare animals
9. Find listings for animals with an ID less than 10 that also require more than 2 hours to feed.
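
For orientation, here is a rough sketch of the first three queries run from python against a SQLite copy of the Animal table above (the database file name zoo.db is an illustrative assumption):

import sqlite3

conn = sqlite3.connect('zoo.db')       # hypothetical database file holding the Animal table
cur = conn.cursor()

# 1. Animal names that take less than 1.5 hours to feed
cur.execute('SELECT AName FROM Animal WHERE TimeToFeed < 1.5')
print(cur.fetchall())

# 2. Rare animals, sorted by feeding time
cur.execute("SELECT * FROM Animal WHERE ACategory = 'rare' ORDER BY TimeToFeed")
print(cur.fetchall())

# 3. Names and categories of animals related to a bear
cur.execute("SELECT AName, ACategory FROM Animal WHERE AName LIKE '%bear%'")
print(cur.fetchall())

conn.close()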
Be sure that your name and “Assignment 2” appear at the top of your submitted file.

CSC 455: Database Processing for Large-Scale Analytics Assignment 3

Part 1

In this and the next part we will use an extended version of the schema from Assignment 2. You can find it in a file ZooDatabase.sql posted with this assignment on D2L.

 

Once again, it is up to you to write the SQL queries to answer the following questions:

 

  1. List the animals (animal names) and the ID of the zoo keeper assigned to them.

 

  2. Now repeat the previous query and make sure that the animals without a handler also appear in the answer.

 

  3. Report, for every zoo keeper name, the total number of hours they spend feeding all animals in their care.

 

  4. Report every handling assignment (as a list of assignment date, zoo keeper name, and animal name).  Sort the result of the query by the assignment date in ascending order.

 

  5. Find the names of animals that have at least 1 zoo keeper assigned to them.

 

  6. Find the names of animals that have 0 or 1 (i.e., less than 2) zoo keepers assigned to them.

 

  • Optional query:

 

List all combinations of animals where the difference between their feeding time requirements is within 0.25 hours (e.g., Grizzly bear, 3, Bengal tiger, 2.75).  Hint: this will require a self-join. Avoid listing identical pairs such as (Grizzly bear, 3, Grizzly bear, 3).
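
ZooDatabase.sql is not reproduced here, so the keeper-related names below (a Handles assignment table with columns KID and AssignDate) are purely hypothetical placeholders; substitute the actual table and column names from the file. With that caveat, a sketch of the outer join needed for the second query:

import sqlite3

conn = sqlite3.connect('zoo.db')       # hypothetical database built from ZooDatabase.sql
cur = conn.cursor()

# Query 2: every animal with the ID of its zoo keeper, keeping animals without a handler.
# Handles(AID, KID, AssignDate) is a hypothetical stand-in for the actual assignment table.
cur.execute('''SELECT A.AName, H.KID
               FROM Animal A LEFT OUTER JOIN Handles H ON A.AID = H.AID''')
for row in cur.fetchall():
    print(row)

conn.close()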

 

 

 

Part 2

 

  1. Write a python script that is going to read the queries that you have created in Part-1 from a SQL file, execute each SQL query against a SQLite3 database, and print the output of that query. You must read your SQL queries from a file; please do not copy SQL directly into python code. The code that runs commands from the ZooDatabase.sql file is provided (runSQL.py), so all you have to do is change it so that it reads your queries from a SQL file and prints the output of your queries. You can refer to example code from the previous assignment that prints query results using fetchall(). You do not have to format the output in any particular fashion – however, you must print every row individually using a loop (a rough sketch of this pattern appears after this part's items).

 

  2. Repeat the work you did in Part-2 of the previous homework using the data file

Public_Chauffeurs_Short_hw3.csv attached in D2L in this assignment dropbox.

It contains roughly the same data, with two changes: NULL may now be represented by NULL or an empty string (,NULL, or ,,) and some of the names have the following form “Last, First” instead of “First Last”, which is problematic because when you split the string on a comma, you end up with too many values to insert.
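
For the first item, a minimal sketch of the read-queries-from-a-file pattern (the file name myQueries.sql and the convention that each query ends with a semicolon are assumptions):

import sqlite3

conn = sqlite3.connect('zoo.db')            # hypothetical database built from ZooDatabase.sql
cur = conn.cursor()

with open('myQueries.sql', 'r') as f:       # hypothetical file holding the Part 1 queries
    queries = f.read().split(';')           # assumes each query ends with a semicolon

for q in queries:
    if q.strip():                           # skip blank fragments left after the split
        cur.execute(q)
        for row in cur.fetchall():          # print every row individually, as required
            print(row)

conn.close()

For the second item, if the "Last, First" names are quoted in the CSV file, Python's csv module (csv.reader) will keep each quoted name as a single value; if they are not quoted, you can detect rows that split into one extra piece and re-join the two name fragments before inserting. Empty strings and the literal string NULL should both be converted to None so they load as real NULLs.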

 

 

Part 3

 

Using the company.sql database (posted with this assignment), write the following SQL queries.

 

 

  1. Find the names of all employees who are directly supervised by ‘Franklin T Wong’.

 

  2. For each project, list the project name, project number, and the total hours per week (by all employees) spent on that project.

 

  3. For each department, retrieve the department name and the average salary of all employees working in that department. Order the output by department number in ascending order.

 

  4. Retrieve the average salary of all female employees.

 

  5. For each department whose average salary is greater than $43,000, retrieve the department name and the number of employees in that department.

 

  6. Retrieve the names of employees whose salary is within $22,000 of the salary of the employee who is paid the most in the company (e.g., if the highest salary in the company is $82,000, retrieve the names of all employees that make at least $60,000).
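
As an illustration of the expected style, here is a sketch of the first query only, assuming textbook-style names such as Employee(Fname, Lname, Ssn, Super_ssn, ...); check company.sql for the actual table and column names before using it:

import sqlite3

conn = sqlite3.connect('company.db')        # hypothetical file built from company.sql
cur = conn.cursor()

# Query 1: employees directly supervised by Franklin T Wong (self-join on the supervisor SSN)
cur.execute('''SELECT E.Fname, E.Lname
               FROM Employee E JOIN Employee S ON E.Super_ssn = S.Ssn
               WHERE S.Fname = 'Franklin' AND S.Lname = 'Wong' ''')
for row in cur.fetchall():
    print(row)

conn.close()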

 

 

Be sure that your name and “Assignment 3” appear at the top of your submitted file.

CSC 455: Database Processing for Large-Scale Analytics Assignment 4

 

  1. Implement an OR function in python that can combine two boolean NumPy matrices (do not use any built-in operators such as | or + for your answer, although you can use these operators to test your function). If you prefer, you can use a regular list of lists in python instead of a NumPy matrix (e.g., like the example at the end of this paragraph). Your function should work for one-dimensional and two-dimensional matrices, and it should error check to verify input matrix compatibility. Only matrices of the same size can be OR-ed together, so if the input to the function is two incompatible matrices, the function should return an error message (using return, not print).

 

For example ORFunction([[True, False], [False, False]], [[False, True], [True, False]]) should return [[True, True], [True, False]].

And ORFunction([[True, False, True], [False, False, True]], [[False, True], [True, False]]) should return an error message.

Hint: Once you check for size compatibility, you will probably want to use a double for-loop.
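
A minimal sketch of one way to do this for the list-of-lists variant (whether python's or keyword counts as a built-in operator is debatable, so the element-wise OR below is written with a conditional expression instead):

def ORFunction(m1, m2):
    # Treat a one-dimensional input as a single-row matrix so both cases share one code path.
    a = m1 if isinstance(m1[0], list) else [m1]
    b = m2 if isinstance(m2[0], list) else [m2]

    # Compatibility check: same number of rows and every pair of rows has the same length.
    if len(a) != len(b) or any(len(a[i]) != len(b[i]) for i in range(len(a))):
        return 'Error: the two matrices are not the same size'

    result = []
    for i in range(len(a)):
        row = []
        for j in range(len(a[i])):
            # Element-wise OR without | or +: take the second value only when the first is False.
            row.append(a[i][j] if a[i][j] else b[i][j])
        result.append(row)
    return result if isinstance(m1[0], list) else result[0]

# ORFunction([[True, False], [False, False]], [[False, True], [True, False]])
# returns [[True, True], [True, False]]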

 

  2. We are going to work with a small extract of tweets (about 200 of them), Assignment4.txt, available in the Assignment 4 dropbox.

 

For now, we are going to extract a few columns only. Remember that you can use json.loads(OneTweetString) to parse a tweet entity into a python dictionary, and don't forget to add import json to your code.

 

 

    a. Create a SQL table to contain the following attributes of a tweet:

"created_at", "id_str", "text", "source", "in_reply_to_user_id", "in_reply_to_screen_name", "in_reply_to_status_id", "retweet_count", "contributors". Please assign reasonable data types to each attribute (e.g., VARCHAR(10000) is a bad idea).

Use SQLite for this assignment.

 

    b. Write python code to read through the Assignment4.txt file and populate your table from part a.  Make sure your python code reads through the file and loads the data properly (including NULLs).

NOTE: The input data is separated by the string "EndOfTweet", which serves as a delimiter. The whole file consists of a single line, so using readlines() will still only give you one string, which needs to be split on the tweet delimiter.
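
A rough sketch of parts a and b together, using SQLite-friendly types; the table name Tweet, the database file tweets.db, and the choice of id_str as primary key are illustrative assumptions:

import json
import sqlite3

conn = sqlite3.connect('tweets.db')          # hypothetical database file name
cur = conn.cursor()

# Part a: id_str is used as the primary key here, which is an assumption, not a requirement.
cur.execute('''CREATE TABLE IF NOT EXISTS Tweet
               (created_at TEXT, id_str TEXT PRIMARY KEY, text TEXT, source TEXT,
                in_reply_to_user_id INTEGER, in_reply_to_screen_name TEXT,
                in_reply_to_status_id INTEGER, retweet_count INTEGER, contributors TEXT)''')

keys = ['created_at', 'id_str', 'text', 'source', 'in_reply_to_user_id',
        'in_reply_to_screen_name', 'in_reply_to_status_id', 'retweet_count', 'contributors']

# Part b: the whole file is one line, so read it and split on the EndOfTweet delimiter.
with open('Assignment4.txt', 'r') as f:
    tweets = f.read().split('EndOfTweet')

for t in tweets:
    if t.strip():                            # skip any empty piece left after the split
        d = json.loads(t)
        row = [d.get(k) for k in keys]       # missing keys / JSON null become None, i.e. SQL NULL
        cur.execute('INSERT INTO Tweet VALUES (?,?,?,?,?,?,?,?,?)', row)

conn.commit()
conn.close()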

 

 

 

  3. Write SQL queries to do the following:

 

    a. Count the number of iPhone users (based on "source" attribute)

 

    b. Create a view that contains only tweets from users who are not replying ("in_reply_to_user_id" is NULL)

 

    c. Select tweets that have a "retweet_count" higher than the average "retweet_count" from the tweets in the view in part b

 

    d. Create a view that contains only "id_str", "text" and "source" from each tweet that has a "retweet_count" of at least 5

 

    e. Use the view from part-d to find how many tweets have a "retweet_count" of at least 5

 

    f. Write python code to compute the answer from 3-e without using SQL, i.e., write code that is going to read data from the input file and answer the same question (find how many tweets have a "retweet_count" of at least 5).
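
A sketch of parts d–f, reusing the Tweet table and tweets.db names assumed earlier; the view name PopularTweets is an arbitrary choice:

import json
import sqlite3

conn = sqlite3.connect('tweets.db')                  # same hypothetical database as above
cur = conn.cursor()

# Parts d and e: a view of tweets retweeted at least 5 times, then a count over that view.
cur.execute('''CREATE VIEW IF NOT EXISTS PopularTweets AS
               SELECT id_str, text, source FROM Tweet WHERE retweet_count >= 5''')
cur.execute('SELECT COUNT(*) FROM PopularTweets')
print(cur.fetchone()[0])
conn.close()

# Part f: the same count computed straight from the input file, without any SQL.
count = 0
with open('Assignment4.txt', 'r') as f:
    for t in f.read().split('EndOfTweet'):
        if t.strip():
            d = json.loads(t)
            rc = d.get('retweet_count')
            if rc is not None and int(rc) >= 5:      # int() in case the value arrives as a string
                count += 1
print(count)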

 

 

  4. Write a python function that takes the name of a SQL table as a parameter and then does the following:

Select all rows from that table (you can assume that the table already exists in SQLite) with all attributes from that table and output to a file a sequence of corresponding INSERT statements, one for each row from the table. Think of this as an exporting tool, since these INSERT statements could now be executed in Oracle (you do not need to actually do that).

 

This is similar to the question from the end of Part-2 in Assignment 1, only the values will have to be extracted from a SQLite table first. For example:

generateInsertStatements('Students') should write to a file an INSERT statement for each row contained in the Students table:

 

inserts.txt:

INSERT INTO Students VALUES ('1', 'Jane', 'A-');

INSERT INTO Students VALUES ('1', 'Mike', 'B');

INSERT INTO Students VALUES ('1', 'Jack', 'B+');

 

I will be sure to post sample code for Assignment 1.

 

Hint: as you iterate through the rows of the given table, instead of printing the output to screen using print as you have done before, you will want to write an INSERT statement to an output file each time.
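
A rough sketch of such an exporting function; the default database and output file names are placeholders:

import sqlite3

def generateInsertStatements(tableName, dbFile='tweets.db', outFile='inserts.txt'):
    # dbFile and outFile are placeholder defaults; pass in whatever files you are actually using.
    conn = sqlite3.connect(dbFile)
    cur = conn.cursor()
    cur.execute('SELECT * FROM ' + tableName)        # pull every row with every attribute
    with open(outFile, 'w') as out:
        for row in cur.fetchall():
            vals = []
            for v in row:
                if v is None:
                    vals.append('NULL')              # keep NULLs as SQL NULL, not a quoted string
                elif isinstance(v, (int, float)):
                    vals.append(str(v))              # numbers go in unquoted
                else:
                    vals.append("'" + str(v).replace("'", "''") + "'")   # quote and escape strings
            out.write('INSERT INTO ' + tableName + ' VALUES (' + ', '.join(vals) + ');\n')
    conn.close()

# generateInsertStatements('Students') writes one INSERT per row of the Students table to inserts.txt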

 

 

Be sure that your name and “Assignment 4” appear at the top of your submitted file.