DSC 450: Database Processing for Large-Scale Analytics Take-home Midterm

$30.00

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (4 votes)

Part 1 

 (9 points) Please give a “True” or a “False” rating to each statement below.

  1. A schema that is in Third Normal Form (3NF) must also be in Second Normal Form (2NF).

 

  1. A foreign key can contain a NULL.

 

  1. The union rule allows combining functional dependencies AàCD and BàCD into a functional dependency ABàCD.

 

  1. SQL query results are always sorted by the primary key.

 

  1. A column that appears in the WHERE clause must always appear in the SELECT clause.

 

  1. A UNIQUE constraint on a column allows insertion of a NULL.

 

 

Part 2

 

  1. (3 points) Write a SQL column definition for PurchaseAmt (just for one column, don’t worry about the rest of the table) that ensures that the purchase amount must be between 0 and 9999.99 (column definition should enforce both the number precision and greater-than-0 requirement).

 

  1. (3 points) Write a SQL column definition for Street (just for one column, don’t worry about the rest of the table) that ensures that the street address is up to 18 characters and has to start from ‘rue’ (hint: CHECK constraint can use LIKE operator).

 

  1. (12 points) The following set of relations records information about university students, courses and assigned grades. The Student relation contains information about the student, including the name (full name is stored in one column), address and their year of graduation. The Course relation records information about courses: course name (primary key), course department and the number of credits provided by the course.  Finally, the Grade relation records information about the grades given; CName is the foreign key referring to the primary key of the Course relation and StudentID is the foreign key referring to the primary key of the Student relation.  The grades are a numeric value given on a 4-point system.

Student(StudentID, Name, Address, GradYear)

 

Grade(CName, StudentID, CGrade)

 

Course(CName, Department, Credits)

For each part below, write a single SQL query.

 

Q1.  Display the list of student IDs and names for the students who graduated in most recent two years.  You can assume that GradYear is an integer, but your query is not allowed to assume any particular year.

 

Q2.  Display student names and their taken course names for all students with the middle name of ‘Muriel’. You may assume that name is always written as ‘First Middle Last’.  Your query output should be sorted by grade.

 

Q3.  For students who are either not enrolled in any courses or are enrolled in only 1 course, list those student’s names and graduation years.

 

Q4. Update all student records, to increase the graduation year by 3 for all students who live in Chicago

 

Q5. Modify the course table to add a Chair column that can be up to 28 characters (that question requires a DDL rather than a DML SQL statement)

 

Part 3

 

  1. (8 points)
  • For the table below, fill in the missing values in W column, consistent with functional dependencies: XYàZA ,  AàW. You can make any necessary assumptions, but be sure to state them.
X Y Z W A
1 1 1000 Cat 2
1 2 4000 Hawk 3
1 3 3000 Dog 4
2 1 1000   3
2 2 2000   4
2 3 5000   5
  • Given the schema R and the following functional dependencies:

R(X, Y, Z, W, A)  with  XYàZA ,  AàW

does XY determine W? (XYàW?) Why or why not?

 

  • Suppose that you were also given relation S:

S(P, Q, U, M, N)

What functional dependencies (if any) can you assume?

 

  1. (8 points) Consider a TVShows table that keeps track of different TV shows. The table stores the show name and the year to which the entry refers.  Additionally, each row stores channel name, length of the show, the average cost of an episode, and the filming location’s zip, city and state.  Moreover, each entry contains the name of the lead actor and their salary.

The table is already in First Normal Form, and its primary key is (ShowTitle, Year).

The schema for the TVShows table is:

(ShowTitle, Year, Channel, Length, Cost, Zip, City, State, Lead, Salary)

You are given the following functional dependencies:

ShowTitle à Cost

Zip à City, State

Lead à Salary

ShowTitle, Year à Channel, Cost

  • Remove any existing partial dependencies to create a set of linked relational schema (copying functional dependencies does not define a schema, be sure to include primary/foreign keys here) in Second Normal Form.
  • Remove any existing transitive dependencies to create a set of linked relational schemas in Third Normal Form.
  1. (7 points)

Given the schema R and the following functional dependencies:

R(A, B, C, D, E)  with  ABàC,  CàD

 

  • Describe how to identify a primary key for relation R (the primary key is a minimal set of columns that determines all columns in the relation such that ? à ABCDE )

 

  • Decompose relation R into a relational schema in third normal form

Part 4

 

(50 points) 

Create the schema from Part 2-c in SQLite and populate it with data of at least 6 students, 4 courses, and 7 enrollments (at least one of the students should not be enrolled in any courses and at least one course should have zero current enrollments).

 

  1. Create a view that joins the three tables, including all of the records from student table (i.e., including the non-enrolled students).

 

  1. Write and execute python code that uses that view to export all data into a single .txt file (that is a “de-normalized” 1NF file with some redundancy present). This code should include NULLs due to the non-enrolled students.

Include a screenshot of the output .txt file (in addition to the python code)

 

  1. Add a new row to the de-normalized .txt file (you can manually edit the .txt file from part b) that violates the following functional dependency:

CName à Credits

(you can do so by creating a new record that repeats the course name but does not repeat the number of credits associated with this course name)

 

  1. Write python code that will identify the values for which functional dependency was violated in your .txt file (hint: when the functional dependency is valid, there is only one unique value of Credits for each CName). Your solution should detect any violation of the CName à Credits functional dependency, not just your example. Keep in mind that functional dependency is violated only in the pre-joined .txt file, not in the SQLite database, so this solution must read data from .txt file.

 

  1. Suppose I have a new query: Qe: For every department, display the average graduation year.

 

  1. Use the view from Part 4-a to re-write query Qe (i.e., replace the tables in the query’s FROM clause by the view and rewrite the rest of the query accordingly to produce an answer).

 

  1. Use your .txt file containing de-normalized data from part-b to answer Qe with python instead of SQL. Note that this solution should not use SQLite or SQL, just python.