## Description

For this assignment, you will need the material covered in the lectures

• Lecture 13: Object-relational databases and queries

• Lecture 14: Nested relational and semi-structured databases

To turn in your assignment, you will need to upload to Canvas a single file with name assignment5.sql which contains the necessary SQL statements that solve the problems in this assignment. The assignment5.sql file

must be so that the AI’s can run it in their PostgreSQL environment. You

should use the Assignment-Script-2021-Fall-assignment5.sql file to construct the assignment5.sql file. (Note that the data to be used for this assignment is included in this file.) In addition, you will need to upload a separate

assignment5.txt file that contains the results of running your queries. You

will also see several problems that are listed as practice problems. You should

not include your solutions for practice problems in the materials you submit for

this assignment.

1

1 Formulating Query in Object-Relational SQL

For the problems in the section, you will need to use the polymorphically defined

functions and predicates that are defined in the document

SetOperationsAndPredicates.sql

Functions

set union(A,B) A ∪ B

set intersection(A,B) A ∩ B

set difference(A,B) A − B

add element(x,A) {x} ∪ A

remove element(x,A) A − {x}

make singleton(x) {x}

choose element(A) choose some element from A

bag union(A,B) the bag union of A and B

bag to set(A) coerce the bag A to the corresponding set

Predicates

is in(x,A) x ∈ A

is not in(x,A) x 6∈ A

is empty(A) A = ∅

is not emptyset(A) A 6= ∅

subset(A,B) A ⊆ B

superset(A,B) A ⊇ B

equal(A,B) A = B

overlap(A,B) A ∩ B 6= ∅

disjoint(A,B) A ∩ B = ∅

We now turn to the problems in this section. You will need use the data

provided for the Person, Company, companyLocation, worksFor, jobSkill,

personSkill, and Knows relations. But before turning to the problems, we will

introduce various object-relational views defined over these relations:1

• The view companyHasEmployees(cname,employees) which associates with

each company, identified by a cname, the set of pids of persons who work

for that company.

create or replace view companyHasEmployees as

select cname, array(select pid

from worksfor w

where w.cname = c.cname order by 1) as employees

from company c order by 1;

1The various order by clauses in these views are not essential: they simply aid to read the

data more easily.

2

• The view cityHasCompanies(city,companies) which associates with each

city the set of cnames of companies that are located in that city.

create or replace view cityHasCompanies as

select city, array_agg(cname order by 1) as companies

from companyLocation

group by city order by 1;

• The view companyHasLocations(cname,locations) which associates with

each company, identified by a cname, the set of cities in which that company is located.

create or replace view companyHasLocations as

select cname, array(select city

from companyLocation cc

where c.cname = cc.cname order by 1) as locations

from company c order by 1;

• The view knowsPersons(pid,persons) which associates with each person, identified by a pid, the set of pids of persons he or she knows.

create or replace view knowsPersons as

select p.pid, array(select k.pid2

from knows k

where k.pid1 = p.pid order by pid2) as persons

from person p order by 1;

• The view isKnownByPersons(pid,persons) which associates with each

person, identified by a pid, the set of pids of persons who know that person.

Observe that there may be persons who are not known by any one.

create or replace view isKnownByPersons as

select distinct p.pid, array(select k.pid1

from knows k

where k.pid2 = p.pid) as persons

from person p order by 1;

• The view personHasSkills(pid,skills) which associates with each person, identified by a pid, his or her set of job skills.

create or replace view personHasSkills as

select distinct p.pid, array(select s.skill

from personSkill s

where s.pid = p.pid order by 1) as skills

from person p order by 1;

3

• The view skillOfPersons(skills,persons) which associates with each

job skill the set of pids of persons who have that job skill.

create or replace view skillOfPersons as

select js.skill, array(select ps.pid

from personSkill ps

where ps.skill = js.skill order by pid) as persons

from jobSkill js order by skill;

In the problems in this section, you are asked to formulate queries in objectrelational SQL. You should use the set operations and set predicates defined in

the document SetOperationsAndPredicates.sql, the relations

Person

Company

Skill

worksFor

and the views

companyHasEmployees

cityHasCompanies

companyHasLocations

knowsPersons

isKnownByPersons

personHasSkills

skillOfPersons

However, you are not permitted to use the Knows, companyLocation, and

personSkill relations in the object-relation SQL formulation of the queries.

Observe that you actually don’t need these relations since they are encapsulated

in these views.

Before listing the queries that you are asked to formulate, we present some

examples of queries that are formulated in object-relational SQL using the assumptions stated in the previous paragraph. Your solutions need to be in the

style of these examples. The goals is to maximize the utilization of the functions

and predicates defined in document SetOperationsAndPredicates.sql.

Example 1 Consider the query “ Find the pid of each person who knows a

person who has a salary greater than 55000.”2

select distinct pk.pid

from knowsPersons pk, worksfor w

where is in(w.pid, pk.persons) and w.salary > 55000

order by 1;

Note that the following formulation for this query is not allowed since it uses

the relation Knows which is not permitted.

2

In this example, focus on the is in predicate.

4

select distinct k.pid1

from knows k, worksfor w

where k.pid2 = w.pid and w.salary > 55000;

Example 2 Consider the query “ Find the pid and name of each person p who

(1) has both the AI and Programming and (2) knows at least 5 persons, and

report the number of persons who know p.”3

select p.pid, p.pname, (select cardinality(kp.persons)

from isKnownByPersons kp

where kp.pid = p.pid) as ct_knownByPersons

from person p

where p.pid in (select ps.pid

from personHasSkills ps

where subset(’”AI”, “Programming”’, ps.skills)) and

cardinality((select kp.persons

from knowsPersons kp

where kp.pid = p.pid)) >= 5;

Example 3 Consider the query “Find the pid and name of each person along

with the set of his of her skills that are not among the skills of persons who

work for ‘Netflix’”.4

select p.pid, p.pname, set difference((select ps.skills

from personHasSkills ps

where ps.pid = p.pid),

array(select unnest(ps.skills)

from personHasSkills ps

where is in(ps.pid, (select employees

from companyHasEmployees

where cname = ’Netflix’))))

from person p;

1. Formulate the following queries in object-relational SQL.

(a) Find the cname and headquarter of each company that employs at

least two persons who each have both the AI and the Programming

job skills.

(b) Find each skill that is not a job skill of any person who works for

Yahoo or for Netflix.

(c) Find the set of companies that employ at least 3 persons who each

know at least five persons. (So this query returns only one object,

i.e., the set of companies specified in the query.)

(d) Find the pid and name of each person p along with the set of pids of

persons who (1) know p and (2) who have the AI skill but not the

Programming skill.

3

In this example, focus on the set (array) construction ’{“AI”, “Programming”}’ and the

subset predicate. Also focus on the use of cardinality function.

4

In this example, focus on (1) the set difference operation and (2) the unnest operation

followed by a set (array) construction.

5

(e) Find each pair (s1, s2) of different skills s1 and s2 such that the

number of employees who have skill s1 and who make strictly more

than 55000 is strictly less than the number of employees who have

skill s2 and who make at most 55000.

(f) (Practice Problem: not-graded).

Find each (c, p) pair where c is the cname of a company and p is the

pid of a person who works for that company and who is known by

all other persons who work for that company.

(g) (Practice Problem: not-graded).

Find the pid and name of each person who has all the skills of the

combined set of job skills of the highest paid persons who work for

Yahoo.

2. Find the following set of sets

{S | S ⊆ Skill ∧ |S| ≤ 3}.

I.e., this is the set consisting of each set of job skills whose size (cardinality)

is at most 3.

3. (Practice Problem: not-graded).

Reconsider Problem 2. Let

S = {S | S ⊆ Skill ∧ |S| ≤ 3}.

Find the following set of sets

{X | X ⊆ S ∧ |X| ≤ 2}.

4. Let t be a number called a threshold. We say that a (unordered) pair of

different person pids {p1, p2} co-occur with frequency at least t if there

are at least t skills that are skills of both the person with pid p1 and the

person with pid p2.

Write a function coOccur(t integer) that returns the (unordered) pairs

{p1, p2} of person pid that co-occur with frequency at least t.

Test your coOccur function for t in the range [0, 3].

5. Let A and B be sets such that A ∪ B 6= ∅. The Jaccard index J(A, B) is

defined as the quantity

|A ∩ B|

|A ∪ B|

.

The Jaccard index is a frequently used measure to determine the similarity

between two sets. Note that if A ∩ B = ∅ then J(A, B) = 0, and if A = B

then J(A, B) = 1.

Let t be a number called a threshold. We assume that t is a float in the

range [0, 1].

6

Write a function JaccardSimilar(t float) that returns the set of unordered pairs {s1, s2} of different skills such that the set of persons who

have skill s1 and the set of persons who have skill s2 have a Jaccard index

of at least t.

Test your function JaccardSimilar for the following values for t: 0, 0.25,

0.5, 0.75, and 1.

7

2 Nested Relations and Semi-structured databases

Consider the lecture on Nested relational and semi-structured databases. In that

lecture we considered the studentGrades nested relation and the jstudentGrades

semi-structured database and we constructed these using a PostgreSQL query

starting from the Enroll relation.

6. Write a PostgreSQL view courseGrades that creates the nested relation

of type

(cno, gradeInfo{(grade, students{(sid)})})

This view should compute for each course, the grade information of the

students enrolled in this course. In particular, for each course and for

each grade, this relation stores in a set the sids students who obtained

that grade in that course.

Test your view.

7. Starting from the courseGrades view in Problem 6 solve the following

queries:

(a) Find each pair (c, S) where c is the cno of a course and S is the set of

sids of students who received an ‘A’ or a ‘B’ in course c. The type of

your answer relation should be (cno : text, Students : {(sid : text)}).

(b) Find each (s, C) pairs where s is the sid of a students and C is the set

of cnos of courses in which the student received an ‘A’. The type of

your answer relation should be (sid : text, Courses : {(cno : text)}).

(c) (Practice Problem: not-graded).

Find each cno c where c is a course in which all students received the

same grade.

8. Write a PostgreSQL view jcourseGrades that creates a semi-structured

database which stores jsonb objects whose structure conforms with the

structure of tuples as described for the courseGrades in Problem 6.

Test your view.

9. Starting from the jcourseGrades view in Problem 8 solve the following

queries. Note that the output of each of these queries is a nested relation.

(a) Find each pair (c, s) where c is the cno of a course and s is the sid of

a student who did not received an ‘A’ in course c. The type of your

answer relation should be (cno:text, sid:text).

(b) Find each pair ({c1, c2}, S) where c1 and c2 are the course numbers of

two different courses and S is the set of sids of students who received

a ’B’ in both courses c1 and c2. The type of your answer relation

should be (coursePair : {(cno : text)}, Students : {(sid : text))}.

8