Description
Question 1 (16 points)
Assume that a base cuboid of 6 dimensions contains only 3 base cells:
(a1, a2, a3, a4, a5, a6), (b1, b2, a3, a4, a5, a6), and (c1, c2, a3, a4, a5, a6)
where ai 6= bi, bi 6= ci and ai 6= ci, 8i = 1, 2. There is no dimension with concept hierarchy.
The measure of the cube is count. The count of each base cell is 1.
Purpose
• Have a better understanding of cubes, multidimensional view of data, and cuboid
structures.
Requirements
• Include final results and explain how you calculate the cells. Keep it brief and clear.
a. (4’) How many cuboids are there in the full data cube?
b. (4’) How many distinct aggregated (i.e., non-base) cells will a complete cube contain?
c. (4’) How many distinct aggregated cells will an iceberg cube contain, if the condition
of the iceberg cube is count 3?
d. (4’) How many non-star dimensions does the closed cell with count = 3 have?
Question 2 (24 points)
We give you an artificially generated dataset Data Q2.txt in data.zip. It contains 100 business records. Each row is a business record and data fields in each row are separated by tabs.
For each business, it has (Business ID, City, State, Category, Price, Quarter-of-Year,
Year). The four quarters in a year are represented as Q1, Q2, Q3 and Q4. We now want
to construct a cube over four dimensions (Location, Category, Price, Time) with count as
the measure. For the Location dimension and the Time dimension, there is a concept hierarchy, i.e. City-State and Quarter-Year. You are required to answer following questions.
Purpose
• Have a better understanding of measures and cuboid structures.
Requirements
• For sub-question (a), you should show the final result with a brief explanation or
intermediate steps in the PDF file you will submit.
• For sub-questions (b), (c), (d), (e), (f), you should write scripts to manipulate data
and show your answers in the PDF file you will submit. There is no restrictions on the
language you use and you are allowed to use any built-in functions. You are required
to submit your source code.
a. (4’) How many cuboids are there in the cube?
b. (4’) How many distinct cells are there in the cuboid (Location (City), Category,
Price, Time (Year))?
c. (4’) If we roll up by climbing up in the Location dimension from City to State, how
many distinct cells are there in the cuboid (Location (State), Category, Price, Time
(Year))?
d. (4’) How many distinct cells are there in the cuboid (*, Category, Price, Time
(Quarter))?
e. (4’) What is the count for the cell (Location (State) = Illinois, Category = F ood,
*, Time (Quarter)= Q1)?
f. (4’) What is the count for the cell (Location (City) = Chicago, *, Price = cheap,
Time (Year)= 2013)?
Question 3 (15 points)
We have a data array containing 3 dimensions A, B and C. The 3-D array is divided into
small chunks. Each dimension is divided into 3 equally sized partitions. See Figure 1. For
example, dimension A is divided into a0, a1, and a2, and dimension B is divided into b0, b1,
and b2.
There are totally 27 chunks and each chunk is represented by a sub-cube aibj ck. The
cardinality (size) of the dimensions A, B, and C is 900, 300, and 600. Since we divide each
dimension into 3 parts with equal size, the sizes of the chunks on dimensions A, B, and C
are 300, 100, and 200 respectively.
Now we want to use Multiway Array Aggregation
Computation to materialize cubes. The base cuboid ABC is computed as a 3-D array. We
want to materialize the 2-D cuboids AB, AC and BC. Please answer the following questions.
Purpose
• Have a better understanding of Multiway Array Aggregation Computation.
Requirements
• Show your results in the PDF file you will submit. You should also provide some
important intermediate steps in calculation. Only providing a result will not get credits.
a. (7’) If we scan the chunk in the order 1, 2, 3…., 27 when materializing the 2-D cuboids
AB, AC and BC, to avoid reading a 3-D chunk into memory repeatedly, what is the
minimum memory requirement to hold all the related 2-D planes?
b. (8’) Do you think there exist other orders for chunk scanning so that the memory cost
is less than that in sub-question (a)? If yes, show that order for chunk scanning using
chunk numbers (e.g. 1, 2, 3…, 27) and the minimum memory requirement with your
calculating process. Otherwise, provide your reason.
Figure 1: A 3-D array with dimensions A, B and C. This array is divided into 27 smaller
chunks.
Question 4 (15 points)
We have a 3-D data array containing three dimensions A, B, C. The data contained in the
array is as follows:
(a0, b0, c0):1 (a0, b0, c1):1 (a0, b0, c2):1
(a0, b1, c0):1 (a0, b1, c1):1 (a0, b1, c2):1
(a0, b2, c0):1 (a0, b2, c1):1 (a0, b2, c2):1
(a0, b3, c0):1 (a0, b3, c1):1 (a0, b3, c2):1
You are required to use the Bottom-Up Computation (BUC) method to materialize the
cube. Please answer the following questions.
Purpose
• Have a better understanding of the BUC algorithm.
Requirements
• For sub-question (a), you are allowed to use any painting software to draw the tree
and paste your plot to the PDF file you will submit.
• For sub-questions (b) and (c), you should write your answers on the PDF file you will
submit.
a. (5’) Draw the trace tree of expansion with the exploration order: A ! B ! C.
b. (5’) If we set min support = 4 with the exploration order of A ! B ! C, how many
cells would be considered/computed? You should report the number of cells which
would be considered/computed in the PDF file you will submit. For these cells, you
should also list each cell with its count and report whether the cell is expansible in the
BUC process.
(Hint: To make you better understand the question and know how to
answer the question, please refer to the SampleQuestionBUC.pdf file in Chapter 5 on
the course website.)
c. (5’) If we set min support = 4 with the exploration order of B ! A ! C, how many
cells would be considered/computed? You should report the number of cells which
would be considered/computed in the PDF file you will submit. For these cells, you
should also list each cell with its count and report whether the cell is expansible in the
BUC process.
Question 5 (10 points)
This is a set of true or false questions. Please answer the following questions.
Purpose
• Have a better understanding of some basic concepts in Chapter 4 and Chapter 5.
Requirements
• For each sub-question, select true (T) or false (F) and provide a brief explanation
for your selection. You will not get credit without explanation.Write your answer in
the PDF file you will submit.
a. (2’) T/F. Operational update is a very important issue for data warehouse.
b. (2’) T/F. Suppose we pick two cells A and B from a data cube. A is (a0, b0, ⇤, d0) and
B is (a0, b0, c0, d0). Then, cell A is a child of cell B.
c. (2’) T/F. In OLAP operations, we can see more detailed data information by rolling
up.
d. (2’) T/F. The Bottom-Up Computation (BUC) algorithm can be used to compute
either the full cube or the partial cube.
e. (2’) T/F. The Multiway Array Aggregation Computation is most e↵ective when the
product of the cardinalities of dimensions is very high.
Mini Machine Problem (20 points)
CubesViewer is a visual, web-based tool application for exploring and analyzing OLAP
databases served by the Cubes OLAP Framework1. The CubesViewer Explorer demo can
be found at https://crow.cs.illinois.edu:8080/cubesviewer/. You can login with user
cs412, password cs412f2015
Purpose
• Have a better understanding of Data Cubes and OLAP operations.
• Get some hands-on experience with OLAP.
Requirements
• List OLAP operations necessary to reach a particular cube, and include the screenshots of final results. For each operation, you need to specify the operation type
(roll-up, drill-down, slice, dice) and the related dimension (product, geo,
browser, etc.).
• Play around with CubesViewer to find interesting insights, such as “the sale of sports
goods during quarter x in 2012 is much better than other quarters of the same year”.
a. (5’) Regarding dataset “Webshop/Sales” on CubesViewer, what is the product under
category Sports with the most revenue in Europe during the first three quarters of year
2012? And what is the least? List OLAP operations necessary to reach the cube that
can answer the questions above.
Show the screenshot of the chart that is generated
from the resulting cube by CubesViewer (you must choose the appropriate measure in
the View menu in order to generate the chart).
b. (4’) Regarding dataset “Website/Visits” on CubesViewer, what is the popular way,
specified by a particular source and a particular browser, customers from North America used to visit the online store? List OLAP operations necessary to reach the cube
that can answer the questions above. Show the screenshot of the chart that is generated
from the resulting cube by CubesViewer (you must choose the appropriate measure in
the View menu in order to generate the chart).
c. (3’) Regarding dataset “Website/Visits” on CubesViewer, show the screenshot of the
chart that describes the changes of the visit counts from North America along time.
d. (8’) For each dataset ( “Webshop/Sales” and “Website/Visits”), come up with an
interesting cube that might help the shop owner make some decisions. This is an open
question. You will have the full mark if you can list the OLAP operations to reach the
cubes, and what kinds of decisions we can make after looking at the cubes.
1https://github.com/jjmontesl/cubesviewer