Assignment #2 CSCI 5408 (Data Management, Warehousing, Analytics) solution

$30.00

Original Work ?
Category: You will Instantly receive a download link for .ZIP solution file upon Payment

Description

5/5 - (3 votes)

Problem #1: This problem contains one reading task. Reading Material #1: To retrieve the paper, visit IEEE database through libraries.dal.ca Y. Gao, X. Gao, X. Yang, J. Liu and G. Chen, “An Efficient Ring-Based Metadata Management Policy for Large-Scale Distributed File Systems,” in IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 9, pp. 1962-1974, 1 Sept. 2019, doi: 10.1109/TPDS.2019.2901883. → Read the paper and perform the following: • Write a summary ( 1.5 page) on the paper in your own words. (you do not need to add images/figures/tables from the paper. However, you can add your own block diagrams or flowcharts to support the summary you have written) • What is the central idea of discussion? • Did you find any topic of interest in this paper? If Yes, what are those, and why do you think those are interesting? If No, then as per you, what are the shortcomings of this paper? Problem #1 Submission Requirements: Approx. 1.5-page Report (problem1.pdf) containing the summary and analysis Winter 2022 saurabh.dey@dal.ca Problem #2: This problem contains series of tasks. You need to fill a worksheet and provide screenshots to support your worksheet Research and Development: You need to create distributed DBMS infrastructure, and perform a comparative study of local and distributed transactions Visit the website and extract the following datasets: https://www.kaggle.com/olistbr/brazilian-ecommerce Task 1: Building database from given dataset • Download the CSV files given in the Kaggle link, and consider those as your data points. o Similar to previous assignment (Assignment 1), clean and format each csv file (replace null values with ‘0’ if it is a number column || replace null values with ‘ ’ if it is a text/string column) – Write a ½ page summary on what type of cleaning and formatting you have applied on the CSV files. A single ½ page summary for all CSV files is sufficient. Add summary to problem2.pdf o You do not need to remove any columns, keep all columns as it is in CSVs. • Create a single database (name of the database “A2”) in your local MySQL RDBMS. • Create 9 tables within the newly created database. These tables will contain structure identical to the CSV files you cleaned, and you will populate the tables by importing data from the CSV files. (if files are large, you need to import using command prompt. Workbench import option may not work) • Export the SQL (structure + value) of the database with all tables and data. Name the file “MyDatabase.SQL” • Create a GCP MySQL Virtual Machine instance and use the MyDatabase.SQL to create the database, tables, and uploading data. • If you have successfully completed the above steps – it means, at this point, you have two identical databases. One in your local machine and another in the cloud. • In Problem2.pdf, attach screenshots of your cloud instance, and local instance running mysql command “SHOW TABLES FROM A2;” Task 2: Perform Transaction in your local machine • In your local machine MySQL, you have 9 tables in the database A2. You need to write a Transaction with at least (2 insert, 2 delete, 2 update, 3 select operations) considering all the 9 tables. For insert, and update you can create your own set of dummy data. Based on your transaction query formation, complexity, and uniqueness you will get points for Novelty • Capture the transaction execution time (i.e. execution time for all 9 or more statements Begin Transaction —– —– Winter 2022 saurabh.dey@dal.ca End Transaction Task 3: Perform Transaction in the Remote machine • In your GCP MySQL virtual machine MySQL, you have 9 tables in the database A2. You need to write a Remote Transaction with at least (2 insert, 2 delete, 2 update, 3 select operations) considering all the 9 tables. For insert, and update you can create your own set of dummy data. Based on your transaction query formation, complexity, and uniqueness you will get points for Novelty. You should not use the same queries that you used in Task 2. Please do some major modifications in the SQL queries • Capture the transaction execution time (i.e. execution time for all 9 or more statements Begin Transaction —– —– End Transaction Task 4: Perform Distributed Transaction • In your GCP MySQL virtual machine, and in your local MySQL, you have 9 tables in the database A2. You need to write a Distributed Transaction with at least (2 insert, 2 delete, 2 update, 3 select operations) considering all the 9 tables. • This time, from the 9 unique tables, you need to use any 5 tables (your choice) from your Local machine database, and the remaining 4 tables in the GCP instance for performing the distributed transaction. • For insert, and update you can create your own set of dummy data. Based on your transaction query formation, complexity, and uniqueness you will get points for Novelty. Please do some major modifications in the SQL queries. • You need to write a Java program to call the distributed transaction. Hint: Set connection.setAutoCommit(false); in your Java program to control the execution and database update o Run Java Program: o within Java program select the 1st database (Local machine) Begin Transaction —– —– select the 2nd database (Remote machine) —– —– End Transaction connection.setAutoCommit(true); • Capture the distributed transaction execution time (i.e. execution time for all 9 or more statements • Fill the worksheet and submit it as part of Problem2.pdf Tasks Execution Time Transaction Query Your Observations Task 2 Write the transaction query here for each transaction Task 3 Task 4 Winter 2022 saurabh.dey@dal.ca Problem #2 Submission Requirements: 1. Upload your program code for Problem 2 (Task 4) to gitlab (https://git.cs.dal.ca). 2. Provide summary needed for Task 1, and screenshots 3. Provide Queries of all transactions you performed in Task2, Task3, and Task4. 4. Provide the filled worksheet