CS6350 Big Data Management and Analytics Homework 1 Solution

Description

In this homework, you will use Hadoop MapReduce to analyze social network data.

Q1: Write a MapReduce program in Hadoop that implements a simple “mutual/common friend list of two friends”. The key idea is that two people who are friends tend to have many mutual/common friends. This program finds the mutual/common friend list for each such pair.

For example:

  • Alice’s friends are Bob, Sam, Sara, Nancy
  • Bob’s friends are Alice, Sam, Clara, Nancy
  • Sara’s friends are Alice, Sam, Clara, Nancy

Alice and Bob are friends, so their mutual friend list is [Sam, Nancy]. Sara and Bob are not friends, so their mutual friend list is empty (in this case you may exclude them from your output).

Input:

1. mutual.txt

The input contains the adjacency list and has multiple lines in the following format:

<User> <Friends>

Here, <User> is a unique integer ID corresponding to a unique user and <Friends> is a comma-separated list of unique IDs corresponding to the friends of the user with the unique ID <User>. Hence, each line represents a particular user’s friend list. Note that the friendships are mutual (i.e., edges are undirected): if A is a friend of B, then B is also a friend of A. The data provided is consistent with that rule, as there is an explicit entry for each side of each edge. So, when you form the pair for users A and B, always use either (A, B) or (B, A), but not both.

2. userdata.txt

The userdata.txt file contains dummy data with the following columns:

  • column1: userid
  • column2: firstname
  • column3: lastname
  • column4: address
  • column5: city
  • column6: state
  • column7: zipcode
  • column8: country
  • column9: username
  • column10: date of birth

Output:

The output should contain one line per pair of friends in the following format:

<User_A>, <User_B> <Mutual/Common Friend List>

where <User_A> and <User_B> are unique IDs corresponding to users A and B (A and B are friends), and <Mutual/Common Friend List> is a comma-separated list of unique IDs corresponding to the mutual friends of users A and B.
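The pairing rule in Q1 can be illustrated with a minimal sketch in plain Python. It is not Hadoop code and is not the assignment's reference solution; it only simulates the map, shuffle, and reduce steps in memory. It assumes a tab between the user ID and the friend list (the separator is not shown in the text above), and the names `parse_line`, `map_friends`, `reduce_pair`, `mutual_friends`, and `SAMPLE_LINES` are hypothetical. The sample data encodes the Alice/Bob/Sara example with made-up IDs (Alice=0, Bob=1, Sam=2, Sara=3, Nancy=4, Clara=5).

```python
from collections import defaultdict

# Hypothetical sample: the Alice/Bob/Sara example with made-up integer IDs.
# Assumption: user ID and friend list are separated by a tab.
SAMPLE_LINES = [
    "0\t1,2,3,4",  # Alice: Bob, Sam, Sara, Nancy
    "1\t0,2,4,5",  # Bob: Alice, Sam, Nancy, Clara
    "2\t0,1,3",    # Sam
    "3\t0,2,4,5",  # Sara: Alice, Sam, Nancy, Clara
    "4\t0,1,3",    # Nancy
    "5\t1,3",      # Clara
]


def parse_line(line):
    """Split '<user><TAB><f1,f2,...>' into (user, [friends])."""
    user, _, friends = line.strip().partition("\t")
    return user, [f for f in friends.split(",") if f]


def map_friends(user, friends):
    """Map step: for each friend, emit (ordered pair, this user's friend set).

    Ordering the pair numerically means (A, B) and (B, A) share one key,
    so each friendship is processed once.
    """
    friend_set = set(friends)
    for friend in friends:
        pair = tuple(sorted((user, friend), key=int))
        yield pair, friend_set


def reduce_pair(friend_sets):
    """Reduce step: intersect the two friend sets received for one pair."""
    if len(friend_sets) != 2:
        return []  # one side of the edge is missing from the input
    return sorted(friend_sets[0] & friend_sets[1], key=int)


def mutual_friends(lines):
    """Run map, group by key (standing in for the shuffle), then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        user, friends = parse_line(line)
        for pair, friend_set in map_friends(user, friends):
            grouped[pair].append(friend_set)
    result = {}
    for pair, friend_sets in grouped.items():
        common = reduce_pair(friend_sets)
        if common:  # pairs with no mutual friends are left out
            result[pair] = common
    return result


if __name__ == "__main__":
    for (a, b), common in sorted(mutual_friends(SAMPLE_LINES).items()):
        print(f"{a}, {b}\t{','.join(common)}")
```

In this sample, the pair (0, 1) yields the list 2, 4, which corresponds to [Sam, Nancy] in the example, and the pair (1, 3) never appears because Bob and Sara are not friends. In a real Hadoop job the grouping would be done by the framework's shuffle rather than by a dictionary.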
Please generate/print the mutual/common friend list for the following user pairs: (0, 1), (20, 28193), (1, 29826), (6222, 19272), (28041, 28056).

Q2. Please use an in-memory join at the Mapper to answer the following question. Given any two users who are friends as input, output the list of first names of their mutual friends and the number of unique states in which those mutual friends live. Note that userdata.txt will be used to get the extra user information and will be cached/replicated at each mapper.

Output format:

<User_A>, <User_B> <List of mutual friends [name1, name2, … namen], Number of unique states>

Sample output:

1234 4312 [John, Jane, Ted], 2

Q3. Please use an in-memory join at the Reducer to answer the following question. For each user, print the user ID and the average age of the direct friends of this user.

Output format:

<User ID> <Average age of direct friends>

Sample output:

1234 60

Q4. Please use a Combiner to answer the following question. Find the friend pair(s) whose number of common friends is the maximum among all pairs. Note that you need to use the same dataset as in Q1.

Output format:

<User_A>, <User_B> <Mutual/Common friend number>

Q5. Write a program that constructs an inverted index in the following way. The map function parses each line in an input file, userdata.txt, and emits a sequence of <word, line number> pairs. The reduce function accepts all pairs for a given word, sorts the corresponding line numbers, and emits a <word, list(line numbers)> pair. The set of all the output pairs forms a simple inverted index.

What to submit

(i) Submit the source code via the eLearning website.
(ii) Submit the output file for each question.
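The map and reduce functions described in Q5 can be sketched in plain Python as follows. This is an in-memory illustration under stated assumptions, not Hadoop code: it assumes words are separated by commas or whitespace and that line numbers start at 1, neither of which the assignment specifies. The names `map_line`, `reduce_word`, and `inverted_index` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a "word" is any run of characters without commas or whitespace.
TOKEN = re.compile(r"[^,\s]+")


def map_line(line_number, line):
    """Map step: emit a (word, line number) pair for every word on the line."""
    for word in TOKEN.findall(line):
        yield word, line_number


def reduce_word(word, line_numbers):
    """Reduce step: sort the line numbers collected for one word."""
    return word, sorted(line_numbers)


def inverted_index(lines):
    """Run map, group by word (standing in for the shuffle), then reduce."""
    grouped = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):  # 1-based by assumption
        for word, number in map_line(line_number, line):
            grouped[word].append(number)
    return dict(reduce_word(word, numbers) for word, numbers in grouped.items())


if __name__ == "__main__":
    # Hypothetical rows; the real userdata.txt has ten columns per line.
    rows = ["0,Ann,Lee,Dallas,Texas", "1,Bob,Lee,Austin,Texas"]
    for word, numbers in sorted(inverted_index(rows).items()):
        print(word, numbers)
```

One point to keep in mind when porting this to Hadoop: the default text input format keys each record by its byte offset in the file, not by a line number, so a real job would need its own way to obtain line numbers.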