Description
The required task is to write a map-reduce program that will perform equijoin.
The code should be in Java (use Java 1.8.x) using HadoopFramework (use Hadoop 2.7.x).
The code would take two inputs, one would be the HDFS location of the file on which the
equijoin should be performed and other would be the HDFS location of the file, where the
output should be stored.
Format of the Input File: –
Table1Name, Table1JoinColumn, Table1Other Column1, Table1OtherColumn2, ……..
Table2Name, Table2JoinColumn, Table2Other Column1, Table2OtherColumn2, ………
Format of the Output File: –
If Table1JoinColumn value is equal to Table2JoinColumn value, simply append both line side by
side for Output. If Table1JoinColumn value does not match any value of Table2JoinColumn,
simply remove them for the output file. You should not include two joins contains same row (No
duplicate joins in output file).
Note: –
Table1JoinColumn and Table2JoinColumn would both be Integer or Real or Double or Float,
basically Numeric.
Example Input : –
R, 2, Don, Larson, Newark, 555-3221
S, 1, 33000, 10000, part1
S, 2, 18000, 2000, part1
R, 3, Sal, Maglite, Nutley, 555-6905
S, 3, 24000, 5000, part1
S, 4, 22000, 7000, part1
R, 4, Bob, Turley, Passaic, 555-8908
Example Output: –
R, 2, Don, Larson, Newark, 555-3221, S, 2, 18000, 2000, part1
S, 1, 33000, 10000, part1
S, 2, 18000, 2000, part1, R, 2, Don, Larson, Newark, 555-3221
R, 3, Sal, Maglite, Nutley, 555-6905, S, 3, 24000, 5000, part1
S, 3, 24000, 5000, part1, R, 3, Sal, Maglite, Nutley, 555-6905
S, 4, 22000, 7000, part1, R, 4, Bob, Turley, Passaic, 555-8908
R, 4, Bob, Turley, Passaic, 555-8908, S, 4, 22000, 7000, part1
Lines marked, as Red should not be present in the Output file, it is just for your clear
understanding.
Submission Instructions: –
Upload a zip file named Assignment4.zip, which will contain three files, equijoin.java,
equijoin.jar and a ReadMe.txt. I will use equijoin.jar to run the assignment. ReadMe.txt should
contain the approach you used for doing this work, basically how your mapper, reducer and
driver is working in short.
This is how I am going to run your submission: –
sudo -u