1. Problem Statement
Given four input matrices $A$, $B$, $C$ and $D$, compute the output matrix $X = (A + B^{T})\,C\,D^{T}$. Write efficient code to compute the output matrix. While writing the code, consider aspects such as memory coalescing, use of shared memory and the degree of thread divergence.
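One way to start is a separate kernel for $A + B^{T}$. The sketch below uses the standard shared-memory transpose pattern and is only an illustration. It assumes $A$ ($p \times q$) and $B$ ($q \times p$) are row-major `int` arrays and omits CUDA error checking. The names `addTransposeKernel`, `addTransposeOnDevice` and `TILE` are hypothetical and are not taken from main.cu. Each 32 × 32 thread block reads a tile of $B$ along its rows, stages it in shared memory and reads it back transposed. As a result, the global reads of $A$ and $B$ and the global write of the sum are all coalesced.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

#define TILE 32  // hypothetical tile edge; a 32 x 32 block uses the maximum of 1024 threads

// S = A + B^T, where A is p x q and B is q x p (both row-major).
__global__ void addTransposeKernel(const int* A, const int* B, int* S, int p, int q) {
    __shared__ int tile[TILE][TILE + 1];  // +1 padding column avoids shared-memory bank conflicts

    int rowBase = blockIdx.y * TILE;  // first row of S handled by this block
    int colBase = blockIdx.x * TILE;  // first column of S handled by this block

    // Coalesced load: consecutive threadIdx.x read consecutive elements of one row of B.
    int bRow = colBase + threadIdx.y;
    int bCol = rowBase + threadIdx.x;
    if (bRow < q && bCol < p)
        tile[threadIdx.y][threadIdx.x] = B[bRow * p + bCol];
    __syncthreads();

    // Coalesced store: S[i][j] = A[i][j] + B[j][i], and B[j][i] now sits at tile[tx][ty].
    int i = rowBase + threadIdx.y;
    int j = colBase + threadIdx.x;
    if (i < p && j < q)
        S[i * q + j] = A[i * q + j] + tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: copies A and B to the device, launches the kernel, returns S.
std::vector<int> addTransposeOnDevice(const std::vector<int>& A, const std::vector<int>& B,
                                      int p, int q) {
    size_t bytes = (size_t)p * q * sizeof(int);
    int *dA, *dB, *dS;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dS, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((q + TILE - 1) / TILE, (p + TILE - 1) / TILE);  // x covers columns, y covers rows
    addTransposeKernel<<<grid, block>>>(dA, dB, dS, p, q);

    std::vector<int> S((size_t)p * q);
    cudaMemcpy(S.data(), dS, bytes, cudaMemcpyDeviceToHost);  // blocking copy waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dS);
    return S;
}
```

The padding column shifts each row of `tile` by one bank. Because of this, the column-wise reads `tile[threadIdx.x][threadIdx.y]` within a warp fall into 32 different banks. Only blocks on the right and bottom edges contain threads that skip the load or the store.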
2. Input and Output
2.1. Input
• 4 integers: $p$, $q$, $r$ and $s$
• Matrix $A$ of size $p \times q$
• Matrix $B$ of size $q \times p$
• Matrix $C$ of size $q \times r$
• Matrix $D$ of size $s \times r$
2.2. Output
• Matrix $X$ of size $p \times s$
2.3. Constraints
• $2 \le p, q, r, s \le 2^{10}$
• All the elements in the input matrices will be in the range $[-10, 10]$.
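Checking the shapes confirms the sizes listed above. The constraints also bound every entry. Suppose the matrices are stored as 32-bit `int`; the element type used by main.cu is an assumption here. Then the bound shows that no intermediate or final value overflows:

```latex
\begin{aligned}
&\underbrace{(A + B^{T})}_{p \times q}\,\underbrace{C}_{q \times r}\,\underbrace{D^{T}}_{r \times s} = \underbrace{X}_{p \times s},
\qquad X_{ij} = \sum_{l=1}^{r} \Bigl( \sum_{k=1}^{q} (A_{ik} + B_{ki})\, C_{kl} \Bigr) D_{jl} \\
&|A_{ik} + B_{ki}| \le 20,
\qquad \Bigl| \sum_{k=1}^{q} (A_{ik} + B_{ki})\, C_{kl} \Bigr| \le 20 \cdot 10 \cdot q \le 204\,800 \\
&|X_{ij}| \le 204\,800 \cdot 10 \cdot r \le 2\,097\,152\,000 < 2^{31} - 1 = 2\,147\,483\,647
\end{aligned}
```

Every partial sum is bounded by the same sum of absolute values, so accumulating in `int` is safe in any summation order. The margin is small, though, and a wider type would be needed if the constraints were relaxed.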
3. Sample Testcase
• Input matrices $A$, $B$, $C$ and $D$. The input will be given as:
2 3 3 2
2 5 0
3 -2 1
6 1
-4 2
1 3
1 9 6
-6 7 2
2 4 -3
10 0 5
1 3 -3
The first line gives the values $p$, $q$, $r$ and $s$. The next $p$ lines give the rows of matrix $A$. The next $q$ lines give the rows of matrix $B$. The next $q$ lines give the rows of matrix $C$. The next $s$ lines give the rows of matrix $D$.
• $(A + B^{T})$
• Output matrix, $X = (A + B^{T})\,C\,D^{T}$
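Carrying the sample through by hand gives the matrices below. They are derived from the sample input above rather than quoted from an official expected output, so use them as a cross-check:

```latex
\begin{aligned}
A + B^{T} &= \begin{pmatrix} 2 & 5 & 0 \\ 3 & -2 & 1 \end{pmatrix}
           + \begin{pmatrix} 6 & -4 & 1 \\ 1 & 2 & 3 \end{pmatrix}
           = \begin{pmatrix} 8 & 1 & 1 \\ 4 & 0 & 4 \end{pmatrix} \\
(A + B^{T})\,C &= \begin{pmatrix} 8 & 1 & 1 \\ 4 & 0 & 4 \end{pmatrix}
                  \begin{pmatrix} 1 & 9 & 6 \\ -6 & 7 & 2 \\ 2 & 4 & -3 \end{pmatrix}
                = \begin{pmatrix} 4 & 83 & 47 \\ 12 & 52 & 12 \end{pmatrix} \\
X = (A + B^{T})\,C\,D^{T} &= \begin{pmatrix} 4 & 83 & 47 \\ 12 & 52 & 12 \end{pmatrix}
                              \begin{pmatrix} 10 & 1 \\ 0 & 3 \\ 5 & -3 \end{pmatrix}
                            = \begin{pmatrix} 275 & 112 \\ 180 & 132 \end{pmatrix}
\end{aligned}
```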
4. Points to be noted
• The file "main.cu" provided by us contains the code that reads the input, prints the result and prints the execution time.
• Don't write any code in the main() function.
• You need to implement the compute() function provided in "main.cu".
• You are free to use any number of functions/kernels.
• You can launch the kernels as you wish.
• It is compulsory to optimize for coalesced accesses. Also, make use of shared memory.
• Do not write any print statements.
• Test your code on large input matrices.
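Both matrix products can use the classic shared-memory tiled multiplication. The sketch below is one possible design, not a required one. It assumes row-major `int` arrays, omits CUDA error checking, and uses hypothetical names (`tiledMatMul`, `matMulOnDevice`, `TILE`).

A template flag lets the same kernel handle the $D^{T}$ step without building the transpose in global memory. Tiles of $D$ are read along its rows, which keeps the global loads coalesced, and are transposed as they are written to shared memory. Out-of-range elements are zero-padded, so the inner product loop has no branches. Only threads on the matrix edges diverge, and only at the final store.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

#define TILE 32  // hypothetical tile edge (32 x 32 = 1024 threads per block)

// Out (m x n) = A (m x k) * op(B), all row-major int arrays.
// TRANS_B == false: op(B) = B, with B stored as k x n.
// TRANS_B == true : op(B) = B^T, with B stored as n x k (e.g. the D^T step).
template <bool TRANS_B>
__global__ void tiledMatMul(const int* A, const int* B, int* Out, int m, int k, int n) {
    __shared__ int As[TILE][TILE];
    __shared__ int Bs[TILE][TILE + 1];  // padded so the transposed store is bank-conflict free

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int acc = 0;

    for (int t = 0; t < k; t += TILE) {
        // Each warp reads a contiguous piece of one row of A; zero-pad past the edges.
        int aCol = t + threadIdx.x;
        As[threadIdx.y][threadIdx.x] = (row < m && aCol < k) ? A[row * k + aCol] : 0;

        if (!TRANS_B) {
            int bRow = t + threadIdx.y;
            Bs[threadIdx.y][threadIdx.x] = (bRow < k && col < n) ? B[bRow * n + col] : 0;
        } else {
            // Read B along its rows (coalesced) and transpose while storing to shared memory.
            int bRow = blockIdx.x * TILE + threadIdx.y;
            int bCol = t + threadIdx.x;
            Bs[threadIdx.x][threadIdx.y] = (bRow < n && bCol < k) ? B[bRow * k + bCol] : 0;
        }
        __syncthreads();

        for (int e = 0; e < TILE; ++e)  // branch-free inner product thanks to zero padding
            acc += As[threadIdx.y][e] * Bs[e][threadIdx.x];
        __syncthreads();  // keep the tiles intact until every thread has used them
    }
    if (row < m && col < n)
        Out[row * n + col] = acc;
}

// Hypothetical host helper: copies both operands to the device and returns the product.
template <bool TRANS_B>
std::vector<int> matMulOnDevice(const std::vector<int>& A, const std::vector<int>& B,
                                int m, int k, int n) {
    int *dA, *dB, *dOut;
    cudaMalloc(&dA, A.size() * sizeof(int));
    cudaMalloc(&dB, B.size() * sizeof(int));
    cudaMalloc(&dOut, (size_t)m * n * sizeof(int));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (m + TILE - 1) / TILE);
    tiledMatMul<TRANS_B><<<grid, block>>>(dA, dB, dOut, m, k, n);

    std::vector<int> out((size_t)m * n);
    cudaMemcpy(out.data(), dOut, out.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

With these names, $(A + B^{T})C$ would be `matMulOnDevice<false>(S, C, p, q, r)` and the final result `matMulOnDevice<true>(Y, D, p, r, s)`, where `S` and `Y` are the intermediate matrices. Inside compute(), the intermediates would normally stay on the device between kernel launches instead of being copied back to the host.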
5. Submission Guidelines
• Use the file "main.cu" provided by us.
• Don't change anything in the main() function.
• Rename the file "main.cu", which contains the implementation of the functionality described above, to <roll number>.cu.
• For example, if your roll number is CS20M039, then the file you submit on Moodle should be named CS20M039.cu (submit only the .cu file).
• After submission, download the file and make sure it is the one you intended to submit.
6. Learning Suggestions
• Write a CPU version of the code that achieves the same functionality. Time the CPU code and the GPU code separately for large matrices and compare their performance.
• Exploit shared memory as much as possible to gain performance.
• Try to reduce thread divergence as much as possible.
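For the suggested CPU comparison, a straightforward host implementation can also serve as a reference when checking GPU output on large random inputs. The sketch below assumes the same row-major `int` layout. `computeOnCpu` and `timeMs` are hypothetical helpers and are not part of main.cu. The middle loop nest uses i-k-j order, so the innermost loop walks rows of $C$ and of the intermediate result contiguously.

```cuda
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical CPU reference for X = (A + B^T) C D^T with row-major int matrices:
// A is p x q, B is q x p, C is q x r, D is s x r; the result X is p x s.
std::vector<int> computeOnCpu(const std::vector<int>& A, const std::vector<int>& B,
                              const std::vector<int>& C, const std::vector<int>& D,
                              int p, int q, int r, int s) {
    std::vector<int> S((size_t)p * q);  // S = A + B^T
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < q; ++j)
            S[i * q + j] = A[i * q + j] + B[j * p + i];

    std::vector<int> Y((size_t)p * r, 0);  // Y = S C, i-k-j order keeps the inner loop row-wise
    for (int i = 0; i < p; ++i)
        for (int k = 0; k < q; ++k) {
            int sik = S[i * q + k];
            for (int j = 0; j < r; ++j)
                Y[i * r + j] += sik * C[k * r + j];
        }

    std::vector<int> X((size_t)p * s);  // X = Y D^T: dot product of row i of Y and row j of D
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < s; ++j) {
            int acc = 0;
            for (int k = 0; k < r; ++k)
                acc += Y[i * r + k] * D[j * r + k];
            X[i * s + j] = acc;
        }
    return X;
}

// Wall-clock time of a host-side callable, in milliseconds.
template <typename F>
double timeMs(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping a call to `computeOnCpu` on, for example, $1024 \times 1024$ inputs in `timeMs` gives the CPU time. That figure can be compared with the execution time printed by main.cu, and the CPU result can be compared element by element with the GPU output.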