Description
Problem 1 (40 points):
Consider the following two processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the
execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of 0.75, and requires the
execution of 1.0E9 instructions.
a. One usual fallacy is to consider the computer with the largest clock rate as having the highest
performance. Check if this is true for P1 and P2.
b. Another fallacy is to consider that the processor executing the largest number of instructions will
need a larger CPU time. Considering that processor P1 is executing a sequence of 1.0E9
instructions and that the CPIs of processors P1 and P2 do not change, determine the number of
instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.
c. A common fallacy is to use MIPS (millions of instructions per second) to compare the
performance of two different processors and consider that the processor with the largest MIPS has
the largest performance. Check if this is true for P1 and P2.
d. Another common performance figure is MFLOPS (millions of floating-point operations per
second), defined as:
MFLOPS = No. FP operations / (execution time × 1E6)
Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions.
Find the MFLOPS figures for the processors.
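The quantities in Problem 1 all follow from the classic CPU time relation, CPU time = instruction count × CPI / clock rate. The sketch below is one possible way to set up the arithmetic, not an official solution; the function names (cpu_time, mips, mflops, instructions_in_time) are hypothetical helpers introduced only for illustration.

```python
# Minimal sketch of the performance formulas used in Problem 1.
# All function names are illustrative assumptions, not part of the assignment.

def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds = instruction count * CPI / clock rate."""
    return instructions * cpi / clock_hz

def instructions_in_time(time_s, cpi, clock_hz):
    """Number of instructions a processor can execute in time_s seconds."""
    return time_s * clock_hz / cpi

def mips(clock_hz, cpi):
    """MIPS = clock rate / (CPI * 1E6)."""
    return clock_hz / (cpi * 1e6)

def mflops(fp_fraction, instructions, exec_time_s):
    """MFLOPS = No. FP operations / (execution time * 1E6)."""
    return fp_fraction * instructions / (exec_time_s * 1e6)

if __name__ == "__main__":
    # Values taken from the problem statement.
    t1 = cpu_time(5.0e9, 0.9, 4e9)
    t2 = cpu_time(1.0e9, 0.75, 3e9)
    print("P1 time:", t1, "s; P2 time:", t2, "s")
    print("P1 MIPS:", mips(4e9, 0.9), "P2 MIPS:", mips(3e9, 0.75))
    print("P1 MFLOPS:", mflops(0.4, 5.0e9, t1), "P2 MFLOPS:", mflops(0.4, 1.0e9, t2))
```

Comparing the printed times against the clock rates and MIPS figures is what parts a and c ask you to reason about.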
Problem 2 (20 points):
A program P running on a single-processor system takes time T to complete. Let us assume that 40% of
the program’s code is associated with “data management housekeeping” (according to Amdahl) and,
therefore, can only execute sequentially on a single processor. Let us further assume that the rest of the
program (60%) is “embarrassingly parallel” in that it can easily be divided into smaller tasks executing
concurrently across multiple processors (without any interdependencies or communications among the
tasks).
a. Calculate T2, T4, T8, which are the times to execute program P on a two-, four-, eight-processor
system, respectively.
b. Calculate T∞ on a system with an infinite number of processors. Calculate the speedup of the
program on this system, where speedup is defined as T / T∞. What does this correspond to?
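Problem 2 is an application of Amdahl's law: with serial fraction s, the time on n processors is T × (s + (1 − s) / n). The following is a minimal sketch under that model, with T normalized to 1; the names parallel_time and limiting_speedup are hypothetical and chosen only for this illustration.

```python
# Minimal sketch of Amdahl's law for Problem 2.
# Function names are illustrative assumptions, not part of the assignment.

def parallel_time(t, serial_fraction, n):
    """Execution time on n processors when serial_fraction cannot be parallelized."""
    return t * (serial_fraction + (1.0 - serial_fraction) / n)

def limiting_speedup(serial_fraction):
    """Speedup T / T_infinity as the processor count grows without bound."""
    return 1.0 / serial_fraction

if __name__ == "__main__":
    T = 1.0  # normalized single-processor time
    for n in (2, 4, 8):
        print("T%d = %.4f * T" % (n, parallel_time(T, 0.4, n)))
    print("limiting speedup:", limiting_speedup(0.4))
```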
Problem 3 (15 points):
Assume that we are considering enhancing a machine by adding a vector mode to it. When a computation
is performed in vector mode, it is 20 times faster than the normal mode of execution. We call the percentage
of time that could be spent using vector mode the percentage of vectorization.
a. What percentage of vectorization is needed to achieve a speedup of 2?
b. What percentage of vectorization is needed to achieve one-half of the maximum speedup
attainable from using vector mode?
c. Suppose you have measured the percentage of vectorization for programs to be 70%. The
hardware design group says they can double the speed of vector mode with a significant additional
engineering investment. You wonder whether the compiler crew could increase the use of vector
mode as another approach to increasing performance. How much of an increase in the percentage
of vectorization (relative to the current usage) would the compiler team need to obtain the same
performance gain? Which investment would you recommend?
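For Problem 3, the same Amdahl-style relation applies with the vectorized fraction f and vector speed factor k: speedup = 1 / ((1 − f) + f / k). Solving for f gives f = (1 − 1/S) / (1 − 1/k). The sketch below encodes both directions; it is an illustration under that model, and the names vector_speedup and required_fraction are hypothetical.

```python
# Minimal sketch of the vectorization speedup model for Problem 3.
# Function names are illustrative assumptions, not part of the assignment.

def vector_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

def required_fraction(target_speedup, factor):
    """Fraction of vectorization needed to reach target_speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / factor)

if __name__ == "__main__":
    print("fraction for speedup 2:", required_fraction(2.0, 20.0))
    # Part c: the hardware option doubles the vector factor from 20 to 40.
    hw = vector_speedup(0.7, 40.0)
    print("speedup with doubled vector speed:", hw)
    print("fraction needed at factor 20 to match:", required_fraction(hw, 20.0))
```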
Problem 4 (15 points):
Assume a program requires the execution of 50 × 10^6 FP (Floating Point) instructions, 110 × 10^6 INT
(integer) instructions, 80 × 10^6 L/S (Load/Store) instructions, and 16 × 10^6 branch instructions. The CPI
for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock
rate.
a. By how much must we improve the CPI of FP (Floating Point) instructions if we want the
program to run two times faster?
b. By how much must we improve the CPI of L/S (Load/Store) instructions if we want the program
to run two times faster?
c. By how much is the execution time of the program improved if the CPI of INT (Integer) and FP
(Floating Point) instructions are reduced by 40% and the CPI of L/S (Load/Store) and Branch is
reduced by 30%?
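Problem 4 rests on summing clock cycles per instruction class: execution time = Σ (count × CPI) / clock rate. The sketch below shows one way to evaluate it. It assumes the instruction counts are in units of 10^6 (the exponent is an assumption wherever it is not legible in the text), and the name exec_time and the dictionary keys are hypothetical.

```python
# Minimal sketch of the per-class CPI model for Problem 4.
# ASSUMPTION: instruction counts are multiples of 10^6.
# Names and dictionary keys are illustrative, not part of the assignment.

def exec_time(counts, cpis, clock_hz):
    """Execution time in seconds from per-class instruction counts and CPIs."""
    cycles = sum(counts[k] * cpis[k] for k in counts)
    return cycles / clock_hz

COUNTS = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}
CPIS = {"fp": 1.0, "int": 1.0, "ls": 4.0, "branch": 2.0}
CLOCK_HZ = 2e9

if __name__ == "__main__":
    base = exec_time(COUNTS, CPIS, CLOCK_HZ)
    print("baseline time:", base, "s")
    # Part c: INT and FP CPIs reduced by 40%, L/S and branch CPIs by 30%.
    improved = {"fp": 0.6, "int": 0.6, "ls": 2.8, "branch": 1.4}
    print("improved time:", exec_time(COUNTS, improved, CLOCK_HZ), "s")
```

For parts a and b, the same function can be called with a trial CPI for one class to check whether half the baseline time is reachable at all.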
Problem 5 (10 points):
Processor A has a clock rate of 3.6 GHz and voltage of 1.25 V. Assume that, on average, it consumes 90 W of
dynamic power.
Processor B has a clock rate of 3.4 GHz and voltage of 0.9 V. Assume that, on average, it consumes
40 W of dynamic power.
For each processor find the average capacitive loads.
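Problem 5 can be approached with the usual dynamic power model, P = 1/2 × C × V² × f, rearranged to C = 2P / (V² × f). The sketch below assumes that model, including the factor of 1/2 (an assumption here; some texts omit it); the name capacitive_load is hypothetical.

```python
# Minimal sketch for Problem 5.
# ASSUMPTION: dynamic power P = 0.5 * C * V^2 * f (the 1/2 factor is assumed).
# The function name is illustrative, not part of the assignment.

def capacitive_load(power_w, voltage_v, clock_hz):
    """Average capacitive load in farads, from P = 0.5 * C * V^2 * f."""
    return 2.0 * power_w / (voltage_v ** 2 * clock_hz)

if __name__ == "__main__":
    print("Processor A:", capacitive_load(90.0, 1.25, 3.6e9), "F")
    print("Processor B:", capacitive_load(40.0, 0.9, 3.4e9), "F")
```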