## Description

Problem 1 (40 points):

Consider the following two processors. P1 has a clock rate of 4 GHz, an average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions.

a. One common fallacy is to consider the computer with the highest clock rate as having the highest performance. Check if this is true for P1 and P2.
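A minimal Python sketch of the comparison, using the standard relation CPU time = instruction count × CPI / clock rate. The function name `cpu_time` is hypothetical; the parameters are the ones given for P1 and P2.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Return CPU time in seconds: instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Parameters from Problem 1
p1_time = cpu_time(5.0e9, 0.9, 4.0e9)   # P1: 4 GHz, CPI 0.9, 5.0E9 instructions
p2_time = cpu_time(1.0e9, 0.75, 3.0e9)  # P2: 3 GHz, CPI 0.75, 1.0E9 instructions

print(f"P1: {p1_time:.3f} s, P2: {p2_time:.3f} s")
```

Performance is the reciprocal of CPU time, so the processor with the shorter time on its workload is the faster one, whatever its clock rate.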

b. Another fallacy is to assume that the processor executing the larger number of instructions will need more CPU time. Considering that processor P1 is executing a sequence of 1.0E9 instructions and that the CPIs of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.
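One way to set this up is to compute P1's time first and then invert the CPU time equation for P2 (instructions = time × clock rate / CPI). A short sketch; the helper name `instructions_in_time` is hypothetical.

```python
def instructions_in_time(seconds: float, cpi: float, clock_rate_hz: float) -> float:
    """Instructions completed in `seconds`: seconds x clock rate / CPI."""
    return seconds * clock_rate_hz / cpi


# Time P1 needs for 1.0E9 instructions at CPI 0.9 and 4 GHz
p1_time = 1.0e9 * 0.9 / 4.0e9
# Instructions P2 (CPI 0.75, 3 GHz) completes in that same time
p2_instructions = instructions_in_time(p1_time, 0.75, 3.0e9)
print(f"P1 time: {p1_time} s, P2 instructions: {p2_instructions:.3e}")
```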

c. A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors and to assume that the processor with the highest MIPS rating has the highest performance. Check if this is true for P1 and P2.
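MIPS is instruction count / (execution time × 10^6); substituting the CPU time equation gives clock rate / (CPI × 10^6), so the rating ignores how many instructions each program actually needs. A sketch with a hypothetical `mips` helper:

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """Native MIPS rating: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


mips_p1 = mips(4.0e9, 0.9)   # P1
mips_p2 = mips(3.0e9, 0.75)  # P2
print(f"P1: {mips_p1:.1f} MIPS, P2: {mips_p2:.1f} MIPS")
```

Comparing these ratings against the CPU times from part (a) shows whether MIPS ranks the processors the same way.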

d. Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as:

$$\text{MFLOPS} = \frac{\text{No. of FP operations}}{\text{Execution time} \times 10^6}$$

Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the processors.
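A sketch that applies the MFLOPS definition to both processors. It assumes (not stated in the problem) that each floating-point instruction performs exactly one floating-point operation; `mflops` and `PROCESSORS` are hypothetical names.

```python
FP_FRACTION = 0.40  # 40% of executed instructions are FP (from the problem)

# (instruction count, CPI, clock rate in Hz) for each processor, from Problem 1
PROCESSORS = {
    "P1": (5.0e9, 0.9, 4.0e9),
    "P2": (1.0e9, 0.75, 3.0e9),
}


def mflops(fp_operations: float, execution_time_s: float) -> float:
    """MFLOPS = number of FP operations / (execution time x 10^6)."""
    return fp_operations / (execution_time_s * 1e6)


results = {}
for name, (count, cpi, clock_hz) in PROCESSORS.items():
    execution_time = count * cpi / clock_hz
    # Assumption: each FP instruction performs exactly one FP operation
    results[name] = mflops(FP_FRACTION * count, execution_time)
    print(f"{name}: {results[name]:.1f} MFLOPS")
```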

Problem 2 (20 points):

A program P running on a single-processor system takes time T to complete. Let us assume that 40% of the program’s code is associated with “data management housekeeping” (according to Amdahl) and, therefore, can only execute sequentially on a single processor. Let us further assume that the rest of the program (60%) is “embarrassingly parallel” in that it can easily be divided into smaller tasks executing concurrently across multiple processors (without any interdependencies or communications among the tasks).

a. Calculate $T_2$, $T_4$, and $T_8$, the times to execute program P on two-, four-, and eight-processor systems, respectively.
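A sketch of the Amdahl-style model, assuming (an interpretation, since the problem speaks of 40% of the *code*) that the sequential part accounts for 40% of the single-processor execution time T and that the parallel part divides perfectly across processors. `parallel_time` is a hypothetical name; times are expressed as multiples of T.

```python
SERIAL_FRACTION = 0.40  # assumed: sequential share of the single-processor time T


def parallel_time(t_single: float, serial_fraction: float, processors: int) -> float:
    """Amdahl's law: the serial part is unchanged, the parallel part divides evenly."""
    return t_single * (serial_fraction + (1.0 - serial_fraction) / processors)


T = 1.0  # express results as multiples of T
times = {n: parallel_time(T, SERIAL_FRACTION, n) for n in (2, 4, 8)}
for n, t in times.items():
    print(f"T{n} = {t:.3f} T")
```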

b. Calculate $T_\infty$ on a system with an infinite number of processors. Calculate the speedup of the program on this system, where speedup is defined as $\frac{T}{T_\infty}$. What does this correspond to?
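A worked limit, assuming (as an interpretation of the problem) that the 40% sequential portion is 40% of the single-processor execution time T and that the parallel portion scales perfectly:

$$T_\infty = \lim_{n \to \infty} T\left(0.4 + \frac{0.6}{n}\right) = 0.4\,T, \qquad \frac{T}{T_\infty} = \frac{1}{0.4} = 2.5$$

In Amdahl's law, $1/s$ for a sequential fraction $s$ is the upper bound on speedup, no matter how many processors are added.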

Problem 3 (15 points):

Assume that we are considering enhancing a machine by adding a vector mode to it. When a computation is performed in vector mode, it is 20 times faster than the normal mode of execution. We call the percentage of time that could be spent using vector mode the percentage of vectorization.

a. What percentage of vectorization is needed to achieve a speedup of 2?

b. What percentage of vectorization is needed to achieve one-half of the maximum speedup attainable from using vector mode?
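Parts (a) and (b) both invert Amdahl's law, $S = 1/((1-f) + f/k)$, which gives $f = (1 - 1/S)/(1 - 1/k)$. The sketch below assumes the percentage of vectorization is measured as a fraction of the original (non-vector) execution time; `overall_speedup` and `fraction_for_speedup` are hypothetical helpers.

```python
VECTOR_SPEEDUP = 20  # vector mode is 20x faster than normal mode


def overall_speedup(vector_fraction: float, k: float) -> float:
    """Amdahl's law with a fraction of time sped up by a factor k."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / k)


def fraction_for_speedup(target: float, k: float) -> float:
    """Invert Amdahl's law: vectorized fraction needed for a target speedup."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / k)


f_a = fraction_for_speedup(2.0, VECTOR_SPEEDUP)
# The maximum speedup (fraction -> 1) is k itself, so half of it is k / 2
f_b = fraction_for_speedup(VECTOR_SPEEDUP / 2, VECTOR_SPEEDUP)
print(f"(a) {f_a:.2%}  (b) {f_b:.2%}")
```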

c. Suppose you have measured the percentage of vectorization for the programs to be 70%. The hardware design group says it can double the speed of vector mode with a significant additional engineering investment. You wonder whether the compiler crew could increase the use of vector mode as another approach to increasing performance. How much of an increase in the percentage of vectorization (relative to the current usage) would the compiler team need to obtain the same performance gain? Which investment would you recommend?
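A sketch comparing the two options, assuming that "double the speed" means vector mode becomes 40 times faster than normal mode (instead of 20) and that the vectorized share is a fraction of the original execution time. The helper names are hypothetical.

```python
def overall_speedup(vector_fraction: float, k: float) -> float:
    """Amdahl's law with a fraction of time sped up by a factor k."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / k)


def fraction_for_speedup(target: float, k: float) -> float:
    """Invert Amdahl's law: vectorized fraction needed for a target speedup."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / k)


CURRENT_FRACTION = 0.70
# Hardware option (assumption): vector mode runs 40x faster instead of 20x
hardware_speedup = overall_speedup(CURRENT_FRACTION, 40)
# Compiler option: keep 20x, raise the vectorized fraction to match that speedup
needed_fraction = fraction_for_speedup(hardware_speedup, 20)
relative_increase = needed_fraction / CURRENT_FRACTION - 1.0
print(f"speedup {hardware_speedup:.3f}, needed {needed_fraction:.2%}, "
      f"relative increase {relative_increase:.2%}")
```

The recommendation also depends on the relative cost of each effort, which the problem leaves qualitative.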

Problem 4 (15 points):

Assume a program requires the execution of $50 \times 10^6$ FP (floating-point) instructions, $110 \times 10^6$ INT (integer) instructions, $80 \times 10^6$ L/S (load/store) instructions, and $16 \times 10^6$ branch instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock rate.
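The baseline execution time is the sum of count × CPI over the instruction classes, divided by the clock rate. A sketch assuming each instruction count is a multiple of $10^6$; `COUNTS` and `CPI` are hypothetical names.

```python
CLOCK_RATE_HZ = 2.0e9

# Instruction counts, assumed to be in units of 10^6
COUNTS = {"FP": 50e6, "INT": 110e6, "L/S": 80e6, "Branch": 16e6}
CPI = {"FP": 1, "INT": 1, "L/S": 4, "Branch": 2}

# Total clock cycles = sum over classes of (count x CPI)
cycles = sum(COUNTS[c] * CPI[c] for c in COUNTS)
execution_time = cycles / CLOCK_RATE_HZ
print(f"{cycles:.3e} cycles, {execution_time * 1e3:.1f} ms")
```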

a. By how much must we improve the CPI of FP (floating-point) instructions if we want the program to run two times faster?

b. By how much must we improve the CPI of L/S (load/store) instructions if we want the program to run two times faster?

c. By how much is the execution time of the program improved if the CPIs of INT (integer) and FP (floating-point) instructions are reduced by 40% and the CPIs of L/S (load/store) and branch instructions are reduced by 30%?
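A sketch for parts (a) to (c), again assuming the counts are in units of $10^6$. Running twice as fast means halving the total cycle count; the clock rate cancels out of every ratio, so it is omitted. `total_cycles` and `required_cpi` are hypothetical helpers, and `required_cpi` returns `None` when even a CPI of zero for that class would not meet the target.

```python
COUNTS = {"FP": 50e6, "INT": 110e6, "L/S": 80e6, "Branch": 16e6}  # assumed 10^6 units
CPI = {"FP": 1, "INT": 1, "L/S": 4, "Branch": 2}


def total_cycles(cpi_table: dict) -> float:
    """Clock cycles for the program under a given per-class CPI table."""
    return sum(COUNTS[c] * cpi_table[c] for c in COUNTS)


def required_cpi(cls: str, target_speedup: float):
    """CPI that class `cls` would need for the whole program to run
    `target_speedup` times faster, or None if no positive CPI suffices."""
    target_cycles = total_cycles(CPI) / target_speedup
    other_cycles = sum(COUNTS[c] * CPI[c] for c in COUNTS if c != cls)
    needed = (target_cycles - other_cycles) / COUNTS[cls]
    return needed if needed > 0 else None


# Parts (a) and (b): 2x faster by improving a single class
fp_cpi = required_cpi("FP", 2.0)   # None: the other classes alone exceed the budget
ls_cpi = required_cpi("L/S", 2.0)

# Part (c): INT and FP CPIs cut by 40%, L/S and Branch CPIs cut by 30%
reduced = {"FP": CPI["FP"] * 0.6, "INT": CPI["INT"] * 0.6,
           "L/S": CPI["L/S"] * 0.7, "Branch": CPI["Branch"] * 0.7}
speedup_c = total_cycles(CPI) / total_cycles(reduced)
print(fp_cpi, ls_cpi, round(speedup_c, 3))
```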

Problem 5 (10 points):

Processor A has a clock rate of 3.6 GHz and a voltage of 1.25 V. Assume that, on average, it consumes 90 W of dynamic power.

Processor B has a clock rate of 3.4 GHz and a voltage of 0.9 V. Assume that, on average, it consumes 40 W of dynamic power.

For each processor, find the average capacitive load.
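The problem does not state the power model. This sketch assumes the common textbook form $P_\text{dynamic} = \tfrac{1}{2} \times C \times V^2 \times f$ and solves it for the capacitive load $C$; `capacitive_load` is a hypothetical name.

```python
def capacitive_load(dynamic_power_w: float, voltage_v: float, clock_hz: float) -> float:
    """Solve P = 1/2 * C * V^2 * f for C in farads (assumed power model)."""
    return 2.0 * dynamic_power_w / (voltage_v ** 2 * clock_hz)


c_a = capacitive_load(90.0, 1.25, 3.6e9)  # Processor A
c_b = capacitive_load(40.0, 0.9, 3.4e9)   # Processor B
print(f"A: {c_a:.2e} F, B: {c_b:.2e} F")
```

If your course uses $P = C \times V^2 \times f$ without the factor of ½, each load is half the value computed here.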