## Description

1. Catastrophic cancellation in the subtraction $x - y$ can occur when $x$ and $y$ are subject to round-off error. Specifically, if $\mathrm{fl}(x) = x(1 + u)$ and $\mathrm{fl}(y) = y(1 + v)$ then

$$\mathrm{fl}(x) - \mathrm{fl}(y) = x - y + (xu - yv)$$

where the absolute error $|xu - yv|$ can be very large if both $x$ and $y$ are large; in some cases, this error may swamp the quantity we are trying to compute, namely $x - y$, particularly if $|x - y|$ is small relative to $|x|$ and $|y|$. For example, if we compute the sample variance using the right-hand side of the identity

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2, \qquad (1)$$

a combination of round-off errors from the summations and catastrophic cancellation in the subtraction may result in the computation of a negative sample variance! (In older versions of Microsoft Excel, certain statistical calculations were prone to this unpleasant phenomenon.)
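To see the cancellation concretely, here is a small R sketch (my own illustration, not part of the problem) in which the one-pass formula on the right-hand side of (1) loses all of its correct digits:

```r
# Data with a large mean and a tiny spread: the true sum of squared
# deviations about the mean is exactly 2.
x <- c(1e8, 1e8 + 1, 1e8 + 2)

naive   <- sum(x^2) - sum(x)^2 / length(x)  # right-hand side of (1)
twopass <- sum((x - mean(x))^2)             # left-hand side of (1)

c(naive = naive, twopass = twopass)  # the naive value is not 2
```

In double precision, `sum(x^2)` is on the order of $3 \times 10^{16}$, so round-off errors of size 1 to 4 in each term swamp the true difference of 2.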

In this problem, we will consider two algorithms for computing the sample variance that avoid this catastrophic cancellation. Both are “one pass” algorithms, in the sense that we only need to cycle once through the data (as is the case if we use the right-hand side of (1)); to use the left-hand side of (1), we need two passes, since we must first compute $\bar{x}$ before computing the sum on the left-hand side of (1). In parts (a) and (b) below, define $\bar{x}_k$ to be the sample mean of $x_1, \cdots, x_k$ and note that

$$\bar{x}_{k+1} = \frac{k}{k+1}\,\bar{x}_k + \frac{1}{k+1}\,x_{k+1}$$

with $\bar{x} = \bar{x}_n$.

(a) Show that $\sum_{i=1}^{n} (x_i - \bar{x})^2$ can be computed using the recursion

$$\sum_{i=1}^{k+1} (x_i - \bar{x}_{k+1})^2 = \sum_{i=1}^{k} (x_i - \bar{x}_k)^2 + \frac{k}{k+1}\,(x_{k+1} - \bar{x}_k)^2$$

for $k = 1, \cdots, n-1$. (This is known as West's algorithm.)
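A minimal R sketch of this recursion (an illustration built from the formulas above, not a reference implementation): the running mean and the running sum of squared deviations are updated together in a single pass.

```r
west_ss <- function(x) {
  xbar <- x[1]  # running mean, xbar_k
  ss <- 0       # running sum of squared deviations about xbar_k
  for (k in seq_len(length(x) - 1)) {
    d <- x[k + 1] - xbar
    ss <- ss + (k / (k + 1)) * d^2                     # recursion from (a)
    xbar <- (k / (k + 1)) * xbar + x[k + 1] / (k + 1)  # mean recursion
  }
  ss
}

west_ss(c(1e8, 1e8 + 1, 1e8 + 2))  # returns 2: no catastrophic cancellation
```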

(b) A somewhat simpler one-pass method replaces $\bar{x}$ by some estimate $x_0$ and then corrects for the error in estimation. Specifically, if $x_0$ is an arbitrary number, show that

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - x_0)^2 - n(x_0 - \bar{x})^2.$$

(c) The key in using the formula in part (b) is to choose $x_0$ to avoid catastrophic cancellation; that is, $x_0$ should be close to $\bar{x}$. How might you choose $x_0$ (without first computing $\bar{x}$) to minimize the possibility of catastrophic cancellation? Ideally, $x_0$ should be calculated using $o(n)$ operations.
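One plausible choice (an assumption on my part, not necessarily the intended answer) is to take $x_0$ to be the mean of the first few observations, which costs $O(1)$ operations; the correction term then requires no second pass because $n(x_0 - \bar{x})^2 = \bigl(\sum_{i}(x_i - x_0)\bigr)^2 / n$.

```r
shifted_ss <- function(x, k0 = min(length(x), 10)) {
  x0 <- mean(x[1:k0])  # pilot estimate of the mean from the first k0 points
  d <- x - x0          # shifted data; small when x0 is close to xbar
  sum(d^2) - sum(d)^2 / length(x)  # identity from part (b)
}

shifted_ss(c(1e8, 1e8 + 1, 1e8 + 2))  # returns 2
```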

(An interesting paper on computational algorithms for computing the variance is “Algorithms for computing the sample variance: analysis and recommendations” by Chan, Golub, and LeVeque; this paper is available on Blackboard.)

2. Suppose that $X_1, X_2, \cdots$ is an infinite sequence of independent, identically distributed random variables with some distribution function $F$, and $N$ is a Poisson random variable with mean $\lambda$ that is independent of the sequence $\{X_i\}$. Then we can define a compound Poisson random variable $Y$ by

$$Y = \sum_{i=1}^{N} X_i$$

where $Y = 0$ if $N = 0$. This distribution arises naturally in risk theory and insurance – for example, if $N$ represents the number of claims and $X_1, X_2, \cdots$ the amounts paid for each claim, then $Y$ is the total sum paid. For the purposes of risk management, it is useful to know the distribution of $Y$, particularly its tail.

(a) Suppose that the $\{X_i\}$ are discrete integer-valued random variables with probability generating function $\phi(s) = E(s^{X_i})$. Show that the probability generating function of $Y$ is $g(\phi(s))$, where $g(s)$ is the probability generating function of $N$, which is given by

$$g(s) = \exp(-\lambda(1 - s)).$$

(b) The distribution of $Y$ can be approximated using the Fast Fourier Transform. Assume that the random variables $\{X_i\}$ have a distribution $p(x)$ on the integers $0, 1, \cdots, \ell$. The complication is that, unlike the distribution of $S = X_1 + \cdots + X_n$, the distribution of $Y$ is not concentrated on a finite set of integers. Therefore, we need to find an integer $M$ such that $P(Y \ge M)$ is smaller than some pre-determined threshold $\epsilon$; $M$ will depend on the Poisson mean $\lambda$ as well as the integer $\ell$ (or, more precisely, the distribution $p(x)$). (Also, to optimize the FFT algorithm, $M$ should be a power of 2, although this isn't absolutely necessary unless $M$ is very large.) Show that if $P(N \ge m) \le \epsilon$ then $P(Y \ge m\ell) \le \epsilon$, and so we can take $M = m\ell$.
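This conservative bound is easy to compute with R's Poisson quantile function; here is a sketch using the values $\lambda = 7$, $\epsilon = 10^{-5}$, and $\ell = 6$ from part (d) (the variable names are my own):

```r
lambda <- 7; eps <- 1e-5; ell <- 6
# Smallest m with P(N >= m) <= eps: qpois gives the smallest q with
# P(N <= q) >= 1 - eps, so take m = q + 1.
m <- qpois(1 - eps, lambda) + 1
M_conservative <- m * ell
c(m = m, M = M_conservative)
```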

(c) The bound $M$ determined in part (b) is typically very conservative and can be decreased substantially. One approach to determining a better bound is based on the probability generating function of $Y$ derived in part (a) and Markov's inequality. Specifically, if $s > 1$ we have

$$P(Y \ge M) = P(s^Y \ge s^M) \le \frac{E(s^Y)}{s^M} = \frac{\exp(-\lambda(1 - \phi(s)))}{s^M}.$$

Use this fact to show that, for $P(Y \ge M) < \epsilon$, we can take

$$M = \inf_{s > 1} \frac{-\ln(\epsilon) - \lambda(1 - \phi(s))}{\ln(s)}.$$
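As a hedged illustration (using the $\lambda$ and the distribution $p(x)$ given in part (d)), the infimum can be approximated by evaluating the bound on a grid of $s$ values; the grid endpoints and step are my own choices:

```r
lambda <- 7; eps <- 1e-5
p <- (4 - abs(3 - 0:6)) / 16  # p(x) for x = 0, ..., 6 (from part (d))
phi <- function(s) sapply(s, function(t) sum(p * t^(0:6)))  # pgf of X_i
bound <- function(s) (-log(eps) - lambda * (1 - phi(s))) / log(s)

s_grid <- seq(1.01, 3, by = 0.01)
M_markov <- min(bound(s_grid))  # approximately the infimum over s > 1
```

For these values the minimized bound is far below the conservative $m\ell$ bound of part (b); in practice one would round $M$ up, e.g. to the next power of 2.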

(d) Given $M$ (which depends on $\epsilon$), the algorithm for determining the distribution of $Y$ goes as follows:

1. Evaluate the DFT of $\{p(x) : x = 0, \cdots, M - 1\}$:
$$\hat{p}(j) = \sum_{x=0}^{M-1} \exp\left(\frac{2\pi i j x}{M}\right) p(x)$$
where $i = \sqrt{-1}$ and $p(\ell + 1) = \cdots = p(M - 1) = 0$.

2. Evaluate $g(\hat{p}(j)) = \exp(-\lambda(1 - \hat{p}(j)))$ for $j = 0, \cdots, M - 1$.

3. Evaluate $P(Y = y)$ by computing the inverse FFT:
$$P(Y = y) = \frac{1}{M} \sum_{j=0}^{M-1} \exp\left(-\frac{2\pi i j y}{M}\right) g(\hat{p}(j)).$$

Write an R function to implement this algorithm, where $M$ is determined using the method in part (c) with $\epsilon = 10^{-5}$. Use this function to evaluate the distribution of $Y$ in the case where $\lambda = 7$ and the distribution of the $\{X_i\}$ is

$$p(x) = P(X_i = x) = \frac{4 - |3 - x|}{16}$$

for $x = 0, \cdots, 6$.

(You do not need to evaluate the bound $M$ from part (c) with great precision; for example, a simple approach is to take a discrete set of points $S = \{1 < s_1 < s_2 < \cdots < s_k\}$ and define

$$M = \min_{s \in S} \frac{-\ln(\epsilon) - \lambda(1 - \phi(s))}{\ln(s)}$$

where the spacing $\delta = s_{i+1} - s_i$ and the endpoint $s_k$ are determined graphically (that is, by plotting the appropriate function) so that you are convinced that the value of $M$ is close to the actual infimum.)

On Blackboard, there is a paper by Embrechts and Frei, “Panjer recursion versus FFT for compound distributions”, which (not surprisingly) compares the FFT approach to an alternative approach (Panjer recursion). This paper includes some R code for the FFT algorithm described above and discusses some modifications to the basic FFT algorithm – feel free to use this R code as a template for your function. (Feel free to explore some of the other aspects of this paper for your own edification.)
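Putting parts (c) and (d) together, here is one possible sketch of the full algorithm (the function and variable names are my own, $M$ is rounded up to a power of 2, and the part (c) bound is minimized over a crude grid; treat it as a starting point rather than a definitive solution):

```r
compound_pois_fft <- function(lambda, p, eps = 1e-5) {
  ell <- length(p) - 1                   # p = (p(0), ..., p(ell))
  phi <- function(s) sum(p * s^(0:ell))  # pgf of the claim sizes
  # Part (c) bound, minimized over a crude grid of s > 1:
  s_grid <- seq(1.01, 3, by = 0.01)
  b <- sapply(s_grid, function(s) (-log(eps) - lambda * (1 - phi(s))) / log(s))
  M <- 2^ceiling(log2(min(b)))           # round up to a power of 2
  phat <- fft(c(p, rep(0, M - ell - 1))) # step 1: DFT of the zero-padded p
  ghat <- exp(-lambda * (1 - phat))      # step 2: apply the pgf of N pointwise
  Re(fft(ghat, inverse = TRUE)) / M      # step 3: inverse DFT gives P(Y = y)
}

p <- (4 - abs(3 - 0:6)) / 16  # claim-size distribution from part (d)
py <- compound_pois_fft(7, p)
sum(py)  # close to 1, as a sanity check
```

R's `fft()` uses the opposite sign convention in the exponent to the one displayed above, but since $p(x)$ is real the two conventions give the same (real) probabilities.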

Supplemental problems (not to hand in):

3. An alternative to the Quicksort algorithm is Mergesort (see p. 122 of the text). The Mergesort algorithm works by merging pairs of ordered lists. If the two lists contain $r$ and $s$ elements, respectively, then the number of comparisons needed to merge the two lists into a single list lies between $\min(r, s)$ and $r + s - 1$.

Define $C(n)$ to be the (random) number of comparisons in the Mergesort algorithm. It can be shown that the expected number of comparisons $E[C(n)]$ satisfies the recursive formula

$$E[C(n)] = E(Z_n) + E[C(\lfloor n/2 \rfloor)] + E[C(\lceil n/2 \rceil)]$$

where $\lfloor n/2 \rfloor = \lceil n/2 \rceil = n/2$ if $n$ is even, with $\lfloor n/2 \rfloor = (n-1)/2$ and $\lceil n/2 \rceil = (n+1)/2$ if $n$ is odd, and $Z_n$ is a random variable whose distribution is

$$P(Z_n = k) = \frac{\displaystyle \binom{k-1}{\lfloor n/2 \rfloor - 1} + \binom{k-1}{\lceil n/2 \rceil - 1}}{\displaystyle \binom{n}{\lfloor n/2 \rfloor}} \qquad \text{for } \lfloor n/2 \rfloor \le k \le n - 1$$

and whose expected value is

$$E(Z_n) = \lfloor n/2 \rfloor \lceil n/2 \rceil \left( \frac{1}{\lfloor n/2 \rfloor + 1} + \frac{1}{\lceil n/2 \rceil + 1} \right).$$

($Z_n$ represents the number of comparisons needed to merge two ordered lists of lengths $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$, respectively.)
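As a quick numerical sanity check (my own, not part of the problem), the stated formula for $E(Z_n)$ agrees with the mean computed directly from the displayed distribution:

```r
EZ_direct <- function(n) {
  r <- n %/% 2; s <- n - r  # floor(n/2) and ceiling(n/2)
  k <- r:(n - 1)            # support of Z_n
  sum(k * (choose(k - 1, r - 1) + choose(k - 1, s - 1)) / choose(n, r))
}
EZ_formula <- function(n) {
  r <- n %/% 2; s <- n - r
  r * s * (1 / (r + 1) + 1 / (s + 1))
}

EZ_direct(5)   # 3.5
EZ_formula(5)  # 3.5
```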

(a) In class, we derived the following recursive formula for $E[C(n)]$ for the Quicksort algorithm:

$$E[C(n)] = n - 1 + \frac{2}{n} \sum_{k=1}^{n} E[C(k-1)]$$

where $E[C(0)] = E[C(1)] = 0$. Using R, compute $E[C(n)]$ for $n = 1, \cdots, 1000$ and compare the average performance of Quicksort to that of Mergesort.
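One possible implementation of this comparison (an illustrative sketch; the running-sum trick and the choice `n_max <- 1000` are my own):

```r
n_max <- 1000
qs <- numeric(n_max)  # qs[n] = E[C(n)] for Quicksort,  E[C(1)] = 0
ms <- numeric(n_max)  # ms[n] = E[C(n)] for Mergesort,  E[C(1)] = 0
S <- 0                # running sum E[C(0)] + ... + E[C(n-1)]
for (n in 2:n_max) {
  S <- S + qs[n - 1]
  qs[n] <- n - 1 + 2 * S / n                 # Quicksort recursion
  r <- n %/% 2; s <- n - r                   # floor(n/2), ceiling(n/2)
  EZ <- r * s * (1 / (r + 1) + 1 / (s + 1))  # expected merge comparisons
  ms[n] <- EZ + ms[r] + ms[s]                # Mergesort recursion
}
c(quicksort = qs[n_max], mergesort = ms[n_max])
```

For these recursions Mergesort uses fewer comparisons on average, consistent with the known closed form $E[C(n)] = 2(n+1)H_n - 4n$ for Quicksort.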

(b) Calculate the worst case behaviour of C(n) for Mergesort.

4. Problem 3.6 of the text. As the hint suggests, prove the result for $n = p^k$ and then show that it holds for $n$ between $p^k$ and $p^{k+1}$. Use the result to show that the worst-case behaviour of $C(n)$ for Mergesort is $O(n \ln(n))$.