Description
1. A web server has “session” interarrival times that are independent and identically
distributed, following an exponential distribution with parameter
λ
= 5
sessions/minute.
(a) (2 marks) What is the mean interarrival time, in seconds?
(b) (2 marks) Suppose that we observe the system at some randomly chosen time T,
and find that 6 seconds have elapsed since the last arrival. What is the mean time
from T until the next arrival?
(c) (2 marks) Give the mean number of session arrivals in a randomly chosen period
of duration 2 minutes.
(d) (2 marks) Give the mean number of session arrivals in a period of duration 3
minutes, conditional on the first of these arrivals occurring exactly 18 seconds
into the period.
2. Consider the following random variables:
W: exponential with parameter
λ
;
X: Erlang with 3 exponential stages, each with parameter
γ
;
Y: Lognormal with parameters
μ =1.0, and
σ
;
1
Z: Pareto with minimum value k, and shape parameter of 1.25.
Suppose that each of these random variables has mean value 12.
(a) (4 marks) What are the values of
λ , γ , σ
, and k?
(b) (4 marks) For each random variable, give the probability that it is less than the
mean.
(c) (4 marks) For each random variable, give the probability that it exceeds 240.
1
I.e., lnY is normally distributed with mean value
μ
and variance
2 σ
. Note that the mean value
of the lognormal distribution with parameters
μ ,σ
is given by
μ (σ / 2)
2
+
e
, and Excel, for
example, has a built-in function for the lognormal cumulative distribution function.
3. (4 marks) Consider a content provider with a large collection of 10,000,000 available
items (for example, user-generated videos). Suppose that the popularities of these
items can be modeled as following a (bounded) Zipf(1.25) distribution. If the most
popular item is accessed at a rate of 1 request/second, what percentage of requests (in
total) are to the “cold” items with access rate less than 1 request/day? Repeat, but
now assuming a Zipf(0.85) distribution.
4. (6 marks) On the class web site, there is a file giving some artificially-generated data
on page reference counts for a hypothetical Web server. The data was generated by
sampling from a (bounded) Zipf(
) distribution, with 125 pages and a total of 3000
references. Give a graph with the natural logarithm (base e) of the page popularity
rank (based on the observed page reference counts) on the x-axis and the natural
logarithm of the page access frequency on the y-axis, and from your graph estimate
the value of
used when generating the data. (Note: the reference count for the i’th
page in the file was incremented whenever the value i was sampled from the Zipf(
)
distribution. However, sometimes by chance a page j ended up with more references
than a page k, for j > k, and so when making your graph you will need to sort the data
so that pages are ordered according to the actual reference counts.)
5. On the class web site, there are two traces of client request arrival times to a media
server. Each trace file contains one client arrival time per line, formatted as
hh:mm:ss, in the order the arrivals occurred. The sep27h12-16 trace records the
arrivals that occurred between 12:00 noon and 4:00pm on Sept. 27th, 2000. The
sep27 trace records all arrivals that occurred during a full 24-hour period from
3:00am on Sept. 27th to 3:00am on Sept. 28th
.
(a) (6 marks) For each trace, give the mean interarrival time and the squared
coefficient of variation of interarrival times. (The squared coefficient of variation
is defined as the variance divided by the square of the mean; for the exponential
distribution, for example, it is equal to 1.)
(b) (6 marks) For the sep27h12-16 trace, create a graph with a base 10 log scale and
maximum value 1 on the y-axis, and a linear scale from 1 to 250 seconds on the xaxis. On your graph plot: (i) the complementary cumulative distribution function
(CCDF, 1-F(x)) of the request interarrival times, (ii) the CCDF of an exponential
distribution with the same mean interarrival time as in the trace, and (iii) the
CCDF of a lognormal distribution with parameters
μ , σ
chosen using maximum
likelihood estimation. (Given n data points x1, x2, …, xn, the maximum likelihood
estimators for the lognormal distribution parameters are
( ) ( ( ) )
n
x
n
x
k k
−
=
=
2
ln μ
,σ
ln
μ
. When applying these formulas with the trace
file data, treat any interarrival times that were recorded as 0 seconds in the trace
file as interarrival times of 0.5 seconds instead.) Use symbols and spacings for
the plotted points, and lengths of the x and y axes, that make the shapes of the
curves easy to see and compare. Which one of the exponential and lognormal
distributions provides a better fit to the data?
(c) (6 marks) Repeat part (b), but with the sep27 trace instead.