Description
Question 1) Normal Distribution
We say x is a normal (Gaussian) random variable with parameters $\mu$ and $\sigma^2$ if its density function is given by:
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$
and its distribution function is given by:
$$F(x; \mu, \sigma^2) = \int_{-\infty}^{x} f(y; \mu, \sigma^2)\, dy$$
We can express $F(x; \mu, \sigma^2)$ in terms of the error function (erf) as follows:
$$F(x; \mu, \sigma^2) = \frac{1}{2}\,\operatorname{erf}\!\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right) + \frac{1}{2}$$
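The probability density function (pdf) and cumulative distribution function (cdf) of the normal
distribution can also be calculated using the two built-in functions norm.pdf and norm.cdf from the
scipy.stats package in Python. Note that these functions take the standard deviation $\sigma$ (the scale argument), not the variance $\sigma^2$.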
a) Write two Python functions of your own based on the above equations, one for calculating
the normal pdf and one for calculating the normal cdf (treating $x$, $\mu$, $\sigma^2$ as inputs of the
functions).
b) With x in the interval [-6, 6], calculate the pdf and cdf using the functions you wrote above, and plot
them for the following pairs of $(\mu, \sigma^2)$: (0, 1), (0, $10^{-1}$), (0, $10^{-2}$), (-3, 1), (-3, $10^{-1}$), (-3, $10^{-2}$).
(Please plot them in two figures: one containing all the pdf curves, and one
containing all the cdf curves.)
c) What can you observe about the effect of $\mu$ and $\sigma^2$ on the normal pdf and cdf curves?
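As a starting point for parts (a) and (b), the two equations above can be turned into code almost term by term. The following is a minimal sketch, not a reference solution: the function names `normal_pdf` and `normal_cdf`, the grid size `num_points`, and the `curves` dictionary are all assumptions made for illustration. It uses only the standard library and leaves the plotting to you.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Normal density f(x; mu, sigma2); sigma2 is the variance."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def normal_cdf(x, mu, sigma2):
    """Normal distribution function F(x; mu, sigma2), written with erf."""
    return 0.5 * math.erf((x - mu) / math.sqrt(2.0 * sigma2)) + 0.5

# Evenly spaced grid on [-6, 6] (the number of points is an arbitrary choice).
num_points = 601
xs = [-6.0 + 12.0 * i / (num_points - 1) for i in range(num_points)]

# The six (mu, sigma^2) pairs listed in part (b).
pairs = [(0, 1), (0, 1e-1), (0, 1e-2), (-3, 1), (-3, 1e-1), (-3, 1e-2)]

# For each pair, store the pdf values and the cdf values on the grid.
curves = {}
for mu, sigma2 in pairs:
    pdf_vals = [normal_pdf(x, mu, sigma2) for x in xs]
    cdf_vals = [normal_cdf(x, mu, sigma2) for x in xs]
    curves[(mu, sigma2)] = (pdf_vals, cdf_vals)
```

Each entry of `curves` can then be passed to a plotting call of your choice, one figure for the pdf curves and one for the cdf curves. Comparing a few values against norm.pdf and norm.cdf (with scale set to the square root of `sigma2`) is a simple way to check the functions.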
Question 2) Central Limit Theorem
Assuming $X_1, X_2, \dots, X_n$ are independent random variables having the same probability
distribution with mean $\mu$ and standard deviation $\sigma$, consider the sum $S_n = X_1 + X_2 + \dots + X_n$.
This sum $S_n$ is a random variable with mean $\mu_{S_n} = n\mu$ and standard deviation $\sigma_{S_n} = \sigma\sqrt{n}$.
The Central Limit Theorem states that as $n$ grows large, the probability distribution of the random variable $S_n$
will approach a normal distribution with mean $\mu_{S_n}$ and standard deviation $\sigma_{S_n}$, regardless of the
original distribution of the random variables $X_1, X_2, \dots, X_n$.
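The mean and standard deviation of the sum follow from linearity of expectation and, because the variables are independent, from additivity of variance. A short derivation using only the symbols defined above:

```latex
\begin{align*}
\mu_{S_n} &= E[X_1 + \dots + X_n] = \sum_{i=1}^{n} E[X_i] = n\mu, \\
\sigma_{S_n}^2 &= \operatorname{Var}(X_1 + \dots + X_n) = \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^2
\quad\Longrightarrow\quad \sigma_{S_n} = \sigma\sqrt{n}.
\end{align*}
```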
The PDF of the normally distributed random variable $S_n$ is given by:
$$f(x) = \frac{1}{\sigma_{S_n}\sqrt{2\pi}}\, \exp\!\left(-\frac{(x-\mu_{S_n})^2}{2\sigma_{S_n}^2}\right)$$
This problem will help you better understand the Central Limit Theorem. After
plotting the required plots, you will see that even if the distribution of an individual RV does not
look anything like a Gaussian, when you add enough identically distributed RVs together, the result is
approximately Gaussian, with a mean equal to the sum of the individual means of the RVs and a standard
deviation equal to the square root of the number of RVs summed times the individual RV's standard deviation.
Below is the question:
Consider a collection of books, each of which has thickness W. The thickness W is a random
variable, uniformly distributed between a minimum of a=1 cm and a maximum of b=3 cm. Use these
values of a and b to calculate the mean and standard deviation of the thickness.
Use the following table to report the results:
Mean thickness of a single book (cm) | Standard deviation of thickness (cm)
$\mu_w$ =                            | $\sigma_w$ =
The books are piled in stacks of n=1, 5, 10, or 15 books. The width $S_n$ of a stack of n books is a
random variable (the sum of the widths of the n books). This random variable has a mean
$\mu_{S_n} = n\mu_w$ and a standard deviation of $\sigma_{S_n} = \sigma_w\sqrt{n}$.
Calculate the mean and standard deviation of the stacked books, for the different values of n=1,
5, 10, or 15. Use the following table to report the results:
Number of books n | Mean thickness of a stack of n books (cm) | Standard deviation of the thickness for n books (cm)
n=1               | $\mu_{S_n}$ =                             | $\sigma_{S_n}$ =
n=5               | $\mu_{S_n}$ =                             | $\sigma_{S_n}$ =
n=15              | $\mu_{S_n}$ =                             | $\sigma_{S_n}$ =
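The entries of both tables can be computed directly from the formulas for a uniform random variable on [a, b] (mean (a+b)/2, standard deviation (b-a)/sqrt(12), the same expressions used in the example code of this handout) and from the scaling rules for the sum. The sketch below is one way to do it; the helper name `stack_stats` is hypothetical.

```python
import math

a, b = 1.0, 3.0  # minimum and maximum thickness in cm

# Single book: mean and standard deviation of a uniform RV on [a, b].
mu_w = (a + b) / 2.0
sigma_w = (b - a) / math.sqrt(12.0)

def stack_stats(n):
    """Return (mean, standard deviation) of the width of a stack of n books."""
    return n * mu_w, sigma_w * math.sqrt(n)

print(f"single book: mu_w = {mu_w:.4f} cm, sigma_w = {sigma_w:.4f} cm")
for n in (1, 5, 10, 15):
    mu_s, sigma_s = stack_stats(n)
    print(f"n = {n:2d}: mu_Sn = {mu_s:.4f} cm, sigma_Sn = {sigma_s:.4f} cm")
```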
Perform the following simulation experiments, and plot the results.
a) Make n=1 and run N=10,000 experiments, simulating the random variable $S = W_1$.
b) After the N experiments are completed, create and plot a probability histogram of the
random variable S.
c) On the same figure, plot the normal probability density function f(x), and compare
the probability histogram with the plot of f(x):
$$f(x) = \frac{1}{\sigma_{S_n}\sqrt{2\pi}}\, \exp\!\left(-\frac{(x-\mu_{S_n})^2}{2\sigma_{S_n}^2}\right)$$
d) Make n=5 and repeat steps (a)-(c)
e) Make n=15 and repeat steps (a)-(c)
Notice: For question 2, you need to submit:
The above tables.
The histograms for n = 1, 5, 15, each with the normal probability density plot overlaid.
Make sure that the graphs are properly labeled.
An example of creating the PDF graph for n=2 is shown below. The code provides a
suggestion on how to generate a bar graph for a continuous random variable X, which
represents the total book width for n=2, a=1, b=3.
Note that the value of "barwidth" is adjusted as the number of bins changes, to
provide a clear and understandable bar graph.
Also note that the "density=True" parameter is needed to ensure that the total area of the
bar graph is equal to 1.0.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# Generate the values of the RV X
N = 100000; nbooks = 2; a = 1; b = 3
mu_x = (a + b) / 2; sig_x = np.sqrt((b - a)**2 / 12)
X = np.zeros((N, 1))
for k in range(0, N):
    x = np.random.uniform(a, b, nbooks)  # thickness of each book in the stack
    w = np.sum(x)                        # total width of the stack
    X[k] = w
# Create bins and histogram
nbins = 30        # Number of bins
edgecolor = 'w'   # Color separating bars in the bar graph
#
bins = [float(x) for x in np.linspace(nbooks*a, nbooks*b, nbins+1)]
h1, bin_edges = np.histogram(X, bins, density=True)
# Define points on the horizontal axis (the bin centers)
be1 = bin_edges[0:np.size(bin_edges)-1]
be2 = bin_edges[1:np.size(bin_edges)]
b1 = (be1 + be2) / 2
barwidth = b1[1] - b1[0]  # Width of bars in the bar graph
plt.close('all')
# PLOT THE BAR GRAPH
fig1 = plt.figure(1)
plt.bar(b1, h1, width=barwidth, edgecolor=edgecolor)
# PLOT THE GAUSSIAN FUNCTION
def gaussian(mu, sig, z):
    f = np.exp(-(z-mu)**2/(2*sig**2))/(sig*np.sqrt(2*np.pi))
    return f
f = gaussian(mu_x*nbooks, sig_x*np.sqrt(nbooks), b1)
plt.plot(b1, f, 'r')
plt.show()
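The loop in the example can also be replaced by a single vectorized draw, which makes it easy to repeat the experiment for several stack sizes and compare the sample statistics with the theoretical values. The sketch below is one possible variant, not part of the original example: it assumes NumPy's `default_rng` generator, and the fixed seed and the `results` dictionary are there only for illustration and reproducibility.

```python
import numpy as np

a, b = 1.0, 3.0   # thickness limits in cm
N = 10_000        # number of experiments per stack size
mu_w = (a + b) / 2
sigma_w = (b - a) / np.sqrt(12)

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily

results = {}
for n in (1, 5, 15):
    # Each row is one experiment: n book thicknesses, summed into one stack width.
    S = rng.uniform(a, b, size=(N, n)).sum(axis=1)
    results[n] = (S.mean(), S.std(), n * mu_w, sigma_w * np.sqrt(n))
    print(f"n={n:2d}: sample mean={S.mean():.3f} (theory {n * mu_w:.3f}), "
          f"sample std={S.std():.3f} (theory {sigma_w * np.sqrt(n):.3f})")
```

The array `S` can be passed to `np.histogram` exactly as `X` is in the example above.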
Question 3) Distribution of the sum of exponential random variables
This problem involves a battery-operated critical medical monitor. The lifetime (T) of the battery is an
exponentially distributed random variable. A battery lasts an average of $\beta = 45$ days.
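Under these conditions, the PDF of the battery lifetime is given by:
$$f(t; \beta) = \frac{1}{\beta}\, e^{-t/\beta}, \quad t \ge 0$$
(and $f(t; \beta) = 0$ for $t < 0$).
The mean and standard deviation of the random variable T are:
$$\mu_T = \beta, \qquad \sigma_T = \beta$$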
When a battery fails it is replaced immediately by a new one. Batteries are purchased in a carton of 24.
The objective is to simulate the RV representing the lifetime of a carton of 24 batteries, and create a
histogram. To do this, follow the steps below.
a) Create a vector of 24 elements that represents a carton. Each one of the 24 elements
in the vector is an exponentially distributed random variable (T) as shown above,
with mean lifetime equal to $\beta$. Use the same procedure as in the previous problem to
generate the exponentially distributed random variable T.
Use the Python function "numpy.random.exponential(beta,n)" to generate n values
of the random variable T with exponential probability distribution. Its mean and
variance are given by:
$$\mu_T = \beta, \qquad \sigma_T^2 = \beta^2$$
b) The sum of the elements of this vector is a random variable (C), representing the life
of the carton, i.e.
$$C = T_1 + T_2 + \dots + T_{24}$$
where each $T_j$, j=1,2,…,24, is an exponentially distributed random variable. Create the
random variable C, i.e. simulate one carton of batteries. This is considered one experiment.
c) Repeat this experiment for a total of N=10,000 times, i.e. for N cartons. Use the
values from the N=10,000 experiments to create the experimental PDF of the
lifetime of a carton, f(c).
d) According to the Central Limit Theorem, the PDF for one carton of 24 batteries can
be approximated by a normal distribution with mean and standard deviation given
by:
$$\mu_C = 24\,\mu_T = 24\beta, \qquad \sigma_C = \sigma_T\sqrt{24} = \beta\sqrt{24}$$
Plot the graph of the normal distribution with mean $\mu_C$ and standard deviation $\sigma_C$ over the plot of
the experimental PDF on the same figure, and compare the results.
e) Create and plot the CDF of the lifetime of a carton, F(c). To do this, use the Python
“numpy.cumsum” function on the values you calculated for the experimental PDF.
Since the CDF is the integral of the PDF, you must multiply the PDF values by the
barwidth to calculate the areas, i.e. the integral of the PDF.
If your code is correct the CDF should be a nondecreasing graph, starting at 0.0 and
ending at 1.0.
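Steps (a) through (e) can be sketched compactly with NumPy. This is an illustrative outline under stated assumptions rather than the required solution: it uses the `default_rng` generator instead of `numpy.random.exponential` directly, the seed and the number of bins (`nbins = 50`) are arbitrary choices, and all variable names are made up for the example. Plotting is omitted.

```python
import numpy as np

beta = 45.0      # mean battery lifetime in days
n_batt = 24      # batteries per carton
N = 10_000       # number of cartons (experiments)

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily

# (a)-(c): each row holds 24 exponential lifetimes; the row sum is one carton lifetime C.
C = rng.exponential(beta, size=(N, n_batt)).sum(axis=1)

# Experimental PDF: normalized histogram, evaluated at the bin centers.
nbins = 50
pdf_exp, bin_edges = np.histogram(C, bins=nbins, density=True)
centers = (bin_edges[:-1] + bin_edges[1:]) / 2
barwidth = bin_edges[1] - bin_edges[0]

# (d): normal approximation suggested by the Central Limit Theorem.
mu_C = n_batt * beta
sigma_C = beta * np.sqrt(n_batt)
pdf_normal = np.exp(-(centers - mu_C) ** 2 / (2 * sigma_C ** 2)) / (sigma_C * np.sqrt(2 * np.pi))

# (e): experimental CDF as the running sum of the bar areas (PDF value times bar width).
cdf_exp = np.cumsum(pdf_exp * barwidth)
```

Here `cdf_exp[k]` approximates F at the right edge of bin k, `bin_edges[k + 1]`, which is worth keeping in mind when reading values off the CDF graph.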
Answer the following questions:
1. Find the probability that the carton will last longer than three years, i.e.
$P(C > 3 \cdot 365) = 1 - P(C \le 3 \cdot 365) = 1 - F(1095)$. Use the graph of the CDF F(c) to estimate this
probability.
2. Find the probability that the carton will last between 2.0 and 2.5 years (i.e. between 730 and
912 days): $P(730 < C < 912) = F(912) - F(730)$. Use the graph of the CDF F(c) to
estimate this probability.
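The values read from the experimental CDF graph can be cross-checked against the normal approximation from the Central Limit Theorem, using the erf expression for the normal CDF given in Question 1. The sketch below is only such a sanity check and does not replace the graph-based estimate the questions ask for: it assumes the carton lifetime is approximately normal with mean 24β and standard deviation β√24, and the function name `normal_cdf` is hypothetical.

```python
import math

beta = 45.0                       # mean battery lifetime in days
n_batt = 24                       # batteries per carton
mu_C = n_batt * beta              # approximate mean carton lifetime
sigma_C = beta * math.sqrt(n_batt)  # approximate standard deviation

def normal_cdf(x, mu, sigma):
    """Normal CDF written with erf; sigma is the standard deviation."""
    return 0.5 * math.erf((x - mu) / (sigma * math.sqrt(2.0))) + 0.5

# Question 1: P(C > 1095) = 1 - F(1095)
p_longer_than_3_years = 1.0 - normal_cdf(1095.0, mu_C, sigma_C)

# Question 2: P(730 < C < 912) = F(912) - F(730)
p_between = normal_cdf(912.0, mu_C, sigma_C) - normal_cdf(730.0, mu_C, sigma_C)

print(f"P(C > 1095)      ~ {p_longer_than_3_years:.3f}")
print(f"P(730 < C < 912) ~ {p_between:.3f}")
```

Because the normal curve is only an approximation to the distribution of the sum, small differences between these numbers and the values estimated from the simulated CDF are to be expected.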