EE381 Project 5: Confidence Intervals

I. Introduction and Background Material

Sample size and confidence intervals
Assume that you are measuring a statistic in a large population of size N. The statistic has mean
$\mu$ and standard deviation $\sigma$.

Drawing a sample of size n from the population produces a
distribution for the sample mean ($\bar{X}$) with:
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
In this project we will explore the relation of $\bar{X}$ to the population mean $\mu$.

• As a first example, consider a barrel of a million ball bearings (i.e. population size
N=1,000,000) where someone has actually weighed all one million of them and found
the exact mean to be $\mu = 100$ grams and the exact standard deviation to be $\sigma = 12$
grams. This is obviously an unrealistic assumption, but assume for the time being that
these parameters have been measured exactly.

• Now pick a sample of size n (for example n=5) of bearings from the barrel, weigh them,
and find the mean of the sample, $\bar{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$.
• Next take a larger sample (for example n=10) and find the new mean.
• Continue this process for larger and larger n, until n=100.

• Plot the points $(n, \bar{X}_n)$ using a point marker (for example a blue 'x') as shown in Figure 1.
• Next, for each value of n, calculate the standard deviation of the sample mean from
$\sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}}$ and plot:
i. The values of $\mu \pm 1.96 \frac{\sigma}{\sqrt{n}}$ as a function of n, shown as the red curves in Figure
1a. These curves define the 95% confidence interval, which means that
approximately 95% of the sample means will fall within the two red curves in
Figure 1a. This can also be visually confirmed by looking at how many of the
sample means fall outside of the red curves (approx. 5%).
ii. The values of $\mu \pm 2.58 \frac{\sigma}{\sqrt{n}}$ as a function of n, shown as the green curves in Figure
1b. These curves define the 99% confidence interval, which means that
approximately 99% of the sample means will fall within the two green curves in
Figure 1b.

Figure 1: Sample mean as a function of the sample size
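
The procedure behind Figure 1 can be sketched in Python as follows. This is a minimal illustration, not a reference solution: it assumes the bearing weights are normally distributed, uses an arbitrary fixed seed, and the helper name `band` is hypothetical. Plotting is left out; the computed points and band limits could be passed to any plotting library.

```python
import math
import random

MU, SIGMA = 100.0, 12.0   # population mean and standard deviation from the text
N = 1_000_000             # population size from the text

rng = random.Random(0)    # arbitrary fixed seed, chosen only for repeatability
# Assumption: weights are normally distributed
barrel = [rng.gauss(MU, SIGMA) for _ in range(N)]

def band(n, z):
    """Return (lower, upper) = MU -/+ z * SIGMA / sqrt(n)."""
    half_width = z * SIGMA / math.sqrt(n)
    return MU - half_width, MU + half_width

sizes = range(1, 101)
means = []
for n in sizes:
    sample = rng.sample(barrel, n)     # draw n bearings without replacement
    means.append(sum(sample) / n)      # sample mean for this n

# Count the sample means that fall outside the 95% curves
outside_95 = 0
for n, mean in zip(sizes, means):
    lower, upper = band(n, 1.96)
    if not lower <= mean <= upper:
        outside_95 += 1

print(f"{outside_95} of {len(means)} sample means fall outside the 95% band")
```

Replacing 1.96 with 2.58 in the call to `band` gives the limits of the 99% curves.
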
Using the sample mean to estimate the population mean

In reference to the previous section, it is obviously unrealistic to think that anyone actually
measured the exact mean and standard deviation of all one million ball bearings. More
realistically, you would not have any idea what the mean or standard deviation was, and you
would need to weigh random samples of different sizes (for example n=5, 35, or 100 bearings)
and then draw reasonable conclusions about the weight distribution of all one million bearings.

To simulate this problem, generate a barrel of a million ball bearings with weights normally
distributed, with a mean $\mu$ and a standard deviation $\sigma$.

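As an example, take a sample of n bearings from the population of N=1,000,000. Then calculate
the mean of the sample:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
The standard deviation of the sample mean can be calculated by:
$$\sigma_n = \frac{\hat{S}}{\sqrt{n}}$$
where $\hat{S}$ is the standard deviation of the sample.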
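
As a small numerical illustration of these two quantities, the sketch below uses five made-up weights (hypothetical values, not data from the project). It assumes the sample standard deviation is computed with the n-1 divisor, which is what `statistics.stdev` does.

```python
import math
import statistics

# Hypothetical sample of n = 5 bearing weights in grams (illustrative values only)
sample = [98.0, 104.0, 91.0, 110.0, 97.0]

n = len(sample)
x_bar = statistics.mean(sample)     # sample mean
s_hat = statistics.stdev(sample)    # sample standard deviation (divides by n - 1)
std_err = s_hat / math.sqrt(n)      # standard deviation of the sample mean

print(f"sample mean = {x_bar:.2f} g")
print(f"sample standard deviation = {s_hat:.2f} g")
print(f"standard deviation of the sample mean = {std_err:.2f} g")
```
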

The question is: Can the value of $\bar{X}$, which is calculated for a sample of size n, be used to
estimate the mean $\mu$ of the population of N=1,000,000 bearings?

The answer is given in terms of confidence intervals, typically the 95% and 99% confidence
intervals.
Large samples (n ≥ 30)
Consider the standardized variable $z = \frac{\bar{X} - \mu}{\sigma_n}$. Based on the Central Limit Theorem, it is known
that for large samples the standardized variable will approach the normal distribution.

The 95% confidence interval for large samples. The 95% confidence interval is determined by
the critical values $[-z_c, z_c]$ such that
$$P(-z_c \le z \le z_c) = 0.95$$
From the tables of the normal distribution it is seen that these critical values correspond to
$z_c = 1.96$ and $-z_c = -1.96$.

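Hence:
$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma_n} \le 1.96\right) = 0.95$$
which is written as
$$P\left(\bar{X} - 1.96\frac{\hat{S}}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\hat{S}}{\sqrt{n}}\right) = 0.95$$
If you define $\mu_{lower} = \bar{X} - 1.96\frac{\hat{S}}{\sqrt{n}}$ and $\mu_{upper} = \bar{X} + 1.96\frac{\hat{S}}{\sqrt{n}}$, then the above equation is written
as:
$$P(\mu_{lower} \le \mu \le \mu_{upper}) = 0.95$$
This equation can be interpreted as follows: Based on a sample of size n, we are 95% confident
that the mean $\mu$ of the population lies in the interval $[\mu_{lower}, \mu_{upper}]$.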

Another way to interpret this statement is:
i. Obtain a large number of different samples, of size n each
ii. For each of these samples calculate $\bar{X}$ and $\sigma_n = \frac{\hat{S}}{\sqrt{n}}$
iii. For each of these samples generate the corresponding interval $[\mu_{lower}, \mu_{upper}]$.
iv. Then the mean $\mu$ of the population will be contained in approximately 95% of these intervals.
For example, if you obtain 500 samples of size n each, and you create the corresponding 500
intervals $[\mu_{lower}, \mu_{upper}]$, then about 475 of these intervals (95% of 500) are expected to contain
the population mean $\mu$.
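
The 500-sample interpretation can be checked with a short simulation. The sketch below is illustrative only: the sample size n=100 is an assumed value (the text only requires a large sample), the seed is arbitrary, and each sample is drawn directly from a normal distribution with mean 100 and standard deviation 12 rather than from a stored barrel.

```python
import math
import random
import statistics

MU, SIGMA = 100.0, 12.0   # population parameters used earlier in the text
n = 100                   # assumed large sample size (n >= 30)
num_samples = 500         # number of samples, as in the example above
z_c = 1.96                # 95% critical value from the text

rng = random.Random(2)    # arbitrary fixed seed
hits = 0
for _ in range(num_samples):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    half_width = z_c * statistics.stdev(sample) / math.sqrt(n)
    # Success if the interval [x_bar - half_width, x_bar + half_width] contains MU
    if x_bar - half_width <= MU <= x_bar + half_width:
        hits += 1

print(f"{hits} of {num_samples} intervals contain the population mean")
```

The count varies from run to run (or from seed to seed) but should stay close to 475.
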

The 99% confidence interval for large samples. Similarly, the 99% confidence interval is
determined by the critical values $[-z_c, z_c]$ such that
$$P(-z_c \le z \le z_c) = 0.99$$
From the tables of the normal distribution it is seen that these critical values correspond to
$z_c = 2.58$ and $-z_c = -2.58$.

Hence, if you define $\mu_{lower} = \bar{X} - 2.58\frac{\hat{S}}{\sqrt{n}}$ and $\mu_{upper} = \bar{X} + 2.58\frac{\hat{S}}{\sqrt{n}}$, then you can write:
$$P(\mu_{lower} \le \mu \le \mu_{upper}) = 0.99$$
which can be interpreted as follows: Based on a sample of size n, we are 99% confident that the
mean $\mu$ of the population lies in the interval $[\mu_{lower}, \mu_{upper}]$.
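
The critical values 1.96 and 2.58 do not have to be read from a table. As a sketch, they can be recovered from the inverse CDF of the standard normal distribution in the Python standard library; the function name `z_critical` is a hypothetical helper.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_c with P(-z_c <= z <= z_c) = confidence."""
    tail = (1.0 - confidence) / 2.0            # probability in each tail
    return NormalDist().inv_cdf(1.0 - tail)    # standard normal: mean 0, sigma 1

for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: z_c = {z_critical(level):.2f}")
```

For the 95% level the tail probability is 0.025, so the critical value is the 0.975 quantile; for the 99% level it is the 0.995 quantile.
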

Small samples (n < 30)
When the sample size is small, the standardized variable $\frac{\bar{X} - \mu}{\sigma_n}$ cannot be assumed to follow the normal
distribution, since the Central Limit Theorem approximation is not reliable for small n.
In this case, for a normally distributed population, the standardized variable $T = \frac{\bar{X} - \mu}{\sigma_n}$ follows the
Student's t distribution with v=n-1 degrees of freedom.

The 95% confidence interval for small samples. The 95% confidence interval is determined by
the critical values $[-t_c, t_c]$ such that
$$P(-t_c \le T \le t_c) = 0.95$$
Note that the critical values $[-t_c, t_c]$ depend on two values: (i) the probability value (0.95) and
(ii) the degrees of freedom (v).
Example: sample size n=5. In that case v=n-1=4 degrees of freedom. For the 95% confidence
interval, the critical value is $t_c = t_{0.975} = 2.78$, as seen from the Student's t distribution tables.

Hence, if you define $\mu_{lower} = \bar{X} - 2.78\frac{\hat{S}}{\sqrt{n}}$ and $\mu_{upper} = \bar{X} + 2.78\frac{\hat{S}}{\sqrt{n}}$, then the above equation is
written as:
$$P(\mu_{lower} \le \mu \le \mu_{upper}) = 0.95$$
which can be interpreted as follows: Based on a sample of size n, we are 95% confident that the
mean $\mu$ of the population lies in the interval $[\mu_{lower}, \mu_{upper}]$.

The 99% confidence interval for small samples. Similarly, the 99% confidence interval is
determined by the critical values $[-t_c, t_c]$ such that
$$P(-t_c \le T \le t_c) = 0.99$$
Example: sample size n=5. In that case v=n-1=4 degrees of freedom. For the 99% confidence
interval, the critical value is $t_c = t_{0.995} = 4.60$, as seen from the Student's t distribution tables.

Hence, if you define $\mu_{lower} = \bar{X} - 4.60\frac{\hat{S}}{\sqrt{n}}$ and $\mu_{upper} = \bar{X} + 4.60\frac{\hat{S}}{\sqrt{n}}$, then the above equation is
written as:
$$P(\mu_{lower} \le \mu \le \mu_{upper}) = 0.99$$
which can be interpreted as follows: Based on a sample of size n, we are 99% confident that the
mean $\mu$ of the population lies in the interval $[\mu_{lower}, \mu_{upper}]$.
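
To make the small-sample intervals concrete, the sketch below computes both intervals for one sample of n=5. The critical values 2.78 and 4.60 are the table values quoted in the text for v=4; the five weights are made-up illustrative numbers, and the function name `t_interval_n5` is hypothetical.

```python
import math
import statistics

# Critical values for v = 4 degrees of freedom, as quoted in the text
T_CRITICAL_V4 = {0.95: 2.78, 0.99: 4.60}

def t_interval_n5(sample, confidence):
    """Return (mu_lower, mu_upper) for a sample of exactly 5 values."""
    if len(sample) != 5:
        raise ValueError("the tabulated critical values only apply to n = 5")
    x_bar = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(5)   # S_hat / sqrt(n)
    half_width = T_CRITICAL_V4[confidence] * std_err
    return x_bar - half_width, x_bar + half_width

weights = [98.0, 104.0, 91.0, 110.0, 97.0]   # hypothetical weights in grams
for level in (0.95, 0.99):
    lower, upper = t_interval_n5(weights, level)
    print(f"{level:.0%} interval: [{lower:.2f}, {upper:.2f}] grams")
```

The 99% interval is wider than the 95% interval for the same sample, because the larger critical value multiplies the same standard deviation of the sample mean.
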

II. Questions for Lab 5:
1) Create two plots as in Figure 1a and Figure 1b, showing the effect of sample size on the
confidence intervals.
• Use the following parameters:
• Total number of bearings: N=1,000,000
• Population mean: 100 grams
• Population standard deviation: 12 grams
• Sample sizes: n=1, 2, 3, 4, ..., 200

2) Part A
Perform the following simulation experiment. Use Table 1 below to tabulate the results.
Table 1: Success rate (percentage) for different sample sizes
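i. Step 1: Choose a random sample of n=5 bearings from the N bearings you created
in the previous problem. Calculate the sample mean and the sample standard
deviation:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{S} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$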

ii. Step 2: Create the 95% confidence interval using the normal distribution to fill in
the first two entries in the top row. You realize, however, that this is not an
appropriate distribution to use because you have a small sample n=5<30.

iii. Step 3: Check if the confidence interval includes the actual mean $\mu$ of the
population of N bearings. If it does, then Step 2 is considered a success.

iv. Step 4: The appropriate distribution for small samples (n<30) is the t-distribution.
Create the 95% confidence interval using the t-distribution with v=n-1=4 degrees of freedom.
At the 95% confidence level with v=4 degrees of freedom, the value of $t_{0.975}$ can
be found from the tables, and it is seen to be $t_{0.975} = 2.78$.

This is the value that
will be used to determine the 95% confidence interval:
$$\left[\bar{X} - 2.78\frac{\hat{S}}{\sqrt{n}},\ \bar{X} + 2.78\frac{\hat{S}}{\sqrt{n}}\right]$$
For a different sample size, the critical values of t will be different from the ones above.

You should find these values from the t-distribution tables, and you should
modify the confidence intervals accordingly.

v. Step 5: Check if the confidence interval includes the actual mean $\mu$ of the
population. If it does, then Step 4 is considered a success.
vi. Step 6: Repeat the experiment M=10,000 times and count the number of
successes.

vii. Step 7: Enter the percentage of successful outcomes in Table 1.
viii. Step 8: Repeat steps 1-7 above with n=5 and the 99% confidence interval.
ix. Step 9: After completing all of the above steps you have to fill out the first row of
the table.
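
The steps above can be sketched as follows for the first row of Table 1 (n=5). This is one possible outline under stated assumptions, not a reference solution: the seed is arbitrary, the function name `success_rates` is hypothetical, and the critical values are the ones quoted in the text (1.96 and 2.58 for the normal distribution, 2.78 and 4.60 for the t-distribution with v=4). Other sample sizes need their own t values from the tables.

```python
import math
import random
import statistics

MU, SIGMA, N = 100.0, 12.0, 1_000_000   # parameters from Question 1
M = 10_000                               # number of repetitions (Step 6)
n = 5                                    # sample size for the first row
Z_C = {0.95: 1.96, 0.99: 2.58}           # normal critical values from the text
T_C = {0.95: 2.78, 0.99: 4.60}           # t critical values for v = 4 from the text

rng = random.Random(5)                   # arbitrary fixed seed
barrel = [rng.gauss(MU, SIGMA) for _ in range(N)]
mu_actual = statistics.fmean(barrel)     # actual mean of the generated population

def success_rates(level):
    """Return (normal %, t %) of intervals that contain the actual mean."""
    normal_hits = t_hits = 0
    for _ in range(M):
        sample = rng.sample(barrel, n)                      # Step 1
        x_bar = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / math.sqrt(n)
        error = abs(x_bar - mu_actual)
        if error <= Z_C[level] * std_err:                   # Steps 2-3
            normal_hits += 1
        if error <= T_C[level] * std_err:                   # Steps 4-5
            t_hits += 1
    return 100 * normal_hits / M, 100 * t_hits / M          # Step 7

results = {level: success_rates(level) for level in (0.95, 0.99)}
for level, (normal_pct, t_pct) in results.items():
    print(f"{level:.0%} CI, n={n}: normal {normal_pct:.1f}%, t {t_pct:.1f}%")
```

Checking whether the interval contains the mean is done here by comparing the distance between the sample mean and the population mean with the interval half-width, which is equivalent to testing the two endpoints.
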

Part B:
Repeat part (A) with n=40 using the normal distribution and the t-distribution to complete the
second row of the table.

Part C:
Repeat part (A) with n=120 using the normal distribution and the t-distribution to complete the
third row of the table. You realize, however, that for a large sample (n ≥ 30) the t-distribution will
be very close to normal, so the differences between the Student's t and normal results will be minimal.