Statistics/Interval Estimation

Introduction

Previously, we have discussed point estimation, which gives us an estimator $\hat\theta$ for the value of an unknown parameter $\theta$. Now, suppose we want to know the size of error of the point estimator $\hat\theta$, i.e. the difference between $\hat\theta$ and the unknown parameter $\theta$. Of course, we can make use of the mean squared error of $\hat\theta$, $\operatorname{MSE}(\hat\theta)=\mathbb E[(\hat\theta-\theta)^2]$, or other measures.

However, what if we only know one specific point estimate? We cannot calculate the mean squared error of the corresponding point estimator from this single point estimate. So how do we know the possible size of error of this point estimate? In fact, it is impossible to tell. We are only given a particular estimated value of the parameter $\theta$, and we do not know the value of the unknown parameter $\theta$, so the difference between this point estimate and $\theta$ is also unknown.

To illustrate this, consider the following example. Suppose we take a random sample of 10 students from a particular university course to estimate $\mu$, the mean score of the students in the final exam of that course (assume the score is normally distributed), and the observed value of the sample mean is $\bar x=60$. What is the difference between this point estimate and the true unknown parameter $\mu$? Can we be "confident" that this sample mean is close to $\mu$?

It is possible that $\mu$ is, say, 90, and the students in the sample happen to be the ones with very poor performance. On the other hand, it is also possible that $\mu$ is, say, 30, and the students in the sample happen to be the ones who performed (relatively) well. Of course, it is also possible that $\mu$ is quite close to 60, say 59. This example shows that a particular value $\bar x$ does not tell us the possible size of error: the error can be very large, and it can also be very small.

In this chapter, we will introduce interval estimation. It uses an interval estimator, which describes the size of error by giving the probability that the random interval it produces contains the unknown parameter $\theta$. A random interval is an interval with at least one of its bounds being a random variable. This probability measures the "accuracy" of the interval estimator of $\theta$, and hence the size of error.

As the name interval estimator suggests, the estimator involves some sort of interval. Also, as one may expect, interval estimation is based on statistics:

Definition. (Interval estimation) Interval estimation is a process of using the value of a statistic to estimate an interval of plausible values of an unknown parameter.

Of course, we would like the probability for the unknown parameter $\theta$ to lie in the interval to be close to 1, so that the interval estimator is very accurate. However, a very accurate interval estimator may have very poor "precision", i.e. the interval covers "too many" plausible values of the unknown parameter. Even if we know that $\theta$ is very likely to be one of these values, there are too many different possibilities, so such an interval estimator is not very "useful". To illustrate this, suppose the interval concerned is $\Theta$, the parameter space of $\theta$. Then, of course, $\mathbb P(\theta\in\Theta)=1$ (so the "confidence" is high) since $\theta$ must lie in its parameter space. However, such an interval has essentially "zero precision" and is quite "useless", since the "plausible values" of $\theta$ in the interval are all possible values of $\theta$.

From this, we can see the need for "precision" of the interval: we also want the width of the interval to be small, so that we have some idea about the "location" of $\theta$. However, as the interval becomes narrower, it is more likely to miss $\theta$, i.e. not cover the actual value of $\theta$. The probability for $\theta$ to lie in the interval therefore becomes smaller, i.e. the interval becomes less "accurate". To illustrate this, consider the extreme case where the interval is so small that it contains a single point (its two endpoints coincide). The "interval estimator" then basically becomes a "point estimator" $\hat\theta$ in some sense. We know that it is very unlikely that the true value of $\theta$ equals the value of the point estimator, since $\theta$ lies in that "interval" if and only if $\hat\theta=\theta$ in this case. Indeed, if the distribution of $\hat\theta$ is continuous, then $\mathbb P(\hat\theta=\theta)=0$.

As we can see from the above, we want the interval to have very high "confidence" and also to be "very precise" (i.e. very narrow), but we cannot have both. An increase in confidence causes a decrease in "precision", and an increase in "precision" causes a decrease in confidence. Therefore, we need to compromise and pick an interval that gives sufficiently high confidence and is also reasonably precise. In other words, we would like a narrow interval that covers $\theta$ with a large probability.
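A small numerical sketch can make this trade-off concrete. Purely for illustration, we assume a normal population $\mathcal N(\mu,\sigma^2)$ with known $\sigma$, and an interval of the form $[\bar X-c\,\sigma/\sqrt n,\ \bar X+c\,\sigma/\sqrt n]$ for some constant $c\ge 0$. The helper name `width_and_coverage` is hypothetical. Since $(\bar X-\mu)/(\sigma/\sqrt n)\sim\mathcal N(0,1)$, this interval covers $\mu$ with probability $2\Phi(c)-1$, and its width is $2c\sigma/\sqrt n$.

```python
from statistics import NormalDist


def width_and_coverage(c, sigma, n):
    """Width and coverage probability of [Xbar - c*sigma/sqrt(n), Xbar + c*sigma/sqrt(n)]
    for a sample of size n from N(mu, sigma^2) with sigma known (illustrative assumption)."""
    width = 2 * c * sigma / n ** 0.5
    # P(-c <= Z <= c) for Z ~ N(0, 1); note that it does not depend on mu
    coverage = 2 * NormalDist().cdf(c) - 1
    return width, coverage


for c in (0.5, 1.0, 2.0, 3.0):
    w, p = width_and_coverage(c, sigma=1.0, n=4)
    print(f"c = {c}: width = {w:.2f}, coverage = {p:.4f}")
```

Larger $c$ gives a wider interval with higher coverage. At $c=0$ the interval shrinks to the single point $\bar X$ and its coverage is 0, which matches the extreme case described above.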

Terminologies

Now, let us formally define some terminologies related to interval estimation.

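Definition. (Interval estimator) Let $X_1,\dots,X_n$ be a random sample. An interval estimator of an unknown parameter $\theta$ is a random interval $[L(X_1,\dots,X_n),U(X_1,\dots,X_n)]$, where $L(X_1,\dots,X_n)$ and $U(X_1,\dots,X_n)$ are two statistics such that $L(X_1,\dots,X_n)\le U(X_1,\dots,X_n)$ always.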

Remark.

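  • Writing $L=L(X_1,\dots,X_n)$ and $U=U(X_1,\dots,X_n)$ for short, we call $[L,U]$ a random interval since both endpoints $L$ and $U$ are random variables.
  • The interval involved may also be an open interval ($(L,U)$), a half-open half-closed interval ($(L,U]$ or $[L,U)$), or a one-sided interval ($(-\infty,U]$ or $[L,\infty)$). In the one-sided case we may take $L=-\infty$ or $U=\infty$ in the extended real number sense.
  • When we observe that $X_1=x_1,\dots,X_n=x_n$, we call $[L(x_1,\dots,x_n),U(x_1,\dots,x_n)]$ the interval estimate of $\theta$, denoted by $[l,u]$ ($l$ and $u$ are no longer random).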

Definition. (Coverage probability) The coverage probability of an interval estimator $[L,U]$ is $\mathbb P(\theta\in[L,U])=\mathbb P(L\le\theta\le U)$.

Example. Let be a random sample from the normal distribution . Consider an interval estimator of : .

(a) Calculate the probability .

(b) Calculate the coverage probability .

Solution:

(a) Since the distribution of is continuous, .

(b) The coverage probability


Exercise.

(a) Guess whether the coverage probability is greater than .

(b) Calculate to see whether your guess in (a) is correct or not.

(c) (construction of interval estimator) Find such that (Hint: where ).

(d) Suppose it is observed that . Find the interval estimate of the given interval estimator .

(e) Suppose the actual parameter is 1.2. Does lie in the interval estimate in (d)?


Solution

(a) Intuitively, one should guess that this is true.

(b)

(c) Such is .

Proof.


(d) Under this observation, . Hence, the interval estimate is .

(e) Since , lies in the interval estimate .


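Definition. (Confidence coefficient) For an interval estimator $[L,U]$ of $\theta$, the confidence coefficient of $[L,U]$, denoted by $1-\alpha$, is the infimum of the (set of) coverage probabilities over all $\theta$ in the parameter space $\Theta$: $1-\alpha=\inf_{\theta\in\Theta}\mathbb P(\theta\in[L,U])$.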

Remark.

  • Infimum means the greatest lower bound (it is the same as the minimum under some conditions). Thus $1-\alpha$ is the greatest lower bound of the coverage probabilities over all $\theta\in\Theta$. Intuitively, this means the confidence coefficient is chosen conservatively: if some $\theta$ makes the coverage probability low, it lowers the confidence coefficient.
  • In simple cases, the coverage probability $\mathbb P(\theta\in[L,U])$ does not depend on $\theta$ (i.e. it is a constant function of $\theta$) [1]. Hence, the confidence coefficient is $1-\alpha=\mathbb P(\theta\in[L,U])$. Unless otherwise specified, you can assume this holds in what follows.
  • The reason for choosing the notation $1-\alpha$ is related to hypothesis testing, where $\alpha$ has a special meaning.
  • As we shall see in the next chapter, confidence intervals and hypothesis testing are closely related, in the sense that either one can be constructed from the other.
  • An interval estimator with a measure of "confidence" is called a confidence interval. Here, the confidence coefficient is the measure of confidence. Hence, the interval estimator $[L,U]$ with confidence coefficient $1-\alpha$ is a confidence interval, or more specifically a $(1-\alpha)$ confidence interval. Usually $1-\alpha$ is expressed as a percentage, e.g. a 95% confidence interval.

Example. (Interpretation of confidence coefficient) Consider an interval estimator $[L,U]$ of an unknown parameter $\theta$. Suppose its confidence coefficient is $1-\alpha$.

  • Student A's claim: since the confidence coefficient is $1-\alpha$, the coverage probability is $\mathbb P(\theta\in[L,U])=1-\alpha$. It follows that the probability for $\theta$ to lie in the interval estimate $[l,u]$ from an experiment is also $1-\alpha$.
  • Student B's claim: an interval estimate $[l,u]$ coming from an experiment either contains $\theta$ or does not contain $\theta$. In the former case, the probability $\mathbb P(\theta\in[l,u])$ is 1, and in the latter case, it is 0. Hence, student A's claim is wrong.
  • Student C's claim: when we perform a large number of experiments, we expect the interval estimates in about $100(1-\alpha)\%$ of them to contain $\theta$, and the interval estimates in the other $100\alpha\%$ of them not to contain $\theta$.

Comment on each claim.

Solution:

Student B's claim is correct, since in a single experiment the interval estimate $[l,u]$ is already decided (and thus fixed). The unknown parameter $\theta$ is also fixed, since the population distribution is given. So whether $\theta$ lies in the fixed interval estimate $[l,u]$ is not a random event. It is already decided by the fixed $\theta$ and $[l,u]$.

Student A's claim is wrong, since student B's claim is correct. It may be easier to see why if we rephrase the claim slightly: "the probability for the fixed $\theta$ to lie in the fixed interval estimate $[l,u]$ is $1-\alpha$." This is incorrect since the event involved is not even random! To see this more clearly, consider what happens if we "hypothetically" repeat this particular experiment many times with fixed $\theta$ and fixed interval estimate $[l,u]$. The "outcome" is the same in every experiment: either $\theta$ lies in $[l,u]$ in all experiments, or $\theta$ does not lie in $[l,u]$ in all experiments. By the definition of frequentist probability, the probability is then either 1 (former case) or 0 (latter case).

We may modify student A's claim to make it correct: the probability for $\theta$ to lie in the interval estimator $[L,U]$ is $1-\alpha$. This can be interpreted as follows. The probability for $\theta$ to lie in an interval estimate calculated from a future, not yet realized sample is $1-\alpha$. This does NOT apply to a realized sample, which is a past sample.

Student C's claim is also correct, since we can interpret the probability $1-\alpha$ from the frequentist point of view. That is, we regard it as the "long-run" proportion of interval estimates that contain the true parameter $\theta$, where in each trial an interval estimate $[l,u]$ is observed from the interval estimator $[L,U]$.
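Student C's frequentist reading can be checked by simulation. For concreteness, the sketch below assumes a normal population with known $\sigma$ and uses the interval $\bar X\pm z_{\alpha/2}\,\sigma/\sqrt n$, which is derived later in this chapter. The function name and the parameter values are hypothetical. Each simulated interval estimate either contains $\mu$ or does not, but the long-run proportion of those that do should be close to $1-\alpha$.

```python
import random
from statistics import NormalDist, fmean


def simulate_coverage(mu, sigma, n, alpha, trials, seed=0):
    """Proportion of simulated interval estimates [l, u] that contain mu, using
    [xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)] with sigma known (illustrative setup)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))  # one realized sample
        l, u = xbar - half, xbar + half                       # one fixed interval estimate
        hits += l <= mu <= u  # for this fixed [l, u], the event either happened or not
    return hits / trials


print(simulate_coverage(mu=10.0, sigma=2.0, n=5, alpha=0.05, trials=20000))
```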

Remark.

  • We may say that we "feel $100(1-\alpha)\%$ confident" that $\theta$ lies in an interval estimate $[l,u]$ from an experiment, corresponding to a $(1-\alpha)$ confidence interval.
  • To understand this, we may refer to student C's claim above. When we think about how "confident" we are that $\theta$ lies in $[l,u]$, we may consider this:
      • we "hypothetically" repeat the generation of interval estimates many times, and we expect about $100(1-\alpha)\%$ of them to contain $\theta$.
      • Based on these hypothetical experiments, it is then natural to "feel" $100(1-\alpha)\%$ confident that the interval estimate $[l,u]$ contains $\theta$.
  • Alternatively, as suggested above, it is correct to say that the probability for $\theta$ to lie in an interval estimate calculated from a future, not yet realized sample is $1-\alpha$.
      • Hence, the probability $1-\alpha$ measures the "reliability" of the estimation procedure (the higher the probability, the higher the reliability).
      • Based on this reliability, it is therefore natural to feel $100(1-\alpha)\%$ confident that the interval estimate $[l,u]$ contains $\theta$.
  • We may regard "we feel $100(1-\alpha)\%$ confident that $\theta$ lies in the interval estimate $[l,u]$" as an intuitive, alternative way of saying "the interval estimate $[l,u]$ is a $100(1-\alpha)\%$ confidence interval".

Example. Continue from the previous example about normal distribution . The confidence coefficient of interval estimator of , , is 0.9545, or approximately 95%. Hence, such interval may be called 95% confidence interval.


Exercise. Consider a continuous distribution with an unknown real-valued parameter , and a random sample drawn from it. Suppose and where and are statistics of such that always (Can ? [2]) ( is the parameter space of ).

1 Which of the following is/are a 90% confidence interval?

None of the above.

2 Which of the following is/are a 95% confidence interval?

None of the above.

3 Which of the following is/are a 97.5% confidence interval?

None of the above.


4. Can you suggest a (i) 0% confidence interval; (ii) 100% confidence interval?

Solution

(i) Since the distribution is continuous, one may take , for example, as the 0% confidence interval since .

(ii) One may take (i.e. ), which is the parameter space of as the 100% confidence interval. This is because . (In general, a 100% confidence interval for an unknown parameter is the parameter space of that unknown parameter.)


Construction of confidence intervals

Now that we understand what a confidence interval is, we would like to know how to construct one. The main way to do so is to use a pivotal quantity, defined below.

Definition. (Pivotal quantity) A random variable $Q(X_1,\dots,X_n,\theta)$, which is a function of the random sample $X_1,\dots,X_n$ and the unknown parameter (vector) $\theta$, is a pivotal quantity (of $\theta$) if its distribution does not depend on $\theta$. That is, the distribution is the same for each value of $\theta$.

Remark.

  • A pivotal quantity may not be a statistic. A statistic is a function of the random sample only (not of the unknown parameter(s)), while a pivotal quantity is a function of the random sample and the unknown parameter (vector) $\theta$.
  • If the expression of a pivotal quantity does not involve $\theta$, the pivotal quantity is a statistic, called an ancillary statistic.
  • Here, we focus on pivotal quantities whose expressions involve $\theta$, so that we can use them to construct confidence intervals.

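Given such a pivotal quantity $Q(X_1,\dots,X_n,\theta)$, we can construct a $(1-\alpha)$ confidence interval for $\theta$ by the following steps:

  1. For that value of $\alpha$, find constants $a$ and $b$ such that $\mathbb P(a\le Q(X_1,\dots,X_n,\theta)\le b)=1-\alpha$ [3]. Here $a$ and $b$ do not involve $\theta$, since $Q$ is a pivotal quantity.
  2. Since the expression of $Q$ involves $\theta$ (as we have assumed), transform $a\le Q(X_1,\dots,X_n,\theta)\le b$ into $L(X_1,\dots,X_n)\le\theta\le U(X_1,\dots,X_n)$. The resulting inequalities must be equivalent to the original ones, that is, $a\le Q\le b\iff L\le\theta\le U$, so that $\mathbb P(L\le\theta\le U)=\mathbb P(a\le Q\le b)=1-\alpha$.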

Example. Consider a random sample $X_1,\dots,X_n$ from the normal distribution $\mathcal N(\mu,\sigma^2)$ with unknown mean $\mu$ and known variance $\sigma^2$. Find a pivotal quantity (of $\mu$).

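Solution: By the property of the normal distribution, $\frac{\bar X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$. Since $\mathcal N(0,1)$ does not depend on the unknown parameter $\mu$, $\frac{\bar X-\mu}{\sigma/\sqrt n}$ is a pivotal quantity.

Alternatively, $\bar X-\mu$ is also a pivotal quantity, since $\bar X-\mu\sim\mathcal N(0,\sigma^2/n)$ does not depend on $\mu$ (both $\sigma^2$ and $n$ are known, so the variance of this distribution is known).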


Exercise.

(a) Is a pivotal quantity?

(b) Is a pivotal quantity?


Solution

(a) No, since , and this distribution depends on .

(b) Yes, since , and this distribution is independent of .



Exercise. Consider a random sample from normal distribution with unknown mean and variance . Apart from , suggest a pivotal quantity of .


Solution

A pivotal quantity is , since , and the distribution is independent from both and .

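Example. Consider a random sample $X_1,\dots,X_n$ from the exponential distribution $\operatorname{Exp}(\lambda)$ with rate $\lambda$. Find a pivotal quantity. (Hint: $\sum_{i=1}^n X_i\sim\operatorname{Gamma}(n,\lambda)$, and if $Y\sim\operatorname{Gamma}(k,\lambda)$, then $cY\sim\operatorname{Gamma}(k,\lambda/c)$ for $c>0$.)

Solution: A pivotal quantity is $\lambda\sum_{i=1}^n X_i$, since $\lambda\sum_{i=1}^n X_i\sim\operatorname{Gamma}(n,1)$, a distribution that does not depend on $\lambda$.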
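A quick simulation illustrates what "pivotal" means here. This sketch assumes the rate parameterization (mean $1/\lambda$), which is also the one Python's `random.expovariate` uses. The function name is hypothetical. Whatever value $\lambda$ takes, the simulated values of $\lambda\sum X_i$ should have mean and variance close to $n$, as for $\operatorname{Gamma}(n,1)$.

```python
import random
from statistics import fmean, pvariance


def pivot_draws(lam, n, trials, seed=1):
    """Simulated values of Q = lam * (X_1 + ... + X_n), where X_i ~ Exp(rate=lam)."""
    rng = random.Random(seed)
    return [lam * sum(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]


for lam in (0.5, 4.0):
    q = pivot_draws(lam, n=3, trials=20000)
    # Gamma(n, 1) has mean n and variance n, whatever lam is
    print(lam, round(fmean(q), 2), round(pvariance(q), 2))
```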

Example. (A pivotal quantity for general distributions) Consider a distribution with unknown parameter (vector) $\theta$, whose cdf $F(x;\theta)$ is bijective (so that the inverse $F^{-1}(\cdot\,;\theta)$ exists).

(a) Prove that $F(X;\theta)\sim\mathcal U(0,1)$.

(b) Suppose a random sample is taken from that distribution. Suggest a pivotal quantity.

Solution:

(a)

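Proof. Let $U=F(X;\theta)$, and let $F_U$ be the cdf of $U$. Then, for $0<u<1$, $F_U(u)=\mathbb P(F(X;\theta)\le u)=\mathbb P(X\le F^{-1}(u;\theta))=F(F^{-1}(u;\theta);\theta)=u$. Differentiating the cdf gives $f_U(u)=1$, so the pdf of $U$ is 1. Also, the support of $U$ is $[0,1]$, since $U$ is essentially a probability. Hence, $U\sim\mathcal U(0,1)$.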


(b) From (a), we know that $F(X_1;\theta)\sim\mathcal U(0,1)$, where the cdf involves the parameter (vector) $\theta$. This distribution clearly does not depend on $\theta$. Hence, a pivotal quantity is $F(X_1;\theta)$. Equivalently, it is $F(X;\theta)$, which has the same distribution since $X_1$ is drawn from the distribution with cdf $F(\cdot\,;\theta)$.


Exercise. Suppose a single observation $X$ is taken from the exponential distribution $\operatorname{Exp}(\lambda)$. Find a pivotal quantity using the above method.


Solution

Since the cdf of $X$ is $F(x;\lambda)=1-e^{-\lambda x}$ for $x\ge 0$, as suggested above, a pivotal quantity is $F(X;\lambda)=1-e^{-\lambda X}$. It follows the uniform distribution $\mathcal U(0,1)$.
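The same kind of check works for this probability integral transform. Under the assumed rate parameterization, $1-e^{-\lambda X}$ should be uniform on $(0,1)$. So for every $q$ in $(0,1)$, the fraction of simulated values at or below $q$ should be close to $q$. The helper name is hypothetical.

```python
import math
import random


def pit_draws(lam, trials, seed=2):
    """Values of U = F(X; lam) = 1 - exp(-lam * X), where X ~ Exp(rate=lam)."""
    rng = random.Random(seed)
    return [1 - math.exp(-lam * rng.expovariate(lam)) for _ in range(trials)]


u = pit_draws(lam=3.0, trials=20000)
for q in (0.1, 0.5, 0.9):
    # if U ~ U(0, 1), then P(U <= q) = q
    print(q, sum(x <= q for x in u) / len(u))
```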



Confidence intervals for means of normal distributions

In the following, we will use the concept of a pivotal quantity to construct confidence intervals for means and variances of normal distributions. After that, using the central limit theorem, we can construct approximate confidence intervals for means and variances of distributions that are not normal.

Mean of a normal distribution

Before discussing this confidence interval, let us first introduce a notation:

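  • $z_\alpha$ is the upper percentile of $\mathcal N(0,1)$ at level $\alpha$, i.e. it satisfies $\mathbb P(Z\ge z_\alpha)=\alpha$, where $Z\sim\mathcal N(0,1)$.

We can find (or calculate) the values of $z_\alpha$ for different $\alpha$ from the standard normal table.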
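Instead of a printed table, $z_\alpha$ can also be computed numerically. The following minimal sketch uses the `statistics.NormalDist` class from Python's standard library, whose `inv_cdf` method returns quantiles of the distribution. The function name `z_upper` is our own. Printed tables are rounded, so their values may differ from these in the last digit.

```python
from statistics import NormalDist


def z_upper(alpha):
    """Upper alpha point z_alpha of N(0, 1), i.e. P(Z >= z_alpha) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)


for a in (0.05, 0.025, 0.005):
    print(a, round(z_upper(a), 4))
```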

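Theorem. (Confidence interval of $\mu$ when $\sigma^2$ is known) Let $X_1,\dots,X_n$ be a random sample from $\mathcal N(\mu,\sigma^2)$. When $\sigma^2$ is known, a $(1-\alpha)$ confidence interval for $\mu$ is
$$\left[\bar X-z_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar X+z_{\alpha/2}\frac{\sigma}{\sqrt n}\right].$$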

Remark.

  • By the definition of interval estimate, the corresponding interval estimate of $\mu$ is $\left[\bar x-z_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar x+z_{\alpha/2}\frac{\sigma}{\sqrt n}\right]$, where $\bar x$ is the observed value of $\bar X$. For simplicity, we usually also call such an interval estimate a confidence interval.
  • The context tells us which meaning of "confidence interval" is intended.
  • Usually, when the realization of the random sample is given, "confidence interval" refers to the interval estimate, since the interval estimate is more "useful" and "suggestive" in this context.
  • Unless otherwise specified, the confidence intervals referred to are constructed according to this theorem (if applicable).

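Proof. Let $Z=\frac{\bar X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$. Since $Z$ is a pivotal quantity (its distribution does not depend on $\mu$), we set
$$\mathbb P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=1-\alpha,$$
where $z_{\alpha/2}$ is a constant (and does not involve $\mu$). Then, we have
$$1-\alpha=\mathbb P\left(-z_{\alpha/2}\le\frac{\bar X-\mu}{\sigma/\sqrt n}\le z_{\alpha/2}\right)=\mathbb P\left(\bar X-z_{\alpha/2}\frac{\sigma}{\sqrt n}\le\mu\le\bar X+z_{\alpha/2}\frac{\sigma}{\sqrt n}\right).$$
The result follows.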

The following graph illustrates $\mathbb P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=1-\alpha$:

                    |
                  *-|-*
                 /##|##\   
                /###|###\  <----- area 1-a
               /####|####\
              /#####|#####\
             /######|######\
            /|######|######|\
 area    --*.|######|######|.*-- 
 a/2 --> ....|######|######|....  <---  area a/2
        ------------*---------------
           -z_{a/2}       z_{a/2}
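The theorem translates directly into a short function. This sketch assumes the data are a list of observations from a normal population whose $\sigma$ is known. The name `z_interval` and the sample numbers below are made up for illustration.

```python
from statistics import NormalDist, fmean


def z_interval(data, sigma, conf=0.95):
    """Interval estimate [xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)] for a normal
    mean with known population standard deviation sigma, where z = z_{alpha/2}."""
    n = len(data)
    xbar = fmean(data)
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half


# Hypothetical data and sigma, for illustration only
sample = [4.1, 5.3, 4.8, 5.9, 5.0]
print(z_interval(sample, sigma=0.5))
print(z_interval(sample, sigma=0.5, conf=0.99))  # higher confidence, wider interval
```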

Example. Consider a random sample from . Suppose it is observed that .

Construct a 95% confidence interval for .

Solution: Since , and (from standard normal table, we know that where ), it follows that a 95% confidence interval for is .


Exercise.

(a) Construct a 99% confidence interval for .

(b) Construct a 90% confidence interval for .

(c) (alternative way of constructing confidence interval) Using a similar argument as in the proof of the previous theorem, another confidence interval for is since . Construct another 95% confidence interval for by this method.

(d) Is the width of the confidence interval (i.e. its upper bound minus its lower bound) constructed in (c) the same as that constructed in the example?

Solution

(a) Since (from standard normal table), a 99% confidence interval for is .

(b) Since (from standard normal table), a 90% confidence interval for is .

(c) Since and from standard normal table, another 95% confidence interval for is

(d) The width of the confidence interval in the example is 1.753 (approximately), while the width of the confidence interval in (c) is 1.825 (approximately). Hence, their widths are different.

Remark.

  • As we can see, when the confidence coefficient is higher, the corresponding confidence interval becomes wider.
  • This matches with our previous discussion.



Example. An undergraduate student, John, wants to estimate the average daily time that all teenagers aged 14-16 spent playing computer games in the previous week. Clearly, it is infeasible to ask all such teenagers about their time spent. Therefore, John decides to take a random sample of 10 teenagers from the population (all teenagers aged 14-16). Their times spent (in hours) are

3,8,10,5,9,9,1,3,0,4

The distribution of the daily time spent is assumed to be normal, with mean $\mu$ and variance $\sigma^2$ [4]. Also, based on past data about the daily time spent, John assumes that the standard deviation of the distribution is .

(a) Construct a 95% confidence interval for .

(b) According to John, the computer game addiction problem among teenagers aged 14-16 is serious if the average daily time spent playing computer games is at least a quarter of a day, i.e. 6 hours, and not serious otherwise. Based on the 95% confidence interval in (a), can John be (95%) confident that the problem is (i) serious; (ii) not serious?

(c) To be more certain about the time spent, John would like to construct a 99% confidence interval for $\mu$ with width not exceeding 1 hour. What is the minimum number of teenagers that the random sample must contain to satisfy this requirement?

(d) Suppose John takes another random sample from the population, containing the number of teenagers suggested in (c). If in this random sample, construct a 99% confidence interval for $\mu$, and verify that its width does not exceed 1 hour.

(e) Can John be (99%) confident that the computer game addiction problem is not serious among teenagers aged 14-16 based on the 99% confidence interval in (d)?

Solution:

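(a) The realization of the sample mean is $\bar x=5.2$, and $z_{0.025}\approx1.96$. Hence, the 95% confidence interval for $\mu$ is $\left[5.2-1.96\frac{\sigma}{\sqrt{10}},\ 5.2+1.96\frac{\sigma}{\sqrt{10}}\right]$, with $\sigma$ the standard deviation assumed by John.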

(b) (i) No, since the confidence interval contains some values that are strictly less than 6 and some that are at least 6. Thus, although John is 95% confident that $\mu$ lies in this interval, he cannot tell whether $\mu$ is at least 6.

(b) (ii) No, for a similar reason: he cannot tell whether $\mu$ is lower than 6.

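(c) A 99% confidence interval for $\mu$ is $\left[\bar X-z_{0.005}\frac{\sigma}{\sqrt n},\ \bar X+z_{0.005}\frac{\sigma}{\sqrt n}\right]$, so its width is $2z_{0.005}\frac{\sigma}{\sqrt n}$, which does not depend on $\bar X$. The value of $z_{0.005}$ can be read from the standard normal table. Thus, to satisfy the requirement, we need
$$2z_{0.005}\frac{\sigma}{\sqrt n}\le 1\iff n\ge\left(2z_{0.005}\,\sigma\right)^2.$$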

Since the sample size must be an integer, it follows that the minimum value of is 238. That is, at least 238 teenagers should be in the random sample to satisfy the requirement.

(d) A 99% confidence interval for is . Its width is approximately 0.999535, which is less than 1.

(e) Yes, since all values in the interval in (d) are strictly less than 6.
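The sample-size calculation in (c) can be wrapped in a small helper. This is a sketch. The names `min_sample_size` and `width` and the numbers in the usage line are hypothetical. $z_{\alpha/2}$ is computed exactly unless a (possibly rounded) table value is passed in as `z`. The bound on $n$ is rounded up, so a slightly different value of $z_{\alpha/2}$ can change the answer by one. It is therefore worth checking the width at the returned $n$.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, max_width, conf=0.99, z=None):
    """Smallest integer n with 2 * z_{alpha/2} * sigma / sqrt(n) <= max_width,
    for a z-interval with known sigma. If z is None, z_{alpha/2} is computed exactly."""
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((2 * z * sigma / max_width) ** 2)


def width(sigma, n, conf=0.99):
    """Width 2 * z_{alpha/2} * sigma / sqrt(n) of the z-interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / math.sqrt(n)


# Hypothetical numbers: sigma = 2, 95% confidence, width at most 0.5
n = min_sample_size(sigma=2.0, max_width=0.5, conf=0.95)
print(n, width(2.0, n, 0.95), width(2.0, n - 1, 0.95))
```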


Exercise. Suppose John decides to take another random sample consisting of even more teenagers, 500 of them. If in this random sample,

(a) Construct a 99% confidence interval for .

(b) Can John be (99%) confident that the computer game addiction problem is not serious among teenagers aged 14-16 based on the 99% confidence interval in (a)?


Solution

(a) A 99% confidence interval for is .

(b) No, since some values in the interval are at least 6.


We have discussed a way to construct a confidence interval for the mean when the variance is known. In practice, however, the variance is often unknown. Then we cannot use $\sigma$ in the confidence interval from the previous theorem.

Intuitively, one may think that we can use the sample variance $S^2$ to "replace" $\sigma^2$, by the weak law of large numbers. We would then simply replace the unknown $\sigma$ in the confidence interval by the known $S$ (or by its realization $s$ for the interval estimate). However, the flaw in this argument is that the sample size may not be large enough to apply the weak law of large numbers for approximation.

Remark.

  • A rule of thumb is that the sample size is large enough for applying such convergence theorems (e.g. the weak law of large numbers and the central limit theorem) for approximation when it is at least 30. Otherwise, the approximation may not be accurate enough, i.e. the error can be quite large, and we should not use such theorems for approximation.

You may now ask whether we can do such a "replacement" for approximation when the sample size is large enough. The answer is yes, and we will discuss this in the last section, about approximate confidence intervals.

Until that section, the confidence intervals discussed are exact, in the sense that no approximation is used to construct them. Therefore, they "work" for every sample size, no matter how large or small. This holds even for a sample size of 1 (at least 2 for intervals that use the sample variance), although the resulting interval may not be very "nice", in the sense that it may be quite wide.

Before discussing how to construct a confidence interval for the mean when the variance is unknown, we first give some results that are useful for deriving it.

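Proposition. (Several properties about sample mean and variance) Let $X_1,\dots,X_n$ be a random sample from $\mathcal N(\mu,\sigma^2)$. Also let $\bar X=\frac1n\sum_{i=1}^n X_i$ be the sample mean and $S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2$ be the sample variance, where $n$ is the sample size. Then,

(i) $\bar X$ and $S^2$ are independent.

(ii) $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$, where $\chi^2_{n-1}$ is a chi-squared distribution with $n-1$ degrees of freedom.

(iii) $\frac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}$, where $t_{n-1}$ is a t-distribution with $n-1$ degrees of freedom.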

Proof.

(i) One may use Basu's theorem to prove this, but the details about Basu's theorem and the proof are omitted here, since they are a bit complicated.

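(ii) We will use the following definition of the chi-squared distribution $\chi^2_\nu$: $\sum_{i=1}^\nu Z_i^2\sim\chi^2_\nu$, where $Z_1,\dots,Z_\nu\sim\mathcal N(0,1)$ are independent. We will also use the fact that the mgf of $\chi^2_\nu$ is $(1-2t)^{-\nu/2}$ for $t<\frac12$.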

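Now, first let $W=\sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2$, which follows $\chi^2_n$ since $\frac{X_1-\mu}{\sigma},\dots,\frac{X_n-\mu}{\sigma}\sim\mathcal N(0,1)$ are independent. Then, we write $W$ as
$$W=\sum_{i=1}^n\left(\frac{(X_i-\bar X)+(\bar X-\mu)}{\sigma}\right)^2=\frac{(n-1)S^2}{\sigma^2}+\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\right)^2,$$
where the cross term vanishes because $\sum_{i=1}^n(X_i-\bar X)=0$.

Since $\frac{\bar X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$, applying the definition of the chi-squared distribution gives $\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\right)^2\sim\chi^2_1$.

By (i), $\bar X$ and $S^2$ are independent. Thus, $\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\right)^2$ (a function of $\bar X$) is independent of $\frac{(n-1)S^2}{\sigma^2}$ (a function of $S^2$). Now, let $U=\frac{(n-1)S^2}{\sigma^2}$ and $V=\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\right)^2$. Since $U$ and $V$ are independent, and $W=U+V$ from the above derivation, the mgf of $W$ is
$$M_W(t)=M_U(t)\,M_V(t).$$

Since $W\sim\chi^2_n$ and $V\sim\chi^2_1$, we can further write
$$M_U(t)=\frac{(1-2t)^{-n/2}}{(1-2t)^{-1/2}}=(1-2t)^{-(n-1)/2},\qquad t<\tfrac12,$$
which is exactly the mgf of $\chi^2_{n-1}$. Hence, $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$.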

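(iii) We will use the following definition of the t-distribution $t_\nu$: $\frac{Z}{\sqrt{U/\nu}}\sim t_\nu$, where $Z\sim\mathcal N(0,1)$, $U\sim\chi^2_\nu$, and $Z$ and $U$ are independent.

With this definition, (iii) follows easily from (ii):

By (ii), $U=\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$. Also, $Z=\frac{\bar X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$, and $Z$ and $U$ are independent since $\bar X$ and $S^2$ are independent by (i). Then, by the above definition,
$$\frac{Z}{\sqrt{U/(n-1)}}=\frac{(\bar X-\mu)/(\sigma/\sqrt n)}{S/\sigma}=\frac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}.$$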
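Part (ii) can be sanity-checked by simulation. If $(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$, its simulated values should have mean close to $n-1$ and variance close to $2(n-1)$. The sketch below is only a numerical illustration, not a proof. The function name and the parameter values are arbitrary choices.

```python
import random
from statistics import fmean, pvariance


def scaled_sample_variances(mu, sigma, n, trials, seed=3):
    """Draws of (n - 1) * S^2 / sigma^2 for samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # equals (n - 1) * S^2
        out.append(ss / sigma ** 2)
    return out


w = scaled_sample_variances(mu=5.0, sigma=2.0, n=6, trials=20000)
# chi-squared with n - 1 = 5 degrees of freedom has mean 5 and variance 10
print(round(fmean(w), 2), round(pvariance(w), 2))
```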

Using this proposition, we can prove the following theorem. Again, before discussing this confidence interval, let us introduce a notation:

  • $t_{\alpha,\nu}$ is the upper percentile of $t_\nu$ at level $\alpha$, i.e. it satisfies $\mathbb P(T\ge t_{\alpha,\nu})=\alpha$, where $T\sim t_\nu$.

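Theorem. (Confidence interval of $\mu$ when $\sigma^2$ is unknown) Let $X_1,\dots,X_n$ be a random sample from $\mathcal N(\mu,\sigma^2)$. When $\sigma^2$ is unknown, a $(1-\alpha)$ confidence interval for $\mu$ is
$$\left[\bar X-t_{\alpha/2,n-1}\frac{S}{\sqrt n},\ \bar X+t_{\alpha/2,n-1}\frac{S}{\sqrt n}\right].$$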

Remark.

  • The corresponding interval estimate is $\left[\bar x-t_{\alpha/2,n-1}\frac{s}{\sqrt n},\ \bar x+t_{\alpha/2,n-1}\frac{s}{\sqrt n}\right]$, with observed values $\bar x$ and $s$. The sample standard deviation is nonnegative, so $s=\sqrt{s^2}$.
  • We can find values of $t_{\alpha,\nu}$ for some values of $\alpha$ and $\nu$ from a "t-table".
  • In this "t-table", the first column indicates the value of $\nu$, and the first row (one-sided) indicates $1-\alpha$. It is "one-sided" because our definition of $t_{\alpha,\nu}$ involves the one-sided event $T\ge t_{\alpha,\nu}$. For instance, to get $t_{0.025,9}$, we look at 97.5% in the first row (one-sided).
  • Alternatively, we can look at the second row (two-sided). It indicates the confidence coefficient $1-\alpha$ of the $(1-\alpha)$ confidence interval, corresponding to $t_{\alpha/2,\nu}$. For instance, to get $t_{0.025,9}$, we look at 95% in the second row (two-sided).
  • As $\nu\to\infty$, the t-distribution $t_\nu$ tends to the standard normal distribution $\mathcal N(0,1)$. Hence, when $\nu$ is large, $t_{\alpha,\nu}\approx z_\alpha$. Thus, if $\nu$ is so large that it does not appear in the t-table, one can simply use $z_\alpha$ from the standard normal table as an approximation.

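Proof. By (iii) in the previous proposition, $T=\frac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}$. Since $t_{n-1}$ does not depend on $\mu$, $T$ is a pivotal quantity of $\mu$. Hence, we set
$$\mathbb P(-t_{\alpha/2,n-1}\le T\le t_{\alpha/2,n-1})=1-\alpha,$$
where $t_{\alpha/2,n-1}$ is a constant. The t-distribution is symmetric about 0, so $\mathbb P(T\le -t_{\alpha/2,n-1})=\mathbb P(T\ge t_{\alpha/2,n-1})=\alpha/2$. It follows that
$$1-\alpha=\mathbb P\left(-t_{\alpha/2,n-1}\le\frac{\bar X-\mu}{S/\sqrt n}\le t_{\alpha/2,n-1}\right)=\mathbb P\left(\bar X-t_{\alpha/2,n-1}\frac{S}{\sqrt n}\le\mu\le\bar X+t_{\alpha/2,n-1}\frac{S}{\sqrt n}\right).$$
The result follows.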

Example. A government officer of country A would like to know the daily average time spent on exercise by all citizens in country A. Suppose the variance of the time spent is unknown, and a random sample of 10 citizens is taken from the population. The following are the times spent on exercise on a particular day by the citizens in that sample (in minutes):

10, 0, 60, 20, 30, 30, 120, 40, 30, 10.

Assuming the time spent follows a normal distribution, construct a 95% confidence interval for $\mu$, the daily average time spent on exercise by all citizens in country A.

Solution: First, we have $\bar x=35$ and $s=\sqrt{10650/9}\approx34.40$.

Also, $t_{0.025,9}\approx2.262$, from "97.5% (one-sided) and 9" (or "95% (two-sided) and 9") in the t-table.

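Thus, a 95% confidence interval for $\mu$ is
$$\left[35-2.262\cdot\frac{34.40}{\sqrt{10}},\ 35+2.262\cdot\frac{34.40}{\sqrt{10}}\right]\approx[10.39,\ 59.61].$$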
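The same computation can be done in code. Python's standard library has no t quantile function, so this sketch hard-codes a few upper 0.025 points $t_{0.025,\nu}$ (the values with $\mathbb P(T\ge t)=0.025$ for $T\sim t_\nu$), rounded as in a standard t-table. The dictionary `T_0025` and the function name are our own. A library such as SciPy could supply $t_{\alpha,\nu}$ for arbitrary $\nu$ instead.

```python
import math
from statistics import fmean, stdev

# Upper 0.025 points t_{0.025, nu}, copied (rounded) from a standard t-table;
# only a few degrees of freedom are listed, which is enough for this sketch.
T_0025 = {4: 2.776, 9: 2.262, 14: 2.145}


def t_interval_95(data):
    """95% interval estimate [xbar - t*s/sqrt(n), xbar + t*s/sqrt(n)] for a normal mean
    with unknown variance, where t = t_{0.025, n-1} is looked up in T_0025."""
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    half = T_0025[n - 1] * s / math.sqrt(n)
    return xbar - half, xbar + half


minutes = [10, 0, 60, 20, 30, 30, 120, 40, 30, 10]
print(t_interval_95(minutes))
```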


Exercise. The government officer also wants to know $\mu$, the mean monthly wage of all citizens in country A. Suppose the standard deviation of the monthly wage is unknown (all wages in this example are in USD). A salary survey asks 15 citizens for their monthly wages, and the following monthly wages (in USD) are obtained:

1500, 3000, 1200, 4000, 3500, 10000, 5000, 1000, 6000, 3000, 2000, 2000, 1500, 3000, 8000.

(a) Construct a 90% confidence interval for the mean monthly wage , assuming the underlying distribution for the wage is normal.

(b) In the salary survey, it is found that one respondent gave a wrong monthly wage: he accidentally typed one extra "0", answering 10000 instead of 1000. After the correction, the sample data of monthly wages is:

1500, 3000, 1200, 4000, 3500, 1000, 5000, 1000, 6000, 3000, 2000, 2000, 1500, 3000, 8000.

Update the confidence interval in (a) to a correct one, based on this correct data.

Solution

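(a) First, we get $\bar x\approx3646.67$ and $s\approx2614.76$. Also, $t_{0.05,14}\approx1.761$ (from "95% (one-sided) (or 90% (two-sided)) and 14" in the t-table).

Hence, a 90% confidence interval for $\mu$ is
$$\left[3646.67-1.761\cdot\frac{2614.76}{\sqrt{15}},\ 3646.67+1.761\cdot\frac{2614.76}{\sqrt{15}}\right]\approx[2457.77,\ 4835.57].$$

(b) First, we update $\bar x$ and $s$: $\bar x\approx3046.67$ and $s\approx2017.02$. Then, a new 90% confidence interval for $\mu$ is
$$\left[3046.67-1.761\cdot\frac{2017.02}{\sqrt{15}},\ 3046.67+1.761\cdot\frac{2017.02}{\sqrt{15}}\right]\approx[2129.55,\ 3963.78].$$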


Example.

A farmer, Tom, owns an apple orchard. He has just harvested a large number of apples (1000 apples) from his orchard. To assess the "quality" of this batch of apples, he wants to know $\mu$, the mean weight of the apples in this batch. However, since there are so many apples, weighing every apple in the batch would be cumbersome. Hence, Tom decides to take a random sample of 5 apples and use them to roughly estimate the mean weight. The following are the weights of the apples in that sample (in g):

100, 120, 200, 220, 80.

Assume the distribution of the weight is normal.

(a) Based on past experiences, Tom knows that the standard deviation of the weight of the apples is 30g. Construct a 95% confidence interval for .

(b) Tom finds out that the apples in this batch are of a new kind that has not been grown before. Therefore, the standard deviation of the weight based on past experience does not apply to this batch, and the standard deviation is now unknown. Construct an updated 95% confidence interval for $\mu$.

Solution:

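(a) We have $\bar x=144$. Also, $z_{0.025}\approx1.96$ from the standard normal table. Hence, a 95% confidence interval for $\mu$ is
$$\left[144-1.96\cdot\frac{30}{\sqrt5},\ 144+1.96\cdot\frac{30}{\sqrt5}\right]\approx[117.70,\ 170.30].$$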

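(b) We have $\bar x=144$ and $s=\sqrt{3880}\approx62.29$, and $t_{0.025,4}\approx2.776$ from the t-table. Hence, a 95% confidence interval for $\mu$ is
$$\left[144-2.776\cdot\frac{62.29}{\sqrt5},\ 144+2.776\cdot\frac{62.29}{\sqrt5}\right]\approx[66.67,\ 221.33].$$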
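Both parts can be reproduced side by side. In this sketch (variable names are ours), $z_{0.025}$ is computed with `NormalDist`, while $t_{0.025,4}\approx2.776$ is copied from a t-table. The t-interval comes out much wider for two reasons: $t_{0.025,4}>z_{0.025}$, and the sample standard deviation here (about 62.3 g) is larger than the 30 g assumed in (a).

```python
import math
from statistics import NormalDist, fmean, stdev

weights = [100, 120, 200, 220, 80]
n = len(weights)
xbar = fmean(weights)

# (a) sigma = 30 treated as known: z-interval
z = NormalDist().inv_cdf(0.975)
z_half = z * 30 / math.sqrt(n)

# (b) sigma unknown: t-interval with t_{0.025,4} = 2.776 copied from a t-table
t_half = 2.776 * stdev(weights) / math.sqrt(n)

print("z:", (round(xbar - z_half, 2), round(xbar + z_half, 2)))
print("t:", (round(xbar - t_half, 2), round(xbar + t_half, 2)))
```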


Exercise. Tom sells this batch of apples to a nearby shop, and it is known that the shop will pay Tom in USD for each apple, where $\mu$ is the mean weight of the batch of apples.

(a) Construct a 95% confidence interval for the total revenue of Tom from this transaction (in USD), , based on the above confidence interval in (b) of example.

(b) Suppose the cost for Tom to grow this batch of apples is USD 6000. Can Tom be 95% confident that he will earn a positive profit (i.e. the revenue exceeds the cost) from this transaction?


Solution

(a) Since , and a 95% confidence interval for is based on (b). From the construction of confidence interval, we have

Hence, the corresponding confidence interval for is (approximately)

(b) Yes, since Tom can be 95% confident that lies in , which exceeds the cost USD 6000.


Difference in means of two normal distributions

Sometimes, apart from estimating the mean of a single normal distribution, we would like to estimate the difference in means of two normal distributions in order to compare them. For example, apart from estimating the mean lifetime of a bulb (the time until it burns out), we are often interested in estimating the difference between the mean lifetimes of two different bulbs. This tells us which bulb lasts longer on average, and hence which bulb has higher "quality".

First, let us discuss the case where the two normal distributions are independent.

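Now, the problem is how we should construct a confidence interval for the difference in two means, $\mu_X-\mu_Y$. It seems that we can just construct two $100(1-\alpha)\%$ confidence intervals, $[L_X,U_X]$ for $\mu_X$ and $[L_Y,U_Y]$ for $\mu_Y$. Then, the confidence interval for $\mu_X-\mu_Y$ would be $[L_X-U_Y,\ U_X-L_Y]$. However, this is incorrect. Having $\mathbb P(L_X\le\mu_X\le U_X)=1-\alpha$ and $\mathbb P(L_Y\le\mu_Y\le U_Y)=1-\alpha$ does not imply that $\mathbb P(L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y)=1-\alpha$, and no result in probability justifies this.

On the other hand, it seems that since $[L_X,U_X]$ and $[L_Y,U_Y]$ are independent (the normal distributions we are considering are independent), we have
$$\mathbb P(L_X\le\mu_X\le U_X\text{ and }L_Y\le\mu_Y\le U_Y)=(1-\alpha)(1-\alpha)=(1-\alpha)^2.$$

Then, when $L_X\le\mu_X\le U_X$ and $L_Y\le\mu_Y\le U_Y$, we have $L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y$, so
$$\mathbb P(L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y)=(1-\alpha)^2,$$
which means $[L_X-U_Y,\ U_X-L_Y]$ is a $100(1-\alpha)^2\%$ confidence interval.

However, this is also incorrect. The flaw is that "when $L_X\le\mu_X\le U_X$ and $L_Y\le\mu_Y\le U_Y$, we have $L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y$" only means
$$\{L_X\le\mu_X\le U_X\}\cap\{L_Y\le\mu_Y\le U_Y\}\subseteq\{L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y\}$$

(we do not have the reverse subset inclusion in general). This in turn means
$$\mathbb P(L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y)\ge(1-\alpha)^2.$$
So, $[L_X-U_Y,\ U_X-L_Y]$ is actually not a $100(1-\alpha)^2\%$ confidence interval (in general).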

So, the above two "methods" for constructing confidence intervals for the difference in means of two independent normal distributions do not work. In fact, we do not use the previously constructed confidence intervals for each of the two means to build a confidence interval for their difference. Instead, we consider a pivotal quantity of the difference in the two means, which is the standard way of constructing confidence intervals.
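The claim that the naive interval does not have coverage probability $1-\alpha$ (nor exactly $(1-\alpha)^2$) can be checked numerically. Here, "naive interval" means combining separate $100(1-\alpha)\%$ intervals $[L_X,U_X]$ for $\mu_X$ and $[L_Y,U_Y]$ for $\mu_Y$ into $[L_X-U_Y,\ U_X-L_Y]$. The following is a minimal Monte Carlo sketch, added as an illustration and not part of the original argument. The settings $\mu_X=\mu_Y=0$, $\sigma_X=\sigma_Y=1$, $n=m=10$ and the number of trials are arbitrary assumptions, and each separate interval is a 95% $z$-interval.

```python
import math
import random
from statistics import NormalDist

def naive_coverage(sigma_x, sigma_y, n, m, alpha=0.05, trials=20_000, seed=0):
    """Estimate P(L_X - U_Y <= mu_X - mu_Y <= U_X - L_Y) by simulation, where
    [L_X, U_X] and [L_Y, U_Y] are separate 100(1-alpha)% z-intervals.
    The true means are taken to be 0 (an arbitrary choice for illustration)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hx = z * sigma_x / math.sqrt(n)  # half-width of the interval for mu_X
    hy = z * sigma_y / math.sqrt(m)  # half-width of the interval for mu_Y
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(0.0, sigma_x) for _ in range(n)) / n
        ybar = sum(rng.gauss(0.0, sigma_y) for _ in range(m)) / m
        lower = (xbar - hx) - (ybar + hy)  # L_X - U_Y
        upper = (xbar + hx) - (ybar - hy)  # U_X - L_Y
        if lower <= 0.0 <= upper:  # true difference mu_X - mu_Y is 0
            hits += 1
    return hits / trials

print(naive_coverage(1.0, 1.0, 10, 10))
```

With these settings, the estimated coverage should be close to $2\Phi(1.96\sqrt2)-1\approx0.994$. This is well above both $0.95$ and $0.95^2=0.9025$, so the naive interval is much wider than a 95% interval needs to be.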

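Theorem. (Confidence interval of $\mu_X-\mu_Y$ when $\sigma_X^2$ and $\sigma_Y^2$ are known) Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ be random samples from two independent distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$ respectively (i.e. the random variables $X_1,\dots,X_n,Y_1,\dots,Y_m$ are independent), where $\sigma_X^2$ and $\sigma_Y^2$ are known. Then, a $100(1-\alpha)\%$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[(\overline X-\overline Y)-z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}},\ (\overline X-\overline Y)+z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\right].$$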

Remark.

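  • The corresponding interval estimate is $\left[(\bar x-\bar y)-z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}},\ (\bar x-\bar y)+z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\right]$ with observed values $\bar x$ and $\bar y$.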

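Exercise. Show that $\overline X-\overline Y\sim\mathcal N\left(\mu_X-\mu_Y,\ \frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}\right)$ (the meaning of the notations follows the above theorem).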

Solution

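Proof. First, we have $\overline X\sim\mathcal N\left(\mu_X,\frac{\sigma_X^2}{n}\right)$ and $\overline Y\sim\mathcal N\left(\mu_Y,\frac{\sigma_Y^2}{m}\right)$ by the property of normal distribution ($X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ are independent random samples). Then, applying the property of normal distribution again, we have $-\overline Y\sim\mathcal N\left(-\mu_Y,\frac{\sigma_Y^2}{m}\right)$. Moreover, $\overline X$ and $-\overline Y$ are independent, since the two distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$ are independent.

It follows by applying the property again that
$$\overline X-\overline Y=\overline X+(-\overline Y)\sim\mathcal N\left(\mu_X-\mu_Y,\ \frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}\right).$$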


Now, we will prove the above theorem based on the result shown in the previous exercise:

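Proof. Let $Z=\dfrac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}}\sim\mathcal N(0,1)$ (from the previous exercise). Then, $Z$ is a pivotal quantity of $\mu_X-\mu_Y$. Hence, we have
$$1-\alpha=\mathbb P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=\mathbb P\left((\overline X-\overline Y)-z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\le\mu_X-\mu_Y\le(\overline X-\overline Y)+z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\right).$$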

Example. A statistician wants to compare two kinds of light bulbs (brand A vs. brand B) by their lifetime (amount of time until the bulb burns out). He takes a random sample of 10 light bulbs of each brand and measures their lifetimes. The following is the summary of the results:

Based on past studies, the statistician knows that the standard deviations of the lifetime of brand A and brand B light bulbs are 600 hours and 150 hours respectively. Assume the distribution of the lifetime is normal.

(a) Construct a 95% confidence interval for the mean lifetime of brand A light bulbs ($\mu_A$) and of brand B light bulbs ($\mu_B$) respectively.

(b) Construct a 95% confidence interval for $\mu_B-\mu_A$.

(c) Can the statistician conclude with 95% confidence that brand B light bulb has a longer lifetime than brand A light bulb on average?

Solution.

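(a) Since $z_{0.025}\approx1.96$ and the sample size for each of the random samples is 10, a 95% confidence interval for $\mu_A$ is
$$\left[\bar x_A-1.96\cdot\frac{600}{\sqrt{10}},\ \bar x_A+1.96\cdot\frac{600}{\sqrt{10}}\right]\approx[\bar x_A-371.88,\ \bar x_A+371.88],$$

and a 95% confidence interval for $\mu_B$ is
$$\left[\bar x_B-1.96\cdot\frac{150}{\sqrt{10}},\ \bar x_B+1.96\cdot\frac{150}{\sqrt{10}}\right]\approx[\bar x_B-92.97,\ \bar x_B+92.97],$$
where $\bar x_A$ and $\bar x_B$ are the observed sample means in the above summary.

(b) A 95% confidence interval for $\mu_B-\mu_A$ is
$$\left[(\bar x_B-\bar x_A)-1.96\sqrt{\frac{150^2}{10}+\frac{600^2}{10}},\ (\bar x_B-\bar x_A)+1.96\sqrt{\frac{150^2}{10}+\frac{600^2}{10}}\right]\approx[(\bar x_B-\bar x_A)-383.33,\ (\bar x_B-\bar x_A)+383.33].$$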

(c) Since all values in the 95% confidence interval in (b) are positive, the statistician can be 95% confident that the mean lifetime of brand B light bulbs is longer than that of brand A light bulbs.
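The half-widths in (a) and (b) can be reproduced with a short Python sketch. It uses `statistics.NormalDist` from the standard library for $z_{0.025}$. The sample means are not needed for the half-widths, since only the standard deviations and sample sizes enter. The helper name `z_half_width` is an illustrative choice.

```python
import math
from statistics import NormalDist

def z_half_width(alpha, variance_terms):
    """Half-width z_{alpha/2} * sqrt(sum of sigma^2 / n terms) of a z-interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 percentile of N(0, 1)
    return z * math.sqrt(sum(variance_terms))

# Light bulb example: sigma_A = 600 h, sigma_B = 150 h, 10 bulbs of each brand.
half_a = z_half_width(0.05, [600**2 / 10])
half_b = z_half_width(0.05, [150**2 / 10])
half_diff = z_half_width(0.05, [600**2 / 10, 150**2 / 10])
print(round(half_a, 2), round(half_b, 2), round(half_diff, 2))
```

`inv_cdf(0.975)` returns $1.95996\ldots$ rather than the rounded table value 1.96, so the last half-width comes out as about 383.32 instead of 383.33. The half-width for the difference is also smaller than the sum of the two separate half-widths, which is what the naive method would use.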

Remark.

  • Notice that the 95% confidence intervals for $\mu_A$ and $\mu_B$ in (a) overlap: some values in the interval for $\mu_A$ exceed some values in the interval for $\mu_B$. However, we are still 95% confident that $\mu_B$ exceeds $\mu_A$.

Exercise. Suppose there is a brand C light bulb, and the statistician also takes a random sample of 10 brand C light bulbs. The observed sample mean of this random sample is 4210 hours, and the standard deviation of the lifetime of brand C light bulbs is known to be $\sigma_C$ hours. Assume the distribution of the lifetime is normal.

After constructing 95% confidence intervals using the above theorem, the statistician is 95% confident that brand C light bulbs have a mean lifetime at least as long as those of both brand A and brand B light bulbs. Show that the maximum value of $\sigma_C$ is (approximately) 110.31.


Solution

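Proof. Let $\mu_C$ be the mean lifetime of brand C light bulbs.

A 95% confidence interval for $\mu_C-\mu_A$ is
$$\left[(4210-\bar x_A)-1.96\sqrt{\frac{\sigma_C^2}{10}+\frac{600^2}{10}},\ (4210-\bar x_A)+1.96\sqrt{\frac{\sigma_C^2}{10}+\frac{600^2}{10}}\right]$$

and a 95% confidence interval for $\mu_C-\mu_B$ is
$$\left[(4210-\bar x_B)-1.96\sqrt{\frac{\sigma_C^2}{10}+\frac{150^2}{10}},\ (4210-\bar x_B)+1.96\sqrt{\frac{\sigma_C^2}{10}+\frac{150^2}{10}}\right].$$
In order for the statistician to be 95% confident that brand C light bulbs have a mean lifetime at least as long as those of both brand A and brand B light bulbs, the lower bounds of both of these confidence intervals should be at least 0, i.e.
$$1.96\sqrt{\frac{\sigma_C^2+600^2}{10}}\le4210-\bar x_A\qquad\text{and}\qquad1.96\sqrt{\frac{\sigma_C^2+150^2}{10}}\le4210-\bar x_B.$$
Substituting the observed sample means $\bar x_A$ and $\bar x_B$ and solving both inequalities for $\sigma_C$, the maximum value of $\sigma_C$ is (approximately) 110.31.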



Now, we will consider the case where the variances are unknown. In this case, constructing the confidence interval for the difference in means is more complicated, and even more so when $\sigma_X^2\ne\sigma_Y^2$. Thus, we will only discuss the case where $\sigma_X^2=\sigma_Y^2=\sigma^2$ is unknown. As you may expect, we will also use some results mentioned previously for constructing the confidence interval for $\mu$ when $\sigma^2$ is unknown.

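Theorem. (Confidence interval of $\mu_X-\mu_Y$ when $\sigma_X^2=\sigma_Y^2=\sigma^2$ is unknown) Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ be random samples from two independent distributions $\mathcal N(\mu_X,\sigma^2)$ and $\mathcal N(\mu_Y,\sigma^2)$ respectively. Then, a $100(1-\alpha)\%$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[(\overline X-\overline Y)-t_{\alpha/2,n+m-2}\,S_p\sqrt{\frac1n+\frac1m},\ (\overline X-\overline Y)+t_{\alpha/2,n+m-2}\,S_p\sqrt{\frac1n+\frac1m}\right]$$

where $S_p^2=\frac{(n-1)S_X^2+(m-1)S_Y^2}{n+m-2}$ is the pooled sample variance, and $S_X^2$ and $S_Y^2$ are the sample variances of the random samples $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ respectively.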

Remark.

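  • The corresponding interval estimate is $\left[(\bar x-\bar y)-t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac1n+\frac1m},\ (\bar x-\bar y)+t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac1n+\frac1m}\right]$, with observed values $\bar x$, $\bar y$ and $s_p$.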

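Proof. Let $Z=\dfrac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{\sigma\sqrt{\frac1n+\frac1m}}\sim\mathcal N(0,1)$ (the reason for this is shown in a previous exercise). From a previous result, we know that $\frac{(n-1)S_X^2}{\sigma^2}\sim\chi^2_{n-1}$ and $\frac{(m-1)S_Y^2}{\sigma^2}\sim\chi^2_{m-1}$. Then, for $t<\frac12$, the mgf of $\frac{(n-1)S_X^2}{\sigma^2}$ is $(1-2t)^{-(n-1)/2}$ and the mgf of $\frac{(m-1)S_Y^2}{\sigma^2}$ is $(1-2t)^{-(m-1)/2}$. Since the distributions $\mathcal N(\mu_X,\sigma^2)$ and $\mathcal N(\mu_Y,\sigma^2)$ are independent, the mgf of $\frac{(n-1)S_X^2+(m-1)S_Y^2}{\sigma^2}$ is
$$(1-2t)^{-(n-1)/2}(1-2t)^{-(m-1)/2}=(1-2t)^{-(n+m-2)/2},\qquad t<\tfrac12.$$

Hence, $\frac{(n+m-2)S_p^2}{\sigma^2}=\frac{(n-1)S_X^2+(m-1)S_Y^2}{\sigma^2}\sim\chi^2_{n+m-2}$.

By the independence of sample mean and sample variance ($\overline X$ and $S_X^2$ are independent, and $\overline Y$ and $S_Y^2$ are independent), we can deduce that $Z$ and $\frac{(n+m-2)S_p^2}{\sigma^2}$ are independent. Thus, by the definition of $t$-distribution,
$$T=\frac{Z}{\sqrt{\frac{(n+m-2)S_p^2}{\sigma^2}\Big/(n+m-2)}}=\frac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{S_p\sqrt{\frac1n+\frac1m}}$$

follows $t_{n+m-2}$. Therefore, $T$ is a pivotal quantity of $\mu_X-\mu_Y$. Hence, we have
$$1-\alpha=\mathbb P\left(-t_{\alpha/2,n+m-2}\le T\le t_{\alpha/2,n+m-2}\right)=\mathbb P\left((\overline X-\overline Y)-t_{\alpha/2,n+m-2}S_p\sqrt{\frac1n+\frac1m}\le\mu_X-\mu_Y\le(\overline X-\overline Y)+t_{\alpha/2,n+m-2}S_p\sqrt{\frac1n+\frac1m}\right).$$
The result follows.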

Example.

There are two lakes in a country: one located in the north, called North Lake, and another located in the south, called South Lake. Suppose the weights of the fish in North Lake and South Lake follow $\mathcal N(\mu_N,\sigma^2)$ and $\mathcal N(\mu_S,\sigma^2)$ respectively, where $\sigma^2$ is unknown. A fisherman, Bob, wants to compare the mean weights of the fish in North Lake and South Lake, so that he can choose the lake with the greater mean weight for fishing. For the comparison, Bob went to North Lake and fished there on day 1. On day 2, he went to South Lake instead and fished there. The following are some descriptions of the fish caught:

Can Bob be 90% confident that he should choose South Lake for fishing?

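Solution. First, we compute the observed pooled sample variance $s_p^2$ from the data above. The number of degrees of freedom, $n_N+n_S-2$, is so large that the corresponding value $t_{0.05,n_N+n_S-2}$ cannot be found in the t-table, so we may use $z_{0.05}\approx1.645$ to approximate it. A 90% confidence interval for $\mu_S-\mu_N$ is
$$\left[(\bar x_S-\bar x_N)-1.645\,s_p\sqrt{\frac{1}{n_S}+\frac{1}{n_N}},\ (\bar x_S-\bar x_N)+1.645\,s_p\sqrt{\frac{1}{n_S}+\frac{1}{n_N}}\right],$$
where $n_N$, $n_S$ are the numbers of fish caught and $\bar x_N$, $\bar x_S$ are the observed sample mean weights in North Lake and South Lake respectively.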

Since all values in the confidence interval exceed 0, Bob can be 90% confident that $\mu_S-\mu_N>0$, i.e. the mean weight of fish in South Lake is greater than that in North Lake. Hence, he can be 90% confident that he should choose South Lake for fishing.


Now, what if the two normal distributions concerned are dependent? Clearly, we cannot use the above results anymore, and we need a new method to construct a confidence interval for the difference of means in this case. For this, we need the notion of paired samples.

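Proposition. Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ (the sample sizes must be the same) be random samples from two normal distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$ respectively, which may be dependent. Assume that the pairs $(X_1,Y_1),\dots,(X_n,Y_n)$ are independent, and that each pair $(X_i,Y_i)$ is jointly (bivariate) normal with the same covariance $\operatorname{Cov}(X_i,Y_i)$. Let $D_i=X_i-Y_i$ for each $i=1,\dots,n$. Then, $D_1,\dots,D_n$ are independent and $D_i\sim\mathcal N(\mu_D,\sigma_D^2)$, where $\mu_D=\mu_X-\mu_Y$ and $\sigma_D^2=\sigma_X^2+\sigma_Y^2-2\operatorname{Cov}(X_i,Y_i)$.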

Remark.

  • $(X_1,Y_1),\dots,(X_n,Y_n)$ are called paired samples in this case.

Proof.

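1. Independence of $D_1,\dots,D_n$:

Since the pairs $(X_1,Y_1),\dots,(X_n,Y_n)$ are independent and each $D_i=X_i-Y_i$ depends only on the pair $(X_i,Y_i)$, it follows that $D_1,\dots,D_n$ are independent, which is what we want to show.

2. $D_i\sim\mathcal N\left(\mu_X-\mu_Y,\ \sigma_X^2+\sigma_Y^2-2\operatorname{Cov}(X_i,Y_i)\right)$:

  • To show that $D_i$ still follows a normal distribution, we can consider the pdf of $D_i$ for each $i$. The pdf can be obtained using the transformation of random variables formula. For example, let $U=X_i-Y_i$ and $V=Y_i$, so that $X_i=U+V$ and $Y_i=V$ (the Jacobian of this transformation is 1). Then, the marginal pdf of $U$, obtained by integrating the joint pdf of $(U,V)$ over $v$, is the pdf of $D_i$. Since $(X_i,Y_i)$ is jointly (bivariate) normal, this pdf is in the form of a normal pdf.
  • However, since the actual derivation process is somewhat complicated, it is omitted here.
  • Of course, the mean and variance of $D_i$ can be read off from the pdf of $D_i$ determined previously. Alternatively, even before determining the pdf of $D_i$, we know that the mean of $D_i$ is $\mathbb E[X_i]-\mathbb E[Y_i]=\mu_X-\mu_Y$. This uses the linearity of expectation, which does not require any independence assumption. The variance of $D_i$ is $\operatorname{Var}(X_i)+\operatorname{Var}(Y_i)-2\operatorname{Cov}(X_i,Y_i)=\sigma_X^2+\sigma_Y^2-2\operatorname{Cov}(X_i,Y_i)$.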
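A quick simulation can illustrate the variance formula $\operatorname{Var}(D_i)=\sigma_X^2+\sigma_Y^2-2\operatorname{Cov}(X_i,Y_i)$. The Python sketch below generates correlated bivariate normal pairs from two independent standard normals, so that $\operatorname{Corr}(X_i,Y_i)=\rho$. This is a standard construction chosen here for illustration. The parameter values ($\mu_X=5$, $\mu_Y=3$, $\sigma_X=2$, $\sigma_Y=1$, $\rho=0.6$) are arbitrary assumptions.

```python
import random

def simulate_paired_differences(mu_x, mu_y, sigma_x, sigma_y, rho, n, seed=0):
    """Return D_i = X_i - Y_i for n independent bivariate normal pairs
    (X_i, Y_i) with the given means, standard deviations and correlation rho."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sigma_x * z1
        # Y reuses z1, so Corr(X, Y) = rho while Var(Y) stays sigma_y**2.
        y = mu_y + sigma_y * (rho * z1 + (1 - rho**2) ** 0.5 * z2)
        diffs.append(x - y)
    return diffs

d = simulate_paired_differences(5.0, 3.0, 2.0, 1.0, 0.6, 200_000)
d_mean = sum(d) / len(d)
d_var = sum((v - d_mean) ** 2 for v in d) / (len(d) - 1)  # sample variance
print(round(d_mean, 3), round(d_var, 3))
```

Here the formula gives $\mu_D=2$ and $\sigma_D^2=4+1-2(0.6)(2)(1)=2.6$, and the simulated mean and variance should be close to these values. With $\rho=0$, the variance would be close to 5 instead, which shows how the dependence enters.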

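Corollary. (Confidence interval of $\mu_X-\mu_Y$ when $\sigma_D^2$ is known) Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ be random samples from two normal distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$, paired as in the previous proposition. Then, a $100(1-\alpha)\%$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[\overline D-z_{\alpha/2}\frac{\sigma_D}{\sqrt n},\ \overline D+z_{\alpha/2}\frac{\sigma_D}{\sqrt n}\right]$$

where $D_i=X_i-Y_i$ for each $i=1,\dots,n$, $\overline D=\frac1n\sum_{i=1}^nD_i$, $\sigma_D$ is the standard deviation of $D_i$, and $\sigma_D$ is known.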

Remark.

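  • The corresponding interval estimate is $\left[\bar d-z_{\alpha/2}\frac{\sigma_D}{\sqrt n},\ \bar d+z_{\alpha/2}\frac{\sigma_D}{\sqrt n}\right]$ with observed value $\bar d$.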

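Proof. From the previous proposition, we know that $D_1,\dots,D_n$ is a random sample from $\mathcal N(\mu_X-\mu_Y,\sigma_D^2)$. Since $\sigma_D$ is known, it follows from a previous theorem (the confidence interval for the mean when the variance is known) that a $100(1-\alpha)\%$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[\overline D-z_{\alpha/2}\frac{\sigma_D}{\sqrt n},\ \overline D+z_{\alpha/2}\frac{\sigma_D}{\sqrt n}\right].$$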

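Corollary. (Confidence interval of $\mu_X-\mu_Y$ when $\sigma_D^2$ is unknown) Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ be random samples from two normal distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$, paired as in the previous proposition. Then, a $100(1-\alpha)\%$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[\overline D-t_{\alpha/2,n-1}\frac{S_D}{\sqrt n},\ \overline D+t_{\alpha/2,n-1}\frac{S_D}{\sqrt n}\right]$$

where $D_i=X_i-Y_i$ for each $i=1,\dots,n$, $S_D$ is the sample standard deviation of $D_1,\dots,D_n$, and the variance of $D_i$ is unknown.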

Remark.

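  • The corresponding interval estimate is $\left[\bar d-t_{\alpha/2,n-1}\frac{s_D}{\sqrt n},\ \bar d+t_{\alpha/2,n-1}\frac{s_D}{\sqrt n}\right]$ with observed values $\bar d$ and $s_D$.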

Exercise. Prove the above corollary.

Solution

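Proof. From the previous proposition, we know that $D_1,\dots,D_n$ is a random sample from $\mathcal N(\mu_X-\mu_Y,\sigma_D^2)$. Since $\sigma_D^2$ is unknown, it follows from a previous theorem (the confidence interval for the mean when the variance is unknown) that a $100(1-\alpha)\%$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[\overline D-t_{\alpha/2,n-1}\frac{S_D}{\sqrt n},\ \overline D+t_{\alpha/2,n-1}\frac{S_D}{\sqrt n}\right].$$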


Example. A fertilizer company wants to advertise the effect of its corn fertilizer. Thus, the company plants 5 identical corn seeds in each of two neighbouring places, X and Y. After that, the company uses the fertilizer in place Y only (apart from this, all other conditions in places X and Y are the same). The weights of the corn harvested (in g) in the two places are summarized below:

Let $x_i$ and $y_i$ (the realizations of random variables $X_i$ and $Y_i$ respectively) be the weights (in g) of the corn harvested in place X and place Y respectively. Let $d_i=y_i-x_i$ (the realization of random variable $D_i=Y_i-X_i$) be the improvement in weight from using the fertilizer. Suppose $X_1,\dots,X_5$ is a random sample from $\mathcal N(\mu_X,\sigma_X^2)$ and $Y_1,\dots,Y_5$ is a random sample from $\mathcal N(\mu_Y,\sigma_Y^2)$, and the variance of $D_i$ is unknown ($i=1,\dots,5$).

Construct a 95% confidence interval for $\mu_Y-\mu_X$.

Solution.

First, the values of $d_1,\dots,d_5$ are summarized as follows:

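Hence, the sample mean and the sample standard deviation of $d_1,\dots,d_5$ are observed to be $\bar d$ and $s_D$ respectively. Since $t_{0.025,4}\approx2.776$, it follows that a 95% confidence interval for $\mu_Y-\mu_X$ is
$$\left[\bar d-2.776\cdot\frac{s_D}{\sqrt5},\ \bar d+2.776\cdot\frac{s_D}{\sqrt5}\right].$$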
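The following Python sketch applies this interval to five hypothetical pairs of weights. These are made-up numbers for illustration, not the company's data. The sketch forms $d_i=y_i-x_i$, computes $\bar d$ and $s_D$ with the standard `statistics` module, and uses $t_{0.025,4}\approx2.776$ as above. The helper name `paired_t_ci` is an illustrative choice.

```python
import math
import statistics

def paired_t_ci(x, y, t_crit):
    """100(1-alpha)% CI for mu_Y - mu_X from paired observations (x_i, y_i).

    t_crit is t_{alpha/2, n-1}, read from a t-table."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same size")
    d = [yi - xi for xi, yi in zip(x, y)]  # improvement d_i = y_i - x_i
    n = len(d)
    d_bar = statistics.fmean(d)
    s_d = statistics.stdev(d)  # sample standard deviation, divisor n - 1
    half = t_crit * s_d / math.sqrt(n)
    return d_bar - half, d_bar + half

# Hypothetical weights (g) for place X (no fertilizer) and place Y (fertilizer).
x = [100, 110, 95, 105, 98]
y = [108, 115, 101, 112, 104]
print(paired_t_ci(x, y, t_crit=2.776))
```

For these made-up numbers, $d=(8,5,6,7,6)$, so $\bar d=6.4$ and $s_D=\sqrt{1.3}\approx1.140$. The interval is therefore roughly $[4.98,\ 7.82]$.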


Exercise. Suppose the standard deviations $\sigma_X$ and $\sigma_Y$ are known, and the correlation coefficient of $X_i$ and $Y_i$ is known to be $\rho$. Construct a 95% confidence interval for $\mu_Y-\mu_X$ based on the result for the case of known variance.


Solution

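The variance of $D_i=Y_i-X_i$ is $\sigma_D^2=\sigma_X^2+\sigma_Y^2-2\rho\sigma_X\sigma_Y$. Hence, $\sigma_D=\sqrt{\sigma_X^2+\sigma_Y^2-2\rho\sigma_X\sigma_Y}$. Since $z_{0.025}\approx1.96$, a 95% confidence interval for $\mu_Y-\mu_X$ is
$$\left[\bar d-1.96\cdot\frac{\sigma_D}{\sqrt5},\ \bar d+1.96\cdot\frac{\sigma_D}{\sqrt5}\right].$$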



Confidence intervals for variances of normal distributions

Variance of a normal distribution

After discussing the confidence intervals for means of normal distributions, let us consider the confidence intervals for variances of normal distributions. Similarly, we need to consider a pivotal quantity of $\sigma^2$. Can you suggest a pivotal quantity of $\sigma^2$, based on a result discussed previously?

Recall that we have $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$, and that $S^2$ is independent of $\overline X$, under suitable assumptions. Thus, this result gives us a pivotal quantity of $\sigma^2$, namely $\frac{(n-1)S^2}{\sigma^2}$. Before discussing the theorem for constructing a confidence interval for $\sigma^2$, let us introduce a notation:

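  • $\chi^2_{\alpha,\nu}$ is the upper percentile of $\chi^2_\nu$ at level $\alpha$, i.e. it satisfies $\mathbb P(X\ge\chi^2_{\alpha,\nu})=\alpha$ where $X\sim\chi^2_\nu$.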

Some values of $\chi^2_{\alpha,\nu}$ can be found in the chi-squared table.

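  • To find the value of $\chi^2_{\alpha,\nu}$, locate the row for $\nu$ degrees of freedom and the column for upper-tail probability $\alpha$. In a table whose columns list the "probability content" $\mathbb P(X\le x)$, this is the column for $1-\alpha$.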

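Theorem. (Confidence interval of $\sigma^2$) Let $X_1,\dots,X_n$ be a random sample from $\mathcal N(\mu,\sigma^2)$. Then, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right].$$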

Remark.

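  • The corresponding interval estimate is $\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right]$ with observed value $s^2$.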

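Proof. Since $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$, set
$$\mathbb P\left(\chi^2_{1-\alpha/2,n-1}\le\frac{(n-1)S^2}{\sigma^2}\le\chi^2_{\alpha/2,n-1}\right)=1-\alpha.$$

[5] Then, we have
$$1-\alpha=\mathbb P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}}\le\sigma^2\le\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right).$$
The result follows.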

Example.

A candy company recently launched a new type of chocolate, where each chocolate is supposed to weigh 10g. For quality control (QC) of the production process of a batch of the chocolates, the company takes a random sample of 20 chocolates from the factory producing this type of chocolate. After measuring the weights of these 20 chocolates, it is found that their sample standard deviation is 0.03g. To pass the QC, the standard deviation of the weight of the whole batch of chocolates, $\sigma$, should not exceed 0.5% of the weight each chocolate is supposed to weigh, with 99% confidence (based on the above construction of confidence interval). Assume the distribution of the weight is normal.

Can the QC be passed?

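Solution. Since $n=20$, $s=0.03$, $\chi^2_{0.005,19}\approx38.582$ and $\chi^2_{0.995,19}\approx6.844$, a 99% confidence interval for $\sigma^2$ is
$$\left[\frac{19(0.03)^2}{38.582},\ \frac{19(0.03)^2}{6.844}\right]\approx[0.000443,\ 0.002499].$$

Considering the proof of the theorem for constructing this confidence interval, we know that a 99% confidence interval for $\sigma$ can be obtained by taking the positive square root of both the lower and upper bounds of the above confidence interval. This is valid because taking square roots preserves the order of nonnegative numbers, so the event in the proof is unchanged. Thus, a 99% confidence interval for $\sigma$ is
$$\left[\sqrt{0.000443},\ \sqrt{0.002499}\right]\approx[0.0211,\ 0.04999].$$
Since $0.5\%\times10=0.05$, and all values in this confidence interval for $\sigma$ do not exceed 0.05, the QC can be passed. However, it is passed only barely, since the upper bound (about 0.049985) lies just below 0.05.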
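The arithmetic of this solution can be reproduced with a few lines of Python. The standard library has no chi-squared quantile function, so the sketch takes the table values $\chi^2_{0.005,19}\approx38.582$ and $\chi^2_{0.995,19}\approx6.844$ as inputs. The helper name `variance_ci` is hypothetical.

```python
import math

def variance_ci(s2, n, chi2_upper, chi2_lower):
    """100(1-alpha)% CI for sigma^2 of a normal population.

    chi2_upper = chi^2_{alpha/2, n-1} and chi2_lower = chi^2_{1-alpha/2, n-1},
    both taken from a chi-squared table."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Chocolate example: n = 20, s = 0.03 g, 99% confidence.
var_lo, var_hi = variance_ci(0.03**2, 20, chi2_upper=38.582, chi2_lower=6.844)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)  # square roots give a CI for sigma
print(f"{sd_lo:.5f} {sd_hi:.5f}")
```

The upper bound is about 0.049985, which is below the limit 0.05 only by a tiny margin. The conclusion is therefore sensitive to how the table values are rounded.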


Exercise.

(a) What is the maximum value of the sample standard deviation of the 20 chocolates for the QC to be passed?

(b) Suppose the requirement to pass the QC becomes less strict. Can the QC be passed if

(i) the "0.5%" is increased to "1%";

(ii) the "99% confidence" is decreased to "95% confidence"?

Solution

(a) To pass the QC, the upper bound of the 99% confidence interval for $\sigma$ should be at most 0.05. Hence,
$$\sqrt{\frac{19s^2}{6.844}}\le0.05\iff s\le0.05\sqrt{\frac{6.844}{19}}\approx0.0300$$

(we have $s\ge0$, so we consider the positive square root only). Thus, the maximum value of the sample standard deviation is 0.0300g (approximately).

(b) (i) In this case, since $1\%\times10=0.1$, and all values in the above 99% confidence interval for $\sigma$ do not exceed 0.1, the QC can be passed.

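(ii) Since $\chi^2_{0.025,19}\approx32.852$ and $\chi^2_{0.975,19}\approx8.907$, a 95% confidence interval for $\sigma^2$ is
$$\left[\frac{19(0.03)^2}{32.852},\ \frac{19(0.03)^2}{8.907}\right]\approx[0.000521,\ 0.001920].$$

Hence, the corresponding 95% confidence interval for $\sigma$ is
$$\left[\sqrt{0.000521},\ \sqrt{0.001920}\right]\approx[0.0228,\ 0.0438].$$
Since all values in this confidence interval do not exceed 0.05, the QC can be passed.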
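A similar Python sketch reproduces the numbers in (a) and (b)(ii). As before, the chi-squared percentiles are typed in from a table, because the standard library does not provide them.

```python
import math

n, limit = 20, 0.05

# (a) Largest s with sqrt((n - 1) * s^2 / chi^2_{0.995,19}) <= limit,
#     where chi^2_{0.995,19} = 6.844 from a chi-squared table.
s_max = limit * math.sqrt(6.844 / (n - 1))

# (b)(ii) 95% CI for sigma with s = 0.03 g, using chi^2_{0.025,19} = 32.852
#         and chi^2_{0.975,19} = 8.907 from a chi-squared table.
s2 = 0.03**2
sd_lo = math.sqrt((n - 1) * s2 / 32.852)
sd_hi = math.sqrt((n - 1) * s2 / 8.907)
print(round(s_max, 4), round(sd_lo, 4), round(sd_hi, 4))
```

Note that `s_max` is about 0.03001, only slightly above the observed $s=0.03$. This is consistent with the narrow pass at 99% confidence.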


Remark.

  • Notice that the (sample) mean is not considered in the above calculations. Indeed, it does not play any role in the above construction of confidence interval, so it is not important in this context.

Ratio of variances of two independent normal distributions

Similar to the case for means, we would sometimes also like to compare the variances of two normal distributions. One may naturally expect that we should construct a confidence interval for the difference in variances, as in the case for means. However, there are no simple ways to do this, since we do not have results that help with this construction. Therefore, we need an alternative way to compare the variances, without using their difference. Can you suggest a way?

Recall the definition of efficiency in point estimation. Efficiency gives us a nice way to compare two variances without considering their difference: it considers the ratio of the two variances instead. Fortunately, we have some results that help us to construct a confidence interval for the ratio of two variances.

Recall the definition of the $F$-distribution: if $U\sim\chi^2_{\nu_1}$ and $V\sim\chi^2_{\nu_2}$ are independent, then $\frac{U/\nu_1}{V/\nu_2}$ follows the $F$-distribution with $\nu_1$ and $\nu_2$ degrees of freedom, denoted by $F_{\nu_1,\nu_2}$. From this definition, we can see that the $F$-distribution involves a ratio of two independent chi-squared random variables. How can it be linked to the ratio of two variances?

Recall that we have $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$ under suitable assumptions. This connects the variance with a chi-squared random variable. Thus, we can use this property together with the definition of $F$-distribution to construct a pivotal quantity, and hence a confidence interval.

Let us introduce a notation before discussing the construction of confidence interval:

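  • $f_{\alpha,\nu_1,\nu_2}$ is the upper percentile of $F_{\nu_1,\nu_2}$ at level $\alpha$, i.e. it satisfies $\mathbb P(F\ge f_{\alpha,\nu_1,\nu_2})=\alpha$ where $F\sim F_{\nu_1,\nu_2}$.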

Some values of $f_{\alpha,\nu_1,\nu_2}$ can be found in $F$-tables. There are different $F$-tables for different values of $\alpha$, and the column and row of each table indicate the first and second degrees of freedom respectively. Also, using the property that $f_{1-\alpha,\nu_1,\nu_2}=\frac{1}{f_{\alpha,\nu_2,\nu_1}}$, we can obtain some more values of $f_{\alpha,\nu_1,\nu_2}$ which are not included in the $F$-tables.

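Theorem. (Confidence interval of $\sigma_X^2/\sigma_Y^2$) Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ be random samples from two independent normal distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$ respectively. Then, a $100(1-\alpha)\%$ confidence interval for $\sigma_X^2/\sigma_Y^2$ is
$$\left[\frac{S_X^2}{S_Y^2}\cdot\frac{1}{f_{\alpha/2,n-1,m-1}},\ \frac{S_X^2}{S_Y^2}\cdot f_{\alpha/2,m-1,n-1}\right]$$

where $S_X^2$ and $S_Y^2$ are the sample variances of $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ respectively.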

Remark.

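  • The corresponding interval estimate is $\left[\frac{s_X^2}{s_Y^2}\cdot\frac{1}{f_{\alpha/2,n-1,m-1}},\ \frac{s_X^2}{s_Y^2}\cdot f_{\alpha/2,m-1,n-1}\right]$, with observed values $s_X^2$ and $s_Y^2$.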

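Proof. By the assumptions, we have $\frac{(n-1)S_X^2}{\sigma_X^2}\sim\chi^2_{n-1}$ and $\frac{(m-1)S_Y^2}{\sigma_Y^2}\sim\chi^2_{m-1}$, and these two random variables are independent.

Thus, by the definition of $F$-distribution, we have
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2}=\frac{\frac{(n-1)S_X^2}{\sigma_X^2}\Big/(n-1)}{\frac{(m-1)S_Y^2}{\sigma_Y^2}\Big/(m-1)}\sim F_{n-1,m-1},$$
which is a pivotal quantity of $\sigma_X^2/\sigma_Y^2$. Hence, using $\frac{1}{f_{1-\alpha/2,n-1,m-1}}=f_{\alpha/2,m-1,n-1}$ in the last step, we have
$$1-\alpha=\mathbb P\left(f_{1-\alpha/2,n-1,m-1}\le\frac{S_X^2}{S_Y^2}\cdot\frac{\sigma_Y^2}{\sigma_X^2}\le f_{\alpha/2,n-1,m-1}\right)=\mathbb P\left(\frac{S_X^2}{S_Y^2}\cdot\frac{1}{f_{\alpha/2,n-1,m-1}}\le\frac{\sigma_X^2}{\sigma_Y^2}\le\frac{S_X^2}{S_Y^2}\cdot f_{\alpha/2,m-1,n-1}\right)$$
as desired.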
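The theorem translates directly into a small Python sketch. The two upper $F$ percentiles are inputs read from $F$-tables, since the standard library has no $F$ quantile function. The lower percentile is obtained through the property $f_{1-\alpha/2,n-1,m-1}=1/f_{\alpha/2,m-1,n-1}$. The helper name `variance_ratio_ci` and the numbers in the usage line are hypothetical.

```python
def variance_ratio_ci(sx2, sy2, f_nm, f_mn):
    """100(1-alpha)% CI for sigma_X^2 / sigma_Y^2 from two independent normal samples.

    f_nm = f_{alpha/2, n-1, m-1} and f_mn = f_{alpha/2, m-1, n-1} are upper
    percentiles read from F-tables."""
    ratio = sx2 / sy2
    f_lower_nm = 1.0 / f_mn  # f_{1-alpha/2, n-1, m-1} via the reciprocal property
    return ratio / f_nm, ratio / f_lower_nm

# Hypothetical inputs: s_X^2 = 4, s_Y^2 = 1, and table values 2.5 and 3.0.
print(variance_ratio_ci(4.0, 1.0, f_nm=2.5, f_mn=3.0))
```

For these inputs, the interval is about $(1.6,\ 12)$. Swapping the roles of the two samples gives the reciprocal interval for $\sigma_Y^2/\sigma_X^2$, and taking square roots of both bounds gives an interval for $\sigma_X/\sigma_Y$.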

Apart from using this confidence interval to compare variances (or standard deviations), it can also be useful to justify some assumptions about variances. Let us illustrate these two usages in the following examples.

Example. (Comparison of standard deviations) An economist wants to compare the severity of the income inequality of countries X and Y. Using the Gini coefficient for the comparison is common, but neither country publishes its Gini coefficient or any other measure of income inequality. Thus, the economist decides to conduct a survey asking citizens of countries X and Y for their monthly income (in USD). The economist then compares the standard deviation of the income in country X, $\sigma_X$, with that in country Y, $\sigma_Y$.

The following is a summary of the results from the survey:

(a) Construct a 95% confidence interval for $\sigma_X/\sigma_Y$.

(b) The economist regards the income inequality in one country as at least as severe as that in another country if the standard deviation of the income in the first country is greater than or equal to that in the other country. Can the economist be 95% confident that the income inequality in country X is at least as severe as that in country Y?

Solution.

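(a) The values $f_{0.025,12,24}$ and $f_{0.025,24,12}$ can be read from the $F$-table for $\alpha=0.025$. For the former, see the column for first degree of freedom 12 and the row for second degree of freedom 24. Also, $f_{0.975,12,24}=1/f_{0.025,24,12}$ by the property of $F$-distribution. Hence, a 95% confidence interval for $\sigma_X^2/\sigma_Y^2$ is
$$\left[\frac{s_X^2}{s_Y^2}\cdot\frac{1}{f_{0.025,12,24}},\ \frac{s_X^2}{s_Y^2}\cdot\frac{1}{f_{0.975,12,24}}\right]=\left[\frac{s_X^2}{s_Y^2}\cdot\frac{1}{f_{0.025,12,24}},\ \frac{s_X^2}{s_Y^2}\cdot f_{0.025,24,12}\right].$$

Taking the positive square root of the lower and upper bounds of the above confidence interval (by the above proof, we can do this), a 95% confidence interval for $\sigma_X/\sigma_Y$ is
$$\left[\frac{s_X}{s_Y}\cdot\frac{1}{\sqrt{f_{0.025,12,24}}},\ \frac{s_X}{s_Y}\sqrt{f_{0.025,24,12}}\right].$$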

(b) No, since $\sigma_X\ge\sigma_Y\iff\sigma_X/\sigma_Y\ge1$, and there are some values less than 1 in the confidence interval for $\sigma_X/\sigma_Y$ in (a).


Exercise. What is the minimum value of $s_X$ (the observed sample standard deviation of the income in country X, with everything else unchanged) for the economist to be 95% confident that the income inequality in country X is at least as severe as that in country Y?

Solution

For the economist to be 95% confident that the income inequality in country X is at least as severe as that in country Y, the lower bound of the above 95% confidence interval for $\sigma_X/\sigma_Y$ should be at least 1. That is,
$$\frac{s_X}{s_Y}\cdot\frac{1}{\sqrt{f_{0.025,12,24}}}\ge1\iff s_X\ge s_Y\sqrt{f_{0.025,12,24}}.$$

Hence, the minimum value of $s_X$ is (approximately) 2339.79.


Example. (Justification of assumptions about variance) A statistics question is given to each student in two groups of high school students, from high schools X and Y respectively. The results for the time taken to finish the question (in minutes) are summarized as follows:

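Assume the distribution of the time is normal. Is using the confidence interval
$$\left[(\bar x-\bar y)-t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac1n+\frac1m},\ (\bar x-\bar y)+t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac1n+\frac1m}\right]$$
"reasonable" for estimating the difference between the mean time taken to finish the question by all students in high school X and that in high school Y?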

Solution. For the estimation to be reasonable, we should be able to assume that the population variances of the time taken in high schools X and Y, $\sigma_X^2$ and $\sigma_Y^2$ respectively, are the same (this is the assumption for constructing the given confidence interval).

To be able to assume the variances are equal, we need to be quite confident that the ratio $\sigma_X^2/\sigma_Y^2$ is "close to" 1. In other words, there should be a $100(1-\alpha)\%$ confidence interval for $\sigma_X^2/\sigma_Y^2$ that covers the value 1 and whose width is sufficiently small. Recall that as $\alpha$ increases, the width becomes smaller. A rule of thumb for the width to be "sufficiently small" while the confidence is still "sufficiently large" is that $\alpha$ can take the value 0.1 [6]. Notice that for the confidence interval to cover 1, $\alpha$ must be smaller than or equal to a certain value. This is because a smaller $\alpha$ gives a wider confidence interval, so as $\alpha$ gets smaller, the confidence interval gets wider and eventually covers 1 at a certain value of $\alpha$.

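In this case, a $100(1-\alpha)\%$ confidence interval for $\sigma_X^2/\sigma_Y^2$ is
$$\left[\frac{s_X^2}{s_Y^2}\cdot\frac{1}{f_{\alpha/2,n-1,m-1}},\ \frac{s_X^2}{s_Y^2}\cdot f_{\alpha/2,m-1,n-1}\right].$$

For this confidence interval to contain 1, we need to have
$$f_{\alpha/2,n-1,m-1}\ge\frac{s_X^2}{s_Y^2}\quad(1)\qquad\text{and}\qquad f_{\alpha/2,m-1,n-1}\ge\frac{s_Y^2}{s_X^2}\quad(2),$$
where, with the observed data, the right-hand sides of (1) and (2) are approximately 8.007 and 0.125 respectively.
For (1) to hold, $\alpha$ needs to be "very small". Even $f_{0.005,n-1,m-1}$ is still quite small compared to 8.007, so we know that $\alpha$ must at least be smaller than 0.01 [7]. Hence, to satisfy both inequalities, $\alpha$ needs to be very small. In other words, for the confidence interval to contain 1, $\alpha$ needs to be very small, which means that the width of the confidence interval is very large. Therefore, we are not confident that the ratio $\sigma_X^2/\sigma_Y^2$ is close to 1. Hence, we are not able to assume the variances are equal.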

Remark.

  • Graphically, the inequalities look like:
|                  
|     #     
|   #    #  
|  #        #     area = alpha/2 is very small
| #            #   | 
|#               |#v 
|                |//#
*----------------*----
                8.007
|                  
|   #       
|  //////#      area = alpha/2 is very large
| #/////////#    |                            
| |/////////// # v   
|#|///////////////#  
| |/////////////////#
*-*-------------------
0.125              8.007


Approximated confidence intervals for means

Previously, the distributions for the population are assumed to be normal, but the distributions are often not normal in reality. So, does it mean our previous discussions are meaningless in reality? No. The discussions are indeed still quite meaningful in reality, since we can use the central limit theorem to "connect" the distributions in reality (which are usually not normal) to normal distribution. Through this, we can construct approximated confidence intervals, since we use central limit theorem for approximation.

To be more precise, recall that the central limit theorem suggests that $\frac{\overline X-\mu}{\sigma/\sqrt n}\xrightarrow{d}\mathcal N(0,1)$ under suitable assumptions. Therefore, if the sample size $n$ is large enough (a rule of thumb: at least 30), then $\frac{\overline X-\mu}{\sigma/\sqrt n}$ approximately follows the standard normal distribution. Hence, $\frac{\overline X-\mu}{\sigma/\sqrt n}$ is (approximately) a pivotal quantity. Recall from the property of normal distribution that if $X_1,\dots,X_n$ is a random sample from $\mathcal N(\mu,\sigma^2)$, then $\frac{\overline X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$ exactly (not approximately). We have used this for the pivotal quantity of the confidence interval for the mean when the variance is known. We have also used it for the confidence interval for $\mu_X-\mu_Y$ (paired samples) when $\sigma_D$ is known. Therefore, we can use basically the same confidence intervals in these cases. However, such confidence intervals are approximate rather than exact, since we have used the central limit theorem to construct the pivotal quantity.

Now, how about the other confidence intervals, where the pivotal quantity is not in this form? Consider the confidence interval for the difference in means of two independent distributions when the variances are known. Its pivotal quantity is similar in some sense: $\frac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma_X^2/n+\sigma_Y^2/m}}$ (see the corresponding theorem for the meaning of the notations involved). Suppose the distributions involved are not normal (but are still independent), and the sample sizes $n$ and $m$ are both large enough. Can we use the central limit theorem to conclude that $\frac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma_X^2/n+\sigma_Y^2/m}}\sim\mathcal N(0,1)$ approximately? The answer is yes. For the proof, see the following exercise.


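Exercise. Use the central limit theorem to prove the following. If the distributions involved are not normal (but are still independent), and the sample sizes $n$ and $m$ are both large enough, then $\frac{(\overline X-\overline Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma_X^2/n+\sigma_Y^2/m}}\sim\mathcal N(0,1)$ approximately.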

Solution

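Proof. Under the assumptions, by the central limit theorem, we know that $\overline X\sim\mathcal N\left(\mu_X,\frac{\sigma_X^2}{n}\right)$ approximately and $\overline Y\sim\mathcal N\left(\mu_Y,\frac{\sigma_Y^2}{m}\right)$ approximately. The distributions involved are still assumed to be independent, so their corresponding approximated distributions are also independent. Using these approximations, we apply the property of normal distribution and get $\overline X-\overline Y\sim\mathcal N\left(\mu_X-\mu_Y,\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}\right)$ approximately. The result then follows in a similar way as in the previous proof of this result for normal distributions.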


As a result, we know that we can again just use basically the same confidence interval in this case, but of course such confidence interval is approximated.

There are still some confidence intervals that are not considered yet. Let us first consider the confidence interval for mean when the variance is unknown.

Recall that we have mentioned that we can simply replace "$\sigma$" by "$S$" according to the weak law of large numbers, which is quite intuitive. But why can we do this? Consider the following theorem.

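Theorem. (Approximated confidence interval for mean when variance is unknown) Let $X_1,\dots,X_n$ be a random sample from a certain distribution with finite mean $\mu$ and variance $\sigma^2$. When $\sigma^2$ is unknown and the sample size $n$ is large (at least 30), an approximated $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\left[\overline X-z_{\alpha/2}\frac{S}{\sqrt n},\ \overline X+z_{\alpha/2}\frac{S}{\sqrt n}\right].$$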

Remark.

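  • The corresponding interval estimate is $\left[\bar x-z_{\alpha/2}\frac{s}{\sqrt n},\ \bar x+z_{\alpha/2}\frac{s}{\sqrt n}\right]$, with observed values $\bar x$ and $s$.
  • We can also apply this result similarly to construct the confidence interval for $\mu_X-\mu_Y$ (paired samples) when $\sigma_D$ is unknown. We just replace "$\sigma_D$" by "$S_D$" in the confidence interval for the case where $\sigma_D$ is known, which gives an approximated confidence interval.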

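Proof. Under the assumption that the random sample has finite mean and variance, applying the weak law of large numbers gives $S\xrightarrow{p}\sigma$. (We have shown that $S^2\xrightarrow{p}\sigma^2$, and we can then apply the continuous mapping theorem to get this.) Hence, $\frac{\sigma}{S}\xrightarrow{p}1$ ($\sigma>0$) by a property of convergence in probability.

By the central limit theorem, we have $\frac{\overline X-\mu}{\sigma/\sqrt n}\xrightarrow{d}\mathcal N(0,1)$. Thus,
$$\frac{\overline X-\mu}{S/\sqrt n}=\frac{\sigma}{S}\cdot\frac{\overline X-\mu}{\sigma/\sqrt n}\xrightarrow{d}\mathcal N(0,1)$$

by Slutsky's theorem.

Therefore, $\frac{\overline X-\mu}{S/\sqrt n}$ is a pivotal quantity, which approximately follows $\mathcal N(0,1)$. Notice that its approximated distribution, $\mathcal N(0,1)$, is the same as the distribution of the pivotal quantity for the confidence interval for $\mu$ when $\sigma$ is known, namely $\frac{\overline X-\mu}{\sigma/\sqrt n}$. As a result, we can use similar steps to obtain the approximated confidence interval, with "$\sigma$" replaced by "$S$".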
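The theorem only promises an approximation, so it is natural to ask how good the approximation is for a skewed population. The Python sketch below is an illustration with assumed settings, not taken from the original text. It estimates by simulation how often $\bar X\pm z_{0.025}S/\sqrt n$ covers the true mean when the population is exponential with mean 1 and $n=50$.

```python
import math
import random
from statistics import NormalDist

def exponential_coverage(n, trials=10_000, alpha=0.05, seed=1):
    """Estimate the coverage of xbar +/- z_{alpha/2} * s / sqrt(n) when the
    population is Exponential with mean 1 (an assumed, skewed example)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((v - xbar) ** 2 for v in sample) / (n - 1))  # sample s.d.
        half = z * s / math.sqrt(n)
        if xbar - half <= 1.0 <= xbar + half:  # the true mean is 1
            hits += 1
    return hits / trials

print(exponential_coverage(50))
```

For such a skewed population, the estimated coverage typically comes out a little below the nominal 0.95. It moves closer to 0.95 as `n` grows.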

So far, we have not discussed how to construct an approximated confidence interval for $\mu_X-\mu_Y$ (independent samples) when $\sigma_X^2=\sigma_Y^2=\sigma^2$ is unknown. We have also not discussed approximated confidence intervals for variances. The pivotal quantities used in these cases are constructed from results that hold only for normal distributions, so they do not work when the distributions involved are not normal. Therefore, there are no simple ways to perform such constructions.

The following table summarizes the approximated confidence intervals in different cases:

Remark.

  • We will more often use the second row in this table for constructing the approximated confidence intervals, since the variances are often unknown in these cases.

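Example. Let $X_1,\dots,X_n$ be a random sample from the Bernoulli distribution $\operatorname{Ber}(p)$, where the sample size $n$ is large. Suppose the sample mean $\bar x$ and the sample standard deviation $s$ have been observed. Construct an approximated 95% confidence interval for the mean of the Bernoulli distribution, $p$.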

Solution. Notice that the variance of the Bernoulli distribution is $p(1-p)$. Since $p$ is unknown (it is what we want to estimate, so it does not make sense for it to be known), the variance is unknown. Hence, we use the above approximated confidence interval for the mean in the case where the variance is unknown.

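Since $z_{0.025}\approx1.96$, an approximated 95% confidence interval for $p$ is
$$\left[\bar x-1.96\cdot\frac{s}{\sqrt n},\ \bar x+1.96\cdot\frac{s}{\sqrt n}\right].$$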


Exercise. You are given a (fair or unfair) coin, and you want to estimate the probability of heads coming up, denoted by $p$. Define a random variable $X$ such that $X=1$ if heads comes up and $X=0$ otherwise (we assume the coin never lands on its edge). Suppose you toss the coin 100 times independently. Let $X_1,\dots,X_{100}$ be the independent random sample corresponding to these 100 tosses, with the same distribution as the random variable $X$. After tossing the coin 100 times, heads comes up in 68 tosses and tails comes up in 32 tosses. Construct an approximated 90% confidence interval for $p$. What does it suggest about the coin?


Solution

Notice that $X$ follows the Bernoulli distribution $\operatorname{Ber}(p)$. Thus, the mean of the population is $p$, which is what we want to estimate. Also, we know that the population variance $p(1-p)$ is unknown.

Based on the result from the 100 tosses, we know that 68 of $X_1,\dots,X_{100}$ equal one, and 32 of them equal zero. As a result, the sample mean is $\bar x=\frac{68}{100}=0.68$, and the sample standard deviation is $s=\sqrt{\frac{100}{99}(0.68)(0.32)}\approx0.4688$.

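Since $z_{0.05}\approx1.645$, an approximated 90% confidence interval for $p$ is
$$\left[0.68-1.645\cdot\frac{0.4688}{\sqrt{100}},\ 0.68+1.645\cdot\frac{0.4688}{\sqrt{100}}\right]\approx[0.6029,\ 0.7571],$$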

which suggests with 90% confidence that the coin is biased toward heads (since all values in the confidence interval exceed 0.5).
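The same computation can be written as a short Python sketch. It uses `statistics.NormalDist` for $z_{0.05}$ and the divisor-$(n-1)$ sample standard deviation $s=\sqrt{\frac{n}{n-1}\bar x(1-\bar x)}$ for 0/1 data, an identity shown in an example below.

```python
import math
from statistics import NormalDist

# Coin exercise: 68 heads in 100 independent tosses.
n, heads = 100, 68
x_bar = heads / n                                  # sample mean of the 0/1 outcomes
s = math.sqrt(n / (n - 1) * x_bar * (1 - x_bar))   # sample s.d. with divisor n - 1
z = NormalDist().inv_cdf(1 - 0.10 / 2)             # z_{0.05}, about 1.645
half = z * s / math.sqrt(n)
lower, upper = x_bar - half, x_bar + half
print(round(lower, 4), round(upper, 4))
```

The exact quantile $1.64485\ldots$ differs only slightly from the table value 1.645, and the printed bounds agree with $[0.6029,\ 0.7571]$ to four decimal places.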


Let us consider an application of the approximated confidence intervals.

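Proposition. (Confidence interval for probability) Let $X$ be a random variable, and $p=\mathbb P(X\in A)$ where $A$ is a set of real numbers. Define another (Bernoulli) random variable $\Xi=\mathbf 1\{X\in A\}$ [8]. Let $X_1,\dots,X_n$ be a random sample with the same distribution as $X$. Let $\Xi_1,\dots,\Xi_n$ be the corresponding independent random sample, given by $\Xi_i=\mathbf 1\{X_i\in A\}$ for $i=1,\dots,n$ respectively. When the sample size $n$ is large (at least 30), an approximated $100(1-\alpha)\%$ confidence interval for $p$ is
$$\left[\overline\Xi-z_{\alpha/2}\frac{S_\Xi}{\sqrt n},\ \overline\Xi+z_{\alpha/2}\frac{S_\Xi}{\sqrt n}\right]$$

where $\overline\Xi$ and $S_\Xi$ are the sample mean and the sample standard deviation of $\Xi_1,\dots,\Xi_n$ respectively.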

Remark.

  • We may regard the event $\{X\in A\}$ as a "success". Then, $p=\mathbb P(X\in A)$ is the probability of "success", and $\overline\Xi$ is the relative frequency of "success" in the sample (i.e. the ratio of the number of "successes" to the sample size).
  • The reason for $\overline\Xi$ to be the relative frequency is as follows. For each $i$, the value of $\Xi_i$ is one when the $i$th outcome is a "success" (and zero otherwise). Hence, the sample sum $\sum_{i=1}^n\Xi_i$ gives the number of "success" outcomes.
  • Often, the probability involved is interpreted as a proportion for a large population. For example, the proportion of the people labelled with "success" in a large population.

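Proof. Since $\Xi=\mathbf 1\{X\in A\}$, by the fundamental bridge between probability and expectation, we have
$$\mathbb E[\Xi]=\mathbb P(X\in A)=p.$$

Now apply the result for constructing an approximated confidence interval for the mean when the variance is unknown. Here, the variance of $\Xi$ is $p(1-p)$, which is unknown, since $\Xi$ actually follows the Bernoulli distribution $\operatorname{Ber}(p)$. This gives an approximated $100(1-\alpha)\%$ confidence interval for $p=\mathbb E[\Xi]$:
$$\left[\overline\Xi-z_{\alpha/2}\frac{S_\Xi}{\sqrt n},\ \overline\Xi+z_{\alpha/2}\frac{S_\Xi}{\sqrt n}\right].$$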

Example. Consider the above proposition. Show that the sample variance $S_\Xi^2=\frac{n}{n-1}\overline\Xi(1-\overline\Xi)$.

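Proof. We will use the result that $S^2=\frac{1}{n-1}\left(\sum_{i=1}^nX_i^2-n\overline X^2\right)$. First, we have
$$S_\Xi^2=\frac{1}{n-1}\left(\sum_{i=1}^n\Xi_i^2-n\overline\Xi^2\right).$$

For each $i$, we have $\Xi_i^2=\Xi_i$, since

  • case 1: $\Xi_i=0$. Then, $\Xi_i^2=0=\Xi_i$.
  • case 2: $\Xi_i=1$. Then, $\Xi_i^2=1=\Xi_i$.

Hence,
$$\sum_{i=1}^n\Xi_i^2=\sum_{i=1}^n\Xi_i=n\overline\Xi.$$

It follows that
$$S_\Xi^2=\frac{1}{n-1}\left(n\overline\Xi-n\overline\Xi^2\right)=\frac{n}{n-1}\overline\Xi(1-\overline\Xi).$$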
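The identity can also be checked exactly in Python with `fractions.Fraction`. The `statistics.variance` function accepts `Fraction` data and then returns a `Fraction`, so no rounding is involved. The 0/1 sample below is arbitrary, and `check_identity` is an illustrative helper name.

```python
from fractions import Fraction
import statistics

def check_identity(xi):
    """Return (S^2 with divisor n - 1, n/(n-1) * xibar * (1 - xibar)) for 0/1 data."""
    n = len(xi)
    data = [Fraction(v) for v in xi]
    direct = statistics.variance(data)           # sample variance, divisor n - 1
    xibar = sum(data) / n                        # relative frequency of "success"
    formula = Fraction(n, n - 1) * xibar * (1 - xibar)
    return direct, formula

direct, formula = check_identity([1, 0, 0, 1, 1, 0, 1])
print(direct, formula)  # both are 2/7
```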


Example. A box contains an unknown and large number of balls, some of which are red. To estimate the proportion of red balls in the box, we draw a single ball from the box and then put it back into the box, and we repeat this 100 times. Suppose we get 7 red balls in these 100 draws. Construct an approximated 95% confidence interval for the proportion of red balls in the box.

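Solution. From the given result, we have $\bar\xi=\frac{7}{100}=0.07$, and thus $s_\Xi=\sqrt{\frac{100}{99}(0.07)(0.93)}\approx0.2564$. Since $z_{0.025}\approx1.96$, an approximated 95% confidence interval for the proportion is
$$\left[0.07-1.96\cdot\frac{0.2564}{\sqrt{100}},\ 0.07+1.96\cdot\frac{0.2564}{\sqrt{100}}\right]\approx[0.0197,\ 0.1203].$$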


Exercise. Suppose we repeat the drawing process 10000 times, and we get 700 red balls in these 10000 draws. Construct an approximated 95% confidence interval for the proportion of the red balls in the box.

Solution

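Notice that we also have $\bar\xi=0.07$ and $z_{0.025}\approx1.96$ in this case, while $s_\Xi=\sqrt{\frac{10000}{9999}(0.07)(0.93)}\approx0.2552$. Hence, an approximated 95% confidence interval for the proportion is
$$\left[0.07-1.96\cdot\frac{0.2552}{\sqrt{10000}},\ 0.07+1.96\cdot\frac{0.2552}{\sqrt{10000}}\right]\approx[0.0650,\ 0.0750].$$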
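Comparing the two cases numerically shows how the width shrinks with the number of draws. In this Python sketch (the same normal-approximation interval, standard library only), the interval for 10000 draws is about one tenth as wide as the one for 100 draws. This is because the width is proportional to $1/\sqrt n$ when $\bar\xi$ is unchanged. The helper name `proportion_ci` is an illustrative choice.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Approximate 100(1-alpha)% CI for a proportion (normal approximation)."""
    p_hat = successes / n
    s = math.sqrt(n / (n - 1) * p_hat * (1 - p_hat))  # sample s.d., divisor n - 1
    half = NormalDist().inv_cdf(1 - alpha / 2) * s / math.sqrt(n)
    return p_hat - half, p_hat + half

small = proportion_ci(7, 100)
large = proportion_ci(700, 10_000)
width_ratio = (small[1] - small[0]) / (large[1] - large[0])
print(small, large, round(width_ratio, 3))
```

The ratio is slightly above 10 (it equals $10\sqrt{1.01}$) only because of the $\frac{n}{n-1}$ factor in the sample variance.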



  1. In more complicated cases, the coverage probability may vary with $\theta$, i.e. it is a non-constant function of $\theta$.
  2. No, since if this is the case, then
  3. Usually, we choose and such that and because of convenience (if the pdf of is symmetric about , then we know that ).
  4. Although this assumption may not make sense (clearly the time spent cannot be negative, while the support of the normal distribution is $(-\infty,\infty)$), we use it for illustration purposes. Nevertheless, if the mean of the normal distribution is "positive enough", then the probability of getting negative values is very low, and close to 0 anyway. Also, as we can see in the last section, no matter what the underlying distribution is, we can use the central limit theorem to construct an approximated confidence interval, provided that the sample size is large enough.
  5. We need to do this since the chi-squared distribution is not symmetric. Graphically, it looks like
    |         area: 1-a
    |     #    |
    |   #....# v
    |  # .......#   
    | # |..........#
    |#  |..........|  #
    *---*----------*------
    chi^2 1-a/2  chi^2 a/2
    
  6. Of course, this "cutoff" value is somewhat subjective, and different people may have different opinions about it. But in the examples here, the conditions imposed on $\alpha$ will be quite "extreme", so that the decision is clear.
  7. For (2) to hold, $\alpha$ can take a wide range of values, since the value of $f_{\alpha/2,m-1,n-1}$ remains much greater than 0.125 even for fairly large $\alpha$.
  8. $\Xi$ is the Greek letter Xi, which may be regarded as "the Greek letter corresponding to x".