# Statistics/Interval Estimation

## Introduction

Previously, we have discussed *point estimation*, which gives us an estimator $\hat\theta$ for the value of an unknown parameter $\theta$.
Now, suppose we want to know the size of the *error* of the point estimator $\hat\theta$,
i.e. the difference between $\hat\theta$ and the unknown parameter $\theta$.
Of course, we can make use of the value of the *mean squared error* of $\hat\theta$, $\operatorname{MSE}(\hat\theta)=\mathbb E[(\hat\theta-\theta)^2]$,
or other things.

However, what if we only know one specific *point estimate*?
We cannot calculate the mean squared error of its corresponding point estimator with *just* this point estimate, right?
So, how do we know the possible size of error of this *point estimate*? Indeed, it is impossible to tell, since we are only given a particular estimated value of the parameter $\theta$,
but of course we do *not* know the value of the unknown parameter $\theta$, so the difference between this point estimate and $\theta$ is also unknown.

To illustrate this, consider the following example: suppose we take a random sample of 10 students from one particular university course to estimate the mean final exam score of the students in that course, denoted by $\mu$ (assume the score is normally distributed), and the observed value of the sample mean is $\bar x=60$. Then, what is the difference between this point estimate and the true unknown parameter $\mu$? Can we be "confident" that this sample mean is close to $\mu$, say $|\bar x-\mu|\le1$?

It is possible that $\mu$ is, say, 90, and somehow the students in the sample are the ones with very poor performance. On the other hand, it is also possible that $\mu$ is, say, 30, and somehow the students in the sample are the ones who performed (relatively) well. Of course, it is also possible that $\mu$ is quite close to 60, say 59. From this example, we can see that a particular value of $\bar x$ does not tell us the possible size of error: the error can be very large, or very small.

In this chapter, we will introduce *interval estimation*, where we use an *interval estimator* that describes the size of error by providing the probability that the random interval (i.e. an interval with at least one of its bounds being a random variable) given by the interval estimator contains the unknown parameter $\theta$. This probability measures the "accuracy" of the interval estimator of $\theta$, and hence the size of error.

As suggested by the name *interval estimator*, the estimator involves some sort of *intervals*.
Also, as one may expect, *interval estimation* is also based on *statistics*:

**Definition.**
(Interval estimation)
*Interval estimation* is a process of using the value of a *statistic* to estimate an *interval* of plausible values of an unknown parameter.

Of course, we would like the probability for the unknown parameter $\theta$ to lie in the interval to be close to 1, so that the interval estimator is very accurate. However, a very accurate interval estimator may have very poor "precision", i.e. the interval covers "too many" plausible values of the unknown parameter, and therefore even if we know that $\theta$ is very likely to be one of these values, there are too many different possibilities. Hence, such an interval estimator is not very "useful". To illustrate this, suppose $\theta$ can be any real number and the interval concerned is $(-\infty,\infty)$, which is the parameter space of $\theta$. Then, of course $\mathbb P(\theta\in(-\infty,\infty))=1$ (so the "confidence" is high), since $\theta$ must lie in its parameter space. However, such an interval has essentially "zero precision" and is quite "useless", since the "plausible values" of $\theta$ in the interval are all possible values of $\theta$.

From this, we can see the need for "precision" of the interval, that is, we also want the *width* of the interval to be small, so that we have some idea about the "location" of $\theta$.
However, as the interval becomes narrower, it is more likely that the interval misses $\theta$, i.e. does not cover the actual value of $\theta$, and therefore the probability for $\theta$ to lie in the interval becomes smaller, i.e. the interval becomes less "accurate".
To illustrate this, let us consider the extreme case: the interval is so narrow that it contains a single point (the two end-points of the interval coincide).
Then, the "interval estimator" essentially becomes a "point estimator" $\hat\theta$, and we know that it is very unlikely that the true value of $\theta$ equals the value of the point estimator ($\theta$ lying in this "interval" is equivalent to $\theta=\hat\theta$ in this case). Indeed, if the distribution of $\hat\theta$ is *continuous*, then $\mathbb P(\hat\theta=\theta)=0$.

As we can see from above, although we want the interval to have very high "confidence" and also to be "very precise" (i.e. very narrow), we cannot have both, since an increase in confidence causes a decrease in "precision", and an increase in "precision" causes a decrease in confidence.
Therefore, we need to make a compromise between them, and pick an interval that gives a sufficiently high confidence and is also quite precise.
In other words, we would like to have a *narrow* interval that covers $\theta$ with a *large probability*.

## Terminologies

Now, let us formally define some terminologies related to *interval estimation*.

**Definition.**
(Interval estimator)
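Let $X_1,\dotsc,X_n$ be a random sample.
An *interval estimator* of an unknown parameter $\theta$ is a *random interval* $[L(X_1,\dotsc,X_n),U(X_1,\dotsc,X_n)]$ (written $[L,U]$ for short) where
$L(X_1,\dotsc,X_n)$ and $U(X_1,\dotsc,X_n)$ are two statistics such that $L(X_1,\dotsc,X_n)\le U(X_1,\dotsc,X_n)$ always.

**Remark.**

- We call the interval $[L,U]$ a *random* interval since both endpoints $L$ and $U$ are random variables.
- The interval involved may also be an open interval ($(L,U)$), a half-open and half-closed interval ($[L,U)$ or $(L,U]$), or a one-sided interval ($(-\infty,U]$ or $[L,\infty)$) (we may take $L=-\infty$ and $U=\infty$ in the extended real number sense).
- When we observe that $X_1=x_1,\dotsc,X_n=x_n$, we call $[L(x_1,\dotsc,x_n),U(x_1,\dotsc,x_n)]$ the *interval estimate* of $\theta$, denoted by $[l,u]$ ($l$ and $u$ are no longer random).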

**Definition.**
(Coverage probability)
The *coverage probability* of an *interval estimator* $[L,U]$ is $\mathbb P(\theta\in[L,U])$.

**Example.**
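Let $X_1,X_2,X_3,X_4$ be a random sample from the normal distribution $\mathcal N(\mu,1)$.
Consider an interval estimator of $\mu$: $[\bar X-1,\bar X+1]$.

(a) Calculate the probability $\mathbb P(\bar X=\mu)$.

(b) Calculate the coverage probability $\mathbb P(\mu\in[\bar X-1,\bar X+1])$.

*Solution*:

(a) Since the distribution of $\bar X$ is continuous, $\mathbb P(\bar X=\mu)=0$.

(b) Since $\bar X\sim\mathcal N(\mu,\tfrac14)$, we have $Z=\frac{\bar X-\mu}{1/2}\sim\mathcal N(0,1)$, so the coverage probability is
$$\mathbb P(\mu\in[\bar X-1,\bar X+1])=\mathbb P(-1\le\bar X-\mu\le1)=\mathbb P(-2\le Z\le2)=2\Phi(2)-1\approx0.9545,$$
where $\Phi$ is the cdf of $\mathcal N(0,1)$.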
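The coverage probability in (b) can be checked numerically. The following is a minimal sketch that assumes a sample of size $n=4$ from $\mathcal N(\mu,1)$ and the interval $[\bar X-1,\bar X+1]$, which is what the value 0.9545 quoted for this example implies. It uses only Python's standard library (`statistics.NormalDist`). The helper name `exact_coverage` is our own.

```python
import math
from statistics import NormalDist

def exact_coverage(half_width: float, sigma: float, n: int) -> float:
    """P(mu in [Xbar - h, Xbar + h]) when Xbar ~ N(mu, sigma^2 / n)."""
    se = sigma / math.sqrt(n)            # standard deviation of Xbar
    k = half_width / se                  # the event is -k <= Z <= k
    return 2 * NormalDist().cdf(k) - 1   # P(-k <= Z <= k) = 2*Phi(k) - 1

print(round(exact_coverage(1.0, 1.0, 4), 4))  # 0.9545
```

Note that $\mu$ never enters the computation. This reflects the fact that the coverage probability of this interval estimator does not depend on $\mu$.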

**Exercise.**

(a) Guess whether the coverage probability is greater than .

(b) Calculate to see whether your guess in (a) is correct or not.

(c) (construction of interval estimator) Find such that (Hint: where ).

(d) Suppose it is observed that . Find the interval estimate of the given interval estimator .

(e) Suppose the actual parameter is 1.2. Does lie in the interval estimate in (d)?

(a) Intuitively, one should guess that this is true.

(b)

(c) Such is .

**Proof.**

(d) Under this observation, . Hence, the interval estimate is .

(e) Since , lies in the interval estimate .

**Definition.**
(Confidence coefficient)
For an *interval estimator* $[L,U]$ of $\theta$, the confidence coefficient of $[L,U]$, denoted by $1-\alpha$, is the *infimum* of the (set of) *coverage probabilities* over all $\theta$ in the parameter space $\Theta$: $1-\alpha=\inf_{\theta\in\Theta}\mathbb P(\theta\in[L,U])$.

**Remark.**

- *Infimum* means the greatest lower bound (it is the same as the minimum under some conditions). Thus $1-\alpha$ is the greatest lower bound of the coverage probabilities over all $\theta\in\Theta$. Intuitively, this means the *confidence coefficient* is chosen *conservatively*: when some $\theta$ makes the coverage probability low, the confidence coefficient decreases accordingly.
- In simple cases, the value of the coverage probability does not depend on the choice of $\theta$ (i.e. $\mathbb P(\theta\in[L,U])$ is a constant function of $\theta$)^{[1]}. Hence, the confidence coefficient is $1-\alpha=\mathbb P(\theta\in[L,U])$. Unless otherwise specified, you can assume this is true in the following.
- The reason for choosing the notation "$1-\alpha$" is related to *hypothesis testing*, where "$\alpha$" has a special meaning.
  - As we shall see in the next chapter, there is a close relationship between *confidence intervals* and *hypothesis testing*, in the sense that one of them can be constructed by using the other.
- An interval estimator with a measure of "confidence" is called a *confidence interval*. In this case, the confidence coefficient is the measure of confidence. Hence, an interval estimator with confidence coefficient $1-\alpha$ is a confidence interval, or more specifically a $1-\alpha$ confidence interval (usually $1-\alpha$ is expressed as a percentage).

**Example.**
(Interpretation of confidence coefficient)
Consider an interval estimator of an unknown parameter $\theta$: $[L,U]$. Suppose its confidence coefficient is $1-\alpha$.

- Student A's claim: since the confidence coefficient is $1-\alpha$, the coverage probability $\mathbb P(\theta\in[L,U])=1-\alpha$. It follows that the probability for $\theta$ to lie in the interval estimate $[l,u]$ obtained in an experiment is also $1-\alpha$.
- Student B's claim: an interval estimate $[l,u]$ coming from an experiment either contains $\theta$ or does not contain $\theta$. In the former case, the probability that $\theta$ lies in $[l,u]$ is 1, and in the latter case, it is 0. Hence, student A's claim is wrong.
- Student C's claim: when we perform a large number of experiments, we expect the interval estimate in $100(1-\alpha)\%$ of them to contain $\theta$, and the interval estimate in the other $100\alpha\%$ of them not to contain $\theta$.

Comment on each claim.

*Solution*:

Student B's claim is correct, since in a single experiment, the interval estimate $[l,u]$ is already determined (and thus fixed). Also, the unknown parameter $\theta$ is fixed (the population distribution is given). This means that whether $\theta$ lies in the fixed interval estimate $[l,u]$ is not a random event. Instead, it is already determined by the fixed $\theta$ and $[l,u]$.

Student A's claim is wrong, since student B's claim is correct. It may be easier to see why it is wrong if we rephrase the claim a little: "the probability for the *fixed* $\theta$ to lie in the *fixed* interval estimate $[l,u]$ is $1-\alpha$."
This is incorrect since the event involved is not even random!
To see this more clearly, consider what happens if we "hypothetically" repeat this particular experiment with *fixed* $\theta$ and *fixed* interval estimate $[l,u]$ many times. The "outcome" is the same in every experiment: either $\theta$ lies in $[l,u]$ in *all* experiments, or $\theta$ does not lie in $[l,u]$ in *all* experiments. Then, by the definition of frequentist probability, the probability is either 1 (former case) or 0 (latter case).

We may modify student A's claim to make it correct: the probability for $\theta$ to lie in the interval estimator $[L,U]$ is $1-\alpha$. This can be interpreted as: the probability for $\theta$ to lie in an interval estimate calculated from a *future*, *not yet realized* sample (NOT a realized sample, which is a *past* sample) is $1-\alpha$.

Student C's claim is also correct, since we can interpret the probability $1-\alpha$ from the frequentist point of view, i.e. as the "long-run" proportion of interval estimates (in each trial, an interval estimate $[l,u]$ is observed from the interval estimator $[L,U]$) that contain the true parameter $\theta$.
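Student C's frequentist reading can be illustrated by simulation. The sketch below repeats a hypothetical experiment many times and records, for each realized interval estimate, whether it contains the "true" parameter. As the interval estimator, it uses the known-variance normal interval $\bar X\pm z_{\alpha/2}\sigma/\sqrt n$, which is constructed later in this chapter. The values $\mu=10$, $\sigma=2$, $n=5$ and the function name are arbitrary, illustrative choices.

```python
import math
import random
from statistics import NormalDist

def realized_intervals(mu, sigma, n, alpha, reps, seed=0):
    """Generate one realized interval estimate (lo, hi) per simulated experiment,
    using Xbar -/+ z_{alpha/2} * sigma / sqrt(n) with sigma known."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * sigma / math.sqrt(n)
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        yield (xbar - half, xbar + half)

mu = 10.0  # the "true" parameter; known here only because we simulate
outcomes = [lo <= mu <= hi for lo, hi in realized_intervals(mu, 2.0, 5, 0.05, 20_000)]

# Each realized interval either contains mu or it does not (True or False).
print(outcomes[:5])
# The long-run proportion of intervals containing mu is close to 1 - alpha = 0.95.
print(sum(outcomes) / len(outcomes))
```

Each individual entry of `outcomes` is a fixed `True`/`False`, as in student B's claim. Only the long-run proportion is close to $1-\alpha$, as in student C's claim.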

**Remark.**

- We may say that we "feel confident" that $\theta$ lies in an interval estimate $[l,u]$ from an experiment, corresponding to a $1-\alpha$ confidence interval.
- To understand this, we may refer to student C's claim above. When we think about how "confident" we are about the statement that $\theta$ lies in $[l,u]$, we may consider this:
  - we "hypothetically" repeat the generation of interval estimates many times, and we expect that $100(1-\alpha)\%$ of them contain $\theta$.
  - Then, it is natural to "feel" confident that the interval estimate $[l,u]$ contains $\theta$ based on these hypothetical experiments.
- Alternatively, as suggested above, it is *correct* to say that the probability for $\theta$ to lie in an interval estimate calculated from a *future* and *not yet realized* sample is $1-\alpha$.
  - Hence, the probability $1-\alpha$ measures the "reliability" of the *estimation procedure* and *method* (the higher the probability, the higher the reliability).
  - Therefore, it is natural to feel confident that the interval estimate $[l,u]$ contains $\theta$ based on this reliability.
- We may regard "we feel $100(1-\alpha)\%$ confident that $\theta$ lies in the interval estimate $[l,u]$" as an intuitive, alternative expression of "the interval estimate $[l,u]$ is a $100(1-\alpha)\%$ confidence interval".

**Example.**
Continuing from the previous example about the normal distribution $\mathcal N(\mu,1)$: the confidence coefficient of the interval estimator $[\bar X-1,\bar X+1]$ of $\mu$ is 0.9545, or approximately 95%.
Hence, such an interval may be called a 95% confidence interval.

**Exercise.**
Consider a continuous distribution with an unknown real-valued parameter $\theta$, and a random sample $X_1,\dotsc,X_n$ drawn from it.
Suppose $L=L(X_1,\dotsc,X_n)$ and $U=U(X_1,\dotsc,X_n)$, where $L$ and $U$ are statistics of $X_1,\dotsc,X_n$ such that $L\le U$ always (Can $L=U$? ^{[2]}) ($\Theta$ is the parameter space of $\theta$).

Can you suggest (i) a *0% confidence interval*; (ii) a *100% confidence interval*?

(i) Since the distribution is continuous, one may take $[L,L]$, for example, as the 0% confidence interval, since $\mathbb P(\theta\in[L,L])=\mathbb P(L=\theta)=0$.

(ii) One may take $\Theta$ itself, which is the parameter space of $\theta$, as the 100% confidence interval. This is because $\mathbb P(\theta\in\Theta)=1$. (In general, the parameter space of an unknown parameter is always a 100% confidence interval for that parameter.)

## Construction of confidence intervals

After understanding what a confidence interval is, we would like to know how to construct one naturally.
A main way to construct one is to use a *pivotal quantity*, which is defined below.

**Definition.**
(Pivotal quantity)
A random variable $Q=Q(X_1,\dotsc,X_n,\theta)$ is a *pivotal quantity* (of $\theta$) (which is a function of the random sample $X_1,\dotsc,X_n$ and the unknown parameter (vector) $\theta$) if
the distribution of $Q$ is independent of the parameter (vector) $\theta$,
that is, the distribution is the same for each value of $\theta$.

**Remark.**

- A pivotal quantity may *not* be a statistic, since a *statistic* is only a function of the random sample (but not of the unknown parameter(s)), while a *pivotal quantity* is a function of the random sample *and the unknown parameter (vector)*.
- If the expression of a pivotal quantity does *not* involve $\theta$, such a pivotal quantity is a statistic, and is called an *ancillary statistic*.
- Here, we focus on pivotal quantities with expressions *involving* $\theta$, so that we can use them to construct confidence intervals for $\theta$.

After having such a pivotal quantity $Q$, we can construct a $1-\alpha$ confidence interval for $\theta$ by the following steps:

- For the given value of $1-\alpha$, find $a$ and $b$ such that $\mathbb P(a\le Q\le b)=1-\alpha$^{[3]} ($a$ and $b$ do not involve $\theta$ since $Q$ is a pivotal quantity).
- After that, we can transform $a\le Q\le b$ into $L\le\theta\le U$, since the expression of $Q$ involves $\theta$, as we have assumed (the resulting inequalities should be *equivalent* to the original inequalities, that is, $a\le Q\le b\iff L\le\theta\le U$, so that $\mathbb P(L\le\theta\le U)=\mathbb P(a\le Q\le b)=1-\alpha$).

**Example.**
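Consider a random sample $X_1,\dotsc,X_n$ from the normal distribution $\mathcal N(\mu,\sigma^2)$ with *unknown* mean $\mu$ and *known* variance $\sigma^2$. Find a pivotal quantity (of $\mu$).

*Solution*:
By the property of the normal distribution, $\frac{\bar X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$.
Since $\mathcal N(0,1)$ is independent of the unknown parameter $\mu$, $\frac{\bar X-\mu}{\sigma/\sqrt n}$ is a pivotal quantity.

Alternatively, $\bar X-\mu$ is also a pivotal quantity, since $\bar X-\mu\sim\mathcal N(0,\sigma^2/n)$ is independent of $\mu$ (both $\sigma^2$ and $n$ are known, so the variance of this distribution is known).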

**Exercise.**

(a) Is $\bar X$ a pivotal quantity?

(b) Is a pivotal quantity?

(a) No, since $\bar X\sim\mathcal N(\mu,\sigma^2/n)$, and this distribution depends on $\mu$.

(b) Yes, since , and this distribution is independent of .

**Exercise.**
Consider a random sample from normal distribution with *unknown* mean and variance . Apart from , suggest a pivotal quantity of .

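A pivotal quantity is $\frac{\bar X-\mu}{S/\sqrt n}$, where $S$ is the sample standard deviation, since $\frac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}$, and this distribution is independent of both $\mu$ and $\sigma^2$.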

**Example.**
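Consider a random sample $X_1,\dotsc,X_n$ from the exponential distribution $\operatorname{Exp}(\lambda)$ (with rate $\lambda$, i.e. mean $1/\lambda$).
Find a pivotal quantity.
(Hint: $\sum_{i=1}^nX_i\sim\operatorname{Gamma}(n,\lambda)$, and if $Y\sim\operatorname{Gamma}(k,\lambda)$, then $cY\sim\operatorname{Gamma}(k,\lambda/c)$ for $c>0$.)

*Solution*:
A pivotal quantity is $\lambda\sum_{i=1}^nX_i$, since $\lambda\sum_{i=1}^nX_i\sim\operatorname{Gamma}(n,1)$, and this distribution is independent of $\lambda$.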
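A quick simulation can illustrate why $\lambda\sum_{i=1}^nX_i$ is pivotal. Its distribution, $\operatorname{Gamma}(n,1)$ with mean and variance both equal to $n$, should look the same whatever $\lambda$ is. The sketch assumes the rate parameterization (mean $1/\lambda$), which matches Python's `random.expovariate(lambd)`. The values of $\lambda$ and $n$ and the function name `simulate_pivot` are arbitrary choices.

```python
import random
import statistics

def simulate_pivot(lam, n, reps=20_000, seed=1):
    """Draw reps values of lam * (X_1 + ... + X_n) with X_i ~ Exp(rate=lam)."""
    rng = random.Random(seed)
    return [lam * sum(rng.expovariate(lam) for _ in range(n)) for _ in range(reps)]

for lam in (0.5, 4.0):
    q = simulate_pivot(lam, n=3)
    # Gamma(3, 1) has mean 3 and variance 3, regardless of lam
    print(lam, round(statistics.mean(q), 2), round(statistics.variance(q), 2))
```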

**Example.**
(A pivotal quantity for general distributions)
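Consider a distribution with unknown parameter (vector) $\theta$, whose cdf $F(x;\theta)$ is bijective (so that the inverse $F^{-1}(\cdot;\theta)$ exists).

(a) Prove that $F(X;\theta)\sim\mathcal U[0,1]$, where $X$ follows this distribution.

(b) Suppose a random sample $X_1,\dotsc,X_n$ is taken from that distribution. Suggest a pivotal quantity.

*Solution*:

**Proof.**
(a) Let $Y=F(X;\theta)$, and let $G$ be the cdf of $Y$.
Then, for $0\le y\le1$, since $F(\cdot;\theta)$ is increasing, $G(y)=\mathbb P(F(X;\theta)\le y)=\mathbb P(X\le F^{-1}(y;\theta))=F(F^{-1}(y;\theta);\theta)=y$.
Differentiating the cdf gives $g(y)=1$.
This means that the pdf of $Y$ is 1. Also, we know that the support of $Y$ is $[0,1]$ since $F(X;\theta)$ is essentially a probability. Hence, we have $Y\sim\mathcal U[0,1]$.

(b) From (a), we know that $F(X_1;\theta)\sim\mathcal U[0,1]$ (the cdf involves the parameter (vector) $\theta$), and this distribution is clearly independent of $\theta$. Hence, a pivotal quantity is $F(X_1;\theta)$ (or $F(X_i;\theta)$ for any $i$, which works equally well since each $X_i$ is taken from the distribution with cdf $F(x;\theta)$).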
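The result in (a) (often called the probability integral transform) can be checked empirically. The sketch below draws from a normal distribution whose parameters are arbitrary choices, applies its own cdf, and counts how the transformed values fall into the four quarters of $[0,1]$. It relies on `statistics.NormalDist.samples` and `NormalDist.cdf` from Python's standard library. The helper name `pit_values` is our own.

```python
from statistics import NormalDist

def pit_values(dist, reps=20_000, seed=2):
    """Return F(X_1), ..., F(X_reps) where X_i ~ dist and F is the cdf of dist."""
    return [dist.cdf(x) for x in dist.samples(reps, seed=seed)]

u = pit_values(NormalDist(mu=5.0, sigma=3.0))
# If F(X) ~ U[0, 1], roughly a quarter of the values fall in each quarter of [0, 1].
for a, b in [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]:
    print((a, b), sum(a <= v < b for v in u) / len(u))
```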

**Exercise.**
Suppose a single observation $X$ is taken from the exponential distribution $\operatorname{Exp}(\lambda)$. Find a pivotal quantity using the above method.

Since the cdf of $X$ is $F(x;\lambda)=1-e^{-\lambda x}$ for $x\ge0$, as suggested above, a pivotal quantity is $1-e^{-\lambda X}$, which follows the uniform distribution $\mathcal U[0,1]$.
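This pivotal quantity can be pushed through the two construction steps for pivotal quantities. Setting $\mathbb P(\alpha/2\le 1-e^{-\lambda X}\le 1-\alpha/2)=1-\alpha$ and solving for $\lambda$ gives the equivalent event $-\ln(1-\alpha/2)/X\le\lambda\le-\ln(\alpha/2)/X$. The following is a minimal sketch of that interval, with a simulated check of its coverage. The choice of equal tail areas $\alpha/2$, the true rate $0.7$ used in the simulation, and the function names are assumptions made for illustration.

```python
import math
import random

def exp_rate_interval(x, alpha):
    """1 - alpha confidence interval for lambda from one observation x of Exp(lambda),
    obtained by inverting alpha/2 <= 1 - exp(-lambda * x) <= 1 - alpha/2."""
    return (-math.log(1 - alpha / 2) / x, -math.log(alpha / 2) / x)

def simulated_coverage(lam, alpha, reps=20_000, seed=3):
    """Fraction of simulated single-observation intervals that contain lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = exp_rate_interval(rng.expovariate(lam), alpha)
        hits += lo <= lam <= hi
    return hits / reps

print(exp_rate_interval(2.0, 0.05))
print(simulated_coverage(lam=0.7, alpha=0.05))  # close to 0.95
```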

## Confidence intervals for means of normal distributions

In the following, we will use the concept of pivotal quantity to construct confidence intervals for means and variances of *normal* distributions.
After that, by the central limit theorem, we can construct *approximate* confidence intervals for means and variances of other types of distributions that are not normal.

### Mean of a normal distribution

Before discussing this confidence interval, let us first introduce a notation:

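- $z_\alpha$ is the upper percentile of $\mathcal N(0,1)$ at level $\alpha$, i.e. it satisfies $\mathbb P(Z\ge z_\alpha)=\alpha$ where $Z\sim\mathcal N(0,1)$.

We can find (or calculate) the values of $z_\alpha$ for different $\alpha$ from the *standard normal table*.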

**Theorem.**
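(Confidence interval of $\mu$ when $\sigma^2$ is known)
Let $X_1,\dotsc,X_n$ be a random sample from $\mathcal N(\mu,\sigma^2)$. When $\sigma^2$ is known, a $1-\alpha$ confidence interval for $\mu$ is
$$\left[\bar X-z_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar X+z_{\alpha/2}\frac{\sigma}{\sqrt n}\right].$$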

**Remark.**

- By the definition of interval estimate, the corresponding interval estimate of $\mu$ is $\left[\bar x-z_{\alpha/2}\frac{\sigma}{\sqrt n},\ \bar x+z_{\alpha/2}\frac{\sigma}{\sqrt n}\right]$, with observed value $\bar x$ of $\bar X$. For simplicity, we usually also call such an interval estimate a $1-\alpha$ confidence interval.
  - We can tell the intended meaning of "confidence interval" from the context.
  - Usually, when the realization of the random sample is given, "confidence interval" refers to the interval *estimate* (since the interval estimate is more "useful" and "suggestive" in this context).
- Unless otherwise specified, the confidence intervals referred to are constructed according to this theorem (if applicable).

**Proof.**
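Let $Z=\frac{\bar X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$. Since $Z$ is a pivotal quantity (its distribution is independent of $\mu$), we set
$$1-\alpha=\mathbb P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=\mathbb P\left(-z_{\alpha/2}\le\frac{\bar X-\mu}{\sigma/\sqrt n}\le z_{\alpha/2}\right)=\mathbb P\left(\bar X-z_{\alpha/2}\frac{\sigma}{\sqrt n}\le\mu\le\bar X+z_{\alpha/2}\frac{\sigma}{\sqrt n}\right).$$

The following graph illustrates $\mathbb P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=1-\alpha$:

*Figure: the standard normal pdf; the central region between $-z_{\alpha/2}$ and $z_{\alpha/2}$ has area $1-\alpha$, and each of the two tails has area $\alpha/2$.*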
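The interval estimate from this theorem is straightforward to compute. The following is a minimal sketch that uses Python's `statistics.NormalDist().inv_cdf` to obtain $z_{\alpha/2}$ in place of the standard normal table. The function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Interval estimate xbar -/+ z_{alpha/2} * sigma / sqrt(n), with alpha = 1 - conf."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: P(Z >= z) = alpha/2
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

# Hypothetical numbers: observed mean 60, known sigma 10, sample size 25.
print(z_interval(60.0, 10.0, 25))  # about (56.08, 63.92)
```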

**Example.**
Consider a random sample from .
Suppose it is observed that .

Construct a 95% confidence interval for .

*Solution*:
Since , and (from standard normal table, we know that where ),
it follows that a 95% confidence interval for is .

**Exercise.**

(a) Construct a 99% confidence interval for .

(b) Construct a 90% confidence interval for .

(c) (alternative way of constructing confidence interval) Using a similar argument as in the proof of the previous theorem, another confidence interval for is since . Construct another 95% confidence interval for by this method.

(d) Is the width of the confidence interval (i.e. its upper bound minus its lower bound) constructed in (c) the same as that constructed in the example?

(a) Since (from standard normal table), a 99% confidence interval for is .

(b) Since (from standard normal table), a 90% confidence interval for is .

(c) Since and from standard normal table, another 95% confidence interval for is

(d) The width of the confidence interval in the example is 1.753 (approximately), while the width of the confidence interval in (c) is 1.825 (approximately). Hence, their widths are *different*.

**Remark.**

- As we can see, when the confidence coefficient is higher, the corresponding confidence interval becomes wider.
- This matches with our previous discussion.
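The trade-off can be made concrete by tabulating $z_{\alpha/2}$ and the resulting width $2z_{\alpha/2}\sigma/\sqrt n$ for a few confidence coefficients. The sketch below reports the width in units of $\sigma/\sqrt n$ so that no particular data are assumed. The helper name `z_upper` is our own.

```python
from statistics import NormalDist

def z_upper(alpha):
    """z_alpha: the value with P(Z >= z_alpha) = alpha for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)

for conf in (0.90, 0.95, 0.99):
    z = z_upper((1 - conf) / 2)
    # Width of [Xbar - z*sigma/sqrt(n), Xbar + z*sigma/sqrt(n)], in units of sigma/sqrt(n)
    print(f"{conf:.0%}: z = {z:.3f}, width = {2 * z:.3f} x sigma/sqrt(n)")
```

The printed critical values (about 1.645, 1.960 and 2.576) are the familiar standard normal table entries, and the width grows with the confidence coefficient.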

**Example.**
An undergraduate student John wants to estimate the average daily time spent playing computer games by all teenagers aged 14-16 in the previous week.
Clearly, it is infeasible to ask *all* such teenagers about their time spent.
Therefore, John decides to take a random sample of 10 teenagers from the population (all teenagers aged 14-16), and their daily times spent (in hours) are

The distribution of the daily time spent is assumed to be normal, with mean $\mu$ and variance $\sigma^2$ ^{[4]}.
Also, based on the past data about the daily time spent, John assumes that the standard deviation of the distribution is .

(a) Construct a 95% confidence interval for .

(b) According to John, the computer game addiction problem is *serious* among teenagers aged 14-16 if the average daily time spent on playing computer games is at least a quarter of a day, i.e. 6 hours, and is *not serious* otherwise. Can John be (95%) confident that the computer game addiction problem is (i) *serious*; (ii) *not serious* among teenagers aged 14-16, based on the 95% confidence interval in (a)?

(c) To be more certain about the time spent, John would like to construct a 99% confidence interval for , with width not exceeding 1 hour. At least how many teenagers should be in the random sample to satisfy this requirement?

(d) Suppose John takes another random sample from the population, where the number of teenagers involved is the number suggested in (c). If in this random sample, construct a 99% confidence interval for , and verify that its width does not exceed 1 hour.

(e) Can John be (99%) confident that the computer game addiction problem is *not serious* among teenagers aged 14-16 based on the 99% confidence interval in (d)?

*Solution*:

(a) Since the realization of the sample mean is , and , the 95% confidence interval for is .

(b) (i) No, since the confidence interval contains some values that are strictly less than 6 and some that are at least 6. Thus, although John is 95% confident that $\mu$ lies in this interval, it is uncertain whether $\mu$ is at least 6.

(b) (ii) No, and the reason is similar to that in (i) (it is uncertain whether $\mu$ is less than 6).

(c) Since a 99% confidence interval for $\mu$ is $\left[\bar X-z_{0.005}\frac{\sigma}{\sqrt n},\ \bar X+z_{0.005}\frac{\sigma}{\sqrt n}\right]$, its width is $2z_{0.005}\frac{\sigma}{\sqrt n}$ (which does not depend on $\bar X$). Also, we know that $z_{0.005}\approx2.576$. Thus, to satisfy the requirement, we need to have $2z_{0.005}\frac{\sigma}{\sqrt n}\le1$, i.e. $n\ge(2z_{0.005}\sigma)^2$, with $\sigma$ being the standard deviation assumed by John.

(d) A 99% confidence interval for is . Its width is approximately 0.999535, which is less than 1.

(e) Yes, since all values in the interval in (d) are strictly less than 6.
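The sample-size calculation in (c) generalizes to any $\sigma$, maximum width $w$, and confidence coefficient: the smallest $n$ with $2z_{\alpha/2}\sigma/\sqrt n\le w$ is $\lceil(2z_{\alpha/2}\sigma/w)^2\rceil$. The following is a minimal sketch. The inputs in the usage line are hypothetical and are *not* John's data, whose value of $\sigma$ is not reproduced here.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, max_width, conf):
    """Smallest n with 2 * z_{alpha/2} * sigma / sqrt(n) <= max_width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return math.ceil((2 * z * sigma / max_width) ** 2)

# Hypothetical inputs: sigma = 1 hour, width at most 1 hour, 99% confidence.
n = min_sample_size(1.0, 1.0, 0.99)
print(n)  # 27
```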

**Exercise.**
Suppose John decides to take another random sample consisting of even more teenagers, 500 of them. If in this random sample,

(a) Construct a 99% confidence interval for .

(b) Can John be (99%) confident that the computer game addiction problem is *not serious* among teenagers aged 14-16 based on the 99% confidence interval in (a)?

(a) A 99% confidence interval for is .

(b) No, since some values in the interval are at least 6.

We have previously discussed a way to construct a confidence interval for the mean when the variance is *known*. However, this is not always the case in practice.
We may not know the variance, right? Then, we cannot use $\sigma$ in the confidence interval from the previous theorem.

Intuitively, one may think that we can use the *sample variance* $S^2$ to "replace" $\sigma^2$,
according to the weak law of large numbers. Then, we could simply replace the unknown $\sigma$ in the confidence interval by $S$ (or its realization $s$ for the interval estimate), which can be computed from the sample.
However, the flaw in this argument is that the sample size may not be large enough to apply the weak law of large numbers for approximation.

**Remark.**

- A rule of thumb is that we may regard the sample size as large enough for applying this kind of convergence theorem (e.g. the weak law of large numbers and the central limit theorem) for approximation when the sample size is *at least 30*. Otherwise, the approximation may not be accurate enough, i.e. the error can be quite large, and thus we should not use such a theorem for approximation.

So, you may now ask: when the sample size is large enough, can we do such a "replacement" for approximation? The answer is *yes*, and we will discuss this in the last section about approximate confidence intervals.

Before that section, the confidence intervals discussed are *exact* in the sense that no approximation is used to construct them. Therefore, they "work" for *every* valid sample size, no matter how large or how small it is (for instance, the known-variance confidence interval works even if the sample size is 1, although such a confidence interval may not be very "nice", in the sense that its width may be quite large).

Before discussing how to construct a confidence interval for the mean when the variance is unknown, we first give some results that are useful for deriving such a confidence interval.

**Proposition.**
(Several properties about sample mean and variance)
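Let $X_1,\dotsc,X_n$ be a random sample from $\mathcal N(\mu,\sigma^2)$.
Also let $\bar X=\frac1n\sum_{i=1}^nX_i$ be the sample mean and $S^2=\frac1{n-1}\sum_{i=1}^n(X_i-\bar X)^2$ be the sample variance, where $n$ is the sample size.
Then,

(i) $\bar X$ and $S^2$ are independent.

(ii) $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$, where $\chi^2_{n-1}$ is the chi-squared distribution with $n-1$ degrees of freedom.

(iii) $\frac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}$, where $t_{n-1}$ is the $t$-distribution with $n-1$ degrees of freedom.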

**Proof.**

(i) One may use Basu's theorem to prove this, but the details about Basu's theorem and the proof are omitted here, since they are a bit complicated.

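(ii) We will use the following definition of the chi-squared distribution $\chi^2_\nu$: $Z_1^2+\dotsb+Z_\nu^2\sim\chi^2_\nu$, where $Z_1,\dotsc,Z_\nu\sim\mathcal N(0,1)$ are independent. Also, we will use the fact that the mgf of $\chi^2_\nu$ is $(1-2t)^{-\nu/2}$ for $t<\frac12$.

Now, first let $W=\sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2$, which follows $\chi^2_n$ since $\frac{X_1-\mu}{\sigma},\dotsc,\frac{X_n-\mu}{\sigma}\sim\mathcal N(0,1)$ are independent. Then, writing $X_i-\mu=(X_i-\bar X)+(\bar X-\mu)$ and using $\sum_{i=1}^n(X_i-\bar X)=0$ (so that the cross term vanishes), we write $W$ as
$$W=\frac{1}{\sigma^2}\sum_{i=1}^n(X_i-\bar X)^2+\frac{n(\bar X-\mu)^2}{\sigma^2}=\frac{(n-1)S^2}{\sigma^2}+\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\right)^2.$$

By (i), $\bar X$ and $S^2$ are independent. Thus, $\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\right)^2$ (a function of $\bar X$) is independent of $\frac{(n-1)S^2}{\sigma^2}$ (a function of $S^2$). Now, let $U=\frac{(n-1)S^2}{\sigma^2}$ and $V=\left(\frac{\bar X-\mu}{\sigma/\sqrt n}\right)^2$. Since $U$ and $V$ are independent, $V\sim\chi^2_1$ (as $\frac{\bar X-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1)$), and $W=U+V$ from the above derivation, the mgf satisfies
$$M_W(t)=M_U(t)M_V(t)\implies(1-2t)^{-n/2}=M_U(t)(1-2t)^{-1/2}\implies M_U(t)=(1-2t)^{-(n-1)/2},\quad t<\tfrac12,$$
which is the mgf of $\chi^2_{n-1}$. Hence, $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$.

(iii) We will use the following definition of the $t$-distribution $t_\nu$: $\frac{Z}{\sqrt{U/\nu}}\sim t_\nu$, where $Z\sim\mathcal N(0,1)$, $U\sim\chi^2_\nu$, and $Z$ and $U$ are independent.

Using this definition, it is easy to prove (iii) with (ii), as follows:
$$\frac{\bar X-\mu}{S/\sqrt n}=\frac{\dfrac{\bar X-\mu}{\sigma/\sqrt n}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}\sim t_{n-1},$$
since the numerator follows $\mathcal N(0,1)$, $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$ by (ii), and the two are independent by (i).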
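Part (ii) can be sanity-checked by simulation: $\frac{(n-1)S^2}{\sigma^2}$ should behave like $\chi^2_{n-1}$, which has mean $n-1$ and variance $2(n-1)$. The sketch below assumes arbitrary values $\mu=2$, $\sigma=3$, $n=5$. It computes $S^2$ with divisor $n-1$ directly, for speed.

```python
import random
import statistics

def scaled_sample_variances(mu, sigma, n, reps=20_000, seed=4):
    """Draw reps values of (n - 1) * S^2 / sigma^2 from N(mu, sigma^2) samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance, divisor n - 1
        out.append((n - 1) * s2 / sigma**2)
    return out

q = scaled_sample_variances(mu=2.0, sigma=3.0, n=5)
# chi^2_{n-1} with n - 1 = 4 has mean 4 and variance 8
print(round(statistics.mean(q), 2), round(statistics.variance(q), 2))
```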

Using this proposition, we can prove the following theorem. Again, before discussing this confidence interval, let us introduce a notation:

- $t_{\alpha,\nu}$ is the upper percentile of $t_\nu$ at level $\alpha$, i.e. it satisfies $\mathbb P(T\ge t_{\alpha,\nu})=\alpha$ where $T\sim t_\nu$.

**Theorem.**
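(Confidence interval of $\mu$ when $\sigma^2$ is unknown)
Let $X_1,\dotsc,X_n$ be a random sample from $\mathcal N(\mu,\sigma^2)$. When $\sigma^2$ is unknown, a $1-\alpha$ confidence interval for $\mu$ is
$$\left[\bar X-t_{\alpha/2,n-1}\frac{S}{\sqrt n},\ \bar X+t_{\alpha/2,n-1}\frac{S}{\sqrt n}\right].$$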

**Remark.**

- The corresponding interval estimate is $\left[\bar x-t_{\alpha/2,n-1}\frac{s}{\sqrt n},\ \bar x+t_{\alpha/2,n-1}\frac{s}{\sqrt n}\right]$, with observed values $\bar x$ and $s^2$ (the sample standard deviation is nonnegative, so $s=\sqrt{s^2}$).
- We can find values of $t_{\alpha,\nu}$ for some values of $\alpha$ and $\nu$ from a "$t$-table".
  - In this $t$-table, the first column indicates the value of $\nu$, and the first row (one-sided) indicates $1-\alpha$ (it is "one-sided" since our definition of $t_{\alpha,\nu}$ involves the one-sided event $T\ge t_{\alpha,\nu}$). For instance, to get $t_{0.025,9}$, we look at 97.5% in the first row (one-sided).
  - Alternatively, we can look at the second row (two-sided), which indicates the confidence coefficient $1-\alpha$ of the confidence interval, corresponding to $t_{\alpha/2,\nu}$. For instance, to get $t_{0.025,9}$, we look at 95% in the second row (two-sided).
- As $\nu\to\infty$, the $t$-distribution $t_\nu$ tends to the standard normal distribution $\mathcal N(0,1)$. Hence, when $\nu$ is large, $t_{\alpha,\nu}\approx z_\alpha$. Thus, if one cannot find the value of $t_{\alpha,\nu}$ in the $t$-table because $\nu$ is so large that it does not appear in the table, one can simply use $z_\alpha$ from the standard normal table as an approximation.

**Proof.**
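By (iii) in the previous proposition, we have $T=\frac{\bar X-\mu}{S/\sqrt n}\sim t_{n-1}$. Since $t_{n-1}$ is independent of $\mu$, $T$ is a pivotal quantity of $\mu$.
Hence, using the symmetry of $t_{n-1}$ about 0, we set
$$1-\alpha=\mathbb P(-t_{\alpha/2,n-1}\le T\le t_{\alpha/2,n-1})=\mathbb P\left(\bar X-t_{\alpha/2,n-1}\frac{S}{\sqrt n}\le\mu\le\bar X+t_{\alpha/2,n-1}\frac{S}{\sqrt n}\right).$$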
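Python's standard library has no $t$-distribution, so the sketch below computes $t_{\alpha,\nu}$ numerically as a stand-in for the $t$-table. It integrates the $t$ pdf with Simpson's rule and inverts the cdf by bisection, then builds the interval estimate from this theorem. This is an illustrative approach chosen here, not a method from the text. In practice, a $t$-table or a statistics library would normally be used. The function names and the data in the usage lines are hypothetical.

```python
import math
import statistics

def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule for the pdf on [0, x] (steps even)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_upper(alpha, nu):
    """t_{alpha,nu}: the value with P(T >= t) = alpha, for 0 < alpha < 0.5 (bisection)."""
    lo, hi = 0.0, 1.0
    while 1 - t_cdf(hi, nu) > alpha:  # widen until the quantile is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, conf=0.95):
    """Interval estimate xbar -/+ t_{alpha/2, n-1} * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    half = t_upper((1 - conf) / 2, n - 1) * s / math.sqrt(n)
    return xbar - half, xbar + half

print(round(t_upper(0.025, 9), 3))  # 2.262, as in the t-table
print(t_interval([1, 2, 3, 4, 5]))  # hypothetical data
```

For large $\nu$, `t_upper(0.025, nu)` approaches $z_{0.025}\approx1.96$, in line with the remark above about the normal approximation.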

**Example.**
A government officer of country A would like to know the daily average time spent on exercise by all citizens in country A.
Suppose the variance of the time spent is unknown, and a random sample of 10 citizens is taken from the population.
The following is the time spent on exercise on a particular day by the citizens in that sample (in minutes):

Assuming the time spent follows a normal distribution, construct a 95% confidence interval for the daily average time spent on exercise by all citizens in country A, denoted by $\mu$.

*Solution*:
First, we have , and .

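Also, $t_{0.025,9}\approx2.262$, from "97.5% (one-sided) and 9" (or "95% (two-sided) and 9") in the $t$-table.

Thus, a 95% confidence interval for $\mu$ is $\left[\bar x-t_{0.025,9}\frac{s}{\sqrt{10}},\ \bar x+t_{0.025,9}\frac{s}{\sqrt{10}}\right]$, evaluated at the observed $\bar x$ and $s$.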

**Exercise.**
The government officer also wants to know the mean monthly wage $\mu$ of all citizens in country A. Suppose the standard deviation of the monthly wage is 2000 (all wages in this example are in USD).
From a salary survey that asks 15 citizens for their monthly wages, the following monthly wages (in USD) are obtained:

(a) Construct a 90% confidence interval for the mean monthly wage , assuming the underlying distribution for the wage is normal.

(b) For the salary survey, it is found that a respondent gives a wrong monthly wage: he enters one more "0" accidentally, and thus answers 10000 instead of 1000. Thus, after the correction, the corrected sample data of the monthly wages is:

Update the confidence interval in (a) to a correct one, based on this correct data.

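(a) First, we compute the observed sample mean $\bar x$ and sample standard deviation $s$ from the data. Also, $t_{0.05,14}\approx1.761$ (from "95% (one-sided)" (or "90% (two-sided)") and 14 in the $t$-table).

Hence, a 90% confidence interval for $\mu$ is $\left[\bar x-t_{0.05,14}\frac{s}{\sqrt{15}},\ \bar x+t_{0.05,14}\frac{s}{\sqrt{15}}\right]$.

(b) First, we update $\bar x$ and $s$ using the corrected data. Then, a new 90% confidence interval for $\mu$ is obtained from the same formula with the updated values.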

**Example.**

A farmer, Tom, owns an apple orchard. He has just harvested a large number of apples (1000 apples) from his orchard. To assess the "quality" of this batch of apples, he wants to know the mean weight of the apples in this batch, $\mu$. However, since there are too many apples, it is cumbersome to weigh every apple in this batch. Hence, Tom decides to take a random sample of 5 apples, and use them to roughly estimate the mean weight of the apples. The following are the weights of the apples in that sample (in g):

Assume the distribution of the weight is normal.

(a) Based on past experiences, Tom knows that the standard deviation of the weight of the apples is 30g. Construct a 95% confidence interval for .

(b) Tom finds out that the apples in this batch are of a new variety that has not been grown before. Therefore, the standard deviation of the weight based on past experience cannot be applied to the estimation of the mean weight for this batch. Hence, the standard deviation of the weight is now unknown. Construct an updated 95% confidence interval for $\mu$.

*Solution*:

(a) We have . Also, from standard normal table. Hence, a 95% confidence interval for is

(b) We have , and from -table. Hence, a 95% confidence interval for is

**Exercise.**
Tom sells this batch of apples to a nearby shop, and it is known that the shop will pay Tom in USD *for each* apple, where is the mean weight of the batch of apples.

(a) Construct a 95% confidence interval for the total revenue of Tom from this transaction (in USD), , based on the above confidence interval in (b) of example.

(b) Suppose the cost for Tom to grow this batch of apples is USD 6000. Can Tom be 95% confident that he can earn a positive profit (i.e. the revenue exceeds the cost) from this transaction?

(a) Since , and a 95% confidence interval for is based on (b). From the construction of confidence interval, we have

(b) Yes, since Tom can be 95% confident that lies in , which exceeds the cost USD 6000.

### Difference in means of two normal distributions

Sometimes, apart from estimating the mean of a *single* normal distribution, we would like to estimate the *difference* in means of *two* normal distributions for comparison.
For example, apart from estimating the mean amount of time (lifetime) until a bulb burns out, we are often interested in estimating the *difference* between the mean lifetimes of two different kinds of bulbs, so that we know which kind lasts longer on average, and
hence which kind of bulb has a higher "quality".

First, let us discuss the case where the two normal distributions are *independent*.

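Now, the problem is how we should construct a confidence interval for the *difference* in the two means, $\mu_X-\mu_Y$.
It seems that we can just construct two $1-\alpha$ confidence intervals, $[L_X,U_X]$ for $\mu_X$ and $[L_Y,U_Y]$ for $\mu_Y$.
Then, the $1-\alpha$ confidence interval for $\mu_X-\mu_Y$ would be $[L_X-U_Y,\ U_X-L_Y]$.
However, this is incorrect: $\mathbb P(L_X\le\mu_X\le U_X)=1-\alpha$ and $\mathbb P(L_Y\le\mu_Y\le U_Y)=1-\alpha$ do *not* imply that $\mathbb P(L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y)=1-\alpha$ (there is no result in probability that justifies this).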

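On the other hand, it may seem that since $[L_X,U_X]$ and $[L_Y,U_Y]$ are independent (since the normal distributions we are considering are independent), we have
$$\mathbb P(L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y)=\mathbb P(L_X\le\mu_X\le U_X)\,\mathbb P(L_Y\le\mu_Y\le U_Y)=(1-\alpha)^2,$$
so that $[L_X-U_Y,\ U_X-L_Y]$ would be a $(1-\alpha)^2$ confidence interval.

However, this is actually also incorrect. The flaw is that "*when $L_X\le\mu_X\le U_X$ and $L_Y\le\mu_Y\le U_Y$, we have $L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y$*" only means
$$\mathbb P(L_X-U_Y\le\mu_X-\mu_Y\le U_X-L_Y)\ge\mathbb P(L_X\le\mu_X\le U_X,\ L_Y\le\mu_Y\le U_Y)=(1-\alpha)^2,$$
since the converse implication does not hold. Hence, $[L_X-U_Y,\ U_X-L_Y]$ is *not* a $(1-\alpha)^2$ confidence interval (in general).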
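To see concretely that neither $1-\alpha$ nor $(1-\alpha)^2$ is the right coverage, the sketch below computes the exact coverage probability of the naive interval. The lower end of the naive interval is the lower end of the interval for $\mu_X$ minus the upper end of the interval for $\mu_Y$, and the upper end is formed the other way round. The sketch assumes known variances and the known-variance $z$-interval for each mean, so that the naive interval is $(\bar X-\bar Y)\pm z_{\alpha/2}(a+b)$ with $a=\sigma_X/\sqrt n$ and $b=\sigma_Y/\sqrt m$. The inputs and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def naive_difference_coverage(sigma_x, n, sigma_y, m, conf=0.95):
    """Exact coverage of the naive interval (Xbar - Ybar) -/+ z * (a + b) for mu_X - mu_Y,
    built by combining two separate known-variance z-intervals."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)           # z_{alpha/2}
    a, b = sigma_x / math.sqrt(n), sigma_y / math.sqrt(m)  # sd of Xbar and of Ybar
    k = z * (a + b) / math.sqrt(a * a + b * b)             # Xbar - Ybar has sd sqrt(a^2 + b^2)
    return 2 * NormalDist().cdf(k) - 1

print(naive_difference_coverage(2.0, 10, 2.0, 10))  # about 0.994, not 0.95 or 0.9025
```

In this setting, the naive interval covers $\mu_X-\mu_Y$ with probability larger than $1-\alpha$, because it is wider than necessary.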

So, the above two "methods" of constructing confidence intervals for the difference in means of two independent normal distributions do not work.
Indeed, we do *not* combine the confidence intervals for the two individual means, constructed previously, to construct a confidence interval for the difference in the two means.
Instead, we consider a *pivotal quantity* involving the difference in the two means, which is the standard way of constructing confidence intervals.

**Theorem.**
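(Confidence interval of $\mu_X-\mu_Y$ when $\sigma_X^2$ and $\sigma_Y^2$ are known)
Let $X_1,\dotsc,X_n$ and $Y_1,\dotsc,Y_m$ be random samples from two *independent* distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$ respectively (i.e. the random variables $X_1,\dotsc,X_n,Y_1,\dotsc,Y_m$ are independent), where $\sigma_X^2$ and $\sigma_Y^2$ are known.
Then, a $1-\alpha$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[\bar X-\bar Y-z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}},\ \bar X-\bar Y+z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\right].$$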

**Remark.**

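- The corresponding interval estimate is $\left[\bar x-\bar y-z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}},\ \bar x-\bar y+z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\right]$, with observed values $\bar x$ and $\bar y$.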

**Exercise.**
Show that $\bar X-\bar Y\sim\mathcal N\left(\mu_X-\mu_Y,\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}\right)$ (the notation follows the above theorem).

**Proof.**
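First, we have $\bar X\sim\mathcal N(\mu_X,\sigma_X^2/n)$ and $\bar Y\sim\mathcal N(\mu_Y,\sigma_Y^2/m)$ by the property of the normal distribution ($X_1,\dotsc,X_n$ and $Y_1,\dotsc,Y_m$ are *independent* random samples).
Then, applying the property of the normal distribution again (the two distributions $\mathcal N(\mu_X,\sigma_X^2)$ and $\mathcal N(\mu_Y,\sigma_Y^2)$ are *independent*, and hence $\bar X$ and $\bar Y$ are *independent*), we have
$$\bar X-\bar Y\sim\mathcal N\left(\mu_X-\mu_Y,\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}\right).$$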

Now, we will prove the above theorem based on the result shown in the previous exercise:

**Proof.**
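Let $Z=\frac{(\bar X-\bar Y)-(\mu_X-\mu_Y)}{\sqrt{\sigma_X^2/n+\sigma_Y^2/m}}\sim\mathcal N(0,1)$ (from the previous exercise). Then, $Z$ is a pivotal quantity of $\mu_X-\mu_Y$.
Hence, we have
$$1-\alpha=\mathbb P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=\mathbb P\left(\bar X-\bar Y-z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\le\mu_X-\mu_Y\le\bar X-\bar Y+z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}\right).$$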
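The two-sample interval estimate from this theorem can be sketched in a few lines. Note that the half-width uses $\sqrt{\sigma_X^2/n+\sigma_Y^2/m}$, not the sum of the two individual half-widths. The function name and the summary statistics in the usage line are hypothetical. They are *not* the light-bulb data below.

```python
import math
from statistics import NormalDist

def two_sample_z_interval(xbar, ybar, var_x, n, var_y, m, conf=0.95):
    """Interval estimate for mu_X - mu_Y with known variances var_x and var_y."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * math.sqrt(var_x / n + var_y / m)
    diff = xbar - ybar
    return diff - half, diff + half

# Hypothetical summary statistics.
print(two_sample_z_interval(10.0, 8.0, 4.0, 16, 9.0, 9))  # about (-0.19, 4.19)
```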

**Example.**
A statistician wants to compare two kinds of light bulbs (brand A vs. brand B) by their lifetime (amount of time until the bulb burns out).
He takes a random sample of 10 light bulbs of each brand, and measures their lifetimes.
The following is the summary of the results:

(a) Construct a 95% confidence interval for the mean lifetime of brand A light bulbs ($\mu_A$) and of brand B light bulbs ($\mu_B$) respectively.

(b) Construct a 95% confidence interval for $\mu_B-\mu_A$.

(c) Can the statistician conclude with 95% confidence that brand B light bulb has a longer lifetime than brand A light bulb on average?

*Solution*:

(a) Since and the sample size for each of the random samples is 10, a 95% confidence interval for is

(b) A 95% confidence interval for is

(c) Since all values in the 95% confidence interval in (b) are positive, the statistician can be 95% confident that the mean lifetime of brand B light bulbs is longer than that of brand A light bulbs.

**Remark.**

- Notice that some values in the 95% confidence interval for exceed all values in the 95% confidence interval for . However, we are still 95% confident that exceeds .

**Exercise.**
Suppose there is a brand C light bulb, and the statistician also takes a random sample of 10 brand C light bulbs.
It is observed that the sample mean of this random sample is 4210 hours, and the standard deviation of the lifetime of brand C light bulbs is known to be $\sigma_C$ hours.
Assume the distribution of the lifetime is normal.

After constructing 95% confidence intervals using the above theorem, the statistician is 95% confident that the mean lifetime of brand C light bulbs is at least as long as those of *both* brand A and brand B light bulbs.
Show that the maximum value of $\sigma_C$ is (approximately) 110.31.

**Proof.**
Let $\mu_C$ be the mean lifetime of brand C light bulbs.

A 95% confidence interval for is

*maximum*value of is 110.31.

Now, we will consider the case where the variances are *unknown*.
In this case, the construction of the confidence interval for the difference in means is more complicated,
and even more complicated when $\sigma_X^2\ne\sigma_Y^2$.
Thus, we will only discuss the case where $\sigma_X^2=\sigma_Y^2=\sigma^2$ is unknown.
As you may expect, in this case we will also use some results mentioned previously for constructing the confidence interval for $\mu$ when $\sigma^2$ is unknown.

**Theorem.**
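(Confidence interval of $\mu_X-\mu_Y$ when $\sigma_X^2=\sigma_Y^2=\sigma^2$ is unknown)
Let $X_1,\dotsc,X_n$ and $Y_1,\dotsc,Y_m$ be random samples from two *independent* distributions $\mathcal N(\mu_X,\sigma^2)$ and $\mathcal N(\mu_Y,\sigma^2)$ respectively.
Then, a $1-\alpha$ confidence interval for $\mu_X-\mu_Y$ is
$$\left[\bar X-\bar Y-t_{\alpha/2,n+m-2}\,S_p\sqrt{\frac1n+\frac1m},\ \bar X-\bar Y+t_{\alpha/2,n+m-2}\,S_p\sqrt{\frac1n+\frac1m}\right],$$
where $S_p^2=\frac{(n-1)S_X^2+(m-1)S_Y^2}{n+m-2}$ is the pooled sample variance, and $S_X^2$ and $S_Y^2$ are the sample variances of the two samples.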

**Remark.**

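- The corresponding interval estimate is $\left[\bar x-\bar y-t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac1n+\frac1m},\ \bar x-\bar y+t_{\alpha/2,n+m-2}\,s_p\sqrt{\frac1n+\frac1m}\right]$, with observed values $\bar x$, $\bar y$ and $s_p$.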
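The pooled interval estimate in this theorem can be computed as follows. This is a minimal sketch: to keep it short, the critical value $t_{\alpha/2,n+m-2}$ is passed in as an argument (for example, read from a $t$-table) rather than computed. The function name and the data in the usage line are hypothetical.

```python
import math
import statistics

def pooled_t_interval(x, y, t_crit):
    """Interval estimate for mu_X - mu_Y assuming a common unknown variance.

    t_crit must be t_{alpha/2, n+m-2}, e.g. read from a t-table."""
    n, m = len(x), len(y)
    # pooled sample variance S_p^2 (statistics.variance uses divisor n - 1)
    sp2 = ((n - 1) * statistics.variance(x) + (m - 1) * statistics.variance(y)) / (n + m - 2)
    half = t_crit * math.sqrt(sp2) * math.sqrt(1 / n + 1 / m)
    diff = statistics.mean(x) - statistics.mean(y)
    return diff - half, diff + half

# Hypothetical data; n + m - 2 = 6, and t_{0.025,6} is about 2.447 (t-table).
print(pooled_t_interval([1, 2, 3, 4, 5], [2, 4, 6], t_crit=2.447))
```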

**Proof.**
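Let $Z=\frac{(\bar X-\bar Y)-(\mu_X-\mu_Y)}{\sigma\sqrt{1/n+1/m}}\sim\mathcal N(0,1)$ (the reason for this is shown in a previous exercise).
From a previous result, we know that $\frac{(n-1)S_X^2}{\sigma^2}\sim\chi^2_{n-1}$ and $\frac{(m-1)S_Y^2}{\sigma^2}\sim\chi^2_{m-1}$.
Then, we know that the mgf of $\frac{(n-1)S_X^2}{\sigma^2}$ is $(1-2t)^{-(n-1)/2}$ and the mgf of $\frac{(m-1)S_Y^2}{\sigma^2}$ is $(1-2t)^{-(m-1)/2}$.
Since the distributions $\mathcal N(\mu_X,\sigma^2)$ and $\mathcal N(\mu_Y,\sigma^2)$ are independent, the mgf of $U=\frac{(n-1)S_X^2+(m-1)S_Y^2}{\sigma^2}=\frac{(n+m-2)S_p^2}{\sigma^2}$ is
$$M_U(t)=(1-2t)^{-(n-1)/2}(1-2t)^{-(m-1)/2}=(1-2t)^{-(n+m-2)/2},\quad t<\tfrac12,$$
which is the mgf of $\chi^2_{n+m-2}$, so $U\sim\chi^2_{n+m-2}$.

By the independence of the sample mean and sample variance ($\bar X$ and $S_X^2$ are independent, and $\bar Y$ and $S_Y^2$ are independent), we can deduce that $Z$ and $U$ are independent. Thus, by the definition of the $t$-distribution,
$$T=\frac{Z}{\sqrt{U/(n+m-2)}}=\frac{(\bar X-\bar Y)-(\mu_X-\mu_Y)}{S_p\sqrt{1/n+1/m}}\sim t_{n+m-2}.$$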