Statistics/Interval Estimation

Introduction

Previously, we discussed point estimation, which gives us an estimator ${\hat {\theta }}$ for the value of an unknown parameter $\theta$. Now, suppose we want to know the size of the error of the point estimator ${\hat {\theta }}$, i.e. the difference between ${\hat {\theta }}$ and the unknown parameter $\theta$. We can, of course, use the mean squared error of ${\hat {\theta }}$, $\mathbb {E} [({\hat {\theta }}-\theta )^{2}]$, or other measures.

However, what if we only have one specific point estimate? With just this point estimate, we cannot calculate the mean squared error of the corresponding point estimator. So how do we know the possible size of the error of this point estimate? In fact, it is impossible to tell: we are only given one particular estimated value of the parameter $\theta$, and we do not know the value of the unknown parameter $\theta$ itself, so the difference between this point estimate and $\theta$ is also unknown.

To illustrate this, consider the following example. Suppose we take a random sample of 10 students from a particular university course to estimate the mean final exam score $\mu$ of the students in that course (assume the score is normally distributed), and the observed value of the sample mean is ${\overline {x}}=60$. What is the difference between this point estimate and the true unknown parameter $\mu$? Can we be "confident" that this sample mean is close to $\mu$, say $\mu \in [{\overline {x}}-5,{\overline {x}}+5]=[55,65]$?

It is possible that $\mu$ is, say, 90, and the students in the sample happen to be the ones with very poor performance. On the other hand, it is also possible that $\mu$ is, say, 30, and the students in the sample happen to be the ones who perform (relatively) well. Of course, it is also possible that $\mu$ is quite close to 60, say 59. From this example, we can see that a particular value ${\overline {x}}=60$ does not tell us the possible size of the error: the error can be very large, or very small.

In this chapter, we introduce interval estimation. An interval estimator produces a random interval (i.e. an interval at least one of whose endpoints is a random variable), and we describe the size of the error through the probability that this random interval contains the unknown parameter $\theta$. This probability measures the "accuracy" of the interval estimator of $\theta$, and hence the size of the error.

As the name suggests, an interval estimator involves intervals. Also, as one may expect, interval estimation is based on statistics:

Definition. (Interval estimation) Interval estimation is a process of using the value of a statistic to estimate an interval of plausible values of an unknown parameter.

Of course, we would like the probability that the unknown parameter $\theta$ lies in the interval to be close to 1, so that the interval estimator is very accurate. However, a very accurate interval estimator may have very poor "precision", i.e. the interval covers "too many" plausible values of the unknown parameter. Then, even if we know that $\theta$ is very likely to be one of these values, there are too many different possibilities, so such an interval estimator is not very "useful". To illustrate this, suppose the interval concerned is $\mathbb {R}$, which is the parameter space of $\theta$. Then, of course, $\mathbb {P} (\theta \in \mathbb {R} )=1$ (so the "confidence" is high), since $\theta$ must lie in its parameter space. However, such an interval has essentially "zero precision" and is quite "useless", since the "plausible values" of $\theta$ in the interval are all possible values of $\theta$.

From this, we can see the need for "precision": we also want the width of the interval to be small, so that we have some idea about the "location" of $\theta$. However, as the interval becomes narrower, it becomes more likely to miss $\theta$, i.e. not to cover the actual value of $\theta$. Therefore, the probability that $\theta$ lies in the interval becomes smaller, i.e. the interval becomes less "accurate". To illustrate this, consider the extreme case where the interval is so narrow that it contains a single point (the two endpoints coincide). Then the "interval estimator" essentially becomes a "point estimator" ${\hat {\theta }}$, and $\theta$ lying in this "interval" is equivalent to $\theta ={\hat {\theta }}$, which is very unlikely. Indeed, if the distribution of ${\hat {\theta }}$ is continuous, then $\mathbb {P} ({\hat {\theta }}=\theta )=0$.

As we can see from above, although we want the interval to have very high "confidence" and also to be "very precise" (i.e. very narrow), we cannot have both: an increase in confidence causes a decrease in "precision", and an increase in "precision" causes a decrease in confidence. Therefore, we need to compromise, and pick an interval that gives sufficiently high confidence and is also reasonably precise. In other words, we would like a narrow interval that covers $\theta$ with a large probability.

Terminology

Now, let us formally define some terminologies related to interval estimation.

Definition. (Interval estimator) Let $X_{1},\dotsc ,X_{n}$ be a random sample. An interval estimator of an unknown parameter $\theta$ is a random interval $[L(\mathbf {X} ),U(\mathbf {X} )]$ where $L=L(X_{1},\dotsc ,X_{n})$ and $U=U(X_{1},\dotsc ,X_{n})$ are two statistics such that $L(\mathbf {X} )\leq U(\mathbf {X} )$ always.

Remark.

We call $[L(\mathbf {X} ),U(\mathbf {X} )]$ a random interval since both endpoints $L(\mathbf {X} )$ and $U(\mathbf {X} )$ are random variables.
The interval may also be an open interval ($(L(\mathbf {X} ),U(\mathbf {X} ))$), a half-open interval ($(L(\mathbf {X} ),U(\mathbf {X} )]$ or $[L(\mathbf {X} ),U(\mathbf {X} ))$), or a one-sided interval ($(-\infty ,U(\mathbf {X} )]$ or $[L(\mathbf {X} ),\infty )$), where we take $L(\mathbf {X} )=-\infty$ or $U(\mathbf {X} )=\infty$ respectively (in the extended real number sense).
When we observe that $X_{1}=x_{1},\dotsc ,X_{n}=x_{n}$ , we call $[L(x_{1},\dotsc ,x_{n}),U(x_{1},\dotsc ,x_{n})]$ the interval estimate of $\theta$ , denoted by $[L(\mathbf {x} ),U(\mathbf {x} )]$ ( $L(\mathbf {x} )$ and $U(\mathbf {x} )$ are no longer random).

Definition. (Coverage probability) The coverage probability of an interval estimator $[L(\mathbf {X} ),U(\mathbf {X} )]$ is $\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])$ .

Example. Let $X_{1},X_{2},X_{3},X_{4}$ be a random sample from the normal distribution ${\mathcal {N}}(\mu ,1)$ . Consider an interval estimator of $\mu$ : $[{\overline {X}}-1,{\overline {X}}+1]$ .

(a) Calculate the probability $\mathbb {P} ({\overline {X}}=\mu )$ .

(b) Calculate the coverage probability $\mathbb {P} (\mu \in [{\overline {X}}-1,{\overline {X}}+1])$ .

Solution:

(a) Since the distribution of ${\overline {X}}$ is continuous, $\mathbb {P} ({\overline {X}}=\mu )=0$ .

(b) The coverage probability ${\begin{aligned}\mathbb {P} (\mu \in [{\overline {X}}-1,{\overline {X}}+1])&=\mathbb {P} ({\overline {X}}-1\leq \mu \leq {\overline {X}}+1)\\&=\mathbb {P} (-1\leq \mu -{\overline {X}}\leq 1)\\&=\mathbb {P} (1\geq {\overline {X}}-\mu \geq -1)\\&=\mathbb {P} \left({\frac {-1}{\sqrt {1/4}}}\leq {\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\leq {\frac {1}{\sqrt {1/4}}}\right)\\&=\mathbb {P} \left(-2\leq Z\leq 2\right)&\left(Z={\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\sim {\mathcal {N}}(0,1),{\text{ by property of normal distribution}}\right)\\&\approx 0.97725-0.02275&({\text{standard normal table}})\\&=0.9545.\end{aligned}}$
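The arithmetic in (b) can be checked numerically. The sketch below is a minimal illustration in Python using only the standard library's `statistics.NormalDist`; the helper name `coverage_symmetric` is hypothetical. It relies on the fact that ${\overline {X}}-\mu \sim {\mathcal {N}}(0,\sigma ^{2}/n)$, so the coverage probability does not involve $\mu$.

```python
from math import sqrt
from statistics import NormalDist


def coverage_symmetric(k, sigma, n):
    """Coverage probability P(mu in [Xbar - k, Xbar + k]) for an i.i.d.
    sample of size n from N(mu, sigma^2).

    Since (Xbar - mu) / (sigma / sqrt(n)) ~ N(0, 1), the coverage equals
    P(-c <= Z <= c) with c = k / (sigma / sqrt(n)); mu does not appear.
    """
    c = k / (sigma / sqrt(n))
    std_normal = NormalDist()  # N(0, 1)
    return std_normal.cdf(c) - std_normal.cdf(-c)


# Example from the text: n = 4, sigma = 1, k = 1, so c = 2.
print(round(coverage_symmetric(k=1, sigma=1, n=4), 4))  # 0.9545
```

Calling `coverage_symmetric(k=2, sigma=1, n=4)` gives the probability asked for in part (b) of the exercise below.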

Exercise.

(a) Guess whether the coverage probability $\mathbb {P} (\mu \in [{\overline {X}}-2,{\overline {X}}+2])$ is greater than $\mathbb {P} (\mu \in [{\overline {X}}-1,{\overline {X}}+1])\approx 0.9545$.

(b) Calculate $\mathbb {P} (\mu \in [{\overline {X}}-2,{\overline {X}}+2])$ to see whether your guess in (a) is correct or not.

(c) (construction of interval estimator) Find $k$ such that $\mathbb {P} (\mu \in [{\overline {X}}-k,{\overline {X}}+k])\approx 0.9973$ (Hint: $\mathbb {P} (-3\leq Z\leq 3)\approx 0.9973$ where $Z\sim {\mathcal {N}}(0,1)$ ).

(d) Suppose it is observed that $X_{1}=1,X_{2}=3,X_{3}=2.5,X_{4}=1.5$ . Find the interval estimate of the given interval estimator $[{\overline {X}}-1,{\overline {X}}+1]$ .

(e) Suppose the actual parameter $\mu$ is 1.2. Does $\mu$ lie in the interval estimate in (d)?

Solution

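(a) Intuitively, one should guess that it is greater, since the wider interval $[{\overline {X}}-2,{\overline {X}}+2]$ always contains $[{\overline {X}}-1,{\overline {X}}+1]$.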

(b) ${\begin{aligned}\mathbb {P} (\mu \in [{\overline {X}}-{\color {blue}2},{\overline {X}}+{\color {blue}2}])&=\mathbb {P} ({\overline {X}}-{\color {blue}2}\leq \mu \leq {\overline {X}}+{\color {blue}2})\\&=\mathbb {P} (-{\color {blue}2}\leq \mu -{\overline {X}}\leq {\color {blue}2})\\&=\mathbb {P} ({\color {blue}2}\geq {\overline {X}}-\mu \geq -{\color {blue}2})\\&=\mathbb {P} \left({\frac {-{\color {blue}2}}{\sqrt {1/4}}}\leq {\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\leq {\frac {\color {blue}2}{\sqrt {1/4}}}\right)\\&=\mathbb {P} \left(-{\color {blue}4}\leq Z\leq {\color {blue}4}\right)&\left(Z={\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\sim {\mathcal {N}}(0,1),{\text{ by property of normal distribution}}\right)\\&\approx 0.99997-0.00003&({\text{standard normal table}})\\&=0.99994.\\\end{aligned}}$

(c) Such $k$ is ${\frac {3}{2}}$ .

Proof. ${\begin{aligned}\mathbb {P} (\mu \in [{\overline {X}}-{\color {blue}3/2},{\overline {X}}+{\color {blue}3/2}])&=\mathbb {P} ({\overline {X}}-{\color {blue}3/2}\leq \mu \leq {\overline {X}}+{\color {blue}3/2})\\&=\mathbb {P} (-{\color {blue}3/2}\leq \mu -{\overline {X}}\leq {\color {blue}3/2})\\&=\mathbb {P} ({\color {blue}3/2}\geq {\overline {X}}-\mu \geq -{\color {blue}3/2})\\&=\mathbb {P} \left({\frac {-{\color {blue}3/2}}{\sqrt {1/4}}}\leq {\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\leq {\frac {\color {blue}3/2}{\sqrt {1/4}}}\right)\\&=\mathbb {P} \left(-{\color {blue}3}\leq Z\leq {\color {blue}3}\right)&\left(Z={\frac {{\overline {X}}-\mu }{\sqrt {1/4}}}\sim {\mathcal {N}}(0,1),{\text{ by property of normal distribution}}\right)\\&\approx 0.9973.&({\text{hint}})\\\\\end{aligned}}$

$\Box$

(d) Under this observation, ${\overline {x}}={\frac {1+3+2.5+1.5}{4}}=2$ . Hence, the interval estimate is $[1,3]$ .

(e) Since $1.2\in [1,3]$ , $\mu$ lies in the interval estimate $[1,3]$ .
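Parts (c) and (d) can also be reproduced in code. This is a hedged sketch, not part of the original exercise: the helper names `half_width_for_coverage` and `interval_estimate` are hypothetical. The first inverts the coverage formula, using $P(-c\leq Z\leq c)=p\iff c=\Phi ^{-1}((1+p)/2)$; the second evaluates the realized interval for observed data.

```python
from math import sqrt
from statistics import NormalDist, fmean


def half_width_for_coverage(target, sigma, n):
    """Half-width k such that P(mu in [Xbar - k, Xbar + k]) = target
    for an i.i.d. N(mu, sigma^2) sample of size n."""
    # P(-c <= Z <= c) = target  <=>  c = Phi^{-1}((1 + target) / 2)
    c = NormalDist().inv_cdf((1 + target) / 2)
    return c * sigma / sqrt(n)


def interval_estimate(xs, k):
    """Realized (non-random) interval [xbar - k, xbar + k] for observed data xs."""
    xbar = fmean(xs)
    return (xbar - k, xbar + k)


print(round(half_width_for_coverage(0.9973, sigma=1, n=4), 3))  # 1.5, as in (c)
lo, hi = interval_estimate([1, 3, 2.5, 1.5], k=1)
print(lo, hi)            # 1.0 3.0, as in (d)
print(lo <= 1.2 <= hi)   # True, as in (e)
```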

Definition. (Confidence coefficient) For an interval estimator $[L(\mathbf {X} ),U(\mathbf {X} )]$ of $\theta$ , the confidence coefficient of $[L(\mathbf {X} ),U(\mathbf {X} )]$ , denoted by $1-\alpha$ , is the infimum of the (set of) coverage probabilities (over all $\theta$ in the parameter space $\Theta$ ), ${\underset {\theta \in \Theta }{\inf }}\;\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])$ .

Remark.

Infimum means the greatest lower bound (it equals the minimum whenever the minimum exists). Thus ${\underset {\theta \in \Theta }{\inf }}\;\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])$ is the greatest lower bound of the coverage probabilities over all $\theta \in \Theta$. Intuitively, this means that the confidence coefficient is chosen conservatively: if some $\theta$ makes the coverage probability low, the confidence coefficient is lowered accordingly.
In simple cases, the value of coverage probability does not depend on the choice of $\theta$ (i.e. is a constant function of $\theta$ ) ^[1]. Hence, the confidence coefficient $1-\alpha ={\underset {\theta \in \Theta }{\inf }}\;\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])=\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])$ . Unless otherwise specified, you can assume this is true in the following.
The notation "$1-\alpha$" is chosen because of its connection to hypothesis testing, where "$\alpha$" has a special meaning.

As we shall see in the next chapter, there is a close relationship between confidence intervals and hypothesis testing, in the sense that either one can be constructed from the other.

An interval estimator together with a measure of "confidence" is called a confidence interval. Here, the confidence coefficient is the measure of confidence. Hence, an interval estimator with confidence coefficient $1-\alpha$ is called a $1-\alpha$ confidence interval (usually $1-\alpha$ is expressed as a percentage).

Example. (Interpretation of confidence coefficient) Consider an interval estimator $[L(\mathbf {X} ),U(\mathbf {X} )]$ of an unknown parameter $\theta$. Suppose its confidence coefficient is $1-\alpha$.

Student A's claim: since the confidence coefficient is $1-\alpha$, the coverage probability $\mathbb {P} (\theta \in [L(\mathbf {X} ),U(\mathbf {X} )])=1-\alpha$. It follows that the probability that $\theta$ lies in the interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ from an experiment is also $1-\alpha$.
Student B's claim: an interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ obtained from an experiment either contains $\theta$ or does not contain $\theta$. In the former case, the coverage probability is 1, and in the latter case, it is 0. Hence, student A's claim is wrong.
Student C's claim: when we perform a large number of experiments, we expect the interval estimates in about a proportion $1-\alpha$ of them to contain $\theta$, and the interval estimates in the remaining proportion $\alpha$ of them not to contain $\theta$.

Comment on each claim.

Solution:

Student B's claim is correct, since in a single experiment, the interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ is already determined (and thus fixed). Also, the unknown parameter $\theta$ is fixed (the population distribution is given). This means that whether $\theta$ lies in the fixed interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ is not a random event; it is already determined by the fixed $\theta$ and $[L(\mathbf {x} ),U(\mathbf {x} )]$.

Student A's claim is wrong, since student B's claim is correct. It may be easier to see why it is wrong if we rephrase the claim slightly: "the probability for the fixed $\theta$ to lie in the fixed interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ is $1-\alpha$." This is incorrect since the event involved is not even random! To see this more clearly, consider what happens if we "hypothetically" repeat this particular experiment many times, with the fixed $\theta$ and the fixed interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$. The "outcome" is the same in every experiment: either $\theta$ lies in $[L(\mathbf {x} ),U(\mathbf {x} )]$ in all experiments, or it does not lie in $[L(\mathbf {x} ),U(\mathbf {x} )]$ in all experiments. Then, by the frequentist definition of probability, the probability is either 1 (former case) or 0 (latter case).

We may modify student A's claim to make it correct: the probability that $\theta$ lies in the interval estimator (the random interval) $[L(\mathbf {X} ),U(\mathbf {X} )]$ is $1-\alpha$. This can be interpreted as: the probability that $\theta$ lies in an interval estimate calculated from a future, not yet realized sample (NOT an already realized, past sample) is $1-\alpha$.

Student C's claim is also correct, since we can interpret the probability from the frequentist point of view, i.e. as the "long-run" proportion of interval estimates (in each trial, an interval estimate is observed from the interval estimator $[L(\mathbf {X} ),U(\mathbf {X} )]$) that contain the true parameter $\theta$.
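Student C's long-run interpretation can be illustrated by simulation. The sketch below is a hedged illustration, not a method from the text: it reuses the earlier setting ($n=4$, $\sigma =1$, interval $[{\overline {X}}-1,{\overline {X}}+1]$), fixes an arbitrary hypothetical true value $\mu =5$, and counts how often the realized intervals contain $\mu$. The function name `long_run_coverage` is hypothetical.

```python
import random
from statistics import fmean


def long_run_coverage(mu, sigma=1.0, n=4, k=1.0, trials=20_000, seed=0):
    """Repeat the experiment many times and return the proportion of realized
    interval estimates [xbar - k, xbar + k] that contain the fixed true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(xs)
        # Each realized interval is fixed: it either contains mu or it does not.
        if xbar - k <= mu <= xbar + k:
            hits += 1
    return hits / trials


print(long_run_coverage(mu=5.0))  # close to 0.9545, but not exactly equal
```

Each individual interval in the loop either contains $\mu$ or not (student B), while the proportion over many repetitions settles near $1-\alpha$ (student C).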

Remark.

We may say that we "feel $(1-\alpha )100\%$ confident" that $\theta$ lies in an interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ , corresponding to a $1-\alpha$ confidence interval, from an experiment.
To understand this, we may refer to student C's claim above. When we think about how "confident" we are in the statement that $\theta$ lies in $[L(\mathbf {x} ),U(\mathbf {x} )]$, we may consider the following:

we "hypothetically" repeat the generation of interval estimates many times, and we expect a proportion $1-\alpha$ of them to contain $\theta$.
Then, it is natural to "feel" $(1-\alpha )100\%$ confident that the interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ contains $\theta$ based on these hypothetical experiments.
Alternatively, as suggested above, it is correct to say that the probability for $\theta$ to lie in an interval estimate calculated from a future and not yet realized sample is $1-\alpha$ .
Hence, the probability $1-\alpha$ measures the "reliability" of estimation procedure and method (the higher the probability, the higher the reliability).
Therefore, it is natural to feel $(1-\alpha )100\%$ confident that the interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ contains $\theta$ based on the above reliability.

We may regard "we feel $(1-\alpha )100\%$ confident that $\theta$ lies in the interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ " to be an intuitive and alternative expression of "the interval estimate $[L(\mathbf {x} ),U(\mathbf {x} )]$ is a $1-\alpha$ confidence interval".

Example. Continuing from the previous example about the normal distribution ${\mathcal {N}}(\mu ,1)$: the coverage probability of the interval estimator $[{\overline {X}}-1,{\overline {X}}+1]$ of $\mu$ is 0.9545 for every $\mu$, so its confidence coefficient is 0.9545, or approximately 95%. Hence, this interval may be called a 95% confidence interval.

Exercise. Consider a continuous distribution with an unknown real-valued parameter $\theta$, and a random sample $X_{1},\dotsc ,X_{n}$ drawn from it. Suppose $\mathbb {P} (\theta \leq T_{1})=0.025$ and $\mathbb {P} (\theta \geq T_{2})=0.025$, where $T_{1}$ and $T_{2}$ are statistics of $X_{1},\dotsc ,X_{n}$ such that $T_{2}\geq T_{1}$ always (Can $T_{2}=T_{1}$? ^[2]), and $\mathbb {R}$ is the parameter space of $\theta$.

4. Can you suggest a (i) 0% confidence interval; (ii) 100% confidence interval?

Solution

(i) One may take $[1,1]$, for example, as a 0% confidence interval: its coverage probability $\mathbb {P} (\theta \in [1,1])$ is 1 if $\theta =1$ and 0 if $\theta \neq 1$, so its infimum over the parameter space $\mathbb {R}$ is 0.

(ii) One may take $(-\infty ,\infty )$ (i.e. $\mathbb {R}$), the parameter space of $\theta$, as a 100% confidence interval, because $\mathbb {P} (\theta \in (-\infty ,\infty ))=1$ for every $\theta$. (In general, the whole parameter space of an unknown parameter is always a 100% confidence interval for it.)

Construction of confidence intervals

Having understood what a confidence interval is, we would like to know how to construct one. A main method for such construction uses a pivotal quantity, which is defined below.

Definition. (Pivotal quantity) A random variable $Q(\mathbf {X} ,\theta )=Q(X_{1},\dotsc ,X_{n},\theta )$, which is a function of the random sample $X_{1},\dotsc ,X_{n}$ and the unknown parameter (vector) $\theta$, is a pivotal quantity (of $\theta$) if the distribution of $Q(\mathbf {X} ,\theta )$ does not depend on the parameter (vector) $\theta$, that is, the distribution is the same for each value of $\theta$.

Remark.

A pivotal quantity need not be a statistic, since a statistic is a function of the random sample $X_{1},\dotsc ,X_{n}$ only (not of the unknown parameter(s)), while a pivotal quantity is a function of the random sample and the unknown parameter (vector) $\theta$.
If the expression of a pivotal quantity does not involve $\theta$, the pivotal quantity is a statistic, called an ancillary statistic.
Here, we focus on the pivotal quantities with expressions involving $\theta$ , so that we can use them to construct confidence intervals.

After having such pivotal quantity $Q(\mathbf {X} ,\theta )$ , we can construct a $1-\alpha$ confidence interval for $\theta$ by the following steps:

For the given value of $\alpha$, find constants $a,b$ such that $\mathbb {P} (a\leq Q(\mathbf {X} ,\theta )\leq b)=1-\alpha$ ^[3] ($a$ and $b$ do not involve $\theta$, since the distribution of the pivotal quantity $Q(\mathbf {X} ,\theta )$ does not depend on $\theta$).
Then, since the expression of $Q(\mathbf {X} ,\theta )$ involves $\theta$ (as we have assumed), rearrange $a\leq Q(\mathbf {X} ,\theta )\leq b$ into the form $L(\mathbf {X} )\leq \theta \leq U(\mathbf {X} )$. The resulting inequalities must be equivalent to the original ones, that is, $a\leq Q(\mathbf {X} ,\theta )\leq b{\color {darkgreen}\iff }L(\mathbf {X} )\leq \theta \leq U(\mathbf {X} )$, so that $\mathbb {P} (L(\mathbf {X} )\leq \theta \leq U(\mathbf {X} )){\color {darkgreen}=}\mathbb {P} (a\leq Q(\mathbf {X} ,\theta )\leq b)=1-\alpha$.
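When the rearrangement in the second step is awkward to do by hand, it can be done numerically. The sketch below is one hedged way to do this, assuming that the pivot, evaluated at the observed data, is continuous and strictly decreasing in $\theta$ and that a bracketing range `[lo, hi]` for $\theta$ is known. The helper names `bisect_root` and `invert_pivot` are hypothetical, and bisection is just one possible root finder. It is checked on the normal-mean pivot ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}$ with known $\sigma$, where the closed form is known.

```python
from math import sqrt
from statistics import NormalDist


def bisect_root(f, lo, hi, tol=1e-10):
    """Return t in [lo, hi] with f(t) approximately 0 (f continuous, sign change on [lo, hi])."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                # sign change in [lo, mid]
        else:
            lo, f_lo = mid, f_mid   # sign change in [mid, hi]
    return (lo + hi) / 2


def invert_pivot(q, a, b, lo, hi):
    """Step 2: rewrite {a <= q(theta) <= b} as {L <= theta <= U}.

    Assumes q is continuous and strictly decreasing in theta, so that
    q(theta) <= b <=> theta >= L and q(theta) >= a <=> theta <= U,
    where q(L) = b and q(U) = a.
    """
    L = bisect_root(lambda t: q(t) - b, lo, hi)
    U = bisect_root(lambda t: q(t) - a, lo, hi)
    return L, U


# Illustration: observed xbar = 0 from n = 5 draws of N(mu, 1), alpha = 0.05.
xbar, sigma, n, alpha = 0.0, 1.0, 5, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # Step 1: a = -z, b = z


def q(mu):
    """Pivot evaluated at the observed xbar; decreasing in mu."""
    return (xbar - mu) / (sigma / sqrt(n))


print(invert_pivot(q, -z, z, lo=-100.0, hi=100.0))  # approximately (-0.8765, 0.8765)
```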

Example. Consider a random sample $X_{1},\dotsc ,X_{n}$ from normal distribution ${\mathcal {N}}(\mu ,\sigma ^{2})$ with unknown mean $\mu$ and known variance $\sigma ^{2}$ . Find a pivotal quantity (of $\mu$ ).

Solution: By the property of normal distribution, ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\sim {\mathcal {N}}(0,1)$ . Since ${\mathcal {N}}(0,1)$ is independent of the unknown parameter $\mu$ , ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}$ is a pivotal quantity.

Alternatively, ${\overline {X}}-\mu \sim {\mathcal {N}}(0,\sigma ^{2}/n)$ is also a pivotal quantity, since ${\mathcal {N}}(0,\sigma ^{2}/n)$ does not depend on $\mu$ (both $n$ and $\sigma ^{2}$ are known, so the variance $\sigma ^{2}/n$ of this distribution is known).
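A quick simulation can make "the distribution does not depend on $\mu$" concrete. The sketch below is a hedged illustration with arbitrarily chosen values $\sigma =3$, $n=4$ and three hypothetical means; the function name `draw_pivots` is hypothetical. For every $\mu$, ${\overline {X}}-\mu$ should have standard deviation close to $\sigma /{\sqrt {n}}=1.5$, and the standardized pivot should have mean close to 0 and standard deviation close to 1.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def draw_pivots(mu, sigma, n, reps=20_000, seed=0):
    """Simulate Xbar - mu and (Xbar - mu) / (sigma / sqrt(n)) for N(mu, sigma^2) samples."""
    rng = random.Random(seed)
    diffs, zs = [], []
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        diffs.append(xbar - mu)
        zs.append((xbar - mu) / (sigma / sqrt(n)))
    return diffs, zs


sigma, n = 3.0, 4
for seed, mu in enumerate((-10.0, 0.0, 25.0)):
    diffs, zs = draw_pivots(mu, sigma, n, seed=seed)
    # Summaries should look alike for every mu:
    # sd(Xbar - mu) near sigma / sqrt(n) = 1.5; Z has mean near 0 and sd near 1.
    print(mu, round(stdev(diffs), 2), round(fmean(zs), 2), round(stdev(zs), 2))
```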

Exercise.

(a) Is ${\overline {X}}$ a pivotal quantity?

(b) Is ${\frac {X_{1}}{\mu }}$ a pivotal quantity?

Solution

(a) No, since ${\overline {X}}\sim {\mathcal {N}}(\mu ,\sigma ^{2}/n)$, and this distribution depends on $\mu$.

(b) No, since ${\frac {X_{1}}{\mu }}\sim {\mathcal {N}}(1,\sigma ^{2}/\mu ^{2})$ (for $\mu \neq 0$), and the variance $\sigma ^{2}/\mu ^{2}$ of this distribution depends on $\mu$.

Exercise. Consider a random sample $X_{1},\dotsc ,X_{n}$ from the normal distribution ${\mathcal {N}}(\mu ,\sigma ^{2})$ with unknown mean $\mu$ and unknown variance $\sigma ^{2}$. Apart from ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}$, suggest a pivotal quantity of $(\mu ,\sigma ^{2})$.

Solution

A pivotal quantity is ${\frac {X_{1}-\mu }{\sigma }}$ , since ${\frac {X_{1}-\mu }{\sigma }}\sim {\mathcal {N}}(0,1)$ , and the distribution is independent from both $\mu$ and $\sigma ^{2}$ .

Example. Consider a random sample $X_{1},\dotsc ,X_{n}$ from the exponential distribution $\operatorname {Exp} (\lambda )$ with rate $\lambda$ (cdf $1-e^{-\lambda x}$). Find a pivotal quantity. (Hint: $\sum _{i=1}^{n}X_{i}\sim \operatorname {Gamma} (n,\lambda )$, and if $Y\sim \operatorname {Gamma} (\alpha ,\lambda )$, then $cY\sim \operatorname {Gamma} (\alpha ,\lambda /c)$ for any constant $c>0$; here the second parameter of the gamma distribution is the rate.)

Solution: A pivotal quantity is $\lambda \sum _{i=1}^{n}X_{i}$, since $\lambda \sum _{i=1}^{n}X_{i}\sim \operatorname {Gamma} (n,\lambda /\lambda )\equiv \operatorname {Gamma} (n,1)$, whose distribution does not depend on $\lambda$.
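Under the rate parametrization (cdf $1-e^{-\lambda x}$, as used for $\operatorname {Exp} (\lambda )$ later in this section, and as Python's `random.expovariate` takes its argument), a quick simulation shows which of $\lambda \sum X_{i}$ and $\sum X_{i}/\lambda$ has a distribution free of $\lambda$. This is a hedged sketch with arbitrary choices $n=3$ and $\lambda \in \{0.5,4\}$; the function name `simulate_sums` is hypothetical.

```python
import random
from statistics import fmean, variance


def simulate_sums(lam, n, reps=20_000, seed=0):
    """Return draws of lam * sum(X) and sum(X) / lam for X_1..X_n ~ Exp(rate lam)."""
    rng = random.Random(seed)
    scaled_up, scaled_down = [], []
    for _ in range(reps):
        s = sum(rng.expovariate(lam) for _ in range(n))  # the argument is the rate
        scaled_up.append(lam * s)     # Gamma(n, 1) for every lam: mean n, variance n
        scaled_down.append(s / lam)   # mean n / lam**2, so its distribution depends on lam
    return scaled_up, scaled_down


n = 3
for seed, lam in enumerate((0.5, 4.0)):
    up, down = simulate_sums(lam, n, seed=seed)
    print(lam, round(fmean(up), 2), round(variance(up), 2), round(fmean(down), 3))
```

The first two summaries stay near $n=3$ for both rates, while the mean of $\sum X_{i}/\lambda$ moves from about 12 to about 0.19.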

Example. (A pivotal quantity for general distributions) Consider a distribution with unknown parameter (vector) $\theta$ , where its cdf $F_{X}$ is bijective (so that $F_{X}^{-1}$ exists).

(a) Prove that $F_{X}(X)\sim {\mathcal {U}}[0,1]$ .

(b) Suppose a random sample $X_{1},\dotsc ,X_{n}$ is taken from that distribution. Suggest a pivotal quantity.

Solution:

(a)

Proof. Let $Y=F_{X}(X)$ , and $F_{Y}(y)$ be the cdf of $Y$ . Then, $F_{Y}(y)=\mathbb {P} (Y\leq y)=\mathbb {P} (F_{X}(X)\leq y)=\mathbb {P} (X\leq F_{X}^{-1}(y))=F_{X}(F_{X}^{-1}(y))=y$ . Differentiating the cdf gives $f_{Y}(y)={\frac {d}{dy}}F_{Y}(y)={\frac {d}{dy}}y=1$ . This means that the pdf of $F_{X}(X)$ is 1. Also, we know that the support of $F_{X}(X)$ is $[0,1]$ since $F_{X}(X)$ is essentially a probability. Hence, we have $F_{X}(X)\sim {\mathcal {U}}[0,1]$ .

$\Box$

(b) From (a), we know that $F_{X}(X)\sim {\mathcal {U}}[0,1]$ (the cdf $F_{X}(X)$ involves the parameter (vector) $\theta$ ), and this distribution is clearly independent from the parameter (vector) $\theta$ . Hence, a pivotal quantity is $F_{X}(X_{1})$ (or $F_{X_{1}}(X_{1})$ , which is the same since $X_{1}$ is taken from the distribution with cdf $F_{X}$ ).

Exercise. Suppose a single observation $X_{1}$ is taken from the exponential distribution $\operatorname {Exp} (\lambda )$ . Find a pivotal quantity using the above method.

Solution

Since the cdf of $\operatorname {Exp} (\lambda )$ is $F_{X}(x)=1-e^{-\lambda x}$ (for $x\geq 0$), as suggested above, a pivotal quantity is $1-e^{-\lambda X_{1}}$, which follows the uniform distribution ${\mathcal {U}}[0,1]$.
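The probability integral transform used in the last two solutions can be checked empirically. The sketch below is a hedged illustration with two arbitrary rates; the function name `pit_draws` is hypothetical. For ${\mathcal {U}}[0,1]$ we expect a mean of about 0.5 and about a quarter of the values at or below 0.25, whatever $\lambda$ is.

```python
import math
import random
from statistics import fmean


def pit_draws(lam, reps=20_000, seed=0):
    """Draw X ~ Exp(rate lam) and return F_X(X) = 1 - exp(-lam * X) for each draw."""
    rng = random.Random(seed)
    return [1 - math.exp(-lam * rng.expovariate(lam)) for _ in range(reps)]


for seed, lam in enumerate((0.2, 3.0)):
    u = pit_draws(lam, seed=seed)
    share_below_quarter = sum(v <= 0.25 for v in u) / len(u)
    # For U[0, 1]: mean 0.5 and P(U <= 0.25) = 0.25, for every lam.
    print(lam, round(fmean(u), 3), round(share_below_quarter, 3))
```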

Confidence intervals for means of normal distributions

In the following, we use the concept of pivotal quantity to construct confidence intervals for means and variances of normal distributions. After that, because of the central limit theorem, we can construct approximate confidence intervals for means and variances of other types of distributions that are not normal.

Mean of a normal distribution

Before discussing this confidence interval, let us first introduce a notation:

$z_{\alpha }$ is the upper percentile of ${\mathcal {N}}(0,1)$ at level $\alpha$ , i.e. it satisfies $\mathbb {P} (Z\geq z_{\alpha })=\alpha$ where $Z\sim {\mathcal {N}}(0,1)$ .

We can find (or calculate) the values of $z_{\alpha }$ for different $\alpha$ from standard normal table.
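Instead of a printed table, $z_{\alpha }$ can also be computed from the inverse cdf, since $\mathbb {P} (Z\geq z_{\alpha })=\alpha \iff z_{\alpha }=\Phi ^{-1}(1-\alpha )$. A minimal sketch using Python's standard `statistics.NormalDist`; the name `z_upper` is a hypothetical helper.

```python
from statistics import NormalDist


def z_upper(alpha):
    """Upper percentile z_alpha of N(0, 1): P(Z >= z_alpha) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.05, 0.025, 0.01, 0.005):
    print(alpha, round(z_upper(alpha), 4))  # 1.6449, 1.96, 2.3263, 2.5758
```

These unrounded values can differ slightly from the rounded table values used in the examples below (for instance, $z_{0.005}\approx 2.5758$).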

Theorem. (Confidence interval of $\mu$ when $\sigma ^{2}$ is known) Let $X_{1},\dotsc ,X_{n}$ be a random sample from ${\mathcal {N}}(\mu ,\sigma ^{2})$ . When $\sigma ^{2}$ is known, a $1-\alpha$ confidence interval for $\mu$ is $\left[{\overline {X}}-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}},{\overline {X}}+z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right].$

Remark.

By the definition of interval estimate, the corresponding interval estimate of $\mu$ is $\left[{\overline {x}}-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}},{\overline {x}}+z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right]$, where ${\overline {x}}$ is the observed value of ${\overline {X}}$. For simplicity, we usually also call such an interval estimate a $1-\alpha$ confidence interval.

The intended meaning of "$1-\alpha$ confidence interval" can usually be told from the context.
Usually, when the realization of random sample is given, then $1-\alpha$ confidence interval is referring to the interval estimate (since the interval estimate is more "useful" and "suggestive" in this context).

Unless otherwise specified, the $1-\alpha$ confidence intervals referred are constructed according to this theorem (if applicable).

Proof. Let $Z={\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\sim {\mathcal {N}}(0,1)$ . Since $Z$ is a pivotal quantity (its distribution is independent from $\mu$ ), we set $1-\alpha =1-\mathbb {P} (Z\geq z_{\alpha /2})-\mathbb {P} (Z\leq -z_{\alpha /2})=\mathbb {P} (-z_{\alpha /2}<Z<z_{\alpha /2})=\mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2}),$ where $z_{\alpha /2}$ is a constant (and does not involve $\mu$ ). Then, we have ${\begin{aligned}1-\alpha &=\mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2})\\&=\mathbb {P} \left(-z_{\alpha /2}\leq {\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\leq z_{\alpha /2}\right)\\&=\mathbb {P} \left(-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\leq {\overline {X}}-\mu \leq z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right)\\&=\mathbb {P} \left(z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\geq \mu -{\overline {X}}\geq -z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right)\\&=\mathbb {P} \left(-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\leq \mu -{\overline {X}}\leq z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right)&({\text{rewrite}})\\&=\mathbb {P} \left({\overline {X}}-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\leq \mu \leq {\overline {X}}+z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right).\\\end{aligned}}$ The result follows.

$\Box$
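The theorem can be turned into a small function, and a simulation can check that its coverage is close to $1-\alpha$ whatever the true $\mu$ is (which is why the infimum in the definition of confidence coefficient equals $1-\alpha$ here). This is a hedged sketch: the names `z_interval` and `simulated_coverage`, and the choices $\sigma =2$, $n=8$ and the three values of $\mu$, are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(xs, sigma, alpha):
    """1 - alpha confidence interval for mu from N(mu, sigma^2) data with known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half = z * sigma / sqrt(len(xs))
    xbar = fmean(xs)
    return xbar - half, xbar + half


def simulated_coverage(mu, sigma, n, alpha, reps=10_000, seed=0):
    """Proportion of simulated samples whose interval contains the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = z_interval([rng.gauss(mu, sigma) for _ in range(n)], sigma, alpha)
        if lo <= mu <= hi:
            hits += 1
    return hits / reps


for seed, mu in enumerate((-50.0, 0.0, 3.7)):
    print(mu, simulated_coverage(mu, sigma=2.0, n=8, alpha=0.05, seed=seed))  # each near 0.95
```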

The following graph illustrates $\mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2})=1-\alpha$ :

                    |
                  *-|-*
                 /##|##\   
                /###|###\  <----- area 1-a
               /####|####\
              /#####|#####\
             /######|######\
            /|######|######|\
 area    --*.|######|######|.*-- 
 a/2 --> ....|######|######|....  <---  area a/2
        ------------*---------------
           -z_{a/2}       z_{a/2}

Example. Consider a random sample $X_{1},\dotsc ,X_{5}$ from ${\mathcal {N}}(\mu ,1)$ . Suppose it is observed that $X_{1}=0.5,X_{2}=1,X_{3}=-2,X_{4}=0,X_{5}=0.5$ .

Construct a 95% confidence interval for $\mu$ .

Solution: Since ${\overline {x}}={\frac {0.5+1-2+0+0.5}{5}}=0$ , and $z_{0.025}\approx 1.96$ (from standard normal table, we know that $\mathbb {P} (Z\leq 1.96)\approx 1-0.025=0.975$ where $Z\sim {\mathcal {N}}(0,1)$ ), it follows that a 95% confidence interval for $\mu$ is $\left[0-1.96{\frac {\sqrt {1}}{\sqrt {5}}},0+1.96{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-0.8765,0.8765]$ .
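The same calculation in a few lines of Python, as a hedged sketch using the standard `statistics` module; it uses the unrounded $z_{0.025}\approx 1.95996$ instead of the table value 1.96, which does not change the result to four decimal places.

```python
from math import sqrt
from statistics import NormalDist, fmean

xs = [0.5, 1, -2, 0, 0.5]
sigma, alpha = 1.0, 0.05
xbar = fmean(xs)                          # 0.0
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.025}, about 1.96
half = z * sigma / sqrt(len(xs))
print(round(xbar - half, 4), round(xbar + half, 4))  # -0.8765 0.8765
```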

Exercise.

(a) Construct a 99% confidence interval for $\mu$ .

(b) Construct a 90% confidence interval for $\mu$ .

(c) (alternative way of constructing confidence interval) Using a similar argument as in the proof of the previous theorem, another $1-\alpha$ confidence interval for $\mu$ is $\left[{\overline {X}}-z_{\alpha /5}{\frac {\sigma }{\sqrt {n}}},{\overline {X}}+z_{4\alpha /5}{\frac {\sigma }{\sqrt {n}}}\right]$, since $1-\alpha =1-{\frac {\alpha }{5}}-{\frac {4\alpha }{5}}=1-\mathbb {P} (Z\geq z_{\alpha /5})-\mathbb {P} (Z\leq -z_{4\alpha /5})=\mathbb {P} (-z_{4\alpha /5}\leq Z\leq z_{\alpha /5})$ and $-z_{4\alpha /5}\leq Z\leq z_{\alpha /5}\iff {\overline {X}}-z_{\alpha /5}{\frac {\sigma }{\sqrt {n}}}\leq \mu \leq {\overline {X}}+z_{4\alpha /5}{\frac {\sigma }{\sqrt {n}}}$. Construct another 95% confidence interval for $\mu$ by this method.

(d) Is the width of the confidence interval (i.e. its upper bound minus its lower bound) constructed in (c) the same as that constructed in the example?

Solution

(a) Since $z_{0.005}\approx 2.57$ (from standard normal table), a 99% confidence interval for $\mu$ is $\left[0-2.57{\frac {\sqrt {1}}{\sqrt {5}}},0+2.57{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-1.149,1.149]$ .

(b) Since $z_{0.05}\approx 1.64$ (from standard normal table), a 90% confidence interval for $\mu$ is $\left[0-1.64{\frac {\sqrt {1}}{\sqrt {5}}},0+1.64{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-0.733,0.733]$ .

(c) Since $z_{0.01}\approx 2.33$ and $z_{0.04}\approx 1.75$ from standard normal table, another 95% confidence interval for $\mu$ is $\left[0-2.33{\frac {\sqrt {1}}{\sqrt {5}}},0+1.75{\frac {\sqrt {1}}{\sqrt {5}}}\right]\approx [-1.042,0.783]$

(d) The width of the confidence interval in the example is 1.753 (approximately), while the width of the confidence interval in (c) is 1.825 (approximately). Hence, their widths are different.
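The intervals and widths in this exercise can be recomputed with a short script. This is a hedged sketch: it uses unrounded quantiles from `statistics.NormalDist` rather than the table values, so some endpoints differ slightly (for example, the 99% interval comes out as about $\pm 1.152$ rather than $\pm 1.149$, and the asymmetric width is about 1.823 rather than 1.825). The helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)


def z_upper(p):
    """Upper percentile z_p of N(0, 1), i.e. P(Z >= z_p) = p."""
    return Z.inv_cdf(1 - p)


xbar, sigma, n = 0.0, 1.0, 5
se = sigma / sqrt(n)


def symmetric_ci(alpha):
    half = z_upper(alpha / 2) * se
    return xbar - half, xbar + half


def asymmetric_ci(alpha):
    # From P(-z_{4 alpha/5} <= Z <= z_{alpha/5}) = 1 - alpha, as in part (c).
    return xbar - z_upper(alpha / 5) * se, xbar + z_upper(4 * alpha / 5) * se


cases = [("99%", symmetric_ci(0.01)), ("90%", symmetric_ci(0.10)),
         ("95% symmetric", symmetric_ci(0.05)), ("95% asymmetric", asymmetric_ci(0.05))]
for label, (lo, hi) in cases:
    print(label, round(lo, 3), round(hi, 3), "width", round(hi - lo, 3))
```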

Remark.

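As we can see, when the confidence coefficient is higher, the corresponding confidence interval becomes wider.
This matches our previous discussion.
Also, from (d), for the same confidence coefficient 95%, the asymmetric interval in (c) is wider than the symmetric interval in the example.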

Example. An undergraduate student, John, wants to estimate the average daily time spent playing computer games in the previous week by all teenagers aged 14-16. Clearly, it is infeasible to ask every such teenager. Therefore, John decides to take a random sample of 10 teenagers from the population (all teenagers aged 14-16), and their daily times spent (in hours) are

3,8,10,5,9,9,1,3,0,4

The distribution of the daily time spent is assumed to be normal, with mean $\mu$ and variance $\sigma ^{2}$ ^[4]. Also, based on the past data about the daily time spent, John assumes that the standard deviation of the distribution is $\sigma =3$ .

(a) Construct a 95% confidence interval for $\mu$ .

(b) According to John, the computer game addiction problem is serious among teenagers aged 14-16 if the average daily time spent on playing computer games is at least a quarter of a day, i.e. 6 hours, and is not serious otherwise. Can John be (95%) confident that the computer game addiction problem is (i) serious; (ii) not serious among teenagers aged 14-16, based on the 95% confidence interval in (a)?

(c) To be more certain about the time spent, John would like to construct a 99% confidence interval for $\mu$ , with width not exceeding 1 hour. At least how many teenagers should be in the random sample to satisfy this requirement?

(d) Suppose John takes another random sample from the population, where the number of teenagers involved is the number suggested in (c). If ${\overline {x}}=4.7$ in this random sample, construct a 99% confidence interval for $\mu$ , and verify that its width does not exceed 1 hour.

(e) Can John be (99%) confident that the computer game addiction problem is not serious among teenagers aged 14-16 based on the 99% confidence interval in (d)?

Solution:

(a) Since the realization of the sample mean is ${\overline {x}}={\frac {3+8+10+5+9+9+1+3+0+4}{10}}=5.2$ , and $z_{0.025}\approx 1.96$ , the 95% confidence interval for $\mu$ is $\left[5.2-1.96{\frac {3}{\sqrt {10}}},5.2+1.96{\frac {3}{\sqrt {10}}}\right]\approx [3.34,7.06]$ .

(b) (i) No. The confidence interval contains some values that are strictly less than 6 and some that are at least 6. Thus, although John is 95% confident that $\mu$ lies in $[3.34,7.06]$ , he cannot tell whether $\mu$ is at least 6.

(b) (ii) No, for a similar reason as in (i): knowing only that $\mu$ lies in $[3.34,7.06]$ , John cannot tell whether $\mu$ is less than 6.

(c) Since a 99% confidence interval for $\mu$ is $\left[{\overline {x}}-z_{0.005}{\frac {3}{\sqrt {n}}},{\overline {x}}+z_{0.005}{\frac {3}{\sqrt {n}}}\right]$ , its width is $2z_{0.005}{\frac {3}{\sqrt {n}}}$ (which does not depend on ${\overline {x}}$ ). Also, we know that $z_{0.005}\approx 2.57$ . Thus, to satisfy the requirement, we need $2(2.57){\frac {3}{\sqrt {n}}}\leq 1\implies n\geq (3(2)(2.57))^{2}\approx 237.776.$ Since the sample size $n$ must be an integer, the minimum value of $n$ is 238. That is, at least 238 teenagers should be in the random sample to satisfy the requirement.

(d) A 99% confidence interval for $\mu$ is $\left[4.7-2.57{\frac {3}{\sqrt {238}}},4.7+2.57{\frac {3}{\sqrt {238}}}\right]\approx [4.20023,5.19977]$ . Its width is approximately 0.99953, which is less than 1.

(e) Yes, since all values in the interval in (d) are strictly less than 6.
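The computations in (a), (c) and (d) can be reproduced with a short Python sketch. It hard-codes the table values $z_{0.025}\approx 1.96$ and $z_{0.005}\approx 2.57$ used in the solution, together with the assumed $\sigma=3$; the helper names `z_interval` and `min_sample_size` are hypothetical.

```python
from math import ceil, sqrt

SIGMA = 3.0   # known standard deviation (hours), as assumed in the example


def z_interval(xbar, n, z, sigma=SIGMA):
    """[xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)]"""
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half


def min_sample_size(z, max_width, sigma=SIGMA):
    """Smallest integer n with 2*z*sigma/sqrt(n) <= max_width."""
    return ceil((2 * z * sigma / max_width) ** 2)


times = [3, 8, 10, 5, 9, 9, 1, 3, 0, 4]
xbar = sum(times) / len(times)

ci_a = z_interval(xbar, len(times), z=1.96)    # (a) 95%, z_{0.025} = 1.96
n_c = min_sample_size(z=2.57, max_width=1)     # (c) 99%, z_{0.005} = 2.57
ci_d = z_interval(4.7, n_c, z=2.57)            # (d) new sample with xbar = 4.7

print("(a)", [round(v, 2) for v in ci_a])      # [3.34, 7.06]
print("(c)", n_c)                              # 238
print("(d)", [round(v, 5) for v in ci_d], "width", round(ci_d[1] - ci_d[0], 5))
```

Rounding up with `ceil` is what turns $n\geq 237.776$ into the minimum integer sample size 238. If $(2z\sigma/w)^{2}$ happened to be an exact integer, floating-point rounding could push `ceil` one step too high, so such borderline cases are worth checking by hand.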

Exercise. Suppose John decides to take another random sample consisting of even more teenagers, 500 of them. If ${\overline {x}}=5.8$ in this random sample,

(a) Construct a 99% confidence interval for $\mu$ .

(b) Can John be (99%) confident that the computer game addiction problem is not serious among teenagers aged 14-16 based on the 99% confidence interval in (a)?

Solution

(a) A 99% confidence interval for $\mu$ is $\left[5.8-2.57{\frac {3}{\sqrt {500}}},5.8+2.57{\frac {3}{\sqrt {500}}}\right]\approx [5.4552,6.1448]$ .

(b) No, since some values in the interval are at least 6.
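The decisions in the example and in this exercise all follow the same rule: compare the whole interval with the threshold of 6 hours, and only draw a conclusion when every value in the interval lies on one side. A minimal sketch of that rule in Python, with a hypothetical `classify` helper and the table values used above:

```python
from math import sqrt


def classify(interval, threshold):
    """Compare a confidence interval for mu with a threshold.

    Returns "serious" if every value is >= threshold, "not serious" if every
    value is < threshold, and "inconclusive" otherwise.
    """
    lower, upper = interval
    if lower >= threshold:
        return "serious"
    if upper < threshold:
        return "not serious"
    return "inconclusive"


def z_interval(xbar, sigma, n, z):
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half


# (xbar, n, z) from the example and the exercise; sigma = 3 hours throughout
cases = {
    "95%, n = 10": (5.2, 10, 1.96),
    "99%, n = 238": (4.7, 238, 2.57),
    "99%, n = 500": (5.8, 500, 2.57),
}
for label, (xbar, n, z) in cases.items():
    ci = z_interval(xbar, 3, n, z)
    print(label, [round(v, 4) for v in ci], classify(ci, 6))
```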

We have previously discussed a way to construct a confidence interval for the mean when the variance is known. However, in practice we may not know the variance, right? Then, we cannot use $\sigma$ in the confidence interval from the previous theorem.

Intuitively, one may think that we can use the sample variance $S^{2}$ to "replace" $\sigma ^{2}$ , according to the weak law of large numbers. Then, we could simply replace the unknown $\sigma$ in the confidence interval by the known $S$ (or its realization $s$ for an interval estimate). However, the flaw in this argument is that the sample size may not be large enough for the weak law of large numbers to give a good approximation.

Remark.

A rule of thumb is that we may regard the sample size as large enough for applying this kind of convergence theorem (e.g. the weak law of large numbers and the central limit theorem) for approximation when the sample size is at least 30. Otherwise, the approximation may not be accurate enough, i.e. the error can be quite large, and thus we should not use such theorems for approximation.

So, you may now ask: when the sample size is large enough, can we make such a "replacement" for approximation? The answer is yes, and we will discuss this in the last section about approximate confidence intervals.

Before that section, the confidence intervals discussed are exact, in the sense that no approximation is used to construct them. Therefore, the confidence intervals constructed "work" for every sample size, no matter how large or small it is (the interval for the mean with known variance works even if the sample size is 1, and the intervals based on the $t$ -distribution below need a sample size of at least 2; such intervals may not be very "nice", in the sense that their widths may be quite large).

Before discussing how to construct a confidence interval for the mean when the variance is unknown, we first give some results that are useful for deriving such a confidence interval.

Proposition. (Several properties about sample mean and variance) Let $X_{1},\dotsc ,X_{n}$ be a random sample from ${\mathcal {N}}(\mu ,\sigma ^{2})$ . Also let ${\overline {X}}={\frac {\sum _{i=1}^{n}X_{i}}{n}}$ be the sample mean and $S^{2}={\frac {\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}}{n}}$ be the sample variance, where $n$ is the sample size. Then,

(i) ${\overline {X}}$ and $S^{2}$ are independent.

(ii) ${\frac {nS^{2}}{\sigma ^{2}}}={\frac {\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}$ where $\chi _{n-1}^{2}$ is a chi-squared distribution with $n-1$ degrees of freedom.

(iii) ${\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\sim t_{n-1}$ where $t_{n-1}$ is a $t$ -distribution with $n-1$ degrees of freedom.

Proof.

(i) One may use Basu's theorem to prove this, but the details about Basu's theorem and the proof are omitted here, since they are a bit complicated.

(ii) We will use the following definition of chi-squared distribution $\chi _{k}^{2}$ : $\sum _{i=1}^{k}Z_{i}^{2}\sim \chi _{k}^{2}$ where $Z_{1},Z_{2},\dotsc ,Z_{k}\sim {\mathcal {N}}(0,1)$ are independent. Also, we will use the fact that the mgf of $\chi _{k}^{2}$ is $M(t)=(1-2t)^{-k/2},\quad t<{\frac {1}{2}}$ .

Now, first let $W=\sum _{i=1}^{n}\left({\frac {X_{i}-\mu }{\sigma }}\right)^{2}$ which follows $\chi _{n}^{2}$ since ${\frac {X_{1}-\mu }{\sigma }},\dotsc ,{\frac {X_{n}-\mu }{\sigma }}\sim {\mathcal {N}}(0,1)$ are independent. Then, we write $W$ as ${\begin{aligned}W&=\sum _{i=1}^{n}\left({\frac {X_{i}-\mu }{\sigma }}\right)^{2}\\&=\sum _{i=1}^{n}\left({\frac {X_{i}{\color {darkgreen}-{\overline {X}}}}{\sigma }}+{\frac {{\color {darkgreen}{\overline {X}}}-\mu }{\sigma }}\right)^{2}\\&=\sum _{i=1}^{n}\left({\frac {X_{i}{\color {darkgreen}-{\overline {X}}}}{\sigma }}\right)^{2}+\sum _{i=1}^{n}\left({\frac {{\color {darkgreen}{\overline {X}}}-\mu }{\sigma }}\right)^{2}+0&{\Bigg (}{\color {blue}2}\sum _{i=1}^{n}{\frac {{\color {blue}({\overline {X}}-\mu )}(X_{i}-{\overline {X}})}{\color {blue}\sigma ^{2}}}={\color {blue}{\frac {2({\overline {X}}-\mu )}{\sigma ^{2}}}}{\bigg (}\underbrace {\sum _{i=1}^{n}X_{i}} _{=n{\overline {X}}}-\underbrace {\sum _{i=1}^{n}\overbrace {\overline {X}} ^{{\text{constant wrt }}i}} _{=n{\overline {X}}}{\bigg )}=0{\Bigg )}\\&={\frac {1}{\sigma ^{2}}}\underbrace {\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}} _{=nS^{2}}+{\Big (}\underbrace {\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }} _{\sim {\mathcal {N}}(0,1){\text{ by property}}}{\Big )}^{2}\\&={\frac {nS^{2}}{\sigma ^{2}}}+Z^{2}&\left(Z={\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }}\sim {\mathcal {N}}(0,1)\right)\end{aligned}}$ Applying the definition of chi-squared distribution, we have $Z^{2}\sim \chi _{1}^{2}$ .

By (i), ${\overline {X}}$ and $S^{2}$ are independent. Thus, ${\frac {nS^{2}}{\sigma ^{2}}}$ (a function of $S^{2}$ ) is independent from $Z^{2}$ (a function of ${\overline {X}}$ ). Now, let $U={\frac {nS^{2}}{\sigma ^{2}}}$ and $V=Z^{2}$ . Since $U$ and $V$ are independent, and also we have $W=U+V$ from above derivation, the mgf $M_{W}(t)=M_{U+V}(t)=M_{U}(t)M_{V}(t).$ Since $W\sim \chi _{n}^{2}$ and $V\sim \chi _{1}^{2}$ , we can further write $(1-2t)^{-n/2}=M_{U}(t)(1-2t)^{-1/2},\quad t<{\frac {1}{2}},$ which implies that the mgf of $U$ is $M_{U}(t)=(1-2t)^{-(n-1)/2},\quad t<{\frac {1}{2}}$ , which is exactly the mgf of $\chi _{n-1}^{2}$ . Hence, $U={\frac {nS^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}$ .

(iii) We will use the following definition of $t$ -distribution $t_{k}$ : ${\frac {Z}{\sqrt {Y/k}}}\sim t_{k}$ where $Z\sim {\mathcal {N}}(0,1)$ , $Y\sim \chi _{k}^{2}$ , and $Z$ and $Y$ are independent.

Using this definition, it is easy to prove (iii) with (ii), as follows: ${\begin{aligned}{\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}&={\frac {{\color {darkgreen}{\sqrt {n}}}({\overline {X}}-\mu )/{\color {darkgreen}\sigma }}{{\color {darkgreen}{\sqrt {n}}}S/({\color {darkgreen}\sigma }{\sqrt {n-1}})}}\\&={\frac {{\sqrt {n}}({\overline {X}}-\mu )/\sigma }{\sqrt {{\frac {nS^{2}}{\sigma ^{2}}}{\big /}(n-1)}}}.\end{aligned}}$ By (ii), ${\frac {nS^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}$ . Also, we know that ${\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }}$ and ${\frac {nS^{2}}{\sigma ^{2}}}$ are independent since ${\overline {X}}$ and $S^{2}$ are independent by (i). Then, it follows by the above definition that ${\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\sim t_{n-1}$ .

$\Box$
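The proposition can be sanity-checked numerically. The following Python sketch (an illustration under the assumed parameters $\mu=10$, $\sigma=2$, $n=6$, not part of the proof) first confirms that the decomposition $W=nS^{2}/\sigma ^{2}+Z^{2}$ used in the proof of (ii) holds for a simulated data set, and then uses Monte Carlo simulation to check that $nS^{2}/\sigma ^{2}$ has approximately the mean $n-1$ and variance $2(n-1)$ of $\chi _{n-1}^{2}$ . Matching two moments is only evidence, not a proof. The helper `mean_and_s2` is a hypothetical name.

```python
import random
from math import sqrt


def mean_and_s2(xs):
    """Sample mean and the chapter's sample variance S^2 (divisor n)."""
    n = len(xs)
    xbar = sum(xs) / n
    return xbar, sum((x - xbar) ** 2 for x in xs) / n


rng = random.Random(1)
mu, sigma, n = 10.0, 2.0, 6          # illustrative parameters (assumed)

# (a) The decomposition W = n S^2 / sigma^2 + Z^2 from the proof of (ii)
#     is an algebraic identity, so it holds for any data set.
xs = [rng.gauss(mu, sigma) for _ in range(n)]
xbar, s2 = mean_and_s2(xs)
W = sum(((x - mu) / sigma) ** 2 for x in xs)
Z = sqrt(n) * (xbar - mu) / sigma
gap = abs(W - (n * s2 / sigma**2 + Z**2))
print("identity gap:", gap)          # only floating-point rounding error

# (b) Monte Carlo check of (ii): n S^2 / sigma^2 should have the mean n - 1
#     and variance 2(n - 1) of a chi-squared distribution with n - 1 d.f.
reps = 20000
U = []
for _ in range(reps):
    _, s2_rep = mean_and_s2([rng.gauss(mu, sigma) for _ in range(n)])
    U.append(n * s2_rep / sigma**2)
mean_U = sum(U) / reps
var_U = sum((u - mean_U) ** 2 for u in U) / reps
print(f"mean {mean_U:.2f} (theory {n - 1}), variance {var_U:.2f} (theory {2 * (n - 1)})")
```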

Using this proposition, we can prove the following theorem. Again, before discussing this confidence interval, let us introduce a notation:

$t_{\alpha ,\nu }$ is the upper percentile of $t_{\nu }$ at level $\alpha$ , i.e. it satisfies $\mathbb {P} (T\geq t_{\alpha ,\nu })=\alpha$ where $T\sim t_{\nu }$ .

Theorem. (Confidence interval of $\mu$ when $\sigma ^{2}$ is unknown) Let $X_{1},\dotsc ,X_{n}$ be a random sample from ${\mathcal {N}}(\mu ,\sigma ^{2})$ . When $\sigma ^{2}$ is unknown, a $1-\alpha$ confidence interval for $\mu$ is $\left[{\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}},{\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right].$

Remark.

The corresponding interval estimate is $\left[{\overline {x}}-t_{\alpha /2,n-1}{\frac {s}{\sqrt {n-1}}},{\overline {x}}+t_{\alpha /2,n-1}{\frac {s}{\sqrt {n-1}}}\right]$ , with observed value ${\overline {X}}={\overline {x}}$ and $S=s$ (sample standard deviation $S$ is nonnegative. Thus, this is equivalent to $S^{2}=s^{2}$ ).
We can find values of $t_{\alpha ,\nu }$ for some values of $\alpha$ and $\nu$ from a " $t$ -table".

In this " $t$ -table", the first column indicates the value of $\nu$ , and the first row (one-sided) indicates $1-\alpha$ (it is "one-sided" since our definition of $t_{\alpha ,\nu }$ involves the one-sided event " $T\geq t_{\alpha ,\nu }$ "). For instance, if we want to get $t_{0.05,\nu }$ , we can look at $1-0.05=95\%$ in the first row (one-sided).
Alternatively, we can look at the second row (two-sided) which indicates the confidence coefficient of the confidence interval ( $1-\alpha$ ), corresponding to $t_{\alpha /2,\nu }$ . For instance, if we want to get $t_{0.05/2,\nu }$ , we can look at $1-0.05=95\%$ in the second row (two-sided).

When $\nu \to \infty$ , the $t$ -distribution $t_{\nu }$ tends to the standard normal distribution ${\mathcal {N}}(0,1)$ . Hence, when $\nu$ is large, $t_{\alpha ,\nu }\approx z_{\alpha }$ . Thus, if one cannot find the value of $t_{\alpha ,\nu }$ in the $t$ -table because $\nu$ is so large that it does not appear in the table, then one can simply use $z_{\alpha }$ from the standard normal table as an approximation.

Proof. By (iii) in the previous proposition, we have $T={\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\sim t_{n-1}$ . Since the distribution $t_{n-1}$ does not depend on $\mu$ , $T$ is a pivotal quantity of $\mu$ . Hence, we set $1-\alpha =1-\mathbb {P} (T\geq t_{\alpha /2,n-1})-\mathbb {P} (T\leq -t_{\alpha /2,n-1})=\mathbb {P} (-t_{\alpha /2,n-1}\leq T\leq t_{\alpha /2,n-1})$ where $t_{\alpha /2,n-1}$ is a constant (the $t$ -distribution is symmetric about $0$ , so we have $\mathbb {P} (T\leq -t_{\alpha /2,n-1})=\alpha /2$ ). It follows that ${\begin{aligned}1-\alpha &=\mathbb {P} (-t_{\alpha /2,n-1}\leq T\leq t_{\alpha /2,n-1})\\&=\mathbb {P} \left(-t_{\alpha /2,n-1}\leq {\frac {{\overline {X}}-\mu }{S/{\sqrt {n-1}}}}\leq t_{\alpha /2,n-1}\right)\\&=\mathbb {P} \left(-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq {\overline {X}}-\mu \leq t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\\&=\mathbb {P} \left(t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\geq \mu -{\overline {X}}\geq -t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\\&=\mathbb {P} \left(-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq \mu -{\overline {X}}\leq t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)&({\text{rewrite}})\\&=\mathbb {P} \left({\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq \mu \leq {\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right).\\\end{aligned}}$ The result follows.

$\Box$
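The theorem gives an exact interval for every sample size $n\geq 2$ , whereas naively plugging $S$ into the known-variance interval (keeping $z_{\alpha /2}$ ) does not. A Monte Carlo sketch in Python makes the difference visible for $n=5$ . It assumes standard normal data and the table value $t_{0.025,4}\approx 2.776$ ; the function name `coverage` is hypothetical.

```python
import random
from math import sqrt


def coverage(n, crit, reps=20000, mu=0.0, sigma=1.0, seed=0):
    """Fraction of simulated intervals xbar +/- crit * S / sqrt(n - 1) that
    contain mu, where S^2 uses divisor n as in this chapter."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = sqrt(sum((x - xbar) ** 2 for x in xs) / n)
        half = crit * s / sqrt(n - 1)
        hits += (xbar - half <= mu <= xbar + half)
    return hits / reps


n = 5
t_crit = 2.776   # t_{0.025,4} from the t-table
z_crit = 1.96    # z_{0.025}, i.e. naively replacing sigma by S
print("t-interval coverage:", coverage(n, t_crit))   # close to 0.95
print("naive z coverage:   ", coverage(n, z_crit))   # noticeably below 0.95
```

With $n=5$ the naive interval should cover $\mu$ only about 88% of the time, since $\mathbb {P} (|T|\leq 1.96)\approx 0.88$ for $T\sim t_{4}$ .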

Example. A government officer of country A would like to know the average daily time spent on exercise by all citizens in country A. Suppose the variance of the time spent is unknown, and a random sample of 10 citizens is taken from the population. The following is the time spent on exercise on a particular day by the citizens in that sample (in minutes):

10, 0, 60, 20, 30, 30, 120, 40, 30, 10.

Assuming the time spent follows a normal distribution, construct a 95% confidence interval for the average daily time spent on exercise by all citizens in country A, denoted by $\mu$ .

Solution: First, we have ${\overline {x}}={\frac {10+0+60+20+30+30+120+40+30+10}{10}}=35$ , and $s={\sqrt {\frac {(10-35)^{2}+(0-35)^{2}+(60-35)^{2}+(20-35)^{2}+(30-35)^{2}+(30-35)^{2}+(120-35)^{2}+(40-35)^{2}+(30-35)^{2}+(10-35)^{2}}{10}}}={\sqrt {1065}}\approx 32.634$ .

Also, $t_{0.025,9}\approx 2.262$ from "97.5% (one-sided) and 9" (or "95% (two-sided) and 9") in $t$ -table.

Thus, a 95% confidence interval for $\mu$ is $\left[35-2.262\cdot {\frac {32.634}{\sqrt {9}}},35+2.262\cdot {\frac {32.634}{\sqrt {9}}}\right]\approx [10.39,59.61].$
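The arithmetic of this example can be checked with a small Python function. Following this chapter's convention, it computes $s$ with divisor $n$ and therefore divides by $\sqrt {n-1}$ ; this gives the same interval as the other common convention (divisor $n-1$ , then divide by $\sqrt {n}$ ). The critical value $t_{0.025,9}\approx 2.262$ is passed in from the $t$ -table, and `t_interval` is a hypothetical helper name.

```python
from math import sqrt


def t_interval(data, t_crit):
    """Interval xbar -/+ t_crit * s / sqrt(n - 1), with s^2 = sum((x - xbar)^2) / n."""
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / n)
    half = t_crit * s / sqrt(n - 1)
    return xbar, s, (xbar - half, xbar + half)


minutes = [10, 0, 60, 20, 30, 30, 120, 40, 30, 10]
xbar, s, (lo, hi) = t_interval(minutes, t_crit=2.262)   # t_{0.025,9}
print(xbar, round(s, 3), round(lo, 2), round(hi, 2))
```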

Exercise. The government officer also wants to know the mean monthly wage of all citizens in country A, $\mu$ . Suppose the standard deviation of the monthly wage is unknown (all wages in this example are in USD). From a salary survey which asks 15 citizens for their monthly wages, the following monthly wages (in USD) are obtained:

1500, 3000, 1200, 4000, 3500, 10000, 5000, 1000, 6000, 3000, 2000, 2000, 1500, 3000, 8000.

(a) Construct a 90% confidence interval for the mean monthly wage $\mu$ , assuming the underlying distribution for the wage is normal.

(b) For the salary survey, it is found that a respondent gives a wrong monthly wage: he enters one more "0" accidentally, and thus answers 10000 instead of 1000. Thus, after the correction, the corrected sample data of the monthly wages is:

1500, 3000, 1200, 4000, 3500, 1000, 5000, 1000, 6000, 3000, 2000, 2000, 1500, 3000, 8000.

Update the confidence interval in (a) based on the corrected data.

Solution

(a) First, we can get ${\overline {x}}\approx 3646.67$ , and $s\approx 2526.09$ . Also, $t_{0.05,14}\approx 1.761$ (from "95% (one-sided) (or 90% (two-sided)) and 14" in $t$ -table).

Hence, a 90% confidence interval for $\mu$ is $\left[3646.67-1.761\cdot {\frac {2526.09}{\sqrt {14}}},3646.67+1.761\cdot {\frac {2526.09}{\sqrt {14}}}\right]\approx [2457.77,4835.57]$

(b) First, we update ${\overline {x}}$ and $s$ : ${\overline {x}}\approx 3046.67$ and $s\approx 1948.63$ . Then, a new 90% confidence interval for $\mu$ is $\left[{\color {darkgreen}3046.67}-1.761\cdot {\frac {\color {darkgreen}1948.63}{\sqrt {14}}},{\color {darkgreen}3046.67}+1.761\cdot {\frac {\color {darkgreen}1948.63}{\sqrt {14}}}\right]\approx [2129.55,3963.79]$
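Here is a Python sketch that recomputes both intervals of the exercise, estimating the standard deviation from the data as in the solution (divisor $n$ , then dividing by $\sqrt {n-1}$ ). The table value $t_{0.05,14}\approx 1.761$ is hard-coded, and `t_interval` is a hypothetical helper name.

```python
from math import sqrt


def t_interval(data, t_crit):
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / n)   # divisor n, as in this chapter
    half = t_crit * s / sqrt(n - 1)
    return xbar - half, xbar + half


original = [1500, 3000, 1200, 4000, 3500, 10000, 5000, 1000,
            6000, 3000, 2000, 2000, 1500, 3000, 8000]
corrected = original.copy()
corrected[5] = 1000   # the respondent meant 1000, not 10000

t_05_14 = 1.761       # t_{0.05,14} from the t-table (90% two-sided)
for name, data in (("original", original), ("corrected", corrected)):
    lo, hi = t_interval(data, t_05_14)
    print(f"{name}: [{lo:.2f}, {hi:.2f}]")
```

The single outlier 10000 shifts the interval noticeably and also widens it, because it inflates $s$ .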

Example.

A farmer Tom owns an apple orchard. He has just harvested a large batch of apples (1000 apples) from his orchard. To assess the "quality" of this batch of apples, he wants to know the mean weight of the apples in this batch, $\mu$ . However, since there are too many apples, it is cumbersome to weigh every apple in this batch. Hence, Tom decides to take a random sample of 5 apples, and use them to roughly estimate the mean weight of the apples. The following are the weights of the apples in that sample (in g):

100, 120, 200, 220, 80.

Assume the distribution of the weight is normal.

(a) Based on past experiences, Tom knows that the standard deviation of the weight of the apples is 30g. Construct a 95% confidence interval for $\mu$ .

(b) Tom finds out that the apples in this batch are of a new variety that has not been grown before. Therefore, the standard deviation of the weight based on past experience cannot be applied to estimating the mean weight for this batch, so the standard deviation of the weight is now unknown. Construct an updated 95% confidence interval for $\mu$ .

Solution:

(a) We have ${\overline {x}}=144$ . Also, $z_{0.025}\approx 1.96$ from standard normal table. Hence, a 95% confidence interval for $\mu$ is $\left[144-1.96\cdot {\frac {30}{\sqrt {5}}},144+1.96\cdot {\frac {30}{\sqrt {5}}}\right]\approx [117.70,170.30].$

(b) We have $s\approx 55.71$ , and $t_{0.025,4}\approx 2.776$ from $t$ -table. Hence, a 95% confidence interval for $\mu$ is $\left[144-2.776\cdot {\frac {55.71}{\sqrt {4}}},144+2.776\cdot {\frac {55.71}{\sqrt {4}}}\right]\approx [66.67,221.33].$
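Putting (a) and (b) side by side in Python shows how much wider the interval becomes when $\sigma$ must be estimated from only 5 observations: both the larger critical value ( $2.776$ instead of $1.96$ ) and the estimate $s\approx 55.71$ (instead of 30) contribute. The table values are hard-coded; this is a sketch, not a general-purpose routine.

```python
from math import sqrt

weights = [100, 120, 200, 220, 80]   # grams
n = len(weights)
xbar = sum(weights) / n
s = sqrt(sum((w - xbar) ** 2 for w in weights) / n)   # divisor n, as in this chapter

# (a) sigma = 30 known: z-interval with z_{0.025} = 1.96
half_z = 1.96 * 30 / sqrt(n)
# (b) sigma unknown: t-interval with t_{0.025,4} = 2.776
half_t = 2.776 * s / sqrt(n - 1)

print(f"z: [{xbar - half_z:.2f}, {xbar + half_z:.2f}]")
print(f"t: [{xbar - half_t:.2f}, {xbar + half_t:.2f}]")
```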

Exercise. Tom sells this batch of apples to a nearby shop, and it is known that the shop will pay Tom $0.1\mu$ in USD for each apple, where $\mu$ is the mean weight (in g) of the batch of apples.

(a) Construct a 95% confidence interval for the total revenue of Tom from this transaction (in USD), $r$ , based on the confidence interval in (b) of the above example.

(b) Suppose the cost for Tom to grow this batch of apples is USD 6000. Can Tom be 95% confident that he can earn a positive profit (i.e. the revenue exceeds the cost) from this transaction?

Solution

(a) We have $r=1000(0.1\mu )=100\mu$ , and a 95% confidence interval for $\mu$ is $[66.67,221.33]$ by (b) of the example. From the construction of the confidence interval, we have $1-\alpha =\mathbb {P} \left({\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\leq \mu \leq {\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\implies 1-\alpha =\mathbb {P} \left(100\left({\overline {X}}-t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\leq r\leq 100\left({\overline {X}}+t_{\alpha /2,n-1}{\frac {S}{\sqrt {n-1}}}\right)\right).$ Hence, the corresponding confidence interval for $r$ is (approximately) $[6667,22133].$

(b) Yes, since Tom can be 95% confident that $r$ lies in $[6667,22133]$ , and every value in this interval exceeds the cost USD 6000.
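The step from $\mu$ to $r$ works because $r=100\mu$ is strictly increasing in $\mu$ , so the event $\{L\leq \mu \leq U\}$ is the same as $\{100L\leq r\leq 100U\}$ . A tiny Python sketch of this idea, with hypothetical names `map_interval` and `revenue`:

```python
def map_interval(interval, f):
    """Image [f(L), f(U)] of an interval under a strictly increasing f.

    Since L <= mu <= U exactly when f(L) <= f(mu) <= f(U), the transformed
    interval has the same confidence coefficient.
    """
    lower, upper = interval
    return f(lower), f(upper)


def revenue(mu):
    """Total revenue r = 1000 apples * 0.1 * mu USD (mu = mean weight in g)."""
    return 1000 * 0.1 * mu


mu_ci = (66.67, 221.33)   # 95% CI for mu from (b) of the example
r_ci = map_interval(mu_ci, revenue)
cost = 6000
print("95% CI for r:", r_ci, "| whole interval above cost:", r_ci[0] > cost)
```

For a strictly decreasing function the endpoints would swap, i.e. the interval would become $[f(U),f(L)]$ .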

Difference in means of two normal distributions

Sometimes, apart from estimating the mean of a single normal distribution, we would like to estimate the difference in means of two normal distributions for making comparisons. For example, apart from estimating the mean amount of time (lifetime) until a bulb burns out, we are often interested in estimating the difference between the mean lifetimes of two different brands of bulbs, so that we know which brand lasts longer on average, and hence has a higher "quality".

First, let us discuss the case where the two normal distributions are independent.

Now, the problem is how we should construct a confidence interval for the difference in the two means. It may seem that we can just construct two $1-\alpha$ confidence intervals $[L(\mathbf {X} ),U(\mathbf {X} )],[L(\mathbf {Y} ),U(\mathbf {Y} )]$ for each of the two means $\mu _{X},\mu _{Y}$ respectively, and then take $[L(\mathbf {X} )-L(\mathbf {Y} ),U(\mathbf {X} )-U(\mathbf {Y} )]$ as a $1-\alpha$ confidence interval for $\mu _{X}-\mu _{Y}$ . However, this is incorrect: having $\mathbb {P} (L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ))=1-\alpha$ and $\mathbb {P} (L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} ))=1-\alpha$ does not imply that $\mathbb {P} (L(\mathbf {X} )-L(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-U(\mathbf {Y} ))=1-\alpha$ (no result in probability justifies this).

On the other hand, it seems that since $\{L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} )\}$ and $\{L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )\}$ are independent (since the normal distributions we are considering are independent), then we have $\mathbb {P} (L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ){\text{ and }}L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} ))=(1-\alpha )^{2}.$ Then, when $L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} )$ and $L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )$ , we have $L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} ),$ so $\mathbb {P} (L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} ))=(1-\alpha )^{2},$ which means $[L(\mathbf {X} )-U(\mathbf {Y} ),U(\mathbf {X} )-L(\mathbf {Y} )]$ is a $(1-\alpha )^{2}$ confidence interval.

However, this is actually also incorrect. The flaw is that "when $L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} )$ and $L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )$ , we have $L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} )$ " only means $\{L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ){\text{ and }}L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )\}\subseteq \{L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} )\}$ (we do not have the reverse subset inclusion in general). This in turn means $(1-\alpha )^{2}=\mathbb {P} (L(\mathbf {X} )\leq \mu _{X}\leq U(\mathbf {X} ){\text{ and }}L(\mathbf {Y} )\leq \mu _{Y}\leq U(\mathbf {Y} )){\color {darkgreen}\leq }\mathbb {P} (L(\mathbf {X} )-U(\mathbf {Y} )\leq \mu _{X}-\mu _{Y}\leq U(\mathbf {X} )-L(\mathbf {Y} )).$ So, $[L(\mathbf {X} )-U(\mathbf {Y} ),U(\mathbf {X} )-L(\mathbf {Y} )]$ is actually not a $(1-\alpha )^{2}$ confidence interval (in general).

So, the above two "methods" to construct confidence intervals for difference in means of two independent normal distributions actually do not work. Indeed, we do not use the confidence interval for each of the two means, which is constructed previously, to construct a confidence interval for difference in the two means. Instead, we consider a pivotal quantity of the difference in the two means, which is a standard way for constructing confidence intervals.
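The discussion above can be checked by simulation. The following Python sketch (with illustrative parameter values that are assumptions, and a common known $\sigma$ for both populations) estimates two probabilities: that both separate 95% intervals cover their means, which should be about $(0.95)^{2}\approx 0.9025$ , and that $[L(\mathbf {X} )-U(\mathbf {Y} ),U(\mathbf {X} )-L(\mathbf {Y} )]$ covers $\mu _{X}-\mu _{Y}$ , which turns out to be considerably larger. So this interval is valid but conservative, and its confidence coefficient is not exactly $(1-\alpha )^{2}$ .

```python
import random
from math import sqrt

rng = random.Random(42)
mu_x, mu_y = 5.0, 3.0      # true means (illustrative assumption)
sigma, n = 2.0, 8          # common known sigma and sample size (assumption)
z = 1.96                   # z_{0.025}, so each separate interval is a 95% interval
half = z * sigma / sqrt(n)
reps = 20000

both_cover = 0    # counts {L(X) <= mu_x <= U(X) and L(Y) <= mu_y <= U(Y)}
naive_cover = 0   # counts {L(X) - U(Y) <= mu_x - mu_y <= U(X) - L(Y)}
for _ in range(reps):
    xbar = sum(rng.gauss(mu_x, sigma) for _ in range(n)) / n
    ybar = sum(rng.gauss(mu_y, sigma) for _ in range(n)) / n
    lx, ux = xbar - half, xbar + half
    ly, uy = ybar - half, ybar + half
    both_cover += ((lx <= mu_x <= ux) and (ly <= mu_y <= uy))
    naive_cover += (lx - uy <= mu_x - mu_y <= ux - ly)

print("both separate intervals cover:", both_cover / reps)   # about 0.95**2
print("[L(X)-U(Y), U(X)-L(Y)] covers:", naive_cover / reps)
```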

Theorem. (Confidence interval of $\mu _{X}-\mu _{Y}$ when $\sigma _{X}^{2}$ and $\sigma _{Y}^{2}$ are known) Let $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}m}$ be random samples from two independent distributions ${\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and ${\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ (i.e. the random variables $X\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and $Y\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ are independent) respectively, where $\sigma _{X}^{2}$ and $\sigma _{Y}^{2}$ are known. Then, a $1-\alpha$ confidence interval for $\mu _{X}-\mu _{Y}$ is $\left[({\overline {X}}-{\overline {Y}})-z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}},({\overline {X}}-{\overline {Y}})+z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right]$ .

Remark.

The corresponding interval estimate is $\left[({\overline {x}}-{\overline {y}})-z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}},({\overline {x}}-{\overline {y}})+z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right]$ with observed values ${\overline {X}}={\overline {x}}$ and ${\overline {Y}}={\overline {y}}$ .

Exercise. Show that ${\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\sim {\mathcal {N}}(0,1)$ (the meaning of the notations follows the above theorem).

Solution

Proof. First, we have ${\overline {X}}\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2}/n)$ and ${\overline {Y}}\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2}/m)$ by property of normal distribution ( $X_{1},\dotsc ,X_{n}$ , and $Y_{1},\dotsc ,Y_{m}$ are independent random samples). Then, applying the property of normal distribution again (the two distributions ${\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and ${\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ are independent, and hence ${\overline {X}}$ and ${\overline {Y}}$ are independent), we have ${\overline {X}}-{\overline {Y}}\sim {\mathcal {N}}(\mu _{X}-\mu _{Y},\sigma _{X}^{2}/n+(-1)^{2}\sigma _{Y}^{2}/m)\equiv {\mathcal {N}}(\mu _{X}-\mu _{Y},\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m).$ It follows by applying the property again that ${\frac {({\overline {X}}-{\overline {Y}}){\color {blue}-(\mu _{X}-\mu _{Y})}}{\color {red}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}}\sim {\mathcal {N}}\left({\frac {(\mu _{X}-\mu _{Y}){\color {blue}-(\mu _{X}-\mu _{Y})}}{\color {red}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}},{\frac {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}{\color {red}\left({\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}\right)^{2}}}\right)\equiv {\mathcal {N}}(0,1).$

$\Box$
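A quick Monte Carlo sketch in Python is consistent with this result. The parameters (unequal variances and unequal sample sizes) are illustrative assumptions; the standardized difference should have mean about 0, variance about 1, and fall in $[-1.96,1.96]$ about 95% of the time.

```python
import random
from math import sqrt

rng = random.Random(7)
mu_x, sigma_x, n = 4.0, 3.0, 6     # illustrative parameters (assumed)
mu_y, sigma_y, m = 1.0, 0.5, 9
se = sqrt(sigma_x**2 / n + sigma_y**2 / m)

zs = []
for _ in range(20000):
    xbar = sum(rng.gauss(mu_x, sigma_x) for _ in range(n)) / n
    ybar = sum(rng.gauss(mu_y, sigma_y) for _ in range(m)) / m
    zs.append(((xbar - ybar) - (mu_x - mu_y)) / se)

mean_z = sum(zs) / len(zs)
var_z = sum((v - mean_z) ** 2 for v in zs) / len(zs)
frac_inside = sum(abs(v) <= 1.96 for v in zs) / len(zs)
print(round(mean_z, 3), round(var_z, 3), round(frac_inside, 3))  # near 0, 1, 0.95
```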

Now, we will prove the above theorem based on the result shown in the previous exercise:

Proof. Let $Z={\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\sim {\mathcal {N}}(0,1)$ (from the previous exercise). Then, $Z$ is a pivotal quantity of $\mu _{X}-\mu _{Y}$ . Hence, we have ${\begin{aligned}1-\alpha &=\mathbb {P} (-z_{\alpha /2}\leq Z\leq z_{\alpha /2})\\&=\mathbb {P} \left(-z_{\alpha /2}\leq {\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\leq z_{\alpha /2}\right)\\&=\mathbb {P} \left(-z_{\alpha /2}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}\leq ({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})\leq z_{\alpha /2}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}\right)\\&=\mathbb {P} \left(z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\geq (\mu _{X}-\mu _{Y})-({\overline {X}}-{\overline {Y}})\geq -z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right)\\&=\mathbb {P} \left(({\overline {X}}-{\overline {Y}})-z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\leq \mu _{X}-\mu _{Y}\leq ({\overline {X}}-{\overline {Y}})+z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right).\\\end{aligned}}$

$\Box$

Example. A statistician wants to compare two kinds of light bulbs (brand A vs. brand B) by their lifetime (amount of time until the bulb burns out). He takes a random sample of 10 light bulbs of each brand, and measures their lifetimes. The following is the summary of the results: ${\begin{array}{cc}{\text{Brand}}&{\text{Sample mean (in hours)}}\\\hline A&4000\\B&4200\\\end{array}}$ Based on past studies, the statistician knows that the standard deviations of the lifetimes of brand A and brand B light bulbs are 600 hours and 150 hours respectively. Assume the distribution of the lifetime is normal.

(a) Construct a 95% confidence interval for the mean lifetime of brand A light bulb ( $\mu _{A}$ ) and brand B light bulb ( $\mu _{B}$ ) respectively.

(b) Construct a 95% confidence interval for $\mu _{B}-\mu _{A}$ .

(c) Can the statistician conclude with 95% confidence that brand B light bulb has a longer lifetime than brand A light bulb on average?

Solution.

(a) Since $z_{0.025}\approx 1.96$ and the sample size for each of the random samples is 10, a 95% confidence interval for $\mu _{A}$ is $\left[4000-1.96\cdot {\frac {600}{\sqrt {10}}},4000+1.96\cdot {\frac {600}{\sqrt {10}}}\right]\approx [3628.116,4371.884],$ and a 95% confidence interval for $\mu _{B}$ is $\left[4200-1.96\cdot {\frac {150}{\sqrt {10}}},4200+1.96\cdot {\frac {150}{\sqrt {10}}}\right]\approx [4107.029,4292.971].$

(b) A 95% confidence interval for $\mu _{B}-\mu _{A}$ is $\left[(4200-4000)-1.96{\sqrt {{\frac {600^{2}}{10}}+{\frac {150^{2}}{10}}}},(4200-4000)+1.96{\sqrt {{\frac {600^{2}}{10}}+{\frac {150^{2}}{10}}}}\right]\approx [-183.329,583.329].$

(c) No. The 95% confidence interval in (b) contains 0 and negative values, so the statistician cannot conclude with 95% confidence that the mean lifetime of brand B light bulbs is longer than that of brand A light bulbs.

Remark.

Notice that some values in the 95% confidence interval for $\mu _{A}$ exceed all values in the 95% confidence interval for $\mu _{B}$ , and the interval in (b) also contains negative values. The large standard deviation of brand A light bulbs (600 hours) makes the interval in (b) wide, so even though the sample mean of brand B exceeds that of brand A by 200 hours, we cannot be 95% confident that $\mu _{B}$ exceeds $\mu _{A}$ .
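A sketch of the interval in the theorem above, applied to the light bulb data (Python; `two_sample_z_interval` is a hypothetical helper). The standard error is computed from the variances, $\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m$ , exactly as in the theorem, so the standard deviations 150 and 600 enter squared.

```python
from math import sqrt


def two_sample_z_interval(xbar, ybar, sigma_x, sigma_y, n, m, z):
    """CI for mu_X - mu_Y with known standard deviations sigma_x, sigma_y."""
    se = sqrt(sigma_x**2 / n + sigma_y**2 / m)   # uses variances, not sds
    d = xbar - ybar
    return d - z * se, d + z * se


# Brand B minus brand A, 10 bulbs each, sigma_B = 150 h, sigma_A = 600 h
lo, hi = two_sample_z_interval(4200, 4000, 150, 600, 10, 10, 1.96)
print(f"[{lo:.3f}, {hi:.3f}]")
```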

Exercise. Suppose there is a brand C light bulb, and the statistician also takes a random sample of 10 light bulbs from brand C light bulbs. It is observed that the sample mean of this random sample is 4210 hours, and the standard deviation of the lifetime of brand C light bulbs is known to be $\sigma _{C}$ hours. Assume the distribution of the lifetime is normal.

Show that, whatever the value of $\sigma _{C}$ is, the 95% confidence intervals for $\mu _{C}-\mu _{A}$ and $\mu _{C}-\mu _{B}$ constructed using the above theorem cannot make the statistician 95% confident that the brand C light bulb has a longer or the same lifetime as both brand A and brand B light bulbs on average.

Solution

Proof. Let $\mu _{C}$ be the mean lifetime of brand C light bulb.

A 95% confidence interval for $\mu _{C}-\mu _{A}$ is $\left[(4210-4000)-1.96{\sqrt {{\frac {600^{2}}{10}}+{\frac {\sigma _{C}^{2}}{10}}}},(4210-4000)+1.96{\sqrt {{\frac {600^{2}}{10}}+{\frac {\sigma _{C}^{2}}{10}}}}\right],$ and a 95% confidence interval for $\mu _{C}-\mu _{B}$ is $\left[(4210-4200)-1.96{\sqrt {{\frac {150^{2}}{10}}+{\frac {\sigma _{C}^{2}}{10}}}},(4210-4200)+1.96{\sqrt {{\frac {150^{2}}{10}}+{\frac {\sigma _{C}^{2}}{10}}}}\right].$ In order for the statistician to be 95% confident that the brand C light bulb has a longer or the same lifetime as both brand A and B light bulbs, the lower bounds of both of these confidence intervals should be at least 0, i.e. ${\begin{cases}210\geq 1.96{\sqrt {36000+{\frac {\sigma _{C}^{2}}{10}}}}\\10\geq 1.96{\sqrt {2250+{\frac {\sigma _{C}^{2}}{10}}}}\\\end{cases}}.$ However, since ${\frac {\sigma _{C}^{2}}{10}}\geq 0$ , we have $1.96{\sqrt {36000+{\frac {\sigma _{C}^{2}}{10}}}}\geq 1.96{\sqrt {36000}}\approx 371.88>210$ and $1.96{\sqrt {2250+{\frac {\sigma _{C}^{2}}{10}}}}\geq 1.96{\sqrt {2250}}\approx 92.97>10$ . Hence, neither inequality holds for any value of $\sigma _{C}$ , and the statistician cannot reach this conclusion.

$\Box$

Now, we will consider the case where the variances are unknown. In this case, the construction of the confidence interval for the difference in means is more complicated, and even more so when $\sigma _{X}^{2}\neq \sigma _{Y}^{2}$ . Thus, we will only discuss the case where the two variances are equal, $\sigma _{X}^{2}=\sigma _{Y}^{2}=\sigma ^{2}$ , with $\sigma ^{2}$ unknown. As you may expect, we will also use some results mentioned previously for constructing the confidence interval for $\mu$ when $\sigma ^{2}$ is unknown.

Theorem. (Confidence interval of $\mu _{X}-\mu _{Y}$ when $\sigma _{X}^{2}=\sigma _{Y}^{2}=\sigma ^{2}$ is unknown) Let $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}m}$ be random samples from two independent distributions ${\mathcal {N}}(\mu _{X},{\color {darkgreen}\sigma ^{2}})$ and ${\mathcal {N}}(\mu _{Y},{\color {darkgreen}\sigma ^{2}})$ respectively. Then, a $1-\alpha$ confidence interval for $\mu _{X}-\mu _{Y}$ is $\left[({\overline {X}}-{\overline {Y}})-t_{\alpha /2,n+m-2}{\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}},({\overline {X}}-{\overline {Y}})+t_{\alpha /2,n+m-2}{\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\right]$ where $S_{X}^{2}$ and $S_{Y}^{2}$ are the sample variances (with divisors $n$ and $m$ respectively) of the random samples $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}m}$ respectively.

Remark.

The corresponding interval estimate is $\left[({\overline {x}}-{\overline {y}})-t_{\alpha /2,n+m-2}{\sqrt {{\frac {ns_{X}^{2}+ms_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}},({\overline {x}}-{\overline {y}})+t_{\alpha /2,n+m-2}{\sqrt {{\frac {ns_{X}^{2}+ms_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\right]$ , with observed values ${\overline {X}}={\overline {x}},{\overline {Y}}={\overline {y}},S_{X}=s_{X},{\text{ and }}S_{Y}=s_{Y}$ .
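A minimal Python sketch of the pooled interval in the theorem, on small made-up data (the data and the table value $t_{0.025,5}\approx 2.571$ are assumptions for illustration; `pooled_t_interval` is a hypothetical name). Because $nS_{X}^{2}$ and $mS_{Y}^{2}$ are just the sums of squared deviations, the code works with those sums directly.

```python
from math import sqrt


def pooled_t_interval(xs, ys, t_crit):
    """CI for mu_X - mu_Y assuming equal (unknown) variances.

    Uses the chapter's sample variances with divisor n, so n*S_X^2 and
    m*S_Y^2 are the sums of squared deviations.
    """
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    ss_x = sum((x - xbar) ** 2 for x in xs)   # = n * S_X^2
    ss_y = sum((y - ybar) ** 2 for y in ys)   # = m * S_Y^2
    pooled = (ss_x + ss_y) / (n + m - 2)
    r = sqrt(pooled * (1 / n + 1 / m))
    d = xbar - ybar
    return d - t_crit * r, d + t_crit * r


# Hypothetical data for illustration; t_{0.025,5} = 2.571 taken from a t-table
xs = [1, 2, 3]
ys = [2, 4, 6, 8]
print(pooled_t_interval(xs, ys, t_crit=2.571))
```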

Proof. Let $Z={\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma ^{2}/n+\sigma ^{2}/m}}}\sim {\mathcal {N}}(0,1)$ (the reason for this to follow ${\mathcal {N}}(0,1)$ is shown in a previous exercise). From a previous result, we know that $V={\frac {nS_{X}^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}$ and $W={\frac {mS_{Y}^{2}}{\sigma ^{2}}}\sim \chi _{m-1}^{2}$ . Then, we know that the mgf of $V$ is $M_{V}(t)=(1-2t)^{-(n-1)/2}$ and the mgf of $W$ is $M_{W}(t)=(1-2t)^{-(m-1)/2}$ . Since the two random samples are independent, $V$ and $W$ are independent, so the mgf of $U=V+W$ is $M_{U}(t)=M_{V+W}(t)=M_{V}(t)M_{W}(t)=(1-2t)^{-(n-1)/2-(m-1)/2}=(1-2t)^{-(n+m-2)/2}.$ This is the mgf of $\chi _{n+m-2}^{2}$ , and an mgf determines the distribution uniquely. Hence, $U\sim \chi _{n+m-2}^{2}$ .

By the independence of sample mean and sample variance ( ${\overline {X}}$ and $S_{X}^{2}$ are independent, ${\overline {Y}}$ and $S_{Y}^{2}$ are independent) and the independence of the two random samples, we can deduce that $Z$ and $U$ are independent. Thus, by the definition of $t$ -distribution, ${\begin{aligned}T&={\frac {Z}{\sqrt {{\color {darkgreen}U}/(n+m-2)}}}\\&={\frac {{\big (}({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y}){\big )}/{\sqrt {\sigma ^{2}/n+\sigma ^{2}/m}}}{\sqrt {{\color {darkgreen}(nS_{X}^{2}+mS_{Y}^{2})}/{\big (}{\color {darkgreen}\sigma ^{2}}(n+m-2){\big )}}}}\\&={\frac {{\big (}({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y}){\big )}/\left({\cancel {\sigma }}{\color {blue}{\sqrt {1/n+1/m}}}\right)}{{\cancel {({\color {darkgreen}1/\sigma })}}{\sqrt {{\color {darkgreen}(nS_{X}^{2}+mS_{Y}^{2})}/(n+m-2)}}}}&({\text{this step is not possible without the equal variance assumption}})\\&={\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{{\color {blue}{\sqrt {1/n+1/m}}}{\sqrt {(nS_{X}^{2}+mS_{Y}^{2})/(n+m-2)}}}}\\&=\left(({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})\right){\Bigg /}{\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\\&={\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{R}}&\left(R={\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\right)\end{aligned}}$ follows $t_{n+m-2}$ . Since the distribution of $T$ does not depend on any unknown parameter, $T$ is a pivotal quantity of $\mu _{X}-\mu _{Y}$ .
Hence, we have ${\begin{aligned}1-\alpha &=\mathbb {P} (-t_{\alpha /2,n+m-2}\leq T\leq t_{\alpha /2,n+m-2})\\&=\mathbb {P} \left(-t_{\alpha /2,n+m-2}\leq {\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{R}}\leq t_{\alpha /2,n+m-2}\right)\\&=\mathbb {P} \left(-t_{\alpha /2,n+m-2}R\leq ({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})\leq t_{\alpha /2,n+m-2}R\right)\\&=\mathbb {P} \left(({\overline {X}}-{\overline {Y}})+t_{\alpha /2,n+m-2}R\geq (\mu _{X}-\mu _{Y})\geq ({\overline {X}}-{\overline {Y}})-t_{\alpha /2,n+m-2}R\right)\\&=\mathbb {P} \left(({\overline {X}}-{\overline {Y}})-t_{\alpha /2,n+m-2}R\leq (\mu _{X}-\mu _{Y})\leq ({\overline {X}}-{\overline {Y}})+t_{\alpha /2,n+m-2}R\right)\\&=\mathbb {P} \left(({\overline {X}}-{\overline {Y}})-t_{\alpha /2,n+m-2}{\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\leq (\mu _{X}-\mu _{Y})\leq ({\overline {X}}-{\overline {Y}})+t_{\alpha /2,n+m-2}{\sqrt {{\frac {nS_{X}^{2}+mS_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\right).\\\end{aligned}}$ The result follows.

$\Box$
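One way to sanity-check the theorem, including the divisor- $n$ convention for $S_{X}^{2}$ and $S_{Y}^{2}$ used here, is a small Monte Carlo experiment: draw many pairs of samples from two normal distributions with a common variance and count how often the interval covers the true $\mu _{X}-\mu _{Y}$ . The sketch below assumes $n=8$ , $m=12$ and the table value $t_{0.025,18}\approx 2.101$ ; the means, the common standard deviation and the seed are illustrative choices, not values from the text.

```python
import math
import random

def var_divisor_n(xs):
    """Sample variance with divisor n, the convention behind n S^2 / sigma^2 ~ chi^2_{n-1}."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def pooled_t_interval(xs, ys, t_crit):
    """Interval for mu_X - mu_Y from the equal-variance theorem, given t_{alpha/2,n+m-2}."""
    n, m = len(xs), len(ys)
    pooled = (n * var_divisor_n(xs) + m * var_divisor_n(ys)) / (n + m - 2)
    r = math.sqrt(pooled * (1 / n + 1 / m))
    diff = sum(xs) / n - sum(ys) / m
    return diff - t_crit * r, diff + t_crit * r

def coverage(reps=10_000, n=8, m=12, mu_x=5.0, mu_y=3.0, sigma=2.0,
             t_crit=2.101, seed=1):
    """Fraction of simulated intervals that contain the true mu_X - mu_Y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu_x, sigma) for _ in range(n)]
        ys = [rng.gauss(mu_y, sigma) for _ in range(m)]
        lo, hi = pooled_t_interval(xs, ys, t_crit)
        hits += lo <= mu_x - mu_y <= hi
    return hits / reps

estimated = coverage()  # t_crit = t_{0.025,18}, so a value near 0.95 is expected
print(estimated)
```

Replacing `t_crit` by $z_{0.025}=1.96$ in the same experiment should give a somewhat lower coverage (roughly 0.93) for samples this small, which is why the $t$ percentile is used.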

Example.

There are two lakes in a country: one in the north, called North Lake, and one in the south, called South Lake. Suppose the weights of the fish in North Lake and South Lake follow ${\mathcal {N}}(\mu _{X},\sigma ^{2})$ and ${\mathcal {N}}(\mu _{Y},\sigma ^{2})$ respectively, where $\sigma ^{2}$ is unknown. A fisherman, Bob, wants to compare the mean weights of the fish in North Lake and South Lake, so that he can choose the lake with the greater mean weight of fish for fishing. For the comparison, Bob fished at North Lake on day 1 and at South Lake on day 2. The fish caught are summarized below: ${\begin{array}{cccc}{\text{Lake}}&{\text{Number of fish caught}}&{\text{Sample mean of weight (in kg)}}&{\text{Sample standard deviation of weight (in kg)}}\\\hline {\textit {North}}\;{\textit {Lake}}&127&1.2&0.8\\{\textit {South}}\;{\textit {Lake}}&153&1.5&0.3\\\end{array}}$ Can Bob be 90% confident that he should choose South Lake for fishing?

Solution. First, we need $t_{0.05,153+127-2}=t_{0.05,278}$ . Since the number of degrees of freedom is so large that the corresponding value cannot be found in a $t$ -table, we may use $z_{0.05}\approx 1.64$ to approximate it. Applying the theorem with South Lake as the first sample (the sample standard deviations must be squared), a 90% confidence interval for $\mu _{Y}-\mu _{X}$ is $\left[(1.5-1.2)-t_{0.05,278}{\sqrt {{\frac {(153)(0.3^{2})+(127)(0.8^{2})}{278}}\left({\frac {1}{153}}+{\frac {1}{127}}\right)}},(1.5-1.2)+t_{0.05,278}{\sqrt {{\frac {(153)(0.3^{2})+(127)(0.8^{2})}{278}}\left({\frac {1}{153}}+{\frac {1}{127}}\right)}}\right]\approx [0.3-(1.64)(0.0702),0.3+(1.64)(0.0702)]\approx [0.185,0.415].$ Since all values in the confidence interval exceed 0, Bob can be 90% confident that $\mu _{Y}>\mu _{X}$ , i.e. the mean weight of the fish in South Lake is greater than that in North Lake, and hence he can be 90% confident that he should choose South Lake for fishing.
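The arithmetic of this example can be sketched in Python. The theorem pools the squared sample standard deviations (divisor- $n$ convention), so 0.3 and 0.8 are squared before pooling, and $n+m-2=153+127-2=278$ . The helper name `pooled_interval` is hypothetical, and `crit=1.64` is the normal approximation to $t_{0.05,278}$ used in the text.

```python
import math

def pooled_interval(mean_1, mean_2, sd_1, sd_2, n_1, n_2, crit):
    """Interval for mu_1 - mu_2 from the equal-variance theorem.

    sd_1, sd_2 are sample standard deviations (divisor n), squared before pooling.
    """
    pooled_var = (n_1 * sd_1 ** 2 + n_2 * sd_2 ** 2) / (n_1 + n_2 - 2)
    r = math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    centre = mean_1 - mean_2
    return centre - crit * r, centre + crit * r, r

# South Lake is the first sample, since the interval is for mu_Y - mu_X.
lo, hi, r = pooled_interval(1.5, 1.2, 0.3, 0.8, 153, 127, crit=1.64)
print(round(r, 4), round(lo, 3), round(hi, 3))
```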

Now, what if the two normal distributions concerned are dependent? Clearly, we cannot use the above results anymore, and we need to develop a new method to construct a confidence interval for the difference of means in this case. In this case, we need to consider the notion of paired samples.

Proposition. Let $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}n}$ (the sample sizes must be the same) be random samples from two normal distributions ${\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and ${\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ (which may be dependent), such that the pairs $(X_{1},Y_{1}),\dotsc ,(X_{n},Y_{n})$ are independent and each pair $(X_{i},Y_{i})$ is jointly normal, and let $D_{i}=X_{i}-Y_{i}$ for each $i\in \{1,\dotsc ,n\}$ . Then, $D_{1},\dotsc ,D_{n}$ are independent and $D_{1},\dotsc ,D_{n}\sim {\mathcal {N}}(\mu _{D},\sigma _{D}^{2})$ where $\mu _{D}=\mu _{X}-\mu _{Y}$ and $\sigma _{D}^{2}=\sigma _{X}^{2}+\sigma _{Y}^{2}-2\operatorname {Cov} (X,Y)$ (with $X\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and $Y\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ ).

Remark.

$(X_{1},Y_{1}),\dotsc ,(X_{n},Y_{n})$ are called paired samples in this case.

Proof.

1. Independence of $D_{1},\dotsc ,D_{n}$ :

Since the pairs $(X_{1},Y_{1}),\dotsc ,(X_{n},Y_{n})$ are independent, and each $D_{i}=X_{i}-Y_{i}$ is a function of the pair $(X_{i},Y_{i})$ only, it follows that $D_{1},\dotsc ,D_{n}$ are independent, which is what we want to show.

2. $D_{1},\dotsc ,D_{n}\sim {\mathcal {N}}{\big (}\mu _{X}-\mu _{Y},\sigma _{X}^{2}+\sigma _{Y}^{2}-2\operatorname {Cov} (X,Y){\big )}$ :

To show that $D_{1},\dotsc ,D_{n}$ still follow a normal distribution, we can consider the pdf of $D_{i}$ for each $i\in \{1,\dotsc ,n\}$ . The pdf can be obtained from the joint pdf of $(X_{i},Y_{i})$ using the transformation of random variables formula: e.g., let $U=X-Y$ and $V=Y$ where $X\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and $Y\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ . Then, the marginal pdf of $U$ , which is the pdf of $D_{i}$ , is in the form of a normal pdf when $(X,Y)$ is jointly normal.

However, since the actual derivation process is somewhat complicated, it is omitted here.

Of course, the mean and variance of $D_{i}$ can be observed from the pdf of $D_{i}$ determined previously. Alternatively, before determining the pdf of $D_{i}$ , we can also know that the mean of $D_{i}$ is $\mathbb {E} [D_{i}]=\mathbb {E} [X_{i}]-\mathbb {E} [Y_{i}]=\mu _{X}-\mu _{Y}$ (we use the linearity of expectation here, which does not require independence assumption), and the variance of $D_{i}$ is $\operatorname {Var} (D_{i})=\operatorname {Var} (X_{i})+(-1)^{2}\operatorname {Var} (Y_{i})+2\operatorname {Cov} (X_{i},-Y_{i})=\sigma _{X}^{2}+\sigma _{Y}^{2}+2(-1)\operatorname {Cov} (X_{i},Y_{i})=\sigma _{X}^{2}+\sigma _{Y}^{2}-2\operatorname {Cov} (X,Y)$ ( $X\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and $Y\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ ).

$\Box$
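The variance formula $\sigma _{D}^{2}=\sigma _{X}^{2}+\sigma _{Y}^{2}-2\operatorname {Cov} (X,Y)$ can be checked numerically. The sketch below builds correlated normal pairs from two independent standard normals (a standard construction, assumed here for illustration) with $\sigma _{X}=\sigma _{Y}=25$ and correlation $\rho =0.7$ , so that $\operatorname {Cov} (X,Y)=\rho \sigma _{X}\sigma _{Y}$ and the formula predicts 375. The means are set to 0, since they do not affect the variance.

```python
import math
import random

def simulated_var_of_difference(sigma_x, sigma_y, rho, size=100_000, seed=0):
    """Sample variance of X - Y for simulated pairs with corr(X, Y) = rho."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(size):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = sigma_x * z1
        # y shares the component z1 with x, which creates the correlation rho
        y = sigma_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        diffs.append(x - y)
    mean = sum(diffs) / size
    return sum((d - mean) ** 2 for d in diffs) / (size - 1)

theory = 25 ** 2 + 25 ** 2 - 2 * 0.7 * 25 * 25  # sigma_X^2 + sigma_Y^2 - 2 Cov(X, Y)
simulated = simulated_var_of_difference(25, 25, 0.7)
print(theory, round(simulated, 1))
```

Positive correlation makes the differences less variable than they would be for independent samples (where the variance would be 1250), which is one reason pairing can give narrower intervals.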

Corollary. (Confidence interval of $\mu _{X}-\mu _{Y}$ when $\sigma _{D}^{2}$ is known) Let $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}n}$ be paired samples, as in the previous proposition, from two normal distributions ${\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and ${\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ . Then, a $1-\alpha$ confidence interval for $\mu _{D}=\mu _{X}-\mu _{Y}$ is $\left[{\overline {D}}-z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}},{\overline {D}}+z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}}\right]$ where $D_{i}=X_{i}-Y_{i}$ for each $i\in \{1,\dotsc ,n\}$ , ${\overline {D}}={\frac {\sum _{i=1}^{n}D_{i}}{n}}$ , $\sigma _{D}$ is the standard deviation of $D_{i}$ , and $\sigma _{D}^{2}$ is known.

Remark.

The corresponding interval estimate is $\left[{\overline {d}}-z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}},{\overline {d}}+z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}}\right]$ with observed value ${\overline {D}}={\overline {d}}$ .

Proof. From the previous proposition, we know that $D_{1},\dotsc ,D_{n}$ is a random sample from ${\mathcal {N}}(\mu _{D},\sigma _{D}^{2})$ . Since $\sigma _{D}^{2}$ is known, it follows from a previous theorem that a $1-\alpha$ confidence interval for $\mu _{D}=\mu _{X}-\mu _{Y}$ is $\left[{\overline {D}}-z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}},{\overline {D}}+z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}}\right].$

$\Box$

Corollary. (Confidence interval of $\mu _{X}-\mu _{Y}$ when $\sigma _{D}^{2}$ is unknown) Let $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}n}$ be paired samples, as in the previous proposition, from two normal distributions ${\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and ${\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ . Then, a $1-\alpha$ confidence interval for $\mu _{D}=\mu _{X}-\mu _{Y}$ is $\left[{\overline {D}}-t_{\alpha /2,n-1}{\frac {S_{D}}{\sqrt {n-1}}},{\overline {D}}+t_{\alpha /2,n-1}{\frac {S_{D}}{\sqrt {n-1}}}\right]$ where $D_{i}=X_{i}-Y_{i}$ for each $i\in \{1,\dotsc ,n\}$ , ${\overline {D}}={\frac {\sum _{i=1}^{n}D_{i}}{n}}$ , $S_{D}$ is the sample standard deviation of $D_{1},\dotsc ,D_{n}$ , and the variance of $D_{i}$ is unknown.

Remark.

The corresponding interval estimate is $\left[{\overline {d}}-t_{\alpha /2,n-1}{\frac {s_{D}}{\sqrt {n-1}}},{\overline {d}}+t_{\alpha /2,n-1}{\frac {s_{D}}{\sqrt {n-1}}}\right]$ with observed values ${\overline {D}}={\overline {d}}$ and $S_{D}=s_{D}$ .

Exercise. Prove the above corollary.

Solution

Proof. From the previous proposition, we know that $D_{1},\dotsc ,D_{n}$ is a random sample from ${\mathcal {N}}(\mu _{D},\sigma _{D}^{2})$ . Since $\sigma _{D}^{2}$ is unknown, it follows from a previous theorem that a $1-\alpha$ confidence interval for $\mu _{D}=\mu _{X}-\mu _{Y}$ is $\left[{\overline {D}}-t_{\alpha /2,n-1}{\frac {S_{D}}{\sqrt {n-1}}},{\overline {D}}+t_{\alpha /2,n-1}{\frac {S_{D}}{\sqrt {n-1}}}\right].$

$\Box$

Example. A fertilizer company wants to advertise the effect of its corn fertilizer. Thus, the company plants 5 corn seeds in each of two neighbouring places: places X and Y, where the corn seeds involved are identical. After that, the company uses fertilizer in place Y only (apart from this, all other conditions in place X and Y are the same). The weight of the corns harvested (in g) in the two places is summarized below: ${\begin{array}{cccccc}{\text{Corn}}&1&2&3&4&5\\\hline {\text{Place X}}&300&320&290&315&330\\{\text{Place Y}}&340&315&320&390&380\\\end{array}}$ Let $x_{i}$ and $y_{i}$ (the realizations of random variables $X_{i}$ and $Y_{i}$ respectively) be the weight (in g) of the corn $i$ harvested in place X and place Y respectively, and $w_{i}=y_{i}-x_{i}$ (the realization of random variable $W_{i}$ ) be the improvement of the weight by using the fertilizer. Suppose $X_{1},\dotsc ,X_{5}$ is a random sample from ${\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and $Y_{1},\dotsc ,Y_{5}$ is a random sample from ${\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ , and the variance of $W_{i}$ is unknown ( $i=1,2,\dotsc ,5$ ).

Construct a 95% confidence interval for $\mu _{Y}-\mu _{X}$ .

Solution.

First, the values of $w_{i}$ are summarized as follows: ${\begin{array}{cccccc}i&1&2&3&4&5\\\hline w_{i}&40&-5&30&75&50\\\end{array}}$ Hence, the sample mean and the sample standard deviation of $W_{1},\dotsc ,W_{5}$ are observed to be ${\overline {w}}=38$ and $s_{W}\approx 26.192$ respectively. Since $t_{0.025,4}\approx 2.776$ and ${\sqrt {n-1}}={\sqrt {4}}$ here, it follows that a 95% confidence interval for $\mu _{Y}-\mu _{X}$ is $\left[38-(2.776){\frac {26.192}{\sqrt {4}}},38+(2.776){\frac {26.192}{\sqrt {4}}}\right]\approx [1.646,74.354].$
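The calculation above can be reproduced from the raw data with a short Python sketch. It follows the corollary for unknown variance with this page's convention for $S_{D}$ (divisor $n$ , then divide by ${\sqrt {n-1}}$ ); the helper name `paired_t_interval` is hypothetical, and $t_{0.025,4}\approx 2.776$ is taken from a $t$ -table.

```python
import math

def paired_t_interval(x, y, t_crit):
    """Interval for mu_Y - mu_X from paired data, using S_D with divisor n."""
    w = [yi - xi for xi, yi in zip(x, y)]            # improvements w_i = y_i - x_i
    n = len(w)
    w_bar = sum(w) / n
    s_w = math.sqrt(sum((wi - w_bar) ** 2 for wi in w) / n)  # divisor n
    half_width = t_crit * s_w / math.sqrt(n - 1)
    return w_bar, s_w, (w_bar - half_width, w_bar + half_width)

place_x = [300, 320, 290, 315, 330]
place_y = [340, 315, 320, 390, 380]
w_bar, s_w, (lo, hi) = paired_t_interval(place_x, place_y, t_crit=2.776)
print(w_bar, round(s_w, 3), round(lo, 3), round(hi, 3))
```

Dividing by $n$ and then by ${\sqrt {n-1}}$ gives the same half-width as the more common convention of dividing by $n-1$ and then by ${\sqrt {n}}$ .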

Exercise. Suppose it is known that $\sigma _{X}=\sigma _{Y}=25$ , and the correlation coefficient of $X\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and $Y\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ is $\rho =0.7$ . Construct a 95% confidence interval for $\mu _{Y}-\mu _{X}$ based on the corollary for the case where $\sigma _{D}^{2}$ is known.

Solution

The variance of $W=Y-X$ is $\sigma _{W}^{2}=\sigma _{X}^{2}+\sigma _{Y}^{2}-2\operatorname {Cov} (X,Y)=25^{2}+25^{2}-2\rho \sigma _{X}\sigma _{Y}=1250-2(0.7)(625)=375$ . Hence, $\sigma _{W}={\sqrt {375}}\approx 19.365$ . Since $z_{0.025}\approx 1.96$ , a 95% confidence interval for $\mu _{Y}-\mu _{X}$ is $\left[38-1.96\cdot {\frac {19.365}{\sqrt {5}}},38+1.96\cdot {\frac {19.365}{\sqrt {5}}}\right]\approx [21.026,54.974].$
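The known-variance calculation can be sketched the same way, using $\operatorname {Cov} (X,Y)=\rho \sigma _{X}\sigma _{Y}$ . The helper name is hypothetical; ${\overline {w}}=38$ and $n=5$ come from the example above.

```python
import math

def known_var_paired_interval(d_bar, n, sigma_x, sigma_y, rho, z=1.96):
    """z-interval for mu_D when sigma_D is known; Cov(X, Y) = rho * sigma_x * sigma_y."""
    var_d = sigma_x ** 2 + sigma_y ** 2 - 2 * rho * sigma_x * sigma_y
    half_width = z * math.sqrt(var_d) / math.sqrt(n)
    return var_d, (d_bar - half_width, d_bar + half_width)

var_w, (lo, hi) = known_var_paired_interval(38, 5, 25, 25, 0.7)
print(round(var_w, 6), round(lo, 3), round(hi, 3))
```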

Confidence intervals for variances of normal distributions

Variance of a normal distribution

After discussing the confidence intervals for means of normal distributions, let us consider the confidence intervals for variances of normal distributions. Similarly, we need to consider a pivotal quantity of $\sigma ^{2}$ . Can you suggest a pivotal quantity of $\sigma ^{2}$ , based on a previous result discussed?

Recall that we have ${\frac {nS^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}$ under suitable assumptions, and the distribution $\chi _{n-1}^{2}$ does not depend on $\sigma ^{2}$ . Thus, this result gives us a pivotal quantity of $\sigma ^{2}$ , namely ${\frac {nS^{2}}{\sigma ^{2}}}$ . Before discussing the theorem for constructing a confidence interval for $\sigma ^{2}$ , let us introduce a notation:

$\chi _{\alpha ,\nu }^{2}$ is the upper percentile of $\chi _{\nu }^{2}$ at level $\alpha$ , i.e. it satisfies $\mathbb {P} (X\geq \chi _{\alpha ,\nu }^{2})=\alpha$ where $X\sim \chi _{\nu }^{2}$ .

Some values of $\chi _{\alpha ,\nu }^{2}$ can be found in the chi-squared table.

To find the value of $\chi _{\alpha ,\nu }^{2}$ , locate the row for $\nu$ degrees of freedom and the column for "probability content" $\alpha$ .
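If no table is at hand, $\chi _{\alpha ,\nu }^{2}$ can also be computed numerically. The sketch below is an illustration rather than a production routine: it evaluates the chi-squared cdf through the standard power series of the regularized lower incomplete gamma function $P(\nu /2,x/2)$ , and then finds the upper percentile by bisection. Statistical libraries provide this directly (for example SciPy's `scipy.stats.chi2.isf`), but the sketch uses only the Python standard library; the function names are hypothetical.

```python
import math

def chi2_cdf(x, nu):
    """P(X <= x) for X ~ chi^2_nu, via the power series of the regularized
    lower incomplete gamma function P(nu/2, x/2)."""
    if x <= 0:
        return 0.0
    a, t = nu / 2.0, x / 2.0
    term = 1.0 / a  # k = 0 term of sum_k t^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= t / (a + k)
        total += term
        k += 1
    return math.exp(a * math.log(t) - t - math.lgamma(a)) * total

def chi2_upper(alpha, nu):
    """chi^2_{alpha,nu}: the point c with P(X >= c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, nu) > alpha:  # grow hi until it is past the target
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, nu) > alpha:  # upper tail still too big: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_upper(0.005, 19), 3), round(chi2_upper(0.995, 19), 3))
```

Note that $\chi _{1-\alpha ,\nu }^{2}$ is a small value (lower tail area $\alpha$ ), while $\chi _{\alpha ,\nu }^{2}$ with small $\alpha$ is a large value.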

Theorem. (Confidence interval of $\sigma ^{2}$ ) Let $X_{1},\dotsc ,X_{n}$ be a random sample from ${\mathcal {N}}(\mu ,\sigma ^{2})$ . Then, a $1-\alpha$ confidence interval for $\sigma ^{2}$ is $\left[{\frac {nS^{2}}{\chi _{\alpha /2,n-1}^{2}}},{\frac {nS^{2}}{\chi _{1-\alpha /2,n-1}^{2}}}\right].$

Remark.

The corresponding interval estimate is $\left[{\frac {ns^{2}}{\chi _{\alpha /2,n-1}^{2}}},{\frac {ns^{2}}{\chi _{1-\alpha /2,n-1}^{2}}}\right]$ with observed value $S^{2}=s^{2}$ .

Proof. Since $Y={\frac {nS^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}$ is a continuous random variable, we have $1-\alpha =(1-\alpha /2)-\alpha /2=\mathbb {P} (Y\geq \chi _{1-\alpha /2,n-1}^{2})-\mathbb {P} (Y\geq \chi _{\alpha /2,n-1}^{2})=\mathbb {P} (\chi _{1-\alpha /2,n-1}^{2}\leq Y<\chi _{\alpha /2,n-1}^{2})=\mathbb {P} (\chi _{1-\alpha /2,n-1}^{2}\leq Y\leq \chi _{\alpha /2,n-1}^{2}).$ ^[5] Then, we have ${\begin{aligned}1-\alpha &=\mathbb {P} \left(\chi _{1-\alpha /2,n-1}^{2}\leq {\frac {nS^{2}}{\sigma ^{2}}}\leq \chi _{\alpha /2,n-1}^{2}\right)\\&=\mathbb {P} \left({\frac {\chi _{1-\alpha /2,n-1}^{2}}{nS^{2}}}\leq {\frac {1}{\sigma ^{2}}}\leq {\frac {\chi _{\alpha /2,n-1}^{2}}{nS^{2}}}\right)\\&=\mathbb {P} \left({\frac {nS^{2}}{\chi _{1-\alpha /2,n-1}^{2}}}\geq \sigma ^{2}\geq {\frac {nS^{2}}{\chi _{\alpha /2,n-1}^{2}}}\right)\\&=\mathbb {P} \left({\frac {nS^{2}}{\chi _{\alpha /2,n-1}^{2}}}\leq \sigma ^{2}\leq {\frac {nS^{2}}{\chi _{1-\alpha /2,n-1}^{2}}}\right).\\\end{aligned}}$ The result follows.

$\Box$

Example.

A candy company has recently launched a new type of chocolate, where each chocolate is supposed to weigh 10g. To carry out quality control (QC) on the production process of a batch of the chocolates, the company takes a random sample of 20 chocolates from a factory producing this type of chocolate. After weighing these 20 chocolates, it is found that their sample standard deviation is 0.03g. To pass the QC, the standard deviation of the weight of the whole batch of chocolates, $\sigma$ , should not exceed 0.5% of the weight each chocolate is supposed to weigh, with 99% confidence (based on the above construction of confidence interval). Assume the distribution of the weight is normal.

Can the QC be passed?

Solution. Since $\chi _{0.005,19}^{2}\approx 38.582$ and $\chi _{0.995,19}^{2}\approx 6.844$ , a 99% confidence interval for $\sigma ^{2}$ is $\left[{\frac {20(0.03)^{2}}{38.582}},{\frac {20(0.03)^{2}}{6.844}}\right]\approx [0.000467,0.00263].$ Since the square root function is increasing on $[0,\infty )$ , the event that $\sigma ^{2}$ lies between the two bounds is the same as the event that $\sigma$ lies between their positive square roots, so a 99% confidence interval for $\sigma$ can be obtained by taking the positive square roots of both the lower and upper bounds of the above confidence interval. Thus, a 99% confidence interval for $\sigma$ is $[0.0216,0.0513].$ Since $10\times 0.5\%=0.05$ , and some values in this confidence interval for $\sigma$ exceed 0.05, the QC cannot be passed.
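A short Python sketch of this calculation, with the chi-squared percentiles passed in as the table values quoted above (the helper name `variance_interval` is hypothetical):

```python
import math

def variance_interval(n, s, chi2_upper_half, chi2_lower_half):
    """[n s^2 / chi^2_{alpha/2,n-1}, n s^2 / chi^2_{1-alpha/2,n-1}], s with divisor n."""
    ns2 = n * s ** 2
    return ns2 / chi2_upper_half, ns2 / chi2_lower_half

# 99% level: chi^2_{0.005,19} ~ 38.582 and chi^2_{0.995,19} ~ 6.844
var_lo, var_hi = variance_interval(20, 0.03, 38.582, 6.844)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)  # square root is increasing
print(round(var_lo, 6), round(var_hi, 5), round(sd_lo, 4), round(sd_hi, 4))
print("QC passed:", sd_hi <= 10 * 0.005)
```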

Exercise.

(a) What is the maximum/minimum value of the sample standard deviation of the 20 chocolates to pass the QC?

(b) Suppose the requirement to pass the QC becomes less strict. Can the QC be passed if

(i) the "0.5%" is increased to "1%";

(ii) the "99% confidence" is decreased to "95% confidence"?

Solution

(a) To pass the QC, the upper bound of the 99% confidence interval for $\sigma$ should be at most 0.05. Hence, ${\sqrt {\frac {20s^{2}}{6.844}}}\leq 0.05\implies s\leq 0.0292\;({\text{approximately}})$ (we have $s\geq 0$ , so we consider the positive square root only). Thus, the maximum value of the sample standard deviation is 0.0292g (approximately).

(b) (i) In this case, since $10\times 1\%=0.1$ , and no value in the above 99% confidence interval for $\sigma$ exceeds 0.1, the QC can be passed.

(ii) Since $\chi _{0.025,19}^{2}\approx 32.852$ and $\chi _{0.975,19}^{2}\approx 8.907$ , a 95% confidence interval for $\sigma ^{2}$ is $\left[{\frac {20(0.03)^{2}}{32.852}},{\frac {20(0.03)^{2}}{8.907}}\right]\approx [0.000548,0.00202].$ Hence, the corresponding 95% confidence interval for $\sigma$ is $[0.0234,0.0450].$ Since no value in this confidence interval exceeds 0.05, the QC can be passed.
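The exercise answers can be reproduced with a short sketch. The QC is passed when the upper bound ${\sqrt {ns^{2}/\chi _{1-\alpha /2,n-1}^{2}}}$ for $\sigma$ does not exceed the threshold; rearranging gives the largest admissible $s$ as the threshold times ${\sqrt {\chi _{1-\alpha /2,n-1}^{2}/n}}$ . The helper names are hypothetical, and the chi-squared values are the table values quoted above.

```python
import math

def sd_upper_bound(n, s, chi2_lower_half):
    """Upper end of the confidence interval for sigma: sqrt(n s^2 / chi^2_{1-alpha/2,n-1})."""
    return math.sqrt(n * s ** 2 / chi2_lower_half)

def max_sample_sd(n, threshold, chi2_lower_half):
    """Largest s whose upper bound for sigma is still at most the threshold."""
    return threshold * math.sqrt(chi2_lower_half / n)

print(round(max_sample_sd(20, 0.05, 6.844), 4))   # (a): 99% confidence, 0.5% of 10 g
print(sd_upper_bound(20, 0.03, 6.844) <= 0.1)     # (b)(i): threshold 1% of 10 g
print(sd_upper_bound(20, 0.03, 8.907) <= 0.05)    # (b)(ii): 95% confidence
```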

Remark.

Notice that the (sample) mean is not considered in the above calculations. Indeed, it does not play any role in the above construction of confidence interval, so it is not important in this context.

Ratio of variances of two independent normal distributions

Similar to the case for means, we would sometimes also like to compare the variances of two normal distributions. One may naturally expect that we should construct a confidence interval for the difference in variances, similar to the case for means. However, there is no simple way to do this, since we do not have results that help with this construction. Therefore, we need an alternative way to compare the variances, without using the difference in variances. Can you suggest a way?

Recall the definition of efficiency in point estimation. Efficiency gives us a nice way to compare two variances without considering their difference, where the ratio of two variances is considered. Fortunately, we have some results that help us to construct a confidence interval for the ratio of two variances.

Recall the definition of $F$ -distribution: if $U\sim \chi _{\color {red}\nu _{1}}^{2}$ and $V\sim \chi _{\color {blue}\nu _{2}}^{2}$ are independent, then ${\frac {U/{\color {red}\nu _{1}}}{V/{\color {blue}\nu _{2}}}}$ follows the $F$ -distribution with ${\color {red}\nu _{1}}$ and ${\color {blue}\nu _{2}}$ degrees of freedom, denoted by $F_{{\color {red}\nu _{1}},{\color {blue}\nu _{2}}}$ . From the definition of $F$ -distribution, we can see that it involves a ratio of two independent chi-squared random variables. How can it be linked to the ratio of two variances?

Recall that we have ${\frac {nS^{2}}{\sigma ^{2}}}\sim \chi _{n-1}^{2}$ with some suitable assumptions. This connects the variance with the chi-squared random variable, and thus we can use this property together with the definition of $F$ -distribution to construct a pivotal quantity, and hence a confidence interval.

Let us introduce a notation before discussing the construction of confidence interval:

$F_{\alpha ,\nu _{1},\nu _{2}}$ is the upper percentile of $F_{\nu _{1},\nu _{2}}$ at level $\alpha$ , i.e. it satisfies $\mathbb {P} (X\geq F_{\alpha ,\nu _{1},\nu _{2}})=\alpha$ where $X\sim F_{\nu _{1},\nu _{2}}$ .

Some values of $F_{\alpha ,\nu _{1},\nu _{2}}$ can be found in $F$ -tables (there are different $F$ -tables for different values of $\alpha$ ; in each table, the columns correspond to the first degrees of freedom and the rows to the second degrees of freedom). Also, using the property that $F_{\alpha ,\nu _{1},\nu _{2}}={\frac {1}{F_{1-\alpha ,\nu _{2},\nu _{1}}}}$ , we can obtain some more values of $F_{\alpha ,\nu _{1},\nu _{2}}$ which are not included in the $F$ -tables.

Theorem. (Confidence interval of $\sigma _{X}^{2}/\sigma _{Y}^{2}$ ) Let $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}m}$ be random samples from two independent normal distributions ${\mathcal {N}}(\mu _{X},\sigma _{X}^{2})$ and ${\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2})$ respectively. Then, a $1-\alpha$ confidence interval for $\sigma _{X}^{2}/\sigma _{Y}^{2}$ is $\left[{\frac {n(m-1)S_{X}^{2}}{m(n-1)S_{Y}^{2}}}\cdot F_{1-\alpha /2,m-1,n-1},{\frac {n(m-1)S_{X}^{2}}{m(n-1)S_{Y}^{2}}}\cdot F_{\alpha /2,m-1,n-1}\right]$ where $S_{X}^{2}$ and $S_{Y}^{2}$ are the sample variances of $X_{1},\dotsc ,X_{\color {darkgreen}n}$ and $Y_{1},\dotsc ,Y_{\color {darkgreen}m}$ respectively.

Remark.

The corresponding interval estimate is $\left[{\frac {n(m-1)s_{X}^{2}}{m(n-1)s_{Y}^{2}}}\cdot F_{1-\alpha /2,m-1,n-1},{\frac {n(m-1)s_{X}^{2}}{m(n-1)s_{Y}^{2}}}\cdot F_{\alpha /2,m-1,n-1}\right]$ , with observed values $S_{X}^{2}=s_{X}^{2}$ and $S_{Y}^{2}=s_{Y}^{2}$ .

Proof. By the assumptions, we have ${\frac {nS_{X}^{2}}{\sigma _{X}^{2}}}\sim \chi _{n-1}^{2}{\text{ and }}{\frac {mS_{Y}^{2}}{\sigma _{Y}^{2}}}\sim \chi _{m-1}^{2},$ and these two random variables are independent since the two random samples are independent. Thus, by the definition of $F$ -distribution, we have ${\frac {mS_{Y}^{2}}{\sigma _{Y}^{2}(m-1)}}{\Bigg /}{\frac {nS_{X}^{2}}{\sigma _{X}^{2}(n-1)}}={\frac {m(n-1)S_{Y}^{2}\sigma _{X}^{2}}{n(m-1)S_{X}^{2}\sigma _{Y}^{2}}}\sim F_{m-1,n-1},$ which is a pivotal quantity of $\sigma _{X}^{2}/\sigma _{Y}^{2}$ . Hence, we have ${\begin{aligned}1-\alpha &=(1-\alpha /2)-\alpha /2\\&=\mathbb {P} \left(F_{1-\alpha /2,m-1,n-1}\leq {\frac {m(n-1)S_{Y}^{2}\sigma _{X}^{2}}{n(m-1)S_{X}^{2}\sigma _{Y}^{2}}}\leq F_{\alpha /2,m-1,n-1}\right)\\&=\mathbb {P} \left({\frac {n(m-1)S_{X}^{2}}{m(n-1)S_{Y}^{2}}}\cdot F_{1-\alpha /2,m-1,n-1}\leq {\frac {\sigma _{X}^{2}}{\sigma _{Y}^{2}}}\leq {\frac {n(m-1)S_{X}^{2}}{m(n-1)S_{Y}^{2}}}\cdot F_{\alpha /2,m-1,n-1}\right),\\\end{aligned}}$ as desired.

$\Box$

Apart from using this confidence interval to compare variances (or standard deviations), it can also be useful to justify some assumptions about variances. Let us illustrate these two usages in the following examples.

Example. (Comparison of standard deviations) An economist wants to compare the severity of the income inequality in countries X and Y. Using the Gini coefficient for the comparison is a common approach, but neither country publishes its Gini coefficient or any other measure of income inequality. Thus, the economist decides to compare the severity of the income inequality in countries X and Y by conducting a survey asking citizens in both countries for their monthly income (in USD), and then comparing the standard deviation of the income in country X, $\sigma _{X}$ , with that in country Y, $\sigma _{Y}$ .

The following is summary for the results from the survey: ${\begin{array}{ccc}{\text{Country}}&{\text{Number of respondents}}&{\text{Sample standard deviation of the income (in USD)}}\\\hline {\text{X}}&25&s_{X}=1200\\{\text{Y}}&13&s_{Y}=1320\\\end{array}}$

(a) Construct a 95% confidence interval for ${\frac {\sigma _{X}}{\sigma _{Y}}}$ .

(b) The economist will consider the income inequality in one country to be at least as severe as that in another country if the standard deviation of the income in the first country is greater than or equal to that in the other country. Can the economist be 95% confident that the income inequality in country X is at least as severe as that in country Y?

Solution.

(a) Since $F_{0.025,13-1,25-1}\approx 2.5411$ (see the column for first degree of freedom 12 and the row for second degree of freedom 24 in the $F$ -table for $\alpha =0.025$ ), and $F_{0.975,13-1,25-1}={\frac {1}{F_{0.025,25-1,13-1}}}\approx {\frac {1}{3.0187}}\approx 0.331$ (by the property of the $F$ -distribution), a 95% confidence interval for ${\frac {\sigma _{X}^{2}}{\sigma _{Y}^{2}}}$ is $\left[{\frac {25(13-1)(1200^{2})}{13(25-1)(1320^{2})}}\cdot 0.331,{\frac {25(13-1)(1200^{2})}{13(25-1)(1320^{2})}}\cdot 2.5411\right]\approx [0.263,2.019].$ Taking the positive square roots of the lower and upper bounds of the above confidence interval (this is valid since the square root function is increasing, as in the case of a single variance), a 95% confidence interval for ${\frac {\sigma _{X}}{\sigma _{Y}}}$ is $[0.513,1.421].$
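A Python sketch of part (a), with the two $F$ percentiles passed in as the table values quoted above and $F_{0.975,12,24}$ obtained from the reciprocal property. The helper name `variance_ratio_interval` is hypothetical; note that country X has $n=25$ respondents and country Y has $m=13$ .

```python
import math

def variance_ratio_interval(n, m, s_x, s_y, f_lower, f_upper):
    """Interval for sigma_X^2 / sigma_Y^2 from the theorem above (divisor-n sample SDs).

    f_lower = F_{1-alpha/2, m-1, n-1} and f_upper = F_{alpha/2, m-1, n-1}.
    """
    c = n * (m - 1) * s_x ** 2 / (m * (n - 1) * s_y ** 2)
    return c * f_lower, c * f_upper

f_upper = 2.5411       # F_{0.025,12,24} from an F-table
f_lower = 1 / 3.0187   # F_{0.975,12,24} = 1 / F_{0.025,24,12}
ratio_lo, ratio_hi = variance_ratio_interval(25, 13, 1200, 1320, f_lower, f_upper)
sd_ratio = (math.sqrt(ratio_lo), math.sqrt(ratio_hi))
print(round(ratio_lo, 3), round(ratio_hi, 3), [round(v, 3) for v in sd_ratio])
```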

(b) No, since $\sigma _{X}\geq \sigma _{Y}\Leftrightarrow {\frac {\sigma _{X}}{\sigma _{Y}}}\geq 1$ , and the confidence interval in (a) contains some values less than 1.

Exercise. What is the minimum/maximum value of $s_{X}$ for the economist to be 95% confident that the income inequality in country X is at least as severe as that in country Y?

Solution

For the economist to be 95% confident that the income inequality in country X is at least as severe as that in country Y, the lower bound of the above 95% confidence interval should be at least 1. That is, ${\sqrt {{\frac {25(13-1)s_{X}^{2}}{13(25-1)(1320^{2})}}\cdot 0.331}}\geq 1\implies s_{X}\geq 2339.79\;({\text{approximately}}).$ Hence, the minimum value of $s_{X}$ is (approximately) 2339.79.
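Solving the inequality above for $s_{X}$ gives $s_{X}\geq s_{Y}{\sqrt {\frac {13(25-1)}{25(13-1)\cdot 0.331}}}$ . A minimal sketch of this rearrangement (the helper name `min_s_x` is hypothetical):

```python
import math

def min_s_x(n, m, s_y, f_lower):
    """Smallest s_X making the lower bound of the CI for sigma_X / sigma_Y at least 1."""
    return s_y * math.sqrt(m * (n - 1) / (n * (m - 1) * f_lower))

print(round(min_s_x(25, 13, 1320, 0.331), 2))
```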

Example. (Justification of assumptions about variance) A statistics question is given to each student in two groups of high school students from high schools X and Y, and the time taken to finish the question (in minutes) is summarized as follows: ${\begin{array}{cccc}{\text{High school}}&{\text{Number of students in the group}}&{\text{Sample mean of the time}}&{\text{Sample standard deviation of the time}}\\\hline {\text{X}}&n=13&{\overline {x}}=134&s_{X}=39\\{\text{Y}}&m=21&{\overline {y}}=79&s_{Y}=14\\\end{array}}$ Assume the distribution of the time is normal. Is using the $1-\alpha$ confidence interval $\left[({\overline {x}}-{\overline {y}})-t_{\alpha /2,n+m-2}{\sqrt {{\frac {ns_{X}^{2}+ms_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}},({\overline {x}}-{\overline {y}})+t_{\alpha /2,n+m-2}{\sqrt {{\frac {ns_{X}^{2}+ms_{Y}^{2}}{n+m-2}}\left({\frac {1}{n}}+{\frac {1}{m}}\right)}}\right]$ "reasonable" for estimating the difference between the mean time taken to finish the question by all students in high school X and that in high school Y?

Solution. For the estimation to be reasonable, we should be able to assume that the population variances of the time taken in high schools X and Y, $\sigma _{X}^{2}$ and $\sigma _{Y}^{2}$ respectively, are the same (this is the assumption for constructing the given confidence interval).

To be able to assume that the variances are equal, we need to be quite confident that the ratio ${\frac {\sigma _{X}^{2}}{\sigma _{Y}^{2}}}$ is "close to" 1. In other words, there should be a $1-\alpha$ confidence interval for ${\frac {\sigma _{X}^{2}}{\sigma _{Y}^{2}}}$ that covers the value 1 and whose width is sufficiently small. (Recall that as $\alpha$ increases, the width becomes smaller. A rule of thumb for the width to be "sufficiently small" while the confidence is still "sufficiently large" is to take $\alpha =0.1$ ^[6]. Notice that for the confidence interval to cover 1, $\alpha$ is required to be smaller than or equal to a certain value: as $\alpha$ gets smaller, the confidence interval gets wider, and it eventually covers 1 for a certain value of $\alpha$ .)

In this case, a $1-\alpha$ confidence interval for ${\frac {\sigma _{X}^{2}}{\sigma _{Y}^{2}}}$ is $\left[{\frac {13(20)(39^{2})}{21(12)(14^{2})}}\cdot F_{1-\alpha /2,12,20},{\frac {13(20)(39^{2})}{21(12)(14^{2})}}\cdot F_{\alpha /2,12,20}\right]\approx \left[8.007F_{1-\alpha /2,12,20},8.007F_{\alpha /2,12,20}\right]=\left[{\frac {8.007}{F_{\alpha /2,20,12}}},8.007F_{\alpha /2,12,20}\right].$ For this confidence interval to contain 1, we need to have ${\begin{cases}{\frac {8.007}{F_{\alpha /2,20,12}}}&\leq 1\\8.007F_{\alpha /2,12,20}&\geq 1\\\end{cases}}\implies {\begin{cases}F_{\alpha /2,20,12}&\geq 8.007&(1)\\F_{\alpha /2,12,20}&\geq 0.125&(2)\\\end{cases}}$ For (1) to hold, $\alpha$ needs to be "very small" ( $F_{0.01,20,12}\approx 3.858$ , which is still quite small compared to 8.007, so $\alpha /2$ must be smaller than 0.01, i.e. $\alpha <0.02$ ) ^[7]. Hence, to satisfy both inequalities, $\alpha$ needs to be very small. In other words, in order for the $1-\alpha$ confidence interval to contain 1, $\alpha$ needs to be very small, which means the width of the confidence interval is very large. Therefore, we cannot be confident that the ratio ${\frac {\sigma _{X}^{2}}{\sigma _{Y}^{2}}}$ is close to 1. Hence, we are not able to assume that the variances are equal.
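A quick numerical check of condition (1): the constant multiplying the $F$ percentiles is about 8.007, and even at $\alpha =0.02$ (so $\alpha /2=0.01$ , using the table value $F_{0.01,20,12}\approx 3.858$ ) the lower bound of the interval is still far above 1. The variable names are illustrative.

```python
n, m, s_x, s_y = 13, 21, 39, 14
# constant n(m-1)s_X^2 / (m(n-1)s_Y^2) multiplying the F percentiles
c = n * (m - 1) * s_x ** 2 / (m * (n - 1) * s_y ** 2)
f_001_20_12 = 3.858                      # F_{0.01,20,12} from an F-table
lower_bound_alpha_002 = c / f_001_20_12  # lower bound 8.007 / F_{alpha/2,20,12} at alpha = 0.02
print(round(c, 3), round(lower_bound_alpha_002, 3), lower_bound_alpha_002 <= 1)
```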

Remark.

Graphically, the inequalities look like:
$F_{\alpha /2,20,12}\geq 8.007$

|                  
|     #     
|   #    #  
|  #        #     area = alpha/2 is very small
| #            #   | 
|#               |#v 
|                |//#
*----------------*----
                8.007

$F_{\alpha /2,12,20}\geq 0.125$

|                  
|   #       
|  //////#      area = alpha/2 is very large
| #/////////#    |                            
| |/////////// # v   
|#|///////////////#  
| |/////////////////#
*-*-------------------
0.125              8.007

Approximated confidence intervals for means

Previously, the population distributions were assumed to be normal, but distributions are often not normal in reality. So, does it mean our previous discussions are meaningless in practice? No. The discussions are still quite meaningful, since we can use the central limit theorem to "connect" the distributions in reality (which are usually not normal) to the normal distribution. Through this, we can construct approximate confidence intervals, since we use the central limit theorem for approximation.

To be more precise, recall that the central limit theorem suggests that ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\;{\overset {d}{\to }}\;Z\sim {\mathcal {N}}(0,1)$ under suitable assumptions. Therefore, if the sample size $n$ is large enough (a rule of thumb: at least 30), then ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}$ approximately follows the standard normal distribution. Hence, ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}$ is (approximately) a pivotal quantity. Recall from the property of normal distribution that if $X_{1},\dotsc ,X_{n}$ is a random sample from ${\mathcal {N}}(\mu ,\sigma ^{2})$ , then we have ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\sim {\mathcal {N}}(0,1)$ exactly (not approximately), and we have used this as the pivotal quantity for the confidence interval for the mean when the variance is known, and also for the confidence interval for $\mu _{X}-\mu _{Y}$ when $\sigma _{D}^{2}$ is known. Therefore, we can use basically the same confidence intervals in these cases, but we need to notice that such confidence intervals are approximate, not exact, since we have used the central limit theorem for constructing the pivotal quantity.

Now, what about the other confidence intervals, whose pivotal quantities are "not in this form"? For the confidence interval for the difference in means of two independent samples when the variances are known, the pivotal quantity is similar in some sense: ${\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\sim {\mathcal {N}}(0,1)$ (see the corresponding theorem for the meaning of the notation). Can we use the central limit theorem to conclude that, when the distributions involved are not normal (but the two samples are still independent) and the sample sizes $n$ and $m$ are both large enough, ${\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\sim {\mathcal {N}}(0,1)$ approximately? The answer is yes. For the proof, see the following exercise.

Exercise. Use the central limit theorem to prove that when the distributions involved are not normal (but the two samples are still independent) and the sample sizes $n$ and $m$ are both large enough, ${\frac {({\overline {X}}-{\overline {Y}})-(\mu _{X}-\mu _{Y})}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}}\sim {\mathcal {N}}(0,1)$ approximately.

Solution

Proof. Under the assumptions, by the central limit theorem, ${\overline {X}}\sim {\mathcal {N}}(\mu _{X},\sigma _{X}^{2}/n)$ approximately and ${\overline {Y}}\sim {\mathcal {N}}(\mu _{Y},\sigma _{Y}^{2}/m)$ approximately. Since the two samples are independent, so are these approximating distributions, and the property of normal distributions gives ${\overline {X}}-{\overline {Y}}\sim {\mathcal {N}}(\mu _{X}-\mu _{Y},\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m)$ approximately. Standardizing then gives the result, in the same way as in the earlier proof of this result for normal distributions.

$\Box$

As a result, we can again use essentially the same confidence interval in this case, but of course it is only approximated.
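A minimal sketch of the resulting interval for two independent samples with known population standard deviations, $\left[({\overline {x}}-{\overline {y}})\mp z_{\alpha /2}{\sqrt {\sigma _{X}^{2}/n+\sigma _{Y}^{2}/m}}\right]$. The function name and its parameters are hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist


def two_sample_z_interval(x, y, sigma_x, sigma_y, alpha=0.05):
    """Approximated 1 - alpha CI for mu_X - mu_Y from two independent samples,
    assuming the population standard deviations sigma_x, sigma_y are known."""
    n, m = len(x), len(y)
    diff = sum(x) / n - sum(y) / m                  # xbar - ybar
    se = math.sqrt(sigma_x**2 / n + sigma_y**2 / m)  # sd of Xbar - Ybar
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    return diff - z * se, diff + z * se
```

Only the sample means enter the centre of the interval, while the known variances set its width; with non-normal data the coverage is approximate and improves as both $n$ and $m$ grow.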

Some confidence intervals have not been considered yet. Let us first consider the confidence interval for the mean when the variance is unknown.

Recall that we have mentioned that we can simply replace " $\sigma$ " by " $S$ ", which is quite intuitive in view of the weak law of large numbers. But why is this justified? Consider the following theorem.

Theorem. (Approximated confidence interval for mean when variance is unknown). Let $X_{1},\dotsc ,X_{n}$ be a random sample from a certain distribution, with finite mean and variance. When the variance is unknown and the sample size $n$ is large (at least 30), an approximated $1-\alpha$ confidence interval for the mean is $\left[{\overline {X}}-z_{\alpha /2}{\frac {S}{\sqrt {n}}},{\overline {X}}+z_{\alpha /2}{\frac {S}{\sqrt {n}}}\right].$

Remark.

The corresponding interval estimate is $\left[{\overline {x}}-z_{\alpha /2}{\frac {s}{\sqrt {n}}},{\overline {x}}+z_{\alpha /2}{\frac {s}{\sqrt {n}}}\right]$ , with observed values ${\overline {X}}={\overline {x}}$ and $S=s$ .
This result also applies to the confidence interval for $\mu _{X}-\mu _{Y}$ (paired samples) when $\sigma _{D}^{2}$ is unknown: replacing " $\sigma _{D}$ " by $S_{D}$ in the confidence interval for known $\sigma _{D}^{2}$ gives an approximated confidence interval.
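A minimal sketch of the interval estimate $\left[{\overline {x}}\mp z_{\alpha /2}s/{\sqrt {n}}\right]$ and of its paired-samples version, which applies the same formula to the differences $d_{i}=x_{i}-y_{i}$. All function names are hypothetical. By default the sketch computes $s$ with divisor $n$, matching the coin and red-ball calculations later in this section; this is an assumption, and `ddof=1` switches to divisor $n-1$, which makes little difference for large $n$.

```python
import math
from statistics import NormalDist


def sample_sd(data, ddof=0):
    """Sample standard deviation with divisor n - ddof (ddof=0: divisor n)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((v - mean) ** 2 for v in data) / (n - ddof))


def approx_mean_interval(data, alpha=0.05, ddof=0):
    """Approximated 1 - alpha interval estimate xbar -/+ z_{alpha/2} s / sqrt(n)."""
    n = len(data)
    xbar = sum(data) / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sample_sd(data, ddof) / math.sqrt(n)
    return xbar - half, xbar + half


def approx_paired_interval(x, y, alpha=0.05, ddof=0):
    """Paired samples: apply the same interval to D_i = X_i - Y_i."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    return approx_mean_interval([a - b for a, b in zip(x, y)], alpha, ddof)
```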

Proof. Under the assumption that the random sample has finite mean and variance, the weak law of large numbers gives $S^{2}\;{\overset {p}{\to }}\;\sigma ^{2}$ (as shown earlier), and the continuous mapping theorem (applied with the square root function) then gives $S\;{\overset {p}{\to }}\;\sigma$ . Hence, ${\frac {\sigma }{S}}\;{\overset {p}{\to }}\;{\frac {\sigma }{\sigma }}=1$ (since $\sigma >0$ ) by the properties of convergence in probability.

By the central limit theorem, we have ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\;{\overset {d}{\to }}\;Z\sim {\mathcal {N}}(0,1)$ . Thus, ${\frac {{\overline {X}}-\mu }{S/{\sqrt {n}}}}={\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\cdot {\frac {\sigma }{S}}\;{\overset {d}{\to }}\;Z\sim {\mathcal {N}}(0,1)$ by Slutsky's theorem.

Therefore, ${\frac {{\overline {X}}-\mu }{S/{\sqrt {n}}}}$ is a pivotal quantity that approximately follows ${\mathcal {N}}(0,1)$ . Notice that this approximate distribution, ${\mathcal {N}}(0,1)$ , is the same as the distribution of the pivotal quantity ${\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}$ used for the confidence interval for $\mu$ when $\sigma ^{2}$ is known. As a result, we can follow similar steps to obtain the approximated confidence interval, with " $\sigma$ " replaced by " $S$ ".

$\Box$
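A small illustrative sketch (an addition, not from the original proof) of the key step ${\frac {\sigma }{S}}\;{\overset {p}{\to }}\;1$ : it samples from the exponential distribution with rate 1 (so $\sigma =1$ ) and prints $\sigma /S$ for increasing sample sizes. The population, the seed and the divisor-$n$ definition of $S$ are assumptions made for the illustration.

```python
import math
import random


def sigma_over_s(n, seed=0):
    """Return sigma / S for a sample of size n from Exp(1), where sigma = 1.
    S is computed with divisor n (an assumption; n - 1 behaves the same for large n)."""
    rng = random.Random(seed)
    sample = [rng.expovariate(1.0) for _ in range(n)]
    mean = sum(sample) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in sample) / n)
    return 1.0 / s


for n in (10, 100, 10_000, 200_000):
    print(n, round(sigma_over_s(n), 4))  # ratios settle near 1 as n grows
```

For small $n$ the ratio can be far from 1, which is one reason the large-sample rule of thumb is needed before replacing $\sigma$ by $S$ .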

So far, we have not discussed how to construct an approximated confidence interval for $\mu _{X}-\mu _{Y}$ when $\sigma _{X}^{2}=\sigma _{Y}^{2}=\sigma ^{2}$ is unknown, nor approximated confidence intervals for variances. The pivotal quantities used in those cases rely on results that hold only for normal distributions, so none of them works when the distributions involved are not normal. Therefore, there is no simple way to carry out such constructions.

The following table summarizes the approximated $1-\alpha$ confidence intervals in different cases: ${\begin{array}{c|c|c}{\big (}{\text{approximated }}(1-\alpha ){\text{ confidence intervals}}{\big )}&{\text{mean}}&{\text{difference in means}}\\\hline {\text{known variance}}&\left[{\overline {X}}-z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}},{\overline {X}}+z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}\right]&\left[({\overline {X}}-{\overline {Y}})-z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}},({\overline {X}}-{\overline {Y}})+z_{\alpha /2}{\sqrt {{\frac {\sigma _{X}^{2}}{n}}+{\frac {\sigma _{Y}^{2}}{m}}}}\right]{\text{ OR }}\left[{\overline {D}}-z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}},{\overline {D}}+z_{\alpha /2}{\frac {\sigma _{D}}{\sqrt {n}}}\right]({\text{paired samples}})\\\hline {\text{unknown variance}}&\left[{\overline {X}}-z_{\alpha /2}{\frac {S}{\sqrt {n}}},{\overline {X}}+z_{\alpha /2}{\frac {S}{\sqrt {n}}}\right]&\left[{\overline {D}}-z_{\alpha /2}{\frac {S_{D}}{\sqrt {n}}},{\overline {D}}+z_{\alpha /2}{\frac {S_{D}}{\sqrt {n}}}\right]({\text{paired samples}})\\\end{array}}$

Remark.

We will more often use the "unknown variance" row of this table for constructing approximated confidence intervals, since the variances are usually unknown in these cases.

Example. Let $X_{1},\dotsc ,X_{30}$ be a random sample from the Bernoulli distribution $\operatorname {Ber} (p)$ . Suppose it is observed that ${\overline {x}}=0.45$ and $s=0.12$ . Construct an approximated 95% confidence interval for the mean of the Bernoulli distribution, $p$ .

Solution. The variance of the Bernoulli distribution is $p(1-p)$ . Since $p$ is unknown (it is the quantity we want to estimate), the variance is unknown as well. Hence, we use the above approximated confidence interval for the mean when the variance is unknown.

Since $z_{0.025}\approx 1.96$ , an approximated 95% confidence interval for $p$ is $\left[0.45-z_{0.025}{\frac {0.12}{\sqrt {30}}},0.45+z_{0.025}{\frac {0.12}{\sqrt {30}}}\right]\approx \left[0.45-1.96\cdot {\frac {0.12}{\sqrt {30}}},0.45+1.96\cdot {\frac {0.12}{\sqrt {30}}}\right]\approx [0.407,0.493].$
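The arithmetic above can be reproduced with a short helper that builds the interval from summary statistics; the helper name is hypothetical, and $z_{0.025}$ is rounded to 1.96 as in the text.

```python
import math


def interval_from_summary(xbar, s, n, z):
    """[xbar - z*s/sqrt(n), xbar + z*s/sqrt(n)] from summary statistics."""
    half = z * s / math.sqrt(n)
    return xbar - half, xbar + half


lo, hi = interval_from_summary(0.45, 0.12, 30, z=1.96)
print(round(lo, 3), round(hi, 3))  # 0.407 0.493
```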

Exercise. You are given a (fair or unfair) coin, and you want to estimate the probability that heads comes up, denoted by $p$ . Define a random variable $X$ such that $X=1$ if heads comes up and $X=0$ otherwise (we assume the coin never lands on its edge). Suppose you toss the coin 100 times independently, and let $X_{1},\dotsc ,X_{100}$ be the independent random sample corresponding to these 100 tosses, each with the same distribution as $X$ . After tossing the coin 100 times, heads comes up in 68 tosses and tails in 32 tosses. Construct an approximated 90% confidence interval for $p$ . What does it suggest about the coin?

Solution

Notice that $X$ follows the Bernoulli distribution $\operatorname {Ber} (p)$ , so the population mean is $p$ , which is what we want to estimate. Also, the population variance $p(1-p)$ is unknown.

Based on the results of the 100 tosses, 68 of $X_{1},\dotsc ,X_{100}$ equal one and 32 of them equal zero. As a result, the sample mean is ${\overline {x}}={\frac {68(1)+32(0)}{100}}=0.68$ , and the sample standard deviation is $s={\sqrt {\frac {68(1-68/100)^{2}+32(0-68/100)^{2}}{100}}}={\sqrt {0.2176}}\approx 0.4665$ .

Since $z_{0.05}\approx 1.64$ , an approximated 90% confidence interval for $p$ is $\left[0.68-1.64\cdot {\frac {0.4665}{\sqrt {100}}},0.68+1.64\cdot {\frac {0.4665}{\sqrt {100}}}\right]\approx [0.603,0.757],$ which suggests, with 90% confidence, that the coin is biased toward heads, since every value in the interval exceeds 0.5.
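A short sketch that recomputes this exercise directly from the 0/1 outcomes, using the divisor-100 sample standard deviation as in the solution and $z_{0.05}\approx 1.64$ . The variable names are illustrative.

```python
import math

tosses = [1] * 68 + [0] * 32          # 1 = heads, 0 = tails
n = len(tosses)
xbar = sum(tosses) / n                  # 0.68
s = math.sqrt(sum((x - xbar) ** 2 for x in tosses) / n)  # sqrt(0.2176), about 0.4665
z = 1.64                                # z_{0.05}, rounded as in the text
half = z * s / math.sqrt(n)
lo, hi = xbar - half, xbar + half
print(round(lo, 3), round(hi, 3))       # 0.603 0.757
```

Note that $s^{2}=0.2176$ is the sample variance; the interval must use its square root $s\approx 0.4665$ .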

Let us consider an application of the approximated confidence intervals.

Proposition. (Confidence interval for probability) Let $X$ be a random variable, and $p=\mathbb {P} (X\in S)$ where $S$ is a set of real numbers. Define another (Bernoulli) random variable $\xi =\mathbf {1} \{X\in S\}$ ^[8]. Let $X_{1},\dotsc ,X_{n}$ be a random sample with the same distribution as $X$ , and let $\xi _{1},\dotsc ,\xi _{n}$ be the corresponding independent random sample $\mathbf {1} \{X_{1}\in S\},\dotsc ,\mathbf {1} \{X_{n}\in S\}$ . When the sample size $n$ is large (at least 30), an approximated $1-\alpha$ confidence interval for $p$ is $\left[{\overline {\xi }}-z_{\alpha /2}{\frac {S_{\xi }}{\sqrt {n}}},{\overline {\xi }}+z_{\alpha /2}{\frac {S_{\xi }}{\sqrt {n}}}\right],$ where ${\overline {\xi }}$ and $S_{\xi }$ are the sample mean and sample standard deviation of $\xi _{1},\dotsc ,\xi _{n}$ respectively.

Remark.

We may regard the event $\{X\in S\}$ as "success". Then $p$ is the probability of "success", and ${\overline {\xi }}$ is the relative frequency of "success" in the sample (i.e. the ratio of the number of "successes" to the sample size).

${\overline {\xi }}$ is the relative frequency because, for each $i\in \{1,\dotsc ,n\}$ , $\xi _{i}$ equals one when the $i$ th outcome is a "success" and zero otherwise. Hence, the sample sum $\sum _{i=1}^{n}\xi _{i}$ counts the "success" outcomes, and dividing it by $n$ gives the relative frequency.

Often, the probability $p$ is interpreted as a proportion in a large population, for example, the proportion of people labelled with "success" in that population.

Proof. Since $\xi =\mathbf {1} \{X\in S\}$ , by the fundamental bridge between probability and expectation, we have $\mathbb {E} [\xi ]=\mathbb {E} [\mathbf {1} \{X\in S\}]=\mathbb {P} (X\in S)=p.$

Since $\xi$ follows the Bernoulli distribution $\operatorname {Ber} (p)$ , its variance $p(1-p)$ is unknown. Applying the result for constructing an approximated confidence interval for the mean when the variance is unknown, an approximated $1-\alpha$ confidence interval for $p$ is $\left[{\overline {\xi }}-z_{\alpha /2}{\frac {S_{\xi }}{\sqrt {n}}},{\overline {\xi }}+z_{\alpha /2}{\frac {S_{\xi }}{\sqrt {n}}}\right].$

$\Box$
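A minimal sketch of the proposition: the set $S$ is represented by a hypothetical predicate `in_set`, each observation is turned into the indicator $\xi _{i}=\mathbf {1} \{X_{i}\in S\}$ , and the unknown-variance interval is applied to the indicators. The divisor-$n$ standard deviation, the function names and the simulated usage (with $X\sim$ Exp(1) and $S=(2,\infty )$ , so that $p=e^{-2}\approx 0.135$ ) are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def probability_interval(sample, in_set, alpha=0.05):
    """Approximated 1 - alpha CI for p = P(X in S).

    `in_set` is a hypothetical predicate playing the role of the set S:
    it returns True when an observation lies in S."""
    xi = [1 if in_set(x) else 0 for x in sample]   # xi_i = 1{X_i in S}
    n = len(xi)
    xi_bar = sum(xi) / n                            # relative frequency of "success"
    s_xi = math.sqrt(sum((v - xi_bar) ** 2 for v in xi) / n)  # divisor n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * s_xi / math.sqrt(n)
    return xi_bar - half, xi_bar + half


# Usage: X ~ Exp(1) and S = (2, infinity), so the true p is exp(-2), about 0.135.
rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(500)]
print(probability_interval(sample, lambda x: x > 2))
```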

Example. Consider the above proposition. Show that the sample variance $S_{\xi }^{2}={\overline {\xi }}(1-{\overline {\xi }})$ .

Proof. We will use the result that $S_{\xi }^{2}={\overline {\xi ^{2}}}-\left({\overline {\xi }}\right)^{2}$ . First, we have ${\overline {\xi ^{2}}}={\frac {\sum _{i=1}^{n}\xi _{i}^{2}}{n}}.$ For each $i\in \{1,\dotsc ,n\}$ , we have $\xi _{i}^{2}=\xi _{i}$ , since

case 1: $\xi _{i}=1$ . Then, $\xi _{i}^{2}=1=\xi _{i}$ .
case 2: $\xi _{i}=0$ . Then, $\xi _{i}^{2}=0=\xi _{i}$ .

Hence, ${\overline {\xi ^{2}}}={\frac {\sum _{i=1}^{n}\xi _{i}^{2}}{n}}={\frac {\sum _{i=1}^{n}\xi _{i}}{n}}={\overline {\xi }}.$ It follows that $S_{\xi }^{2}={\overline {\xi }}-\left({\overline {\xi }}\right)^{2}={\overline {\xi }}(1-{\overline {\xi }}).$

$\Box$
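The identity $S_{\xi }^{2}={\overline {\xi }}(1-{\overline {\xi }})$ can be checked numerically with exact rational arithmetic, so no rounding error can hide a mismatch. The sketch assumes, as in the proof, that $S_{\xi }^{2}$ has divisor $n$ ; the helper name is hypothetical.

```python
import random
from fractions import Fraction


def check_identity(xi):
    """True if the divisor-n sample variance of 0/1 data equals xi_bar * (1 - xi_bar)."""
    n = len(xi)
    xi_bar = Fraction(sum(xi), n)
    s2 = sum((Fraction(v) - xi_bar) ** 2 for v in xi) / n
    return s2 == xi_bar * (1 - xi_bar)


rng = random.Random(0)
for n in (1, 2, 7, 30, 100):
    xi = [rng.randint(0, 1) for _ in range(n)]
    assert check_identity(xi)
```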

Example. A box contains a large, unknown number of balls, some of which are red. To estimate the proportion of red balls in the box, we draw a single ball at random, note whether it is red and put it back into the box, repeating this 100 times. Suppose we get 7 red balls in these 100 draws. Construct an approximated 95% confidence interval for the proportion of red balls in the box.

Solution. From the given result, we have ${\overline {\xi }}=0.07$ , and thus $S_{\xi }^{2}={\overline {\xi }}(1-{\overline {\xi }})=0.07(0.93)=0.0651$ . Since $z_{0.025}\approx 1.96$ , an approximated 95% confidence interval for the proportion is $\left[0.07-1.96\cdot {\frac {\sqrt {0.0651}}{\sqrt {100}}},0.07+1.96\cdot {\frac {\sqrt {0.0651}}{\sqrt {100}}}\right]\approx [0.01999,0.12001].$
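A short sketch reproducing this computation from the counts, using the identity $S_{\xi }^{2}={\overline {\xi }}(1-{\overline {\xi }})$ from the previous example and $z_{0.025}\approx 1.96$ . The function name is hypothetical.

```python
import math


def red_ball_interval(reds, draws, z=1.96):
    """Approximated CI for the proportion of red balls from `reds` out of `draws`."""
    xi_bar = reds / draws
    s2 = xi_bar * (1 - xi_bar)          # S_xi^2 = xi_bar (1 - xi_bar)
    half = z * math.sqrt(s2) / math.sqrt(draws)
    return xi_bar - half, xi_bar + half


lo, hi = red_ball_interval(7, 100)
print(round(lo, 5), round(hi, 5))       # 0.01999 0.12001
```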

Exercise. Suppose we repeat the drawing process 10000 times, and we get 700 red balls in these 10000 draws. Construct an approximated 95% confidence interval for the proportion of the red balls in the box.

Solution

Notice that we also have ${\overline {\xi }}=0.07$ and $S_{\xi }^{2}=0.0651$ in this case. Hence, an approximated 95% confidence interval for the proportion is $\left[0.07-1.96\cdot {\frac {\sqrt {0.0651}}{\sqrt {10000}}},0.07+1.96\cdot {\frac {\sqrt {0.0651}}{\sqrt {10000}}}\right]\approx [0.064999,0.075001].$
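The two red-ball intervals have the same centre and the same $S_{\xi }$ , so their widths differ only through the factor $1/{\sqrt {n}}$ in the half-width $z_{\alpha /2}S_{\xi }/{\sqrt {n}}$ : 100 times as many draws give an interval about ${\sqrt {100}}=10$ times narrower. A small sketch checking this (the helper name is hypothetical):

```python
import math


def half_width(reds, draws, z=1.96):
    """Half-width z * S_xi / sqrt(n) of the approximated interval, with S_xi^2 = xi_bar (1 - xi_bar)."""
    xi_bar = reds / draws
    return z * math.sqrt(xi_bar * (1 - xi_bar)) / math.sqrt(draws)


w_small = half_width(7, 100)
w_large = half_width(700, 10_000)
print(w_small / w_large)  # about 10, i.e. sqrt(10_000 / 100)
```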


[1] In more complicated cases, the coverage probability may vary with $\theta$ , i.e. it is a function of $\theta$ rather than a constant.
[2] No, since if this is the case, then $\mathbb {P} (\theta \in \mathbb {R} )=\mathbb {P} (\theta \leq T_{1})+\mathbb {P} (\theta >T_{2})=\mathbb {P} (\theta \leq T_{1})+\mathbb {P} (\theta \geq T_{2})=0.025+0.025=0.05\neq 1.$
[3] Usually, we choose $a$ and $b$ such that $\mathbb {P} (Q(\mathbf {X} ,\theta )<a)=\alpha /2$ and $\mathbb {P} (Q(\mathbf {X} ,\theta )>b)=\alpha /2$ for convenience (if the pdf of $Q(\mathbf {X} ,\theta )$ is symmetric about $x=0$ , then $b=-a$ ).
[4] Although this assumption may not make sense (the time spent cannot be negative, while the support of a normal distribution is $\mathbb {R}$ ), we use it for illustration purposes. Nevertheless, if the mean of the normal distribution is "positive enough", then the probability of getting negative values is very low, close to 0 anyway. Also, as we will see in the last section, whatever the underlying distribution is (provided it has finite variance), we can use the central limit theorem to construct an approximated confidence interval, provided that the sample size is large enough.
[5] We need to do this since the chi-squared distribution is not symmetric about $x=0$ . Graphically, it looks like

|         area: 1-a
|     #    |
|   #....# v
|  # .......#   
| # |..........#
|#  |..........|  #
*---*----------*------
chi^2 1-a/2  chi^2 a/2

[6] Of course, this "cutoff" value is somewhat subjective, and different people may have different opinions about it. But in the examples here, the conditions imposed on $\alpha$ will be quite "extreme", so that the decision is clear.
[7] For (2) to hold, $\alpha$ can take a wide range of values ( $F_{0.1,12,20}\approx 1.89236$ , which is still much greater than 0.125).
[8] $\xi$ is the Greek letter xi, which may be regarded as "the Greek letter corresponding to x".

