# Statistics/Hypothesis Testing

## Introduction

In previous chapters, we have discussed two methods for estimating unknown parameters, namely point estimation and interval estimation. Estimating unknown parameters is an important area in statistical inference, and in this chapter we will discuss another important area, namely hypothesis testing, which is related to decision making. Indeed, the concepts of confidence intervals and hypothesis testing are closely related, as we will demonstrate.

## Basic concepts and terminologies

Before discussing how to conduct a hypothesis test and how to evaluate its "goodness", let us first introduce some basic concepts and terminology related to hypothesis testing.

Definition. (Hypothesis) A (statistical) hypothesis is a statement about population parameter(s).

There are two terms that classify hypotheses:

Definition. (Simple and composite hypothesis) A hypothesis is a simple hypothesis if it completely specifies the distribution of the population (that is, the distribution is completely known, without any unknown parameters involved), and is a composite hypothesis otherwise.

Sometimes, it is not immediately clear whether a hypothesis is simple or composite. To understand the classification of hypotheses more clearly, let us consider the following example.

Example. Consider a distribution with parameter ${\displaystyle \theta }$ , taking values in the parameter space ${\displaystyle \Theta =[0,\infty )}$ . Determine whether each of the following hypotheses is simple or composite.

(a) ${\displaystyle \theta =1}$ .

(b) ${\displaystyle \theta =\theta _{0}}$  where ${\displaystyle \theta _{0}\in \Theta }$  is known.

(c) ${\displaystyle \theta >1}$ .

(d) ${\displaystyle \theta >\theta _{0}}$  where ${\displaystyle \theta _{0}\in \Theta }$  is known.

(e) ${\displaystyle \theta \leq \theta _{0}}$  where ${\displaystyle \theta _{0}\in \Theta }$  is known.

(f) ${\displaystyle \theta \in \Theta _{0}}$  where ${\displaystyle \Theta _{0}}$  is a nonempty subset of ${\displaystyle \Theta }$ . [1]

Solution.

• (a) and (b) are simple hypotheses, since each of them completely specifies the distribution.
• (c), (d) and (e) are composite hypotheses, since the parameter ${\displaystyle \theta }$  is not completely specified, and hence neither is the distribution.
• (f) may be a simple or a composite hypothesis, depending on ${\displaystyle \Theta _{0}}$ . If ${\displaystyle \Theta _{0}}$  contains exactly one element, it is a simple hypothesis; otherwise, it is a composite hypothesis.

In hypothesis tests, we consider two hypotheses:

Definition. (Null hypothesis and alternative hypothesis) In hypothesis testing, the hypothesis being tested is the null hypothesis (denoted by ${\displaystyle H_{0}}$ ) and another complementary hypothesis (to ${\displaystyle H_{0}}$ ) is the alternative hypothesis (denoted by ${\displaystyle H_{1}}$ ).

Remark.

• ${\displaystyle H_{1}}$  is complementary to ${\displaystyle H_{0}}$  in the sense that if ${\displaystyle H_{0}}$  is true (false), then ${\displaystyle H_{1}}$  is false (true) (exactly one of ${\displaystyle H_{0}}$  and ${\displaystyle H_{1}}$  is true). Because of this, we usually say ${\displaystyle H_{0}}$  is tested against ${\displaystyle H_{1}}$  (so we often write ${\displaystyle H_{0}\quad {\text{vs.}}\quad H_{1}}$ ).
• Usually, ${\displaystyle H_{0}}$  corresponds to the status quo ("no effect"), and ${\displaystyle H_{1}}$  corresponds to some interesting "research findings" (so ${\displaystyle H_{1}}$  is sometimes also called the research hypothesis).
• Since ${\displaystyle H_{0}}$  often corresponds to the status quo, we usually assume ${\displaystyle H_{0}}$  is true unless there is sufficient evidence against it.
• This is somewhat analogous to the legal principle of presumption of innocence, which states that every person accused of a crime is considered innocent (${\displaystyle H_{0}}$  is assumed to be true) until proven guilty (there is sufficient evidence against ${\displaystyle H_{0}}$ ).

A general form of ${\displaystyle H_{0}}$  and ${\displaystyle H_{1}}$  is ${\displaystyle H_{0}:\theta \in \Theta _{0}}$  and ${\displaystyle H_{1}:\theta \in \Theta _{1}}$  where ${\displaystyle \Theta _{1}=\Theta _{0}^{c}}$ , which is the complement of ${\displaystyle \Theta _{0}}$  (with respect to ${\displaystyle \Theta }$ ), i.e., ${\displaystyle \Theta _{0}^{c}=\Theta \setminus \Theta _{0}}$  (${\displaystyle \Theta }$  is the parameter space, containing all possible values of ${\displaystyle \theta }$ ). The reason for choosing the complement of ${\displaystyle \Theta _{0}}$  in ${\displaystyle H_{1}}$  is that ${\displaystyle H_{1}}$  is the complementary hypothesis to ${\displaystyle H_{0}}$ , as suggested in the above definition.

Remark.

• In some books, it is only required that ${\displaystyle \Theta _{0}}$  and ${\displaystyle \Theta _{1}}$  be disjoint (nonempty) subsets of the parameter space ${\displaystyle \Theta }$ , and it is not necessary that ${\displaystyle \Theta _{0}\cup \Theta _{1}=\Theta }$ .
• However, it is usually still assumed that exactly one of ${\displaystyle H_{0}}$  and ${\displaystyle H_{1}}$  is true, which means ${\displaystyle \theta }$  is not supposed to take values outside the set ${\displaystyle \Theta _{0}\cup \Theta _{1}}$  (otherwise, neither ${\displaystyle H_{0}}$  nor ${\displaystyle H_{1}}$  would be true).
• Thus, in this case, we may regard ${\displaystyle \Theta _{0}\cup \Theta _{1}}$  as the effective parameter space (since ${\displaystyle \theta }$  is assumed to take values in this union), and with respect to this parameter space, ${\displaystyle \Theta _{1}}$  is the complement of ${\displaystyle \Theta _{0}}$ .
• Alternatively, some may view the parameter space as "linked" with a distribution, so that for a given distribution the parameter space is fixed to be the one suggested by the distribution itself. In this case, ${\displaystyle \Theta _{1}}$  need not be the complement of ${\displaystyle \Theta _{0}}$  (with respect to the parameter space).
• Despite these different definitions of ${\displaystyle \Theta _{0}}$  and ${\displaystyle \Theta _{1}}$ , a common feature is that we assume exactly one of ${\displaystyle H_{0}}$  and ${\displaystyle H_{1}}$  is true.

Example. Suppose your friend gives you a coin for tossing, and you do not know whether it is fair or not. However, since the coin is given by your friend, you believe that the coin is fair unless there is sufficient evidence suggesting otherwise. What are the null and alternative hypotheses in this context (suppose the coin never lands on its edge)?

Solution. Let ${\displaystyle p}$  be the probability for landing on heads after tossing the coin. The null hypothesis is ${\displaystyle H_{0}:p={\frac {1}{2}}}$ . The alternative hypothesis is ${\displaystyle H_{1}:p\neq {\frac {1}{2}}}$ .

Exercise. Suppose we replace "coin" with "six-sided die" in the above question. What are the null and alternative hypotheses? (Hint: You may let ${\displaystyle p_{1},p_{2},\dotsc ,p_{6}}$  be the probabilities for "1", "2", ..., "6" coming up after rolling the die respectively.)

Solution

Let ${\displaystyle p_{1},p_{2},\dotsc ,p_{6}}$  be the probabilities for "1", "2", ..., "6" coming up after rolling the die respectively. The null hypothesis is ${\displaystyle H_{0}:p_{1}=p_{2}=\dotsb =p_{6}={\frac {1}{6}}}$ , and the alternative hypothesis is ${\displaystyle H_{1}:{\text{at least one of }}p_{1},\dotsc ,p_{6}\neq {\frac {1}{6}}}$ . (In fact, when one of ${\displaystyle p_{1},\dotsc ,p_{6}}$  differs from ${\displaystyle {\frac {1}{6}}}$ , at least one other probability must also differ from ${\displaystyle {\frac {1}{6}}}$ , since the probabilities sum to 1.)

We have mentioned that exactly one of ${\displaystyle H_{0}}$  and ${\displaystyle H_{1}}$  is assumed to be true. To make a decision, we need to decide which hypothesis should be regarded as true. Of course, as one may expect, this decision is not perfect, and errors will be involved. So, we cannot say we "prove" that a particular hypothesis is true (that is, we cannot be certain that a particular hypothesis is true). Despite this, we may "regard" (or "accept") a particular hypothesis as true (but not prove it as true) when we have sufficient evidence leading us to make this decision (ideally, with small error probabilities [2]).

Remark.

• Philosophically, "not rejecting ${\displaystyle H_{0}}$ " is different from "accepting ${\displaystyle H_{0}}$ ": the former can mean that we do not actually regard ${\displaystyle H_{0}}$  as true but merely lack sufficient evidence to reject it, whereas "accepting ${\displaystyle H_{0}}$ " means that we regard ${\displaystyle H_{0}}$  as true.
• In spite of this, we will not dwell on these philosophical issues. We will simply assume that whenever there is insufficient evidence to reject ${\displaystyle H_{0}}$  (i.e., we do not reject ${\displaystyle H_{0}}$ ), we act as if ${\displaystyle H_{0}}$  is true, that is, we still accept ${\displaystyle H_{0}}$ , even if we may not actually "believe" in it.
• Of course, in some other places, the phrase "accepting the null hypothesis" is avoided because of these philosophical issues.

Now, we face two questions. First, what evidence should we consider? Second, what is meant by "sufficient"? For the first question, a natural answer is that we should consider the observed samples: we are making hypotheses about the population, and the samples are taken from, and thus closely related to, the population, so they should help us make the decision.

To answer the second question, we need the concepts of hypothesis testing. In particular, we will construct a so-called rejection region or critical region to help us determine whether we should reject the null hypothesis (i.e., regard ${\displaystyle H_{0}}$  as false), and hence (naturally) regard ${\displaystyle H_{1}}$  as true ("accept" ${\displaystyle H_{1}}$ ) (we have assumed that exactly one of ${\displaystyle H_{0}}$  and ${\displaystyle H_{1}}$  is true, so when we regard one of them as false, we should regard the other as true). Likewise, when we do not reject ${\displaystyle H_{0}}$ , we will act as if, or accept, ${\displaystyle H_{0}}$  as true (and thus also reject ${\displaystyle H_{1}}$ , since exactly one of ${\displaystyle H_{0}}$  and ${\displaystyle H_{1}}$  is true).

Let us formally define the terms related to hypothesis testing in the following.

Definition. (Hypothesis test) A hypothesis test is a rule that specifies for which observed sample values we (do not reject and) accept ${\displaystyle H_{0}}$  as true (and thus reject ${\displaystyle H_{1}}$ ), and for which observed sample values we reject ${\displaystyle H_{0}}$  and accept ${\displaystyle H_{1}}$ .

Remark.

• A hypothesis test is sometimes simply called a "test" for simplicity. We also sometimes use the Greek letters "${\displaystyle \varphi }$ ", "${\displaystyle \psi }$ ", etc. to denote tests.

Definition. (Rejection and acceptance regions) Let ${\displaystyle S}$  be the set containing all possible observations ${\displaystyle \mathbf {x} =(x_{1},\dotsc ,x_{n})}$  of a random sample ${\displaystyle X_{1},\dotsc ,X_{n}}$ . The rejection region (denoted by ${\displaystyle R}$ ) is the subset of ${\displaystyle S}$  for which ${\displaystyle H_{0}}$  is rejected. The complement of the rejection region with respect to ${\displaystyle S}$  (${\displaystyle R^{c}}$ ) is the acceptance region (it is thus the subset of ${\displaystyle S}$  for which ${\displaystyle H_{0}}$  is accepted).

Remark.

• Graphically, it looks like
    S
*------------*
|///|........|
|///\........|
|////\.......|
|/////\......|
*------------*

*--*
|//|: R
*--*

*--*
|..|: R^c
*--*


Typically, we use a test statistic (a statistic for conducting a hypothesis test) to specify the rejection region. For instance, if the random sample is ${\displaystyle X_{1},\dotsc ,X_{n}}$  and the test statistic is ${\displaystyle {\overline {X}}}$ , the rejection region may be, say, ${\displaystyle R=\{\mathbf {x} :{\overline {x}}<2\}}$  (where ${\displaystyle x_{1},\dotsc ,x_{n}}$  and ${\displaystyle {\overline {x}}}$  are the observed values of ${\displaystyle X_{1},\dotsc ,X_{n}}$  and ${\displaystyle {\overline {X}}}$  respectively). Through this, we can directly construct a hypothesis test: when ${\displaystyle \mathbf {x} \in R}$ , we reject ${\displaystyle H_{0}}$  and accept ${\displaystyle H_{1}}$ ; otherwise, if ${\displaystyle \mathbf {x} \in R^{c}}$ , we accept ${\displaystyle H_{0}}$ . So, in general, to specify the rule in a hypothesis test, we just need a rejection region. After that, we apply the test to test ${\displaystyle H_{0}}$  against ${\displaystyle H_{1}}$ . There are some terminologies related to hypothesis tests constructed in this way:

Definition. (Left-, right- and two-tailed tests) Let ${\displaystyle T(\mathbf {x} )=T(x_{1},\dotsc ,x_{n})}$  be the observed test statistic for a hypothesis test, where ${\displaystyle x_{1},\dotsc ,x_{n}}$  are the realizations of the random sample.

• If the rejection region is in the form of ${\displaystyle \{\mathbf {x} :T(\mathbf {x} )\leq k_{1}\}}$ , then the hypothesis test is called a left-tailed test (or lower-tailed test).
• If the rejection region is in the form of ${\displaystyle \{\mathbf {x} :T(\mathbf {x} )\geq k_{2}\}}$ , then the hypothesis test is called a right-tailed test (or upper-tailed test).
• If the rejection region is in the form of ${\displaystyle \{\mathbf {x} :T(\mathbf {x} )\leq k_{3}{\text{ or }}T(\mathbf {x} )\geq k_{4}\}}$ , then the hypothesis test is called a two-tailed test.

Remark.

• The inequality signs can be strict, i.e., the above inequality signs can be replaced by "${\displaystyle <}$ " and "${\displaystyle >}$ ".
• We use the terminology "tail" since the rejection region includes the values that are located at the "extreme portions" (i.e., very left (with small values) or very right (with large values) portions) (called tails) of distributions.
• When ${\displaystyle k_{3}=-k_{4}}$ , we may say the two-tailed test is equal-tailed. In this case, we can also express the rejection region as ${\displaystyle \{\mathbf {x} :|T(\mathbf {x} )|\geq k_{4}\}}$ .
• We sometimes also call upper-tailed and lower-tailed tests as one-sided tests, and two-tailed tests as two-sided tests.

Example. Suppose the rejection region is ${\displaystyle R=\{(x_{1},x_{2},x_{3}):x_{1}+x_{2}+x_{3}>6\}}$ , and it is observed that ${\displaystyle x_{1}=1,x_{2}=2,x_{3}=3}$ . Which hypothesis, ${\displaystyle H_{0}}$  or ${\displaystyle H_{1}}$ , should we accept?

Solution. Since ${\displaystyle (x_{1},x_{2},x_{3})\in R^{c}}$ , we should (not reject and) accept ${\displaystyle H_{0}}$ .

Exercise. What is the type of this hypothesis test?

Solution

Right-tailed test.
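As a sketch (the function name is illustrative, not from the text), the rule "reject ${\displaystyle H_{0}}$  if and only if the observation falls in ${\displaystyle R}$ " can be encoded directly as a predicate on the observed sample:

```python
# Hypothetical sketch: the rejection region R = {(x1, x2, x3) : x1 + x2 + x3 > 6}
# expressed as a Python predicate on the observed sample.

def in_rejection_region(x):
    """Return True if the observed sample x falls in R (i.e., reject H0)."""
    return sum(x) > 6

observed = (1, 2, 3)                  # the realizations from the example
print(in_rejection_region(observed))  # False: we do not reject (and accept) H0
```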

As we have mentioned, the decisions made by a hypothesis test are not perfect, and errors occur. Indeed, on closer thought, there are two types of errors, as follows:

Definition. (Type I and II errors) A type I error is the rejection of ${\displaystyle H_{0}}$  when ${\displaystyle H_{0}}$  is true. A type II error is the acceptance of ${\displaystyle H_{0}}$  when ${\displaystyle H_{0}}$  is false.

We can illustrate these two types of errors more clearly using the following table.

Type I and II errors

| | Accept ${\displaystyle H_{0}}$ | Reject ${\displaystyle H_{0}}$ |
|---|---|---|
| ${\displaystyle H_{0}}$  is true | Correct decision | Type I error |
| ${\displaystyle H_{0}}$  is false | Type II error | Correct decision |

We can express ${\displaystyle H_{0}:\theta \in \Theta _{0}}$  and ${\displaystyle H_{1}:\theta \in \Theta _{0}^{c}}$ . Also, assume the rejection region is ${\displaystyle R=R(\mathbf {X} )}$  (i.e., the rejection region with "${\displaystyle x}$ " replaced by "${\displaystyle X}$ "). In general, when "${\displaystyle R}$ " is put together with "${\displaystyle X}$ ", we assume ${\displaystyle R=R(\mathbf {X} )}$ .

Then we have some notations and expressions for probabilities of making type I and II errors: (let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be a random sample and ${\displaystyle \mathbf {X} =(X_{1},\dotsc ,X_{n})}$ )

• The probability of making a type I error, denoted by ${\displaystyle \alpha (\theta )}$ , is ${\displaystyle \mathbb {P} _{\theta }(\mathbf {X} \in R)}$  if ${\displaystyle \theta \in \Theta _{0}}$ .
• The probability of making a type II error, denoted by ${\displaystyle \beta (\theta )}$ , is ${\displaystyle \mathbb {P} _{\theta }(\mathbf {X} \in R^{c})=1-\mathbb {P} _{\theta }(\mathbf {X} \in R)}$  if ${\displaystyle \theta \in \Theta _{0}^{c}}$ .

Remark.

• Remark on notations: In some other places, ${\displaystyle \alpha (\theta )}$  may be expressed as "${\displaystyle \mathbb {P} (\mathbf {X} \in R|\theta \in \Theta _{0})}$ ", "${\displaystyle \mathbb {P} (\mathbf {X} \in R|H_{0})}$ " or "${\displaystyle \mathbb {P} (\mathbf {X} \in R|H_{0}{\text{ is true}})}$ ". We should be careful that these notations are not supposed to be interpreted as conditional probabilities [3]; they are just notations. This applies similarly to ${\displaystyle \beta (\theta )}$ .
• When ${\displaystyle \Theta _{0}}$  contains a single value only, we simply denote the type I error probability by ${\displaystyle \alpha }$ . Similarly, when ${\displaystyle \Theta _{1}}$  contains a single value only, we simply denote the type II error probability by ${\displaystyle \beta }$ .

Notice that we have a common expression in both ${\displaystyle \alpha (\theta )}$  and ${\displaystyle \beta (\theta )}$ , which is "${\displaystyle \mathbb {P} _{\theta }((X_{1},\dotsc ,X_{n})\in R)}$ ". Indeed, we can also write this expression as

${\displaystyle \mathbb {P} _{\theta }((X_{1},\dotsc ,X_{n})\in R)={\begin{cases}\alpha (\theta ),&\theta \in \Theta _{0};\\1-\beta (\theta ),&\theta \in \Theta _{0}^{c}.\end{cases}}}$

Through this, we can observe that this expression contains all the information about the probabilities of making errors for a hypothesis test with rejection region ${\displaystyle R}$ . Hence, we will give it a special name:

Definition. (Power function) Let ${\displaystyle R}$  be a rejection region of a hypothesis test, and ${\displaystyle X_{1},\dotsc ,X_{n}}$  be a random sample. Then, the power function of the hypothesis test is

${\displaystyle \pi (\theta )=\mathbb {P} _{\theta }((X_{1},\dotsc ,X_{n})\in R)}$

where ${\displaystyle \theta \in \Theta }$ .

Remark.

• "${\displaystyle \pi }$ " can be thought of as the Greek letter "p". We choose ${\displaystyle \pi }$  instead of ${\displaystyle p}$  since "${\displaystyle p}$ " is sometimes used to denote probability (mass or density) functions.
• The power function will be our basis in evaluating the goodness of a test or comparing two different tests.

Example. Suppose we toss a (fair or unfair) coin 5 times (suppose the coin never land on edge), and we have the following hypotheses:

${\displaystyle H_{0}:p\leq {\frac {1}{2}}\quad {\text{vs.}}\quad H_{1}:p>{\frac {1}{2}}}$

where ${\displaystyle p}$  is the probability for landing on heads after tossing the coin. Let ${\displaystyle X_{1},\dotsc ,X_{5}}$  be the random sample for the 5 times of coin tossing, and ${\displaystyle x_{1},\dotsc ,x_{5}}$  be the corresponding realizations. Also, the value of a random sample is 1 if heads come up and 0 otherwise. Suppose we will reject ${\displaystyle H_{0}}$  if and only if heads come up in all 5 coin tosses.

(a) Determine the rejection region ${\displaystyle R}$ .

(b) What is the power function ${\displaystyle \pi (p)}$  (express in terms of ${\displaystyle p}$ )?

(c) Calculate ${\displaystyle \alpha (1/2)}$  and ${\displaystyle \beta (2/3)}$ .

Solution.

(a) The rejection region ${\displaystyle R=\{(x_{1},\dotsc ,x_{5}):x_{1}+\dotsb +x_{5}=5\}}$ .

(b) For every ${\displaystyle p\in [0,1]}$ , the power function is ${\displaystyle \pi (p)=\mathbb {P} _{p}((X_{1},\dotsc ,X_{5})\in R)=\mathbb {P} _{p}(X_{1}+\dotsb +X_{5}=5)=p^{5}.}$

(c) We have ${\displaystyle \alpha (1/2)=\left({\frac {1}{2}}\right)^{5}=0.03125}$  and ${\displaystyle \beta (2/3)=1-\left({\frac {2}{3}}\right)^{5}\approx 0.8683}$ . (Notice that although the probability of type I error can be low, the probability of type II error can be quite high. This is because, intuitively, it is quite "hard" to reject ${\displaystyle H_{0}}$  due to the strict requirement. So, even if ${\displaystyle H_{0}}$  is false, it may not be rejected, causing a type II error.)
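The numbers above can be reproduced with a short sketch (the function name is illustrative): for the test "reject ${\displaystyle H_{0}}$  iff all 5 tosses are heads", the rejection probability is ${\displaystyle \mathbb {P} _{p}(X_{1}+\dotsb +X_{5}=5)=p^{5}}$ , from which both error probabilities follow directly.

```python
# Hypothetical sketch: rejection probability of the "all 5 heads" test.

def reject_prob(p):
    """Probability of rejecting H0 when P(heads) = p, i.e., p**5."""
    return p ** 5

alpha = reject_prob(1 / 2)     # type I error probability at p = 1/2
beta = 1 - reject_prob(2 / 3)  # type II error probability at p = 2/3
print(round(alpha, 5))         # 0.03125
print(round(beta, 4))          # 0.8683
```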

Exercise. Does ${\displaystyle \max _{p\leq {\frac {1}{2}}}\pi (p)}$  exist? If yes, calculate it.

Solution

${\displaystyle \max _{p\leq {\frac {1}{2}}}\pi (p)}$  exists, and ${\displaystyle \max _{p\leq {\frac {1}{2}}}\pi (p)=\max _{p\leq {\frac {1}{2}}}p^{5}=\left({\frac {1}{2}}\right)^{5}={\frac {1}{32}}}$  (notice that ${\displaystyle y=x^{5}}$  is a strictly increasing function).

You notice that the type II error probability of this hypothesis test can be quite large, so you want to revise the test to lower it.

(a) What is ${\displaystyle \beta (p)}$  in the above hypothesis test?

(b) Suppose the rejection region is modified to ${\displaystyle \{(x_{1},\dotsc ,x_{5}):x_{1}+\dotsb +x_{5}\geq 3\}}$ . Calculate ${\displaystyle \alpha (1/2)}$  and ${\displaystyle \beta (2/3)}$ . (Hint: consider binomial distribution.)

(c) Suppose the rejection region is modified to ${\displaystyle \{(x_{1},\dotsc ,x_{5}):x_{1}+\dotsb +x_{5}\geq 2\}}$ . Calculate ${\displaystyle \alpha (1/2)}$  and ${\displaystyle \beta (2/3)}$ .

(d) ${\displaystyle \alpha (1/2)+\beta (2/3)}$  is minimized at which hypothesis test: the original one, the one in (b), or the one in (c)?

Solution

(a) ${\displaystyle \beta (p)=1-\pi (p)=1-p^{5}}$  if ${\displaystyle p>{\frac {1}{2}}}$ .

(b) In this case, we have ${\displaystyle \alpha (1/2)={\binom {5}{3}}\left({\frac {1}{2}}\right)^{3}\left({\frac {1}{2}}\right)^{2}+{\binom {5}{4}}\left({\frac {1}{2}}\right)^{4}\left({\frac {1}{2}}\right)+\left({\frac {1}{2}}\right)^{5}=0.5}$ , and ${\displaystyle \beta (2/3)=1-\left[{\binom {5}{3}}\left({\frac {2}{3}}\right)^{3}\left({\frac {1}{3}}\right)^{2}+{\binom {5}{4}}\left({\frac {2}{3}}\right)^{4}\left({\frac {1}{3}}\right)+\left({\frac {2}{3}}\right)^{5}\right]\approx 0.2099}$ .

(c) In this case, we have ${\displaystyle \alpha (1/2)=0.5+{\binom {5}{2}}\left({\frac {1}{2}}\right)^{2}\left({\frac {1}{2}}\right)^{3}=0.8125}$  and ${\displaystyle \beta (2/3)\approx 0.2099-{\binom {5}{2}}\left({\frac {2}{3}}\right)^{2}\left({\frac {1}{3}}\right)^{3}\approx 0.0453}$ .

(d) At the original one, ${\displaystyle \alpha (1/2)+\beta (2/3)\approx 0.89955}$ , at the one in (b), ${\displaystyle \alpha (1/2)+\beta (2/3)\approx 0.7099}$ , and at the one in (c), ${\displaystyle \alpha (1/2)+\beta (2/3)\approx 0.8578}$ . So, ${\displaystyle \alpha (1/2)+\beta (2/3)}$  is minimized at the one in (b).
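The comparison in (d) can be reproduced with a sketch using only the standard library (the helper names are illustrative): each rejection region has the form ${\displaystyle \{\mathbf {x} :x_{1}+\dotsb +x_{5}\geq c\}}$  with ${\displaystyle c=5,3,2}$ , so ${\displaystyle \alpha (1/2)}$  and ${\displaystyle \beta (2/3)}$  are Binomial(5, p) tail sums.

```python
# Sketch of the comparison in (d): rejection regions {x : x1+...+x5 >= c}
# for c = 5 (original), 3 and 2; alpha and beta are binomial tail sums.
from math import comb

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def alpha_beta(c):
    """alpha(1/2) and beta(2/3) for the region {x : x1 + ... + x5 >= c}."""
    alpha = sum(binom_pmf(5, k, 1 / 2) for k in range(c, 6))
    beta = sum(binom_pmf(5, k, 2 / 3) for k in range(0, c))
    return alpha, beta

for c in (5, 3, 2):
    a, b = alpha_beta(c)
    print(c, round(a + b, 4))  # the sum alpha + beta is smallest at c = 3
```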

Example. Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be a random sample from the normal distribution ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$  where ${\displaystyle \sigma ^{2}}$  is known. Consider the following hypotheses:

${\displaystyle H_{0}:\mu \leq \mu _{0}\quad {\text{vs.}}\quad H_{1}:\mu >\mu _{0}}$

where ${\displaystyle \mu _{0}}$  is a constant. We use the test statistic ${\displaystyle T={\frac {{\overline {X}}-\mu _{0}}{\sigma /{\sqrt {n}}}}}$  (which follows the ${\displaystyle {\mathcal {N}}(0,1)}$  distribution when ${\displaystyle \mu =\mu _{0}}$ ) for the hypothesis test, and we reject ${\displaystyle H_{0}}$  if and only if ${\displaystyle T\geq k}$ .

Find the power function ${\displaystyle \pi (\mu )}$ , ${\displaystyle \lim _{\mu \to -\infty }\pi (\mu )}$ , and ${\displaystyle \lim _{\mu \to \infty }\pi (\mu )}$ .

Solution. The power function is

{\displaystyle {\begin{aligned}\pi (\mu )&=\mathbb {P} _{\mu }(T\geq k)\\&=\mathbb {P} _{\mu }\left({\frac {{\overline {X}}-\mu _{0}}{\sigma /{\sqrt {n}}}}\geq k\right)\\&=\mathbb {P} _{\mu }\left({\frac {{\overline {X}}-\mu +\mu -\mu _{0}}{\sigma /{\sqrt {n}}}}\geq k\right)\\&=\mathbb {P} _{\mu }\left({\frac {{\overline {X}}-\mu }{\sigma /{\sqrt {n}}}}\geq k+{\frac {\mu _{0}-\mu }{\sigma /{\sqrt {n}}}}\right)\\&=\mathbb {P} \left(Z\geq k+{\frac {\mu _{0}-\mu }{\sigma /{\sqrt {n}}}}\right)&&(Z\sim {\mathcal {N}}(0,1){\text{, whose distribution does not depend on }}\mu {\text{, so we can drop the subscript }}\mu {\text{ from }}\mathbb {P} )\end{aligned}}}

Thus, ${\displaystyle \lim _{\mu \to -\infty }\pi (\mu )=\mathbb {P} (Z\geq \infty )=0}$  and ${\displaystyle \lim _{\mu \to \infty }\pi (\mu )=\mathbb {P} (Z\geq -\infty )=1}$  (with some abuse of notation), by the properties of the standard normal cdf. (Indeed, ${\displaystyle \pi (\mu )}$  is a strictly increasing function of ${\displaystyle \mu }$ .)

Exercise. Show that ${\displaystyle \pi (\mu _{0})=\alpha }$  if ${\displaystyle \mathbb {P} (Z\geq k)=\alpha }$ .

Solution

Proof. Assume ${\displaystyle \mathbb {P} (Z\geq k)=\alpha }$ . Then, ${\displaystyle \pi (\mu _{0})=\mathbb {P} (Z\geq k+0)=\mathbb {P} (Z\geq k)=\alpha }$ .

${\displaystyle \Box }$
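The behaviour of ${\displaystyle \pi (\mu )}$  derived above can be checked numerically with a sketch (the parameter values ${\displaystyle \mu _{0}=0}$ , ${\displaystyle \sigma =1}$ , ${\displaystyle n=25}$  and ${\displaystyle k=1.645}$  are illustrative assumptions, not from the text):

```python
# Hypothetical sketch of the power function
#   pi(mu) = P(Z >= k + (mu0 - mu) / (sigma / sqrt(n))),
# with illustrative values mu0 = 0, sigma = 1, n = 25, k = 1.645.
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z), built from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power(mu, mu0=0.0, sigma=1.0, n=25, k=1.645):
    return 1 - std_normal_cdf(k + (mu0 - mu) / (sigma / sqrt(n)))

# pi is strictly increasing in mu, tends to 0 and 1 at the extremes,
# and pi(mu0) = P(Z >= k) = alpha.
print(round(power(-10.0), 4))  # 0.0
print(round(power(0.0), 3))    # 0.05
print(round(power(10.0), 4))   # 1.0
```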

Ideally, we want to make both ${\displaystyle \alpha (\theta )}$  and ${\displaystyle \beta (\theta )}$  arbitrarily small. But this is generally impossible. To understand this, we can consider the following extreme examples:

• Set the rejection region ${\displaystyle R}$  to be ${\displaystyle S=\{\mathbf {x} \}}$ , which is the set of all possible observations of random samples. Then, ${\displaystyle \pi (\theta )=1}$  for each ${\displaystyle \theta \in \Theta }$ . From this, of course we have ${\displaystyle \beta (\theta )=0}$ , which is nice. But the serious problem is that ${\displaystyle \alpha (\theta )=1}$  due to the mindless rejection.
• Another extreme is setting the rejection region ${\displaystyle R}$  to be the empty set ${\displaystyle \varnothing }$ . Then, ${\displaystyle \pi (\theta )=0}$  for each ${\displaystyle \theta \in \Theta }$ . From this, we have ${\displaystyle \alpha (\theta )=0}$ , which is nice. But, again the serious problem is that ${\displaystyle \beta (\theta )=1}$  due to the mindless acceptance.

We can observe that making ${\displaystyle \alpha (\theta )}$  (respectively ${\displaystyle \beta (\theta )}$ ) very small inevitably causes ${\displaystyle \beta (\theta )}$  (respectively ${\displaystyle \alpha (\theta )}$ ) to increase, due to accepting (rejecting) "too much". As a result, we can only try to minimize the probability of making one type of error while keeping the probability of making the other type controlled.

Now, we are interested in which type of error should be controlled. To motivate the choice, we can again consider the analogy with the legal principle of presumption of innocence. In this analogy, a type I error means convicting an innocent person, and a type II error means acquitting a guilty person. Then, as suggested by Blackstone's ratio, the type I error is more serious and important than the type II error. This motivates us to control the probability of type I error, i.e., ${\displaystyle \alpha (\theta )}$ , at a specified small value ${\displaystyle \alpha ^{*}}$ , so that we control the probability of making this more serious error. After that, among the tests that control the type I error probability at this level, the one with the smallest ${\displaystyle \beta (\theta )}$  is the "best" one (in the sense of error probabilities).

To describe "control the type I error probability at this level" in a more precise way, let us define the following term.

Definition. (Size of a test) A test with power function ${\displaystyle \pi (\theta )}$  is a size ${\displaystyle \alpha }$  test if

${\displaystyle \sup _{\theta \in \Theta _{0}}\pi (\theta )=\alpha }$

where ${\displaystyle 0\leq \alpha \leq 1}$ .

Remark.

• Supremum is similar to maximum, and in "nice" situations (you may assume the situations here are "nice") the supremum is the same as the maximum. Hence, choosing the supremum of ${\displaystyle \pi (\theta )}$  over ${\displaystyle \theta \in \Theta _{0}}$  as the size of a test means that the size gives its maximum probability of type I error (rejecting ${\displaystyle H_{0}}$  when ${\displaystyle H_{0}}$  is true), considering all possible values of ${\displaystyle \theta }$  that make ${\displaystyle H_{0}}$  true.
• Intuitively, we choose the maximum probability of type I error as the size so that the size tells us how probable a type I error is in the worst situation, i.e., how "well" the test can control the type I error [4].
• Special case: if ${\displaystyle \Theta _{0}}$  contains a single parameter only, say (a known value) ${\displaystyle \theta _{0}}$  (i.e., ${\displaystyle H_{0}}$  is a simple hypothesis which states that ${\displaystyle \theta =\theta _{0}}$ ), then ${\displaystyle \alpha =\pi (\theta _{0})}$ .
• ${\displaystyle \alpha }$  is also called the level of significance or significance level (these terms are related to the concept of statistical (in)significance, which is in turn related to the concept of ${\displaystyle p}$ -value. We will discuss these later.).
• The "${\displaystyle \alpha }$ " here and the "${\displaystyle \alpha }$ " in the confidence coefficient can actually be interpreted as the "same", by connecting confidence intervals with hypothesis testing. We will discuss these later.
• Because of this definition, the null hypothesis conventionally contains an equality (i.e. in the form of ${\displaystyle \theta =\theta _{0},\theta \geq \theta _{0}}$  or ${\displaystyle \theta \leq \theta _{0}}$ ), since the size of the test can be calculated more conveniently if this is the case.

So, using this definition, controlling the type I error probability at a particular level ${\displaystyle \alpha }$  means that the size of the test should not exceed ${\displaystyle \alpha }$ , i.e., ${\displaystyle \sup _{\theta \in \Theta _{0}}\pi (\theta )\leq \alpha }$  (in some other places, such a test is called a level ${\displaystyle \alpha }$  test).

Example. Consider the normal distribution ${\displaystyle {\mathcal {N}}(\mu ,1)}$  (with the parameter space: ${\displaystyle \Theta =\{\mu :\mu =20{\text{ or }}21\}}$ ) , and the hypotheses

${\displaystyle H_{0}:\mu =20\quad {\text{vs.}}\quad H_{1}:\mu =21}$

Let ${\displaystyle X_{1},\dotsc ,X_{10}}$  be a random sample from the normal distribution ${\displaystyle {\mathcal {N}}(\mu ,1)}$ , and the corresponding realizations are ${\displaystyle x_{1},\dotsc ,x_{10}}$ . Suppose the rejection region is ${\displaystyle \{(x_{1},\dotsc ,x_{10}):{\overline {x}}\geq k\}}$ .

(a) Find ${\displaystyle k}$  such that the significance level of the test is ${\displaystyle \alpha =0.05}$ .

(b) Calculate the type II error probability ${\displaystyle \beta }$ . To have the type II error probability ${\displaystyle \beta \leq 0.05}$ , what is the minimum sample size (with the same rejection region)?

Solution.

(a) In order for the significance level to be 0.05, we need to have

${\displaystyle \sup _{\mu \in \Theta _{0}}\pi (\mu )=0.05.}$

But ${\displaystyle \Theta _{0}=\{20\}}$ . So, this means
${\displaystyle 0.05=\pi (20)=\mathbb {P} _{\mu =20}({\overline {X}}\geq k)=\mathbb {P} \left({\frac {{\overline {X}}-20}{1/{\sqrt {10}}}}\geq {\frac {k-20}{1/{\sqrt {10}}}}\right)=\mathbb {P} (Z\geq {\sqrt {10}}(k-20))}$

where ${\displaystyle Z\sim {\mathcal {N}}(0,1)}$ . We then have
${\displaystyle {\sqrt {10}}(k-20)=z_{0.05}\approx 1.64\implies k\approx 20.51861.}$

(b) The type II error probability is

${\displaystyle \beta \approx 1-\mathbb {P} _{\mu =21}({\overline {X}}\geq 20.51861)=1-\mathbb {P} \left({\frac {{\overline {X}}-21}{1/{\sqrt {10}}}}\geq {\frac {20.51861-21}{1/{\sqrt {10}}}}\right)\approx 1-\mathbb {P} (Z\geq -1.522)=\mathbb {P} (Z<-1.522)\approx 0.06426.}$

(${\displaystyle Z\sim {\mathcal {N}}(0,1)}$ ) With the sample size ${\displaystyle n}$ , the type II error probability is
${\displaystyle \beta \approx \mathbb {P} \left(Z<{\sqrt {n}}(20.51861-21)\right)}$

When the sample size ${\displaystyle n}$  increases, ${\displaystyle {\sqrt {n}}(20.51861-21)}$  will become more negative, and hence the type II error probability decreases. It follows that
${\displaystyle \mathbb {P} \left(Z<{\sqrt {n}}(20.51861-21)\right)\leq 0.05\implies {\sqrt {n}}(20.51861-21)\leq -1.64\implies n\geq 11.606.}$

Hence, the minimum sample size is 12.
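These computations can be verified numerically; here is a minimal check using only Python's standard library. Note that `NormalDist.inv_cdf` keeps full precision for ${\displaystyle z_{0.05}}$  (≈ 1.6449), so `k` differs slightly from the text's rounded value 20.51861.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

# Critical value k for a size-0.05 test with n = 10:
# P(Z >= sqrt(10) * (k - 20)) = 0.05  =>  sqrt(10) * (k - 20) = z_0.05
z = Z.inv_cdf(0.95)        # z_0.05 ≈ 1.6449 (the text rounds this to 1.64)
k = 20 + z / sqrt(10)      # ≈ 20.520 (text: 20.51861)

# Type II error probability at mu = 21 with n = 10
beta_10 = Z.cdf(sqrt(10) * (k - 21))   # ≈ 0.065

# Smallest n with beta <= 0.05, keeping the same rejection region x-bar >= k
n = 1
while Z.cdf(sqrt(n) * (k - 21)) > 0.05:
    n += 1
# n == 12, agreeing with the text
```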

Exercise. Calculate the type I error probability and type II error probability when the sample size is 12 (the rejection region remains unchanged).

Solution

The type II error probability is

${\displaystyle \mathbb {P} (Z<{\sqrt {12}}(20.51861-21))\approx \mathbb {P} (Z<-1.668)\approx 0.04746.}$

The type I error probability is
${\displaystyle \mathbb {P} (Z\geq {\sqrt {12}}(20.51861-20))\approx \mathbb {P} (Z\geq 1.797)\approx 0.0359.}$

So, with the same rejection region and different sample size, the significance level (type I error probability in this case) of the test changes.
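Assuming the text's rounded cutoff 20.51861, the two probabilities for ${\displaystyle n=12}$  can be recomputed directly (small differences from the text's normal-table values are rounding artifacts):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()      # standard normal
k = 20.51861          # rejection region: sample mean >= k (unchanged)

# Type II error probability at mu = 21, n = 12
type2 = Z.cdf(sqrt(12) * (k - 21))       # ≈ 0.048

# Type I error probability at mu = 20, n = 12
type1 = 1 - Z.cdf(sqrt(12) * (k - 20))   # ≈ 0.036
```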

So far, we have focused on using the rejection region to conduct hypothesis tests. But this is not the only way. Alternatively, we can make use of the ${\displaystyle p}$ -value.

Definition. (${\displaystyle p}$ -value) Let ${\displaystyle T(\mathbf {x} )}$  be an observed value of a test statistic ${\displaystyle T(\mathbf {X} )=T(X_{1},\dotsc ,X_{n})}$  in a hypothesis test.

• Case 1: The test is left-tailed. Then, the ${\displaystyle p}$ -value is ${\displaystyle \sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\leq T(\mathbf {x} ))}$ .
• Case 2: The test is right-tailed. Then, the ${\displaystyle p}$ -value is ${\displaystyle \sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\geq T(\mathbf {x} ))}$ .
• Case 3: The test is two-tailed.
• Subcase 1: The distribution of ${\displaystyle T}$  is symmetric about zero (when ${\displaystyle H_{0}}$  is true). Then, the ${\displaystyle p}$ -value is ${\displaystyle \sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(|T(\mathbf {X} )|\geq |T(\mathbf {x} )|)}$ .
• Subcase 2: The distribution of ${\displaystyle T}$  is not symmetric about zero (when ${\displaystyle H_{0}}$  is true). Then, the ${\displaystyle p}$ -value is ${\displaystyle 2\min {\bigg \{}\sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\leq T(\mathbf {x} )),\sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\geq T(\mathbf {x} )){\bigg \}}}$ .
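For a simple null hypothesis (so the supremum is taken over the single point ${\displaystyle \theta _{0}}$ ), these cases can be sketched as a small helper function. The names `null_cdf` (the cdf of ${\displaystyle T(\mathbf {X} )}$  under ${\displaystyle H_{0}}$ , assumed continuous) and the `tail` labels are assumptions of this sketch; for a continuous distribution symmetric about zero, the subcase 2 formula reduces to the subcase 1 formula.

```python
def p_value(t_obs, null_cdf, tail):
    """p-value for an observed statistic t_obs, given the cdf of T(X)
    under a simple null hypothesis. tail is 'left', 'right', or 'two'."""
    left = null_cdf(t_obs)        # P(T(X) <= t_obs)
    right = 1 - null_cdf(t_obs)   # P(T(X) >= t_obs), continuous T assumed
    if tail == "left":
        return left
    if tail == "right":
        return right
    # two-tailed (case 3 subcase 2): double the smaller tail probability
    return 2 * min(left, right)
```

For example, with a standard normal `null_cdf`, `p_value(1.581, NormalDist().cdf, "right")` gives approximately 0.057, matching the right-tailed example later in this section.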

Remark.

• The ${\displaystyle p}$ -value can be interpreted as the probability, when ${\displaystyle H_{0}}$  is true, that the test statistic is at least as "extreme" as its observed value. Here, "extreme" is in favour of ${\displaystyle H_{1}}$ , i.e., the "direction of extreme" is the "direction of tail" for the test (when the test statistic lies further in the tail direction, it is more likely to fall in the rejection region, leading us to reject ${\displaystyle H_{0}}$  and accept ${\displaystyle H_{1}}$ ).
• So, when the ${\displaystyle p}$ -value is small, the observed value of the test statistic is already very "extreme", and it is unlikely for the test statistic to be even more "extreme" than the observed value.
• In general, it can be quite difficult to compute ${\displaystyle p}$ -values manually. Thus, ${\displaystyle p}$ -values are often computed using software, e.g. R.
• For case 3 subcase 1, consider the following diagram:
(Diagram: a pdf of ${\displaystyle T(\mathbf {X} )}$  symmetric about zero, with both tails shaded. If ${\displaystyle T(\mathbf {x} )<0}$ , the "more extreme" regions are ${\displaystyle T(\mathbf {X} )\leq T(\mathbf {x} )}$  and ${\displaystyle T(\mathbf {X} )\geq -T(\mathbf {x} )}$ , which together give ${\displaystyle |T(\mathbf {X} )|\geq |T(\mathbf {x} )|}$  since ${\displaystyle T(\mathbf {x} )=-|T(\mathbf {x} )|}$  and ${\displaystyle -T(\mathbf {x} )=|T(\mathbf {x} )|}$ . If ${\displaystyle T(\mathbf {x} )>0}$ , the regions are ${\displaystyle T(\mathbf {X} )\leq -T(\mathbf {x} )}$  and ${\displaystyle T(\mathbf {X} )\geq T(\mathbf {x} )}$ , again giving ${\displaystyle |T(\mathbf {X} )|\geq |T(\mathbf {x} )|}$ .)

• For case 3 subcase 2, consider the following diagram:
(Diagram: a pdf of ${\displaystyle T}$  that is not symmetric about zero. The observed value ${\displaystyle t}$  may lie in the left tail, where ${\displaystyle \mathbb {P} (T(\mathbf {X} )\leq t)}$  is the smaller of the two tail probabilities, or in the right tail, where ${\displaystyle \mathbb {P} (T(\mathbf {X} )\geq t)}$  is the smaller.)
We can observe that the observed value ${\displaystyle t}$  may lie in the left tail or the right tail. In either case, for ${\displaystyle T}$  to be "more extreme", the resulting inequality corresponds to the tail with the smaller probability; thus, we take the "${\displaystyle \min }$ ". But we also need to account for "extremeness" in the other tail: it is intuitive that when ${\displaystyle T}$  is beyond the corresponding point in the other tail, it should also be counted as "more extreme". Thus, there is a factor of "${\displaystyle 2\times }$ ".

The following theorem allows us to use ${\displaystyle p}$ -value for hypothesis testing.

Theorem. Let ${\displaystyle T(\mathbf {x} )=T(x_{1},\dotsc ,x_{n})}$  be an observed value of a test statistic ${\displaystyle T(\mathbf {X} )=T(X_{1},\dotsc ,X_{n})}$  in a hypothesis test. The null hypothesis ${\displaystyle H_{0}}$  is rejected at the significance level ${\displaystyle \alpha }$  if and only if the ${\displaystyle p}$ -value is less than or equal to ${\displaystyle \alpha }$ .

Proof. (Partial) We can prove the "if" and "only if" directions at once. Let us first consider case 1 in the definition of ${\displaystyle p}$ -value. Define the cutoff ${\displaystyle T^{*}(\mathbf {x} )}$  such that ${\displaystyle T(\mathbf {X} )\leq T^{*}(\mathbf {x} )\iff (X_{1},\dotsc ,X_{n})\in R}$ . By definition, the ${\displaystyle p}$ -value is ${\displaystyle \sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\leq T(\mathbf {x} ))}$  and ${\displaystyle \alpha =\sup _{\theta \in \Theta _{0}}\pi (\theta )=\sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\leq T^{*}(\mathbf {x} ))}$ . Then, we have

{\displaystyle {\begin{aligned}p{\text{-value}}\leq \alpha &\iff \sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\leq T(\mathbf {x} ))\leq \sup _{\theta \in \Theta _{0}}\mathbb {P} _{\theta }(T(\mathbf {X} )\leq T^{*}(\mathbf {x} ))\\&\iff T(\mathbf {x} )\leq T^{*}(\mathbf {x} )&({\text{by some omitted arguments and the monotonicity of cdf}})\\&\iff (x_{1},\dotsc ,x_{n})\in \{(y_{1},\dotsc ,y_{n}):T(y_{1},\dotsc ,y_{n})\leq T^{*}(\mathbf {x} )\}&(x_{1},\dotsc ,x_{n}{\text{ are realizations of }}X_{1},\dotsc ,X_{n}{\text{ respectively}})\\&\iff (x_{1},\dotsc ,x_{n})\in R&({\text{defined above}})\\&\iff H_{0}{\text{ is rejected at significance level }}\alpha .&({\text{the test with power function }}\pi (\theta ){\text{ is size }}\alpha {\text{ test}})\end{aligned}}}

For other cases, the idea is similar (just the directions of inequalities for ${\displaystyle T}$  are different).

${\displaystyle \Box }$

Remark.

• From this, we can observe that ${\displaystyle p}$ -value can be used to report the test result in a more "continuous" scale, instead of just a single decision "accept ${\displaystyle H_{0}}$ " or "reject ${\displaystyle H_{0}}$ ", in the sense that if ${\displaystyle p}$ -value is "much smaller" than the significance level ${\displaystyle \alpha }$ , then we have a "stronger" evidence for rejecting ${\displaystyle H_{0}}$  (stronger in the sense that even if the significance level is very low (a very strict requirement on type I error), ${\displaystyle H_{0}}$  can still be rejected).
• Also, reporting a ${\displaystyle p}$ -value allows readers to choose an appropriate significance level ${\displaystyle \alpha }$  themselves, compare the ${\displaystyle p}$ -value with ${\displaystyle \alpha }$ , and therefore make their own decisions, which are not necessarily the same as the decision made in the test report (since readers may choose a different significance level from that of the report).
• Here, let us also mention the concept of statistical significance. An observation has statistical significance if it is "unlikely" to happen (i.e., the observed value is quite "extreme") when the null hypothesis is true. To be more precise, in terms of the ${\displaystyle p}$ -value, an observed value of a test statistic is statistically significant if the ${\displaystyle p}$ -value is less than or equal to ${\displaystyle \alpha }$ , and statistically insignificant otherwise. Thus, ${\displaystyle \alpha }$  can be interpreted as the benchmark of "significance" or "extremeness", hence the name significance level.

Example. Recall the setting of a previous example: consider the normal distribution ${\displaystyle {\mathcal {N}}(\mu ,1)}$  (with the parameter space for ${\displaystyle \mu }$ : ${\displaystyle \Theta =\{20,21\}}$ ) , and the hypotheses

${\displaystyle H_{0}:\mu =20\quad {\text{vs.}}\quad H_{1}:\mu =21}$

Let ${\displaystyle X_{1},\dotsc ,X_{10}}$  be a random sample from the normal distribution ${\displaystyle {\mathcal {N}}(\mu ,1)}$ , and the corresponding realizations are ${\displaystyle x_{1},\dotsc ,x_{10}}$ .

At the significance level ${\displaystyle \alpha =0.05}$ , we have determined that the rejection region is ${\displaystyle R=\{(y_{1},\dotsc ,y_{10}):{\overline {y}}\geq 20.51861\}}$ . Suppose it is observed that ${\displaystyle {\overline {x}}=20.5}$ .

(a) Use the rejection region to determine whether we should reject ${\displaystyle H_{0}}$ .

(b) Use ${\displaystyle p}$ -value to determine whether we should reject ${\displaystyle H_{0}}$ .

Solution.

(a) Since ${\displaystyle {\overline {x}}=20.5<20.51861}$ , we have ${\displaystyle (x_{1},\dotsc ,x_{10})\in R^{c}}$ . Thus, we should not reject ${\displaystyle H_{0}}$ .

(b) Since the test is right-tailed, the ${\displaystyle p}$ -value is

${\displaystyle \sup _{\mu \in \{20\}}\mathbb {P} _{\mu }({\overline {X}}\geq {\overline {x}})=\mathbb {P} _{\mu =20}({\overline {X}}\geq 20.5)=\mathbb {P} \left({\frac {{\overline {X}}-20}{1/{\sqrt {10}}}}\geq {\frac {20.5-20}{1/{\sqrt {10}}}}\right)\approx \mathbb {P} (Z\geq 1.581)\approx 0.05705>\alpha =0.05}$

where ${\displaystyle Z\sim {\mathcal {N}}(0,1)}$ . Thus, ${\displaystyle H_{0}}$  should not be rejected.
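A quick numerical check of this ${\displaystyle p}$ -value, using only the standard library:

```python
from math import sqrt
from statistics import NormalDist

# p-value = P(Z >= sqrt(10) * (20.5 - 20)) under H0: mu = 20
p = 1 - NormalDist().cdf(sqrt(10) * (20.5 - 20))   # ≈ 0.0569
# p > 0.05, so H0 is not rejected at significance level 0.05
```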

Exercise.

Choose the significance level(s) for which ${\displaystyle H_{0}}$  is rejected based on the observation.

0.01, 0.04, 0.06, 0.08, 0.1

Remark.

• From this, we can notice that one can "manipulate" the decision by changing the significance level. In fact, if one sets the significance level to be 1, then ${\displaystyle H_{0}}$  must be rejected (since the ${\displaystyle p}$ -value is a probability, which must be less than or equal to 1). But such a significance level is meaningless, since it means that the type I error probability can be as high as 1, so the test has a large error and the result is not reliable anyway.
• On the other hand, if one sets the significance level to be 0, then ${\displaystyle H_{0}}$  must not be rejected (unless the ${\displaystyle p}$ -value is exactly zero, which is very unlikely: a zero ${\displaystyle p}$ -value means the observation is the most extreme one possible, so the test statistic is (almost) never at least as extreme as the observation).

## Evaluating a hypothesis test

After discussing some basic concepts and terminologies, let us now study some ways to evaluate the goodness of a hypothesis test. As we have previously mentioned, we want the probabilities of making type I and type II errors to be small, but it is generally impossible to make both probabilities arbitrarily small. Hence, we have suggested controlling the type I error through the size of a test, and the "best" test should be the one with the smallest probability of making a type II error, after controlling the type I error.

These ideas lead us to the following definitions.

Definition. (Power of a test) The power of a test is the probability of rejecting ${\displaystyle H_{0}}$  when ${\displaystyle H_{0}}$  is false. That is, the power is ${\displaystyle 1-\beta }$ , if the probability of making type II error is ${\displaystyle \beta }$ .

Using this definition, instead of saying "best" test (test with the smallest type II error probability), we can say "a test with the most power", or in other words, the "most powerful test".

Definition. (Uniformly most powerful test) A test ${\displaystyle \varphi }$  with rejection region ${\displaystyle R}$  is a uniformly most powerful (UMP) test with size ${\displaystyle \alpha }$  for testing ${\displaystyle H_{0}:\theta \in \Theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta \in \Theta _{1}}$  (${\displaystyle \Theta _{1}=\Theta _{0}^{c}}$ ) if

• (size of ${\displaystyle \varphi }$ ) ${\displaystyle \sup _{\theta \in \Theta _{0}}\pi _{\varphi }(\theta )=\alpha }$ , and
• (UMP) ${\displaystyle \pi _{\varphi }(\theta _{1})\geq \pi _{\psi }(\theta _{1})}$  for each ${\displaystyle \theta _{1}\in \Theta _{1}}$  and for each test ${\displaystyle \psi }$  with rejection region ${\displaystyle R^{*}\neq R}$  and ${\displaystyle \sup _{\theta \in \Theta _{0}}\pi _{\psi }(\theta )\leq \alpha }$ .

(${\displaystyle \pi _{\varphi }(\cdot )}$  and ${\displaystyle \pi _{\psi }(\cdot )}$  are the power functions of the tests ${\displaystyle \varphi }$  and ${\displaystyle \psi }$  respectively.)

Remark.

• The rejection region ${\displaystyle R}$  is sometimes called a best rejection region of size ${\displaystyle \alpha }$ .
• In other words, a test is UMP with size ${\displaystyle \alpha }$  if it has size ${\displaystyle \alpha }$  and its power is the largest among all other tests with size less than or equal to ${\displaystyle \alpha }$ , for each ${\displaystyle \theta \in \Theta _{1}}$ . The adverb "uniformly" emphasizes that this is true for each ${\displaystyle \theta \in \Theta _{1}}$ .
• Since the power is largest for each value of ${\displaystyle \theta \in \Theta _{1}}$ , the rejection region ${\displaystyle R}$  of the UMP test does not depend on the choice of ${\displaystyle \theta \in \Theta _{1}}$ , that is, regardless of the chosen value of ${\displaystyle \theta \in \Theta _{1}}$ , the rejection region is the same. This is expected since the rejection region ${\displaystyle R}$  is not supposed to be changed when the choice of ${\displaystyle \theta \in \Theta _{1}}$  is different. The rejection region ${\displaystyle R}$  (fixed) should always be the best, for each ${\displaystyle \theta \in \Theta _{1}}$ .
• If ${\displaystyle H_{1}}$  is simple, we may simply call the UMP test the most powerful (MP) test.

## Constructing a hypothesis test

There are many ways of constructing a hypothesis test, but of course not all are good (i.e., "powerful"). In the following, we will provide some common approaches to construct hypothesis tests. In particular, the following lemma is very useful for constructing a MP test with size ${\displaystyle \alpha }$ .

### Neyman-Pearson lemma

Lemma. (Neyman-Pearson lemma) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be a random sample from a population with pdf or pmf ${\displaystyle f(x;\theta )}$  (${\displaystyle \theta }$  may be a parameter vector, and the parameter space is ${\displaystyle \Theta =\{\theta _{0},\theta _{1}\}}$ ). Let ${\displaystyle {\mathcal {L}}(\cdot )}$  be the likelihood function. Then a test ${\displaystyle \varphi }$  with rejection region

${\displaystyle R=\left\{(x_{1},\dotsc ,x_{n}):{\frac {{\mathcal {L}}(\theta _{0};\mathbf {x} )}{{\mathcal {L}}(\theta _{1};\mathbf {x} )}}\leq k\right\}}$

and size ${\displaystyle \alpha }$  is the MP test with size ${\displaystyle \alpha }$  for testing
${\displaystyle H_{0}:\theta =\theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta =\theta _{1}}$

where ${\displaystyle k}$  is a value determined by the size ${\displaystyle \alpha }$ .

Proof. Let us first consider the case where the underlying distribution is continuous. With the assumption that the size of ${\displaystyle \varphi }$  is ${\displaystyle \alpha }$ , the "size" requirement for being a MP test is satisfied immediately. So, it suffices to show that ${\displaystyle \varphi }$  satisfies the "MP" requirement.

Notice that in this case, "${\displaystyle \Theta _{1}}$ " is simply ${\displaystyle \{\theta _{1}\}}$ . So, for every test ${\displaystyle \psi }$  with rejection region ${\displaystyle R^{*}\neq R}$  and ${\displaystyle {\color {purple}\pi _{\psi }(\theta _{0})\leq \alpha }}$ , we will proceed to show that ${\displaystyle \pi _{\varphi }(\theta _{1})\geq \pi _{\psi }(\theta _{1})}$ .

Since

{\displaystyle {\begin{aligned}\pi _{\varphi }(\theta _{1})-\pi _{\psi }(\theta _{1})&=\mathbb {P} _{\theta _{1}}((X_{1},\dotsc ,X_{n})\in R)-\mathbb {P} _{\theta _{1}}((X_{1},\dotsc ,X_{n})\in R^{*})\\&=\int \dotsi \int _{R}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}-\int \dotsi \int _{R^{*}}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}\\&={\color {blue}\int \dotsi \int _{R}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}-\int \dotsi \int _{R\cap R^{*}}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}}-\left({\color {red}\int \dotsi \int _{R^{*}}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}-\int \dotsi \int _{R\cap R^{*}}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}}\right)\\&={\color {blue}\int \dotsi \int _{R\setminus R^{*}}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}}-{\color {red}\int \dotsi \int _{R^{*}\setminus R}^{}{\mathcal {L}}(\theta _{1};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}}\\&\geq {\color {blue}{\frac {1}{k}}}\int \dotsi \int _{R\setminus R^{*}}^{}{\color {blue}{\mathcal {L}}(\theta _{0};\mathbf {x} )}\,dx_{n}\cdots \,dx_{1}-{\color {red}{\frac {1}{k}}}\int \dotsi \int _{R^{*}\setminus R}^{}{\color {red}{\mathcal {L}}(\theta _{0};\mathbf {x} )}\,dx_{n}\cdots \,dx_{1}\qquad ({\text{In }}R,{\color {blue}{\mathcal {L}}(\theta _{1};\mathbf {x} )\geq {\frac {1}{k}}{\mathcal {L}}(\theta _{0};\mathbf {x} )}.{\text{ In }}R^{c},{\mathcal {L}}(\theta _{1};\mathbf {x} )<{\frac {1}{k}}{\mathcal {L}}(\theta _{0};\mathbf {x} )\iff {\color {red}-{\mathcal {L}}(\theta _{1};\mathbf {x} )>-{\frac {1}{k}}{\mathcal {L}}(\theta _{0};\mathbf {x} )})\\&={\frac {1}{k}}\int \dotsi \int _{R\setminus R^{*}}^{}{\mathcal {L}}(\theta _{0};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}+{\frac {1}{k}}\int \dotsi \int _{R\cap R^{*}}^{}{\mathcal {L}}(\theta _{0};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}-\left({\frac {1}{k}}\int \dotsi \int _{R^{*}\setminus 
R}^{}{\mathcal {L}}(\theta _{0};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}+{\frac {1}{k}}\int \dotsi \int _{R\cap R^{*}}^{}{\mathcal {L}}(\theta _{0};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}\right)\\&={\frac {1}{k}}\int \dotsi \int _{R}^{}{\mathcal {L}}(\theta _{0};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}-{\frac {1}{k}}\int \dotsi \int _{R^{*}}^{}{\mathcal {L}}(\theta _{0};\mathbf {x} )\,dx_{n}\cdots \,dx_{1}\\&={\frac {1}{k}}{\bigg (}{\color {brown}\underbrace {\mathbb {P} _{\theta _{0}}((X_{1},\dotsc ,X_{n})\in R)} _{=\alpha }}-{\color {purple}\underbrace {\mathbb {P} _{\theta _{0}}((X_{1},\dotsc ,X_{n})\in R^{*})} _{\leq \alpha }}{\bigg )}\\&\geq {\frac {1}{k}}(\alpha -\alpha )=0,\end{aligned}}}

we have ${\displaystyle \pi _{\varphi }(\theta _{1})\geq \pi _{\psi }(\theta _{1})}$  as desired.

For the case where the underlying distribution is discrete, the proof is very similar (just replace the integrals with sums), and hence omitted.

${\displaystyle \Box }$

Remark.

• Sometimes, we call ${\displaystyle {\frac {{\mathcal {L}}(\theta _{0};\mathbf {x} )}{{\mathcal {L}}(\theta _{1};\mathbf {x} )}}}$  the likelihood ratio.
• In fact, the MP test constructed by the Neyman-Pearson lemma is a variant of the likelihood-ratio test, which is more general in the sense that a likelihood-ratio test can also be constructed for composite null and alternative hypotheses, not just simple ones. However, the likelihood-ratio test may not be (U)MP. We will discuss the likelihood-ratio test later.
• For a discrete distribution, it may be impossible to determine a ${\displaystyle k}$  for the rejection region ${\displaystyle R}$  for some ${\displaystyle \alpha }$ . In this case, we say that such ${\displaystyle \alpha }$  is not attainable.
• Intuitively, this test means that we should reject ${\displaystyle H_{0}}$  when the "likelihood" of ${\displaystyle H_{0}}$  (${\displaystyle {\mathcal {L}}(\theta _{0};\mathbf {x} )}$ ) is not as large as the "likelihood" of ${\displaystyle H_{1}}$  (${\displaystyle {\mathcal {L}}(\theta _{1};\mathbf {x} )}$ ), that is, ${\displaystyle {\mathcal {L}}(\theta _{0};\mathbf {x} )\leq k{\mathcal {L}}(\theta _{1};\mathbf {x} )}$ , with respect to the observed samples. The meaning of "not as large as" depends on the size ${\displaystyle \alpha }$ .
• Intuitively, we will expect that ${\displaystyle k}$  should be a positive value that is strictly less than 1, so that ${\displaystyle H_{0}}$  is "less likely" than ${\displaystyle H_{1}}$ . This is usually, but not necessarily, the case. Particularly, when the size ${\displaystyle \alpha }$  is large, ${\displaystyle k}$  may be greater than 1.
• Typically, to determine the value of ${\displaystyle k}$ , we need to transform "${\displaystyle {\frac {{\mathcal {L}}(\theta _{0};\mathbf {x} )}{{\mathcal {L}}(\theta _{1};\mathbf {x} )}}\leq k}$ " into another equivalent inequality whose probability under ${\displaystyle H_{0}}$  is easier to calculate.
• It must be equivalent, so that its probability under ${\displaystyle H_{0}}$  is the same as the probability of "${\displaystyle {\frac {{\mathcal {L}}(\theta _{0};\mathbf {x} )}{{\mathcal {L}}(\theta _{1};\mathbf {x} )}}\leq k}$ " under ${\displaystyle H_{0}}$ . As a result, during the transformation, it is better to use "${\displaystyle \iff }$ " rather than just "${\displaystyle \implies }$ ", or writing different inequalities line by line.
• If ${\displaystyle \theta }$  is a vector, then ${\displaystyle \theta _{0}}$  and ${\displaystyle \theta _{1}}$  should also be vectors.

Even though the hypotheses involved in the Neyman-Pearson lemma are simple, under some conditions we can use the lemma to construct a UMP test for testing a composite null hypothesis against a composite alternative hypothesis. The details are as follows: for testing ${\displaystyle H_{0}:\theta \leq \theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta >\theta _{0}}$ ,

1. Find a MP test ${\displaystyle \varphi }$  with size ${\displaystyle \alpha }$ , for testing ${\displaystyle H_{0}:\theta =\theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta =\theta _{1}>\theta _{0}}$  using the Neyman-Pearson lemma, where ${\displaystyle \theta _{1}}$  is an arbitrary value such that ${\displaystyle \theta _{1}>\theta _{0}}$ .
2. If the rejection region ${\displaystyle R}$  does not depend on ${\displaystyle \theta _{1}}$ , then the test ${\displaystyle \varphi }$  has the greatest power for each ${\displaystyle \theta \in \Theta _{1}=\{\vartheta :\vartheta >\theta _{0}\}}$ . So, the test ${\displaystyle \varphi }$  is a UMP test with size ${\displaystyle \alpha }$  for testing ${\displaystyle H_{0}:\theta =\theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta >\theta _{0}}$
3. If we can further show that ${\displaystyle \sup _{\theta \leq \theta _{0}}\pi _{\varphi }(\theta )=\alpha =\pi _{\varphi }(\theta _{0})}$ , then the size of the test ${\displaystyle \varphi }$  is still ${\displaystyle \alpha }$ , even if the null hypothesis is changed to ${\displaystyle H_{0}:\theta \leq \theta _{0}}$ . So, after changing ${\displaystyle H_{0}:\theta =\theta _{0}}$  to ${\displaystyle H_{0}:\theta \leq \theta _{0}}$  while keeping ${\displaystyle H_{1}}$  unchanged (and adjusting the parameter space accordingly), the test ${\displaystyle \varphi }$  still satisfies the "MP" requirement (since ${\displaystyle H_{1}}$  is unchanged, the result in step 2 still applies), and it also satisfies the "size" requirement (because of changing ${\displaystyle H_{0}}$  in this way). Hence, the test ${\displaystyle \varphi }$  is a UMP test with size ${\displaystyle \alpha }$  for testing ${\displaystyle H_{0}:\theta \leq \theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta >\theta _{0}}$ .

For testing ${\displaystyle H_{0}:\theta \geq \theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta <\theta _{0}}$ , the steps are similar. But in general, there is no UMP test for testing ${\displaystyle H_{0}:\theta =\theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta \neq \theta _{0}}$ .

Of course, when the condition in step 3 holds but that in step 2 does not hold, the test ${\displaystyle \varphi }$  in step 1 is a UMP test with size ${\displaystyle \alpha }$  for testing ${\displaystyle H_{0}:\theta \leq \theta _{0}\quad {\text{vs.}}\quad H_{1}:\theta =\theta _{1}}$  where ${\displaystyle \theta _{1}}$  is a constant (which is larger than ${\displaystyle \theta _{0}}$ , or else ${\displaystyle H_{1}}$  and ${\displaystyle H_{0}}$  are not disjoint). However, the hypotheses are generally not in this form.

Example. Let ${\displaystyle X_{1},\dotsc ,X_{10}}$  be a random sample from the normal distribution ${\displaystyle {\mathcal {N}}(\mu ,1)}$ .

(a) Construct a MP test ${\displaystyle \varphi }$  with size 0.05 for testing ${\displaystyle H_{0}:\mu =20\quad {\text{vs.}}\quad H_{1}:\mu =21}$ .

(b) Hence, show that the test ${\displaystyle \varphi }$  is also a UMP test with size 0.05 for testing ${\displaystyle H_{0}:\mu =20\quad {\text{vs.}}\quad H_{1}:\mu >20}$ .

(c) Hence, show that the test ${\displaystyle \varphi }$  is also a UMP test with size 0.05 for testing ${\displaystyle H_{0}:\mu \leq 20\quad {\text{vs.}}\quad H_{1}:\mu >20}$ .

Solution. (a) We can use the Neyman-Pearson lemma. First, consider the likelihood ratio

${\displaystyle {\frac {{\mathcal {L}}(20)}{{\mathcal {L}}(21)}}={\frac {{\cancel {\left({\frac {1}{\sqrt {2\pi (1)}}}\right)^{10}}}\prod _{i=1}^{10}\exp \left(-{\frac {(x_{i}-20)^{2}}{2}}\right)}{{\cancel {\left({\frac {1}{\sqrt {2\pi (1)}}}\right)^{10}}}\prod _{i=1}^{10}\exp \left(-{\frac {(x_{i}-21)^{2}}{2}}\right)}}=\exp \left(-{\frac {1}{2}}\sum _{i=1}^{10}{\big [}(x_{i}-20)^{2}-(x_{i}-21)^{2}{\big ]}\right)=\exp \left(-{\frac {1}{2}}\sum _{i=1}^{10}{\big [}{\cancel {x_{i}^{2}}}-40x_{i}+400{\cancel {-x_{i}^{2}}}+42x_{i}-441{\big ]}\right)=\exp \left(-{\frac {1}{2}}\sum _{i=1}^{10}{\big [}2x_{i}-41{\big ]}\right)=\exp \left(205-\sum _{i=1}^{10}x_{i}\right).}$

Now, we have
${\displaystyle {\frac {{\mathcal {L}}(20)}{{\mathcal {L}}(21)}}\leq k'\iff \exp \left(205-10{\overline {x}}\right)\leq k'\iff -10{\overline {x}}\leq k''\iff {\overline {x}}\geq k}$

where ${\displaystyle k,k',k''}$  are some constants. To find ${\displaystyle k}$ , consider the size 0.05:
${\displaystyle 0.05=\mathbb {P} _{\mu =20}({\overline {X}}\geq k)=\mathbb {P} _{\mu =20}\left({\frac {{\overline {X}}-20}{1/{\sqrt {10}}}}\geq {\frac {k-20}{1/{\sqrt {10}}}}\right)=\mathbb {P} (Z\geq {\sqrt {10}}(k-20)).}$

(${\displaystyle Z\sim {\mathcal {N}}(0,1)}$ ) Hence, we have ${\displaystyle {\sqrt {10}}(k-20)\approx 1.64\implies k\approx 20.51861}$ . Now, we can construct the rejection region:
${\displaystyle R=\{(x_{1},\dotsc ,x_{n}):{\overline {x}}\geq 20.51861\},}$

and the test ${\displaystyle \varphi }$  with the rejection region ${\displaystyle R}$  is a MP test with size 0.05 for testing ${\displaystyle H_{0}:\mu =20\quad {\text{vs.}}\quad H_{1}:\mu =21}$ .

(b)

Proof. Let ${\displaystyle \mu _{1}}$  be an arbitrary value such that ${\displaystyle \mu _{1}>20}$ . Then, we can show that (see the following exercise)

${\displaystyle {\frac {{\mathcal {L}}(20)}{{\mathcal {L}}(\mu _{1})}}\leq k'\iff {\overline {x}}\geq k}$

where ${\displaystyle k,k'}$  are some constants (may be different from the above constants). Since ${\displaystyle H_{0}}$  here is the same as ${\displaystyle H_{0}}$  in (a), the rejection region constructed is also
${\displaystyle R=\{(x_{1},\dotsc ,x_{n}):{\overline {x}}\geq 20.51861\}.}$

Notice that ${\displaystyle R}$  does not depend on the value of ${\displaystyle \mu _{1}}$ . It follows that the test ${\displaystyle \varphi }$  is a UMP test with size 0.05 for testing ${\displaystyle H_{0}:\mu =20\quad {\text{vs.}}\quad H_{1}:\mu >20}$ .

${\displaystyle \Box }$
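The size claim in (a) and the power of this test at ${\displaystyle \mu =21}$  can also be checked by simulation. A rough Monte Carlo sketch (sample size 10, 100,000 replications per value of ${\displaystyle \mu }$ ):

```python
import random
from statistics import fmean

random.seed(0)
REJECT = 20.51861   # rejection region: sample mean >= 20.51861
N_SIM = 100_000

def rejection_rate(mu):
    """Estimate P_mu(X-bar >= REJECT) for a sample of size 10 from N(mu, 1)."""
    count = 0
    for _ in range(N_SIM):
        xbar = fmean(random.gauss(mu, 1) for _ in range(10))
        if xbar >= REJECT:
            count += 1
    return count / N_SIM

size = rejection_rate(20)    # ≈ 0.05  (type I error probability)
power = rejection_rate(21)   # ≈ 0.94  (1 - beta from the earlier example)
```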

(c)

Proof. It suffices to show that ${\displaystyle \sup _{\mu \leq 20}\pi _{\varphi }(\mu )=0.05{\overset {\text{(a)}}{=}}\pi _{\varphi }(20)}$ . First let us consider the power function

${\displaystyle \pi _{\varphi }(\mu )=\mathbb {P} _{\mu }({\overline {X}}\geq 20.51861)=\mathbb {P} (Z\geq {\sqrt {10}}(20.51861-\mu ))=1-\Phi ({\sqrt {10}}(20.51861-\mu ))}$

where ${\displaystyle \Phi (\cdot )}$  is the cdf of ${\displaystyle Z\sim {\mathcal {N}}(0,1)}$ . Now, since when ${\displaystyle \mu }$  increases, ${\displaystyle {\sqrt {10}}(20.51861-\mu )}$  decreases and hence ${\displaystyle \Phi ({\sqrt {10}}(20.51861-\mu ))}$  decreases, it follows that the power function ${\displaystyle \pi _{\varphi }(\mu )}$  is a strictly increasing function of ${\displaystyle \mu }$ . Hence,
${\displaystyle \sup _{\mu \leq 20}\pi _{\varphi }(\mu )=\max _{\mu \leq 20}\pi _{\varphi }(\mu )=\pi _{\varphi }(20)=0.05.}$

Then, the result follows.

${\displaystyle \Box }$
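The monotonicity argument is easy to check numerically; a sketch evaluating the power function at a few values of ${\displaystyle \mu }$ , using the text's rounded cutoff 20.51861:

```python
from math import sqrt
from statistics import NormalDist

def power(mu):
    """pi(mu) = P_mu(X-bar >= 20.51861) = 1 - Phi(sqrt(10)*(20.51861 - mu))."""
    return 1 - NormalDist().cdf(sqrt(10) * (20.51861 - mu))

values = [power(mu) for mu in (19.0, 19.5, 20.0, 20.5, 21.0)]
# values is strictly increasing, with power(20.0) ≈ 0.05,
# so the supremum over mu <= 20 is attained at mu = 20
```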

Exercise. Show that

${\displaystyle {\frac {{\mathcal {L}}(20)}{{\mathcal {L}}(\mu _{1})}}\leq k'\iff {\overline {x}}\geq k}$

for every ${\displaystyle \mu _{1}>20}$ .
Solution

Proof. First, consider the likelihood ratio

${\displaystyle {\frac {{\mathcal {L}}(20)}{{\mathcal {L}}(\mu _{1})}}=\exp \left(-{\frac {1}{2}}\sum _{i=1}^{10}{\big [}(x_{i}-20)^{2}-(x_{i}-\mu _{1})^{2}{\big ]}\right)=\exp \left(-{\frac {1}{2}}\sum _{i=1}^{10}{\big [}{\cancel {x_{i}^{2}}}-40x_{i}+400{\cancel {-x_{i}^{2}}}+2\mu _{1}x_{i}-\mu _{1}^{2}{\big ]}\right)=\exp \left(-{\frac {1}{2}}\sum _{i=1}^{10}{\big [}(2\mu _{1}-40)x_{i}+400-\mu _{1}^{2}{\big ]}\right)=\exp \left(5(\mu _{1}^{2}-400)-(\mu _{1}-20)\sum _{i=1}^{10}x_{i}\right).}$

Then, we have
${\displaystyle {\frac {{\mathcal {L}}(20)}{{\mathcal {L}}(\mu _{1})}}\leq k'\iff \exp \left(5(\mu _{1}^{2}-400)-10(\mu _{1}-20){\overline {x}}\right)\leq k'\iff -10(\mu _{1}-20){\overline {x}}\leq k''\iff {\overline {x}}\geq k.}$

(The last equivalence follows since ${\displaystyle \mu _{1}>20}$ .)

${\displaystyle \Box }$

Remark.

• This rejection region has appeared in a previous example.

Now, let us consider another example where the underlying distribution is discrete.

Example. Let ${\displaystyle X}$  be a discrete random variable. Its pmf is given by

${\displaystyle {\begin{array}{c|ccccccccc}\theta &x&1&2&3&4&5&6&7&8\\\hline 0&f(x;\theta )&0&0.02&0.02&0.02&0.02&0.02&0.02&0.88\\1&f(x;\theta )&0.01&0.02&0.03&0.04&0.05&0&0.06&0.79\\\end{array}}}$

(Notice that the sum of values in each row is 1. The parameter space is ${\displaystyle \Theta =\{0,1\}}$ .) Given a single observation ${\displaystyle x}$ , construct a MP test with size 0.1 for testing ${\displaystyle H_{0}:\theta =0\quad {\text{vs.}}\quad H_{1}:\theta =1}$ .

Solution. We use the Neyman-Pearson lemma. First, we calculate the likelihood ratio ${\displaystyle f(x;0)/f(x;1)}$  for each value of ${\displaystyle x}$ :

${\displaystyle {\begin{array}{ccccccccc}x&1&2&3&4&5&6&7&8\\\hline {\frac {f(x;0)}{f(x;1)}}&0&1&0.667&0.5&0.4&{\text{undefined}}&0.333&1.114\end{array}}}$

For convenience, let us sort the likelihood ratios in ascending order (we put the undefined value last):
${\displaystyle {\begin{array}{ccccccccc}x&1&7&5&4&3&2&8&6\\\hline {\frac {f(x;0)}{f(x;1)}}&0&0.333&0.4&0.5&0.667&1&1.114&{\text{undefined}}\end{array}}}$
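The tabulation and sorting above can be automated. A sketch (the greedy scan below works here because the cumulative ${\displaystyle H_{0}}$ -probabilities reach 0.1 exactly; in general, a given ${\displaystyle \alpha }$  may not be attainable for a discrete distribution, as remarked earlier):

```python
f0 = {1: 0, 2: 0.02, 3: 0.02, 4: 0.02, 5: 0.02, 6: 0.02, 7: 0.02, 8: 0.88}
f1 = {1: 0.01, 2: 0.02, 3: 0.03, 4: 0.04, 5: 0.05, 6: 0, 7: 0.06, 8: 0.79}

# Likelihood ratio f(x;0)/f(x;1); x = 6 has f1 = 0, so its ratio is
# undefined and we place it last by treating it as +infinity
ratio = {x: (f0[x] / f1[x] if f1[x] > 0 else float("inf")) for x in f0}
order = sorted(f0, key=lambda x: ratio[x])   # [1, 7, 5, 4, 3, 2, 8, 6]

# Add points in ascending ratio order while the size stays within 0.1
R, size = [], 0.0
for x in order:
    if size + f0[x] > 0.1 + 1e-9:   # small tolerance for float rounding
        break
    R.append(x)
    size += f0[x]
# R == [1, 7, 5, 4, 3, 2] with size 0.1
```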

By Neyman-Pearson lemma, the MP test with size 0.1 for testing ${\displaystyle H_{0}:\theta =0\quad {\text{vs.}}\quad H_{1}:\theta =1}$  is a test with size 0.1 and rejection region
${\displaystyle R=\left\{x:{\frac {f(x;0)}{f(x;1)}}\leq k\right\}.}$

So, it remains to determine ${\displaystyle R}$ . Since the size is 0.1, we have
${\displaystyle 0.1=\pi (0)=\mathbb {P} _{\theta =0}(X\in R).}$

Notice that
${\displaystyle \mathbb {P} _{\theta =0}(X=1)+\mathbb {P} _{\theta =0}(X=7)+\mathbb {P} _{\theta =0}(X=5)+\mathbb {P} _{\theta =0}(X=4)+\mathbb {P} _{\theta =0}(X=3)+\mathbb {P} _{\theta =0}(X=2)=0+0.02+0.02+0.02+0.02+0.02=0.1.}$