## Purpose of Statistical Tests

In general, the purpose of statistical tests is to determine whether some hypothesis is extremely unlikely given observed data.

There are two common philosophical approaches to such tests, *significance testing* (due to Fisher) and *hypothesis testing* (due to Neyman and Pearson).

**Significance testing** aims to quantify the evidence against a particular hypothesis. We can think of it as testing to guide research: we believe a certain statement may be true and want to work out whether it is worth investing time in investigating it. We therefore examine the opposite of the statement. If the opposite is quite consistent with the data, further study would seem pointless; if it is extremely unlikely, further study is worthwhile.

A concrete example of this might be drug testing. We have a number of drugs to test and only limited time, so we examine the hypothesis that an individual drug has no positive effect whatsoever, and only investigate further if this hypothesis is unlikely.

**Hypothesis testing**, by contrast, looks at the evidence for a particular hypothesis. We can think of it as a guide to making a decision: we need to decide soon, and suspect that a given statement is true. We therefore assess how likely we are to be wrong and, if that chance is sufficiently small, act as though the statement is true. Often such a decision is final and cannot be changed.

Statisticians often overlook these differences and incorrectly treat the terms "significance test" and "hypothesis test" as though they are interchangeable.

A data analyst frequently wants to know whether there is a difference between two sets of data, and whether that difference is likely to occur due to random fluctuations, or is instead unusual enough that random fluctuations rarely cause such differences.

In particular, frequently we wish to know something about the average (or mean), or about the variability (as measured by variance or standard deviation).

Statistical tests are carried out by first making some assumption, called the Null Hypothesis, and then determining whether the data observed is unlikely to occur given that assumption. If the probability of seeing the observed data is small enough under the assumed Null Hypothesis, then the Null Hypothesis is rejected.
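The reject-if-unlikely logic can be sketched with a toy example. The sketch below uses made-up data (16 heads in 20 coin flips) and a Null Hypothesis that the coin is fair; the function `binom_p_value` is our own illustrative helper, not a standard library routine, and it computes an exact one-sided binomial p-value.

```python
from math import comb

def binom_p_value(heads, flips, p=0.5):
    """P(X >= heads) for X ~ Binomial(flips, p): an exact one-sided test."""
    return sum(comb(flips, k) * p**k * (1 - p)**(flips - k)
               for k in range(heads, flips + 1))

# Null Hypothesis: the coin is fair (p = 0.5).
# Observed data (made up for illustration): 16 heads in 20 flips.
p_value = binom_p_value(16, 20)
alpha = 0.05  # a conventional significance level

print(f"p-value = {p_value:.4f}")  # ≈ 0.0059
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Because the probability of seeing 16 or more heads from a fair coin is well below 0.05, the Null Hypothesis of fairness would be rejected here.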

A simple example might help. Suppose we wish to determine whether men and women have the same average height. We select and measure 20 women and 20 men, and assume as our Null Hypothesis that there is no difference between the average heights of men and women. We can then use the t-test to determine whether our sample of 40 heights would be unlikely to occur given this assumption. The basic idea is to assume that heights are normally distributed, with the same mean and standard deviation for women as for men. We calculate the average height of our 20 men and of our 20 women, along with the sample standard deviation of each. Then, using the two-sample t-test with 40 − 2 = 38 degrees of freedom, we can determine whether the difference between the two sample means is large enough to make it unlikely that both samples came from the same normal population.
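The calculation just described can be sketched in code. This is only an illustration: the height samples are simulated from assumed distributions (means of 176 cm and 163 cm are invented for the example, not real measurements), and `two_sample_t` is a hand-rolled pooled-variance t statistic rather than a library routine.

```python
import math
import random
from statistics import mean, stdev

def two_sample_t(sample1, sample2):
    """Pooled two-sample t statistic and its degrees of freedom,
    assuming equal variances in the two populations."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    # Pooled estimate of the common standard deviation.
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Simulated heights in cm (assumed parameters, not real data).
random.seed(0)
men = [random.gauss(176, 7) for _ in range(20)]
women = [random.gauss(163, 6) for _ in range(20)]

t, df = two_sample_t(men, women)
print(f"t = {t:.2f} with {df} degrees of freedom")  # df = 40 - 2 = 38
```

A large |t| relative to Student's t distribution with 38 degrees of freedom indicates that the two samples are unlikely to have come from the same normal population; the corresponding p-value can be looked up in tables or computed with a library such as `scipy.stats.ttest_ind`.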