Statistics Ground Zero/Comparing groups or variables

Comparing groups or variables

Comparison of means

These tests answer the questions

Are these groups alike with respect to this phenomenon?
Are these phenomena alike in this group?

Examples will help clarify.

Let us take the first case. We can ask: Is the average height of male students the same as that of female students? In this case we have two independent groups of subjects and a scalar variable, and we will compare the mean scores.

For the second case, imagine that we test all of the students' basic arithmetic skills; we then administer a single dose of fish oil and a day later we test their basic arithmetic skills again. We want to answer the question: Does a dose of fish oil improve test performance in basic arithmetic? In this case we have one group but two variables - one pre-treatment variable and one post-treatment variable. The variables are both scalar and we will compare mean scores. This is sometimes called a repeated measures design. A further example of this second case might involve asking whether students who study both French and Spanish do as well in examinations in French as they do in Spanish. We compare all the French examination scores with all the Spanish examination scores.

It is likely that in either case the two means will not be exactly identical, and so we want to know not whether they are different but whether they are statistically (or, equivalently, significantly) different. So in each case we will calculate a test statistic, determine the degrees of freedom and evaluate the significance of the outcome.

One tail or two?

Notice that in my first example I ask whether the mean heights of two groups are the same. This question is answered negatively if the average height of, for example, male students is higher than for female students or if it is lower than for female students. This is a two tailed test.

In the second example I ask whether the dose of fish oil improves the performance of students. This question is answered negatively only if the post-treatment performance is less than or equal to the pre-treatment performance. This is a one tailed test.

Independent Samples t-test

The independent samples t-test is used to compare samples from two different populations. The assumption is that there is no relation between the two samples. There is no necessity for randomization of the two groups, so if we collect the data for a group of students and then group them by gender for comparison this meets the criterion of belonging to independent groups.

Equality of variance

For independent groups, the test is complicated by the groups' variances and before we compute the statistic we should determine whether the variances are (more or less) equal or otherwise. This can be done by calculating Levene's F. The null hypothesis is that variances are equal. If Levene's F is significant then the variances are not equal. Depending on the software package you use, you will need to either carry out the test of variances before carrying out the t-test or it will be provided automatically and you will interpret the results you get in the light of the result of Levene's test.

Interpreting t

The null hypothesis for the test is that there is no difference between the mean scores of the two groups. If the confidence level is set to 95% you reject this when p<0.05 and accept the alternative hypothesis that the mean scores differ. When equal variances are assumed, the degrees of freedom for the t-test are given by N-2, where N is the total number of observations in the two groups.
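For reference, the equal variance form of the statistic can be written out. This is the standard pooled formula rather than anything specific to this book, and the symbols are notation introduced here: n_1 and n_2 are the group sizes, the barred x values are the group means, and s_1 and s_2 are the group standard deviations.

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2 .
\]
```

The denominator of the pooled variance is where the N-2 degrees of freedom come from: one degree of freedom is used up by each of the two group means.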

Paired Samples t-test

The paired samples t-test is used to compare the same subjects on two different variables. The variables might represent two quite different scores, for example competence in two different languages, or on one score at different times or under different conditions, for example resting heart rate before and after exercise.

The paired samples t-test is similar in practice to the independent samples test with the difference that there is no requirement to test for homogeneity of variance. SPSS, for example, returns just one value for t and its significance. As before the null hypothesis is that there is no difference - this time between the two variable means - and the alternative that there is. If the confidence level is set to 95% we would reject the null hypothesis with p<0.05.
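As a rough sketch of how this looks outside SPSS, the following uses Python with SciPy (an assumption; the text itself does not prescribe a package). The scores are invented purely for illustration and are not data from this book. The sketch computes t by hand from the per-student differences, then asks SciPy for the two tailed and the one tailed significance.

```python
import math
from statistics import mean, stdev

from scipy import stats

# Hypothetical arithmetic scores for eight students (invented for illustration).
before = [12, 15, 11, 14, 13, 16, 10, 15]
after = [14, 15, 13, 15, 12, 18, 12, 16]

# The paired test works on the per-student differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

# t is the mean difference divided by its standard error; df = n - 1.
t_manual = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Two tailed test: is there any difference between the two means?
t_stat, p_two_tailed = stats.ttest_rel(after, before)

# One tailed test: did scores improve? Only a positive t counts as evidence.
_, p_one_tailed = stats.ttest_rel(after, before, alternative="greater")

print(f"t = {t_stat:.3f} (df = {n - 1})")
print(f"two tailed p = {p_two_tailed:.3f}, one tailed p = {p_one_tailed:.3f}")
```

Because the mean difference here is in the predicted direction, the one tailed p is half the two tailed p. The `alternative` argument needs a reasonably recent SciPy release.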

Examples

Independent samples t-test

Consider the following data collected about a class of 30 school students: 15 boys and 15 girls. We have their gender coded 1 for a girl and 2 for a boy and their score on a maths exam. We can ask if on the basis of this data girls and boys really score differently on the exam. An independent samples t-test will compare mean scores for the two groups and tell us if they are significantly different.

Girls	44	45	48	50	51	52	53	53	57	58	59	60	62	63	64
Boys	39	42	47	50	52	52	54	55	55	56	56	56	58	60	62

We will call the mean for the boys alone μ-boys and for the girls alone μ-girls. Here are the means for the two groups along with the standard deviations as an indication of the similarity of variances.

Gender	N	Mean	Std. Deviation
Girls	15	54.60	6.401
Boys	15	52.93	6.296

The null hypothesis is that μ-boys=μ-girls. We will calculate the F statistic to check for homogeneity of variance and then the t-statistic. We set the confidence level to 95% and therefore we reject the null hypothesis if p<0.05.

F=0.291 (p=0.594) - since this is not significant we assume the variances are equal
t = 0.719 (df = 28, p=0.478)

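Since the t statistic is not significant we cannot reject the null hypothesis that μ-boys=μ-girls. Note that this is not proof that the means are equal; the data simply give us no grounds for saying that they differ.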
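The analysis above can be sketched in Python with SciPy (an assumption; any statistics package will do). The scores are those given in the table above. One detail to note: SciPy's `levene` centres on the median by default, so `center="mean"` is passed here to obtain the classic mean-centred form of Levene's test.

```python
from scipy import stats

# Exam scores copied from the table above.
girls = [44, 45, 48, 50, 51, 52, 53, 53, 57, 58, 59, 60, 62, 63, 64]
boys = [39, 42, 47, 50, 52, 52, 54, 55, 55, 56, 56, 56, 58, 60, 62]

# Levene's test for equality of variances (mean-centred form).
levene_f, levene_p = stats.levene(girls, boys, center="mean")

# Independent samples t-test assuming equal variances; two tailed by default.
t_stat, t_p = stats.ttest_ind(girls, boys, equal_var=True)

# Degrees of freedom: total number of observations minus 2.
df = len(girls) + len(boys) - 2

print(f"Levene F = {levene_f:.3f} (p = {levene_p:.3f})")
print(f"t = {t_stat:.3f} (df = {df}, p = {t_p:.3f})")
```

Run on these data, the sketch should reproduce, to rounding, the F and t values quoted above. Had Levene's test been significant, passing `equal_var=False` would request the unequal variance (Welch) form of the test instead.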

Analysis of variance (ANOVA)

The t statistic can only be computed for two groups or variables. It is frequently the case that we are interested in more than two groups or more than two levels of a variable. Consider for example the following: we want to know whether maths scores vary systematically with eye colour. We could call the mean maths score of those with blue eyes μ-blue, those with brown eyes μ-brown and the rest μ-other. So the null hypothesis would be that there is no difference in these means, that is μ-blue=μ-brown=μ-other.

We test this with the analysis of variance or ANOVA.

The ANOVA tries to determine whether the groups that we observe are drawn from the same population by looking at the variance. Specifically we calculate (or rather our software does!) the sum of squares within each group and the sum of squares between the group means and the mean of the whole data set. By examining the ratio of the between group variance to the within group variance we can determine whether the groups are in fact all drawn from the same population. If they are from the same population then we would expect the between group variance to be about the same as the within group variance, giving a ratio close to 1. Conversely, if they are drawn from three different populations then the between group variance should be noticeably greater than the within group variance.

To answer the question we have set, we are considering a one way ANOVA. The statistic produced is F and it has two degrees of freedom: number of groups - 1 for the between group variance, and number of observations - number of groups for the within group variance. The ANOVA can only be relied upon if the variance of the groups is more or less equal, so we should first check this with Levene's test (as we would for an independent samples t-test).
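To make the calculation concrete, here is a minimal sketch in Python that builds F from the sums of squares and then checks it against SciPy's `f_oneway` (the use of SciPy is an assumption). The scores are invented for illustration only and are not data from this book.

```python
from scipy import stats

# Hypothetical scores for three groups (invented for illustration).
groups = [
    [48, 52, 55, 50, 47],
    [53, 58, 51, 56, 57],
    [60, 55, 62, 58, 59],
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # number of observations
grand_mean = sum(sum(g) for g in groups) / n_total

# Between group sum of squares: distance of each group mean from the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within group sum of squares: spread of scores around their own group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = k - 1
df_within = n_total - k

# F is the ratio of the between group to the within group mean square.
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

# The same test done in one call.
f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"F = {f_manual:.3f} (df = {df_between}, {df_within}; p = {p_manual:.3f})")
```

The hand calculation and the library call should agree up to floating point error, which is a useful check that the degrees of freedom have been counted correctly.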

A variant of ANOVA is the repeated measures comparison. In this test we look at the scores of one group of subjects after each of a number of repeated treatments. For a repeated measures design there is in addition to the usual assumptions of the ANOVA the assumption of sphericity.

Example applications of ANOVA

Effect of treatment on more than two groups

We treat different strains of cell with some chemical treatment and after waiting a certain time we measure cell growth to see if it is the same for all strains.

A single factor ANOVA example

Thirty students were taught mathematics in one of three classes, each of ten pupils. The classes were identified by number 1, 2 or 3. The teacher in each class pursued a different teaching strategy for the same material and at the end of the year students sat an examination. We wish to know if the students' exam results were affected by which class they were in. The examination results were used in a one way ANOVA with a confidence level of 95%.

Levene's test gave the following results

F = 0.8581 (p=0.4352)

So we cannot reject the null hypothesis of the test for homogeneity of variance and we assume that the variances of the three class examination scores are equal.

The means are

Class	Class 1	Class 2	Class 3
Mean	51.10	53.70	56.50

And the ANOVA results are

F = 1.962 (df = 2, 27; p=0.16)

Since the F statistic is not significant we cannot reject the null hypothesis, which is that the Mathematics examination scores do not differ across the classes.

Effect of repeated treatment: a repeated measures example

We test the IQ of a group of students and administer a dose of fish oil. The treatments are repeated monthly over six months and on each occasion of treatment the dosage is increased. After each treatment with fish oil we test their IQ again to see if the different levels of treatment have measurably different effects. (This experimental design is plausibly very flawed but it does provide a simple example of the repeated measures strategy.) We are comparing the mean score at each level of treatment and the null hypothesis is that μ_level1=μ_level2=...=μ_leveln.

Non-parametric comparison of two groups or variables

Let us consider an example where the variable of interest is not a scalar variable but an ordinal or ranked variable. We can imagine that we are comparing the results of Australia and the United States in an international swimming competition (rather strangely no other countries take part).

Can we say whether their rankings are more or less similar or whether one or other tends to rank higher?

Or we might consider a single group of swimmers and ask, do they rank equally in freestyle and butterfly races?

In each case notice that we are comparing rankings. One possible interpretation is that we are testing for the equality of central location (for example, median).

Mann-Whitney U test (unpaired)

The Mann-Whitney U test compares two independent groups of ranked observations and determines whether one tends to be greater than the other. The null hypothesis is that the distribution of rankings is the same in both groups. Specifically, if I select an observation from the first of my groups, call it a, and then one from the second group, call it b, then the null hypothesis is that the probability that a>b is equal to the probability that b>a. The alternative or experimental hypothesis is that these two probabilities are not equal, so that (ignoring ties) either the probability that a>b is greater than 0.5 or the probability that a<b is greater than 0.5. The alternative hypothesis has both one tailed and two tailed formulations.
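The pairwise comparison described above can be sketched directly in Python. The finishing positions below are invented for illustration only, and the use of SciPy's `mannwhitneyu` is an assumption rather than something the text prescribes. The sketch counts the pairs in which a>b and those in which b>a, and then runs the two tailed test.

```python
from itertools import product

from scipy import stats

# Hypothetical finishing positions in a 12 swimmer final (invented for
# illustration; position 1 is the fastest swimmer).
australia = [1, 3, 4, 7, 8, 11]
united_states = [2, 5, 6, 9, 10, 12]

# Every possible pairing of one Australian (a) with one American (b).
pairs = list(product(australia, united_states))
a_greater = sum(a > b for a, b in pairs)
b_greater = sum(b > a for a, b in pairs)

# Estimated probabilities that a exceeds b, and that b exceeds a.
p_a_greater = a_greater / len(pairs)
p_b_greater = b_greater / len(pairs)

# SciPy reports U for the first sample: the number of pairs with a > b
# (a tie, absent here, would count as one half).
u_stat, p_value = stats.mannwhitneyu(
    australia, united_states, alternative="two-sided"
)

print(f"P(a>b) = {p_a_greater:.3f}, P(b>a) = {p_b_greater:.3f}")
print(f"U = {u_stat}, p = {p_value:.3f}")
```

Passing `alternative="less"` or `alternative="greater"` instead gives the one tailed formulations.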