Fisher's Exact Test (2x2)
- Very common in smaller medical studies
- Appropriate for binary data
- Evaluates association between two binary classifications
- "Group 1" vs. "Group 2"
- For example: male/female, treatment given/no treatment given, treatment type 1/treatment type 2, high dose/low dose, etc.
- "Success" vs. "Failure"
- For example: alive/dead, response/no response, recurrence/no recurrence, etc.
- This defines a 2x2 matrix
- Traditionally used for small sample sizes (typically fewer than 50), because the exact calculation becomes computationally demanding as the sample grows. Please see Χ2 Test for more information on approximations
- Hypothesis tested (H0) is that there is no association between the two classifications
- Fisher's test determines the degree to which this hypothesis is consistent with the data
- Probability of Outcome 1 in Group 1 (e.g. male - survive) is p1
- Probability of Outcome 1 in Group 2 (e.g. female - survive) is p2
- The test hypothesis (H0) is that there is no association, i.e. that p1 = p2
- Therefore, under H0, the population "success" rate (Outcome 1) is p0 = p1 = p2, and can be estimated as (Group 1 Outcome 1 + Group 2 Outcome 1) / (Total Population)
- Observation: We observe a given total number of Outcome 1 in the study
- The question is whether the Outcome 1 events are divided between Group 1 and Group 2 in a way that is consistent with p1 = p2, or split so unevenly that p1 = p2 is unlikely
- To objectively evaluate this question, we set up the tables showing all the possible outcomes that we could have seen, given the fixed number of cases in Group 1 and Group 2, and the fixed number of Outcome 1 and Outcome 2.
- Test statistic (T) is the number of Outcome 1 ("success") in Group 1.
- Given that the number of cases in Group 1 and Group 2 is known, and the number of Outcome 1 and Outcome 2 events is known, knowing T determines all the other points in the matrix
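| ||Outcome 1||Outcome 2||Total|
|Group 1||T||(Group 1 total) - T||Group 1 total|
|Group 2||(Outcome 1 total) - T||(Group 2 total) - (Outcome 1 total) + T||Group 2 total|
|Total||Outcome 1 total||Outcome 2 total||Total Population|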
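The way T fixes every other cell can be sketched in a few lines of Python. This is only an illustration: the function names and the margins in the usage loop are made up for the example and do not come from the source.

```python
def table_from_margins(t, group1_total, group2_total, outcome1_total):
    """Return the 2x2 table [[a, b], [c, d]] implied by T = t and fixed margins."""
    a = t                      # Group 1, Outcome 1 (the test statistic T)
    b = group1_total - t       # Group 1, Outcome 2
    c = outcome1_total - t     # Group 2, Outcome 1
    d = group2_total - c       # Group 2, Outcome 2
    if min(a, b, c, d) < 0:
        raise ValueError("t is not compatible with the fixed margins")
    return [[a, b], [c, d]]


def feasible_t_values(group1_total, group2_total, outcome1_total):
    """All values of T that keep every cell of the table non-negative."""
    lowest = max(0, outcome1_total - group2_total)
    highest = min(group1_total, outcome1_total)
    return range(lowest, highest + 1)


# Hypothetical margins: 5 cases in Group 1, 7 in Group 2, 4 "successes" overall.
for t in feasible_t_values(5, 7, 4):
    print(t, table_from_margins(t, 5, 7, 4))
```

Each printed table has the same row and column totals; only T differs, which is why T alone can serve as the test statistic.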
- We then evaluate the probability that each possible outcome (including the actual observed outcome) would occur by chance from a random sampling
- Binomial coefficient calculations give the probability that a table with a given value of T (T = t) would be observed by chance in a random sampling
- However, as these are computationally intensive, statistical tables or software are generally used to evaluate the probability of T
- The significance level (p) is the probability of the actual outcome plus the probabilities of all even less likely outcomes, assuming chance random sampling alone
- If the probability of these outcomes is <5% (p<0.05), typically the test hypothesis that there is no difference between the populations is rejected
- Probability of Outcome for each member of a given Group is the same; it does not vary from member to member. Random sampling ensures this
- The Outcome of one member does not affect the outcome of a different member
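The binomial coefficient calculation referred to above is, in its standard form, the hypergeometric probability. The symbols are labels chosen here for illustration and are not taken from the source: n_1 and n_2 are the sizes of Group 1 and Group 2, m_1 is the total number of Outcome 1 events, and N = n_1 + n_2 is the total population.

```latex
P(T = t) \;=\; \frac{\binom{n_1}{t}\,\binom{n_2}{m_1 - t}}{\binom{N}{m_1}},
\qquad \max(0,\; m_1 - n_2) \;\le\; t \;\le\; \min(n_1,\; m_1)
```

The numerator counts the ways to place t of the Outcome 1 events in Group 1 and the remaining m_1 - t in Group 2; the denominator counts all ways to place the m_1 events among the N cases.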
- Adapted from PMID 6092550 as shown in Using and Understanding Medical Statistics
- In our example above, there are 4 total relapses observed. The question is, is the rate of relapse related to treatment type (large field RT vs. small field RT)?
- If the H0 hypothesis is true, and the two failure rates are comparable, and are also comparable to the failure rate in the entire population, then p0 = 4/259 = 0.015
- The expected number of failures observed in the Small Field RT group would therefore be 23 patients x 0.015 = 0.4 failures
- However, 2 failures were observed. Is this likely due to chance, or not?
- There are 5 possibilities for how the 4 failures could have been distributed in a random sampling, with 0, 1, 2, 3, or 4 of them falling in the Small Field RT group (Possibility 2 being the actual observation)
- The corresponding relapse rates for the Small Field RT (Group 1) are:
| ||Possibility 0||Possibility 1||Possibility 2 (Observed)||Possibility 3||Possibility 4||(Expected)|
|Small field RT||0.000||0.043||0.087||0.130||0.174||0.015|
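The relapse rates in this table are simple ratios and can be checked with a short Python sketch. The three counts are taken from the text of the example; the variable names are chosen here for illustration.

```python
small_field_patients = 23   # Small Field RT group size, from the text
total_patients = 259        # all patients in the example, from the text
total_relapses = 4          # all relapses in the example, from the text

# Relapse rate in the Small Field RT group for each possible number of relapses (0 to 4).
rates = [k / small_field_patients for k in range(total_relapses + 1)]

# Pooled relapse rate under H0, and the expected number of Small Field RT relapses.
pooled_rate = total_relapses / total_patients
expected_small_field_relapses = small_field_patients * pooled_rate

for k, rate in enumerate(rates):
    print(f"Possibility {k}: {rate:.3f}")
print(f"Expected under H0: {pooled_rate:.3f} ({expected_small_field_relapses:.1f} relapses)")
```

The observed rate (2 of 23) is several times the pooled rate, which is what motivates asking how likely such a split is by chance.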
- We then need to calculate the probability that each of the possibilities occurs (typically done by professional statisticians; for more detail see below)
| ||Possibility 0||Possibility 1||Possibility 2 (Observed)||Possibility 3||Possibility 4||Total|
|Small field RT||0.687||0.271||0.039||0.002||0.0001||1.000|
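As a rough check, these probabilities can be recomputed from binomial coefficients with Python's standard library. The Large Field RT group size of 236 is not stated directly; it is inferred here as 259 - 23. The function name is hypothetical.

```python
from math import comb


def hypergeom_prob(t, group1_total, group2_total, outcome1_total):
    """Probability that exactly t of the Outcome 1 events fall in Group 1, margins fixed."""
    total = group1_total + group2_total
    return (comb(group1_total, t) * comb(group2_total, outcome1_total - t)
            / comb(total, outcome1_total))


small_field = 23    # Small Field RT patients, from the text
large_field = 236   # Large Field RT patients, inferred as 259 - 23
relapses = 4        # total relapses, from the text

probs = [hypergeom_prob(t, small_field, large_field, relapses)
         for t in range(relapses + 1)]

for t, p in enumerate(probs):
    print(f"Possibility {t}: {p:.4f}")
print(f"Total: {sum(probs):.3f}")
```

Under these assumptions the computed values agree with the table to within roughly 0.001; the small differences are presumably due to rounding in the adapted source.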
- Our original hypothesis is that the probability of failure in Small Field RT is the same as in Large Field RT, which is the same as in the entire underlying population treated with RT. Possibility 0 (68.7%) and Possibility 1 (27.1%) would have been reasonably consistent with this hypothesis
- The probability that the actual observation (Possibility 2) occurs as a result of the random sampling process is 0.039 (3.9%), which is not as reasonable. Possibilities 3 and 4 are even less likely than the actual observation (0.2% and 0.01%)
- The significance level is the sum of probability of the observed occurrence (3.9%) and the probabilities of the even less likely possibilities (0.2%, 0.01%) = 4.11%; expressed as p=0.04
- In this example, the hypothesis that there is no difference between the two groups with respect to the two outcomes is rejected. Conclusion: Small Field RT results in significantly more failures
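The significance level for this example can be sketched end to end in Python: add the probability of the observed table to the probabilities of all tables that are no more likely. The cell counts (2 and 21 for Small Field RT, 2 and 234 for Large Field RT) are inferred from the totals quoted in the text, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """p-value for the 2x2 table [[a, b], [c, d]]: the observed table plus
    every table with the same margins that is no more likely than it."""
    group1_total, group2_total = a + b, c + d
    outcome1_total = a + c
    denom = comb(group1_total + group2_total, outcome1_total)

    def prob(t):
        return comb(group1_total, t) * comb(group2_total, outcome1_total - t) / denom

    lowest = max(0, outcome1_total - group2_total)
    highest = min(group1_total, outcome1_total)
    observed = prob(a)
    # Small relative tolerance so that ties are not lost to floating-point rounding.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Rows: Small Field RT, Large Field RT. Columns: relapse, no relapse.
p_value = fisher_exact_p(2, 21, 2, 234)
print(f"p = {p_value:.3f}")
```

With these inferred counts the result is about 0.04, in line with the p=0.04 quoted above.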
- Wikipedia Fisher's Exact Test