Radiation Oncology/Medical Statistics/Chi Squared



Χ2 (Chi-Squared)


Overview

  • Used to compare two classification schemes, each of which may have multiple categories
  • Purpose is to determine the probability that observed data are (or are not) consistent with the hypothesis H0: the probability of outcomes in the different groups is the same
  • Used to approximate Fisher's Exact Test (2x2) for large numbers:
    • Accuracy of the approximation depends on the number of observations in each cell
    • The expected count (calculated from the observed totals; see below) should be at least 5 in each cell
  • Used to extend Fisher's Exact Test for comparison of classification schemes with >2 categories
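
As a rough illustration of the expected-count rule above, the following Python sketch computes the expected count for each cell as row total * column total / N and checks whether every cell reaches 5. The function names `expected_counts` and `chi_squared_applicable` and the example counts are hypothetical, chosen only for this illustration.

```python
def expected_counts(observed):
    """Expected count for each cell under H0: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_applicable(observed, minimum=5):
    """True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Hypothetical counts, for illustration only
large_table = [[20, 30], [25, 25]]  # expected counts are 22.5 and 27.5
small_table = [[3, 1], [1, 3]]      # expected counts are all 2.0

print(chi_squared_applicable(large_table))  # True
print(chi_squared_applicable(small_table))  # False
```

When the check fails, as for the second table, Fisher's Exact Test is the usual alternative.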

Χ2 for 2x2 Table

  • Used for tables whose counts are too large for Fisher's Exact Test
  • The process parallels that of Fisher's Exact Test; please see that page for details of the initial set-up
2x2 Table
           Outcome 1   Outcome 2   Total
Group 1    O11         O12         R1
Group 2    O21         O22         R2
Total      C1          C2          N
 
2x2 Table Example
         Basketball   Softball/Baseball   Total
Boys     12           15                  27
Girls    13           15                  28
Total    25           30                  55
  • Start by assuming that H0 is true, i.e. that both groups share the same underlying success rate: p1 = p2 = p0
  • Calculate the expected 2x2 table based on the observed total numbers
    • Expected population "success rate" is p0 = C1 / N
    • Using p0 and the observed row totals, calculate the expected 2x2 table: E11 = R1 * p0 and E21 = R2 * p0, with E12 = R1 - E11 and E22 = R2 - E21 (equivalently, each Eij = Ri * Cj / N)
  • Compare the expected table to the observed table, by calculating test statistic T
  • One way of calculating T is to evaluate the proportional difference in each cell between the observed and expected values, and then sum them all
    • T = (O11-E11)^2/E11 + (O12-E12)^2/E12 + (O21-E21)^2/E21 + (O22-E22)^2/E22
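  • With some algebra, T can be calculated directly from the original observed table. The form below also subtracts N/2 as a continuity correction, so it gives a somewhat smaller value than the cell-by-cell sum
    • T = N * (|O11 * O22 - O12 * O21| - N/2)^2 / (R1 * R2 * C1 * C2)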
  • Because T is built from the differences between observed and expected counts, a larger T means the two tables differ more and H0 is less plausible
  • In order to calculate the significance level, we need to evaluate the probability that the observed table was due to random sampling, which is related to the size of T. We also need to evaluate the probability of all the other possible tables that could have been observed (again, same as in Fisher's test)
  • When H0 is true, the probability distribution of T is approximately the Χ2 distribution (with 1 degree of freedom for a 2x2 table)
  • We can therefore approximate the probability of a T at least as large as the observed one by evaluating the Χ2 distribution at the observed T (by looking it up in a table)
  • Because these are approximations, the table typically gives critical values:
Χ2 for 2x2 Table
Probability   0.25    0.10    0.05    0.01    0.005   0.001
T             1.323   2.706   3.841   6.635   7.879   10.83
  • Each tabulated probability is the chance, if H0 is true, of obtaining a T at least as large as the critical value through random sampling alone (that is, of seeing the observed outcome or any less likely one)
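
The steps above can be sketched in Python as follows. This is a minimal illustration, not a validated statistical routine: the function names and the example counts are hypothetical, and the p-value uses the identity that the upper tail of the Χ2 distribution with 1 degree of freedom equals erfc(sqrt(T/2)), which holds only for 2x2 tables.

```python
import math


def expected_table(o11, o12, o21, o22):
    """Expected 2x2 table under H0, built from the observed margins."""
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    n = r1 + r2
    return [[r1 * c1 / n, r1 * c2 / n], [r2 * c1 / n, r2 * c2 / n]]


def t_from_cells(o11, o12, o21, o22):
    """Sum of (O - E)^2 / E over the four cells (no continuity correction)."""
    observed = [[o11, o12], [o21, o22]]
    expected = expected_table(o11, o12, o21, o22)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def t_shortcut(o11, o12, o21, o22, continuity=True):
    """Shortcut formula from the observed table; subtracts N/2 if requested."""
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    n = r1 + r2
    diff = abs(o11 * o22 - o12 * o21)
    if continuity:
        diff -= n / 2  # continuity correction, as in the formula in the text
    return n * diff ** 2 / (r1 * r2 * c1 * c2)


def p_value_1df(t):
    """Upper-tail probability of the chi-squared distribution with 1 df."""
    return math.erfc(math.sqrt(t / 2))


# Hypothetical counts, for illustration only
counts = (20, 30, 25, 25)

t_cells = t_from_cells(*counts)
t_plain = t_shortcut(*counts, continuity=False)
t_corrected = t_shortcut(*counts)

# Without the N/2 term the shortcut matches the cell-by-cell sum
print(math.isclose(t_cells, t_plain))                   # True
print(f"{t_plain:.2f} {t_corrected:.2f}")               # 1.01 0.65
print(f"p (corrected) = {p_value_1df(t_corrected):.3f}")
```

Computing the p-value directly avoids the table lookup, but the result is still subject to the same large-sample approximation.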

Χ2 for 2x2 Table Example

Observed
                 Graft Rejected   Engraftment
Low cell dose    17               19
High cell dose   4                28
 
Expected (Calculated)
                 Graft Rejected   Engraftment
Low cell dose    11               25
High cell dose   10               22
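  • T = 8.01 (from the observed-table formula that includes the N/2 term)
  • From the Χ2 table above, T falls between the critical values 7.879 (p = 0.005) and 10.83 (p = 0.001), so p is between 0.001 and 0.005
  • We can therefore conclude that p < 0.005 and that high cell dose is significantly associated with engraftment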
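
The arithmetic of this example can be checked with a short Python sketch. The helper names `corrected_t` and `bracket_p` are hypothetical; the critical values are copied from the table earlier on this page, and the direct p-value again assumes 1 degree of freedom.

```python
import math

# Critical values for a 2x2 table, copied from the table earlier on this page,
# ordered by decreasing probability
CRITICAL_VALUES = [
    (0.25, 1.323), (0.10, 2.706), (0.05, 3.841),
    (0.01, 6.635), (0.005, 7.879), (0.001, 10.83),
]


def corrected_t(o11, o12, o21, o22):
    """T from the observed table, including the N/2 continuity term."""
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    n = r1 + r2
    return n * (abs(o11 * o22 - o12 * o21) - n / 2) ** 2 / (r1 * r2 * c1 * c2)


def bracket_p(t):
    """Return (lower, upper) bounds on p read from the critical-value table."""
    lower, upper = 0.0, 1.0
    for prob, critical in CRITICAL_VALUES:
        if t >= critical:
            upper = prob   # T exceeds this critical value, so p <= prob
        else:
            lower = prob   # first critical value not reached, so p > prob
            break
    return lower, upper


# Observed counts from the example: low dose (17, 19), high dose (4, 28)
t = corrected_t(17, 19, 4, 28)
print(f"T = {t:.2f}")                      # T = 8.01
print(bracket_p(t))                        # (0.001, 0.005)

# Direct upper-tail probability for 1 degree of freedom
p = math.erfc(math.sqrt(t / 2))
print(0.001 < p < 0.005)                   # True
```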