# Biostatistics with R/Printable version

Biostatistics with R

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Biostatistics_with_R

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

# Biostatistics with R authors

The text of this book is released under the terms of the Creative Commons Attribution-ShareAlike 3.0 and GNU Free Documentation License. The particular version of that license that is being used can be found at:

Images used in this document are available under various licenses. Clicking on the image will take you to a description page where the licensing information is displayed.

# Import

## Why R for biostatistics?

R is superior to common statistical packages such as SPSS, SAS and MINITAB because it is

• powerful
• available for many platforms (Mac OS X, Windows, Linux etc.)
• programmable
• non-commercial
• extensively documented

## Obtaining R/Installation

For download and installation instructions, refer to the R FAQ.

## Data Import

The data sets on Wiley's website are available in CSV, Excel, MINITAB, SAS and SPSS formats. Although the foreign package can import data saved by MINITAB, SAS and SPSS (Excel files need a separate package), you should download the data in CSV format, because CSV is the easiest format to process in R.

For example, suppose you would like to import the "Large Data set" data file. The downloaded data file (LDS_C02_NCBIRTH800.csv), assumed to be stored in the directory "/desktop", can be imported into R as a data frame called "largedataset" using the following syntax:
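The command itself is missing from this copy of the text. A minimal sketch, assuming the file sits exactly at the path quoted above and has a header row, would use the base R function read.csv(); the variable name csv_path is introduced here only for illustration.

```r
# Path as quoted in the text; adjust it to wherever the file was actually saved.
csv_path <- "/desktop/LDS_C02_NCBIRTH800.csv"

if (file.exists(csv_path)) {
  # read.csv() assumes a header row and comma-separated fields by default.
  largedataset <- read.csv(csv_path)
  str(largedataset)  # compact overview of the imported columns
} else {
  message("File not found: ", csv_path)
}
```

The file.exists() guard is only there so the sketch runs cleanly on a machine where the file is stored elsewhere.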

If you prefer to choose the data file the standard "point-and-click" GUI way, you may use the function file.choose(), i.e.
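The command is again missing from this copy. One plausible form, assuming the chosen file is a CSV with a header row, passes the result of file.choose() straight to read.csv():

```r
# file.choose() opens a file-selection dialog, so only call it in an
# interactive session; in a script it would stop with an error.
if (interactive()) {
  largedataset <- read.csv(file.choose())
}
```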

You should now have imported the data from the CSV file into a data frame called "largedataset". You can look inside the data frame by calling its name:

> largedataset

You can access the variable (in computer lingo, column) "sex" inside the largedataset data frame by

> largedataset$sex

For example, to count the frequency of each sex:

> table(largedataset$sex)

You can attach the data frame so that you can call the variable directly

> attach(largedataset)
> table(sex)
> detach() #cancel attaching
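Attached data frames can be confusing when a column shares its name with another object, so a common alternative is with(), which evaluates one expression inside the data frame without attaching it. A small sketch follows; the data frame demo is a made-up stand-in for largedataset so that the example runs on its own.

```r
# Hypothetical stand-in for largedataset, used only so the example runs as-is.
demo <- data.frame(sex = c("F", "M", "F", "F", "M"))

# with() looks up "sex" inside demo for this one call only.
sex_counts <- with(demo, table(sex))
sex_counts
```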

## Basic data management

R is designed to be an analysis system rather than an integrated environment such as SPSS. Unlike SPSS, R doesn't have a spreadsheet-like environment for data input. Usually data are entered using other software (e.g. a database, or spreadsheet software such as OpenOffice.org Calc) and then imported into R as described above. For quick one-off calculations, you can do the data entry in R. For example, if you want to calculate the mean age of ten patients (30, 31, 32, 34, 35, 36, 37, 30, 40, 45), you can enter the data into R using the c() function.

> pt_age <- c(30,31,32,34,35,36,37,30,40,45)

You may call the newly created object pt_age by its name...

> pt_age

...and then calculate the mean age of the ten patients.

> mean(pt_age)
[1] 35

# Introduction to Biostatistics

REVIEW EXERCISES

1. Explain what is meant by descriptive statistics.

2. Explain what is meant by inferential statistics.

3. Define: (a) Statistics (b) Biostatistics (c) Variable (d) Quantitative variable (e) Qualitative variable (f) Random variable (g) Population (h) Finite population (i) Infinite population (j) Sample (k) Discrete variable (l) Continuous variable (m) Simple random sample (n) Sampling with replacement (o) Sampling without replacement

4. Define the word measurement.

5. List, describe, and compare the four measurement scales.

6. For each of the following variables, indicate whether it is quantitative or qualitative and specify the measurement scale that is employed when taking measurements on each: (a) Class standing of the members of this class relative to each other (b) Admitting diagnosis of patients admitted to a mental health clinic (c) Weights of babies born in a hospital during a year (d) Gender of babies born in a hospital during a year (e) Range of motion of elbow joint of students enrolled in a university health sciences curriculum (f) Under-arm temperature of day-old infants born in a hospital

7. For each of the following situations, answer questions a through e: (a) What is the sample in the study? (b) What is the population? (c) What is the variable of interest? (d) How many measurements were used in calculating the reported results? (e) What measurement scale was used? Situation A. A study of 300 households in a small southern town revealed that 20 percent had at least one school-age child present. Situation B. A study of 250 patients admitted to a hospital during the past year revealed that, on the average, the patients lived 15 miles from the hospital.

8. Consider the two situations given in Exercise 7. For Situation A describe how you would use a stratified random sample to collect the data. For Situation B describe how you would use systematic sampling of patient records to collect the data.

# Descriptive Statistics

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 2.3.1 | Class interval width using Sturges’s Rule | ${\displaystyle w={\frac {R}{k}}}$ | Example |
| 2.4.1 | Mean of a population | ${\displaystyle \mu ={\sum _{i=1}^{N}{x_{i}} \over N}}$ | Example |
| 2.4.2 | Skewness | ${\displaystyle Skewness={\frac {{\sqrt {n}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{3}}{(\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2})^{\frac {3}{2}}}}={\frac {{\sqrt {n}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{3}}{(n-1){\sqrt {n-1}}s^{3}}}}$ | Example |
| 2.4.2 | Mean of a sample | ${\displaystyle {\bar {x}}={\sum _{i=1}^{n}{x_{i}} \over n}}$ | Example |
| 2.5.1 | Range | ${\displaystyle R=x_{L}-x_{S}}$ | Example |
| 2.5.2 | Sample variance | ${\displaystyle s^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2}}$ | Example |
| 2.5.3 | Population variance | ${\displaystyle \sigma ^{2}={\frac {1}{N}}\sum _{i=1}^{N}\left(x_{i}-\mu \right)^{2}}$ | Example |
| 2.5.4 | Standard deviation | ${\displaystyle s={\sqrt {s^{2}}}={\sqrt {{\frac {1}{n-1}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2}}}}$ | Example |
| 2.5.5 | Coefficient of variation | ${\displaystyle C.V.={\frac {s}{\bar {x}}}}$ | Example |
| 2.5.6 | Quartile location in ordered array | ${\displaystyle Q_{1}={\frac {1}{4}}(n+1)}$ | Example |
| 2.5.7 | Interquartile range | ${\displaystyle IQR=Q_{3}-Q_{1}}$ | Example |
| 2.5.8 | Kurtosis | ${\displaystyle Kurtosis={\frac {n\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{4}}{(\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2})^{2}}}-3={\frac {n\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{4}}{(n-1)^{2}s^{4}}}-3}$ | Example |

Symbol Key
• ${\displaystyle C.V.}$ = coefficient of variation
• ${\displaystyle IQR}$ = interquartile range
• ${\displaystyle k}$ = number of class intervals
• ${\displaystyle \mu }$ = population mean
• ${\displaystyle N}$ = population size
• ${\displaystyle n}$ = sample size
• ${\displaystyle (n-1)}$ = degrees of freedom
• ${\displaystyle Q_{1}}$ = first quartile
• ${\displaystyle Q_{2}}$ = second quartile = median
• ${\displaystyle Q_{3}}$ = third quartile
• ${\displaystyle R}$ = range
• ${\displaystyle s}$ = standard deviation
• ${\displaystyle s^{2}}$ = sample variance
• ${\displaystyle \sigma ^{2}}$ = population variance
• ${\displaystyle x_{i}}$ = ${\displaystyle i^{th}}$ data observation
• ${\displaystyle x_{L}}$ = largest data point
• ${\displaystyle x_{S}}$ = smallest data point
• ${\displaystyle {\bar {x}}}$ = sample mean
• ${\displaystyle w}$ = class width
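As a rough illustration of how the sample formulas in this summary map onto R, the sketch below applies them to the ten patient ages entered in the Import chapter. Two points are assumptions rather than part of the text: Sturges's rule is taken in its usual form k = 1 + 3.322 log10(n), and quantile() is called with type = 6, which is the variant that places the first quartile at position (n + 1)/4 of the ordered array.

```r
# Ages of the ten patients entered earlier in this book.
pt_age <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)
n <- length(pt_age)

x_bar <- mean(pt_age)               # sample mean
R     <- max(pt_age) - min(pt_age)  # 2.5.1 range
s2    <- var(pt_age)                # 2.5.2 sample variance (divisor n - 1)
s     <- sd(pt_age)                 # 2.5.4 standard deviation
cv    <- s / x_bar                  # 2.5.5 coefficient of variation

# 2.3.1: class width from the range and the number of class intervals.
k <- 1 + 3.322 * log10(n)           # Sturges's rule (assumed usual form)
w <- R / k

# 2.5.6 and 2.5.7: quartiles located at (n + 1)/4 and 3(n + 1)/4.
q   <- quantile(pt_age, probs = c(0.25, 0.75), type = 6)
iqr <- unname(q[2] - q[1])

# Skewness and kurtosis written out from the summation formulas.
d        <- pt_age - x_bar
skewness <- sqrt(n) * sum(d^3) / sum(d^2)^(3 / 2)
kurtosis <- n * sum(d^4) / sum(d^2)^2 - 3

c(mean = x_bar, range = R, variance = s2, sd = s, cv = cv,
  iqr = iqr, skewness = skewness, kurtosis = kurtosis)
```

Base R has no built-in skewness or kurtosis function, which is why those two are computed directly from the deviations d.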

# The Ordered Array

## The Frequency Distribution

Example 2.2.1 detailed the procedure for sorting an array. The array is a series of ages of subjects who received one of two kinds of smoking cessation program. Suppose you have already imported the data set into a data frame called SmokeCProg, for example with read.csv() as described in the Import chapter.

It is better to use a descriptive name (SmokeCProg for Smoking Cessation Program) than a commonly used placeholder name such as x or y. We can obtain a sorted array of ages using the following command:

> sort(SmokeCProg$AGE)

The frequency distribution of ages shown in Table 2.3.1 can be obtained using:

> table(cut(SmokeCProg$AGE, b=c(0,39,49,59,69,79,89)))
(0,39] (39,49] (49,59] (59,69] (69,79] (79,89]
11      46      70      45      16       1

The cut() command breaks up the AGE variable at the break points (0, 39, 49, 59, 69, 79, 89) provided. Table 2.3.2 gives the frequency table of age.

As suggested by Venables et al. in "An Introduction to R", statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Compared to other statistical packages, R gives only minimal output. We will demonstrate this important characteristic in this example. In the previous example, we calculated the frequency distribution of ages using the table() and cut() commands. We can store the result in an object called "AgeFreqTable" using:

> AgeFreqTable <- table(cut(SmokeCProg$AGE, b=c(0,39,49,59,69,79,89)))

You will get no output until you call the object "AgeFreqTable":

> AgeFreqTable
(0,39] (39,49] (49,59] (59,69] (69,79] (79,89]
11      46      70      45      16       1

To obtain the cumulative frequency, we can process the object "AgeFreqTable" using the cumsum() command:

> cumsum(AgeFreqTable)
(0,39] (39,49] (49,59] (59,69] (69,79] (79,89]
11      57     127     172     188     189

Before we move on to the relative frequency, we can obtain the total number of observations in a variable using the length() function:

> length(SmokeCProg$AGE)
[1] 189

We can calculate the relative frequency by dividing each item in the object "AgeFreqTable" by the total number of observations using

> AgeFreqTable/length(SmokeCProg$AGE)
(0,39]     (39,49]     (49,59]     (59,69]     (69,79]     (79,89]
0.058201058 0.243386243 0.370370370 0.238095238 0.084656085 0.005291005

Similarly, the cumulative relative frequency can be calculated using

> cumsum(AgeFreqTable)/length(SmokeCProg$AGE)
(0,39]    (39,49]    (49,59]    (59,69]    (69,79]    (79,89]
0.05820106 0.30158730 0.67195767 0.91005291 0.99470899 1.00000000

If you would like to round the relative frequencies to 4 digits, you can use the round() function:

> round(AgeFreqTable/length(SmokeCProg$AGE), digits=4)
(0,39] (39,49] (49,59] (59,69] (69,79] (79,89]
0.0582  0.2434  0.3704  0.2381  0.0847  0.0053

Alternatively, you can store the relative frequencies in a new object and then process that object with the round() function:

> AgeRelFreqTable <- AgeFreqTable/length(SmokeCProg$AGE)
> round(AgeRelFreqTable, digits=4)
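If the data file is not at hand, the same steps can be tried on the class frequencies printed above. The sketch below re-enters those six counts by hand (the object names AgeFreq, AgeRelFreq and AgeDist are introduced here for illustration) and uses prop.table(), a base R shortcut for dividing each entry by the total.

```r
# Class frequencies copied from the output above (they sum to 189).
AgeFreq <- c("(0,39]" = 11, "(39,49]" = 46, "(49,59]" = 70,
             "(59,69]" = 45, "(69,79]" = 16, "(79,89]" = 1)

# prop.table() divides each entry by the total, giving relative frequencies.
AgeRelFreq <- prop.table(AgeFreq)

# Collect the columns of a frequency distribution in one data frame.
AgeDist <- data.frame(
  frequency     = AgeFreq,
  cum_frequency = cumsum(AgeFreq),
  rel_frequency = round(AgeRelFreq, 4),
  cum_rel_freq  = cumsum(AgeRelFreq)
)
AgeDist
```

Keeping the four columns side by side makes it easy to check that the cumulative columns end at 189 and 1.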

Exercise: Try to round the cumulative relative frequencies to 4 digits using an R command.

To plot a histogram, you can use the hist() function, e.g.

> hist(SmokeCProg$AGE)

You can customize the histogram by adding arguments (i.e. options); type ?hist to learn more about the arguments of the hist() function. For example, to plot a histogram with about five bars (similar to Figure 2.3.2), keeping in mind that R treats a single number given to breaks as a suggestion and may pick a nearby "pretty" number of bars:

> hist(SmokeCProg$AGE, breaks=5)
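To see what hist() actually does with the breaks argument, it can help to inspect the histogram object instead of the picture. The sketch below uses simulated ages (not the book's data; the vector ages and the seed are made up) and compares a single suggested number of breaks with an explicit vector of break points.

```r
# Simulated stand-in ages (NOT the book's data), so the example runs as-is.
set.seed(1)
ages <- round(rnorm(189, mean = 55, sd = 10))

# plot = FALSE returns the histogram object instead of drawing it.
h_suggested <- hist(ages, breaks = 5, plot = FALSE)
h_suggested$breaks       # the break points R actually chose

# An explicit vector of break points is used exactly as given.
h_fixed <- hist(ages, breaks = seq(min(ages), max(ages), length.out = 6),
                plot = FALSE)
length(h_fixed$counts)   # six break points give five bars
```

Passing explicit break points is the more reliable route when the class intervals must match a published table.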

You can add more arguments to the hist() function, e.g.

> hist(SmokeCProg$AGE, breaks=5, ylim=c(0,70), main="Histogram of Ages of 189 subjects", col="red", xlab="Age")

Remember to consult the documentation (e.g. ?hist or help.search("histogram")) whenever you have a question. Most of the time, you can find the answer in the help pages. For example, suppose you don't know how to plot a stem-and-leaf display of your data, and you don't even know the name of the function. You can use help.search() to search for the keyword "stem", i.e.

> help.search("stem")

A function called stem() should be among the results. We can then use this function to visualize our data:

> stem(SmokeCProg$AGE)
The decimal point is 1 digit(s) to the right of the |
3 | 04
3 | 577888899
4 | 00223333334444444
4 | 55566666677777788888889999999
5 | 0000000011112222223333333333333333344444444444
5 | 555666666777777788999999
6 | 000011111111111222222233444444
6 | 556666667888999
7 | 0111111123
7 | 567888
8 | 2

Unlike in MINITAB, the stem unit is adjusted through the scale argument. The plot above uses the default scale of 1, which here is equivalent to a stem unit of 5. To change the stem unit to 10, the value of the scale argument should be changed to 0.5:

> stem(SmokeCProg$AGE, scale=0.5)
The decimal point is 1 digit(s) to the right of the |
3 | 04577888899
4 | 0022333333444444455566666677777788888889999999
5 | 00000000111122222233333333333333333444444444445556666667777777889999
6 | 000011111111111222222233444444556666667888999
7 | 0111111123567888
8 | 2

# Some Basic Probability Concepts

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 3.2.1 | Classical probability | ${\displaystyle P(E)={\frac {m}{N}}}$ | Example |
| 3.2.2 | Relative frequency probability | ${\displaystyle P(E)={\frac {m}{n}}}$ | Example |
| 3.3.1–3.3.3 | Properties of probability | ${\displaystyle P(E_{i})\geq 0}$; ${\displaystyle P(E_{1})+P(E_{2})+\dots +P(E_{n})=1}$; ${\displaystyle P(E_{i}+E_{j})=P(E_{i})+P(E_{j})}$ | Example |
| 3.4.1 | Multiplication rule | ${\displaystyle P(A\cap B)=P(A\mid B)P(B)=P(A)P(B\mid A)}$ | Example |
| 3.4.2 | Conditional probability | ${\displaystyle P(A\mid B)={\frac {P(A\cap B)}{P(B)}}}$ | Example |
| 3.4.3 | Addition rule | ${\displaystyle P(A\cup B)=P(A)+P(B)-P(A\cap B)}$ | Example |
| 3.4.4 | Independent events | ${\displaystyle P(A\cap B)=P(A)P(B)}$ | Example |
| 3.4.5 | Complementary events | ${\displaystyle P({\overline {A}})=1-P(A)}$ | Example |
| 3.4.6 | Marginal probability | ${\displaystyle P(A_{i})=\sum {P(A_{i}\cap B_{j})}}$ | Example |
| | Sensitivity of a screening test | ${\displaystyle P(T\mid D)={\frac {a}{(a+c)}}}$ | Example |
| | Specificity of a screening test | ${\displaystyle P({\overline {T}}\mid {\overline {D}})={\frac {d}{(b+d)}}}$ | Example |
| 3.5.1 | Predictive value positive of a screening test | ${\displaystyle P(D\mid T)={\frac {P(T\mid D)P(D)}{P(T\mid D)P(D)+P(T\mid {\overline {D}})P({\overline {D}})}}}$ | Example |
| 3.5.2 | Predictive value negative of a screening test | ${\displaystyle P({\overline {D}}\mid {\overline {T}})={\frac {P({\overline {T}}\mid {\overline {D}})P({\overline {D}})}{P({\overline {T}}\mid {\overline {D}})P({\overline {D}})+P({\overline {T}}\mid D)P(D)}}}$ | Example |

Symbol Key
• ${\displaystyle D}$ = disease
• ${\displaystyle E}$ = event
• ${\displaystyle m}$ = the number of times an event ${\displaystyle E_{i}}$ occurs
• ${\displaystyle n}$ = sample size or the total number of times a process occurs
• ${\displaystyle N}$ = population size or the total number of mutually exclusive and equally likely events
• ${\displaystyle P({\overline {A}})}$ = a complementary event; the probability of an event A not occurring
• ${\displaystyle P(E_{i})}$ = probability of some event ${\displaystyle E_{i}}$ occurring
• ${\displaystyle P(A\cap B)}$ = an “intersection” or “and” statement; the probability of an event A and an event B occurring
• ${\displaystyle P(A\cup B)}$ = a “union” or “or” statement; the probability of an event A or an event B or both occurring
• ${\displaystyle P(A|B)}$ = a conditional statement; the probability of an event A occurring given that an event B has already occurred
• ${\displaystyle T}$ = test results
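The screening-test formulas can be sketched in a few lines of R. The counts and the prevalence below are made up for illustration; the cell labels follow the sensitivity and specificity formulas in the summary (a and c are the diseased subjects, b and d the non-diseased), and the predictive values are obtained from Bayes's theorem with the sensitivity in the numerator of the positive predictive value.

```r
# Hypothetical screening counts (made up for illustration).
a  <- 90    # test positive, disease present
b  <- 30    # test positive, disease absent
cc <- 10    # test negative, disease present ("c" in the formulas)
d  <- 170   # test negative, disease absent

sensitivity <- a / (a + cc)   # P(T | D)
specificity <- d / (b + d)    # P(not T | not D)

prevalence <- 0.05            # assumed P(D) in the target population

# 3.5.1 predictive value positive, P(D | T)
ppv <- sensitivity * prevalence /
  (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

# 3.5.2 predictive value negative, P(not D | not T)
npv <- specificity * (1 - prevalence) /
  (specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)

c(sensitivity = sensitivity, specificity = specificity, ppv = ppv, npv = npv)
```

The variable is called cc rather than c only to avoid shadowing R's c() function.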

# Probability Distributions

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 4.2.1 | Mean of a frequency distribution | ${\displaystyle \mu =\sum {xp(x)}}$ | Example |
| 4.2.2 | Variance of a frequency distribution | ${\displaystyle \sigma ^{2}=\sum {(x-\mu )^{2}p(x)}}$ or ${\displaystyle \sigma ^{2}=\sum {x^{2}p(x)}-\mu ^{2}}$ | Example |
| 4.3.1 | Combination of objects | ${\displaystyle {}_{n}C_{x}={\frac {n!}{x!(n-x)!}}}$ | Example |
| 4.3.2 | Binomial distribution function | ${\displaystyle f(x)={}_{n}C_{x}p^{x}q^{n-x},\;x=0,1,2,\dots ,n}$ | Example |
| 4.3.3–4.3.5 | Tabled binomial probability equalities | ${\displaystyle P(X=x\mid n,p>.50)=P(X=n-x\mid n,1-p)}$; ${\displaystyle P(X\leq x\mid n,p>.50)=P(X\geq n-x\mid n,1-p)}$; ${\displaystyle P(X\geq x\mid n,p>.50)=P(X\leq n-x\mid n,1-p)}$ | Example |
| 4.4.1 | Poisson distribution function | ${\displaystyle f(x)={\frac {e^{-\lambda }\lambda ^{x}}{x!}},\;x=0,1,2,\dots }$ | Example |
| 4.6.1 | Normal distribution function | ${\displaystyle f(x)={\frac {1}{{\sqrt {2\pi }}\sigma }}e^{-(x-\mu )^{2}/2\sigma ^{2}},\;-\infty <x<\infty ,\;-\infty <\mu <\infty ,\;\sigma >0}$ | Example |
| 4.6.2 | z-transformation | ${\displaystyle z={\frac {X-\mu }{\sigma }}}$ | Example |
| 4.6.3 | Standard normal distribution function | ${\displaystyle f(z)={\frac {1}{\sqrt {2\pi }}}e^{-z^{2}/2},\;-\infty <z<\infty }$ | Example |

Symbol Key
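R ships density functions for the distributions named in this summary, so each formula can be checked against the built-in version. The sketch below does this for one arbitrary set of parameter values (all numbers are made up for illustration).

```r
# Hypothetical binomial setting: n trials, success probability p, x successes.
n <- 10; p <- 0.3; x <- 4

# 4.3.1 and 4.3.2: combinations and the binomial probability.
manual_binom <- choose(n, x) * p^x * (1 - p)^(n - x)
all.equal(manual_binom, dbinom(x, size = n, prob = p))

# 4.2.1 and 4.2.2: mean and variance of a discrete distribution.
xs    <- 0:n
px    <- dbinom(xs, size = n, prob = p)
mu_x  <- sum(xs * px)               # equals n * p for the binomial
var_x <- sum((xs - mu_x)^2 * px)    # equals n * p * (1 - p)

# 4.4.1: Poisson probability with mean lambda.
lambda <- 2
manual_pois <- exp(-lambda) * lambda^x / factorial(x)
all.equal(manual_pois, dpois(x, lambda))

# 4.6.1 and 4.6.2: normal density and the z-transformation.
mu <- 100; sigma <- 15; obs <- 120
z <- (obs - mu) / sigma
manual_norm <- exp(-(obs - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma)
all.equal(manual_norm, dnorm(obs, mean = mu, sd = sigma))

# The same cumulative probability on the original and on the z scale.
all.equal(pnorm(obs, mean = mu, sd = sigma), pnorm(z))
```

In everyday work only the built-in functions (dbinom(), dpois(), dnorm(), pnorm()) are needed; the hand-written versions are there to show the correspondence with the formulas.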

# Some Important Sampling Distributions

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 5.3.1 | z-transformation for sample mean | ${\displaystyle Z={\frac {{\bar {X}}-\mu _{x}}{\sigma /{\sqrt {n}}}}}$ | Example |
| 5.4.1 | z-transformation for difference between two means | ${\displaystyle Z={\frac {({\bar {X}}_{1}-{\bar {X}}_{2})-(\mu _{1}-\mu _{2})}{\sqrt {{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}}}}}$ | Example |
| 5.5.1 | z-transformation for sample proportion | ${\displaystyle Z={\frac {{\bar {p}}-p}{\sqrt {\frac {p(1-p)}{n}}}}}$ | Example |
| 5.5.2 | Continuity correction when x < np | ${\displaystyle Z_{c}={\frac {{\frac {x+.5}{n}}-p}{\sqrt {pq/n}}}}$ | Example |
| 5.5.3 | Continuity correction when x > np | ${\displaystyle Z_{c}={\frac {{\frac {x-.5}{n}}-p}{\sqrt {pq/n}}}}$ | Example |
| 5.6.1 | z-transformation for difference between two proportions | | Example |

Symbol Key
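A short sketch of two of these transformations in R, using made-up numbers throughout: the z-transformation for a sample mean, followed by the continuity-corrected z for a sample proportion in the case x < np. The name m is used for the number of trials in the second part only to keep it apart from the sample size n in the first part.

```r
# 5.3.1: z for a sample mean (hypothetical population and sample values).
mu <- 100; sigma <- 15; n <- 25; x_bar <- 104
z_mean  <- (x_bar - mu) / (sigma / sqrt(n))
p_upper <- pnorm(z_mean, lower.tail = FALSE)   # P(sample mean > 104)

# 5.5.2: continuity correction for a proportion when x < (trials * p).
p <- 0.5; m <- 100; x <- 45                    # here x = 45 < 50
z_c     <- ((x + 0.5) / m - p) / sqrt(p * (1 - p) / m)
p_lower <- pnorm(z_c)                          # approximate P(X <= 45)

c(z_mean = z_mean, p_upper = p_upper, z_c = z_c, p_lower = p_lower)
```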

# Estimation

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 6.2.1 | Expression of an interval estimate | estimator ± (reliability coefficient) × (standard error of the estimator) | Example |
| 6.2.2 | Interval estimate for ${\displaystyle \mu }$ when ${\displaystyle \sigma }$ is known | ${\displaystyle {\bar {X}}\pm z_{(1-\alpha /2)}\sigma _{\bar {x}}}$ | Example |
| 6.3.1 | t-transformation | ${\displaystyle t={\frac {{\bar {x}}-\mu }{s/{\sqrt {n}}}}}$ | Example |
| 6.3.2 | Interval estimate for ${\displaystyle \mu }$ when ${\displaystyle \sigma }$ is unknown | ${\displaystyle {\bar {X}}\pm t_{(1-\alpha /2)}{\frac {s}{\sqrt {n}}}}$ | Example |
| 6.4.1 | Interval estimate for the difference between two population means when ${\displaystyle \sigma _{1}}$ and ${\displaystyle \sigma _{2}}$ are known | ${\displaystyle ({\bar {x}}_{1}-{\bar {x}}_{2})\pm z_{(1-\alpha /2)}{\sqrt {{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}}}}$ | Example |
| 6.4.2 | Pooled variance estimate | ${\displaystyle s_{p}^{2}={\frac {(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}}}$ | Example |
| 6.4.3 | Standard error of estimate | ${\displaystyle s_{({\bar {X}}_{1}-{\bar {X}}_{2})}={\sqrt {{\frac {s_{p}^{2}}{n_{1}}}+{\frac {s_{p}^{2}}{n_{2}}}}}}$ | Example |
| 6.4.4 | Interval estimate for the difference between two population means when ${\displaystyle \sigma _{1}}$ is unknown | ${\displaystyle ({\bar {x}}_{1}-{\bar {x}}_{2})\pm t_{(1-\alpha /2)}{\sqrt {{\frac {s_{p}^{2}}{n_{1}}}+{\frac {s_{p}^{2}}{n_{2}}}}}}$ | Example |
| 6.4.5 | Cochran’s correction for reliability coefficient when variances are not equal | ${\displaystyle t'_{(1-\alpha /2)}={\frac {w_{1}t_{1}+w_{2}t_{2}}{w_{1}+w_{2}}}}$ | Example |
| 6.4.6 | Interval estimate using Cochran’s correction for t | ${\displaystyle ({\bar {x}}_{1}-{\bar {x}}_{2})\pm t'_{(1-\alpha /2)}{\sqrt {{\frac {s_{1}^{2}}{n_{1}}}+{\frac {s_{2}^{2}}{n_{2}}}}}}$ | Example |
| 6.5.1 | Interval estimate for a population proportion | | Example |
| 6.6.1 | Interval estimate for the difference between two population proportions | | Example |
| 6.7.1–6.7.3 | Sample size determination when sampling with replacement | | Example |
| 6.7.4–6.7.5 | Sample size determination when sampling without replacement | | Example |
| 6.8.1 | Sample size determination for proportions when sampling with replacement | | Example |
| 6.8.2 | Sample size determination for proportions when sampling without replacement | | Example |
| 6.9.1 | Interval estimate for ${\displaystyle \sigma ^{2}}$ | | Example |
| 6.9.2 | Interval estimate for ${\displaystyle \sigma }$ | | Example |
| 6.10.1 | Interval estimate for the ratio of two variances | | Example |
| 6.10.2 | Relationship among F ratios | | Example |

Symbol Key
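The t-based intervals can be computed step by step and compared with what t.test() reports. The sketch below reuses the ten patient ages from the Import chapter for the one-sample interval; splitting them into two groups of five for the pooled-variance interval is an arbitrary choice made only for illustration.

```r
pt_age <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)
alpha  <- 0.05

# 6.3.2: interval estimate for the mean when sigma is unknown.
n      <- length(pt_age)
t_crit <- qt(1 - alpha / 2, df = n - 1)
ci_manual  <- mean(pt_age) + c(-1, 1) * t_crit * sd(pt_age) / sqrt(n)
ci_builtin <- as.numeric(t.test(pt_age, conf.level = 1 - alpha)$conf.int)

# 6.4.2 to 6.4.4: pooled variance, standard error and interval for the
# difference between two means (hypothetical split into two groups).
g1 <- pt_age[1:5]
g2 <- pt_age[6:10]
n1 <- length(g1); n2 <- length(g2)
sp2     <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
se_diff <- sqrt(sp2 / n1 + sp2 / n2)
ci_diff_manual  <- (mean(g1) - mean(g2)) +
  c(-1, 1) * qt(1 - alpha / 2, df = n1 + n2 - 2) * se_diff
ci_diff_builtin <- as.numeric(t.test(g1, g2, var.equal = TRUE)$conf.int)

rbind(ci_manual, ci_builtin, ci_diff_manual, ci_diff_builtin)
```

Note that t.test() assumes unequal variances by default for two samples; var.equal = TRUE is needed to obtain the pooled-variance interval.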

# Hypothesis Testing

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 7.1.1, 7.1.2, 7.2.1 | z-transformation (using either ${\displaystyle \mu }$ or ${\displaystyle \mu _{0}}$) | ${\displaystyle z={\frac {{\bar {x}}-\mu }{\sigma /{\sqrt {n}}}}}$ | Example |
| 7.2.2 | t-transformation | ${\displaystyle t={\frac {{\bar {x}}-\mu _{0}}{s/{\sqrt {n}}}}}$ | Example |
| 7.2.3 | Test statistic when sampling from a population that is not normally distributed | ${\displaystyle z={\frac {{\bar {x}}-\mu _{0}}{s/{\sqrt {n}}}}}$ | Example |
| 7.3.1 | Test statistic when sampling from normally distributed populations: population variances known | ${\displaystyle z={\frac {({\bar {x}}_{1}-{\bar {x}}_{2})-(\mu _{1}-\mu _{2})_{0}}{\sqrt {{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}}}}}$ | Example |
| 7.3.2 | Test statistic when sampling from normally distributed populations: population variances unknown and equal | Example | Example |
| 7.3.3, 7.3.4 | Test statistic when sampling from normally distributed populations: population variances unknown and unequal | Example | Example |
| 7.3.5 | Sampling from populations that are not normally distributed | Example | Example |
| 7.4.1 | Test statistic for paired differences when the population variance is unknown | Example | Example |
| 7.4.2 | Test statistic for paired differences when the population variance is known | Example | Example |
| 7.5.1 | Test statistic for a single population proportion | Example | Example |
| 7.6.1, 7.6.2 | Test statistic for the difference between two population proportions | Example | Example |
| 7.7.1 | Test statistic for a single population variance | Example | Example |
| 7.8.1 | Variance ratio | Example | Example |
| 7.9.1, 7.9.2 | Upper and lower critical values for ${\displaystyle {\bar {x}}}$ | Example | Example |
| 7.10.1, 7.10.2 | Critical value for determining sample size to control type II errors | Example | Example |
| 7.10.3 | Sample size to control type II errors | Example | Example |
| 5.5.3 | Continuity correction when x > np | Example | Example |
| 5.6.1 | z-transformation for difference between two proportions | Example | Example |

Symbol Key
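As an illustration of the t-transformation used for testing, the sketch below computes the statistic and a two-sided p-value by hand for the ten patient ages from the Import chapter and compares them with t.test(). The hypothesized mean of 33 is an arbitrary value chosen for the example.

```r
pt_age <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)
mu0    <- 33    # hypothetical value of the mean under the null hypothesis

n      <- length(pt_age)
t_stat <- (mean(pt_age) - mu0) / (sd(pt_age) / sqrt(n))   # 7.2.2

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

builtin <- t.test(pt_age, mu = mu0)
c(t_manual = t_stat, t_builtin = unname(builtin$statistic),
  p_manual = p_value, p_builtin = builtin$p.value)
```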

# Analysis of Variance

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 8.2.1 | One-way ANOVA model | Example | Example |
| 8.2.2 | Total sum-of-squares | Example | Example |
| 8.2.3 | Within-group sum-of-squares | Example | Example |
| 8.2.4 | Among-group sum-of-squares | Example | Example |
| 8.2.5 | Within-group variance | Example | Example |
| 8.2.6 | Among-group variance I | Example | Example |
| 8.2.9 | Tukey’s HSD (equal sample sizes) | Example | Example |
| 8.2.10 | Tukey’s HSD (unequal sample sizes) | Example | Example |
| 8.3.1 | Two-way ANOVA model | Example | Example |
| 8.3.2 | Sum-of-squares representation | Example | Example |
| 8.3.3 | Sum-of-squares total | Example | Example |
| 8.3.4 | Sum-of-squares block | Example | Example |
| 8.3.5 | Sum-of-squares treatments | Example | Example |
| 8.3.6 | Sum-of-squares error | Example | Example |
| 8.4.1 | Fixed-effects, additive single-factor, repeated-measures ANOVA model | Example | Example |
| 8.4.2 | Fixed-effects, additive two-factor, repeated-measures ANOVA model | Example | Example |
| 8.5.1 | Two-factor completely randomized fixed-effects factorial model | Example | Example |
| 8.5.2 | Probabilistic representation of a | Example | Example |
| 8.5.3 | Sum-of-squares total I | Example | Example |
| 8.5.4 | Sum-of-squares total II | Example | Example |
| 8.5.5 | Sum-of-squares treatment partition | Example | Example |

Symbol Key
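The formula cells in this summary are still placeholders, so the following is only a sketch of how the one-way sums of squares are commonly computed in R, using the standard definitions (squared deviations from the grand mean and from the group means) and a small made-up data set. The result is compared with the F statistic reported by aov().

```r
# Made-up data: three groups of three observations each.
y <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)
g <- factor(rep(c("A", "B", "C"), each = 3))

grand_mean  <- mean(y)
group_means <- ave(y, g)   # each observation replaced by its group mean

sst <- sum((y - grand_mean)^2)             # total sum of squares
ssw <- sum((y - group_means)^2)            # within-group sum of squares
ssa <- sum((group_means - grand_mean)^2)   # among-group sum of squares

k <- nlevels(g)
N <- length(y)
f_stat <- (ssa / (k - 1)) / (ssw / (N - k))   # among / within variance

fit <- aov(y ~ g)
anova(fit)        # the same F statistic appears in the "F value" column
TukeyHSD(fit)     # pairwise comparisons of the group means
```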

# Simple Linear Regression and Correlation

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 9.2.1 | Assumption of linearity | Example | Example |
| 9.2.2 | Simple linear regression model | Example | Example |
| 9.2.3 | Error (residual) term | Example | Example |
| 9.3.1 | Algebraic representation of a straight line | Example | Example |
| 9.3.2 | Least square estimate of the slope of a regression line | Example | Example |
| 9.3.3 | Least square estimate of the intercept of a regression line | Example | Example |
| 9.4.1 | Deviation equation | Example | Example |
| 9.4.2 | Sum-of-squares equation | Example | Example |
| 9.4.3 | Estimated population coefficient of determination | Example | Example |
| 9.4.4–9.4.7 | Means and variances of point estimators a and b | Example | Example |
| 9.4.8 | z statistic for testing hypotheses about b | Example | Example |
| 9.4.9 | t statistic for testing hypotheses about b | Example | Example |
| 9.5.1 | Prediction interval for Y for a given X | Example | Example |
| 9.5.2 | Confidence interval for the mean of Y for a given X | Example | Example |
| 9.7.1–9.7.2 | Correlation coefficient | Example | Example |
| 9.7.3 | t statistic for correlation coefficient | Example | Example |
| 9.7.4 | z statistic for correlation coefficient | Example | Example |
| 9.7.5 | Estimated standard deviation for z statistic | Example | Example |
| 9.7.6 | Z statistic for correlation coefficient | Example | Example |
| 9.7.7 | Z statistic for correlation coefficient when n < 25 | Example | Example |
| 9.7.8 | Standard deviation for z* | Example | Example |
| 9.7.9 | Z* statistic for correlation coefficient | Example | Example |
| 9.7.10 | Confidence interval for r | Example | Example |

Symbol Key
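Since the formula cells are placeholders here, the sketch below relies on the standard least-squares expressions for the slope and intercept rather than on anything printed in this summary, and checks them against lm() on a small made-up data set.

```r
# Made-up data for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares slope and intercept from sums of cross-products and squares.
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a <- mean(y) - b * mean(x)

fit <- lm(y ~ x)
coef(fit)                 # intercept and slope as estimated by lm()

# Coefficient of determination: squared correlation in simple regression.
r2 <- cor(x, y)^2
summary(fit)$r.squared

# Confidence and prediction intervals at a new x value.
new_x <- data.frame(x = 3.5)
predict(fit, new_x, interval = "confidence")
predict(fit, new_x, interval = "prediction")
```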

# Multiple Regression and Correlation

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 10.2.1 | Representation of the multiple linear regression equation | Example | Example |
| 10.2.2 | Representation of the multiple linear regression equation with two independent variables | Example | Example |
| 10.2.3 | Random deviation of a point from a plane when there are two independent variables | Example | Example |
| 10.3.1 | Sum-of-squared residuals | Example | Example |
| 10.4.1 | Sum-of-squares equation | Example | Example |
| 10.4.2 | Coefficient of multiple determination | Example | Example |
| 10.4.3 | t statistic for testing hypotheses about b_i | Example | Example |
| 10.5.1 | Estimation equation for multiple linear regression | Example | Example |
| 10.5.2 | Confidence interval for the mean of Y for a given X | Example | Example |
| 10.5.3 | Prediction interval for Y for a given X | Example | Example |
| 10.6.1 | Multiple correlation model | Example | Example |
| 10.6.2 | Multiple correlation coefficient | Example | Example |
| 10.6.3 | F statistic for testing the multiple correlation coefficient | Example | Example |
| 10.6.4–10.6.6 | Partial correlation between two variables (1 and 2) after controlling for a third (3) | Example | Example |
| 10.6.7 | t statistic for testing hypotheses about partial correlation coefficients | Example | Example |

Symbol Key

# Regression Analysis: Some Additional Techniques

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 11.4.1–11.4.3 | Representations of the simple linear regression model | Example | Example |
| 11.4.4 | Simple logistic regression model | Example | Example |
| 11.4.5 | Alternative representation of the simple logistic regression model | Example | Example |
| 11.4.6 | Alternative representation of the multiple logistic regression model | Example | Example |
| 11.4.7 | Alternative representation of the multiple logistic regression model | Example | Example |

Symbol Key

# The Chi-Square Distribution and the Analysis of Frequencies

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 12.2.1 | Standard normal random variable | Example | Example |
| 12.2.2 | Chi-square distribution with n degrees of freedom | Example | Example |
| 12.2.3 | Chi-square probability density function | Example | Example |
| 12.2.4 | Chi-square test statistic | Example | Example |
| 12.4.1 | Chi-square calculation formula for a 2 × 2 contingency table | Example | Example |
| 12.4.2 | Yates’s corrected chi-square calculation for a 2 × 2 contingency table | Example | Example |
| 12.6.1–12.6.2 | Large-sample approximation to the chi-square | Example | Example |
| 12.7.1 | Relative risk estimate | Example | Example |
| 12.7.2 | Confidence interval for the relative risk estimate | Example | Example |
| 12.7.3 | Odds ratio estimate | Example | Example |
| 12.7.4 | Confidence interval for the odds ratio estimate | Example | Example |
| 12.7.5 | Expected frequency in the Mantel–Haenszel statistic | Example | Example |
| 12.7.6 | Stratum expected frequency in the Mantel–Haenszel statistic | Example | Example |
| 12.7.7 | Mantel–Haenszel test statistic | Example | Example |
| 12.7.8 | Mantel–Haenszel estimator of the common odds ratio | Example | Example |

Symbol Key
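With the formula cells still empty, the sketch below uses the usual shortcut formulas for a 2 × 2 table (an assumption, since the text does not print them) and compares the results with chisq.test(). The counts are made up, and the layout is assumed to be rows = exposure (yes, no) and columns = disease (yes, no).

```r
# Hypothetical 2 x 2 table: rows = exposed / not exposed,
# columns = disease / no disease.
tab <- matrix(c(20, 10,
                15, 25), nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("yes", "no"),
                              disease  = c("yes", "no")))
a <- tab[1, 1]; b <- tab[1, 2]; cc <- tab[2, 1]; d <- tab[2, 2]
n <- sum(tab)

# Shortcut chi-square for a 2 x 2 table, without and with Yates's correction.
denom      <- (a + cc) * (b + d) * (a + b) * (cc + d)
chi_manual <- n * (a * d - b * cc)^2 / denom
chi_yates  <- n * (abs(a * d - b * cc) - n / 2)^2 / denom

chisq.test(tab, correct = FALSE)$statistic
chisq.test(tab, correct = TRUE)$statistic

# Relative risk and odds ratio estimates under the assumed layout.
relative_risk <- (a / (a + b)) / (cc / (cc + d))
odds_ratio    <- (a * d) / (b * cc)
c(chi_manual = chi_manual, chi_yates = chi_yates,
  relative_risk = relative_risk, odds_ratio = odds_ratio)
```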

# Nonparametric and Distribution-Free Statistics

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 13.3.1 | Sign test statistic | Example | Example |
| 13.3.2 | Large-sample approximation of the sign test | Example | Example |
| 13.6.1 | Mann–Whitney test statistic | Example | Example |
| 13.6.2 | Large-sample approximation of the Mann–Whitney test | Example | Example |
| 13.6.3 | Equivalence of the Mann–Whitney and Wilcoxon two-sample statistics | Example | Example |
| 13.7.1–13.7.2 | Kolmogorov–Smirnov test statistic | Example | Example |
| 13.8.1 | Kruskal–Wallis test statistic | Example | Example |
| 13.8.2 | Kruskal–Wallis test statistic adjustment for ties | Example | Example |
| 13.9.2 | Friedman test statistic | Example | Example |
| 13.10.1 | Spearman rank correlation test statistic | Example | Example |
| 13.10.2 | Large-sample approximation of the Spearman rank correlation | Example | Example |
| 13.10.3–13.10.4 | Correction for tied observations in the Spearman rank correlation | Example | Example |
| 13.11.1 | Theil's estimator of b | Example | Example |
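Two of the statistics named above are easy to reproduce by hand and compare with base R. The sketch below assumes the rank-sum form of the Mann–Whitney statistic (sum of the ranks of the first sample minus n1(n1 + 1)/2) and the usual no-ties formula for the Spearman coefficient; neither formula is printed in this summary, and all data are made up and free of ties.

```r
# Mann-Whitney: two small made-up samples without ties.
x <- c(1.1, 2.3, 3.5, 4.2)
y <- c(2.9, 5.1, 6.4, 7.7, 8.0)

ranks    <- rank(c(x, y))
n1       <- length(x)
S        <- sum(ranks[seq_len(n1)])    # rank sum of the first sample
T_manual <- S - n1 * (n1 + 1) / 2

mw <- wilcox.test(x, y)   # reports the same statistic, labelled W
mw$statistic

# Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1)) without ties.
u <- c(1.2, 2.4, 3.1, 4.8, 5.0)
v <- c(2.0, 1.5, 3.9, 4.1, 6.3)
d <- rank(u) - rank(v)
n <- length(u)
rs_manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

cor(u, v, method = "spearman")
```

Related tests in base R include kruskal.test(), friedman.test() and ks.test().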

# Survival Analysis

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |
| 14.2.1 | Example | Example | Example |
| 14.2.2 | Example | Example | Example |
| 14.2.3 | Example | Example | Example |
| 14.2.4 | Example | Example | Example |
| 14.2.5 | Example | Example | Example |

# Vital Statistics

## Summary of Formulas with R

| Formula Number | Name | Formula | Formula with R |
| --- | --- | --- | --- |