Biostatistics with R

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Biostatistics_with_R

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Biostatistics with R authors

License

The text of this book is released under the terms of the Creative Commons Attribution-ShareAlike 3.0 License and the GNU Free Documentation License. The particular versions of those licenses that are being used can be found at:

Wikibooks:Creative Commons Attribution-ShareAlike 3.0 Unported License

Wikibooks:GNU Free Documentation License

Images used in this document are available under various licenses. Clicking on the image will take you to a description page where the licensing information is displayed.

Authors

List

Hanjin Deviasse
- First authorship: Sep, 2015
- Contributions:

Chainsawriot
- First authorship: 2011
- Contributions: 1 page

A Brief Introduction To R/The First Step in R

What is R?

How to install R

RStudio

Use R package

Data Entry to R

Some Special Values

Reference

Import

Why R for biostatistics?

R is superior to common statistical packages such as SPSS, SAS and MINITAB because it is

powerful
available for many platforms (Mac OS X, Windows, Linux etc.)
programmable
non-commercial
extensively documented

Obtaining R/Installation

For instructions on obtaining and installing R, you may refer to the R FAQ.

Data Import

The data sets on Wiley's website are available in CSV, Excel, MINITAB, SAS and SPSS formats. Although data saved in formats such as SPSS, SAS and MINITAB can be imported into R with the foreign package (Excel files need a different add-on package), you should download the data in CSV format, because CSV is the easiest format to process in R.

For example, suppose you would like to import the "Large Data set" data file. Assuming the downloaded data file (LDS_C02_NCBIRTH800.csv) is stored in the directory "/Desktop", it can be imported into R as a data frame called "largedataset" using the following syntax:

> largedataset <- read.csv("/Desktop/LDS_C02_NCBIRTH800.csv", header=TRUE,na.strings="NA")

If you prefer to choose the data file the standard "point-and-click" GUI way, you may use the function file.choose(), i.e.

> largedataset <- read.csv(file.choose(), header=TRUE,na.strings="NA")

You have now imported the data from the CSV file into a data frame called "largedataset". You can look inside the data frame by calling its name

> largedataset
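Printing a large data frame by name can flood the console, so a compact look is often more useful. The following is a minimal sketch, not part of the original example: the data frame demo and its columns sex and weeks are hypothetical stand-ins for the downloaded file, written to a temporary CSV file so that the import step can be run without the Wiley data.

```r
# Hypothetical stand-in for the downloaded data set (not the real NCBIRTH800 file)
demo <- data.frame(sex   = c(1, 2, 2, 1, NA),
                   weeks = c(39, 40, 38, NA, 41))

# Write it to a temporary CSV file and read it back, as in the import step
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)
imported <- read.csv(csv_path, header = TRUE, na.strings = "NA")

head(imported, n = 3)   # first three rows only
str(imported)           # column names, types and a preview of the values
summary(imported)       # per-column summaries, including the number of NAs
dim(imported)           # number of rows and number of columns
```

The same four functions can be applied to largedataset once it has been imported.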

You can access the variable (in computer lingo, column) "sex" inside the largedataset data frame by

> largedataset$sex

For example, to count the frequency of each value of sex

> table(largedataset$sex)

You can attach the data frame so that you can call the variable directly

> attach(largedataset)
> table(sex)
> detach(largedataset) # undo the attach
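If you would rather not attach a data frame at all, the with() function gives the same direct access to the columns for a single expression. This is only a sketch: the data frame births below is hypothetical and merely stands in for largedataset.

```r
# Hypothetical data frame standing in for largedataset
births <- data.frame(sex = c("M", "F", "F", "M", "F"))

# with() evaluates the expression using the columns of the data frame,
# and leaves nothing attached afterwards
sex_counts <- with(births, table(sex))
sex_counts
```

Because nothing stays attached, there is no risk of a column name silently masking another object of the same name later in the session.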

Basic data management

R is designed to be an analysis system rather than an integrated environment such as SPSS. Unlike SPSS, R is not built around a spreadsheet-like environment for data input. Usually data are entered using other software (e.g. a database, or spreadsheet software such as OpenOffice.org Calc) and then imported into R as described above. For quick one-off calculations, you can do the data entry in R. For example, if you want to calculate the mean age of ten patients (30,31,32,34,35,36,37,30,40,45), you can enter the data into R using the c() function.

> pt_age <- c(30,31,32,34,35,36,37,30,40,45)

You may call the newly created object pt_age by its name...

> pt_age

...and then calculate the mean age of the ten patients.

> mean (pt_age)
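As a quick check of what mean() does, the sketch below recomputes the mean of the same ten ages by hand and adds two other built-in summaries. Only functions from base R are used.

```r
pt_age <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)

n_patients <- length(pt_age)             # number of values entered
mean_by_hand <- sum(pt_age) / n_patients # sum of the ages divided by their count

mean_by_hand
mean(pt_age)      # built-in mean, same value as mean_by_hand
median(pt_age)    # middle value of the sorted ages
summary(pt_age)   # minimum, quartiles, mean and maximum in one call
```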

Introduction to Biostatistics

REVIEW EXERCISES

1. Explain what is meant by descriptive statistics.

2. Explain what is meant by inferential statistics.

3. Define: (a) Statistics (b) Biostatistics (c) Variable (d) Quantitative variable (e) Qualitative variable (f) Random variable (g) Population (h) Finite population (i) Infinite population (j) Sample (k) Discrete variable (l) Continuous variable (m) Simple random sample (n) Sampling with replacement (o) Sampling without replacement

4. Define the word measurement.

5. List, describe, and compare the four measurement scales.

6. For each of the following variables, indicate whether it is quantitative or qualitative and specify the measurement scale that is employed when taking measurements on each: (a) Class standing of the members of this class relative to each other (b) Admitting diagnosis of patients admitted to a mental health clinic (c) Weights of babies born in a hospital during a year (d) Gender of babies born in a hospital during a year (e) Range of motion of elbow joint of students enrolled in a university health sciences curriculum (f) Under-arm temperature of day-old infants born in a hospital

7. For each of the following situations, answer questions a through e: (a) What is the sample in the study? (b) What is the population? (c) What is the variable of interest? (d) How many measurements were used in calculating the reported results? (e) What measurement scale was used? Situation A. A study of 300 households in a small southern town revealed that 20 percent had at least one school-age child present. Situation B. A study of 250 patients admitted to a hospital during the past year revealed that, on the average, the patients lived 15 miles from the hospital.

8. Consider the two situations given in Exercise 7. For Situation A describe how you would use a stratified random sample to collect the data. For Situation B describe how you would use systematic sampling of patient records to collect the data.

Descriptive Statistics

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
2.3.1	Class interval width using Sturges’s Rule	$w={\frac {R}{k}}$	Example
2.4.1	Mean of a population	$\mu ={\frac {\sum _{i=1}^{N}x_{i}}{N}}$	Example
2.4.2	Skewness	$Skewness={\frac {{\sqrt {n}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{3}}{(\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2})^{\frac {3}{2}}}}={\frac {{\sqrt {n}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{3}}{(n-1){\sqrt {n-1}}s^{3}}}$	Example
2.4.2	Mean of a sample	${\bar {x}}={\frac {\sum _{i=1}^{n}x_{i}}{n}}$	Example
2.5.1	Range	$R=x_{L}-x_{S}$	Example
2.5.2	Sample variance	$s^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2}$	Example
2.5.3	Population variance	$\sigma ^{2}={\frac {1}{N}}\sum _{i=1}^{N}\left(x_{i}-\mu \right)^{2}$	Example
2.5.4	Standard deviation	$s={\sqrt {s^{2}}}={\sqrt {{\frac {1}{n-1}}\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2}}}$	Example
2.5.5	Coefficient of variation	$C.V.={\frac {s}{\bar {x}}}$	Example
2.5.6	Quartile location in ordered array	$Q_{1}={\frac {1}{4}}(n+1)$	Example
2.5.7	Interquartile range	$IQR=Q_{3}-Q_{1}$	Example
2.5.8	Kurtosis	$Kurtosis={\frac {n\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{4}}{(\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{2})^{2}}}-3={\frac {n\sum _{i=1}^{n}\left(x_{i}-{\overline {x}}\right)^{4}}{(n-1)^{2}s^{4}}}-3$	Example
Symbol Key	$C.V.$ = coefficient of variation $IQR$ = interquartile range $k$ = number of class intervals $\mu$ = population mean $N$ = population size $n$ = sample size $(n-1)$ = degrees of freedom $Q_{1}$ = first quartile $Q_{2}$ = second quartile = median $Q_{3}$ = third quartile $R$ = range $s$ = standard deviation $s^{2}$ = sample variance $\sigma ^{2}$ = population variance $x_{i}$ = $i^{\text{th}}$ data observation $x_{L}$ = largest data point $x_{S}$ = smallest data point ${\bar {x}}$ = sample mean $w$ = class width		Example
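The formulas in this summary translate almost line by line into R. The sketch below is one possible way to write them, using the ten patient ages entered earlier as example data. Two points are assumptions rather than part of the table: the number of class intervals k is taken from Sturges's rule in its usual form, k = 1 + 3.322 log10(n), and kurtosis is computed in its usual moment form, with n in the numerator and 3 subtracted. Note also that IQR() uses R's default quantile definition, while type=6 corresponds to locating the first quartile at position (n+1)/4 in the ordered array.

```r
x <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)  # the ten patient ages
n <- length(x)

k <- ceiling(1 + 3.322 * log10(n))     # Sturges's rule (assumed form), rounded up
R <- max(x) - min(x)                   # range: largest minus smallest value
w <- R / k                             # class interval width

x_bar <- sum(x) / n                    # sample mean
s2    <- sum((x - x_bar)^2) / (n - 1)  # sample variance
s     <- sqrt(s2)                      # standard deviation
cv    <- s / x_bar                     # coefficient of variation

dev      <- x - x_bar                  # deviations from the mean
skewness <- sqrt(n) * sum(dev^3) / sum(dev^2)^(3/2)
kurtosis <- n * sum(dev^4) / sum(dev^2)^2 - 3

iqr_default <- IQR(x)                  # R's default quantile definition
iqr_n_plus_1 <- IQR(x, type = 6)       # quartiles located at (n+1)/4 and 3(n+1)/4

# The built-in functions should agree with the hand-written versions
c(by_hand = s2, built_in = var(x))
c(by_hand = s,  built_in = sd(x))
```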

The Ordered Array

The Frequency Distribution

Example 2.2.1 details the procedure for sorting an array. The array is a series of ages of subjects who received one of two kinds of smoking cessation program. Suppose you have already imported the data set using the following command:

> SmokeCProg <- read.csv("/EXA_C01_S04_01.csv", header=TRUE, na.strings="NA")

It is better to use a descriptive name (SmokeCProg for Smoking Cessation Program) rather than a commonly used placeholder name such as x or y. We can obtain a sorted array of ages using the following command:

> sort(SmokeCProg$AGE)

The frequency distribution of ages shown in Table 2.3.1 can be obtained using:

> table(cut(SmokeCProg$AGE, breaks=c(0,39,49,59,69,79,89)))
(0,39] (39,49] (49,59] (59,69] (69,79] (79,89] 
    11      46      70      45      16       1

The cut() function breaks up the AGE variable into intervals based on the break points (0,39,49,59,69,79,89) provided; the label (39,49], for example, denotes ages greater than 39 and up to and including 49. Table 2.3.2 provides the frequency table of age. As suggested by Venables et al. in the book "An Introduction to R", statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Compared to other statistical packages, R gives only minimal output. We will demonstrate this important characteristic in this example. In the previous example, we calculated the frequency distribution of ages using the table() and cut() functions. We can store the results in an object called "AgeFreqTable" using:

> AgeFreqTable <- table(cut(SmokeCProg$AGE, breaks=c(0,39,49,59,69,79,89)))

You will get no output until you call the object "AgeFreqTable"

> AgeFreqTable
(0,39] (39,49] (49,59] (59,69] (69,79] (79,89] 
    11      46      70      45      16       1

To obtain the cumulative frequency, we can process the object "AgeFreqTable" using the cumsum() function

> cumsum(AgeFreqTable)
(0,39] (39,49] (49,59] (59,69] (69,79] (79,89] 
    11      57     127     172     188     189

Before we jump to the calculation of relative frequency, we can obtain the total number of observations in a variable using length() function

> length(SmokeCProg$AGE)
[1] 189

We can calculate the relative frequency by dividing each item in the object "AgeFreqTable" by the total number of observations using

> AgeFreqTable/length(SmokeCProg$AGE)
    (0,39]     (39,49]     (49,59]     (59,69]     (69,79]     (79,89] 
0.058201058 0.243386243 0.370370370 0.238095238 0.084656085 0.005291005

Similarly, the cumulative relative frequency can be calculated using

> cumsum(AgeFreqTable)/length(SmokeCProg$AGE)
    (0,39]    (39,49]    (49,59]    (59,69]    (69,79]    (79,89] 
0.05820106 0.30158730 0.67195767 0.91005291 0.99470899 1.00000000
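The four quantities computed above can also be collected into a single table. The sketch below is one way to do it; the vector age is hypothetical (the SmokeCProg data are not reproduced here), and prop.table() is used as a shortcut for dividing each count by the total of the counts.

```r
# Hypothetical ages; replace with SmokeCProg$AGE when the real data are loaded
age <- c(35, 42, 47, 51, 53, 55, 58, 61, 64, 72)

freq     <- table(cut(age, breaks = c(0, 39, 49, 59, 69, 79, 89)))
rel_freq <- prop.table(freq)   # each count divided by the total count

freq_table <- data.frame(
  class        = names(freq),
  frequency    = as.vector(freq),
  cum_freq     = as.vector(cumsum(freq)),
  rel_freq     = as.vector(rel_freq),
  cum_rel_freq = as.vector(cumsum(rel_freq))
)
freq_table
```

One design point to keep in mind: prop.table() divides by the number of classified values, whereas length() also counts missing values, so the two approaches differ when the variable contains NAs.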

If you would like to round the results of relative frequency to 4 digits, you can use the round() function

> round(AgeFreqTable/length(SmokeCProg$AGE), digits=4)
 (0,39] (39,49] (49,59] (59,69] (69,79] (79,89] 
 0.0582  0.2434  0.3704  0.2381  0.0847  0.0053

Alternatively, you can store the results of relative frequency in a new object and then process that object with round() function

> AgeRelFreqTable <- AgeFreqTable/length(SmokeCProg$AGE)
> round (AgeRelFreqTable, digits=4)

Exercise: Try to round the results of cumulative relative frequency to 4 digits using an R command.

To plot a histogram, you can use the hist() function, e.g.

> hist(SmokeCProg$AGE)

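You can customize the histogram by adding arguments (i.e. options); type ?hist to learn more about the arguments of the hist() function. For example, to plot a histogram with about five bars (similar to Figure 2.3.2), you can pass breaks=5. R treats a single number given to breaks as a suggestion only, so the number of bars actually drawn may differ slightly.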

> hist(SmokeCProg$AGE, breaks=5)

You can add more arguments to hist() functions, e.g.

> hist(SmokeCProg$AGE, breaks=5, ylim=c(0,70), main="Histogram of Ages of 189 subjects", col="red", xlab="Age")
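When the class intervals matter, it can help to give hist() an explicit vector of break points instead of a single number. The following sketch assumes hypothetical ages and ten-year classes; with plot=FALSE the function returns the computed bins without drawing, so the counts can be inspected first.

```r
# Hypothetical ages; replace with SmokeCProg$AGE when the real data are loaded
age <- c(35, 42, 47, 51, 53, 55, 58, 61, 64, 72)

# Explicit break points give exactly one bar per class interval
h <- hist(age, breaks = seq(30, 80, by = 10), plot = FALSE)

h$breaks   # the class boundaries that were used
h$counts   # the number of observations falling in each class

# Draw the histogram from the stored object
plot(h, main = "Histogram of ages (hypothetical data)", xlab = "Age", col = "red")
```

By default each class includes its upper boundary and excludes its lower one, which matches the interval labels produced by cut().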

Remember to consult the documentation (e.g. ?hist or help.search("histogram")) whenever you have a question. Most of the time, you can find the answer in the help documents. For example, suppose you don't know how to plot a stem-and-leaf graph to display your data, and you don't even know the name of the function. You can use help.search() to search for the keyword "stem", i.e.

> help.search("stem")

A function called stem() should be in the results. We can then try this function to visualize our data

> stem(SmokeCProg$AGE)
The decimal point is 1 digit(s) to the right of the |
 3 | 04
 3 | 577888899
 4 | 00223333334444444
 4 | 55566666677777788888889999999
 5 | 0000000011112222223333333333333333344444444444
 5 | 555666666777777788999999
 6 | 000011111111111222222233444444
 6 | 556666667888999
 7 | 0111111123
 7 | 567888
 8 | 2

Unlike in MINITAB, the stem unit is adjusted through the scale argument. The plot above uses the default scale of 1, which here is equivalent to a stem unit of 5. To change the stem unit to 10, the value of the scale argument should be changed to 0.5

> stem(SmokeCProg$AGE, scale=0.5)
 The decimal point is 1 digit(s) to the right of the |
 3 | 04577888899
 4 | 0022333333444444455566666677777788888889999999
 5 | 00000000111122222233333333333333333444444444445556666667777777889999
 6 | 000011111111111222222233444444556666667888999
 7 | 0111111123567888
 8 | 2

Central Tendency

Some Basic Probability Concepts

Formulas with R

Formula Number	Name	Formula	Formula with R
3.2.1	Classical probability	$P(E)={\frac {m}{N}}$	Example
3.2.2	Relative frequency probability	$P(E)={\frac {m}{n}}$	Example
3.3.1–3.3.3	Properties of probability	$P(E_{i})\geq 0$ $P(E_{1})+P(E_{2})+\dots +P(E_{n})=1$ $P(E_{i}+E_{j})=P(E_{i})+P(E_{j})$	Example
3.4.1	Multiplication rule	$P(A\cap B)=P(A\mid B)P(B)=P(A)P(B\mid A)$	Example
3.4.2	Conditional probability	$P(A\mid B)={\frac {P(A\cap B)}{P(B)}}$	Example
3.4.3	Addition rule	$P(A\cup B)=P(A)+P(B)-P(A\cap B)$	Example
3.4.4	Independent events	$P(A\cap B)=P(A)P(B)$	Example
3.4.5	Complementary events	$P({\overline {A}})=1-P(A)$	Example
3.4.6	Marginal probability	$P(A_{i})=\sum {P(A_{i}\cap B_{j})}$	Example
	Sensitivity of a screening test	$P(T\mid D)={\frac {a}{(a+c)}}$	Example
	Specificity of a screening test	$P({\overline {T}}\mid {\overline {D}})={\frac {d}{(b+d)}}$	Example
3.5.1	Predictive value positive of a screening test	$P(D\mid T)={\frac {P(T\mid D)P(D)}{P(T\mid D)P(D)+P(T\mid {\overline {D}})P({\overline {D}})}}$	Example
3.5.2	Predictive value negative of a screening test	$P({\overline {D}}\mid {\overline {T}})={\frac {P({\overline {T}}\mid {\overline {D}})P({\overline {D}})}{P({\overline {T}}\mid {\overline {D}})P({\overline {D}})+P({\overline {T}}\mid D)P(D)}}$	Example
Symbol Key	$D$ = disease $E$ = event $m$ = the number of times an event $E_{i}$ occurs $n$ = sample size or the total number of times a process occurs $N$ = population size or the total number of mutually exclusive and equally likely events $P({\overline {A}})$ = a complementary event; the probability of an event A not occurring $P(E_{i})$ = probability of some event $E_{i}$ occurring $P(A\cap B)$ = an “intersection” or “and” statement; the probability of an event A and an event B occurring $P(A\cup B)$ = a “union” or “or” statement; the probability of an event A or an event B or both occurring $P(A\mid B)$ = a conditional statement; the probability of an event A occurring given that an event B has already occurred $T$ = test results		Example
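The screening-test formulas can be tried out on a 2 x 2 table of counts. The sketch below uses made-up counts (they do not come from the textbook) and the cell labels a, b, c and d from the sensitivity and specificity formulas, stored under longer hypothetical names so that R's c() function is not masked. The predictive values are computed with Bayes's theorem, taking the prevalence in the table itself as P(D), which is an assumption that only holds when the table reflects the population of interest.

```r
# Hypothetical screening-test counts (not from the textbook)
#                 disease present   disease absent
# test positive        a = 90           b = 30
# test negative        c = 10           d = 870
cell_a <- 90; cell_b <- 30; cell_c <- 10; cell_d <- 870
n_total <- cell_a + cell_b + cell_c + cell_d

sensitivity <- cell_a / (cell_a + cell_c)   # P(T | D)
specificity <- cell_d / (cell_b + cell_d)   # P(not T | not D)
prevalence  <- (cell_a + cell_c) / n_total  # P(D), taken from the table (assumption)

# Predictive value positive and negative via Bayes's theorem
ppv <- sensitivity * prevalence /
  (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
npv <- specificity * (1 - prevalence) /
  (specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)

c(sensitivity = sensitivity, specificity = specificity, ppv = ppv, npv = npv)
```

With the prevalence taken from the table, the Bayes results reduce to the simple proportions a/(a+b) and d/(c+d); a different P(D) can be substituted when the disease rate in the target population is known from elsewhere.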

Probability Distributions

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
4.2.1	Mean of a frequency distribution	$\mu =\sum {xp(x)}$	Example
4.2.2	Variance of a frequency distribution	$\sigma ^{2}=\sum {(x-\mu )^{2}p(x)}$ or $\sigma ^{2}=\sum {x^{2}p(x)}-\mu ^{2}$	Example
4.3.1	Combination of objects	${}_{n}C_{x}={\frac {n!}{x!(n-x)!}}$	Example
4.3.2	Binomial distribution function	$f(x)={}_{n}C_{x}p^{x}q^{n-x},x=0,1,2,\dots ,n$	Example
4.3.3–4.3.5	Tabled binomial probability equalities	$P(X=x\mid n,p>.50)=P(X=n-x\mid n,1-p)$ $P(X\leq x\mid n,p>.50)=P(X\geq n-x\mid n,1-p)$ $P(X\geq x\mid n,p>.50)=P(X\leq n-x\mid n,1-p)$	Example
4.4.1	Poisson distribution function	$f(x)={\frac {e^{-\lambda }\lambda ^{x}}{x!}},x=0,1,2,\dots$	Example
4.6.1	Normal distribution function	$f(x)={\frac {1}{{\sqrt {2\pi }}\sigma }}e^{-(x-\mu )^{2}/2\sigma ^{2}},-\infty <x<\infty ,-\infty <\mu <\infty ,\sigma >0$	Example
4.6.2	z-transformation	$z={\frac {X-\mu }{\sigma }}$	Example
4.6.3	Standard normal distribution function	$f(z)={\frac {1}{\sqrt {2\pi }}}e^{-z^{2}/2},-\infty <z<\infty$	Example
Symbol Key
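R has built-in functions for each of these distributions, so the formulas can be checked numerically. The sketch below compares the hand-written formulas with choose(), dbinom(), pbinom(), dpois(), dnorm() and pnorm(). All numbers (n, p, x, lambda, and the normal mean and standard deviation) are hypothetical values chosen only for illustration.

```r
n <- 10; p <- 0.3; q <- 1 - p; x <- 4     # hypothetical binomial setting

# Combinations and the binomial probability, by formula and built in
binom_formula <- choose(n, x) * p^x * q^(n - x)
binom_builtin <- dbinom(x, size = n, prob = p)

# Mean and variance of the whole binomial distribution
xs     <- 0:n
px     <- dbinom(xs, size = n, prob = p)
mu     <- sum(xs * px)
sigma2 <- sum((xs - mu)^2 * px)

# Tabled binomial equality: P(X <= x | n, p) = P(X >= n - x | n, 1 - p)
lhs <- pbinom(x, size = n, prob = 0.7)
rhs <- pbinom(n - x - 1, size = n, prob = 0.3, lower.tail = FALSE)

# Poisson probability, by formula and built in
lambda       <- 2
pois_formula <- exp(-lambda) * lambda^x / factorial(x)
pois_builtin <- dpois(x, lambda)

# Normal density and the z-transformation
m <- 100; s <- 15; obs <- 130             # hypothetical mean, sd and observation
z <- (obs - m) / s
norm_formula <- 1 / (sqrt(2 * pi) * s) * exp(-(obs - m)^2 / (2 * s^2))
norm_builtin <- dnorm(obs, mean = m, sd = s)

pnorm(z)                      # P(Z <= z) on the standard normal scale
pnorm(obs, mean = m, sd = s)  # the same probability on the original scale
```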

Some Important Sampling Distributions

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
5.3.1	z-transformation for sample mean	$Z={\frac {{\bar {X}}-\mu _{x}}{\sigma /{\sqrt {n}}}}$	Example
5.4.1	z-transformation for difference between two means	$Z={\frac {({\bar {X}}_{1}-{\bar {X}}_{2})-(\mu _{1}-\mu _{2})}{\sqrt {{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}}}}$	Example
5.5.1	z-transformation for sample proportion	$Z={\frac {{\bar {p}}-p}{\sqrt {\frac {p(1-p)}{n}}}}$	Example
5.5.2	Continuity correction when x < np	$Z_{c}={\frac {{\frac {x+.5}{n}}-p}{\sqrt {pq/n}}}$	Example
5.5.3	Continuity correction when x > np	$Z_{c}={\frac {{\frac {x-.5}{n}}-p}{\sqrt {pq/n}}}$	Example
5.6.1	z-transformation for difference between two proportions		Example
Symbol Key
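The z-transformations in this table are usually followed by a lookup of a normal probability, which pnorm() does in R. The sketch below works through the sample-mean and sample-proportion cases and shows the effect of the continuity correction by comparing it with the exact binomial probability. Every number in it is hypothetical.

```r
# Sample mean: population mean 100, sd 15, sample of 36 (hypothetical)
mu <- 100; sigma <- 15; n <- 36; x_bar <- 105
z_mean <- (x_bar - mu) / (sigma / sqrt(n))
p_mean <- pnorm(z_mean, lower.tail = FALSE)   # P(sample mean > 105)

# Sample proportion: true proportion 0.4, sample of 50 (hypothetical)
p <- 0.4; q <- 1 - p; n_p <- 50; p_hat <- 0.5
z_prop <- (p_hat - p) / sqrt(p * q / n_p)

# Continuity correction for P(X <= x) when x < np (here np = 20)
x   <- 15
z_c <- ((x + 0.5) / n_p - p) / sqrt(p * q / n_p)
approx_prob <- pnorm(z_c)                        # normal approximation
exact_prob  <- pbinom(x, size = n_p, prob = p)   # exact binomial probability

c(approximate = approx_prob, exact = exact_prob)
```

Comparing the two probabilities on your own data is a simple way to judge whether the normal approximation is adequate for a given n and p.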

Estimation

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
6.2.1	Expression of an interval estimate	estimator ± (reliability coefficient)× standard error of the estimator	Example
6.2.2	Interval estimate for $\mu$ when $\sigma$ is known	${\bar {X}}\pm z_{(1-\alpha /2)}\sigma _{\bar {x}}$	Example
6.3.1	t-transformation	$t={\frac {{\bar {x}}-\mu }{s/{\sqrt {n}}}}$	Example
6.3.2	Interval estimate for $\mu$ when $\sigma$ is unknown	${\bar {X}}\pm t_{(1-\alpha /2)}{\frac {s}{\sqrt {n}}}$	Example
6.4.1	Interval estimate for the difference between two population means when $\sigma _{1}$ and $\sigma _{2}$ are known	$({\bar {x}}_{1}-{\bar {x}}_{2})\pm z_{(1-\alpha /2)}{\sqrt {{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}}}$	Example
6.4.2	Pooled variance estimate	$s_{p}^{2}={\frac {(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}}$	Example
6.4.3	Standard error of estimate	$s_{({\bar {X}}_{1}-{\bar {X}}_{2})}={\sqrt {{\frac {s_{p}^{2}}{n_{1}}}+{\frac {s_{p}^{2}}{n_{2}}}}}$	Example
6.4.4	Interval estimate for the difference between two population means when $\sigma _{1}$ and $\sigma _{2}$ are unknown	$({\bar {x}}_{1}-{\bar {x}}_{2})\pm t_{(1-\alpha /2)}{\sqrt {{\frac {s_{p}^{2}}{n_{1}}}+{\frac {s_{p}^{2}}{n_{2}}}}}$	Example
6.4.5	Cochran’s correction for reliability coefficient when variances are not equal	$t'_{(1-\alpha /2)}={\frac {w_{1}t_{1}+w_{2}t_{2}}{w_{1}+w_{2}}}$	Example
6.4.6	Interval estimate using Cochran’s correction for t	$({\bar {x}}_{1}-{\bar {x}}_{2})\pm t'_{(1-\alpha /2)}{\sqrt {{\frac {s_{1}^{2}}{n_{1}}}+{\frac {s_{2}^{2}}{n_{2}}}}}$	Example
6.5.1	Interval estimate for a population proportion		Example
6.6.1	Interval estimate for the difference between two population proportions		Example
6.7.1–6.7.3	Sample size determination when sampling with replacement		Example
6.7.4–6.7.5	Sample size determination when sampling without replacement		Example
6.8.1	Sample size determination for proportions when sampling with replacement		Example
6.8.2	Sample size determination for proportions when sampling without replacement		Example
6.9.1	Interval estimate for $\sigma ^{2}$		Example
6.9.2	Interval estimate for $\sigma$		Example
6.10.1	Interval estimate for the ratio of two variances		Example
6.10.2	Relationship among F ratios		Example
Symbol Key
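Interval estimates of the form "estimator ± (reliability coefficient) × standard error" are easy to build by hand in R, and t.test() offers a check. The sketch below does this for a single mean and for the difference between two means with a pooled variance. The first sample is the ten patient ages entered earlier; the second sample y is hypothetical.

```r
x <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)   # the ten patient ages
y <- c(28, 33, 35, 29, 31, 38, 36, 34)           # hypothetical second sample
alpha <- 0.05

# Interval estimate for the mean when sigma is unknown
n1     <- length(x)
se     <- sd(x) / sqrt(n1)                 # standard error of the mean
t_crit <- qt(1 - alpha / 2, df = n1 - 1)   # reliability coefficient
ci_mean <- mean(x) + c(-1, 1) * t_crit * se

# Pooled variance and the interval for the difference between two means
n2      <- length(y)
sp2     <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
se_diff <- sqrt(sp2 / n1 + sp2 / n2)
ci_diff <- (mean(x) - mean(y)) +
  c(-1, 1) * qt(1 - alpha / 2, df = n1 + n2 - 2) * se_diff

ci_mean
t.test(x, conf.level = 1 - alpha)$conf.int                     # should match ci_mean
ci_diff
t.test(x, y, var.equal = TRUE, conf.level = 1 - alpha)$conf.int # should match ci_diff
```

Setting var.equal=TRUE is what makes t.test() use the pooled variance; its default assumes unequal variances and uses a different degrees-of-freedom approximation.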

Hypothesis Testing

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
7.1.1, 7.1.2, 7.2.1	z-transformation (using either $\mu$ or $\mu _{0}$ )	$z={\frac {{\bar {x}}-\mu }{\sigma /{\sqrt {n}}}}$	Example
7.2.2	t-transformation	$t={\frac {{\bar {x}}-\mu _{0}}{s/{\sqrt {n}}}}$	Example
7.2.3	Test statistic when sampling from a population that is not normally distributed	$z={\frac {{\bar {x}}-\mu _{0}}{s/{\sqrt {n}}}}$	Example
7.3.1	Test statistic when sampling from normally distributed populations:population variances known	$z={\frac {({\bar {x}}_{1}-{\bar {x}}_{2})-(\mu _{1}-\mu _{2})_{0}}{\sqrt {{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}}}}$	Example
7.3.2	Test statistic when sampling from normally distributed populations:population variances unknown and equal	Example	Example
7.3.3, 7.3.4	Test statistic when sampling from normally distributed populations: population variances unknown and unequal	Example	Example
7.3.5	Sampling from populations that are not normally distributed	Example	Example
7.4.1	Test statistic for paired differences when the population variance is unknown	Example	Example
7.4.2	Test statistic for paired differences when the population variance is known	Example	Example
7.5.1	Test statistic for a single population proportion	Example	Example
7.6.1, 7.6.2	Test statistic for the difference between two population proportions	Example	Example
7.7.1	Test statistic for a single population variance	Example	Example
7.8.1	Variance ratio	Example	Example
7.9.1, 7.9.2	Upper and lower critical values for ${\bar {x}}$	Example	Example
7.10.1, 7.10.2	Critical value for determining sample size to control type II errors	Example	Example
7.10.3	Sample size to control type II errors	Example	Example
Symbol Key
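The t-transformation for a single mean can be computed directly and compared with the result of t.test(). The sketch below tests the ten patient ages entered earlier against a hypothesized mean mu0; the value of mu0 is hypothetical and chosen only for illustration.

```r
x   <- c(30, 31, 32, 34, 35, 36, 37, 30, 40, 45)  # the ten patient ages
mu0 <- 33                                         # hypothetical null value
n   <- length(x)

# Test statistic: (sample mean - hypothesized mean) / standard error
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

result <- t.test(x, mu = mu0)   # built-in one-sample t test
c(by_hand = t_stat, built_in = unname(result$statistic))
c(by_hand = p_value, built_in = result$p.value)
```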

Analysis of Variance

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
8.2.1	One-way ANOVA model	Example	Example
8.2.2	Total sum-of-squares	Example	Example
8.2.3	Within-group sum-of-squares	Example	Example
8.2.4	Among-group sum-of-squares	Example	Example
8.2.5	Within-group variance	Example	Example
8.2.6	Among-group variance I	Example	Example
8.2.9	Tukey’s HSD (equal sample sizes)	Example	Example
8.2.10	Tukey’s HSD (unequal sample sizes)	Example	Example
8.3.1	Two-way ANOVA model	Example	Example
8.3.2	Sum-of-squares representation	Example	Example
8.3.3	Sum-of-squares total	Example	Example
8.3.4	Sum-of-squares block	Example	Example
8.3.5	Sum-of-squares treatments	Example	Example
8.3.6	Sum-of-squares error	Example	Example
8.4.1	Fixed-effects, additive single-factor, repeated-measures ANOVA model	Example	Example
8.4.2	Fixed-effects, additive two-factor, repeated-measures ANOVA model	Example	Example
8.5.1	Two-factor completely randomized fixed-effects factorial model	Example	Example
8.5.2	Probabilistic representation of $\alpha$	Example	Example
8.5.3	Sum-of-squares total I	Example	Example
8.5.4	Sum-of-squares total II	Example	Example
8.5.5	Sum-of-squares treatment partition	Example	Example
Symbol Key

Simple Linear Regression and Correlation

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
9.2.1	Assumption of linearity	Example	Example
9.2.2	Simple linear regression model	Example	Example
9.2.3	Error (residual) term	Example	Example
9.3.1	Algebraic representation of a straight line	Example	Example
9.3.2	Least square estimate of the slope of a regression line	Example	Example
9.3.3	Least square estimate of the intercept of a regression line	Example	Example
9.4.1	Deviation equation	Example	Example
9.4.2	Sum-of-squares equation	Example	Example
9.4.3	Estimated population coefficient of determination	Example	Example
9.4.4–9.4.7	Means and variances of point estimators a and b	Example	Example
9.4.8	z statistic for testing hypotheses about b	Example	Example
9.4.9	t statistic for testing hypotheses about b	Example	Example
9.5.1	Prediction interval for Y for a given X	Example	Example
9.5.2	Confidence interval for the mean of Y for a given X	Example	Example
9.7.1–9.7.2	Correlation coefficient	Example	Example
9.7.3	t statistic for correlation coefficient	Example	Example
9.7.4	z statistic for correlation coefficient	Example	Example
9.7.5	Estimated standard deviation for z statistic	Example	Example
9.7.6	Z statistic for correlation coefficient	Example	Example
9.7.7	Z statistic for correlation coefficient when n < 25	Example	Example
9.7.8	Standard deviation for $z^{*}$	Example	Example
9.7.9	$Z^{*}$ statistic for correlation coefficient	Example	Example
9.7.10	Confidence interval for r	Example	Example
Symbol Key
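Although the formulas themselves are not filled in above, several of the named quantities can be illustrated with base R. The sketch below assumes the usual least-squares expressions for the slope and intercept and compares them with lm(); the data vectors x and y are hypothetical.

```r
# Hypothetical paired observations
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

# Least-squares estimates of the slope and intercept, by hand
slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

fit <- lm(y ~ x)          # the same line fitted by R
coef(fit)

# Correlation coefficient and coefficient of determination
r  <- cor(x, y)
r2 <- summary(fit)$r.squared   # equals r^2 in simple linear regression

# Prediction interval for Y and confidence interval for the mean of Y at x = 9
new_x <- data.frame(x = 9)
predict(fit, newdata = new_x, interval = "prediction")
predict(fit, newdata = new_x, interval = "confidence")

cor.test(x, y)            # t test for the correlation coefficient
```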

Multiple Regression and Correlation

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
10.2.1	Representation of the multiple linear regression equation	Example	Example
10.2.2	Representation of the multiple linear regression equation with two independent variables	Example	Example
10.2.3	Random deviation of a point from a plane when there are two independent variables	Example	Example
10.3.1	Sum-of-squared residuals	Example	Example
10.4.1	Sum-of-squares equation	Example	Example
10.4.2	Coefficient of multiple determination	Example	Example
10.4.3	t statistic for testing hypotheses about $\beta _{i}$	Example	Example
10.5.1	Estimation equation for multiple linear regression	Example	Example
10.5.2	Confidence interval for the mean of Y for a given X	Example	Example
10.5.3	Prediction interval for Y for a given X	Example	Example
10.6.1	Multiple correlation model	Example	Example
10.6.2	Multiple correlation coefficient	Example	Example
10.6.3	F statistic for testing the multiple correlation coefficient	Example	Example
10.6.4–10.6.6	Partial correlation between two variables (1 and 2) after controlling for a third (3)	Example	Example
10.6.7	t statistic for testing hypotheses about partial correlation coefficients	Example	Example
Symbol Key

Regression Analysis: Some Additional Techniques

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
11.4.1–11.4.3	Representations of the simple linear regression model	Example	Example
11.4.4	Simple logistic regression model	Example	Example
11.4.5	Alternative representation of the simple logistic regression model	Example	Example
11.4.6	Alternative representation of the multiple logistic regression model	Example	Example
11.4.7	Alternative representation of the multiple logistic regression model	Example	Example
Symbol Key

The Chi-Square Distribution and the Analysis of Frequencies

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
12.2.1	Standard normal random variable	Example	Example
12.2.2	Chi-square distribution with n degrees of freedom	Example	Example
12.2.3	Chi-square probability density function	Example	Example
12.2.4	Chi-square test statistic	Example	Example
12.4.1	Chi-square calculation formula for a 2 × 2 contingency table	Example	Example
12.4.2	Yates’s corrected chi-square calculation for a 2 × 2 contingency table	Example	Example
12.6.1–12.6.2	Large-sample approximation to the chi-square	Example	Example
12.7.1	Relative risk estimate	Example	Example
12.7.2	Confidence interval for the relative risk estimate	Example	Example
12.7.3	Odds ratio estimate	Example	Example
12.7.4	Confidence interval for the odds ratio estimate	Example	Example
12.7.5	Expected frequency in the Mantel–Haenszel statistic	Example	Example
12.7.6	Stratum expected frequency in the Mantel–Haenszel statistic	Example	Example
12.7.7	Mantel–Haenszel test statistic	Example	Example
12.7.8	Mantel–Haenszel estimator of the common odds ratio	Example	Example
Symbol Key
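For a 2 × 2 contingency table, the chi-square statistic, its Yates-corrected version, the relative risk and the odds ratio can all be computed from the four cell counts. The sketch below assumes the usual shortcut formulas for a 2 × 2 table (the table above names them but does not show them) and compares the results with chisq.test(). The counts are hypothetical.

```r
# Hypothetical 2 x 2 table: rows are exposure, columns are disease status
tab <- matrix(c(30, 20,
                15, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("yes", "no"),
                              disease  = c("yes", "no")))

cell_a <- tab[1, 1]; cell_b <- tab[1, 2]
cell_c <- tab[2, 1]; cell_d <- tab[2, 2]
n <- sum(tab)

# Product of the four marginal totals
margins <- (cell_a + cell_c) * (cell_b + cell_d) *
           (cell_a + cell_b) * (cell_c + cell_d)

chi2       <- n * (cell_a * cell_d - cell_b * cell_c)^2 / margins
chi2_yates <- n * (abs(cell_a * cell_d - cell_b * cell_c) - n / 2)^2 / margins

chisq.test(tab, correct = FALSE)   # uncorrected statistic, should equal chi2
chisq.test(tab)                    # Yates's correction is the default for 2 x 2 tables

relative_risk <- (cell_a / (cell_a + cell_b)) / (cell_c / (cell_c + cell_d))
odds_ratio    <- (cell_a * cell_d) / (cell_b * cell_c)
c(relative_risk = relative_risk, odds_ratio = odds_ratio)
```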

Nonparametric and Distribution-Free Statistics

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
13.3.1	Sign test statistic	Example	Example
13.3.2	Large-sample approximation of the sign test	Example	Example
13.6.1	Mann–Whitney test statistic	Example	Example
13.6.2	Large-sample approximation of the Mann–Whitney test	Example	Example
13.6.3	Equivalence of the Mann–Whitney and Wilcoxon two-sample statistics	Example	Example
13.7.1–13.7.2	Kolmogorov–Smirnov test statistic	Example	Example
13.8.1	Kruskal–Wallis test statistic	Example	Example
13.8.2	Kruskal–Wallis test statistic adjustment for ties	Example	Example
13.9.2	Friedman test statistic	Example	Example
13.10.1	Spearman rank correlation test statistic	Example	Example
13.10.2	Large-sample approximation of the Spearman rank correlation	Example	Example
13.10.3–13.10.4	Correction for tied observations in the Spearman rank correlation	Example	Example
13.11.1	Theil's estimator of b	Example	Example

Survival Analysis

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R
14.2.1	Example	Example	Example
14.2.2	Example	Example	Example
14.2.3	Example	Example	Example
14.2.4	Example	Example	Example
14.2.5	Example	Example	Example

Vital Statistics

Summary of Formulas with R

Formula Number	Name	Formula	Formula with R


Further reading

For Biostatistics

For R programming