R Programming/Probability Functions/Binomial

The Binomial Distribution

The sum of N Bernoulli trials (all with a common success probability).
The number of heads in N tosses of a possibly unfair coin.
Of N oocysts truly present in a sample of water, the number actually counted, given that each has the same recovery probability.
This distribution has two parameters (N and P), though we usually know the number of trials (N), so only one parameter (P) is unknown.
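
The first description can be checked by simulation. The following is a minimal sketch, assuming N = 10 and P = 0.5 as in the examples on this page; the variable names (bernoulli, k_manual, sums) are illustrative only.

```r
set.seed(1)  # fixed seed so the run is reproducible
N <- 10      # number of trials
P <- 0.5     # common success probability

# One binomial outcome built by hand: N Bernoulli trials (size = 1), summed
bernoulli <- rbinom(N, size = 1, prob = P)
k_manual <- sum(bernoulli)

# Repeating this many times gives sums whose mean is close to N * P
sums <- replicate(5000, sum(rbinom(N, size = 1, prob = P)))
mean(sums)
```

Each element of sums is distributed like a single draw from rbinom(1, N, P).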

Probability Mass Function

dbinom(K,N,P) gives the probability of exactly K successes, where K is the number of successes, N is the number of trials, and P is the probability of success.
dbinom(5,10,0.5) = 0.2460938

Binomial probability mass functions with same number of trials (10), but different success rates (0.5 and 0.2).

\begin{array}{l}\operatorname{dbinom}(K,N,P)=\operatorname{combin}(N,K)\cdot P^{K}\cdot (1-P)^{N-K}\\\operatorname{combin}(N,K)=\dfrac{N!}{K!\cdot (N-K)!}\end{array}
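
The formula can be compared with dbinom directly. This is a small sketch; it assumes that the combin(N,K) term of the formula corresponds to R's built-in choose(N, K), and the names manual and builtin are illustrative.

```r
K <- 5    # number of successes
N <- 10   # number of trials
P <- 0.5  # probability of success

# Formula written out by hand; choose(N, K) is N! / (K! * (N - K)!)
manual <- choose(N, K) * P^K * (1 - P)^(N - K)

# Built-in probability mass function
builtin <- dbinom(K, N, P)

all.equal(manual, builtin)
```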

Distribution Function

pbinom(K,N,P) gives the cumulative probability of K or fewer successes in N trials.
pbinom(5,10,0.5) = 0.6230469
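
As a quick check, the distribution function should equal the probability mass function summed from 0 to K. The sketch below assumes the same K, N, and P as the example above; cdf_manual, cdf_builtin, and upper are illustrative names.

```r
K <- 5
N <- 10
P <- 0.5

# Cumulative probability as a running sum of the mass function
cdf_manual <- sum(dbinom(0:K, N, P))

# Built-in distribution function, P(X <= K)
cdf_builtin <- pbinom(K, N, P)

# Upper tail, P(X > K), using the lower.tail argument
upper <- pbinom(K, N, P, lower.tail = FALSE)

all.equal(cdf_manual, cdf_builtin)
```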

N=10, P=0.2 (blue), and P=0.5 (red).

Generating Random Variables

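rbinom(M,N,P) generates M random values, each the number of successes in N trials with success probability P.
rbinom(12,10,0.5) -> 5 5 7 5 5 6 7 6 6 6 4 7 (one possible result; the output varies from run to run)
hist(rbinom(1000,10,0.5)) -> histogram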

Figure (Binom hist.JPG): Sample of 1000 binomial deviates, displayed as a histogram.

hist(rbinom(1000,10,0.5), breaks = seq(from=-0.5, to=12.5)) will put integer values at the bar centers (rather than at the right edge of each bar).
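
The effect of the breaks argument can be inspected through the object that hist returns. This is a sketch under the assumption that breaks running from -0.5 to N + 0.5 are wide enough, since the draws cannot exceed N; the seed and the names x, breaks, and h are illustrative.

```r
set.seed(42)  # fixed seed so the run is reproducible
N <- 10
x <- rbinom(1000, N, 0.5)

# Half-integer break points, so every bar spans exactly one integer value
breaks <- seq(from = -0.5, to = N + 0.5, by = 1)

h <- hist(x, breaks = breaks,
          main = "1000 draws from binom(10, 0.5)",
          xlab = "Number of successes")

# Bar midpoints fall on the integers 0, 1, ..., N
h$mids
```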

Parameter Estimation

Most of the time, we get to count the number of trials, so that parameter (N) is known. We observe the number of positives (K) and use this information to estimate the unobserved "success" probability (P).

The sum of M independent binomial(N,P) variables is the same as the sum of M*N Bernoulli trials, that is, binom(M*N,P).
Maximum Likelihood
- estimate of P = sum(successes)/sum(trials) = sum(K)/sum(N)
Bayesian
- With uniform prior, posterior is Beta(alpha=1+sum(K), beta=1+sum(N)-sum(K))
- With prior probability mass on 0 and 1 and the remaining mass given to Beta(1,1), see coin tossing example.
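
The maximum likelihood estimate and the uniform-prior posterior above can be sketched as follows. The data vectors K and N are hypothetical, made up purely for illustration, and the 95% level of the credible interval is an assumption rather than something stated on this page.

```r
# Hypothetical data: K[i] successes out of N[i] trials in each of 4 experiments
K <- c(3, 5, 4, 6)
N <- c(10, 10, 10, 10)

# Maximum likelihood estimate: pooled successes over pooled trials
p_mle <- sum(K) / sum(N)

# Posterior under a uniform Beta(1, 1) prior
alpha_post <- 1 + sum(K)
beta_post  <- 1 + sum(N) - sum(K)

# Posterior mean and a central 95% credible interval (level is an assumption)
post_mean <- alpha_post / (alpha_post + beta_post)
cred_int  <- qbeta(c(0.025, 0.975), alpha_post, beta_post)
```

With a uniform prior the posterior mode equals the maximum likelihood estimate, while the posterior mean is pulled slightly toward 0.5.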

Figure (Beta1.JPG): Image from Mathcad

Figure (Binom.jpg): Image from WinBUGS

Classical
- Normal Approximation
- Exact Confidence Interval
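
Both classical intervals can be sketched in a few lines. The counts k and n are hypothetical, the 95% level is an assumption, and the "exact" interval is taken here to mean the Clopper-Pearson interval that binom.test reports; this page does not say which exact method is intended.

```r
# Hypothetical data: k successes in n trials
k <- 18
n <- 40
p_hat <- k / n

# Normal approximation (Wald) interval at the 95% level
se <- sqrt(p_hat * (1 - p_hat) / n)
ci_normal <- p_hat + c(-1, 1) * qnorm(0.975) * se

# Exact (Clopper-Pearson) interval as reported by binom.test
ci_exact <- binom.test(k, n, conf.level = 0.95)$conf.int
```

The normal approximation tends to be poor when n is small or P is near 0 or 1, which is the usual reason to prefer the exact interval.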

But what if the number of trials is not known for M binomial trials? Can we use the data K[1] through K[M] to estimate both N and P?

Estimating both parameters from repeated i.i.d. binomial trials
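
One simple way to approach this question is the method of moments, which matches the sample mean to N*P and the sample variance to N*P*(1-P). The sketch below is an illustration only and is not necessarily the approach taken on the linked page; the data vector K is hypothetical, and the names p_mom and n_mom are illustrative.

```r
# Hypothetical counts K[1..M] from M i.i.d. binomial experiments
# with unknown N and unknown P
K <- c(3, 5, 4, 6, 2, 5, 4, 3, 5, 4)

m <- mean(K)  # estimates N * P
v <- var(K)   # estimates N * P * (1 - P)

# Solving the two moment equations for P and N
p_mom <- 1 - v / m
n_mom <- m / p_mom

c(P = p_mom, N = n_mom)
```

These estimates only make sense when the sample variance is smaller than the sample mean, and they can be very unstable for small M, so they are best treated as a rough starting point.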

External Links

Wikipedia: http://en.wikipedia.org/wiki/binomial_distribution
NIST/SEMATECH: http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm