Statistics/Distributions/Hypergeometric

Hypergeometric
[Figure: Hypergeometric probability mass function plot]
[Figure: Hypergeometric cumulative distribution function plot]
Notation	$h(k)={{{m \choose k}{{N-m} \choose {n-k}}} \over {N \choose n}}$
Parameters	${\begin{aligned}N&\in \left\{0,1,2,\dots \right\}\\m&\in \left\{0,1,2,\dots ,N\right\}\\n&\in \left\{0,1,2,\dots ,N\right\}\end{aligned}}$
Support	$k\in \left\{\max(0,\,n+m-N),\,\dots ,\,\min(n,\,m)\right\}$
PMF	${{{m \choose k}{{N-m} \choose {n-k}}} \over {N \choose n}}$
CDF	$1-{{{n \choose {k+1}}{{N-n} \choose {m-k-1}}} \over {N \choose m}}\,_{3}F_{2}\!\!\left[{\begin{array}{c}1,\ k+1-m,\ k+1-n\\k+2,\ N+k+2-m-n\end{array}};1\right]$, where $\,_{p}F_{q}$ is the generalized hypergeometric function
Mean	${nm \over N}$
Mode	$\left\lceil {\frac {(n+1)(m+1)}{N+2}}\right\rceil -1,\ \left\lfloor {\frac {(n+1)(m+1)}{N+2}}\right\rfloor$
Variance	${nm \over N}\left(1-{n \over N}\right)\left(1-{m-1 \over N-1}\right)$
Skewness	${\frac {(N-2m)(N-1)^{\frac {1}{2}}(N-2n)}{[nm(N-m)(N-n)]^{\frac {1}{2}}(N-2)}}$
Ex. kurtosis	${\frac {1}{nm(N-m)(N-n)(N-2)(N-3)}}\cdot {\Big [}(N-1)N^{2}{\Big (}N(N+1)-6m(N-m)-6n(N-n){\Big )}+6nm(N-m)(N-n)(5N-6){\Big ]}$
Entropy	???
MGF	${\frac {{N-m \choose n}\,_{2}F_{1}(-n,-m;N-m-n+1;e^{t})}{N \choose n}}$
CF	${\frac {{N-m \choose n}\,_{2}F_{1}(-n,-m;N-m-n+1;e^{it})}{N \choose n}}$

Hypergeometric Distribution

The hypergeometric distribution describes the number of successes in a sequence of n draws, made without replacement, from a population of N items of which m are successes.
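To make the sampling scheme concrete, here is a minimal simulation sketch in Python. The function name `draw_successes`, the seed, and the parameter values (n=5, m=4, N=10) are illustrative assumptions, not part of the original text.

```python
import random
from collections import Counter

def draw_successes(n, m, N, rng):
    """Draw n items without replacement from N items, m of which are successes."""
    population = [1] * m + [0] * (N - m)   # 1 marks a success, 0 a failure
    return sum(rng.sample(population, n))  # sample() never picks the same item twice

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
counts = Counter(draw_successes(5, 4, 10, rng) for _ in range(trials))

# Empirical relative frequency of each observed number of successes
for k in sorted(counts):
    print(k, counts[k] / trials)
```

Each printed frequency is an empirical estimate of the probability of seeing k successes; with these values k can only range from 0 to 4.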

Its probability mass function is:

f(x)={{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}{\text{ for }}x\in \{0,1,\dots ,n\}

Strictly, the support of the function is only x∈{max(0, n+m-N), …, min(m, n)}. For any x in {0, …, n} outside this range, f(x)=0, because ${a \choose b}=0$ whenever b>a≥0: if x>m the first binomial coefficient in the numerator vanishes, and if n-x>N-m the second one does.
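The pmf can be evaluated exactly with integer arithmetic. The following is a sketch under the assumption that Python's `math.comb` and `fractions.Fraction` are acceptable tools; the name `hypergeom_pmf` and the example parameters are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, n, m, N):
    """P(X = x) for n draws without replacement from N items, m of them successes."""
    if x < 0 or x > n:
        return Fraction(0)
    # math.comb(a, b) returns 0 when b > a, so points of {0, ..., n}
    # that lie outside the support automatically get probability 0.
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

# Illustrative values: N = 10 items, m = 4 successes, n = 5 draws
for x in range(6):
    print(x, hypergeom_pmf(x, 5, 4, 10))
```

With these values x=5 lies in {0, …, n} but outside the support (since x>m), and the function returns 0 there without any special-casing.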

Probability Mass Function

We first check that f(x) is a valid pmf. This requires that it is non-negative everywhere and that its total sum is equal to 1. The first condition holds because binomial coefficients are never negative. For the second condition we start with Vandermonde's identity

\sum _{x=0}^{n}{a \choose x}{b \choose n-x}={a+b \choose n}

Dividing both sides by ${a+b \choose n}$, which is positive whenever n≤a+b, gives

\sum _{x=0}^{n}{{a \choose x}{b \choose n-x} \over {a+b \choose n}}=1

Setting a=m and b=N-m shows that the condition is satisfied.
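Vandermonde's identity can be spot-checked numerically. This is a brute-force sketch over small values, not a proof; the helper name `vandermonde_holds` and the tested range are arbitrary choices.

```python
from math import comb

def vandermonde_holds(a, b, n):
    """Compare both sides of Vandermonde's identity for one triple (a, b, n)."""
    lhs = sum(comb(a, x) * comb(b, n - x) for x in range(n + 1))
    return lhs == comb(a + b, n)

# Check every combination with a, b < 8 and 0 <= n <= a + b
all_ok = all(
    vandermonde_holds(a, b, n)
    for a in range(8)
    for b in range(8)
    for n in range(a + b + 1)
)
print(all_ok)
```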

Mean

We derive the mean as follows:

\operatorname {E} [X]=\sum _{x=0}^{n}x\cdot f(x;n,m,N)=\sum _{x=0}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}

\operatorname {E} [X]=0\cdot {{{m \choose 0}{{N-m} \choose {n-0}}} \over {N \choose n}}+\sum _{x=1}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}

We use the identity ${\binom {a}{b}}={\frac {a}{b}}{\binom {a-1}{b-1}}$ in the denominator.

\operatorname {E} [X]=0+\sum _{x=1}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {{N \over n}{{N-1} \choose {n-1}}}}

\operatorname {E} [X]={n \over N}\sum _{x=1}^{n}x\cdot {{{m \choose x}{{N-m} \choose {n-x}}} \over {{N-1} \choose {n-1}}}

Next we use the identity $b{\binom {a}{b}}=a{\binom {a-1}{b-1}}$ in the first binomial of the numerator.

\operatorname {E} [X]={n \over N}\sum _{x=1}^{n}{m{{m-1 \choose x-1}{{N-m} \choose {n-x}}} \over {{N-1} \choose {n-1}}}

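Next, for each variable inside the sum we define a corresponding primed variable that is one less: N′=N-1, m′=m-1, x′=x-1, n′=n-1. Note that N′-m′=N-m and n′-x′=n-x, so the second binomial coefficient in the numerator is unchanged, and the constant factor m can be moved outside the sum.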

\operatorname {E} [X]={mn \over N}\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}

\operatorname {E} [X]={mn \over N}\sum _{x'=0}^{n'}f(x';n',m',N')

The sum is now the total of a hypergeometric pmf with parameters (n′, m′, N′) taken over its whole range, which equals 1. Therefore

\operatorname {E} [X]={nm \over N}
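The closed form for the mean can be compared against the defining sum using exact rational arithmetic. The sketch below assumes small illustrative parameter ranges; `pmf` and `mean_by_sum` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb

def pmf(x, n, m, N):
    """Hypergeometric pmf as an exact fraction."""
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

def mean_by_sum(n, m, N):
    """E[X] computed directly from the definition sum_x x * f(x)."""
    return sum((x * pmf(x, n, m, N) for x in range(n + 1)), Fraction(0))

# Compare with nm/N for every valid (n, m, N) with N up to 8
mean_ok = all(
    mean_by_sum(n, m, N) == Fraction(n * m, N)
    for N in range(1, 9)
    for m in range(N + 1)
    for n in range(N + 1)
)
print(mean_ok)
```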

Variance

We first determine $\operatorname {E} [X^{2}]$.

\operatorname {E} [X^{2}]=\sum _{x=0}^{n}f(x;n,m,N)\cdot x^{2}=\sum _{x=0}^{n}{{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}\cdot x^{2}

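\operatorname {E} [X^{2}]={{{m \choose 0}{{N-m} \choose {n-0}}} \over {N \choose n}}\cdot 0^{2}+\sum _{x=1}^{n}{{{m \choose x}{{N-m} \choose {n-x}}} \over {N \choose n}}\cdot x^{2}

Applying the same two identities as in the derivation of the mean, one to the denominator and one to the first binomial of the numerator, cancels one factor of x: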

\operatorname {E} [X^{2}]=0+\sum _{x=1}^{n}{{m{m-1 \choose x-1}{{N-m} \choose {n-x}}} \over {{N \over n}{{N-1} \choose {n-1}}}}\cdot x

\operatorname {E} [X^{2}]={mn \over N}\sum _{x=1}^{n}{{{m-1 \choose x-1}{{N-m} \choose {n-x}}} \over {{N-1} \choose {n-1}}}\cdot x

We use the same variable substitution as when deriving the mean.

\operatorname {E} [X^{2}]={mn \over N}\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}(x'+1)

\operatorname {E} [X^{2}]={mn \over N}\left[\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}x'+\sum _{x'=0}^{n'}{{{m' \choose x'}{{N'-m'} \choose {n'-x'}}} \over {{N'} \choose {n'}}}\right]

The first sum is the expected value of a hypergeometric random variable with parameters (n', m', N'). The second sum is the total sum of that random variable's pmf, which equals 1.

\operatorname {E} [X^{2}]={mn \over N}\left[{n'm' \over N'}+1\right]

\operatorname {E} [X^{2}]={mn \over N}\left[{(n-1)(m-1) \over (N-1)}+1\right]={mn \over N}\left[{{(n-1)(m-1)+(N-1)} \over (N-1)}\right]
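As a sanity check on the expression for the second moment, the sketch below compares it with the defining sum for small parameters. It assumes N≥2 so that the denominator N-1 is non-zero; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def pmf(x, n, m, N):
    """Hypergeometric pmf as an exact fraction."""
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

def second_moment_by_sum(n, m, N):
    """E[X^2] computed directly from the definition."""
    return sum((x * x * pmf(x, n, m, N) for x in range(n + 1)), Fraction(0))

def second_moment_closed(n, m, N):
    """E[X^2] = (mn/N) * [ (n-1)(m-1)/(N-1) + 1 ], valid for N >= 2."""
    return Fraction(m * n, N) * (Fraction((n - 1) * (m - 1), N - 1) + 1)

moment_ok = all(
    second_moment_by_sum(n, m, N) == second_moment_closed(n, m, N)
    for N in range(2, 9)
    for m in range(N + 1)
    for n in range(N + 1)
)
print(moment_ok)
```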

We then solve for the variance:

\operatorname {Var} (X)=\operatorname {E} [X^{2}]-(\operatorname {E} [X])^{2}

\operatorname {Var} (X)={mn \over N}\left[{{(n-1)(m-1)+(N-1)} \over (N-1)}\right]-\left({mn \over N}\right)^{2}

\operatorname {Var} (X)={Nmn \over N^{2}}\left[{{(n-1)(m-1)+(N-1)} \over (N-1)}\right]-{(N-1)(mn)^{2} \over (N-1)N^{2}}

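Combining the two terms over the common denominator $N^{2}(N-1)$ and expanding the bracket:

\operatorname {Var} (X)={mn \over N^{2}(N-1)}\left[N(n-1)(m-1)+N(N-1)-(N-1)mn\right]

\operatorname {Var} (X)={mn \over N^{2}(N-1)}\left[N^{2}-Nn-Nm+mn\right]

Since $N^{2}-Nn-Nm+mn=(N-n)(N-m)$, this gives

\operatorname {Var} (X)={nm(N-n)(N-m) \over N^{2}(N-1)}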

or, equivalently,

\operatorname {Var} (X)={nm \over N}\left(1-{n \over N}\right)\left(1-{m-1 \over N-1}\right)
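Both forms of the variance can be checked against a direct computation of E[X²] − (E[X])². This is a sketch with exact fractions over small parameters, assuming N≥2; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def pmf(x, n, m, N):
    """Hypergeometric pmf as an exact fraction."""
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

def variance_by_sum(n, m, N):
    """Var(X) = E[X^2] - (E[X])^2, both moments taken from the definition."""
    mean = sum((x * pmf(x, n, m, N) for x in range(n + 1)), Fraction(0))
    second = sum((x * x * pmf(x, n, m, N) for x in range(n + 1)), Fraction(0))
    return second - mean ** 2

def variance_form_1(n, m, N):
    """nm(N-n)(N-m) / (N^2 (N-1))"""
    return Fraction(n * m * (N - n) * (N - m), N * N * (N - 1))

def variance_form_2(n, m, N):
    """(nm/N) (1 - n/N) (1 - (m-1)/(N-1))"""
    return Fraction(n * m, N) * (1 - Fraction(n, N)) * (1 - Fraction(m - 1, N - 1))

variance_ok = all(
    variance_by_sum(n, m, N) == variance_form_1(n, m, N) == variance_form_2(n, m, N)
    for N in range(2, 9)
    for m in range(N + 1)
    for n in range(N + 1)
)
print(variance_ok)
```

For the illustrative values n=5, m=4, N=10, all three computations give 2/3.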
