Probability/Important Distributions

Distributions of a discrete random variable

Preliminary concept: Bernoulli trial

Definition. (Bernoulli trial) A Bernoulli trial is an experiment with only two possible outcomes, namely success and failure.

Remark.

  • 'Success' and 'failure' act as labels only, i.e. we can designate either one of the two outcomes in the experiment as 'success'.

Definition. (Independence of Bernoulli trials) Let $S_i$ be the event that the $i$th trial results in success.[1] If $S_1,S_2,\dots$ are independent, then the corresponding Bernoulli trials are independent.

Example. If we interpret the outcomes of tossing a coin as 'head comes up' and 'tail comes up', then tossing a coin is a Bernoulli trial.

 

Exercise.

If we interpret the outcomes of tossing a coin as 'head comes up', 'tail comes up' and 'the coin lands on edge', then is tossing a coin a Bernoulli trial?

Yes.
No.



Remark.

  • We typically interpret the outcomes of tossing a coin as 'head comes up' and 'tail comes up'.

Binomial distribution

Motivation

Consider $n$ independent Bernoulli trials with the same success probability $p$. We would like to calculate the probability $\mathbb{P}(\text{number of successes}=x)$.

Let $S_i$ be the event that the $i$th trial results in success, as in the previous section. Let's consider a particular sequence of outcomes such that there are $x$ successes in $n$ trials, namely the one in which all successes come first:

$\underbrace{S\cdots S}_{x}\,\underbrace{F\cdots F}_{n-x}.$

Its probability is

$\mathbb{P}(S_1)\cdots\mathbb{P}(S_x)\,\mathbb{P}(S_{x+1}^c)\cdots\mathbb{P}(S_n^c)=p^x(1-p)^{n-x}$

by indpt.[2] Since the probability of other sequences with some of the $x$ successes occurring in other trials is the same, and there are $\binom{n}{x}$ distinct possible sequences,[3]

$\mathbb{P}(\text{number of successes}=x)=\binom{n}{x}p^x(1-p)^{n-x}.$
This is the pmf of a random variable following the binomial distribution.

Definition

Definition. (Binomial distribution)

[Figure: pmf's of two binomial distributions.]

A random variable $X$ follows the binomial distribution with $n$ independent Bernoulli trials and success probability $p$, denoted by $X\sim\operatorname{Bin}(n,p)$, if its pmf is

$f(x;n,p)=\binom{n}{x}p^x(1-p)^{n-x},\quad x=0,1,\dots,n.$

[Figure: cdf's of two binomial distributions.]

Remark.

  • The " " in the pmf emphasizes that the values of parameters of the distribution (which are quantities that describes the distribution) are   and  . We can similar notations to pdf.
  • There are some alternative notations for emphasizing the parameter values. For example, when the parameter value is  , then the pdf/pmf can be denoted by  
  • Of course, it is not necessary to adding these to the pdf/pmf, but it makes the parameter values involved explicit and clear.
  • The pmf involves a binomial coefficient, and hence the name 'binomial distribution'.
  • General remark for each distribution:
  • We may also just write down the notation for the distribution to denote the distribution itself, e.g.   stands for the binomial distribution.
  • We sometimes say pmf, pdf, or support of a distribution, to mean pmf, pdf or support (respectively) of a random variable following that distribution, for simplicity (it also applies for other properties of distribution (discussed in a later chapter), e.g. mean, variance, etc.).
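
To make the pmf concrete, here is a minimal sketch (assuming Python with SciPy is available; the parameter values n = 10 and p = 0.3 are illustrative only, not from the text) comparing the formula above with a library implementation:

from math import comb
from scipy.stats import binom

n, p = 10, 0.3  # illustrative parameter values

# pmf computed directly from the formula: C(n, x) p^x (1-p)^(n-x)
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

for x in range(n + 1):
    assert abs(binom_pmf(x, n, p) - binom.pmf(x, n, p)) < 1e-12

print(sum(binom_pmf(x, n, p) for x in range(n + 1)))  # the pmf sums to 1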



Bernoulli distribution

The Bernoulli distribution is simply a special case of the binomial distribution, as follows:

Definition. (Bernoulli distribution)

[Figure: pmf's of two Bernoulli distributions.]

A random variable $X$ follows the Bernoulli distribution with success probability $p$, denoted by $X\sim\operatorname{Ber}(p)$, if its pmf is

$f(x;p)=p^x(1-p)^{1-x},\quad x\in\{0,1\}.$

[Figure: cdf's of two Bernoulli distributions.]

Remark.

  • $\operatorname{Ber}(p)=\operatorname{Bin}(1,p)$.
  • One Bernoulli trial is involved, and hence the name 'Bernoulli distribution'.

Poisson distribution

Motivation

The Poisson distribution can be viewed as the 'limit case' for the binomial distribution.

Consider $n$ independent Bernoulli trials with success probability $p$. By the binomial distribution,

$\mathbb{P}(\text{number of successes}=x)=\binom{n}{x}p^x(1-p)^{n-x}.$

After that, consider a unit time interval, with (positive) occurrence rate $\lambda$ of a rare event (i.e. the mean of the number of occurrences of the rare event is $\lambda$). We can divide the unit time interval into $n$ time subintervals of time length $1/n$ each. If $n$ is large and $\lambda$ is relatively small, such that the probability of two or more rare events occurring in a single time subinterval is negligible, then the probability of exactly one rare event occurring in each time subinterval is $\lambda/n$ by the definition of mean. Then, we can view the unit time interval as a sequence of $n$ Bernoulli trials[4] with success probability $\lambda/n$. After that, we can use $\operatorname{Bin}(n,\lambda/n)$ to model the number of occurrences of the rare event. To be more precise,

$\lim_{n\to\infty}\binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x}=\frac{e^{-\lambda}\lambda^x}{x!}.$
This is the pmf of a random variable following the Poisson distribution, and this result is known as the Poisson limit theorem (or law of rare events). We will introduce it formally after introducing the definition of Poisson distribution.

Definition

Definition. (Poisson distribution)

[Figure: pmf's of two Poisson distributions.]

A random variable $X$ follows the Poisson distribution with positive rate parameter $\lambda$, denoted by $X\sim\operatorname{Pois}(\lambda)$, if its pmf is

$f(x;\lambda)=\frac{e^{-\lambda}\lambda^x}{x!},\quad x=0,1,2,\dots$

[Figure: cdf's of two Poisson distributions.]

Theorem. (Poisson limit theorem) A random variable following $\operatorname{Bin}(n,\lambda/n)$ converges in distribution to a random variable following $\operatorname{Pois}(\lambda)$ as $n\to\infty$.

Proof. The result follows from the result proved above: the pmf of $\operatorname{Bin}(n,\lambda/n)$ approaches the pmf of $\operatorname{Pois}(\lambda)$ as $n\to\infty$.

 

Remark.

  • As a result, the Poisson distribution can be used as an approximation to the binomial distribution for large $n$ and relatively small $p$ (taking $\lambda=np$), as the sketch below illustrates.
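
Here is a minimal numerical illustration of this approximation (a sketch assuming SciPy; n = 1000 and p = 0.002, so that lambda = np = 2, are illustrative values):

from scipy.stats import binom, poisson

n, p = 1000, 0.002   # large n, relatively small p (illustrative)
lam = n * p          # rate parameter lambda = np = 2

# Compare Bin(n, p) with its Poisson approximation Pois(np)
for x in range(6):
    print(x, binom.pmf(x, n, p), poisson.pmf(x, lam))  # nearly equal values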


Geometric distribution

Motivation

Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $\mathbb{P}(\text{number of failures before the first success}=x)$. By considering this sequence of outcomes:

$\underbrace{F\cdots F}_{x}\,S,$

we can calculate that

$\mathbb{P}(\text{number of failures before the first success}=x)=(1-p)^xp.$

[5] This is the pmf of a random variable following the geometric distribution.

Definition

Definition. (Geometric distribution)

[Figure: pmf's of two geometric distributions.]

A random variable $X$ follows the geometric distribution with success probability $p$, denoted by $X\sim\operatorname{Geo}(p)$, if its pmf is

$f(x;p)=(1-p)^xp,\quad x=0,1,2,\dots$

[Figure: cdf's of two geometric distributions.]

Remark.

  • The sequence of the probabilities starting from $f(0;p)$, with the input value increased one by one (i.e. $f(0;p),f(1;p),f(2;p),\dots$), is a geometric sequence with common ratio $1-p$, and hence the name 'geometric distribution'.
  • For an alternative definition, the pmf is instead $f(x;p)=(1-p)^{x-1}p$, which is the probability that the first success occurs at the $x$th trial, with support $\{1,2,3,\dots\}$.

Proposition. (Memorylessness of geometric distribution) If $X\sim\operatorname{Geo}(p)$, then

$\mathbb{P}(X>n+m\mid X\ge m)=\mathbb{P}(X>n)$

for each nonnegative integer $m$ and $n$.

Proof.

$\mathbb{P}(X>n+m\mid X\ge m)=\frac{\mathbb{P}(X>n+m)}{\mathbb{P}(X\ge m)}=\frac{(1-p)^{n+m+1}}{(1-p)^m}=(1-p)^{n+1}=\mathbb{P}(X>n).$

  • In particular, $\mathbb{P}(X\ge m)=(1-p)^m$ since the event $X\ge m$ means that the first $m$ trials are all failures.

 

Remark.

  • $X>n$ can be interpreted as 'there are more than $n$ failures before the first success';
  • $X\ge m$ can be interpreted as '$m$ failures have occurred, so there are more than or equal to $m$ failures before the first success'.
  • It implies that the condition $X\ge m$ does not affect the distribution of the remaining number of failures before the first success (it still follows the geometric distribution with the same success probability).
  • So, we can assume the trials start afresh after an arbitrary trial at which failure occurs.
  • E.g., if failure occurs in the first trial, then the distribution of the remaining number of failures before the first success is not affected.
  • Also, if success occurs in the first trial, then the condition becomes $X=0$, instead of $X\ge 0$, so the above formula cannot be applied in this situation.
  • Indeed, $\mathbb{P}(X>n\mid X=0)=0\ne\mathbb{P}(X>n)$ in general, since $X$ cannot exceed zero given that $X=0$.
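
The memorylessness property can also be checked numerically. Below is a minimal sketch (assuming SciPy; p = 0.3, m = 2, n = 4 are illustrative values). Note that the pmf here counts failures before the first success, with support starting at 0, so SciPy's geom (which counts trials, starting at 1) is shifted by loc = -1:

from scipy.stats import geom

p, m, n = 0.3, 2, 4   # illustrative values

# X = number of failures before the first success: shift SciPy's
# trial-counting geometric by loc=-1 so the support starts at 0.
X = geom(p, loc=-1)

lhs = X.sf(n + m) / X.sf(m - 1)  # P(X > n+m | X >= m); P(X >= m) = P(X > m-1)
rhs = X.sf(n)                    # P(X > n)
print(lhs, rhs)                  # both equal (1-p)^(n+1)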

Negative binomial distribution

Motivation

Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $\mathbb{P}(\text{number of failures before the }k\text{th success}=x)$. By considering this sequence of outcomes:

$\underbrace{F\cdots F}_{x}\,\underbrace{S\cdots S}_{k}$ (the $k$th success occurs at the last trial),

we can calculate that the probability of this particular sequence is

$p^k(1-p)^x.$

Since the probability of other sequences with some of the $x$ failures occurring in other trials (and some of the $k-1$ successes (excluding the $k$th success, which must occur in the last trial) occurring in other trials) is the same, and there are $\binom{x+k-1}{x}$ (or $\binom{x+k-1}{k-1}$, which is the same numerically) distinct possible sequences,[6]

$\mathbb{P}(\text{number of failures before the }k\text{th success}=x)=\binom{x+k-1}{k-1}p^k(1-p)^x.$
This is the pmf of a random variable following the negative binomial distribution.

Definition

Definition. (Negative binomial distribution)

[Figure: pmf's of two negative binomial distributions.]

A random variable $X$ follows the negative binomial distribution with target number of successes $k$ and success probability $p$, denoted by $X\sim\operatorname{NB}(k,p)$, if its pmf is

$f(x;k,p)=\binom{x+k-1}{k-1}p^k(1-p)^x,\quad x=0,1,2,\dots$

[Figure: cdf's of two negative binomial distributions.]

Remark.

  • A negative binomial coefficient is involved, and hence the name 'negative binomial distribution': the binomial coefficient in the pmf can be written as $\binom{x+k-1}{x}=(-1)^x\binom{-k}{x}$, i.e. a binomial coefficient with a negative upper argument. The sketch below checks the pmf numerically.
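
A minimal sketch (assuming SciPy; k = 3 successes and p = 0.4 are illustrative values; counting failures before the k-th success matches SciPy's nbinom convention):

from math import comb
from scipy.stats import nbinom

k, p = 3, 0.4   # illustrative values

# pmf from the formula: C(x+k-1, k-1) p^k (1-p)^x
def nb_pmf(x, k, p):
    return comb(x + k - 1, k - 1) * p**k * (1 - p)**x

for x in range(8):
    assert abs(nb_pmf(x, k, p) - nbinom.pmf(x, k, p)) < 1e-12
print("formula matches scipy.stats.nbinom")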


Hypergeometric distribution

Motivation

Consider a sample of size $n$ drawn without replacement from a population of size $N_1+N_2$, containing $N_1$ objects of type 1 and $N_2$ objects of another type. Then, the probability

$\mathbb{P}(\text{number of type 1 objects drawn}=x)=\frac{\binom{N_1}{x}\binom{N_2}{n-x}}{\binom{N_1+N_2}{n}}$

[7], by considering:
  • $\binom{N_1}{x}$: unordered selection of $x$ objects of type 1 from $N_1$ (distinguishable) objects of type 1 without replacement;
  • $\binom{N_2}{n-x}$: unordered selection of $n-x$ objects of another type from $N_2$ (distinguishable) objects of another type without replacement;
  • $\binom{N_1+N_2}{n}$: unordered selection of $n$ objects from $N_1+N_2$ (distinguishable) objects without replacement.

This is the pmf of a random variable following the hypergeometric distribution.

Definition

Definition. (Hypergeometric distribution)

[Figure: pmf's of two hypergeometric distributions.]

A random variable $X$ follows the hypergeometric distribution with $n$ objects drawn from a collection of $N_1$ objects of type 1 and $N_2$ objects of another type, denoted by $X\sim\operatorname{Hypergeom}(N_1,N_2,n)$, if its pmf is

$f(x;N_1,N_2,n)=\frac{\binom{N_1}{x}\binom{N_2}{n-x}}{\binom{N_1+N_2}{n}},\quad \max(0,n-N_2)\le x\le\min(n,N_1).$

[Figure: cdf's of two hypergeometric distributions.]

Remark.

  • The pmf is sort of similar to a hypergeometric series[8], and hence the name 'hypergeometric distribution'.
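
A minimal sketch of the pmf (assuming SciPy; the values N1 = 30, N2 = 70, n = 10 are illustrative; note SciPy's own parametrization of hypergeom, commented below):

from math import comb
from scipy.stats import hypergeom

N1, N2, n = 30, 70, 10   # illustrative: 30 type-1 objects, 70 others, 10 drawn

# pmf from the formula: C(N1, x) C(N2, n-x) / C(N1+N2, n)
def hg_pmf(x, N1, N2, n):
    return comb(N1, x) * comb(N2, n - x) / comb(N1 + N2, n)

# SciPy's parametrization: hypergeom(M, n, N) with M = population size,
# n = number of type-1 objects, N = sample size
X = hypergeom(N1 + N2, N1, n)
for x in range(n + 1):
    assert abs(hg_pmf(x, N1, N2, n) - X.pmf(x)) < 1e-12
print("formula matches scipy.stats.hypergeom")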


Finite discrete distribution

This type of distribution is a generalization of every discrete distribution with finite support, e.g. the Bernoulli distribution and the hypergeometric distribution.

Another special case of this type of distribution is the discrete uniform distribution, which is similar to the continuous uniform distribution (to be discussed later).

Definition. (Finite discrete distribution) A random variable $X$ follows the finite discrete distribution with support vector $(x_1,\dots,x_n)$ and probability vector $(p_1,\dots,p_n)$ (in which $p_i\ge 0$ for each $i$ and $p_1+\cdots+p_n=1$), if its pmf is

$f(x)=\begin{cases}p_i,&x=x_i\ (i=1,\dots,n)\\0,&\text{otherwise}.\end{cases}$

Remark.

  • For the mean and variance, we can calculate them directly by definition. There are no special formulas for the finite discrete distribution.

Definition. (Discrete uniform distribution) The discrete uniform distribution on the support set $\{x_1,\dots,x_n\}$ is the finite discrete distribution with probability vector $(1/n,\dots,1/n)$.

Remark.

  • Its pmf is $f(x)=1/n$ for each $x\in\{x_1,\dots,x_n\}$ (and $f(x)=0$ otherwise).

Example. Suppose a r.v. $X$ follows a finite discrete distribution with support $\{1,2,3\}$ and $\mathbb{P}(X=1)<\mathbb{P}(X=2)<\mathbb{P}(X=3)$.

Illustration of the pmf:
|
|              *
|              |
|         *    |
|    *    |    |
|    |    |    |
*----*----*----*-------
     1    2    3

Example. Suppose a r.v. $X$ follows the discrete uniform distribution on $\{1,2,3\}$. Then,

$\mathbb{P}(X=1)=\mathbb{P}(X=2)=\mathbb{P}(X=3)=\frac{1}{3}.$

Illustration of the pmf:
|
|               
|               
|    *    *    *
|    |    |    |
|    |    |    |
*----*----*----*-------
     1    2    3
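
Since a finite discrete distribution is fully specified by its support and probability vectors, it is straightforward to sample from. Here is a minimal sketch (assuming Python with NumPy; the seed and sample size are arbitrary) using the discrete uniform distribution from the second example:

import numpy as np

rng = np.random.default_rng(0)

# Discrete uniform distribution on {1, 2, 3}: probability 1/3 each
values = [1, 2, 3]
probs = [1/3, 1/3, 1/3]

sample = rng.choice(values, size=10000, p=probs)
for v in values:
    print(v, np.mean(sample == v))  # each relative frequency is close to 1/3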

Exercises

 

Exercise.

  

1 Which of the following distributions should be used for modeling the number of car accidents in a day in a town?

Binomial distribution.
Poisson distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.

2 Among 200 people, each of them has probability 0.1 of being a smoker, independently. We select one person at a time from them without replacement, until a smoker is selected. Which of the following distributions should be used for modeling the number of selections made just before the smoker is selected?

Binomial distribution.
Poisson distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.

3 It is given that among 1000 taxi drivers, 80% of them are insured by an insurance company. 30 taxi drivers are chosen randomly from them without replacement. Which of the following distributions should be used for modeling the number of uninsured drivers chosen?

Binomial distribution.
Poisson distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.

4 An insurance company has sold 500 policies. An actuary determines that for each of the policies, there is a 0.1 probability that a claim payment to the policyholder is needed, independently. Which of the following distributions should be used for modeling the number of policies for which a claim payment to the policyholder is needed?

Binomial distribution.
Poisson distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.

5 An insurance company has sold 500 policies. An actuary determines that for each of the policies, there is a 0.1 probability that a claim payment to the policyholder is needed, independently. Which of the following distributions should be used for modeling the number of policies checked just before 10 claim payments to the policyholders are made?

Binomial distribution.
Poisson distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.

6 Which of the following distributions should be used for modeling the number of people infected by a rare disease in a town?

Binomial distribution.
Poisson distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.

7 A box contains 100 red balls, 300 blue balls and 250 green balls. 100 balls are drawn from the box. Which distribution does the number of balls that are not blue drawn from the box follow?

 
 
 
 
 

8 Which of the following distribution(s) has (have) exactly two parameters?

Binomial distribution.
Bernoulli distribution.
Poisson distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.

9 A manufacturer sells 200 light bulbs, which cost $100 each. The manufacturer promises that a full refund will be made to a buyer if the light bulb they buy fails within the first week of purchase. Given that each light bulb has probability 0.001 of failing within the first week, independently, which distribution does the number of refunds paid follow?

 
 
 
 
 


Distributions of a continuous random variable

Uniform distribution (continuous)

The continuous uniform distribution is a model for 'no preference', i.e. all intervals of the same length within its support are equally likely[9] (this can be seen from the pdf of the continuous uniform distribution). There is also the discrete uniform distribution, but it is less important than the continuous uniform distribution. So, from now on, 'uniform distribution' simply refers to the continuous one, instead of the discrete one.

Definition. (Uniform distribution)

[Figure: pdf's of uniform distributions.]

A random variable $X$ follows the uniform distribution over $[a,b]$, denoted by $X\sim\mathcal{U}[a,b]$, if its pdf is

$f(x;a,b)=\frac{1}{b-a},\quad a\le x\le b$

(and $f(x;a,b)=0$ otherwise).

Remark.

  • The support of $\mathcal{U}[a,b]$ can also alternatively be $(a,b)$, $[a,b)$ or $(a,b]$, without affecting the probabilities of events involved, since the probability calculated, using the pdf at a single point, is zero anyway.
  • The distribution $\mathcal{U}[0,1]$ is the standard uniform distribution.

Proposition. (Cdf of uniform distribution)

[Figure: cdf's of uniform distributions.]

The cdf of $X\sim\mathcal{U}[a,b]$ is

$F(x)=\begin{cases}0,&x<a\\\frac{x-a}{b-a},&a\le x\le b\\1,&x>b.\end{cases}$

Proof.

$F(x)=\int_{-\infty}^{x}f(t;a,b)\,dt=\int_{a}^{\min\{x,b\}}\frac{dt}{b-a}\quad\text{for each }x\ge a.$

Then, the result follows.
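
As a quick check of the proposition, here is a minimal sketch (assuming Python with SciPy; the endpoints a = 2 and b = 5 are illustrative). Note that SciPy parametrizes the uniform distribution by loc = a and scale = b - a:

from scipy.stats import uniform

a, b = 2.0, 5.0   # illustrative endpoints

X = uniform(loc=a, scale=b - a)   # U[a, b] in SciPy's parametrization

for x in [1.0, 2.0, 3.5, 5.0, 6.0]:
    manual = min(max((x - a) / (b - a), 0.0), 1.0)  # cdf from the proposition
    print(x, X.cdf(x), manual)                      # identical values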

 


Exponential distribution

The exponential distribution with rate parameter $\lambda$ is often used to describe the interarrival time of rare events occurring with rate $\lambda$.

Comparing this with the Poisson distribution: the exponential distribution describes the interarrival time of rare events, while the Poisson distribution describes the number of occurrences of rare events within a fixed time interval.

By the definition of rate, when the rate $\lambda$ increases, the interarrival time tends to decrease (i.e. the frequency of the rare event increases).

So, we would like the pdf to be more skewed to the left when $\lambda$ increases (i.e. the pdf has a higher value for small $x$ when $\lambda$ increases), so that the areas under the pdf for intervals involving small values of $x$ increase when $\lambda$ increases.

Also, since the rate $\lambda$ is fixed, higher values of the interarrival time should be less likely. So, intuitively, we would also like the pdf to be a strictly decreasing function, so that the probability involved (the area under the pdf for some interval) decreases as $x$ increases.

As we can see, the pdf of the exponential distribution satisfies both of these properties.

Definition. (Exponential distribution)

[Figure: pdf's of two exponential distributions.]

A random variable $X$ follows the exponential distribution with positive rate parameter $\lambda$, denoted by $X\sim\operatorname{Exp}(\lambda)$, if its pdf is

$f(x;\lambda)=\lambda e^{-\lambda x},\quad x\ge 0$

(and $f(x;\lambda)=0$ otherwise).

Proposition. (Cdf of exponential distribution)

[Figure: cdf's of two exponential distributions.]

The cdf of $X\sim\operatorname{Exp}(\lambda)$ is

$F(x)=\begin{cases}1-e^{-\lambda x},&x\ge 0\\0,&x<0.\end{cases}$

Proof. Suppose $x\ge 0$. The cdf of $X$ is

$F(x)=\int_{0}^{x}\lambda e^{-\lambda t}\,dt=\left[-e^{-\lambda t}\right]_{0}^{x}=1-e^{-\lambda x}.$

 

Proposition. (Memorylessness of exponential distribution) If $X\sim\operatorname{Exp}(\lambda)$, then

$\mathbb{P}(X>s+t\mid X>s)=\mathbb{P}(X>t)$

for each nonnegative number $s$ and $t$.

Proof.

$\mathbb{P}(X>s+t\mid X>s)=\frac{\mathbb{P}(X>s+t)}{\mathbb{P}(X>s)}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}=\mathbb{P}(X>t).$

 

Remark.

  • $X>s+t$ (given $X>s$) can be interpreted as 'the rare event will not occur within the next $t$ units of time';
  • $X>s$ can be interpreted as 'the rare event has not occurred for the past $s$ units of time'.
  • It implies that the condition $X>s$ does not affect the distribution of the remaining waiting time for the rare event (it still follows the exponential distribution with the same parameter).
  • So, we can assume the arrival process of the event starts afresh at an arbitrary time point of observation.
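
The memorylessness property can be verified numerically. A minimal sketch (assuming SciPy, with illustrative values of lambda, s and t; SciPy parametrizes Exp(lambda) by scale = 1/lambda):

from scipy.stats import expon

lam, s, t = 0.5, 1.0, 2.0   # illustrative values

X = expon(scale=1 / lam)    # Exp(lambda) in SciPy's parametrization

lhs = X.sf(s + t) / X.sf(s)  # P(X > s+t | X > s)
rhs = X.sf(t)                # P(X > t) = e^(-lambda t)
print(lhs, rhs)              # identical values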


Gamma distribution

The gamma distribution is a generalized exponential distribution, in the sense that we can also change the shape of the pdf of the exponential distribution (via an extra shape parameter).

Definition. (Gamma distribution)

[Figure: pdf's of two gamma distributions.]

A random variable $X$ follows the gamma distribution with positive shape parameter $\alpha$ and positive rate parameter $\lambda$, denoted by $X\sim\operatorname{Gamma}(\alpha,\lambda)$, if its pdf is

$f(x;\alpha,\lambda)=\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x},\quad x>0.$

[Figure: cdf's of two gamma distributions.]

Remark.

  • $\operatorname{Gamma}(1,\lambda)=\operatorname{Exp}(\lambda)$, since the pdf of $\operatorname{Gamma}(1,\lambda)$ is

$\frac{\lambda^{1}}{\Gamma(1)}x^{0}e^{-\lambda x}=\lambda e^{-\lambda x},$

which is the pdf of $\operatorname{Exp}(\lambda)$.
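
This remark can be checked numerically. A minimal sketch (assuming SciPy, with an illustrative rate; SciPy parametrizes the gamma distribution by shape a = alpha and scale = 1/lambda):

import numpy as np
from scipy.stats import gamma, expon

lam = 0.5   # illustrative rate

xs = np.linspace(0.1, 5.0, 5)
print(gamma.pdf(xs, a=1, scale=1 / lam))  # Gamma(1, lambda)
print(expon.pdf(xs, scale=1 / lam))       # Exp(lambda): identical values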

Beta distribution

The beta distribution is a generalized $\mathcal{U}[0,1]$ distribution, in the sense that we can also change the shape of the pdf, using two shape parameters.

Definition. (Beta distribution)

[Figure: pdf's of three beta distributions.]

A random variable $X$ follows the beta distribution with positive shape parameters $\alpha$ and $\beta$, denoted by $X\sim\operatorname{Beta}(\alpha,\beta)$, if its pdf is

$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1},\quad 0\le x\le 1.$

[Figure: cdf's of three beta distributions.]

Remark.

  • $\operatorname{Beta}(1,1)=\mathcal{U}[0,1]$, since the pdf of $\operatorname{Beta}(1,1)$ is

$\frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}x^{0}(1-x)^{0}=1,\quad 0\le x\le 1,$

which is the pdf of $\mathcal{U}[0,1]$.
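
Similarly, the relation Beta(1,1) = U[0,1] can be checked numerically; a minimal sketch (assuming SciPy):

import numpy as np
from scipy.stats import beta, uniform

xs = np.linspace(0.05, 0.95, 5)
print(beta.pdf(xs, 1, 1))  # Beta(1, 1): constant 1 on (0, 1)
print(uniform.pdf(xs))     # U[0, 1]: identical values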

Cauchy distribution

The Cauchy distribution is a heavy-tailed distribution[10]. As a result, it is a 'pathological' distribution, in the sense that it has some counter-intuitive properties, e.g. undefined mean and variance, even though its mean and variance seem to be defined when we look at its graph directly.

Definition. (Cauchy distribution)

[Figure: pdf and cdf of a Cauchy distribution.]

A random variable $X$ follows the Cauchy distribution with location parameter $x_0$, denoted by $X\sim\operatorname{Cauchy}(x_0)$, if its pdf is

$f(x;x_0)=\frac{1}{\pi\left(1+(x-x_0)^2\right)}.$

Remark.

  • This definition refers to a special case of the Cauchy distribution. To be more precise, there is also a scale parameter in the complete definition of the Cauchy distribution, and it is set to be one in the pdf here.
  • This definition is used here for simplicity.
  • The pdf is symmetric about $x_0$, since $f(x_0+x;x_0)=f(x_0-x;x_0)$.

Normal distribution (very important)

The normal or Gaussian distribution is a thing of beauty, appearing in many places in nature. This is probably because sample means or sample sums often follow normal distributions approximately, by the central limit theorem. As a result, the normal distribution is important in statistics.

Definition. (Normal distribution)

[Figure: pdf's of two normal distributions.]

A random variable $X$ follows the normal distribution with mean $\mu$ and variance $\sigma^2$, denoted by $X\sim\mathcal{N}(\mu,\sigma^2)$, if its pdf is

$f(x;\mu,\sigma^2)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$

[Figure: cdf's of two normal distributions.]

Remark.

  • The distribution $\mathcal{N}(0,1)$ is the standard normal distribution.
  • For $\mathcal{N}(0,1)$, its pdf is often denoted by $\varphi$, and its cdf is often denoted by $\Phi$.
  • The pdf of $\mathcal{N}(0,1)$ is $\varphi(z)=\frac{1}{\sqrt{2\pi}}e^{-z^2/2}$.
  • It follows that the pdf of $\mathcal{N}(\mu,\sigma^2)$ is $\frac{1}{\sigma}\varphi\left(\frac{x-\mu}{\sigma}\right)$.
  • It will be proved that $\mu$ is actually the mean, and $\sigma^2$ is actually the variance.
  • The pdf is symmetric about $\mu$, since $f(\mu+x;\mu,\sigma^2)=f(\mu-x;\mu,\sigma^2)$.

Proposition. (Distributions for linear transformations of normally distributed random variables) If $X\sim\mathcal{N}(\mu,\sigma^2)$, and $a\ne 0$ and $b$ are constants, then $aX+b\sim\mathcal{N}(a\mu+b,a^2\sigma^2)$.

Proof. Assume $a>0$.[11] Let $F_X$ and $F_Y$ be the cdf's of $X$ and $Y=aX+b$ respectively. Since

$F_Y(y)=\mathbb{P}(aX+b\le y)=\mathbb{P}\left(X\le\frac{y-b}{a}\right)=F_X\left(\frac{y-b}{a}\right),$

by differentiation,

$f_Y(y)=\frac{1}{a}f_X\left(\frac{y-b}{a}\right)=\frac{1}{(a\sigma)\sqrt{2\pi}}\exp\left(-\frac{\left(y-(a\mu+b)\right)^2}{2a^2\sigma^2}\right),$

which is the pdf of $\mathcal{N}(a\mu+b,a^2\sigma^2)$.

 

Remark.

  • A special case is when $a=1/\sigma$ and $b=-\mu/\sigma$: then $\frac{X-\mu}{\sigma}\sim\mathcal{N}(0,1)$, since
  • $a\mu+b=\frac{\mu}{\sigma}-\frac{\mu}{\sigma}=0$;
  • $a^2\sigma^2=\frac{\sigma^2}{\sigma^2}=1$.
  • This shows that we can transform each normally distributed r.v. to a r.v. following the standard normal distribution.
  • This can ease the calculation of probabilities relating to a normally distributed r.v., since we have the standard normal table, in which values of $\Phi(z)$ at different $z$ are given.
  • For some types of standard normal table, only the values of $\Phi(z)$ at different nonnegative $z$ are given.
  • Then, we can calculate its values at different negative $z$ using

$\Phi(-z)=1-\Phi(z).$

  • This formula holds since

$\Phi(-z)=\mathbb{P}(Z\le -z)=\mathbb{P}(Z\ge z)=1-\Phi(z)$

by the symmetry of the standard normal pdf about zero.
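
To illustrate standardization in practice, here is a minimal sketch (assuming SciPy; mu = 100 and sigma = 15 are illustrative values) that computes a probability via Phi, as one would with a standard normal table:

from scipy.stats import norm

mu, sigma = 100.0, 15.0   # illustrative parameter values

# P(X <= 130) for X ~ N(mu, sigma^2), via standardization
z = (130.0 - mu) / sigma
print(norm.cdf(z))                           # Phi(z), as from a table
print(norm.cdf(130.0, loc=mu, scale=sigma))  # same value, computed directly
print(norm.cdf(-z), 1 - norm.cdf(z))         # Phi(-z) = 1 - Phi(z)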


Important distributions for statistics especially

The following distributions are especially important in statistics, and they are all related to the normal distribution. We will introduce them briefly.

Chi-squared distribution

The chi-squared distribution is a special case of the gamma distribution, and it is also related to the standard normal distribution.

Definition. (Chi-squared distribution)

[Figure: pdf's of two chi-squared distributions.]

The chi-squared distribution with positive degrees of freedom $\nu$, denoted by $\chi^2_\nu$, is the distribution of $Z_1^2+\cdots+Z_\nu^2$, in which $Z_1,\dots,Z_\nu$ are i.i.d., and they all follow $\mathcal{N}(0,1)$.

[Figure: cdf's of two chi-squared distributions.]

Remark.

  • It can be proved that $\chi^2_\nu=\operatorname{Gamma}(\nu/2,1/2)$, and thus $\chi^2_2=\operatorname{Exp}(1/2)$. (Then, we can deduce the pdf of $\chi^2_\nu$ through this.)
  • This implies that for a random variable $Z\sim\mathcal{N}(0,1)$, $Z^2\sim\chi^2_1$.
  • A random variable $X$ following the chi-squared distribution with $\nu$ degrees of freedom is denoted by $X\sim\chi^2_\nu$.
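
Both facts in the remark can be checked numerically; a minimal sketch (assuming SciPy; nu = 4 and the evaluation points are illustrative):

import numpy as np
from scipy.stats import chi2, gamma, norm

nu = 4   # illustrative degrees of freedom

xs = np.linspace(0.5, 10.0, 5)
# chi^2_nu coincides with Gamma(nu/2, 1/2), i.e. shape nu/2 and scale 2
print(chi2.pdf(xs, df=nu))
print(gamma.pdf(xs, a=nu / 2, scale=2))   # identical values

# Z^2 with Z ~ N(0,1) follows chi^2_1:
# P(Z^2 <= x) = P(-sqrt(x) <= Z <= sqrt(x))
x = 2.0
print(chi2.cdf(x, df=1), norm.cdf(np.sqrt(x)) - norm.cdf(-np.sqrt(x)))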

Student's t-distribution

The Student's $t$-distribution is related to the chi-squared distribution and the normal distribution.

Definition. (Student's $t$-distribution)

[Figure: pdf's of two $t$-distributions.]

The Student's $t$-distribution with $\nu$ degrees of freedom, denoted by $t_\nu$, is the distribution of

$\frac{Z}{\sqrt{Y/\nu}},$

in which $Z\sim\mathcal{N}(0,1)$ and $Y\sim\chi^2_\nu$ are independent.

[Figure: cdf's of two $t$-distributions.]

Remark.

  • $t_1=\operatorname{Cauchy}(0)$ and $t_\infty=\mathcal{N}(0,1)$ (the $\infty$ is an extended real number).
  • The tails of the pdf become heavier as $\nu$ decreases.
  • A random variable $X$ following the (Student's) $t$-distribution with $\nu$ degrees of freedom is denoted by $X\sim t_\nu$.
  • It can be proved that the pdf of $t_\nu$ is

$f(t)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}.$

F-distribution

The $F$-distribution is sort of a generalized Student's $t$-distribution, in the sense that it has one more adjustable parameter for another degrees of freedom.

Definition. ($F$-distribution) The $F$-distribution with $\nu_1$ and $\nu_2$ degrees of freedom, denoted by $F_{\nu_1,\nu_2}$, is the distribution of

$\frac{Y_1/\nu_1}{Y_2/\nu_2},$

in which $Y_1\sim\chi^2_{\nu_1}$ and $Y_2\sim\chi^2_{\nu_2}$ are independent.

[Figure: pdf's of two $F$-distributions.]
[Figure: cdf's of two $F$-distributions.]

Remark.

  • If $T\sim t_\nu$, then $T^2\sim F_{1,\nu}$.
  • A random variable $X$ following the $F$-distribution with $\nu_1$ and $\nu_2$ degrees of freedom is denoted by $X\sim F_{\nu_1,\nu_2}$.
  • It can be proved that the pdf of $F_{\nu_1,\nu_2}$ is

$f(x)=\frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}x^{\nu_1/2-1}\left(1+\frac{\nu_1}{\nu_2}x\right)^{-\frac{\nu_1+\nu_2}{2}},\quad x>0.$
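
The relationship between the $t$- and $F$-distributions noted above can be checked numerically; a minimal sketch (assuming SciPy; nu = 7 and x = 1.5 are illustrative values):

import numpy as np
from scipy.stats import t, f

nu = 7    # illustrative degrees of freedom
x = 1.5

# If T ~ t_nu, then T^2 ~ F_{1,nu}: P(T^2 <= x) = P(-sqrt(x) <= T <= sqrt(x))
lhs = f.cdf(x, 1, nu)
rhs = t.cdf(np.sqrt(x), nu) - t.cdf(-np.sqrt(x), nu)
print(lhs, rhs)   # identical values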

If you are interested in knowing how chi-squared distribution, Student's  -distribution, and  -distribution are useful in statistics, then you may briefly look at, for instance, Statistics/Interval Estimation (applications in confidence interval construction) and Statistics/Hypothesis Testing (applications in hypothesis testing).

Joint distributions

Multinomial distribution

Motivation

The multinomial distribution is a generalized binomial distribution, in the sense that each trial has more than two possible outcomes.

Suppose $n$ objects are to be allocated to $k$ cells independently, for which each object is allocated to one and only one cell, with probability $p_i$ of being allocated to the $i$th cell ($i=1,\dots,k$).[12] Let $X_i$ be the number of objects allocated to cell $i$. We would like to calculate the probability $\mathbb{P}(X_1=n_1,\dots,X_k=n_k)$, i.e. the probability that the $i$th cell has $n_i$ objects for each $i$.

We can regard each allocation as an independent trial with $k$ outcomes (since the object can be allocated to one and only one of the $k$ cells). We can recognize that the allocation of the $n$ objects is a partition of the $n$ objects into $k$ groups. There are hence $\binom{n}{n_1,\dots,n_k}=\frac{n!}{n_1!\cdots n_k!}$ ways of allocation.

So, $\mathbb{P}(X_1=n_1,\dots,X_k=n_k)=\frac{n!}{n_1!\cdots n_k!}p_1^{n_1}\cdots p_k^{n_k}$. In particular, the probability of allocating $n_i$ objects to the $i$th cell is $p_i^{n_i}$ by independence, and so the probability of a particular case of allocation of the $n$ objects to the $k$ cells is $p_1^{n_1}\cdots p_k^{n_k}$ by independence.

Definition

Definition. (Multinomial distribution) A random vector $\mathbf{X}=(X_1,\dots,X_k)$ follows the multinomial distribution with $n$ trials and probability vector $(p_1,\dots,p_k)$, denoted by $\mathbf{X}\sim\operatorname{Multinom}(n;p_1,\dots,p_k)$, if its joint pmf is

$f(x_1,\dots,x_k;n,p_1,\dots,p_k)=\frac{n!}{x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k},\quad x_i\in\{0,1,\dots,n\}\text{ with }x_1+\cdots+x_k=n.$

Remark.

  • $\operatorname{Multinom}(n;p,1-p)$ is essentially $\operatorname{Bin}(n,p)$ if $k=2$.
  • In this case, $X_1$ is the number of successes for the binomial distribution (and $X_2=n-X_1$ is the number of failures).
  • Also, marginally, $X_i\sim\operatorname{Bin}(n,p_i)$. It can be seen by regarding allocating an object to the $i$th cell as 'success' for each allocation of a single object.[13] Then, the success probability is $p_i$.
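
The joint pmf can be checked against a library implementation; a minimal sketch (assuming SciPy; the parameter values and counts are illustrative):

from math import factorial
from scipy.stats import multinomial

n, probs = 10, [0.2, 0.3, 0.5]   # illustrative values
counts = [2, 3, 5]               # x_1 + x_2 + x_3 = n

# pmf from the formula: n!/(x_1! ... x_k!) * p_1^x_1 ... p_k^x_k
coef = factorial(n)
val = 1.0
for x, p in zip(counts, probs):
    coef //= factorial(x)
    val *= p**x
print(coef * val, multinomial.pmf(counts, n=n, p=probs))  # same value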


Multivariate normal distribution

The multivariate normal distribution is, as suggested by its name, a multivariate (and also generalized) version of the (univariate) normal distribution.

Definition. (Multivariate normal distribution) A random vector $\mathbf{X}=(X_1,\dots,X_k)^{T}$ follows the $k$-dimensional normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, denoted by $\mathbf{X}\sim\mathcal{N}_k(\boldsymbol{\mu},\boldsymbol{\Sigma})$,[14] if its joint pdf is

$f(\mathbf{x})=\frac{1}{(2\pi)^{k/2}\sqrt{\det\boldsymbol{\Sigma}}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),$

in which $\boldsymbol{\mu}=(\mathbb{E}[X_1],\dots,\mathbb{E}[X_k])^{T}$ is the mean vector, and $\boldsymbol{\Sigma}=\left(\operatorname{Cov}(X_i,X_j)\right)$ is the covariance matrix (with size $k\times k$).

Remark.

  • The distribution for the case $k=2$ is the most usually used, and it is called the bivariate normal distribution.
  • An alternative and equivalent definition is that $\mathbf{X}\sim\mathcal{N}_k(\boldsymbol{\mu},\boldsymbol{\Sigma})$ if

$X_i=\mu_i+a_{i1}Z_1+\cdots+a_{ik}Z_k,\quad i=1,\dots,k,$

for some constants $a_{ij}$ (forming a matrix $A=(a_{ij})$ with $AA^{T}=\boldsymbol{\Sigma}$), and $Z_1,\dots,Z_k$ are $k$ i.i.d. standard normal random variables.
  • Using the above result, the marginal distribution followed by $X_i$ is $\mathcal{N}\left(\mu_i,\operatorname{Var}(X_i)\right)$, as one would expect.
  • By the proposition about the sum of independent normal random variables and the distribution of a linear transformation of normal random variables (see the Probability/Transformation of Random Variables chapter), the mean is $\mu_i$, and the variance is $a_{i1}^2+\cdots+a_{ik}^2$ (this equals $\operatorname{Cov}(X_i,X_i)=\operatorname{Var}(X_i)$ by definition).
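
To make the definition concrete, here is a minimal sketch (assuming NumPy; the mean vector, covariance matrix, seed, and sample size are illustrative) that samples from a bivariate normal distribution and recovers the parameters empirically:

import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])          # illustrative mean vector
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])      # illustrative covariance matrix

sample = rng.multivariate_normal(mu, Sigma, size=100000)
print(sample.mean(axis=0))          # close to mu
print(np.cov(sample, rowvar=False)) # close to Sigma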

Proposition. (Joint pdf of the bivariate normal distribution) The joint pdf of $(X,Y)^{T}\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$ is

$f(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2}-\frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right),$

in which $\sigma_X$ and $\sigma_Y$ are positive (and $\rho$ is the correlation coefficient of $X$ and $Y$).

[Figure: graph of an example of a bivariate normal distribution.]

Proof. For the bivariate normal distribution,

  • the mean vector is $\boldsymbol{\mu}=(\mu_X,\mu_Y)^{T}$;
  • the covariance matrix is $\boldsymbol{\Sigma}=\begin{pmatrix}\sigma_X^2&\rho\sigma_X\sigma_Y\\\rho\sigma_X\sigma_Y&\sigma_Y^2\end{pmatrix}$.
  • Hence, $\det\boldsymbol{\Sigma}=\sigma_X^2\sigma_Y^2(1-\rho^2)$ and

$\boldsymbol{\Sigma}^{-1}=\frac{1}{\sigma_X^2\sigma_Y^2(1-\rho^2)}\begin{pmatrix}\sigma_Y^2&-\rho\sigma_X\sigma_Y\\-\rho\sigma_X\sigma_Y&\sigma_X^2\end{pmatrix}.$

  • It follows that the joint pdf is

$f(x,y)=\frac{1}{2\pi\sqrt{\det\boldsymbol{\Sigma}}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2}-\frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right).$

 


  1. Alternatively, we can define the events as $F_i=\{\text{the }i\text{th trial results in failure}\}$.
  2. 'indpt.' stands for independence.
  3. This is because there is an unordered selection of $x$ trials for 'success' without replacement from the $n$ (distinguishable and ordered) trials (then the remaining positions are for 'failure').
  4. Occurrence of the rare event is viewed as 'success' and non-occurrence of the rare event is viewed as 'failure'.
  5. Unlike the outcomes for the binomial distribution, there is only one possible sequence for each $x$.
  6. There is an unordered selection of $x$ trials for 'failures' (or $k-1$ trials for 'successes') from the first $x+k-1$ trials without replacement.
  7. The restriction on $x$ is imposed so that the binomial coefficients are defined, i.e. the expression 'makes sense'. In practice, we rarely use this condition directly. Instead, we usually directly determine whether a specific value of $x$ 'makes sense'.
  8. It is out of scope for this book.
  9. The probability is 'distributed uniformly over an interval'.
  10. A random variable following the Cauchy distribution has a relatively high probability of taking extreme values, compared with light-tailed distributions (e.g. the normal distribution). Graphically, the 'tails' (i.e. left end and right end) of the pdf approach zero relatively slowly.
  11. The case for $a<0$ holds similarly (the inequality sign is in the opposite direction, and eventually we will have two negative signs cancelling each other). Also, when $a=0$, the r.v. becomes a non-random constant, and so we are not interested in this case.
  12. Then, $p_1+\cdots+p_k=1$.
  13. If the object is allocated to a cell other than the $i$th cell, then it is a 'failure'.
  14. The subscript $k$ for $\mathcal{N}_k$ is to emphasize that the distribution is $k$-dimensional, and is optional.