Probability/Properties of Distributions


Introduction

Recall that the pdf (or cdf) of a random variable describes its random behaviour completely. However, we may sometimes find the pdf (or cdf) too complicated, and only want to know some partial information about the random variable. In view of this, we study some properties of distributions in this chapter, which provide partial descriptions of the random behaviour of a random variable.

Some examples of such partial descriptions include

  • location (e.g. is the pdf 'located' to the left or to the right?),
  • dispersion (e.g. is the pdf 'sharp' or 'flat'?),
  • skewness (e.g. is the pdf symmetric, skewed to the left, or skewed to the right?), and
  • tail behaviour (e.g. does the pdf have 'light' or 'heavy' tails?).

We can describe these properties qualitatively, but such descriptions are subjective and imprecise. To describe them more objectively and accurately, we evaluate them using quantitative measures derived from the pdf (or cdf) of the random variable.

We will discuss some of these quantitative measures in this chapter. Among them, the expectation is the most important, since many of the other properties are based upon the concept of expectation.

Expectation

The expectation also goes by other names, e.g. expected value and mean.

Definition. (Expectation) The expectation of a random variable   is

(i) (if   is discrete)

 
in which   is pmf of  ;

(ii) (if   is continuous)

 
in which   is pdf of  ;

(iii) (if   is mixed) If

 
 
in which   is pmf of   and   is pdf of  .

Remark.

  • The expectation of   is the value we would expect if we were to take an observation of  .
  • It is a weighted average of all possible values attainable by   (i.e.  ), with heavier weight given to points where the pmf or pdf is larger.
  • The expectation tells us the 'centre' of the distribution of  , and the 'average' position of   when observations are generated in the long run.
  • Actually, the restriction ' ' is not needed, since the pmf or pdf equals zero outside the support.

Example. Let   be the number facing up after throwing a fair six-faced die once. Then, the expectation of   is

 
If the die is loaded so that the probability that the number '6' faces up becomes 0.5, while the other five numbers are equally likely to face up, the expectation of   becomes
 
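For concreteness, the expectations in this example can be computed directly as weighted averages; the following Python sketch does exactly that (the probabilities are those stated above, and the code is illustrative only).

  # Expectation of a discrete r.v. as the weighted average sum of x * p(x).
  def expectation(pmf):
      return sum(x * p for x, p in pmf.items())

  fair = {x: 1 / 6 for x in range(1, 7)}                # fair die
  loaded = {**{x: 0.1 for x in range(1, 6)}, 6: 0.5}    # loaded die

  print(expectation(fair))    # 3.5
  print(expectation(loaded))  # 4.5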

Example. (Expectation of uniform distribution) Let  , the uniform distribution with parameters   and  . Then, the pdf of   is

 
and the expectation of   is
 
 

Exercise. In a process, we first toss an unfair coin once, with probability   for a head to come up. If a head comes up in the first toss, we toss another unfair coin once, with probability   for a head to come up. If a tail comes up in the first toss instead, we throw an arrow to the ground once. Let   be the number of heads coming up in all tosses,   be the angle from the north direction to the direction the arrow points, measured clockwise and in radians, and   be the number we obtain from the process at the end. Suppose that  .

1 Choose correct expression(s) of  .

 
 
 
 
 

2 Choose correct expression(s) of  .

 
 
 
 
 

3 Choose correct expression(s) of  .

 
 
 
 

4 If the two coins are fair, choose correct statement(s).

  increases.
  decreases.
Change in   depends on values of   and  .
  remains unchanged.
  increases if  .


In the following, we introduce a useful result that relates expectation and probability; with it, we can use expectations to ease the computation of probabilities.

Proposition. (Fundamental bridge between probability and expectation) For each event  ,

 

Proof. Let  . Since   (which is a discrete random variable),

 

 

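As a quick numerical illustration of this bridge, the following Python sketch estimates a probability by averaging the corresponding indicator; the event chosen here ({U < 0.3} for a uniform U) is an arbitrary illustrative assumption.

  # P(A) = E[1_A]: the sample mean of the indicator of A approximates P(A).
  import numpy as np

  rng = np.random.default_rng(0)
  u = rng.uniform(size=200_000)          # U ~ Uniform(0, 1)
  indicator = (u < 0.3).astype(float)    # 1_A for the event A = {U < 0.3}
  print(indicator.mean())                # close to P(A) = 0.3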
When multiple random variables are involved, we may first derive the joint pmf or pdf to compute the expectation, but doing so can be quite difficult and complicated. In practice, we use the following theorem more often.

Theorem. (Law of the unconscious statistician (LOTUS)) Let   be random variables. Define   for a function  . Then,

(i) (if   is discrete)

 
in which   is joint pmf of  ;

(ii) (if   is continuous)

 
in which   is joint pdf of  .

Remark.

  • If   is mixed, we can apply the definition of expectation and use the above two results for the expectations of the discrete and continuous random variables.
  • This theorem is known as the law of the unconscious statistician because we often use this identity without realizing that it is the result of a theorem rather than a definition.
  • This theorem also holds when only one random variable is involved (the joint pmf or pdf becomes an ordinary pmf or pdf; a numerical sketch follows this remark), e.g.

 

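For instance, in the single-variable discrete case the identity lets us compute the expectation of a function of the random variable by summing the function values against the pmf, without first deriving the pmf of the transformed variable. A minimal Python sketch (with a fair die and the square function as illustrative choices) follows.

  # LOTUS for one discrete r.v.: E[g(X)] = sum of g(x) * p(x) over the support.
  pmf = {x: 1 / 6 for x in range(1, 7)}   # fair die

  def g(x):
      return x ** 2

  print(sum(g(x) * p for x, p in pmf.items()))   # 91/6 ~ 15.17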
The proof of this theorem is quite complicated, so we omit it. In the following, we introduce several properties of expectation that can help simplify the computation of expectations.

Proposition. (Properties of expectation) For each constant   and random variable  ,

  • (Linearity)  ;
  • (Nonnegativity) if  ,  ;
  • (Monotonicity) if  ,  ;
  • (Triangle inequality)  ;
  • (Multiplicativity under independence) if   are independent,  .

Proof.

Linearity:

for continuous random variables  ,

 
Similarly, for discrete random variables  ,
 

Nonnegativity:

For continuous random variable  ,

 
Similarly, for discrete random variable  ,
 

Monotonicity:

For random variables   that are either both discrete or both continuous,

 

Triangle inequality:

 

Multiplicativity under independence:

For continuous random variables  ,

 
Similarly, for discrete random variables  ,
 

 

Remark.

  • (Nonmultiplicativity)   in general.
  • We cannot apply the linearity property when the function inside the expectation is nonlinear. E.g.,   in general.
  • From linearity, we can see that the expectation of a constant is simply the constant itself. This is intuitive, since the value we expect for a constant is simply the constant.
  • The converse of multiplicativity under independence does not hold in general: for some dependent random variables, the expectation of the product still equals the product of the expectations.

Mean of some distributions of a discrete random variable

Proposition. (Mean of Bernoulli and binomial r.v.'s) Let   and  . Then,  , and  .

Proof.

  •  .
  • Since  , in which   are i.i.d. and follow   [1],
  •  .

 

Proposition. (Mean of Poisson r.v.'s) Let  . Then,  

Proof.

 

 
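The two mean formulas above can be checked numerically by summing k times the pmf over the support (truncating the Poisson series far into the tail, where the omitted mass is negligible); a small Python sketch with illustrative parameter values follows.

  from math import comb, exp, factorial

  n, p, lam = 10, 0.3, 4.0

  binom_mean = sum(k * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))
  poisson_mean = sum(k * exp(-lam) * lam ** k / factorial(k) for k in range(100))

  print(binom_mean, n * p)    # both 3.0 (up to rounding)
  print(poisson_mean, lam)    # both 4.0 (up to rounding)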

Proposition. (Mean of geometric and negative binomial r.v.'s) Let   and   . Then,  , and  .

Proof.

  • Since

 
  • it follows that  .
  • Since   in which   are i.i.d., and follow   [2],
  •  

 

Proposition. (Mean of hypergeometric r.v.'s) Let  . Then,  .

Proof.

  • Since   in which   (each Bernoulli r.v. indicates whether the corresponding drawn ball is of type 1, which has probability   without conditioning on the results of the other draws [3], since each draw is equally likely to be any of the   balls) [4],
  • it follows that  

 
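The indicator decomposition used in this proof can also be checked by simulation: draw without replacement and average the number of type-1 balls obtained. The sketch below assumes the parametrisation with population size N, K type-1 balls and n draws, so that the mean is nK/N.

  import numpy as np

  rng = np.random.default_rng(1)
  N, K, n = 50, 20, 10
  population = np.array([1] * K + [0] * (N - K))   # 1 marks a type-1 ball

  counts = [rng.choice(population, size=n, replace=False).sum()
            for _ in range(20_000)]
  print(np.mean(counts), n * K / N)                # both close to 4.0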


Mean of some distributions of a continuous random variable

We now introduce the formulas for the means of some distributions of continuous random variables, which are relatively simple.

Proposition. (Mean of uniform r.v.'s) Let   ( ). Then,  .

Proof.

 

 

Proposition. (Mean of gamma, exponential, and chi-squared r.v.'s) Let  ,  , and  . Then,  ,  , and  .

Proof.

  • It suffices to prove the formula for the mean of gamma r.v.'s, since exponential and chi-squared r.v.'s are essentially special cases of gamma r.v.'s, and thus we can simply substitute suitable values into the formula for the mean of gamma r.v.'s to obtain the formulas for them.
  •  
  • Since  ,   by substituting  .
  • Since  ,   by substituting   and  .

 

Proposition. (Mean of beta r.v.'s) Let  . Then,  .

Proof.

  • We use an approach similar to that of the previous proof.

 

 

Proposition. (Undefined mean of Cauchy r.v.'s) Let  . Then,   is undefined.

Proof.

 

 

Proposition. (Mean of normal r.v.'s) Let  . Then,  .

Proof.

  • Let  .
  •  
  • It follows that  .

 
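Two of these mean formulas can be verified by numerically integrating x times the pdf; the sketch below assumes the shape-rate parametrisation of the gamma distribution (mean alpha/lambda) and uses scipy's pdfs, with illustrative parameter values.

  from scipy import integrate, stats

  alpha, lam = 3.0, 2.0     # gamma shape and rate
  mu, sigma = 1.5, 0.7      # normal mean and standard deviation

  gamma_mean, _ = integrate.quad(
      lambda x: x * stats.gamma.pdf(x, a=alpha, scale=1 / lam), 0, float("inf"))
  normal_mean, _ = integrate.quad(
      lambda x: x * stats.norm.pdf(x, mu, sigma), -float("inf"), float("inf"))

  print(gamma_mean, alpha / lam)   # ~1.5 and 1.5
  print(normal_mean, mu)           # ~1.5 and 1.5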

Examples

Example. (St. Petersburg Paradox) Consider a game in which the player tosses a fair coin   times, until a head comes up. Since  , the expected value of   is

 
That is, on average the player requires two tosses to get a head.

The game rewards the player   for playing, but the player must pay back   after a head comes up.

Some may think that the expected net gain of the player is

 
so the player has an advantage in this game.

However, this is wrong since the correct expected net gain is instead

 
i.e., on average the player has infinite loss!
 
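The contrast between a finite expected number of tosses and an infinite expected payment can be seen in simulation: the sample mean of the number of tosses settles near 2, while the sample mean of the payment keeps drifting upward as more games are played. The payment of 2 to the power of the number of tosses used below is an assumption made for illustration.

  import random

  random.seed(0)

  def tosses_until_head():
      n = 1
      while random.random() < 0.5:   # tail with probability 1/2
          n += 1
      return n

  for games in (10 ** 3, 10 ** 5, 10 ** 6):
      ns = [tosses_until_head() for _ in range(games)]
      avg_tosses = ​sum(ns) / games                      # converges to 2
      avg_payment = sum(2 ** n for n in ns) / games     # does not settle down
      print(games, avg_tosses, avg_payment)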

Exercise.

1 Choose correct statement(s).

  for each random variable  .
  for each random variable  .
  for each random variable  .
  if random variables   and   are pairwise independent.

2 Given that  , choose correct expression(s) for  .

 
 
 
 


Let us illustrate the usefulness of the fundamental bridge between probability and expectation by using it to prove the inclusion-exclusion formula.

Example. (Proof of inclusion-exclusion formula) Recall that the inclusion-exclusion formula is

For each event   and  ,

 
The proof is as follows:

 
 

Probability generating functions

One application of expectation is the probability generating function. As its name suggests, it can generate probabilities in some sense.

Definition. (Probability generating function) Let   be a discrete r.v. with support  . The probability generating function of   is

 

Remark.

  • There is also the moment generating function, which can generate moments (see the next section for the definition) in some sense. We will discuss it in the chapter on transformations of random variables.
  • By taking derivatives of the probability generating function, we can generate probabilities:

 
  • This can be seen directly by evaluating the derivatives.
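This can also be checked symbolically: differentiating the probability generating function k times, evaluating at 0 and dividing by k! recovers P(X = k). The sketch below uses X ~ Binomial(3, 2/5), whose pgf (1 - p + pt)^n is a standard fact assumed here for illustration.

  import sympy as sp

  t = sp.symbols("t")
  n, p = 3, sp.Rational(2, 5)
  G = (1 - p + p * t) ** n                      # pgf of Binomial(n, p)

  for k in range(n + 1):
      from_pgf = sp.diff(G, t, k).subs(t, 0) / sp.factorial(k)
      direct = sp.binomial(n, k) * p ** k * (1 - p) ** (n - k)
      print(k, sp.simplify(from_pgf - direct))  # 0 for every k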

Variance (and standard deviation)

Indeed, the variance is a special case of a central moment, and is related to moments in some sense.

Definition. ( th moment) The  th moment of a random variable   is  .

Definition. ( th central moment) The  th central moment of a random variable   is  .

Definition. (Variance) The variance of a random variable  , denoted by  , is its 2nd central moment, i.e.  .

Since   is the squared deviation of the value of   from its mean, we can see from the definition that the variance measures the dispersion (or spread) of the distribution: it is the squared deviation we would expect if we were to take an observation of the random variable.

Another closely related term is the standard deviation.

Definition. (Standard deviation) The standard deviation of random variable  , usually denoted as  , is   .

Remark.

  • The interpretation of the standard deviation is similar to that of the variance.
  • The standard deviation is also sometimes abbreviated as 's.d.'.
  • The standard deviation of a random variable   has the same unit as  , which is one of its advantages, and one of the reasons to use the standard deviation instead of the variance to measure dispersion.
  • Since the standard deviation is usually denoted by  , we can also denote the variance by  , although this is not as common as the   notation.

Proposition. (Properties of variance)

  • (alternative expression for variance)

 
  • (invariance under change in location parameter)

 
for each constant  
  • (homogeneity of degree two)

 
for each constant  
  • (nonnegativity)

 
  • (zero variance implies non-randomness)

 
  • (additivity under independence)

 

Proof.

  • alternative expression for variance:
Let   for clearer expression.

 
and the result follows.
  • invariance under change in location parameter:

 
  • nonnegativity: it follows from  .
  • zero variance implies non-randomness:
Let   for clearer expression. Consider the event  , in which   is a positive integer.
Since
 
we have  .
Thus,

 
  • additivity under independence:
For each random variable   and   that are independent with means   respectively,

 
Thus, inductively,
 
if   are independent.

 
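The alternative expression, the invariance under a change in location and the homogeneity of degree two can be checked on any small pmf; a Python sketch with arbitrary illustrative values follows.

  support = [0, 1, 3]
  probs = [0.2, 0.5, 0.3]

  def expect(g):
      return sum(g(x) * p for x, p in zip(support, probs))

  mean = expect(lambda x: x)
  var = expect(lambda x: (x - mean) ** 2)

  a, b = 4.0, -7.0
  mean_ab = expect(lambda x: a * x + b)
  var_ab = expect(lambda x: (a * x + b - mean_ab) ** 2)

  print(expect(lambda x: x ** 2) - mean ** 2, var)   # alternative expression
  print(var_ab, a ** 2 * var)                        # invariance + homogeneity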

Variance of some distributions of a discrete random variable

Proposition. (Variance of Bernoulli and binomial r.v.'s) Let   and  . Then,   and  .

Proof.

  •   since   is nonnegative.
  • It follows that  .
  • Similar to the proof for the mean of Bernoulli and binomial r.v.'s,   in which   are i.i.d. and follow  .
  • Because of the independence (from i.i.d. property),  

 

Proposition. (Variance of Poisson r.v.'s) Let  . Then,  .

Proof.

  •  
  • Hence,  

 

Proposition. (Variance of geometric and negative binomial r.v.'s) Let   and   . Then,  , and  .

Proof.

  • Since

 
  • it follows that  .
  • Hence,  .
  • Similarly,   in which   are i.i.d., and follow   [5].
  • Because of the independence,  

 
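These variance formulas can be cross-checked against scipy's built-in distributions. Note that scipy's geom counts the number of trials up to and including the first success, but shifting by a constant does not change the variance, so the comparison with (1 - p)/p^2 below still applies under the failures convention used here (an assumption about the parametrisation).

  from scipy import stats

  n, p, lam = 12, 0.25, 3.5
  print(stats.binom(n, p).var(), n * p * (1 - p))   # binomial variance
  print(stats.poisson(lam).var(), lam)              # Poisson variance
  print(stats.geom(p).var(), (1 - p) / p ** 2)      # geometric variance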

Variance of some distributions of a continuous random variable

Proposition. (Variance of uniform r.v.'s) Let  . ( ) Then,  .

Proof.

 

 

Proposition. (Variance of gamma, exponential and chi-squared r.v.'s) Let  ,  , and  . Then,  ,  , and  .

Proof.

  • Similarly, it suffices to prove the formula for variance of gamma r.v.'s.
  •  
  • It follows that  
  • Since  ,   by substituting  .
  • Since  ,   by substituting   and  .

 

Proposition. (Variance of beta r.v.'s) Let  . Then,  .

Proof.

  •  
  • It follows that
     

 

Proposition. (Undefined variance of Cauchy r.v.'s) Let  . Then,   is undefined.

Proof. It follows from the proposition about undefined mean of Cauchy r.v.'s and the formula   (arbitrary term minus undefined term is undefined).

 

Proposition. (Variance of normal r.v.'s) Let  . Then,  .

Proof.

  • Let  .
  •  
  • It follows that  .
  • Hence,  .

 
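As with the means, two of these variance formulas can be verified by numerically integrating the squared deviation against the pdf; the parameter values below are illustrative.

  from scipy import integrate, stats

  a, b = 2.0, 5.0             # Uniform(a, b)
  mu, sigma = -1.0, 2.0       # Normal(mu, sigma^2)

  unif_var, _ = integrate.quad(lambda x: (x - (a + b) / 2) ** 2 / (b - a), a, b)
  norm_var, _ = integrate.quad(
      lambda x: (x - mu) ** 2 * stats.norm.pdf(x, mu, sigma),
      -float("inf"), float("inf"))

  print(unif_var, (b - a) ** 2 / 12)   # 0.75 and 0.75
  print(norm_var, sigma ** 2)          # ~4.0 and 4.0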

 

Exercise.

Choose correct statement(s).

  for each constant  .
  for each random variable  , and for each constant  .
 
  if  
Standard deviation of random variable  ,  



Coefficient of variation

Definition. (Coefficient of variation) The coefficient of variation is the ratio of the standard deviation to the mean, i.e.  .

Remark.

  • It is also known as the relative standard deviation, since it measures the dispersion relative to the mean.
  • Thus, it describes the dispersion more informatively than the standard deviation considered without the mean.
  • Also, the coefficient of variation has no unit.
  • So, it is useful for comparing dispersion between different data sets.
  • It shows the extent of dispersion in relation to the mean.
  • However, if the mean is zero, then the coefficient of variation is undefined, which is a limitation.

Example. If   and  , then for each  , the coefficient of variation of   is

 
while the coefficient of variation of   is 1/5, which equals that of   if  , and equals the negative of that of   if   (they have the same magnitude, i.e. absolute value). This is expected, since scaling a random variable should not affect the extent of its dispersion.
 
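The scale invariance mentioned in this example is easy to see numerically: multiplying a sample by a positive constant leaves the coefficient of variation unchanged (the exponential sample below is an arbitrary illustrative choice).

  import numpy as np

  rng = np.random.default_rng(2)
  x = rng.exponential(scale=3.0, size=100_000)

  def cv(sample):
      return sample.std() / sample.mean()

  print(cv(x), cv(5 * x))   # essentially equal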

Exercise.

1 Assume   is increased to 20. Calculate   such that the coefficient of variation of   remains unchanged.

1
2
4
5
8

2 Calculate   such that the standard deviation of   remains unchanged.

1
2
4
5
8



Remark.

  • In general, when the mean is negative, the coefficient of variation is nonpositive, since the standard deviation is always nonnegative.

Quantile

Next, we discuss quantiles. In particular, the median and the interquartile range are closely related to quantiles.

Definition. (Quantile) Quantile of order   ( th quantile) of random variable   is

 

Remark.

  • Definition of quantile is not unique. There are several alternative definitions, namely

 
  • If   is strictly increasing, all alternative definitions become equivalent and equal the inverse of cdf at    , and thus we can calculate the  th quantile by solving the equation   .
  • Practical applications focus only on  .

The following are some terminologies related to quantiles.

Definition. (Percentile) The  th percentile is the  th quantile.

Example. The 70th percentile is the 0.7th quantile.

Definition. (Median) The median is the 0.5th quantile.

Definition. (Quartile) The  th quartile is the  th quantile, in which  .

Example. The 2nd quartile is the 0.5th quantile, which is also the median.

Definition. (Interquartile range) The interquartile range is the 3rd quartile minus the 1st quartile.

The median and the interquartile range measure centrality and dispersion respectively. Recall that the mean and the variance measure the same things respectively. One advantage of the median and the interquartile range is robustness: they are always defined, while the mean and the variance can be infinite, in which case they fail to measure centrality and dispersion. However, the median and the interquartile range also have some disadvantages, e.g. they may be more difficult to compute, and may not be very accurate.

Example. (Quantile of uniform distribution) The  th quantile of uniform distribution with parameters   and   is

 
since
 
and we can see that   if  .

Then, median of uniform distribution is

 
which is the same as its mean, and the interquartile range of uniform distribution is
 
which is different from its variance, namely  .
 
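This agrees with scipy's inverse cdf for the uniform distribution, which is parametrised by loc = a and scale = b - a; a quick check with illustrative values follows.

  from scipy import stats

  a, b = 1.0, 9.0
  for p in (0.25, 0.5, 0.75):
      print(stats.uniform.ppf(p, loc=a, scale=b - a), a + p * (b - a))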

Exercise.

Choose correct statement(s).

20th quantile is 0.2th percentile
4th quartile is 1st quantile
2nd quantile is undefined.
0th quantile = 0th percentile = 0th quartile.
Interquartile range must be nonnegative.
Median must be nonnegative.



Mode

Mode is another measure of centrality.

Definition. (Mode)

  • The mode of a pmf (pdf) is the value of   at which the pmf (pdf) takes its maximum value (has its local maximum).

Remark.

  • The mode is the value that is most likely to be sampled (for pmf).
  • Mode is less frequently used than mean.

Example. The modes of the pmf of the number coming up from throwing a fair six-faced die are 1, 2, 3, 4, 5 and 6, since the probability of each of these numbers coming up is 1/6, so the pmf takes its maximum value (1/6) at each of these numbers.

 
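Given a pmf as a table of values and probabilities, the mode(s) can be found by collecting every value that attains the maximum probability; ties give several modes, as in this example. The second pmf below is an arbitrary illustrative one.

  def modes(pmf):
      m = max(pmf.values())
      return [x for x, p in pmf.items() if abs(p - m) < 1e-12]

  print(modes({x: 1 / 6 for x in range(1, 7)}))    # [1, 2, 3, 4, 5, 6]
  print(modes({1: 0.1, 2: 0.4, 3: 0.4, 4: 0.1}))   # [2, 3]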

Exercise.

Suppose the die is loaded such that the probability of the number six coming up is 1/2, while the other numbers are still equally likely to come up. Which of the following is (are) the mode(s) of the pmf now?

1
2
3
4
5
6



Remark.

  • From this example, we can see that the mode is not necessarily unique.

Covariance and correlation coefficients

In this section, we will discuss two important properties of joint distributions, namely the covariance and the correlation coefficient. As we will see, the covariance is related to the variance in some sense, and the correlation coefficient is closely related to correlation.

Definition. (Covariance) For each random variable  , the covariance of   is

 

Definition. (Correlation coefficient) For each random variable   such that  , the correlation coefficient is

 

Both the covariance and the correlation coefficient measure the linear relationship between   and  . As we will see,   and   are more highly correlated as   increases, and   has a linear relationship with   if  .

Proposition. (Properties of covariance)

(i) (symmetry) for each random variable  ,

 
(ii) for each random variable  ,
 
(iii) (alternative formula of covariance)
 
(iv) for each constant  , and for each random variables  ,
 
(v) for each random variable  ,
 

Proof.

(i)

 
(ii)
 
(iii)
 
(iv)
 
(v)
 

 
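Some of these properties (symmetry, the fact that the covariance of a random variable with itself is its variance, and the alternative formula) can be sanity-checked on simulated data, where the identities hold up to sampling error; the construction of the second variable below is an illustrative assumption.

  import numpy as np

  rng = np.random.default_rng(3)
  x = rng.normal(size=100_000)
  y = 2 * x + rng.normal(size=100_000)             # correlated with x

  cov_xy = np.mean(x * y) - x.mean() * y.mean()    # alternative formula
  print(cov_xy, np.cov(x, y, bias=True)[0, 1])     # essentially equal
  print(np.cov(x, x, bias=True)[0, 1], x.var())    # Cov(X, X) = Var(X)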

Next, we discuss correlation coefficients. The following is the definition of correlation between two random variables.

Definition. (Correlation between two random variables) Random variables   are uncorrelated if  , and are correlated if  

Remark.

  •  , and   if   and  . This explains why we sometimes use the covariance instead of the correlation coefficient: the covariance is always defined, but the correlation coefficient may be undefined.

The covariance and the correlation coefficient are similar, but they have differences. In particular,   depends on the variances of   and  , not just on their relationship; thus it is affected by the variances and does not measure the relationship accurately. On the other hand,   adjusts for the variances of   and  , and therefore measures the relationship more accurately.

The following is one of the most important properties of correlation coefficient.

Proposition. (Universal measure by correlation coefficient) The correlation coefficient lies between -1 and 1 (inclusive).

Proof. For each random variable  ,

Aim: prove that  . To get rid of the square root and make the proof neater, we square both sides of the inequality, obtaining  .

Recall that  . So, one way to prove the rightmost inequality is to express its left-hand side as  , as follows:

 
Thus, the result follows.

 

Remark. For each random variable  ,

  • the higher the  , the higher the correlation between  
  • because of this, we can compare the correlation of different pairs of random variables
  • if  ,   increases linearly with  
  • if  ,   decreases linearly with  

Then, we will define several terminologies related to correlation coefficient.

Definition. (Positively correlated, negatively correlated, and uncorrelated) Random variables   are positively (negatively) correlated if  , i.e.   tends to   as  .

They are uncorrelated if  .


Next, we state an important result relating independence and correlation. Intuitively, you may think that 'independent' is the same as 'uncorrelated'. However, this is wrong: 'independent' is strictly stronger than 'uncorrelated'.

Proposition. (Relationship between independence and correlation) If two random variables are independent, they are uncorrelated.

Proof. For independent random variables   with means   respectively,

 

 

However, the converse is not true, as we will see in the following example.

Example. Let   such that they are independent. Set  . Since  ,  , and  , their joint pmf is

 
The covariance
 
and so   are uncorrelated.

On the other hand,

 
and so   are not independent.

This illustrates that 'uncorrelated' does not imply 'independent'.

 
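A classical concrete instance of this phenomenon (chosen here for illustration, and not necessarily the pair used above) is X uniform on {-1, 0, 1} with Y = X^2: the covariance is zero, yet Y is a function of X.

  support = [(-1, 1), (0, 0), (1, 1)]     # pairs (x, y) with y = x^2
  probs = [1 / 3, 1 / 3, 1 / 3]

  ex = sum(p * x for (x, _), p in zip(support, probs))
  ey = sum(p * y for (_, y), p in zip(support, probs))
  exy = sum(p * x * y for (x, y), p in zip(support, probs))

  print(exy - ex * ey)   # 0.0, so X and Y are uncorrelated
  # Yet P(Y = 0 | X = 0) = 1 while P(Y = 0) = 1/3, so they are dependent.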

Exercise.

Choose correct statement(s).

Two random variables   are uncorrelated if at least one of them is non-random constant.
For each random variable, it is uncorrelated with itself.
For each random variable, it increases linearly with itself.
Random variables   are more highly correlated than random variables   if  .



  1. Each of the Bernoulli r.v.'s acts as an indicator for the success of the corresponding trial. Since there are   independent Bernoulli trials, there are   such indicators.
  2. Each geometric r.v. counts the number of failures preceding the corresponding success.
  3. This probability is unconditional (it does not condition on the results of the other draws), so the corresponding mean is also unconditional, and the sum of these means is the unconditional mean (as in the proposition).
  4.   are dependent, but we can still use the linearity of expectation, since it does not require independence.
  5. Each geometric r.v. counts the number of failures preceding the corresponding success.