Probability/Print version

Probability

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/Probability

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.

Local Manual of Style

Purpose of this book

The difficulty level of this book should be similar to that of a first university-level probability course. In particular, measure theory and related advanced topics should not be included in this book. Instead, they belong in the Probability Theory wikibook (for measure-theoretic probability) or the Measure Theory wikibook (for measure theory itself).

Applications of probability can be included briefly, but are not the main focus of this wikibook.

Some notations and abbreviations

Notations

On some occasions, these notations may carry meanings different from those stated below. In such cases, the explanations given in the actual content take precedence.

• ${\displaystyle {\overset {\text{def}}{=}}}$: equals by definition
• CAPITAL letters (possibly with subscripts): sets [1] or random variables [2];
• small letters (possibly with subscripts): variables or elements in sets;
• ${\displaystyle A\cup B}$: the union of ${\displaystyle A}$ and ${\displaystyle B}$;
• ${\displaystyle A\cap B}$: the intersection of ${\displaystyle A}$ and ${\displaystyle B}$;
• ${\displaystyle A\setminus B}$: the relative complement of ${\displaystyle B}$ in ${\displaystyle A}$;
• ${\displaystyle A\subseteq B}$: ${\displaystyle A}$ is a subset of ${\displaystyle B}$;
• ${\displaystyle A\subsetneq B}$: ${\displaystyle A}$ is a proper subset of ${\displaystyle B}$;
• ${\displaystyle S^{c}}$: the (absolute) complement of ${\displaystyle S}$;
• ${\displaystyle U}$: a universal set;
• ${\displaystyle \#(S)}$: the cardinality of ${\displaystyle S}$;
• ${\displaystyle {\mathcal {P}}(S)}$: the power set of ${\displaystyle S}$;
• ${\displaystyle {\binom {n}{r}}}$: the binomial coefficient indexed by ${\displaystyle n}$ and ${\displaystyle r}$;
• ${\displaystyle \Omega }$: a sample space;
• ${\displaystyle {\mathcal {F}}}$: an event space;
• ${\displaystyle \mathbb {P} }$: the probability (function);
• ${\displaystyle A\perp \!\!\!\perp B}$: ${\displaystyle A}$ and ${\displaystyle B}$ are independent;
• ${\displaystyle F}$: a cumulative distribution function;
• ${\displaystyle f}$: a probability mass or probability density function;
• ${\displaystyle \operatorname {supp} (X)}$: the support of ${\displaystyle X}$;
• ${\displaystyle \operatorname {Binom} (n,p)}$: the binomial distribution with ${\displaystyle n}$ independent Bernoulli trials with success probability ${\displaystyle p}$;
• ${\displaystyle \operatorname {Ber} (p)}$: the Bernoulli distribution with one Bernoulli trial with success probability ${\displaystyle p}$;
• ${\displaystyle \operatorname {Pois} (\lambda )}$: the Poisson distribution with rate parameter ${\displaystyle \lambda }$;
• ${\displaystyle \operatorname {Geo} (p)}$: the geometric distribution with success probability ${\displaystyle p}$;
• ${\displaystyle \operatorname {NB} (k,p)}$: the negative binomial distribution (number of failures before the ${\displaystyle k}$th success) with success probability ${\displaystyle p}$;
• ${\displaystyle \operatorname {HypGeo} (N,K,n)}$: the hypergeometric distribution with population size ${\displaystyle N}$ containing ${\displaystyle K}$ objects of type 1, ${\displaystyle N-K}$ objects of another type, and ${\displaystyle n}$ objects drawn;
• ${\displaystyle \operatorname {FD} (\mathbf {x} ,\mathbf {p} )}$: the finite discrete distribution with vector ${\displaystyle \mathbf {x} }$ and probability vector ${\displaystyle \mathbf {p} }$;
• ${\displaystyle \operatorname {D} {\mathcal {U}}\{x_{1},\dotsc ,x_{n}\}}$: the discrete uniform distribution;
• ${\displaystyle {\mathcal {U}}[a,b]}$: the uniform distribution over the interval ${\displaystyle [a,b]}$;
• ${\displaystyle \operatorname {Exp} (\lambda )}$ [3]: the exponential distribution with rate parameter ${\displaystyle \lambda }$;
• ${\displaystyle \operatorname {Gamma} (\alpha ,\lambda )}$: the gamma distribution with shape parameter ${\displaystyle \alpha }$ and rate parameter ${\displaystyle \lambda }$;
• ${\displaystyle \operatorname {Beta} (\alpha ,\beta )}$: the beta distribution with shape parameters ${\displaystyle \alpha }$ and ${\displaystyle \beta }$;
• ${\displaystyle \operatorname {Cauchy} (\theta )}$: the Cauchy distribution with location parameter ${\displaystyle \theta }$ (with scale parameter 1);
• ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$: the normal distribution with mean ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$;
• ${\displaystyle \chi _{\nu }^{2}}$: the chi-squared distribution with ${\displaystyle \nu }$ degrees of freedom;
• ${\displaystyle t_{\nu }}$: the Student's ${\displaystyle t}$ distribution with ${\displaystyle \nu }$ degrees of freedom;
• ${\displaystyle F_{\nu _{1},\nu _{2}}}$: the ${\displaystyle F}$-distribution with ${\displaystyle \nu _{1}}$ and ${\displaystyle \nu _{2}}$ degrees of freedom;
• ${\displaystyle \operatorname {Multinom} (n,\mathbf {p} )}$: the multinomial distribution with ${\displaystyle n}$ trials and probability vector ${\displaystyle \mathbf {p} }$.
• ${\displaystyle {\mathcal {N}}_{k}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$: the ${\displaystyle k}$-dimensional multivariate normal distribution with mean vector ${\displaystyle {\boldsymbol {\mu }}}$ and covariance matrix ${\displaystyle {\boldsymbol {\Sigma }}}$;
• ${\displaystyle \mathbb {E} [X]}$ (or ${\displaystyle \mu _{X}}$): the mean of ${\displaystyle X}$;
• ${\displaystyle \operatorname {Var} (X)}$ (or ${\displaystyle \sigma _{X}^{2}}$): the variance of ${\displaystyle X}$;
• ${\displaystyle \sigma _{X}}$: the standard deviation of ${\displaystyle X}$;
• ${\displaystyle \operatorname {Cov} (X,Y)}$: the covariance of ${\displaystyle X}$ and ${\displaystyle Y}$;
• ${\displaystyle \rho (X,Y)}$ (or ${\displaystyle \rho _{XY}}$) : the correlation coefficient of ${\displaystyle X}$ and ${\displaystyle Y}$;
• Bold CAPITAL letters (e.g. ${\displaystyle \mathbf {X} }$, and possibly with subscript): random vectors;
• Bold small letters (e.g. ${\displaystyle \mathbf {x} }$, and possibly with subscript): vectors;
• ${\displaystyle \mathbf {x} ^{T}}$: the transpose of ${\displaystyle \mathbf {x} }$;
• ${\displaystyle \mathbf {x} \cdot \mathbf {y} }$: the dot product of ${\displaystyle \mathbf {x} }$ and ${\displaystyle \mathbf {y} }$.

Abbreviations

• no.: number;
• r.v.: random variable;
• cdf: cumulative distribution function;
• pmf: probability mass function;
• pdf: probability density function;
• s.d.: standard deviation;
• df: degrees of freedom;
• It is usually denoted by ${\displaystyle \nu }$ (the Greek letter 'nu'), possibly with a subscript.
• i.i.d.: independent and identically distributed;
• mgf: moment generating function;
• CLT: Central Limit Theorem.

Conventions

• Use title casing for subpage (called chapter) titles, and use sentence casing for section titles.
• Use LaTeX (instead of HTML) for all math-related variables, formulas, notations etc., to ensure consistency in appearance[4].
• Use <math>...[/itex] for inline math[5];
• Use <math display=block>...[/itex] for display math (i.e. formulas on their own line);
• Use quizzes (if possible) for exercises.
• Try to use mnemonic notations (if possible). E.g., ${\displaystyle S}$ for a set, ${\displaystyle t}$ for time, etc. [6]

Introduction

Overview

Probability theory provides a mathematical model for the study of randomness and uncertainty. Many important decisions, whether in business, government, science, recreation, or even one's personal life, must be made with incomplete information or some degree of uncertainty. Hence, a formalized study of uncertain or random outcomes occupies an important role in modern society. In situations where any one of a number of possible outcomes may occur, the mathematical model of probability theory offers methods for quantifying the likelihoods associated with those outcomes. Probability also provides tools which allow us to move beyond simply describing the information contained within a set of data (descriptive statistics) to inferring further information from that data (inferential statistics). Many of the early attempts to model likelihood arose from games of chance. For a brief history of probability, see the Wikipedia article on the history of probability.

Although probability theory is now a very formal branch of mathematics, the language of probability is often used informally in everyday speech. We express our beliefs about likelihoods of outcomes in situations involving uncertainty using intuition guided by our experiences and in some cases statistics. Consider the following examples:

• Bill says "Don't buy the avocados here; about half the time, they're rotten". Bill is expressing his belief about the probability of an event — that an avocado will be rotten — based on his personal experience.
• Lisa says "I am 95% certain the capital of Spain is Barcelona". Here, the belief Lisa expresses is a probability only from her point of view, because she does not know that the capital of Spain is Madrid (from our point of view, there is no uncertainty about the answer). However, we can still view this as a subjective probability because it expresses a measure of uncertainty. It is as though Lisa is saying "in 95% of cases where I feel as sure as I do about this, I turn out to be right".
• Susan says "There is a lower chance of being shot in Omaha than in Detroit". Susan is expressing a belief based (presumably) on statistics.
• Dr. Smith says to Christina, "There is a 75% chance that you will live." Dr. Smith is basing this on his research.
• Nicolas says "It will probably rain tomorrow." In this case the likelihood that it will rain is expressed in vague terms and is subjective, but implies that the speaker believes it is greater than ${\displaystyle {\frac {1}{2}}}$ (or 50%). Subjective probabilities have been extensively studied, especially with regards to gambling and securities markets. While this type of probability is important, it is not the subject of this book. A good reference is "Degrees of Belief" By Steven Vick (2002).

Notice that in the previous examples the likelihood of any particular outcome is expressed as a percentage (between 0% and 100%), as is common in everyday language. However, probabilities in formal probability theory are always expressed as real numbers in the interval ${\displaystyle [0,1]}$ (e.g. a probability of .25 may be expressed as 25%, or a probability of ${\displaystyle {\frac {1}{\pi }}}$ may be expressed as approximately 31.83%). Other differences exist between common expressions of probabilities and formal probability theory. For example, a probability of 0% is typically taken to mean that the event to which that probability is assigned is impossible. However, in probability theory (usually in cases where there are infinitely many possible outcomes) an event ascribed a probability of zero may actually occur. In some situations, it is certain that such an event will occur (e.g. in selecting a real number between 0 and 1, the probability of selecting any given number is zero, but it is certain that one such number will be selected).

Another way to express the probability of an outcome is by its odds: the ratio of the probability of "success" (the event occurs) to the probability of "failure" (the event does not occur). In gambling, odds are expressed as the ratio of the stakes risked by each participant in a wager. For instance, a bookmaker offering odds of 3 to 1 "against" a horse will pay a punter three times their stake (if the horse wins). In fact, the bookmaker (ignoring factors such as his potential need to "lay off" bets which are exposing him to the possibility of an unacceptable overall loss) is announcing that he thinks the horse has a ${\displaystyle {\frac {1}{4}}}$ chance of winning. If we express odds as "chance of winning" : "chance of not winning", then 3 to 1 against would be represented as ${\displaystyle 1:3={\frac {1}{4}}:{\frac {3}{4}}}$, i.e. odds of ${\displaystyle {\frac {1}{3}}}$. So an event with a probability of ${\displaystyle {\frac {1}{4}}}$ or 25% has odds of 1:3, or about 33%. This disparity is even clearer for an event with a probability of 50%: a coin showing heads has odds of 50%:50% = 1:1, i.e. odds of 1 (100%), while its probability is ${\displaystyle {\frac {1}{2}}}$ (50%).
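The conversion between odds and probabilities described above can be sketched in Python (the function names are illustrative, not from any library):

```python
def odds_against_to_probability(a, b):
    # Bookmaker odds of "a to b against" correspond to a chance of b / (a + b).
    return b / (a + b)

def probability_to_odds(p):
    # Odds in favor: the ratio of P(success) to P(failure).
    return p / (1 - p)

# Odds of 3 to 1 against a horse correspond to a 1/4 chance of winning.
print(odds_against_to_probability(3, 1))  # 0.25
# An event with probability 1/4 has odds of 1:3, i.e. about 0.33.
print(probability_to_odds(0.25))
```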

Types of probability

As mentioned earlier, probability can be expressed informally in a variety of different ways, but even formal definitions and approaches vary. The most general and rigorous approach is known as axiomatic probability theory, which will be the focus of later chapters. Here we briefly discuss a few other approaches, their uses and limitations. All of these approaches rely in one way or another on the concept of an experiment. Recall that probability provides a means to study randomness and uncertainty.

An experiment is any action or process whose outcome is subject to uncertainty or randomness.

Here the term experiment is used in a wider sense than its usual connotation with controlled laboratory situations. Further clarification on experiments will be given later, but for now the following examples of experiments will suffice:

• observing whether or not a commercial product is defective.
• tossing a coin one or more times or selecting a card from a card deck.
• conducting a survey.
• measuring the wind speed or rainfall in a particular area.

If an experiment can be repeated under identical conditions, then each repetition of the experiment is called a trial.

Basic Concepts

There are two standard approaches to conceptually interpreting probabilities: the relative frequency approach and the subjective belief (or confidence) approach. In the Frequency Theory of Probability, probability is the limit of the relative frequency with which certain outcomes occur in repeated trials (note that the outcome of any single trial cannot depend on the outcomes of other trials). The relative frequency approach requires that experiments be random and that all possible outcomes be known before execution of the experiment. The probability of any set of outcomes is expressed as the relative frequency with which those outcomes will occur among many repeated trials.

Physical probabilities fall within the category of objective or frequency probabilities, and are associated with random physical systems such as roulette wheels, rolling dice and radioactive atoms. In such systems, a given outcome (such as a die yielding a six) tends to occur at a persistent rate, or 'relative frequency', in a long run of trials. Physical probabilities either explain, or are invoked to explain these stable frequencies.

Relative frequency probabilities are always expressed as a figure between 0% (the outcome essentially never happens) and 100% (the outcome essentially always happens), or similarly as a figure between 0 and 1. According to the Frequency Theory of Probability, saying that "the probability that A occurs is p%" means that if you repeat the experiment many times under essentially identical conditions, the percentage of time for which A occurs will converge to p. For example, a 50% chance that a coin lands "heads up" means that if you toss the coin over and over again, then the ratio of times the coin lands heads to the total number of tosses approaches a limiting value of 50% as the number of tosses grows. Notice that the outcome of one toss never depends on another toss, and that the ratio of heads to total number of tosses is always between 0% and 100%.

In the Subjective Theory of Probability, probability measures the speaker's "degree of belief" that a set of outcomes will result, on a scale of 0% (complete disbelief that the event will happen) to 100% (certainty that the event will happen). According to the Subjective Theory, saying that "the probability that A occurs is ${\displaystyle {\frac {2}{3}}}$ " means that I believe that A will happen twice as strongly as I believe that A will not happen. The Subjective Theory is particularly useful in assigning meaning to the probability of outcomes that in principle can occur only once. For example, how might one assign meaning to the following statement: "there is a 25% chance of an earthquake on the San Andreas fault with magnitude 8 or larger before 2050"? It would be very hard to qualify this measure in terms of relative frequency.

One way to represent an individual's degree of belief in a statement, given available evidence, is with the Bayesian approach. Evidential probability, also called Bayesian probability, can be assigned to any statement whatsoever, even when no random process is involved. On most accounts evidential probabilities are considered degrees of belief, defined in terms of dispositions to gamble at certain odds. The primary evidential interpretations include the classical interpretation, the subjective interpretation, the epistemic or inductive interpretation, and the logical interpretation.

The next several sections discuss the principal theories within the relative frequency approach to probability.

Classical theory of probability

The classical approach to probability expresses probability as a ratio of the number of favorable outcomes in a series of successive trials to the number of total possible outcomes. Note the immediate implication that the number of total possible outcomes must be known. Furthermore, all possible outcomes are assumed to be equally probable, and no two possible outcomes can both result from the same trial. Here, the term "favorable" is not subjective, but rather indicates that an outcome belongs to a group of outcomes of interest. This group of outcomes is called an event, which will be formalized with the introduction of axiomatic probability theory.

 Classical definition of probability If the number of outcomes belonging to an event ${\displaystyle E}$ is ${\displaystyle N_{E}}$ , and the total number of outcomes is ${\displaystyle N}$ , then the probability of event ${\displaystyle E}$ is defined as ${\displaystyle p_{E}={\frac {N_{E}}{N}}}$ .

For example, a standard deck of cards (without jokers) has 52 cards. If we randomly draw a card from the deck, we can think of each card as a possible outcome. Therefore, there are 52 total outcomes. We can now look at various events and calculate their probabilities:

• Out of the 52 cards, there are 13 clubs. Therefore, if the event of interest is drawing a club, there are 13 favorable outcomes, and the probability of this event is ${\displaystyle {\frac {13}{52}}={\frac {1}{4}}}$ .
• There are 4 kings (one of each suit). The probability of drawing a king is ${\displaystyle {\frac {4}{52}}={\frac {1}{13}}}$ .
• What is the probability of drawing a king OR a club? This example is slightly more complicated. We cannot simply add together the number of outcomes for each event separately (${\displaystyle 4+13=17}$) as this inadvertently counts one of the outcomes twice (the king of clubs). The correct answer is ${\displaystyle {\frac {16}{52}}}$ from ${\displaystyle {\frac {13}{52}}+{\frac {4}{52}}-{\frac {1}{52}}}$ where this is essentially ${\displaystyle p({\text{club}})+p({\text{king}})-p({\text{king of clubs}})}$ .
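The card calculations above can be checked by brute-force enumeration of the 52 equally likely outcomes, as a direct sketch of the classical definition ${\displaystyle p_{E}={\frac {N_{E}}{N}}}$:

```python
from fractions import Fraction
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['clubs', 'diamonds', 'hearts', 'spades']
deck = list(product(ranks, suits))  # 52 equally likely outcomes

def classical_probability(event):
    # Classical definition: number of favorable outcomes over total outcomes.
    favorable = [card for card in deck if event(card)]
    return Fraction(len(favorable), len(deck))

p_club = classical_probability(lambda card: card[1] == 'clubs')
p_king = classical_probability(lambda card: card[0] == 'K')
p_king_or_club = classical_probability(lambda card: card[0] == 'K' or card[1] == 'clubs')

print(p_club, p_king, p_king_or_club)  # 1/4 1/13 4/13
```

Note that the enumeration counts the king of clubs only once, so it reproduces ${\displaystyle {\frac {16}{52}}={\frac {4}{13}}}$ without any double counting.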

Classical probability suffers from a serious limitation. The definition of probability implicitly defines all outcomes to be equiprobable. While this might be useful for drawing cards, rolling dice, or pulling balls from urns, it offers no method for dealing with outcomes with unequal probabilities.

This limitation can even lead to mistaken statements about probabilities. An often given example goes like this:

I could be hit by a meteor tomorrow. There are two possible outcomes: I will be hit, or I will not be hit. Therefore, the probability I will be hit by a meteor tomorrow is ${\displaystyle {\frac {1}{2}}=50\%}$ .

Of course, the problem here is not with the classical theory, merely the attempted application of the theory to a situation to which it is not well adapted.

This limitation does not, however, mean that the classical theory of probability is useless. At many points in the development of the axiomatic approach to probability, classical theory is an important guiding factor.

Empirical or Statistical Probability or Frequency of occurrence

This approach to probability is well-suited to a wide range of scientific disciplines. It is based on the idea that the underlying probability of an event can be measured by repeated trials.

 Empirical or Statistical Probability as a measure of frequency Let ${\displaystyle n_{A}}$ be the number of times event ${\displaystyle A}$ occurs after ${\displaystyle n}$ trials. We define the probability of event ${\displaystyle A}$ as ${\displaystyle p_{A}=\lim _{n\to \infty }{\frac {n_{A}}{n}}}$

It is of course impossible to conduct an infinite number of trials. However, it usually suffices to conduct a large number of trials, where the standard of large depends on the probability being measured and how accurate a measurement we need.
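A quick simulation illustrates the idea: estimate a probability by the relative frequency ${\displaystyle {\frac {n_{A}}{n}}}$ over a large but finite number of trials. This is a sketch; the helper name is ours, not from any library:

```python
import random

def empirical_probability(trial, n, seed=0):
    # Estimate P(A) as n_A / n over n independent trials, where
    # `trial` returns True exactly when event A occurs.
    rng = random.Random(seed)
    successes = sum(trial(rng) for _ in range(n))
    return successes / n

# Event A: a fair coin lands heads.
estimate = empirical_probability(lambda rng: rng.random() < 0.5, n=100_000)
print(estimate)  # close to 0.5 for large n
```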

A note on this definition of probability: How do we know the sequence ${\displaystyle {\frac {n_{A}}{n}}}$ in the limit will converge to the same result every time, or that it will converge at all? The unfortunate answer is that we don't. To see this, consider an experiment consisting of flipping a coin an infinite number of times. We are interested in the probability of heads coming up. Imagine the result is the following sequence:

HTHHTTHHHHTTTTHHHHHHHHTTTTTTTTHHHHHHHHHHHHHHHHTTTTTTTTTTTTTTTT...

with each run of ${\displaystyle k}$ heads and ${\displaystyle k}$ tails being followed by another run twice as long. For this example, the sequence ${\displaystyle {\frac {n_{A}}{n}}}$ oscillates between ${\displaystyle {\frac {1}{2}}}$ (after each run of tails) and roughly ${\displaystyle {\frac {2}{3}}}$ (after each run of heads) and doesn't converge.
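We can compute the running ratio for this adversarial sequence directly and watch it oscillate rather than converge (a small sketch):

```python
def running_head_ratios(num_blocks):
    # Generate the sequence H T HH TT HHHH TTTT ... (runs doubling in length)
    # and record the running ratio n_H / n at the end of each run.
    heads = total = 0
    ratios = []
    run_length = 1
    for _ in range(num_blocks):
        heads += run_length          # a run of heads
        total += run_length
        ratios.append(heads / total)
        total += run_length          # a run of tails of the same length
        ratios.append(heads / total)
        run_length *= 2
    return ratios

ratios = running_head_ratios(12)
print(ratios[-2])  # approaches 2/3 after a run of heads
print(ratios[-1])  # 0.5 exactly after a run of tails
```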

We might expect such sequences to be unlikely, and we would be right. It will be shown later that the probability of such a run is 0, as is the probability of a sequence converging to anything other than the underlying probability of the event. However, such examples make it clear that the limit in the definition above does not express convergence in the more familiar sense, but rather some kind of convergence in probability. The problem of formulating exactly what this means belongs to axiomatic probability theory.

Axiomatic probability theory

Although axiomatic probability theory is often frightening to beginners, it is the most general approach to probability and has been employed in tackling some of the more difficult problems in probability. It begins with a set of axioms which, although not immediately intuitive, are guided by the more familiar classical probability theory. These axioms are discussed in the (as yet unwritten) following chapter.

This book discusses mathematical probability using calculus and linear algebra. Readers should have a good understanding of both topics before attempting to read and understand this book completely.

Set Theory

Introduction

The overview of set theory contained herein adopts a naive point of view. A rigorous analysis of the concept belongs to the foundations of mathematics and mathematical logic. Although we shall not initiate a study of these fields, the rules we follow in dealing with sets are derived from them.

Sets

Definition. (Set)

• A set is a well-defined collection of distinct object(s), which are called element(s).
• We say that an element belongs to the set.
• If ${\displaystyle x}$ belongs to a set ${\displaystyle S}$ , we write ${\displaystyle x\in S}$.
• If ${\displaystyle x}$ does not belong to the set ${\displaystyle S}$ , we write ${\displaystyle x\notin S}$.

Remark.

• When ${\displaystyle x}$ and ${\displaystyle y}$ are (not) equal, denoted by ${\displaystyle x=(\neq )\;y}$ , ${\displaystyle x}$ and ${\displaystyle y}$ are different symbols denoting the same (different) object(s).

Example. (Collection that is not well-defined)

• The collection of easy school subjects is not a set.

We have different ways to describe a set, e.g.

• word description: e.g., a set ${\displaystyle S}$ is the set containing the 12 months in a year;
• listing: elements in a set are listed within a pair of braces, e.g., ${\displaystyle S{\overset {\text{ def }}{=}}\{{\text{January, March, February, April, May, June, July, August, September, October, November, December}}\}}$;
the ordering of the elements is not important, i.e. even if the elements are listed in different order, the set is still the same. E.g., ${\displaystyle \{{\text{January, February, March, April, May, June, July, August, September, October, November, December}}\}}$ is still referring to the same set.
• set-builder notation:
${\displaystyle \underbrace {\{} _{{\text{The set of}}\;}\underbrace {x} _{{\text{all elements }}x\;}\underbrace {:} _{\text{such that }}\underbrace {P(x)} _{{\text{the property }}P(x){\text{ holds}}}\}}$
in which the closing brace must also be written. E.g., ${\displaystyle S{\overset {\text{ def }}{=}}\{x:x{\text{ is a month in a year}}\}}$.
• In particular, since a set contains distinct objects, the months contained in this set are distinct, and therefore there are only 12 elements in this set.

Example. (Empty set) The set ${\displaystyle \{\}}$ is called an empty set, and it contains no element. It is commonly denoted by ${\displaystyle \varnothing }$ also.

Exercise.

Is ${\displaystyle \{\varnothing \}}$ an empty set?

 Yes. No.

Example.

• ${\displaystyle {\text{apple}}\in \{{\text{apple, orange, banana}}\}}$;
• ${\displaystyle \varnothing \in \{\varnothing \}}$;
• ${\displaystyle \varnothing \notin \varnothing }$.

Exercise.

Select all element(s) belonging to the set ${\displaystyle {\big \{}\varnothing ,\{\varnothing \},\{\{\varnothing \}\}{\big \}}}$.

 ${\displaystyle \varnothing }$ ${\displaystyle \{\varnothing \}}$ ${\displaystyle \{\varnothing ,\{\varnothing \}\}}$ ${\displaystyle \{\{\varnothing \}\}}$ ${\displaystyle \{\{\varnothing ,\{\varnothing \}\}\}}$

Definition. (Set equality) Two sets are equal when they contain the same elements.

Remark.

• Equivalently, two sets ${\displaystyle A}$ and ${\displaystyle B}$ are equal if each element of ${\displaystyle A}$ is also an element of ${\displaystyle B}$ and each element of ${\displaystyle B}$ is also an element of ${\displaystyle A}$.
• We use ${\displaystyle A=(\neq )\;B}$ to denote sets ${\displaystyle A}$ and ${\displaystyle B}$ are (not) equal.

Example.

• ${\displaystyle \{\varnothing \}}$, ${\displaystyle \{x:x{\text{ is an empty set}}\}}$, and the set that contains only an empty set are pairwise equal.
• ${\displaystyle \varnothing =\{\}\neq {\big \{}\{\}{\big \}}=\{\varnothing \}}$.
• ${\displaystyle \{2,3,5\}=\{x:x{\text{ is a prime number that is not greater than }}5\}}$

Definition. (Universal set) A universal set, denoted by ${\displaystyle U}$, is the set that contains all objects under consideration in a particular context.

Remark.

• In the context of probability, a universal set, which is usually denoted by ${\displaystyle \Omega }$ instead, is the set containing all outcomes of a particular random experiment, and is also called a sample space.

Definition. (Cardinality) The cardinality of a finite set (a set containing finitely many elements) is the number of its elements.

Remark.

• The cardinality of a set ${\displaystyle S}$ can be denoted by ${\displaystyle \#(S)}$ (or ${\displaystyle |S|}$).
• We do not use the ${\displaystyle |\cdot |}$ notation, to avoid ambiguity.
• An infinite set is a set containing infinitely many elements.
• We leave the cardinality of infinite sets undefined in this book, although it can be defined in a more sophisticated way.

Example.

• ${\displaystyle \#({\big \{}\{1\},2,3{\big \}})=3}$.
• ${\displaystyle \#(\varnothing )=0}$.
• ${\displaystyle \mathbb {N} }$ (the set of all positive integers) is an infinite set.
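Python's built-in sets give a convenient model of finite sets and their cardinality. Note that nested sets must be written as frozensets, since set elements must be hashable:

```python
# len(S) plays the role of #(S) for finite sets.
s = {frozenset({1}), 2, 3}
print(len(s))          # 3
print(len(set()))      # 0 -- the empty set has cardinality 0
print(len({1, 1, 2}))  # 2 -- a set contains distinct elements, so duplicates collapse
```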

Exercise.

Calculate ${\displaystyle \#(\{\varnothing ,\{\varnothing \}\})}$.

 0 1 2 3 None of the above.

Subsets

We introduce a relationship between sets in this section.

Definition. (Subset)

• If each element of set ${\displaystyle A}$ is an element of set ${\displaystyle B}$, then ${\displaystyle A}$ is a subset of ${\displaystyle B}$, denoted by ${\displaystyle A\subseteq B}$.
• If ${\displaystyle A}$ is not a subset of ${\displaystyle B}$, then we write ${\displaystyle A\not \subseteq B}$.

Remark.

• By referring to the definitions of subsets and set equality, we can see that ${\displaystyle A=B}$ is equivalent to (or if and only if) ${\displaystyle A\subseteq B}$ and ${\displaystyle B\subseteq A}$.
• The notation ${\displaystyle A\supseteq B}$ means that ${\displaystyle A}$ is a superset of ${\displaystyle B}$, which means that ${\displaystyle B}$ is a subset of ${\displaystyle A}$.
• This notation and terminology are seldom used.

Definition. (Venn diagram) A Venn diagram is a diagram that shows all possible logical relations between finitely many sets.

Remark.

• It is quite useful for illustrating simple relationships between sets and making those relationships clear.
• We may also add annotations to the Venn diagram, e.g. the cardinality of each set or the element(s) contained in each set.

Illustration of subset by Venn diagram:

A ⊆ B (A ≠ B):

*-----------------------*
|                       |
|                       |
|   *----------*        | <---- B
|   |          |        |
|   |    A     |        |
|   |          |        |
|   *----------*        |
*-----------------------*


Example.

• ${\displaystyle \{1,3\}\subseteq \{1,2,3\}}$;

Venn diagram:

*-----------------------*
|                       |
|                       |
|   *----------*  2     |
|   |          |        |
|   |    1  3  |        |
|   |          |        |
|   *----------*        |
*-----------------------*

• ${\displaystyle \{\{1\}\}\not \subseteq \{1,2,3\}}$;

Venn diagram:

*-----------------------*
|                       |
|                       |
|   *----------*  {1}   |
|   |          |        |
|   |  1 2  3  |        |
|   |          |        |
|   *----------*        |
*-----------------------*

• ${\displaystyle \varnothing \subseteq S}$ for each set ${\displaystyle S}$ ;
• ${\displaystyle S\subseteq S}$ for each set ${\displaystyle S}$.
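These subset relations can be checked with Python's comparison operators on sets, mirroring the examples above (the nested set is written as a frozenset):

```python
a = {1, 3}
b = {1, 2, 3}
print(a <= b)                 # True: {1, 3} is a subset of {1, 2, 3}
print({frozenset({1})} <= b)  # False: its only element is the set {1}, not the number 1
print(set() <= b)             # True: the empty set is a subset of every set
print(b <= b)                 # True: every set is a subset of itself
```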

Example. (Intervals) Intervals are commonly encountered subsets of ${\displaystyle \mathbb {R} }$. If ${\displaystyle a}$ and ${\displaystyle b}$ are (extended) real numbers [7] such that ${\displaystyle a<b}$, then

{\displaystyle {\begin{aligned}{\color {Maroon}(}a,b{\color {Maroon})}&{\overset {\text{ def }}{=}}\{x\in \mathbb {R} :a\;{\color {Maroon}<}\;x\;{\color {Maroon}<}\;b\};\\{\color {darkgreen}[}a,b{\color {Maroon})}&{\overset {\text{ def }}{=}}\{x\in \mathbb {R} :a\;{\color {darkgreen}\leq }\;x\;{\color {Maroon}<}\;b\};\\{\color {Maroon}(}a,b{\color {darkgreen}]}&{\overset {\text{ def }}{=}}\{x\in \mathbb {R} :a\;{\color {Maroon}<}\;x\;{\color {darkgreen}\leq }\;b\};\\{\color {darkgreen}[}a,b{\color {darkgreen}]}&{\overset {\text{ def }}{=}}\{x\in \mathbb {R} :a\;{\color {darkgreen}\leq }\;x\;{\color {darkgreen}\leq }\;b\}.\\\end{aligned}}}
In particular, ${\displaystyle (-\infty ,\infty )=\mathbb {R} }$, and ${\displaystyle [-\infty ,\infty ]}$ is the set containing all extended real numbers.

Definition. (Proper subset)

• Set ${\displaystyle A}$ is a proper subset of set ${\displaystyle B}$ if ${\displaystyle A\subseteq B}$ and ${\displaystyle A\neq B}$. We write ${\displaystyle A\subsetneq B}$ in this case.
• If set ${\displaystyle A}$ is not a proper subset of ${\displaystyle B}$, then we write ${\displaystyle A\not \subsetneq B}$ (but we rarely write this).

Remark.

• The notation ${\displaystyle A\supsetneq B}$ means that ${\displaystyle A}$ is a proper superset of ${\displaystyle B}$, which means that ${\displaystyle B}$ is a proper subset of ${\displaystyle A}$.
• This notation and terminology are seldom used.

Example.

• ${\displaystyle \{1,2\}\not \subsetneq \{1,2\}}$.

Definition. (Complement) Let ${\displaystyle A}$ be a subset of universal set ${\displaystyle U}$. The (absolute) complement of ${\displaystyle A}$, denoted by ${\displaystyle A^{c}}$, is the set ${\displaystyle \{x\in U:x\notin A\}}$.

Example. If ${\displaystyle A=\{1,2,3\}}$ and ${\displaystyle U=\{1,2,3,4,5\}}$, then ${\displaystyle A^{c}=\{4,5\}}$.

Venn diagram:

*-----------------------*
|                       |
|                4  5   |
|   *----------*        |
|   |          |        | <---- U
|   |  1 2  3  |        |
|   |          |        |
|   *----------*        |
*--------^--------------*
|
A


Exercise.

Find ${\displaystyle \varnothing ^{c}}$.

 ${\displaystyle U}$ ${\displaystyle \{U\}}$ ${\displaystyle \varnothing }$ ${\displaystyle \{\varnothing \}}$ None of the above.

Set operations

Probability theory makes extensive use of some set operations, and we will discuss them in this section.

Definition.

Union of two sets is indicated by the red region.

(Union of sets) Union of set ${\displaystyle A}$ and set ${\displaystyle B}$, denoted by ${\displaystyle A\cup B}$, is the set ${\displaystyle \{x:x\in A{\text{ or }}x\in B\}}$.

Remark.

• ${\displaystyle A\cup B}$ is read 'A cup B'.
• We can denote ${\displaystyle A_{1}\cup A_{2}\cup \dotsb \cup A_{n}}$ by ${\displaystyle \bigcup _{i=1}^{n}A_{i}}$ (if the sequence of unions stops at ${\displaystyle A_{n}}$), or ${\displaystyle A_{1}\cup A_{2}\cup \dotsb }$ by ${\displaystyle \bigcup _{i=1}^{\infty }A_{i}}$ (if the sequence of unions does not stop).

Example.

• ${\displaystyle \{{\text{apple}},{\text{orange}}\}\cup \{{\text{orange}},{\text{red}}\}=\{{\text{apple}},{\text{orange}},{\text{red}}\}}$.

Venn diagram:

*----------------*
|                |
|  red   *-------*--------*
|        | orange|        |
*--------*-------*        |
|       apple    |
*----------------*
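As a quick aside, the union operation can be tried directly in Python, whose built-in `set` type implements it with the `|` operator (a small illustrative sketch):

```python
# The | operator on Python sets computes the union.
A = {"apple", "orange"}
B = {"orange", "red"}
print(A | B == {"apple", "orange", "red"})  # True: a repeated element appears only once
```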


Proposition. (Properties of union of sets) Let ${\displaystyle A}$,${\displaystyle B}$ and ${\displaystyle C}$ be sets. Then, the following statements hold.

(a) ${\displaystyle A\cup A=A}$;
(b) ${\displaystyle A\cup B=B\cup A}$ (commutative law);
(c) ${\displaystyle A\cup (B\cup C)=(A\cup B)\cup C}$ (associative law);
(d) ${\displaystyle A\subseteq A\cup B}$ and ${\displaystyle B\subseteq A\cup B}$;
(e) ${\displaystyle A\cup \varnothing =A}$;
(f) ${\displaystyle A\cup B=B}$ if and only if ${\displaystyle A\subseteq B}$.

Proof. Informally, consider the following Venn diagrams:

(a)
*----*
|    | <---- A ∪ A (a set overlaps itself completely)
|    | <--- A
*----*

(b)
*----------------*
|////////////////| <---- A
|////////*-------*--------*
|////////|///////|////////|
*--------*-------*////////| <--- B
|////////////////|
*----------------*
(shaded region refers to both A ∪ B and B ∪ A)

(c)
*----------*
|//////////| <--- A
|//////////|
|/////*----|----*
|/////|////|////| <---- C
*-----*----*----*------------*
|/////|////|////|////////////| <--- B
|/////*----|----*////////////|
*----------*-----------------*
(shaded region refers to both A∪(B∪C) and (A∪B)∪C)

(d)
*----------------*
|                | <---- A
|        *-------*--------*
|        |       |        |
*--------*-------*        | <--- B
|                |
*----------------*

${\displaystyle A}$ and ${\displaystyle B}$ are both inside the whole region, which represents ${\displaystyle A\cup B}$.
(e)
*----*
|    |
|    | .
*----*

We use a dot, which has zero area, to represent ${\displaystyle \varnothing }$. Then, we can see that the union of the region and the dot is the region itself.

(f)
*-----------------------*
|                       |
|                       |
|   *----------*        | <---- B
|   |          |        |
|   |    A     |        |
|   |          |        |
|   *----------*        |
*-----------------------*
(the whole region is both A ∪ B and B)

${\displaystyle \Box }$

Remark.

• Formal proofs of propositions and theorems about sets will not be emphasized in this book.
• Instead, we will usually prove the propositions and theorems informally, e.g. using Venn diagram.

Definition. (Intersection of sets)

Intersection of two sets.

Intersection of set ${\displaystyle A}$ and set ${\displaystyle B}$, denoted by ${\displaystyle A\cap B}$, is the set ${\displaystyle \{x:x\in A{\text{ and }}x\in B\}}$.

Remark.

• ${\displaystyle A\cap B}$ is read 'A cap B'.
• We can denote ${\displaystyle A_{1}\cap A_{2}\cap \dotsb \cap A_{n}}$ by ${\displaystyle \bigcap _{i=1}^{n}A_{i}}$ (if the sequence of intersections stops at ${\displaystyle A_{n}}$), or ${\displaystyle A_{1}\cap A_{2}\cap \dotsb }$ by ${\displaystyle \bigcap _{i=1}^{\infty }A_{i}}$ (if the sequence of intersections does not stop).

Example.

• ${\displaystyle \{1,2,3\}\cap \{2,3,4\}=\{2,3\}}$;
• ${\displaystyle \{1,2,3\}\cap \{4,5,6\}=\varnothing }$.
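In Python, the `&` operator on sets computes the intersection, matching the two examples above:

```python
# The & operator on Python sets computes the intersection.
A = {1, 2, 3}
B = {2, 3, 4}
C = {4, 5, 6}
print(A & B)           # {2, 3}
print(A & C == set())  # True: A and C have no element in common
```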

Definition. (Disjoint sets) Set ${\displaystyle A}$ and set ${\displaystyle B}$ are disjoint (or mutually exclusive) if ${\displaystyle A\cap B=\varnothing }$.

Remark.

• I.e., ${\displaystyle A}$ and ${\displaystyle B}$ are disjoint if they have no element in common.
• More than two sets are disjoint if they are pairwise disjoint.

Venn diagram

*-----*       *-----*       *-----*
|     |       |     |       |     |
|  A  |       |  B  |       |  C  |
*-----*       *-----*       *-----*

(A, B and C are disjoint)

*----------------*
|                | <---- D
| *--*   *-------*--------*
| |  |   |       |        |
*-*--*---*-------*        | <--- E
|  |   |                |
*--*   *----------------*
^
|
F

(D, E and F are not disjoint, but E and F are disjoint)


Definition. (Partition of a set) A collection of sets forms a partition of a set ${\displaystyle S}$ if the sets in the collection are pairwise disjoint and their union is ${\displaystyle S}$.

Venn diagram

*-----------------------*
| \        A            |
|  \                    |
|B  *-------------------*
|   /                   |
|  /       C            |
| /         *-----------*  <----- S
|/         /            |
*\        /             |
| *------*              |
|        |    E         |
|  D     |              |
*--------*--------------*

(A,B,C,D and E form a partition of S)
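The two partition conditions can be checked programmatically; the Python sketch below uses a hypothetical helper `is_partition`, written here only for illustration:

```python
from itertools import combinations

def is_partition(parts, S):
    """Check that the sets in `parts` are pairwise disjoint and that their union is S."""
    pairwise_disjoint = all(p.isdisjoint(q) for p, q in combinations(parts, 2))
    covers = set().union(*parts) == S
    return pairwise_disjoint and covers

S = {1, 2, 3, 4, 5}
print(is_partition([{1, 2}, {3}, {4, 5}], S))     # True
print(is_partition([{1, 2}, {2, 3}, {4, 5}], S))  # False: {1,2} and {2,3} overlap
```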


Proposition. (Properties of intersection of sets) Let ${\displaystyle A}$,${\displaystyle B}$ and ${\displaystyle C}$ be sets. Then, the following statements hold.

(a) ${\displaystyle A\cap A=A}$;
(b) ${\displaystyle A\cap B=B\cap A}$ (commutative law);
(c) ${\displaystyle A\cap (B\cap C)=(A\cap B)\cap C}$ (associative law);
(d) ${\displaystyle A\cap B\subseteq A}$ and ${\displaystyle A\cap B\subseteq B}$;
(e) ${\displaystyle A\cap \varnothing =\varnothing }$;
(f) ${\displaystyle A\cap B=B}$ if and only if ${\displaystyle B\subseteq A}$.

Proof. Informally, consider the following Venn diagrams:

(a)
*----*
|    | <---- A ∩ A (a set overlaps itself completely)
|    | <--- A
*----*

(b)
*----------------*
|                | <---- A
|        *-------*--------*
|        |A∩B=B∩A|        |
*--------*-------*        | <--- B
|                |
*----------------*

(c)
*----------*
|          | <--- A
|          |
|     *----*----*
|     |    |    | <---- C
*-----*----*----*------------*
|     |////|    |            | <--- B
|     *----*----*            |
|          |                 |
*----------*-----------------*

*----*
|////| : A∩(B∩C)=(A∩B)∩C
*----*

(d)
*----------------*
|                | <---- A
|        *-------*--------*
|        | A ∩ B |        |
*--------*-------*        | <--- B
|                |
*----------------*
(A ∩ B is inside A, and A ∩ B is inside B)

(e)
*----*
|    |
| .  |
*----*

We use a dot, which has zero area, to represent ${\displaystyle \varnothing }$. Then, we can see that the intersection of the region and the dot is the dot.

${\displaystyle \Box }$

Proposition. (Distributive law) Let ${\displaystyle A}$,${\displaystyle B}$ and ${\displaystyle C}$ be sets. Then, the following statements hold.

(a) ${\displaystyle A\cap ({\color {darkgreen}B}\cup {\color {maroon}C})=(A\cap {\color {darkgreen}B})\cup (A\cap {\color {maroon}C})}$;
(b) ${\displaystyle A\cup ({\color {darkgreen}B}\cap {\color {maroon}C})=(A\cup {\color {darkgreen}B})\cap (A\cup {\color {maroon}C})}$.

Proof.

(a)
*----------*
|          | <--- A
|          |
|     *----*----*
|     |    |    | <---- C
*-----*----*----*------------*
|/////|////|    |            | <--- B
|/////*----*----*            |
|//////////|                 |
*----------*-----------------*

*----*
|////| : AnB
*----*

*----------*
|          | <--- A
|          |
|     *----*----*
|     |////|    | <---- C
*-----*----*----*------------*
|     |////|    |            | <--- B
|     *----*----*            |
|          |                 |
*----------*-----------------*

*----*
|////| : AnC
*----*

*----------*
|          | <--- A
|          |
|     *----*----*
|     |////|    | <---- C
*-----*----*----*------------*
|/////|////|    |            | <--- B
|/////*----*----*            |
|//////////|                 |
*----------*-----------------*

*----*
|////| : (AnB)u(AnC)
*----*

*----------*
|          | <--- A
|          |
|     *----*----*
|     |////|////| <---- C
*-----*----*----*------------*
|/////|////|////|////////////| <--- B
|/////*----*----*////////////|
|//////////|/////////////////|
*----------*-----------------*

*----*
|////| : BuC
*----*

*----------*
|          | <--- A
|          |
|     *----*----*
|     |////|    | <---- C
*-----*----*----*------------*
|/////|////|    |            | <--- B
|/////*----*----*            |
|//////////|                 |
*----------*-----------------*

*----*
|////| : An(BuC)
*----*

(b)
*----------*
|//////////| <--- A
|//////////|
|/////*----*----*
|/////|////|    | <---- C
*-----*----*----*------------*
|/////|////|////|////////////| <--- B
|/////*----*----*////////////|
|//////////|/////////////////|
*----------*-----------------*

*----*
|////| : AuB
*----*

*----------*
|//////////| <--- A
|//////////|
|/////*----*----*
|/////|////|////| <---- C
*-----*----*----*------------*
|/////|////|////|            | <--- B
|/////*----*----*            |
|//////////|                 |
*----------*-----------------*

*----*
|////| : AuC
*----*

*----------*
|//////////| <--- A
|//////////|
|/////*----*----*
|/////|////|    | <---- C
*-----*----*----*------------*
|/////|////|////|            | <--- B
|/////*----*----*            |
|//////////|                 |
*----------*-----------------*

*----*
|////| : (AuB)n(AuC)
*----*

*----------*
|          | <--- A
|          |
|     *----*----*
|     |    |    | <---- C
*-----*----*----*------------*
|     |////|////|            | <--- B
|     *----*----*            |
|          |                 |
*----------*-----------------*

*----*
|////| : B∩C
*----*

*----------*
|//////////| <--- A
|//////////|
|/////*----*----*
|/////|////|    | <---- C
*-----*----*----*------------*
|/////|////|////|            | <--- B
|/////*----*----*            |
|//////////|                 |
*----------*-----------------*

*----*
|////| : Au(B∩C)
*----*


${\displaystyle \Box }$
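Besides the Venn-diagram argument, the distributive laws can be spot-checked numerically; the following Python sketch tests both laws on randomly generated subsets of a small universe:

```python
import random

# Spot-check the distributive laws on random subsets of {0, ..., 9}.
random.seed(0)
U = range(10)
for _ in range(100):
    A = {x for x in U if random.random() < 0.5}
    B = {x for x in U if random.random() < 0.5}
    C = {x for x in U if random.random() < 0.5}
    assert A & (B | C) == (A & B) | (A & C)  # law (a)
    assert A | (B & C) == (A | B) & (A | C)  # law (b)
print("distributive laws hold on all random samples")
```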

Definition. (Relative complement)

Relative complement of ${\displaystyle A}$ (left) in ${\displaystyle B}$ (right).

Relative complement of set ${\displaystyle A}$ in set ${\displaystyle B}$, denoted by ${\displaystyle B\setminus A}$, is the set ${\displaystyle \{x:x\in B{\text{ and }}x\notin A\}}$.

Remark.

• If ${\displaystyle U}$ is the universal set and ${\displaystyle A}$ is a subset of ${\displaystyle U}$, then ${\displaystyle A^{c}=U\setminus A}$.
• ${\displaystyle B\setminus A}$ is read 'B minus A'.

Example.

• ${\displaystyle \{1,2,3\}\setminus \{1,2\}=\{3\}}$;
• ${\displaystyle \{1,2,3\}\setminus \{1,2,3\}=\varnothing }$;
• ${\displaystyle \{1,2,3\}\setminus \{4,5,6\}=\{1,2,3\}}$.
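Python's set difference operator `-` computes the relative complement, reproducing the three examples above:

```python
# The - operator on Python sets computes the relative complement.
print({1, 2, 3} - {1, 2})     # {3}
print({1, 2, 3} - {1, 2, 3})  # set()
print({1, 2, 3} - {4, 5, 6})  # {1, 2, 3}
```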

Proposition. (Properties of relative complement) Let ${\displaystyle A}$ and ${\displaystyle B}$ be sets. Then, the following statements hold.

(a) ${\displaystyle A\setminus A=\varnothing }$;
(b) ${\displaystyle A\setminus \varnothing =A}$;
(c) ${\displaystyle \varnothing \setminus A=\varnothing }$;
(d) ${\displaystyle A\setminus B=\varnothing }$ if and only if ${\displaystyle A\subseteq B}$;
(e) ${\displaystyle (A\setminus B)\cap (B\setminus A)=\varnothing }$;
(f) ${\displaystyle A\cap (B\setminus A)=\varnothing }$.

Proof.

• ${\displaystyle A\setminus B}$ can be viewed as 'removing' the region of ${\displaystyle B}$ from the region of ${\displaystyle A}$.
(a)
*--*
|A |  removing the whole region <=> empty region left
*--*

(b)
*--*
|A |  removing empty region <=> whole region left
*--*

(c)
.  removing anything from an empty region <=> still an empty region

(d)
*----*
|    | <-- B
*--* |
|A | |    removing B from A leaves an empty region <=> A lies entirely inside B
*--*-*

(e)
*-----*
| A\B |
*-----*-----*
|     | B\A | <--- B
*-----*-----*
^
|
A
(A\B and B\A are always disjoint)

(f)
*-----*
|     |
*-----*-----*
|     | B\A | <--- B
*-----*-----*
^
|
A

(A and B\A are always disjoint)


${\displaystyle \Box }$

Theorem. (De Morgan's laws) Let ${\displaystyle B,A_{1},A_{2},\dotsc }$ be sets. Then,

${\displaystyle B\setminus (A_{1}\cup A_{2}\cup \dotsb )=(B\setminus A_{1})\cap (B\setminus A_{2})\cap \dotsb {\text{ and }}B\setminus (A_{1}\cap A_{2}\cap \dotsb )=(B\setminus A_{1})\cup (B\setminus A_{2})\cup \dotsb }$

Proof.

(only 3 sets involved)
*-------------------------------*
|                    IV         |
|   *---------*                 |
|   |    I    | <--- A_1        | <--- B
|   *---------*-------*         |
|   |    II   | III   |<--- A_2 |
|   *---------*-------*         |
*-------------------------------*
B\(A_1uA_2)=IV
(B\A_1)=III u IV \
----> intersection: IV
(B\A_2)=I u IV   /

B\(A_1nA_2)=I u III u IV
(B\A_1)=III u IV \
----> union: I u III u IV
(B\A_2)=I u IV   /


${\displaystyle \Box }$

Remark.

• If ${\displaystyle B=U}$, then the equations become ${\displaystyle (A_{1}\cup A_{2}\cup \dotsb )^{c}=A_{1}^{c}\cap A_{2}^{c}\cap \dotsb {\text{ and }}(A_{1}\cap A_{2}\cap \dotsb )^{c}=A_{1}^{c}\cup A_{2}^{c}\cup \dotsb }$.

Example. Let ${\displaystyle A=\{1,2,3\},B=\{1,3\},C=\{1,2,3,4\}}$ and ${\displaystyle U=\{1,2,3,4,5\}}$. Then,

• ${\displaystyle A\setminus (\underbrace {B\cup C} _{=C})=\varnothing =(A\setminus B)\cap (\underbrace {A\setminus C} _{=\varnothing })}$;
• ${\displaystyle \overbrace {C\setminus (\underbrace {A\cap B} _{=\{1,3\}})} ^{\{2,4\}}=(\underbrace {C\setminus A} _{=\{4\}})\cup (\underbrace {C\setminus B} _{=\{2,4\}})}$;
*--------------------------*
|   *-------------------*  |
| 5 |///////////////////|  |
|   |/4/*------------*//|  |
|   |///|//////2/////|//|  |
|   |///|/*-------*//|//|  |
|   |///|/| 1    3|//|//|  |
|   |///|/|  AnB  |//|//| <-------- C
|   |///|/*-------*//|//|  |
|   |///|////////////|//|  |
|   |///*------------*//|  |
|   |///////////////////|  |
|   *-------------------*  |
*--------------------------*

*---*
|///| : C\(AnB)
*---*

*--------------------------*
|   *-------------------*  |
| 5 |\.\.\.\.\.\.\.\.\.\|  |
|   |\4\*------------*\.|  |
|   |\.\|.A\B..2.....|\.|  |
|   |\.\|.*-------*..|\.|  |
|   |\.\|.| 1    3|..|\.|  |
|   |\.\|.|   B   |..|\.| <-------- C
|   |\.\|.*-------*..|\.|  |
|   |\.\|............|\.|  |
|   |\.\*------------*\.|  |
|   |\.\\.\.\.\.\.\.\.\.|  |
|   *-------------------*  |
*--------------------------*

*---*
|...| : C\B
*---*
*---*
|\\\| : C\A
*---*

• ${\displaystyle \overbrace {(\underbrace {A\cup B\cup C} _{=C})^{c}} ^{=\{5\}}=\underbrace {A^{c}} _{=\{4,5\}}\cap \underbrace {B^{c}} _{=\{2,4,5\}}\cap \underbrace {C^{c}} _{=\{5\}}}$.
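The identities in this example can also be verified mechanically; a short Python check using the same sets:

```python
# Check De Morgan's laws on the sets from the example.
A, B, C = {1, 2, 3}, {1, 3}, {1, 2, 3, 4}
U = {1, 2, 3, 4, 5}

assert A - (B | C) == (A - B) & (A - C)                  # both empty
assert C - (A & B) == (C - A) | (C - B)                  # both {2, 4}
assert U - (A | B | C) == (U - A) & (U - B) & (U - C)    # both {5}
print("De Morgan's laws verified")
```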

Definition. (Power set) Power set ${\displaystyle {\mathcal {P}}(A)}$ (or ${\displaystyle 2^{A}}$) of set ${\displaystyle A}$ is the set of all subsets of ${\displaystyle A}$, i.e., ${\displaystyle \{S:S\subseteq A\}}$.

Example.

• ${\displaystyle {\mathcal {P}}(\{1,2\})=\{\varnothing ,\{1\},\{2\},\{1,2\}\}}$;
• ${\displaystyle {\mathcal {P}}(\varnothing )=\{\varnothing \}}$ (the power set of the empty set is not empty).

Remark.

• Power set of a set containing ${\displaystyle n}$ elements contains ${\displaystyle 2^{n}}$ elements.
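Python has no built-in power set, but one can be assembled from `itertools.combinations` (the helper name `power_set` below is our own, chosen for illustration):

```python
from itertools import combinations

def power_set(S):
    """Return all subsets of S, as a list of frozensets."""
    s = list(S)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

P = power_set({1, 2})
print(len(P))                     # 4 subsets: {}, {1}, {2}, {1, 2}
print(len(power_set({1, 2, 3})))  # 8 = 2**3
```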

Definition. (${\displaystyle n}$-ary Cartesian product) The ${\displaystyle n}$-ary Cartesian product over ${\displaystyle n}$ sets ${\displaystyle S_{1},\dotsc ,S_{n}}$, denoted by ${\displaystyle S_{1}\times \dotsb \times S_{n}}$, is

${\displaystyle {\big \{}(s_{1},\dotsc ,s_{n}):s_{i}\in S_{i}{\text{ for each }}i\in \{1,\dotsc ,n\}{\big \}}.}$

Example. Let ${\displaystyle A=\{1,2,3\},B=\{2,3,4\}}$ and ${\displaystyle C=\{3,4,5\}}$. Then,

• ${\displaystyle A\times A=\{(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)\}}$;
• ${\displaystyle A\times B=\{(1,2),(1,3),(1,4),(2,2),(2,3),(2,4),(3,2),(3,3),(3,4)\}}$;
• ${\displaystyle A\times B\times C}$ contains ${\displaystyle 3\times 3\times 3=27}$ ordered triples: ${\displaystyle (1,2,3),(1,2,4),\dotsc ,(3,4,5)}$.

Combinatorics

What is combinatorics?

Combinatorics involves the counting and enumeration of elements of sets and similar structures, such as sequences and multisets. We have discussed set theory in the chapter about set theory, and we will briefly discuss what sequences and multisets are.

Roughly speaking, a sequence is like a set, except that the ordering of elements matters, and a multiset is also like a set, except that repetition of an element is allowed.

Sequences correspond to the discussion about ordered selection without replacement, while multisets correspond to the discussion about unordered selection with replacement.

Fundamental counting principles

Theorem.

Venn diagram illustrating inclusion-exclusion principle when ${\displaystyle n=3}$ (area of each (intersection of) set can be interpreted as its cardinality).

(Inclusion-exclusion principle) For each finite set ${\displaystyle S_{1},S_{2},\dotsc ,S_{n}}$,

{\displaystyle {\begin{aligned}\#(S_{1}\cup \dotsb \cup S_{n})&=\#(S_{1})+\dotsb +\#(S_{n})\\&\;-{\big (}\#(S_{1}\cap S_{2})+\#(S_{1}\cap S_{3})+\dotsb +\#(S_{n-1}\cap S_{n}){\big )}\\&\;+{\big (}\#(S_{1}\cap S_{2}\cap S_{3})+\#(S_{1}\cap S_{2}\cap S_{4})+\dotsb +\#(S_{n-2}\cap S_{n-1}\cap S_{n}){\big )}\\&\;-\dotsb \\&\;+(-1)^{n+1}\#(S_{1}\cap \dotsb \cap S_{n}).\end{aligned}}}

Proof. Idea:

• To find the cardinality of the union of ${\displaystyle n}$ sets:
1. Include the cardinalities of each of the ${\displaystyle n}$ sets.
2. Exclude the cardinalities of the pairwise intersections (if needed).
3. Include the cardinalities of the triple-wise intersections (if needed).
4. Exclude the cardinalities of the quadruple-wise intersections (if needed).
5. Include the cardinalities of the quintuple-wise intersections (if needed).
6. Continue, until the cardinality of the ${\displaystyle n}$-tuple-wise intersection is included (if ${\displaystyle n}$ is odd) or excluded (if ${\displaystyle n}$ is even).

${\displaystyle \Box }$

Remark.

• The formula can be written more compactly as
${\displaystyle \#(S_{1}\cup \dotsb \cup S_{n})=\sum _{j=1}^{n}(-1)^{j+1}\underbrace {\sum _{i_{1}<\dotsb <i_{j}}\#(S_{i_{1}}\cap \dotsb \cap S_{i_{j}})} _{{\binom {n}{j}}{\text{ terms}}}.}$
• The definition of ${\displaystyle {\binom {n}{j}}}$ will be discussed later in this chapter.
• The formula is usually used for the case ${\displaystyle n=2}$ and ${\displaystyle n=3}$.
• When ${\displaystyle n=2}$, the formula becomes ${\displaystyle \#(S_{1}\cup S_{2})=\#(S_{1})+\#(S_{2})-\#(S_{1}\cap S_{2})}$.
• When ${\displaystyle n=3}$, the formula becomes ${\displaystyle \#(S_{1}\cup S_{2}\cup S_{3})=\#(S_{1})+\#(S_{2})+\#(S_{3})-\#(S_{1}\cap S_{2})-\#(S_{1}\cap S_{3})-\#(S_{2}\cap S_{3})+\#(S_{1}\cap S_{2}\cap S_{3})}$.
• The name 'inclusion-exclusion principle' comes from the idea that the principle is based on over-generous inclusion, and then followed by compensating exclusion.
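For ${\displaystyle n=3}$, the formula is easy to sanity-check with Python sets (the three sets below are arbitrary, chosen for illustration):

```python
# Inclusion-exclusion for n = 3, checked against the direct union cardinality.
S1, S2, S3 = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}
lhs = len(S1 | S2 | S3)
rhs = (len(S1) + len(S2) + len(S3)
       - len(S1 & S2) - len(S1 & S3) - len(S2 & S3)
       + len(S1 & S2 & S3))
print(lhs, rhs)  # both equal 7
```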

Example. Among 140 people, 110 of them speak at least one of English, French and German. Given that

• 90, 30 and 42 of them speak English, French and German respectively;
• 23 speak English and French;
• 25 speak English and German;
• 16 speak French and German;

the no. of people who speak all of English, French and German is ${\displaystyle 110-90-30-42+23+25+16=12}$.

Proof. Let ${\displaystyle E}$, ${\displaystyle F}$, ${\displaystyle G}$ be the set containing people speaking English, French and German respectively. Then, by inclusion-exclusion principle,

${\displaystyle \#(E\cup F\cup G)=\#(E)+\#(F)+\#(G)-\#(E\cap F)-\#(E\cap G)-\#(F\cap G)+\#(E\cap F\cap G)\Rightarrow 110=90+30+42-23-25-16+\#(E\cap F\cap G),}$
and the result follows.

Venn diagram

*----------------*
|90-13-12-11=54  | <---- E
|      *---------*--------------*
|      |25-12=13 |42-13-12-4=13 | <--- G
*------*---------*--------------*-----*
|      |   12    |16-12=4       |     |
|      *---------*--------------*     | <--- F
|  23-12=11      |  30-11-12-4=3      |
*----------------*--------------------*
140-110=30


${\displaystyle \Box }$

Exercise.

1 Calculate the no. of people that speak (a) English only; (b) French only; (c) German only.

 (a) 90; (b) 30; (c) 42 (a) 54; (b) 13; (c) 3 (a) 54; (b) 11; (c) 3 (a) 54; (b) 3; (c) 11 None of the above.

2 Suppose ${\displaystyle k}$ people among the 140 people now learn to speak English. Calculate ${\displaystyle k}$ such that 123 of them speak at least one of English, French and German, and 20 people speak English, French and German now.

 20 21 22 23 None of the above.

3 Continue from previous question. Calculate the no. of people who speak English and French now.

 23 31 36 44 None of the above.

Theorem. (Multiplication counting principle) If trials ${\displaystyle 1,\dotsc ,k}$ have ${\displaystyle n_{1},\dotsc ,n_{k}}$ possible outcomes respectively, then the ${\displaystyle k}$ trials together have ${\displaystyle n_{1}\times \dotsb \times n_{k}}$ possible outcomes.

Proof. First, consider the case for ${\displaystyle k=2}$: we can enumerate each possible outcome using an ordered pair, as follows:

${\displaystyle {\begin{array}{ccccc}{\text{trial 1}}\backslash {\text{trial 2}}&{\text{1st outcome}}&{\text{2nd outcome}}&\dotsb &n_{2}{\text{th outcome}}\\\hline {\text{1st outcome}}&(1,1)&(1,2)&\dotsb &(1,n_{2})\\{\text{2nd outcome}}&(2,1)&(2,2)&\dotsb &(2,n_{2})\\\vdots &\vdots &\vdots &\vdots &\vdots \\n_{1}{\text{th outcome}}&(n_{1},1)&(n_{1},2)&\dotsb &(n_{1},n_{2})\\\end{array}}}$
Then, we can count that there are ${\displaystyle n_{1}n_{2}}$ possible outcomes (by considering the rows (or columns) one by one).

After establishing the case for ${\displaystyle k=2}$, we can establish the case for positive integer ${\displaystyle k\geq 3}$ inductively, e.g.:

${\displaystyle {\begin{array}{ccccc}{\text{trial 1 }}\&{\text{ trial 2}}\backslash {\text{trial 3}}&{\text{1st outcome}}&{\text{2nd outcome}}&\dotsb &n_{3}{\text{th outcome}}\\\hline {\text{1st outcome}}&(1,1,1)&(1,1,2)&\dotsb &(1,1,n_{3})\\{\text{2nd outcome}}&(1,2,1)&(1,2,2)&\dotsb &(1,2,n_{3})\\\vdots &\vdots &\vdots &\vdots &\vdots \\n_{1}n_{2}{\text{th outcome}}&(n_{1},n_{2},1)&(n_{1},n_{2},2)&\dotsb &(n_{1},n_{2},n_{3})\\\end{array}}}$
then we can count that there are ${\displaystyle n_{1}n_{2}n_{3}}$ outcomes (by considering the rows (or columns) one by one), and we can prove the remaining cases inductively.

${\displaystyle \Box }$

Remark.

• It is also known as rule of product.

Example.

Figure 3. In set theory, this is called the Cartesian product of two sets, with cardinality ${\displaystyle {\mathcal {N}}=3\times 2}$.

The tree diagram of Figure 3 illustrates this for ${\displaystyle k=2}$, ${\displaystyle n_{1}=3}$, and ${\displaystyle n_{2}=2}$. The number of possible outcomes is ${\displaystyle {\mathcal {N}}=n_{1}\times n_{2}=6.}$

Exercise.

1 Determine the number of possible outcomes if ${\displaystyle n_{1}=2}$ and ${\displaystyle n_{2}=3}$ instead.

 1 2 3 6 12

2 Suppose ${\displaystyle k=3}$ now. Given that the number of possible outcomes is still 6 without changing other conditions given in the example, calculate ${\displaystyle n_{3}}$.

 1 2 3 6 12

Remark.

• This might be visualized by imagining a roll of a three-sided die (with three outcomes, e.g. 1, 2, 3), followed by a flip of a two-sided coin (with two outcomes, e.g. A, B).
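A die-then-coin experiment like this can be enumerated explicitly with `itertools.product`:

```python
from itertools import product

# Enumerate outcomes of a three-sided die followed by a two-sided coin flip.
die = [1, 2, 3]
coin = ["A", "B"]
outcomes = list(product(die, coin))
print(outcomes)       # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), (3, 'A'), (3, 'B')]
print(len(outcomes))  # 6 = 3 * 2
```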

Counting the number of elements in a power set

Example. (Number of elements in a power set) The number of elements in a power set of set ${\displaystyle S}$ with ${\displaystyle n}$ elements is ${\displaystyle 2^{n}}$.

Proof. Consider the ${\displaystyle n}$ elements in ${\displaystyle S}$ one by one. For each of them, we can either include it in a subset of ${\displaystyle S}$ or not. Thus, there are ${\displaystyle n}$ steps involved in constructing a subset of ${\displaystyle S}$, and each step has two outcomes. It follows from the multiplication counting principle that the ${\displaystyle n}$ steps have ${\displaystyle \underbrace {2\times 2\times \dotsb \times 2} _{n{\text{ times}}}=2^{n}}$ outcomes. That is, there are ${\displaystyle 2^{n}}$ possible (distinct) subsets of ${\displaystyle S}$. Since the power set contains all subsets of ${\displaystyle S}$ by definition, it follows that the power set has ${\displaystyle 2^{n}}$ elements.

${\displaystyle \Box }$

Exercise.

Determine the number of elements in ${\displaystyle {\mathcal {P}}(\varnothing )}$ (i.e. power set of empty set).

 0 1 2 4 It is undefined.

Remark.

• Here, ${\displaystyle n}$ is an arbitrary nonnegative integer.

The counting principle misused

Figure 4. The counting principle is not useful when the number of choices at each step is not unique.
Figure 5. The counting principle only applies if the outcomes are defined such that ${\displaystyle HT\neq TH}$ (i.e. order matters).

Figures 4 and 5 illustrate the fact that the counting principle is not always applicable. Figure 4 counts the ways three integers can sum to five, if the integers are restricted to the set ${\displaystyle \{1,2,3\}}$. Since these three integers are choices (decisions), it is convenient to label the choice indices with capital letters: ${\displaystyle T_{A},T_{B},T_{C}.}$

E.g., ${\displaystyle T_{B}=3}$ means the second choice is the integer 3.

We cannot apply the counting principle to Figure 4 because ${\displaystyle n_{B}}$ depends on ${\displaystyle T_{A}.}$ In our case, ${\displaystyle T_{A}=1\Rightarrow n_{B}=3,}$ ${\displaystyle T_{A}=2\Rightarrow n_{B}=2,}$ and ${\displaystyle T_{A}=3\Rightarrow n_{B}=1.}$ This leads us to an important caveat about using the counting principle:

• the counting principle cannot be used if the number of outcomes at each step cannot be uniquely defined

Figure 5 examines two flips of a coin. It calculates the correct number of outcomes to be ${\displaystyle {\mathcal {N}}=2\times 2=4,}$ but only if we carefully define the outcome. The counting principle is valid only if heads followed by tails (HT) is a different outcome than tails followed by heads (TH). In other words:

• when counting outcomes it is important to understand the role that order (enumeration) plays in defining outcomes

But, if we instead are counting the outcomes in a fashion such that HT and TH are considered to be the same, then a formula such as ${\displaystyle {\mathcal {N}}=n_{A}\times n_{B}}$ cannot be used:

• the counting principle does not hold if two different decision paths lead to the same final outcome (in the theorem, we say 'trial ${\displaystyle 1,\dotsc ,k}$', which implicitly assumes that the order matters in the outcomes for the ${\displaystyle k}$ trials)

Example. Suppose we throw two six-faced dice, one red and one blue. The number of possible distinct pairs of numbers facing up is ${\displaystyle 6\cdot 6=36}$.

Proof. Since the dice are distinguishable, we can use the multiplication principle of counting. To be more precise, we can let the possible numbers facing up on the red die be the possible outcomes in 'trial 1', and those of the blue die be the possible outcomes in 'trial 2'. Since each trial has six outcomes, it follows that the number of outcomes (i.e. possible distinct pairs) is ${\displaystyle 6\cdot 6=36}$.

${\displaystyle \Box }$

Exercise.

Suppose the red die is repainted blue, so that the two dice are no longer distinguishable. Calculate the number of possible distinct pairs of numbers facing up.

 6 15 18 21 36

Number of ways to select some objects from distinguishable objects

In this section, we will discuss number (no.) of ways to select some objects from distinguishable objects, in four types, classified by whether the selection is ordered, and whether the selection is with replacement.

Before discussing these four types of selection, we will introduce some preliminary mathematical concepts used in the following.

Preliminary mathematical concepts

Definition. (Factorial) For each nonnegative integer ${\displaystyle n}$, the factorial of ${\displaystyle n}$, denoted by ${\displaystyle n!}$, is

${\displaystyle n!={\begin{cases}1,&n=0;\\n(n-1)\dotsb (1),&n\geq 1.\end{cases}}}$

More generally, we have gamma function.

Definition. (Gamma function) The gamma function is

${\displaystyle \Gamma (x)=\int _{0}^{\infty }y^{x-1}e^{-y}\,dy,}$
in which ${\displaystyle x>0}$.

Proposition. (Relationship between gamma function and factorial) For each nonnegative integer ${\displaystyle n}$, ${\displaystyle \Gamma (n+1)=n!}$.

Proof. Using integration by parts,

{\displaystyle {\begin{aligned}\Gamma (x+1)&=\int _{0}^{\infty }y^{x}e^{-y}\,dy\\&=\int _{0}^{\infty }y^{x}\,d(-e^{-y})\\&=\left[-y^{x}e^{-y}\right]_{y=0}^{y=\infty }+\int _{0}^{\infty }e^{-y}\,d(y^{x})\\&=\left[-y^{x}e^{-y}\right]_{y=0}^{y=\infty }+x\underbrace {\int _{0}^{\infty }y^{x-1}e^{-y}\,dy} _{\Gamma (x)}\\&=0-0+x\Gamma (x)\qquad {\text{since }}e^{-y}\to 0{\text{ as }}y\to \infty \\&=x\Gamma (x).\end{aligned}}}
Since
${\displaystyle \Gamma (1)=\int _{0}^{\infty }y^{1-1}e^{-y}\,dy=\left[-e^{-y}\right]_{0}^{\infty }=0-(-1)=1,}$
${\displaystyle \Gamma (n+1)=n\Gamma (n)=\dotsb =n(n-1)\cdots (1)\Gamma (1)=n(n-1)\cdots (1)=n!}$
for each nonnegative integer ${\displaystyle n}$.

${\displaystyle \Box }$

Remark.

• The infinity in the proof can be regarded as extended real number, or be in limit sense.
• Another more general result shown in the proof is that ${\displaystyle \Gamma (x+1)=x\Gamma (x)}$ for each positive ${\displaystyle x}$.
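The relationship ${\displaystyle \Gamma (n+1)=n!}$ can be checked numerically with Python's `math` module:

```python
import math

# Gamma(n + 1) agrees with n! for nonnegative integers n.
for n in range(10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
print(math.gamma(5))  # 24.0, i.e. 4!
```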

Definition. (Binomial coefficient) The binomial coefficient, indexed by nonnegative integers ${\displaystyle n}$ and ${\displaystyle r}$ such that ${\displaystyle n\geq r}$, denoted by ${\displaystyle {\binom {n}{r}}}$, is

${\displaystyle {\binom {n}{r}}={\frac {n!}{r!(n-r)!}}.}$

Theorem. (Binomial series theorem) For each real number ${\displaystyle \alpha }$,

${\displaystyle (1+x)^{\alpha }=1+\alpha x+{\frac {\alpha (\alpha -1)}{2!}}x^{2}+{\frac {\alpha (\alpha -1)(\alpha -2)}{3!}}x^{3}+\dotsb ,}$
in which ${\displaystyle |x|<1}$.

Remark. The following are some special cases of this theorem:

• ${\displaystyle (1+x)^{-1}=1-x+x^{2}-x^{3}+\dotsb }$;
• ${\displaystyle (1-x)^{-1}=1+x+x^{2}+x^{3}+\dotsb }$;
• ${\displaystyle (1-x)^{-n}=\sum _{r=0}^{\infty }{\binom {n+r-1}{r}}x^{r}}$ (negative binomial series);
• ${\displaystyle (1+x)^{n}=1+{\binom {n}{1}}x+{\binom {n}{2}}x^{2}+\dotsb +{\binom {n}{n}}x^{n}+\underbrace {{\frac {n(n-1)\dotsb (\overbrace {\color {darkgreen}n-n)} ^{=0}}{n!}}+\dotsb } _{=0}=\sum _{r=0}^{n}{\binom {n}{r}}x^{r}}$ (binomial series).

Theorem. (Binomial theorem) For each nonnegative integer ${\displaystyle n}$,

${\displaystyle (x+y)^{n}={\binom {n}{\color {darkgreen}0}}x^{n}y^{\color {darkgreen}0}+{\binom {n}{\color {darkgreen}1}}x^{n-{\color {darkgreen}1}}y^{\color {darkgreen}1}+\dotsb +{\binom {n}{\color {darkgreen}n}}x^{n-{\color {darkgreen}n}}y^{\color {darkgreen}n}.}$

Proof. It can be proved combinatorially or inductively. Complete proof is omitted.

${\displaystyle \Box }$

The binomial theorem can be illustrated by Pascal's triangle:

${\displaystyle {\begin{array}{lc}(a+b)^{0}=&{\color {Red}{\boldsymbol {1}}}\\(a+b)^{1}=&{\color {Red}{\boldsymbol {1}}}a+{\color {Red}{\boldsymbol {1}}}b\\(a+b)^{2}=&{\color {Red}{\boldsymbol {1}}}a^{2}+{\color {Red}{\boldsymbol {2}}}ab+{\color {Red}{\boldsymbol {1}}}b^{2}\\(a+b)^{3}=&{\color {Red}{\boldsymbol {1}}}a^{3}+{\color {Red}{\boldsymbol {3}}}a^{2}b+{\color {Red}{\boldsymbol {3}}}ab^{2}+{\color {Red}{\boldsymbol {1}}}b^{3}\\(a+b)^{4}=&{\color {Red}{\boldsymbol {1}}}a^{4}+{\color {Red}{\boldsymbol {4}}}a^{3}b+{\color {Red}{\boldsymbol {6}}}a^{2}b^{2}+{\color {Red}{\boldsymbol {4}}}ab^{3}+{\color {Red}{\boldsymbol {1}}}b^{4}\\\end{array}}}$ ${\displaystyle {\begin{array}{c}{\color {Red}{\boldsymbol {1}}}={\binom {0}{0}}\\{\color {Red}{\boldsymbol {1}}}={\binom {1}{0}}\quad \quad {\color {Red}{\boldsymbol {1}}}={\binom {1}{1}}\\{\color {Red}{\boldsymbol {1}}}={\binom {2}{0}}\quad \quad {\color {Red}{\boldsymbol {2}}}={\binom {2}{1}}\quad \quad {\color {Red}{\boldsymbol {1}}}={\binom {2}{2}}\\{\color {Red}{\boldsymbol {1}}}={\binom {3}{0}}\quad \quad {\color {Red}{\boldsymbol {3}}}={\binom {3}{1}}\quad \quad {\color {Red}{\boldsymbol {3}}}={\binom {3}{2}}\quad \quad \quad \quad {\color {Red}{\boldsymbol {1}}}={\binom {3}{3}}\\{\color {Red}{\boldsymbol {1}}}={\binom {4}{0}}\quad \quad {\color {Red}{\boldsymbol {4}}}={\binom {4}{1}}\quad \quad {\color {Red}{\boldsymbol {6}}}={\binom {4}{2}}\quad \quad {\color {Red}{\boldsymbol {4}}}={\binom {4}{3}}\quad \quad {\color {Red}{\boldsymbol {1}}}={\binom {4}{4}}\\\end{array}}}$
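Each row of Pascal's triangle is available in Python via `math.comb`, and the binomial theorem can be spot-checked by direct evaluation (the helper `binomial_expand` is our own, written for illustration):

```python
from math import comb

# Row 4 of Pascal's triangle.
print([comb(4, r) for r in range(5)])  # [1, 4, 6, 4, 1]

# Spot-check the binomial theorem: sum of C(n,r) x^(n-r) y^r equals (x + y)^n.
def binomial_expand(x, y, n):
    return sum(comb(n, r) * x**(n - r) * y**r for r in range(n + 1))

assert all(binomial_expand(2, 3, n) == 5**n for n in range(8))
print("binomial theorem verified for small n")
```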

Ordered selection without replacement

Theorem. The no. of ways for ordered selection of ${\displaystyle r\leq n}$ objects from ${\displaystyle n}$ distinguishable objects without replacement is ${\displaystyle n!/(n-r)!}$.

Proof. Consider an equivalent situation: selecting ${\displaystyle r\leq n}$ objects from ${\displaystyle n}$ distinguishable objects to be put into ${\displaystyle r}$ ordered boxes, labelled box ${\displaystyle 1,2,\dotsc ,r}$, in which each box contains at most one object. By considering the ${\displaystyle r}$ boxes from box 1 to box ${\displaystyle r}$,

• for box 1, there are ${\displaystyle n}$ choices of object to be put into it
• for box 2, there are ${\displaystyle n-1}$ choices of object to be put into it, since the object put into box 1 cannot be simultaneously put into box 2
• ...
• for box ${\displaystyle r}$, there are ${\displaystyle n-(r-1)=n-r+1}$ choices of object to be put into it, since each of the ${\displaystyle r-1}$ objects put into box ${\displaystyle 1,2,\dotsc ,r-1}$ cannot be simultaneously put into box ${\displaystyle r}$

Thus, by the multiplication principle of counting, the desired no. of ways is

${\displaystyle \underbrace {n} _{\text{box 1}}\times \underbrace {(n-1)} _{\text{box 2}}\times \dotsb \times \underbrace {(n-r+1)} _{{\text{box }}r}={\frac {n\times (n-1)\times \dotsb \times (n-r+1)\times {\color {darkgreen}(n-r)\times \cdots \times 1}}{\color {darkgreen}(n-r)\times \cdots \times 1}}={\frac {n!}{(n-r)!}}}$

${\displaystyle \Box }$

Remark.

• ${\displaystyle n!/(n-r)!}$ is often denoted by ${\displaystyle _{n}P_{r}}$ (read 'n p r').

Example. The no. of distinct ways to select 3 objects to be put into 3 boxes, labelled ${\displaystyle B_{1},B_{2}}$ and ${\displaystyle B_{3}}$ from 5 objects, labelled ${\displaystyle O_{1},O_{2},O_{3},O_{4}}$ and ${\displaystyle O_{5}}$ is ${\displaystyle {\frac {5!}{(5-3)!}}=\underbrace {5} _{B_{1}}\times \underbrace {4} _{B_{2}}\times \underbrace {3} _{B_{3}}=60.}$
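In Python, this count is available directly as `math.perm`; a small sketch mirroring the example above:

```python
from math import factorial, perm

# nPr = n!/(n-r)!; math.perm(n, r) computes it directly (Python 3.8+)
assert perm(5, 3) == factorial(5) // factorial(5 - 3) == 60
assert perm(10, 0) == 1                   # selecting nothing: exactly one way
assert perm(4, 4) == factorial(4) == 24   # r = n gives a permutation
```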

Exercise.

1 After putting the 3 objects into the 3 boxes, 2 of them are taken out and put into 2 boxes, labelled ${\displaystyle B_{4}}$ and ${\displaystyle B_{5}}$. Calculate the no. of distinct ways to do this.

 3 6 60 180 360

2 Continue from previous question. Suppose only 1 of them is taken out and put into 1 box, labelled ${\displaystyle B_{4}}$, now. Calculate the no. of distinct ways to do this.

 3 6 60 180 360

Example. (Competition) There are ${\displaystyle 16}$ candidates for a competition. The no. of ways to award winner, 1st and 2nd runners-up is ${\displaystyle {\frac {16!}{(16-3)!}}=\underbrace {16} _{\text{winner}}\times \underbrace {15} _{\text{1st runner-up}}\times \underbrace {14} _{\text{2nd runner-up}}=3360.}$

If Amy and Bob are among the ${\displaystyle 16}$ candidates, and it is given that Amy is awarded 1st runner-up while Bob does not receive any award, the no. of ways to award winner, 1st and 2nd runners-up becomes ${\displaystyle {\frac {14!}{(14-2)!}}\times 1=\underbrace {14} _{\text{winner}}\times \underbrace {1} _{\text{1st runner-up: Amy}}\times \underbrace {13} _{\text{2nd runner-up}}=182}$. In particular, neither Amy nor Bob can be awarded winner or 2nd runner-up.

Exercise.

1 Suppose Chris is also among the ${\displaystyle 16}$ candidates. Given that Amy, Bob and Chris receive an award from the competition, calculate the no. of ways to award winner, 1st and 2nd-runners up.

 1 3 6 32 96

2 Continue from previous question. Given that Amy, Bob and Chris do not receive any award from the competition, calculate the no. of ways to award winner, 1st and 2nd-runners up.

 1716 2496 3354 3357 3359

A special case of ordered selection without replacement is when the no. of selected objects equals the total no. of objects, i.e. ${\displaystyle r=n}$. In this case, the selection is called permutation, and the no. of ways for permutation of ${\displaystyle n}$ objects (i.e. ordered selection of ${\displaystyle n}$ objects from ${\displaystyle n}$ objects) is ${\displaystyle n!/(n-n)!=n!}$.

Example.

Figure 6. 3!=3·2·1= 6 permutations of {1,2,3}.

The 6 ways to permute the string 123 are shown in Figure 6.
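The six permutations in Figure 6 can be enumerated with `itertools.permutations` (a minimal sketch):

```python
from itertools import permutations

# Enumerate all 3! = 6 permutations of the string "123"
perms = [''.join(p) for p in permutations('123')]
print(perms)   # ['123', '132', '213', '231', '312', '321']
assert len(perms) == 6
```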

Unordered selection of distinguishable objects without replacement

Theorem. The no. of ways for unordered selection of ${\displaystyle r\leq n}$ objects from ${\displaystyle n}$ distinguishable objects without replacement is ${\displaystyle {\frac {n!}{r!(n-r)!}}={\binom {n}{r}}}$.

Proof. There are two ways to prove this.

First, consider an equivalent situation: selecting ${\displaystyle r\leq n}$ objects from ${\displaystyle n}$ distinguishable objects without replacement to be put into one box [8]. We first count the no. of ways to do this in order, and then merge the ways that are regarded as the same for unordered selection (i.e. regarded as the same when we put the objects into one box). The no. of ways to do this in order is ${\displaystyle \underbrace {n} _{\text{choice 1}}\times \underbrace {n-1} _{\text{choice 2}}\times \dotsb \times \underbrace {n-r+1} _{{\text{choice }}r}={\frac {n!}{(n-r)!}}}$ (choice ${\displaystyle k}$ means the ${\displaystyle k}$th selection of an object to be put into the box).

Among these ways, putting the same ${\displaystyle r}$ objects into the box in different orders counts as different ways, and we need to merge them into one way. To merge them, we consider how many ways are counted for putting the same ${\displaystyle r}$ objects into the box in different orders. This is a permutation (ordered selection of ${\displaystyle r}$ objects from ${\displaystyle r}$ distinguishable objects), so there are ${\displaystyle r!}$ such ways. That is, each unordered selection is counted ${\displaystyle r!}$ times (the count is scaled up by a factor of ${\displaystyle r!}$), and thus we need to scale the count down by dividing it by ${\displaystyle r!}$. Thus, the desired no. of ways is ${\displaystyle {\frac {n!}{r!(n-r)!}}={\binom {n}{r}}}$.

Second, we use the notion of generating function, by encoding the selection process into a binomial series, and then use the coefficients to determine the desired no. of ways. To be more precise, recall a special case of binomial series theorem:

${\displaystyle \underbrace {(1+x)(1+x)\dotsb (1+x)} _{n{\text{ times}}}=(1+x)^{n}=1+\cdots +{\binom {n}{r}}x^{r}+\cdots +{\binom {n}{n}}x^{n}.}$
By encoding each selection to each of ${\displaystyle (1+x)}$, through treating the ${\displaystyle x^{0}=1}$ and ${\displaystyle x^{1}=x}$ in the ${\displaystyle k}$th ${\displaystyle (1+x)}$ as not selecting the object and selecting the object respectively, the coefficient of ${\displaystyle x^{r}}$ is the desired no. of ways, since it is the no. of ways to build ${\displaystyle x^{r}}$, through selecting ${\displaystyle x}$ in ${\displaystyle r}$ ${\displaystyle (1+x)}$'s, and selecting 1 in other ${\displaystyle (1+x)}$'s (i.e. selecting ${\displaystyle r}$ objects, regardless of the order). Thus, the desired no. of ways is ${\displaystyle {\binom {n}{r}}}$.

${\displaystyle \Box }$

Remark.

• The unordered selection without replacement is also known as combination.
• ${\displaystyle {\binom {n}{r}}}$ is read as 'n choose r', or 'n c r'.

Example.

Figure 8: ${\displaystyle C(n,k)}$ is shown for ${\displaystyle n=3,}$ and ${\displaystyle k=0,1,2,3.}$ The dotted ellipses remind us that these are sets, where order is not important.

For combination, the order in which the items are selected is not important, so each selection from a set can be regarded as a subset of the original set. Figure 8 illustrates this for the set ${\displaystyle \{1,2,3\}.}$ The number of elements in this set is ${\displaystyle n=3.}$ From our earlier discussion of the power set, we know that the total number of subsets is ${\displaystyle 2^{3}=8}$. All 8 subsets are shown in the figure, organized by how many items are in each subset (for example, the subset in the upper-left corner contains 3 elements, while all subsets with 2 elements occupy the lower-right corner). Let ${\displaystyle k}$ denote the number of elements "chosen" to be in each of the 8 subsets of the set ${\displaystyle S=\{1,2,3\}}$ (where the number of elements in ${\displaystyle S}$ is ${\displaystyle n=3}$).

• ${\displaystyle {\binom {3}{0}}=1}$ set has ${\displaystyle k=0}$ elements. It is the empty set: ${\displaystyle \varnothing }$.
• ${\displaystyle {\binom {3}{1}}=3}$ sets have ${\displaystyle k=1}$ element. They are ${\displaystyle \{1\}}$,${\displaystyle \{2\}}$,and ${\displaystyle \{3\}}$.
• ${\displaystyle {\binom {3}{2}}=3}$ sets have ${\displaystyle k=2}$ elements. They are ${\displaystyle \{2,3\}}$,${\displaystyle \{1,3\}}$,and ${\displaystyle \{1,2\}}$.
• ${\displaystyle {\binom {3}{3}}=1}$ set has ${\displaystyle k=3}$ elements. It is the set itself: ${\displaystyle \{1,2,3\}}$.
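The grouping of the 8 subsets by size can be reproduced with `itertools.combinations` (a sketch using the set {1, 2, 3} from the example):

```python
from itertools import combinations
from math import comb

S = [1, 2, 3]
for k in range(len(S) + 1):
    # all k-element subsets of S, in the order combinations() yields them
    subsets = list(combinations(S, k))
    assert len(subsets) == comb(3, k)
    print(k, subsets)

# the subset counts add up to the size of the power set, 2^3 = 8
assert sum(comb(3, k) for k in range(4)) == 2**3
```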

Example.

Figure 9: combinations. ${\displaystyle {\textit {4-choose-2}}}$ ${\displaystyle =6}$

The no. of ways to select 2 objects from 4 distinguishable objects without considering the order is ${\displaystyle {\binom {4}{2}}=6}$.

Example. (Competition) There are 16 candidates for a competition. The no. of ways to select 3 candidates to enter final is ${\displaystyle {\binom {16}{3}}={\frac {16\times 15\times 14}{3\times 2\times 1}}=560.}$

Exercise.

1 Amy, Bob and Chris are among the ${\displaystyle 16}$ candidates. Calculate the no. of ways to select them to enter final.

 1 3 6 32 96

2 Continue from the previous question. Calculate the no. of ways to select candidates other than Amy, Bob and Chris to enter final.

 220 286 554 557 559

Special cases worth remembering

The formula for counting combinations has special cases that are worth remembering:

• ${\displaystyle {\binom {n}{0}}={\binom {n}{n}}=1}$ (There is only one way to pick no thing and only one way to pick all ${\displaystyle n}$ things.)
• ${\displaystyle {\binom {n}{1}}={\binom {n}{n-1}}=n}$ (there are n ways to pick one thing or to leave one thing out)
• ${\displaystyle {\binom {n}{k}}={\binom {n}{n-k}}}$ (There are the same number of ways of picking ${\displaystyle k}$ of ${\displaystyle n}$ things as there are of leaving out ${\displaystyle k}$ of ${\displaystyle n}$ things)
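These identities are easy to spot-check numerically (a sketch over small n):

```python
from math import comb

for n in range(1, 10):
    assert comb(n, 0) == comb(n, n) == 1          # pick none / pick all
    assert comb(n, 1) == comb(n, n - 1) == n      # pick one / leave one out
    for k in range(n + 1):
        assert comb(n, k) == comb(n, n - k)       # symmetry
print("all identities hold for n = 1..9")
```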

Ordered selection of distinguishable objects with replacement

Theorem. The no. of ways for selecting ${\displaystyle r}$ objects from ${\displaystyle n}$ distinguishable objects in order, with replacement is ${\displaystyle n^{r}}$.

Proof. Consider the equivalent situation: selecting ${\displaystyle r}$ objects from ${\displaystyle n}$ types of objects, in which each type of the objects has unlimited stock, to be put into ${\displaystyle r}$ ordered boxes (the same object may be selected more than once). Then, the no. of ways is ${\displaystyle \underbrace {n} _{\text{box 1}}\times \underbrace {n} _{\text{box 2}}\times \dotsb \times \underbrace {n} _{{\text{box }}r}=n^{r}}$, since for each box, there are ${\displaystyle n}$ types of objects that can be selected to be put into it.

${\displaystyle \Box }$

Remark.

• ${\displaystyle r}$ can be greater than ${\displaystyle n}$.

Example. (Setting password) The number of ways to set a password with 6 characters, with the following rules:

(R1) numbers are allowed
(R2) alphabets are allowed, and they are case-sensitive [9]
(R3) special characters (i.e. all characters other than numbers and alphabets) are not allowed

is ${\displaystyle \underbrace {62} _{\text{1st position}}\times \underbrace {62} _{\text{2nd position}}\times \dotsb \times \underbrace {62} _{\text{6th position}}=62^{6}=56800235584.}$

Proof. For each of the 6 positions available for the password, there are ${\displaystyle \underbrace {2} _{\text{cases}}\times \underbrace {26} _{\text{alphabets}}+\underbrace {10} _{\text{numbers}}=62}$ choices of characters. Also, the characters can be repeated in more than one position, and order matters. So, this is a case of ordered selection of distinguishable objects with replacement. Thus, the desired number is ${\displaystyle \underbrace {62} _{\text{1st position}}\times \underbrace {62} _{\text{2nd position}}\times \dotsb \times \underbrace {62} _{\text{6th position}}=62^{6}=56800235584.}$

${\displaystyle \Box }$
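The count of 62 usable characters and the final figure can be confirmed with the standard library (a sketch; `string.ascii_letters` covers both cases of the alphabets):

```python
import string

# 26 lowercase + 26 uppercase alphabets + 10 numbers = 62 characters
alphabet = string.ascii_letters + string.digits
assert len(alphabet) == 2 * 26 + 10 == 62

# ordered selection with replacement over 6 positions
assert len(alphabet) ** 6 == 56800235584
```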

Exercise.

1 Suppose a machine can make ${\displaystyle 3.5\times 10^{11}}$ password guesses per second. Approximate the maximum time needed for the machine to guess the six-character password correctly using the formula

${\displaystyle {\text{the maximum time needed}}\approx {\frac {\text{number of ways to set the password}}{\text{guessing speed}}}.}$
(correct to two decimal places)

 0.00 seconds 0.15 seconds 0.16 seconds 6.16 seconds 369.72 seconds

2 Suppose the password is safe if the maximum time needed for the machine to guess the password correctly, using the same formula in the previous question, is greater than or equal to 100 years (i.e. ${\displaystyle 100\times 365.25\times 24\times 3600=3155760000}$ seconds). The minimum number of characters needed for the password (with the same rules) to be safe is

 not greater than 10. greater than 10 but not greater than 15. greater than 10 but not greater than 20. greater than 20 but not greater than 25. greater than 25.

Unordered selection of distinguishable objects with replacement

This type of selection is probably the most complicated.

Theorem. The number of ways for unordered selection of ${\displaystyle r}$ objects from ${\displaystyle n}$ distinguishable objects with replacement is ${\displaystyle {\binom {n+r-1}{r}}={\frac {(n+r-1)!}{r!(n-1)!}}}$.

Proof. There are two ways to prove this.

First, consider an equivalent situation: selecting ${\displaystyle r}$ objects from ${\displaystyle n}$ types of objects, in which each type of the objects has unlimited stock, to be put into one box (the same object may be selected more than once). Then, we use the stars and bars notation: e.g.

*|**|*|...|*||**|*


in which the ${\displaystyle k}$th gap created by the bars corresponds to the ${\displaystyle k}$th type of object (the leftmost gap, bounded by one bar, is the 1st gap, and the rightmost gap, bounded by one bar, is the last gap), and the number of *'s in each gap represents the number of objects selected of the corresponding type. E.g. 2 *'s in the 2nd gap represent that 2 objects are selected from the 2nd type of objects. Then, the desired no. of ways is the no. of arrangements of ${\displaystyle r}$ *'s and ${\displaystyle n-1}$ bars [10], which is the no. of ways to select ${\displaystyle r}$ of the ${\displaystyle n-1+r}$ positions for the *'s [11] (order does not matter), namely ${\displaystyle {\binom {n+r-1}{r}}}$.

Second, we use the notion of generating function, by encoding the selection as follows:

• encoding the selection of each type of objects to ${\displaystyle (1+x+x^{2}+\dotsb )}$, by treating ${\displaystyle x^{0}=1}$, ${\displaystyle x^{1}=x}$, ${\displaystyle x^{2}}$, etc. (up to ${\displaystyle x^{r}}$) in the ${\displaystyle k}$th ${\displaystyle (1+x+x^{2}+\cdots )}$ as selecting 0, 1, 2, etc. (up to ${\displaystyle r}$) objects from the ${\displaystyle k}$th type respectively

Then, the desired no. is the coefficient of ${\displaystyle x^{r}}$ in

${\displaystyle \underbrace {(1+x+x^{2}+\dotsb )(1+x+x^{2}+\cdots )\cdots (1+x+x^{2}+\cdots )} _{n{\text{ times}}}=\underbrace {(1-x)^{-1}(1-x)^{-1}\dotsb (1-x)^{-1}} _{n{\text{ times}}}=(1-x)^{-n}.}$
[12] By binomial series theorem, the coefficient of ${\displaystyle x^{r}}$ is
${\displaystyle {\frac {(-n)(-n-1)\dotsb (-n-r+1)}{r!}}(-1)^{r}={\frac {(-1)^{r}(n)(n+1)\dotsb (n+r-1)}{r!}}(-1)^{r}={\frac {(n+r-1)\dotsb (n+1)(n)}{r!}}={\binom {n+r-1}{r}}.}$

${\displaystyle \Box }$
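The coefficient extraction in the second proof can be imitated numerically by multiplying truncated polynomials (a sketch; the helper `poly_mul` and the sample values n = 5, r = 7 are our own choices):

```python
from math import comb

def poly_mul(p, q, max_deg):
    """Multiply two coefficient lists, discarding terms above max_deg."""
    out = [0] * (max_deg + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= max_deg:
                out[i + j] += a * b
    return out

n, r = 5, 7
factor = [1] * (r + 1)     # 1 + x + x^2 + ... + x^r (truncated series)
poly = [1]
for _ in range(n):         # (1 + x + ... + x^r)^n, truncated at x^r
    poly = poly_mul(poly, factor, r)

# the coefficient of x^r equals C(n+r-1, r)
assert poly[r] == comb(n + r - 1, r) == 330
```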

Remark.

• ${\displaystyle r}$ can be greater than ${\displaystyle n}$.
• ${\displaystyle {\binom {n+r-1}{r}}}$ is often denoted by ${\displaystyle _{n}H_{r}}$ (read 'n h r')

Example. There are 8 distinct food or drink items, namely hamburger, egg, fries, cake, apple pie, apple juice, orange juice and coke. The number of distinct 4-item combos that must consist of distinct items (unordered selection without replacement) is ${\displaystyle {\binom {8}{4}}=70}$, and that without restrictions (particularly, may consist of more than one same item) (unordered selection with replacement) is ${\displaystyle {\binom {8+4-1}{4}}=330}$.
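Both counts in this example can be verified by direct enumeration with `itertools` (a sketch):

```python
from itertools import combinations, combinations_with_replacement
from math import comb

items = ['hamburger', 'egg', 'fries', 'cake', 'apple pie',
         'apple juice', 'orange juice', 'coke']

# 4-item combos with all items distinct (unordered, without replacement)
assert len(list(combinations(items, 4))) == comb(8, 4) == 70

# 4-item combos allowing repeated items (unordered, with replacement)
assert len(list(combinations_with_replacement(items, 4))) == comb(8 + 4 - 1, 4) == 330
```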

Exercise.

1 Calculate the no. of distinct 4-item combos that must consist of 3 food items (which may be the same) and 1 drink item.

 12 35 38 105 330

2 Calculate the no. of distinct 4-item combos that contain no drinks.

 5 70 126 280 330

3 Suppose each food or drink item only has 2 left in the stock. Calculate the no. of distinct 4-item combos without restrictions. (Hint: ${\displaystyle (1+x+x^{2})^{4}=1+4x+10x^{2}+16x^{3}+19x^{4}+{\text{terms with order higher than }}x^{4}}$)

 19 45 183 266 330

4 Suppose each food costs $10, while each drink costs$5. Calculate the no. of distinct \$20 combos without restrictions. (Hint: calculator should be used to ease the calculation)

 15 30 60 121 225

5 Amy loves eating hamburger very much, so she must choose two hamburgers when she chooses the items for the 4-item combos. Calculate the no. of distinct ways for Amy to order a 4-item combo without restriction.

 6 15 28 36 210

Example. (Number of integer solutions of an equation) The number of solutions to

${\displaystyle x_{1}+x_{2}+\dotsb +x_{7}=10}$
in which ${\displaystyle x_{1},\dotsc ,x_{7}}$ are nonnegative integers, is ${\displaystyle {\binom {7+10-1}{10}}=8008}$.

Proof. Consider the following stars and bars graph:

|**|*|**|*|**|**


in which the no. of stars is 10, corresponding to the number at RHS of the equation, and no. of gaps created by the bars is 7, corresponding to the number of unknowns at LHS of the equation. The no. of stars in each gap represents the (nonnegative) number assigned to that unknown. So, the number of solutions is the no. of arrangements of these stars and bars, namely ${\displaystyle {\binom {7+10-1}{10}}=8008.}$

Alternatively, we can interpret this as selecting 10 balls (the no. at RHS) from 7 types of balls (the no. of unknowns at LHS), labelled ${\displaystyle x_{1},\dotsc ,x_{7}}$, with unlimited stock, to be put into a box, in which the number of balls labelled ${\displaystyle x_{1},\dotsc ,x_{7}}$ in the box represents the number assigned to the unknowns ${\displaystyle x_{1},\dotsc ,x_{7}}$ respectively. Then, the no. of solutions is the no. of ways to do this, namely ${\displaystyle {\binom {7+10-1}{10}}}$.

${\displaystyle \Box }$
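The stars-and-bars count can be cross-checked by brute force on a smaller instance (the case of 3 unknowns summing to 5 is our own choice, small enough to enumerate):

```python
from itertools import product
from math import comb

# nonnegative solutions of x1 + x2 + x3 = 5: C(3+5-1, 5) = C(7, 5) = 21
count = sum(1 for xs in product(range(6), repeat=3) if sum(xs) == 5)
assert count == comb(3 + 5 - 1, 5) == 21

# the example in the text: 7 unknowns summing to 10
assert comb(7 + 10 - 1, 10) == 8008
```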

Exercise.

1 Calculate the number of solutions if ${\displaystyle x_{1},\dotsc ,x_{7}}$ are positive integers instead. (Hint: letting ${\displaystyle x_{1}'=x_{1}-1}$, then ${\displaystyle x_{1}'}$ is a nonnegative integer)

 28 56 84 560 5005

2 Calculate the number of solutions if the '${\displaystyle =}$' sign is changed to '${\displaystyle <}$' sign, i.e. the number of solutions to ${\displaystyle x_{1}+\dotsb +x_{7}<10}$ in which ${\displaystyle x_{1},\dotsc ,x_{7}}$ are nonnegative integers. (Hint: add one more positive integer unknown to LHS, so that the '${\displaystyle <}$' sign becomes '${\displaystyle =}$' sign)

 3003 5005 6435 11440 19448

Summary

Selecting ${\displaystyle r}$[13] objects from ${\displaystyle n}$ distinguishable objects:

${\displaystyle {\begin{array}{c|c|c}&{\text{with replacement}}&{\text{without replacement}}\\\hline {\text{ordered}}&n^{r}&{\dfrac {n!}{(n-r)!}}\\\hline {\text{unordered}}&{\dbinom {n+r-1}{r}}&{\dbinom {n}{r}}\\\end{array}}}$

Exercise. Try to prove each of the above formulas without looking at the previous subsections. After that, you can compare your proofs against the proofs in the previous subsections.
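All four formulas in the summary can be checked against direct enumeration for small parameters (a sketch with n = 5, r = 3, our own choice):

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, factorial

n, r = 5, 3
objects = range(n)

# ordered, with replacement: n^r
assert len(list(product(objects, repeat=r))) == n**r
# ordered, without replacement: n!/(n-r)!
assert len(list(permutations(objects, r))) == factorial(n) // factorial(n - r)
# unordered, with replacement: C(n+r-1, r)
assert len(list(combinations_with_replacement(objects, r))) == comb(n + r - 1, r)
# unordered, without replacement: C(n, r)
assert len(list(combinations(objects, r))) == comb(n, r)
```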

Partitions

Theorem. The number of ways to partition ${\displaystyle n}$ distinguishable objects into ${\displaystyle k}$ groups with group ${\displaystyle 1,\dotsc ,k}$ containing exactly ${\displaystyle n_{1},\dotsc ,n_{k}}$ objects respectively (order does not matter) is ${\displaystyle {\frac {n!}{n_{1}!n_{2}!\dotsb n_{k}!}}}$.

Proof. There are two ways to prove this.

First, consider an equivalent situation: putting ${\displaystyle n_{1},\dotsc ,n_{k}}$ objects selected from ${\displaystyle n}$ distinguishable objects into box ${\displaystyle 1,\dotsc ,k}$ respectively.

Then, consider the boxes one by one:

• box 1: ${\displaystyle n_{1}}$ objects selected from ${\displaystyle n}$ distinguishable objects to be put into it, so no. of ways is ${\displaystyle {\binom {n}{n_{1}}}}$
• box 2: ${\displaystyle n_{2}}$ objects selected from ${\displaystyle n-n_{1}}$ distinguishable objects [14], so no. of ways is ${\displaystyle {\binom {n-n_{1}}{n_{2}}}}$
• ...
• box ${\displaystyle k}$: ${\displaystyle n_{k}}$ objects selected from ${\displaystyle n-n_{1}-\dotsb -n_{k-1}}$ [15] to be put into it, so no. of ways is ${\displaystyle {\binom {n-n_{1}-\cdots -n_{k-1}}{n_{k}}}}$

By the multiplication principle of counting, the no. of ways for the whole process is

{\displaystyle {\begin{aligned}{\binom {n}{n_{1}}}{\binom {n-n_{1}}{n_{2}}}\dotsb {\binom {n-n_{1}-\cdots -n_{k-1}}{n_{k}}}&={\frac {n!}{n_{1}!{\color {darkgreen}{\cancel {(n-n_{1})!}}}}}\cdot {\frac {\color {darkgreen}{\cancel {(n-n_{1})!}}}{n_{2}!{\color {blue}{\cancel {(n-n_{1}-n_{2})!}}}}}\cdot {\frac {\color {blue}{\cancel {(n-n_{1}-n_{2})!}}}{n_{3}!{\color {red}{\cancel {(n-n_{1}-n_{2}-n_{3})!}}}}}\dotsb {\frac {\color {purple}{\cancel {(n-n_{1}-\cdots -n_{k-1})!}}}{n_{k}!(n-\underbrace {(n_{1}+n_{2}+\cdots +n_{k})} _{=n})!}}\\&={\frac {n!}{n_{1}!n_{2}!n_{3}!\dotsb n_{k}!\underbrace {0!} _{=1}}}\\&={\frac {n!}{n_{1}!n_{2}!\dotsb n_{k}!}}.\end{aligned}}}

Second, we use the notion of generating function, by encoding the partition process as follows:

• in the ${\displaystyle m}$th ${\displaystyle (x_{1}+x_{2}+\dotsb +x_{k})}$, ${\displaystyle x_{1},\dotsc ,x_{k}}$ represents the ${\displaystyle m}$th object is put into box ${\displaystyle 1,\dotsc ,k}$ respectively

Then, the desired no. of ways is the coefficient of ${\displaystyle x_{1}^{n_{1}}\dotsb x_{k}^{n_{k}}}$ in ${\displaystyle \underbrace {(x_{1}+x_{2}+\cdots +x_{k})(x_{1}+x_{2}+\cdots +x_{k})\dotsb (x_{1}+x_{2}+\cdots +x_{k})} _{n{\text{ times}}}}$, which is ${\displaystyle {\frac {n!}{n_{1}!n_{2}!\dotsb n_{k}!}}}$, by the multinomial theorem (a generalized version of the binomial theorem) [16].

${\displaystyle \Box }$

Remark.

• partitioning objects into two groups is the same as unordered selection without replacement [17]
• ${\displaystyle {\frac {n!}{n_{1}!n_{2}!\dotsb n_{k}!}}}$ is called the multinomial coefficient, and is denoted by ${\displaystyle {\binom {n}{n_{1},n_{2},\dotsc ,n_{k}}}}$

Example. (Sequence of dice outcomes) A six-faced die is rolled nine times. The number of distinct sequences in which 1, 3 and 5 each comes up three times is ${\displaystyle {\frac {9!}{3!3!3!}}=1680}$.

Proof. Consider this situation as the partition of the nine (ordered) outcomes of the die into three groups, which represent the outcomes in which 1, 3 and 5 come up respectively. The three groups contain 3 outcomes each, so that each odd number comes up three times. It follows that the number of ways to partition the outcomes is ${\displaystyle {\frac {9!}{3!3!3!}}}$.

For each partition of outcomes into different groups, we obtain a unique sequence of outcomes. [18]

${\displaystyle \Box }$
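The count 1680 can also be obtained by brute force, collecting the distinct rearrangements of one such sequence (a sketch):

```python
from itertools import permutations
from math import factorial

# distinct length-9 sequences in which 1, 3 and 5 each appear three times
sequences = set(permutations('111333555'))
assert len(sequences) == factorial(9) // (factorial(3) ** 3) == 1680
```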

Exercise.

1 Calculate the number of distinct sequences in which 2,4 and 6 each comes up three times instead.

 840 1680 3360 6720 361200

2 Suppose we throw the die 12 times instead. Calculate the number of distinct sequences in which each number comes up two times.

 2520 5040 113400 369600 7484400

Example. (Arrangement of letters) The number of letter arrangements of the word PROBABILITY is ${\displaystyle {\binom {11}{2,2,1,1,1,1,1,1,1}}=9979200}$.

Proof. The word PROBABILITY has 2 letter B's and 2 letter I's; the other letters each appear only once. So, we partition the 11 letter positions in the word into 9 groups, representing the letters P, R, O, B, A, I, L, T and Y respectively, in which the groups representing letters B and I contain 2 letter positions each, and the other groups contain 1 letter position each.

${\displaystyle \Box }$
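The multinomial coefficient for a word can be computed from its letter counts (a sketch; the helper `arrangements` is our own):

```python
from collections import Counter
from math import factorial

def arrangements(word):
    """No. of distinct letter arrangements: n! / (n1! n2! ... nk!)."""
    n = factorial(len(word))
    for count in Counter(word).values():
        n //= factorial(count)
    return n

assert arrangements('PROBABILITY') == 9979200   # the example above
assert arrangements('AAB') == 3                 # AAB, ABA, BAA
```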

Exercise.

1 Calculate the number of letter arrangements of the word EXERCISE.

 28 56 3360 6720 20160

2 Calculate the number of arrangements of the digits of the number 171237615, such that the number formed is an odd number.

 6720 11760 13440 23520 161280

Example. (Walking path) Consider the following diagram.

${\displaystyle {\begin{array}{c|c|c|c|c}A&&&&B\\\hline &&E&&\\\hline C&&&&D\\\end{array}}}$
Suppose we are initially located at ${\displaystyle A}$, and that we can only walk either one cell rightward or one cell downward in each step. The number of distinct sequences of steps such that we can walk from ${\displaystyle A}$ to ${\displaystyle D}$ is ${\displaystyle \underbrace {6} _{\text{steps}}!/(\underbrace {2} _{\downarrow }!\underbrace {4} _{\rightarrow }!)=15.}$

Proof. First, observe that we need exactly 6 steps to walk from ${\displaystyle A}$ to ${\displaystyle D}$ [19], consisting of 4 steps of walking rightward (${\displaystyle \rightarrow }$