# Introduction

## Overview

Probability theory provides a mathematical model for the study of randomness and uncertainty. Many important decisions, whether from business, government, science, recreation or even one's personal life must be made with incomplete information or some degree of uncertainty. Hence, a formalized study of uncertain or random outcomes occupies an important role in modern society. In situations where one of any number of possible outcomes may occur, the mathematical model of probability theory offers methods for quantifying the likelihoods associated with those outcomes. Probability also provides tools which allow us to move beyond simply describing the information contained within a set of data (descriptive statistics) to actually inferring further information from that data (inferential statistics). Many of the early attempts to model likelihood arose from games of chance. For a brief history of probability see this Wikipedia article.

Although probability theory is now a very formal branch of mathematics, the language of probability is often used informally in everyday speech. We express our beliefs about likelihoods of outcomes in situations involving uncertainty using intuition guided by our experiences and in some cases statistics. Consider the following examples:

• Bill says "Don't buy the avocados here; about half the time, they're rotten". Bill is expressing his belief about the probability of an event — that an avocado will be rotten — based on his personal experience.
• Lisa says "I am 95% certain the capital of Spain is Barcelona". Here, the belief Lisa is expressing is only a probability from her point of view, because only she does not know that the capital of Spain is Madrid (from our point of view, the probability is 100%). However, we can still view this as a subjective probability because it expresses a measure of uncertainty. It is as though Lisa is saying "in 95% of cases where I feel as sure as I do about this, I turn out to be right".
• Susan says "There is a lower chance of being shot in Omaha than in Detroit". Susan is expressing a belief based (presumably) on statistics.
• Dr. Smith says to Christina, "There is a 75% chance that you will live." Dr. Smith is basing this off of his research.
• Nicolas says "It will probably rain tomorrow." In this case the likelihood that it will rain is expressed in vague terms and is subjective, but implies that the speaker believes it is greater than ${\frac {1}{2}}$  (or 50%). Subjective probabilities have been extensively studied, especially with regards to gambling and securities markets. While this type of probability is important, it is not the subject of this book. A good reference is "Degrees of Belief" By Steven Vick (2002).

Notice that in the previous examples the likelihood of any particular outcome is expressed as a percentage (between 0% and 100%), as is common in everyday language. However, probabilities in formal probability theory are always expressed as real numbers in the interval $[0,1]$  (e.g. a probability of .25 may be expressed as 25%, or a probability of ${\frac {1}{\pi }}$  may be expressed as approximately 31.83%). Other differences exist between common expressions of probabilities and formal probability theory. For example, a probability of 0% is typically taken to mean that the event to which that probability is assigned is impossible. However, in probability theory (usually in cases where there are infinitely many possible outcomes) an event ascribed a probability of zero may actually occur. In some situations, it is certain that such an event will occur (e.g. in selecting a real number between 0 and 1, the probability of selecting any given number is zero, but it is certain that one such number will be selected).

Another way to express the probability of an outcome is by its odds: the ratio of the probability of "success" (event occurs) to the probability of "failure" (event does not occur). In gambling odds are expressed as the ratio of the stakes risked by each participant in a wager. For instance: a bookmaker offering odds of 3 to 1 "against" a horse will pay a punter three times their stake (if the horse wins). In fact, the bookmaker (ignoring factors such as his potential need to "lay off" bets which are exposing him to the possibility of an unacceptable overall loss) is announcing that he thinks the horse has a ${\frac {1}{4}}$  chance of winning. If we express odds as "chance of winning": "chance of not winning", then 3 to 1 against would be represented as $1:3={\frac {1}{4}}:{\frac {3}{4}}$  or ${\frac {1}{3}}$  . So an event with a probability of ${\frac {1}{4}}$  or 25% has odds of 33%. This disparity is even more clear where an event has a probability of 50% (e.g., the odds of a coin showing heads is 50%:50% = 1:1 or ${\frac {1}{2}}$ ).

## Types of probability

As mentioned earlier, probability can be expressed informally in a variety of different ways, but even formal definitions and approaches vary. The most general and rigorous approach is known as axiomatic probability theory, which will be the focus of later chapters. Here we briefly discuss a few other approaches, their uses and limitations. All of these approaches rely in one way or another on the concept of an experiment. Recall that probability provides means to study randomness and uncertainty.

An experiment is any action or process whose outcome is subject to uncertainty or randomness.

Here the term experiment is used in a wider sense than its usual connotation with controlled laboratory situations. Further clarification on experiments will be given later, but for now the following examples of experiments will suffice:

• observing whether or not a commercial product is defective.
• tossing a coin one or more times or selecting a card from a card deck.
• conducting a survey.
• measuring the wind speed or rainfall in a particular area.

Assuming that an experiment can be repeated under identical conditions, then each repetition of an experiment is called a trial.

### Basic Concepts

There are two standard approaches to conceptually interpreting probabilities: the relative frequency approach and the subjective belief (or confidence approach). In the Frequency Theory of Probability, probability is the limit of the relative frequency with which certain outcomes occur in repeated trials (note that the outcome of any single trial cannot depend on the outcome of other trials). The relative frequency approach requires that experiments be random and that all possible outcomes be known before execution of the experiment. The probability of any set of outcomes is expressed as the relative frequency with which those outcomes will occur among many repeated trials.

Physical probabilities fall within the category of objective or frequency probabilities, and are associated with random physical systems such as roulette wheels, rolling dice and radioactive atoms. In such systems, a given outcome (such as a die yielding a six) tends to occur at a persistent rate, or 'relative frequency', in a long run of trials. Physical probabilities either explain, or are invoked to explain these stable frequencies.

Relative frequency probabilities are always expressed as a figure between 0% (the outcome essentially never happens) and 100% (the outcome essentially always happens), or similarly as a figure between 0 and 1. According to the Frequency Theory of Probability, saying that "the probability that A occurs is p%" means that if you repeat the experiment many times under essentially identical conditions, the percentage of time for which A occurs will converge to p. For example, a 50% chance that a coin lands "heads up" means that if you toss the coin over and over again, then the ratio of times the coin lands heads to the total number of tosses approaches a limiting value of 50% as the number of tosses grows. Notice that the outcome of one toss never depends on another toss, and that the ratio of heads to total number of tosses is always between 0% and 100%.

In the Subjective Theory of Probability, probability measures the speaker's "degree of belief" that a set of outcomes will result, on a scale of 0% (complete disbelief that the event will happen) to 100% (certainty that the event will happen). According to the Subjective Theory, saying that "the probability that A occurs is ${\frac {2}{3}}$  " means that I believe that A will happen twice as strongly as I believe that A will not happen. The Subjective Theory is particularly useful in assigning meaning to the probability of outcomes that in principle can occur only once. For example, how might one assign meaning to the following statement: "there is a 25% chance of an earthquake on the San Andreas fault with magnitude 8 or larger before 2050"? It would be very hard to qualify this measure in terms of relative frequency.

One way to represent an individual's degree of belief in a statement, given available evidence, is with the Bayesian approach. Evidential probability, also called Bayesian probability, can be assigned to any statement whatsoever, even when no random process is involved. On most accounts evidential probabilities are considered degrees of belief, defined in terms of dispositions to gamble at certain odds. The primary evidential interpretations include the classical interpretation, the subjective interpretation, the epistemic or inductive interpretation, and the logical interpretation.

The next several sections discuss the principal theories within the relative frequency approach to probability.

### Classical theory of probability

The classical approach to probability expresses probability as a ratio of the number of favorable outcomes in a series of successive trials to the number of total possible outcomes. Note the immediate implication that the number of total possible outcomes be known. Furthermore, all possible outcomes are assumed to be equally probably and no two possible outcomes can both result from the same trial. Here, the term "favorable" is not subjective, but rather indicates that an outcome belongs to a group of outcomes of interest. This group of outcomes is called an event, which will be formalized with the introduction of axiomatic probability theory.

 Classical definition of probability If the number of outcomes belonging to an event $E$ is $N_{E}$ , and the total number of outcomes is $N$ , then the probability of event $E$ is defined as $p_{E}={\frac {N_{E}}{N}}$ .

For example, a standard deck of cards (without jokers) has 52 cards. If we randomly draw a card from the deck, we can think of each card as a possible outcome. Therefore, there are 52 total outcomes. We can now look at various events and calculate their probabilities:

• Out of the 52 cards, there are 13 clubs. Therefore, if the event of interest is drawing a club, there are 13 favorable outcomes, and the probability of this event is ${\frac {13}{52}}={\frac {1}{4}}$  .
• There are 4 kings (one of each suit). The probability of drawing a king is ${\frac {4}{52}}={\frac {1}{13}}$  .
• What is the probability of drawing a king OR a club? This example is slightly more complicated. We cannot simply add together the number of outcomes for each event separately ($4+13=17$ ) as this inadvertently counts one of the outcomes twice (the king of clubs). The correct answer is ${\frac {16}{52}}$  from ${\frac {13}{52}}+{\frac {4}{52}}-{\frac {1}{52}}$  where this is essentially $p({\text{club}})+p({\text{king}})-p({\text{king of clubs}})$  .

Classical probability suffers from a serious limitation. The definition of probability implicitly defines all outcomes to be equiprobable. While this might be useful for drawing cards, rolling dice, or pulling balls from urns, it offers no method for dealing with outcomes with unequal probabilities.

This limitation can even lead to mistaken statements about probabilities. An often given example goes like this:

I could be hit by a meteor tomorrow. There are two possible outcomes: I will be hit, or I will not be hit. Therefore, the probability I will be hit by a meteor tomorrow is ${\frac {1}{2}}=50\%$  .

Of course, the problem here is not with the classical theory, merely the attempted application of the theory to a situation to which it is not well adapted.

This limitation does not, however, mean that the classical theory of probability is useless. At many points in the development of the axiomatic approach to probability, classical theory is an important guiding factor.

### Empirical or Statistical Probability or Frequency of occurrence

This approach to probability is well-suited to a wide range of scientific disciplines. It is based on the idea that the underlying probability of an event can be measured by repeated trials.

 Empirical or Statistical Probability as a measure of frequency Let $n_{A}$ be the number of times event $A$ occurs after $n$ trials. We define the probability of event $A$ as $p_{A}=\lim _{n\to \infty }{\frac {n_{A}}{n}}$ It is of course impossible to conduct an infinite number of trials. However, it usually suffices to conduct a large number of trials, where the standard of large depends on the probability being measured and how accurate a measurement we need.

A note on this definition of probability: How do we know the sequence ${\frac {n_{A}}{n}}$  in the limit will converge to the same result every time, or that it will converge at all? The unfortunate answer is that we don't. To see this, consider an experiment consisting of flipping a coin an infinite number of times. We are interested in the probability of heads coming up. Imagine the result is the following sequence:

HTHHTTHHHHTTTTHHHHHHHHTTTTTTTTHHHHHHHHHHHHHHHHTTTTTTTTTTTTTTTT...

with each run of $k$  heads and $k$  tails being followed by another run twice as long. For this example, the sequence ${\frac {n_{A}}{n}}$  oscillates between roughly ${\frac {1}{3}}$  and ${\frac {2}{3}}$  and doesn't converge.

We might expect such sequences to be unlikely, and we would be right. It will be shown later that the probability of such a run is 0, as is a sequence that converges to anything other than the underlying probability of the event. However, such examples make it clear that the limit in the definition above does not express convergence in the more familiar sense, but rather some kind of convergence in probability. The problem of formulating exactly what this means belongs to axiomatic probability theory.

### Axiomatic probability theory

Although axiomatic probability theory is often frightening to beginners, it is the most general approach to probability and has been employed in tackling some of the more difficult problems in probability. It begins with a set of axioms which, although not immediately intuitive, are guided by the more familiar classical probability theory. These axioms are discussed in the (as yet unwritten) following chapter.

## About This Book

This book is going to discuss the topic of mathematical probability using Calculus and Linear Algebra. Readers of this book should have a good understanding of both those topics before attempting to read and understand this book completely.

# The Counting Principle

## The Counting Principle

Before we can delve into the properties of probability and odds, we need to understand the Counting Principle. We use the Counting Principle to determine how many different ways one can choose/do certain events. It is easier to define the Principle through examples:

### Independent Events

Let's say that John is at a deli sub restaurant. There are 3 different breads, 3 different cheeses, 3 different condiments, and 3 different vegetables he can place on his sandwich, assuming that he can only place one from each category on to his sandwich. How many different ways can he set up his sandwich?

Since choosing a cheese doesn't affect the number of choices of vegetables, condiments, or bread, these events are called independent events. For this problem, we will multiply $3\times 3\times 3\times 3$  , so $3^{4}$  , which is 81. So there are 81 different possible combinations to form this sandwich.

#### Practice Problems

1) Thomas goes to a McDonalds' restaurant and chooses to create his own burger. He has 2 different kinds of cheese, 3 different breads, and 3 different sauces he can choose from, but he can only choose one of each category. How many different ways can he create this burger?

2) Diane is ordering pizza for her family. There are 4 different possible sizes of the pizza. Also, she has to choose one of 5 toppings to place on the pizza and one of 3 different types of cheese for the pizza. In addition, she must choose one of 3 different kinds of crust. How many different ways can she have her pizza?

3)

a) How many 3-digit numbers can be formed from the digits 2, 3, 4, 5, 7 and 9?
b) How many of these numbers are less than 400?

#### Answers

1) $(2)(3)(3)=18$

2) $(4)(5)(3)(3)=180$

3)

a) Since there are six available digits, the answer is $(6)(6)(6)=216$
b) For the value of the 3-digit number to be less than 400, we only have two choices for the first digit, namely 2 or 3. After that we can choose the other two digits freely. The answer is thus $(2)(6)(6)=72$ .

### Dependent Events

Assume that John is now working at a library. He must put 5 books on a shelf in any order. How many different ways can he order the books on the shelf? Unlike the independent events, when John puts a book on the shelf, that eliminates one book from the remaining choices of books to put next on the shelf; thus these are referred to as dependent events. At first, he has 5 different choices, so our first number in the multiplication problem will be 5. Now that one is missing, the number is reduced to 4. Next, it's down to 3, and so on. So, the problem will be

$(5)(4)(3)(2)(1)$

However, there is a symbol for this very idea. A $!$  represents the term factorial. So, for example, $3!=(3)(2)(1)$  . Factorials are very useful in statistics and probability.

Therefore, the problem can be rewritten as $5!$  , which ends up being equal to 120.

However, not all dependent event problems are that simple. Let's say that there are 10 dogs at a dog competition. How many different ways can one select the Champion AND the Runner-Up? This problem could actually be considered simpler than the last one, but it doesn't involve a factorial. So, how many different ways can the judge determine the Champion? Of course, there are 10 different dogs, so 10 different ways. Next, how many dogs are left to select the Runner-Up from? Well, you removed one, so it's down to 9. Instead of placing a factorial, though, you will only multiply 10 and 9, resulting in 90.

### Independent Or Dependent?

To help you differentiate between the two, we will do a few more examples, but we will have to decide if it is dependent or independent before solving the problem.

Let's say that you are creating a 5-digit garage door opener code (the digits would include 0-9). If there were absolutely no restrictions, would the events be independent of each other or dependent on each other? Of course, there are no restrictions, since you could have five 4's for the code, so to solve the problem, you multiply 10 by itself 5 times, resulting in 100000.

Alternatively, suppose that the first number of the code cannot be 0, and that there can be no repeated numbers whatsoever. Obviously these events are dependent on each other, because there cannot be any repetitions. Let's look at this problem one number at a time.

The first number can be all the numbers except 0, reducing the possible numbers to 9. The second number can be 0 this time, so the possible numbers returns to 10. However, it cannot be a repeat of the previous number, so there are 9 possible choices again. After that, the numbers will reduce by one each time, due to the fact that there cannot be repeats, so the problem would look like this

$(9)(9)(8)(7)(6)={\frac {(9)(9!)}{(5!)}}=27216$

Now, just one more example. Let's say that you were choosing your schedule for school. There are 8 periods each day, and there are 7 classes to choose from. Nonetheless, you must have a lunch period during the 4th period. We can think of 4th period as non existent because it is in a constant position and therefore does not affect the possible choices. With 7 slots and 7 options, the answer is simply $7!$  .

$(7!)=5040$

### Review Of The Counting Principle

So, we use the Counting Principle to determine the different unique ways we can do something, such as a sandwich or a selection of classes. Sometimes, these events will affect each other, such as when you can't choose the same number twice for your garage door code, so they are dependent events. However, other times, one event has no effect on the next event, such as when you have different cheeses and breads to choose for your sandwich, so they are independent events. The Counting Principle is a fundamental mathematical idea and an essential part of probability.

## Counting Rules

Rule 1: If any one of $k$  mutually exclusive and exhaustive events can occur on each of $n$  trials, there are $k^{n}$  different sequences that may result from a set of such trials.

• Example: Flip a coin three times, finding the number of possible sequences. $n=3\ ,\ k=2$  , therefore, $k^{n}=2^{3}=8$

Rule 2: If $k_{1},\dots ,k_{n}$  are the numbers of distinct events that can occur on trials $1,\dots ,n$  in a series, the number of different sequences of $n$  events that can occur is $k_{1}\times \cdots \times k_{n}$  .

• Example: Flip a coin and roll a die, finding the number of possible sequences. Therefore, $(k_{1})(k_{2})=(2)(6)=12$

Rule 3: The number of different ways that $n$  distinct things may be arranged in order is $n!=(1)(2)(3)\times \cdots \times (n-1)(n)$  , where $0!=1$  . An arrangement in order is called a permutation, so that the total number of permutations of $n$  objects is $n!$  (the symbol $n!$  Is called n-factorial).

• Example: Arrange 10 items in order, finding the number of possible ways. Therefore, $10!=10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3\cdot 2\cdot 1=3,628,800$

Rule 4: The number of ways of selecting and arranging $k$  objects from among $n$  distinct objects is: ${\frac {n!}{(n-k)!}}$  , or as seen on calculators, [nPr].

• Example: pick 3 things from 10 items, and arrange them in order. Therefore $n=10\ ,\ k=3$  , therefore ${\frac {10!}{(10-3)!}}={\frac {10!}{7!}}=720$

Rule 5: The total number of ways of selecting $k$  distinct combinations of $n$  objects, irrespective of order (ie order NOT important), is: ${\binom {n}{k}}={\frac {n!}{k!(n-k)!}}$  or as seen on calculators, [nCr].

• Example: Pick 3 items from 10 in any order, where $n=10\ ,\ k=3$  , Therefore, ${\binom {10}{3}}={\frac {10!}{3!(7!)}}={\frac {720}{6}}=120$  .

# Probability Spaces

## Concept

We will now proceed to develop a more axiomatic theory of probability, allowing for a simpler mathematical formalism. We shall proceed by developing the concept of a probability space, which will allow us to harness many theorems in mathematical analysis.

Recall that an experiment is any action or process with an outcome that is subject to uncertainty or randomness. A probability space or a probability triple is a mathematical construct that models an experiment and its set of possible outcomes.

## Probability Space

A probability space is a mathematical triplet $(\Omega ,{\mathcal {F}},P)$  consisting of the sample space $\Omega$ , a set of events ${\mathcal {F}}$ , and a probability function $P$  that presents a model for a particular class of real-world situations. A probability space is arbitrary, in that its author ultimately defines which elements $\Omega$ , ${\mathcal {F}}$ , and $P$  will contain.

### Sample Space

The sample space, $\Omega$ , is the non-empty set whose elements are all possible outcomes of an experiment. Without an assignment of a sample space and without knowing its size, no conclusions could be made about the probabilities of outcomes, or collections of outcomes. It is common to refer to a sample space by the labels $S$ , $\Omega$ , or $U$  (for "universal set"). An Outcome $\omega$  may be a state of nature, a possibility, an experimental result and the like. Any instance or execution of a real-world situation modeled by a probability space must produce exactly one outcome. If outcomes of different trials of an experiment differ in any way that matters, they are considered distinct outcomes. Which differences matter depends on the kind of analysis we wish to perform: This leads to different choices of a sample space. A common example consists of a random experiment involving a single coin toss. Here, it seems appropriate to define the sample space as the set $\Omega =\{{\text{H}},{\text{T}}\}$ .

### Events

Since individual outcomes might be of little practical use, more complex events are used to characterize groups of outcomes. An event is any subset of zero or more outcomes contained in a given sample space. A simple event consists of exactly one outcome and a compound event consists of more than one outcome. For example, when tossing a single coin $\Omega =\{{\text{H}},{\text{T}}\}$ , possible events are $\{\}$ , $\{H\}$ , $\{T\}$ , and $\{H,T\}$ . The collection of all such events is a σ-algebra ${\mathcal {F}}$ . Intuitively, the probability of each of these sets is the chance that one of the events in the set will happen; $P(\{{\text{H}}\})$  is the chance of tossing a head, $P(\{{\text{H}},{\text{T}}\})$  is the chance of the coin landing either heads or tails, and $P(\{\})$  is the probability of the coin landing neither heads nor tails, etc. An event is said to have happened or occurred during an experiment when the outcome of the experiment is an element of the event. Since the same outcome may be a member of many events, it is possible for many events to have happened given a single outcome. For example, when the trial consists of throwing two dice, the set of all outcomes with a sum of 7 pips may constitute an event, whereas outcomes with an odd number of pips may constitute another event. If the outcome is the element of the elementary event of two pips on the first die and five on the second, then both of the events, "7 pips" and "odd number of pips", are said to have occurred. Modeling events as sets of outcomes in a sample space $\Omega$  allows us to leverage all of the regular set operations:

Given two events $A$  and $B$ :

• The null subset $\emptyset$  in a sample space $\Omega$  is called an impossible event.
• The union of two events $A\cup B$  consists of all outcomes that are in $A$  or in $B$  or in both,
• The intersection $A\cap B$  consists of all outcomes that are both in $A$  and $B$ .
• The complement $A^{c}$  of an event $A$  in a sample space $\Omega$  consists of all outcomes not in $A$ , but in $\Omega$ , i.e. $A\cup A^{c}=\Omega$ .

### Probability Measure Function

Finally, there is a need to specify each event's likelihood of happening. This is done using the probability measure function, $P$  which maps events to probabilities. Recall that probability is expressed as a real number between zero (typically an impossible event, though zero-probability events are not necessarily impossible) and one (an event happens almost surely, with total certainty). Thus $P$  is a function $P:{\mathcal {F}}\rightarrow [0,1]$ .

Once the probability space is established, it is assumed that “nature” makes its move and selects a single outcome (also called a sample point), $\omega$ , from the sample space $\Omega$ . All the events in ${\mathcal {F}}$  that contain the selected outcome $\omega$  (recall that each event is a subset of $\Omega$ ) will have "happened" or "occurred". The selection performed by nature is done in such a way that if the experiment were to be repeated an infinite number of times, the relative frequencies of occurrence of each of the events would coincide with the probabilities prescribed by the function $P$ .

## Probability Definition

Having properly defined a probability space, we may now provide the following definitions for probability:

### Informal Definition of Probability

The probability of an event is a measure of the chance with which we can expect the event to occur. We assign a number between 0 and 1 inclusive to the probability of an event. A probability of 1 means that we are certain the event will occur, and a probability of 0 means that we certain the event will not occur. The probability of any event A in a sample space $\Omega$  is denoted by $P(A)$ .

### Classical Definition of Probability

If there are n equally likely outcomes for an experiment, of which one must occur, and m of these belong to an event of interest $A$ , then the probability of the event or a "success" is given by $P(A)$  = m/n.

#### Computing Probability for Classical Approach

1. When all outcomes are equally likely
1. Count the number of outcomes n in the sample space.
2. Count the number of outcomes m in the event of interest, $A$ .
3. $P(A)$  = m/n.
2. When all outcomes are not equally likely
1. Let $\omega _{1},\omega _{2},...,\omega _{n}$  be the outcomes of a sample space $\Omega$ . Let $P(\omega _{i})=p_{i},i=1,2,...,n$ . In this case, the probability of each outcome, $p_{i}$ , is assumed to be known.
2. List all the outcomes in $A$ , say, $\omega _{i},\omega _{j},...,\omega _{m}$ .
3. $P(A)=P(\omega _{i})+P(\omega _{j})+\dotsb +P(\omega _{m})=p_{i}+p_{j}+\dotsb +p_{m}$ , the sum of the probabilities of the outcomes in $A$ .

The classical concept of probability works well when all the outcomes of an experiment can be regarded as equally likely, like when we roll a die or toss a coin, but it falls short in scenarios where outcomes are not equally likely and their probabilities are not known beforehand, like when we are interested in the probability of a sports team winning a championship game. For these situations, the frequency interpretation of probability (developed as a result of the work by R. Von Mises in 1936) is fitting.

### Frequency Definition of Probability

The probability of an event or outcome is the proportion of times the event or outcome would occur in a long run of trials of repeated experiments.

For example, in tossing a coin, if $n(H)$  is the number of times heads appears in $n$  trials, then the probability of getting heads is $P(H)=lim_{n\rightarrow \infty }(n(H)/n)$ . This is just a natural extension of the classical approach of probability, but which works for both a non-biased (equally likely outcomes) and a biased (not equally likely outcomes) coin. This approach is quite useful, but requires that an experiment be able to be repeated many times under identical circumstances, which is not always possible. For a more complete picture, we define probabilities axiomatically (from the 1933 studies of A. N. Kolmogorov).

### Axiomatic Definition of Probability

Let $\Omega$  be a sample space of an experiment. Probability P() is a real-valued function that assigns to each event $E$  in $\Omega$  a number $P(E)$ , called the probability of $E$ , with the following conditions satisfied:

1. For every event $E\in {\mathcal {F}},P(E)\geq 0$
2. It is unity for any event. That is, $P(\Omega )=1$
3. It is additive over the union of an infinite number of pairwise disjoint events, that is, if $E_{i}\cap E_{j}=\emptyset$ , for $i\neq j$  in $\Omega$ , then $P\left(\cup _{i=1}^{\infty }E_{i}\right)=\Sigma _{i=1}^{\infty }P(E_{i})$

## Other Definitions and Properties

### Basic Properties of Probability

• $P(\emptyset )=0$ .
• The property of probability in the third axiom of probability is valid for a finite collection of events.
• For any event $A,P(A)=1-P(A^{c})$
• For any event $A,P(A)\leq 1$
• If $A\subseteq B$ , then $P(A)\leq P(B)$
• For any events $A$  and $B$ , $P(A\cup B)=P(A)+P(B)-P(A\cap B)$ .

### Mutually Exclusive Events

Two events $A$  and $B$  are said to be mutually exclusive or disjoint if $A\cap B=\emptyset$ . This means that mutually exclusive events cannot happen together. A collection of n events are mutually exclusive when the occurrence of any one event implies that the remaining n-1 events will not occur. Any two mutually exclusive events $A$  and $B$  have the following probability properties:

1. $P(A\cap B)=0$
2. $P(A\cap B)=P(A)+P(B)$ . Note that this is just a special case of the general property where $P(A\cup B)=P(A)+P(B)-P(A\cap B)$ .

### (Collectively) Exhaustive

A set of events is jointly or collectively exhaustive if at least one of the events must occur. For example, when rolling a six-sided die, the outcomes 1, 2, 3, 4, 5, and 6 are collectively exhaustive, because they encompass the entire range of possible outcomes.

Another way to describe collectively exhaustive events is that their union must cover the entire sample space, or $A\cup B=\Omega$ , where $\Omega$  is the sample space.

### Test of Independence

If for two events $A$  and $B$ , $p(A\cap B)$  is not equal to $p(A)p(B)$ , then $A$  and $B$  are said to be associated or dependent. If $p(A\cap B)>p(A)p(B)$ , so that $p(A|B)>p(A)$  and $p(B|A)>p(B)$ , then the events are said to be positively associated. If, however, $p(A\cap B) , so that $p(A|B)  and $p(B|A) , one says that the events are negatively associated.

### Simple Random Sampling

In a simple random sample, one person must take a random sample from a population, and not have any order in which one chooses the specific individual. In statistics, a simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. Simple random sampling can be done with or without replacement, though it is typically done without, i.e., one deliberately avoids choosing any member of the population more than once. When the sample is drawn with replacement, the same individual can be chosen more than once. When the sample is drawn without replacement, the same individual can be chosen no more than once in a given sample. Therefore, random sampling of one individual at a time means that every possible individual in the large group has an equal probability of being drawn.

### Independence of random variables

If X is a real-valued random variable and a is a number then the event {X ≤ a} is the set of outcomes that correspond to X being less than or equal to a. Since these are sets of outcomes that have probabilities, it makes sense to refer to events of this sort being independent of other events of this sort.

Why are the p and q Bernoulli trial probabilities multiplied together in the binomial formula? The probability of an event can be expressed as a binomial probability if its outcomes can be broken down into two probabilities p and q, where p and q are complementary (i.e. p + q = 1). Binomial probability typically deals with the probability of several successive decisions, each of which has two possible outcomes. The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n = 1, the binomial distribution is a Bernoulli distribution. Bernoulli process is a discrete-time stochastic process consisting of a sequence of independent random variables taking values over two symbols. Prosaically, a Bernoulli process is coin flipping several times, possibly with an unfair coin. A variable in such a sequence may be called a Bernoulli variable. In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials. Independence of Bernoulli trials implies memorylessness property: past trials do not provide any information regarding future outcomes. From any given time, future trials is also a Bernoulli process independent of the past (fresh-start property). a sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent. Two events A and B are independent if and only if Pr(A ∩ B) = Pr(A)Pr(B). Here A ∩ B is the intersection of A and B, that is, it is the event that both events A and B occur. This is called the multiplication rule for independent events.

### Further Concepts in Set Theory

Union and intersection: We can indicate the union and intersection of two sets with the symbols ∪ ∩, respectively. If two sets, then ∪ the set “”―the set that consists of all the elements that are either in set in set . Similarly, ∩ read “”, which is the set of all elements that are in both .

Set difference. We can “subtract” one set from another using the symbol “－”. If sets, then － the set consisting of all the elements that are members of are not members of . This is a little different from the subtraction you may be used to in arithmetic.

Whole numbers: 1. Non-negative integers (aka real numbers) – {0, 1, 2, 3...} 2. Positive integers – {1, 2, 3, 4....} 3. All possible integers – {-3, -2, -1, 0, 1, 2, 3...}

AND --> 2 or more things/events happen. Therefore, we MULTIPLY (product) probabilities together -- INTERSECTION OR --> 1 of the 2 events/things occur. Therefore we ADD (sum) probabilities together -- UNION

Events that are NOT mutually exclusive means we could draw them both (for example, we want to draw a 2 OR an 'M'...these events are NOT mutually exclusive, because they cannot occur together at the same time).

Events that are NOT mutually exclusive, but rather are independent, we must subtract: 1-p(A∩B) from p(A) + p(B). So that, p(A) + p(B) -1 p(A∩B).

Probabilities for independent events often involve exponents, and probabilities for dependent events (conditional probability) often involve factorials.

### Permutations and Combinations

Permutations
arrangement of objects without repetition where order is important. Permutations using all the objects: n objects, arranged into group size of n without repetition, and order being important – P(n, n) = N!
Example: Find all permutations of A, B, C.

Permutations of some of the objects: n objects, group size r, order is important. P(n, r) = N!/(n-r)! Example: Find all 2-letter combinations using letters A, B, C.

Distinguished permutations
if a word has n letters, k of which are unique, let n (n1, n2, n3....nk) be the frequency of each of the k letters. N!/(n1!)(n2!)(n3!)
Combinations
arrangement of objects without repetition, where order is NOT important. A combination of n objects, arranged in groups of size r, without repetition, and order being important. C(n, r) = N!/r!(n-r)!

Another way to write a combination of n things, r at a time, is using the Binomial notation (Binomial Distribution), sometimes described as "n choose r".

### Counting Rules

Rule 1
If any one of K mutually exclusive and exhaustive events can occur on each of N trials, there are KN different sequences that may result from a set of such trials
Example: Flip a coin three times, finding the number of possible sequences. N=3, K=2, therefore, KN =(2)(3)=6
Rule 2
If K1, K2, ....KN are the numbers of distinct events that can occur on trials 1,....N in a series, the number of different sequences of N events that can occur is (K1)(K2)...(KN)
Example: Flip a coin and roll a die, finding the number of possible sequences. Therefore, (K1)(K2) = (2)(6) = 12
Rule 3
The number of different ways that N distinct things may be arranged in order is N! = (1)(2)(3)....(N-1)(N), where 0! = 1. An arrangement in order is called a permutation, so that the total number of permutations of N objects is N! (the symbol N! Is called N-factorial)
Example: Arrange 10 items in order, finding the number of possible ways. Therefore, 10! = 10x9x8x7x6x5x4x3x2x1 = 3628800
Rule 4
The number of ways of selecting and arranging r objects from among N distinct objects is: N!/(N-r)! [nPr]
Example: pick 3 things from 10 items, and arrange them in order. Therefore N=10, r=3, so 10!/(10-3)! = 10!/7! = 720
Rule 5
The total number of ways of selecting r distinct combinations of N objects, irrespective of order (ie order NOT important), is: N!/r!(N-r)! [nCr]
Example: Pick 3 items from 10 in any order, where N=10, r=3. Therefore, 10!/3!(7!) = 720/6 = 120

## Consequences

We can now give some basic theorems using our axiomatic probability space.

### Theorem 1

Given a probability space (Ω,S,P), for events $A,B\in S$ :

$P(A\cup B)=P(A)+P(B)-P(A\cap B)$

## Axioms of Probability

A probability function has to satisfy the following three basic axioms or constraints: To each event a a measure (number) P(a) which is called the probability of event a is assigned. P(a) is subjected to the following three axioms:

1. P(a) ≥ 0
2. P(S) = 1
3. If a ∩ b = 0 , then P(a ∪ b) = P(a) + P(b)
Corollaries
• P(0) = 0
• P(a) = 1 － P(a') ≤ 1
• If a ∩ b ≠ 0 , then P(a ∪ b) = P(a) + P(b) － P(a ∩ b)
• If b ⊂ a , P(a) = P(b) + P(a ∩ b)≥ P(b)

# Mathematical Review

The review of set theory contained herein adopts a naive point of view. We assume that the meaning of a set as a collection of objects is intuitively clear. A rigorous analysis of the concept belongs to the foundations of mathematics and mathematical logic. Although we shall not initiate a study of these fields, the rules we follow in dealing with sets are derived from them. A set is a collection of objects, which are the elements of the set.

If an element $x$  belongs to a set $S$  , we express this fact by writing $x\in S$ . If $x$  does not belong to $S$  , we write $x\notin S$  . We use the equality symbol to denote logical identity. For instance, $x=y$  means that $x,y$  are symbols denoting the same object. Similarly, the equation $S=T$  states that $S,T$  are two symbols for the same set. In particular, the sets $S,T$  contain precisely the same elements. If $x,y$  are different objects then we write $x\neq y$  . Also, we can express the fact that $S,T$  are different sets by writing $S\neq T$  .

A set $S$  is a subset of $T$  if every element of $S$  is also contained in $T$  . We express this relation by writing $S\subset T$  . Note that this definition does not require $S$  to be different from $T$  . In fact, $S=T$  if and only if $S\subset T$  and $T\subset S$  . If $S\subset T$  and $S$  is different from $T$  , then $S$  is a proper subset of $T$  and we write $S\subsetneq T$  .

There are many ways to specify a set. If the set contains only a few elements, one can simply list the objects in the set;

$S=\{x_{1},x_{2},x_{3}\}$

The content of a set can also be enumerated whenever $S$  has a countable number of elements,

$S=\{x_{1},x_{2},\dots \}$

Usually, the way to specify a set is to take some collection $S$  of objects and some property that elements of $S$  may or may not possess, and to form the set consisting of all elements of $S$  having that property. For example, starting with the integers $\mathbb {Z}$  , we can form the subset of S consisting of all even numbers

$S=\{x\in \mathbb {Z} |x{\text{ is an even number}}\}$

More generally, we denote the set of all elements that have a certain property $P$  by

$S=\{x|x{\text{ satisfies }}P\}$

The braces are to be read as the words "the set of" whereas the symbol | stands for the words "such that." It is convenient to introduce two special sets. The empty set, denoted by $\varnothing$  , is a set that contains no elements. The universal set is the collection of all objects of interest in a particular context, and it is denoted by $\Omega$  . Once a universal set $\Omega$  is specified, we need only consider sets that are subsets of $\Omega$  . In the context of probability, $\Omega$  is often called the sample space.

The complement of a set $S$  , with respect to the universal set $\Omega$  , is the collection of all objects in $\Omega$  that do not belong to $S$  ,

$S^{c}={\bar {S}}=\{x\in \Omega |x\notin S\}$

We note that $\Omega ^{c}=\varnothing$  .

## Elementary Set Operations

Probability theory makes extensive use of elementary set operations. Below, we review the ideas of set theory, and establish the basic terminology and notation. Consider two sets $S,T$  .

The union of sets $S,T$  is the collection of all elements that belong to $S$  or $T$  (or both), and it is denoted by $S\cup T$  . Formally, we define the union of these two sets by

$S\cup T=\{x|x\in S{\text{ or }}x\in T\}$

The intersection of sets $S,T$  is the collection of all elements that are common to both $S$  and $T$  . It is denoted by $S\cap T$ , and it can be expressed mathematically as

$S\cap T=\{x|x\in S{\text{ and }}x\in T\}$

When $S,T$  have no elements in common, we write $S\cap T=\varnothing$  . We also express this fact by saying that $S,T$  are disjoint. More generally, a collection of sets is said to be disjoint if no two sets have a common element. A collection of sets is said to form a partition of $S$  if the sets in the collection are disjoint and their union is $S$  .

The difference of two sets, denoted by $S-T$  or $S\setminus T$  , is defined as the set consisting of those elements of $S$  that are not in $T$  ,

$S-T=S\setminus T=\{x|x\in S{\text{ and }}x\notin T\}$

This set is sometimes called the complement of $T$  relative to $T$  , or the complement of $T$  in $S$  .

We have already looked at the definition of the union and the intersection of two sets. We can also form the union or the intersection of arbitrarily many sets. This is defined in the obvious way,

$\bigcup _{\alpha \in I}S_{\alpha }=\{x|x\in S_{\alpha }{\text{ for some }}\alpha \in I\}$
$\bigcap _{\alpha \in I}S_{\alpha }=\{x|x\in S_{\alpha }{\text{ for all }}\alpha \in I\}$

The index set I can be finite or even infinite.

## Rules of Set Theory

Given a collection of sets, it is possible to form new ones by applying elementary set operations to them. As in algebra, one uses parentheses to indicate precedence. For instance, $R\cup (S\cap T)$  denotes the union of two sets $R,S\cap T$  , while $(R\cup S)\cap T$  represents the intersection of two sets $R\cup S,T$  .The sets thus formed are quite different.

Sometimes different combinations of operations lead to the same set. For instance, we have the two distributive laws

$R\cap (S\cup T)=(R\cap S)\cup (R\cap T)$
$R\cup (S\cap T)=(R\cup S)\cap (R\cup T)$

Two particularly useful equivalent combinations of operations are given by De Morgan's laws, which state that

$R-(S\cup T)=(R-S)\cap (R-T)$
$R-(S\cap T)=(R-S)\cup (R-T)$

These two laws can be generalized to

$\left(\bigcup _{\alpha \in I}S_{\alpha }\right)^{c}=\bigcap _{\alpha \in I}S_{\alpha }^{c}$
$\left(\bigcap _{\alpha \in I}S_{\alpha }\right)^{c}=\bigcup _{\alpha \in I}S_{\alpha }^{c}$

when multiple sets are involved. To establish the first equality, suppose that $x$  belongs to $\left(\bigcup _{\alpha \in I}S_{\alpha }\right)^{c}$  . Then $x$  is not contained in $\bigcup _{\alpha \in I}S_{\alpha }$  . That is, $x$  is not an element of $S_{\alpha }$  for any $\alpha \in I$  . This implies that $x$  belongs to $S_{\alpha }^{c}$  for all $\alpha \in I$  , and therefore $x\in \bigcap _{\alpha \in I}S_{\alpha }^{c}$  . We have shown that $\left(\bigcup _{\alpha \in I}S_{\alpha }\right)^{c}\subset \bigcap _{\alpha \in I}S_{\alpha }^{c}$  . The converse inclusion is obtained by reversing the above argument. The second law can be obtained in a similar fashion.

## Cartesian Products

There is yet another way to create new sets form existing ones. It involves the notion of an ordered pair of objects. Given sets $S,T$  , the cartesian product $S\times T$  is the set of all ordered pairs $x,y$  for which $x$  is an element of $S$  and $y$  is an element of $T$  ,

$S\times T=\{(x,y)|x\in S{\text{ and }}y\in T\}$

## Summary of Probability & Set Theory

• $A=P(A)\cdot \sum [0,1]$
• ${\text{not }}A=P({\bar {A}})=P(A^{c})=1-P(A)$
• $A{\text{ or }}B=P(A\cup B)=P(A)+P(B)-P(A\cap B)=P(A)+P(B)$  [*if and only if $A$  and $B$  are mutually exclusive]
• $A{\text{ and }}B=P(A\cap B)=P(A\setminus B)\cdot P(B)=P(A)\cdot P(B)$  [*if and only if $A$  and $B$  are independent]
• $A{\text{ given }}B=P(A|B)={\frac {P(A\cap B)}{P(B)}}$  [*conditional]
Basic Set Theory

UNION: combined area of set $A,B$  , known as $A\cup B$  . The set of all items which are members of either $A$  or $B$

• Union of $A,B$  are added together
• Some basic properties of unions:
• $A\cup B=B\cup A$
• $A\cup (B\cup C)=(A\cup B)\cup C$
• $A\subset (A\cup B)$
• $A\cup A=A$
• $A\cup \varnothing =A$  , where $\varnothing ={\text{empty set}}$
• $A\subset B$  , if and only if $A\cup B=B$

INTERSECTION: area where both $A,B$  overlap, known as $A\cap B$  . It represents which elements the two sets $A,B$  have in common

• If $A\cap B=\varnothing$  , then A and B are said to be DISJOINT.
• Some basic properties of intersections:
• $A\cap B=B\cap A$
• $A\cap (B\cap C)=(A\cap B)\cap C$
• $(A\cap B)\subset A$
• $A\cap A=A$
• $A\cap \varnothing =\varnothing$
• $A\subset B$  , if and only if $A\cap B=A$
• UNIVERSAL SET: space of all things possible, which contains ALL of the elements or elementary events.
• $U\setminus A$  is called the absolute complement of $A$

Complement (set): 2 sets can be subtracted. The relative complement (set theoretic difference of $B$  and $A$ ). Denoted by $B\setminus A$  (or $B-A$ ) is the set of all elements which are members of $B$  , but not members of $A$

Some basic properties of complements (${\overline {A}}\ ,\ A^{c}\ ,\ ~A\ ,\ A'$ ):

• $A\cup {\overline {A}}=U$
• $A\cap {\overline {A}}=\varnothing$
• ${\overline {({\overline {A}})}}=A$
• $A\setminus A=\varnothing$
• ${\overline {U}}=\varnothing$  , and ${\overline {\varnothing }}=U$
• $A\setminus B=A\cap {\overline {B}}$
Summary
• Intersection $A\cap B$  --> AND – both events occur together at the same time
• Union $A\cup B$  --> OR – everything about both events, $A,B$
• Complement ${\overline {A}}$  --> NOT $A$  – everything else except $A$  (or the event in question)
• $A\cup {\overline {A}}=S$  (sample space)
• $A\cap {\overline {A}}=\varnothing$  (impossible event)

Union and Intersection are:

Commutative
• $A\cup B=B\cup A$
• $A\cap B=B\cap A$
Associative
• $A\cup (B\cup C)=(A\cup B)\cup C$
• $A\cap (B\cap C)=(A\cap B)\cap C$
Distributive
• $A\cup (B\cap C)=(A\cup B)\cap (A\cup C)$
• $A\cap (B\cup C)=(A\cap B)\cup (A\cap C)$

# Combinatorics

Often, in experiments with finite sample spaces, the outcomes are equiprobable. In such cases, the probability of an event amounts to the number of outcomes comprising this event divided by the total number of outcomes in the sample space. While counting outcomes may appear straightforward, it is in many circumstances a daunting task. For example, consider the number of distinct subsets of the integers $\{1,\dots ,n\}$  that do not contain two consecutive integers. This number is equal to

${\frac {\phi ^{n+2}-(1-\phi )^{n+2}}{\sqrt {5}}}$

where $\phi ={\frac {1+{\sqrt {5}}}{2}}$  is the golden ratio. It can also be obtained recursively through the Fibonacci recurrence relation.

Calculating the number of ways that certain patterns can be formed is part of the field of combinatorics. In this section, we introduce useful counting techniques that can be applied to situations pertinent to probability theory.

## The Counting Principle

The Fundamental Rule of Counting: If a set of choices or trials, $T_{1},\dots ,T_{k}$  , could result, respectively, in $n_{1},\dots ,n_{k}$  possible outcomes, the entire set of $k$  choices or trials has $n_{1}\times \cdots \times n_{k}$  possible outcomes. (The numbers $n_{1},\dots ,n_{k}$  cannot depend on which outcomes actually occur.)

By the Fundamental Rule of Counting, the total number of possible sequences of choices is $5\times 4\times 3\times 2\times 1=120$  sequences. Each sequence is called a permutation of the five items. A permutation of items is an ordering of the items. More generally, by the Fundamental Rule of Counting, in ordering $n$  things, there are $n$  choices for the first, $n-1$  choices for the second, etc., so the total number of ways of ordering a total of $n$  things is $n\times (n-1)\times (n-2)\times \cdots \times 3\times 2\times 1$  . This product is written $n!$  , which is pronounced "n-factorial." By convention, $0!=1$

The counting principle is the guiding rule for computing the number of elements in a cartesian product as well.

$S\times T=\{(x,y)|x\in S\land y\in T\}$

The number of elements in the cartesian product $S\times T$  is equal to mn. Note that the total number of outcomes does not depend on the order in which the experiments are realized.

Example

Consider an experiment consisting of flipping a coin and rolling a die. There are two possibilities for the coin, heads or tails, and the die has six sides. The total number of outcomes for this experiment is $2\cdot 6=12$  . That is, there are twelve outcomes for the roll of a die followed by the toss of a coin:

$1H,1T,2H,2T,3H,3T,4H,4T,5H,5T,6H,6T$  .

The counting principle can be extended to compute the number of elements in the cartesian product of multiple sets. Consider the finite sets $S_{1},\dots ,S_{r}$  and their cartesian product

$S_{1}\times \cdots \times S_{r}=\{(s_{1},\dots ,s_{r})|s_{p}\in S_{p}\}$

If we denote the cardinality of $S_{p}$  by $n_{p}=|S_{p}|$  , then the number of distinct ordered r-tuples of the form $(s_{1},\dots ,s_{r})$  is $n=n_{1}\times \cdots \times n_{r}$  .

Example - Sampling with Replacement and Ordering

An urn contains $n$  balls numbered $1,\dots ,n$  . A ball is drawn from the urn, and its number is recorded on an ordered list. The ball is then replaced in the urn. This procedure is repeated $k$  times. We wish to compute the number of possible sequences that results from this experiment. There are $k$  drawings and n possibilities per drawing. Using the counting principle, we conclude that the number of distinct sequences is $n^{k}$  .

Example

The power set of S, denoted by $2^{S}$  , is the set of all subsets of $S$  . In set theory, $2^{S}$  represents the set of all functions $S\to \{0,1\}$  . By identifying a function in $2^{S}$  with the corresponding preimage of one, we obtain a bijection between $2^{S}$  and the subsets of $S$ . In particular, each function in $2^{S}$  is the characteristic function of a subset of $S$  .

Suppose that $S$  is finite with $n=|S|$  elements. For every element of $S$  , a characteristic function in $2^{S}$  is either 0 or 1. There are therefore $2^{n}$  distinct characteristic functions $S\to \{0,1\}$  . Hence, the number of distinct subsets of $S$  is given by $2^{n}$  .

Formulae

The formula for counting combinations has special cases that are worth memorizing:

${\binom {n}{0}}={\binom {n}{n}}=1$  (There is only one way to pick no thing and only one way to pick all $n$  things.)
${\binom {n}{1}}={\binom {n}{n-1}}=n$  (there are n ways to pick one thing or to leave one thing out)
${\binom {n}{k}}={\binom {n}{n-k}}$  (There are the same number of ways of picking $k$  of $n$  things as there are of leaving out $k$  of $n$  things)

## Permutations

In this section, we will discuss groups of unique objects where ordering is important.
Permutation: An arrangement of objects without repetition where order is important.
Importance of Order: We say that the order is important when sets $S$  and $S'$  contain the exact same objects, but in a different order. In this case, we treat them differently. For example, $S=\{H,T\}$ , and $S'=\{T,H\}$  are different.

Example: Find all 2-letter permutations of $A,B,C,D$

Solution: Start with the letter $A$ . It can pair up with $B$ , $C$ , or $D$ , but NOT $A$ . Therefore, all the two letter permutation that start with $A$  are:
$A,B$
$A,C$
$A,D$

Similarly, for B as the starting letter,
$B,A$
$B,C$
$B,D$

Notice that since the order is important, $A,B$  and $B,A$  are NOT the same thing!

For C,
$C,A$
$C,B$
$C,D$

Finally, for D,
$D,A$
$D,B$
$D,C$

Permutations using all the objects: $n$  objects, arranged into group size of $n$  without repetition, and order being important. $P(n,n)=n!$  Example: Find all permutations of $A,B,C,D,F,G,H$  .

Permutations of some of the objects: $n$  objects, group size $k$  , order is important. $P(n,k)={\frac {n!}{(n-k)!}}$  Example: Find all 2-letter combinations using letters $A,B,C,D,F,G,H$

Distinguished permutation: if a word has $n$  letters, $k$  of which are unique, let $n$  $(n_{1},\dots ,n_{k})$  be the frequency of each of the $k$  letters: ${\frac {n!}{(n_{1}!)(n_{2}!)(n_{3}!)}}$  .

Consider the integer set $S=\{1,\dots ,n\}$  . A permutation of S is an ordered arrangement of its elements, a list without repetitions. The number of permutations of S can be computed as follows. The number of distinct possibilities for the first item is $n$  . The number of possibilities for the second item is $n-1$  , namely all the integers in $S$  except the first element in the list. Similarly, the number of distinct possibilities for the $m$ -th item is $n-m+1$  . This pattern continues until all the elements in $S$  are listed. Summarizing, we find that the total number of permutations of $S$  is $n$ -factorial, $n!=n\times (n-1)\times \cdots \times 2\times 1$  .

Example

We wish to compute the number of permutations of $S=\{1,2,3\}$  . Since the set S contains three elements, it has $3!=6$  permutations. They can be listed as 123, 132, 213, 231, 312, 321.

### Stirling's Formula

The number n! grows very rapidly as a function of $n$  . A good approximation for $n!$  when n is large is given by Stirling's formula,

$n!\sim \left({\frac {n}{e}}\right)^{n}{\sqrt {2\pi n}}$

as $n\to \infty$  . The notation $a(n)\sim b(n)$  signifies that the ratio ${\frac {a(n)}{b(n)}}$  approaches 1 as $n$  tends to infinity.

### $k$ -Permutations

Suppose that we list only k elements out of the set S = {1, 2, ... , n}, where $k\leq n$ . We wish to count the number of distinct k-permutations of S. Following our previous argument, we can choose one of n elements to be the first item listed, one of n-1 elements for the second item, and so on. The procedure terminates when k items have been listed. The number of possible sequences is

${\frac {n!}{(n-k)!}}=n(n-1)\cdots (n-k+1).$

Example: A recently formed music group has four original songs they can play. They are asked to perform two songs at a music festival. We wish to compute the number of song arrangements the group can offer in concert. Abstractly, this is equivalent to computing the number of 2-permutations of four songs. Thus, the number of distinct arrangements is 4!/2! = 12.

Example - Sampling without Replacement, with Ordering: An urn contains n balls numbered one through n. A ball is picked from the urn, and its number is recorded on an ordered list. The ball is not replaced in the urn. This procedure is repeated until k balls are selected from the urn, where $k\leq n$ . We wish to compute the number of possible sequences that results from this experiment. The number of possibilities is equivalent to the number of k-permutations of n elements, which is given by n!/(n-k)!.

## Combinations

Combinations: arrangement of objects without repetition, where order is NOT important. A combination of n objects, arranged in groups of size r, without repetition, and order being important. C(n, r) = n!/r!(n-r)! Another way to write a combination of n things, r at a time, is using the Binomial notation (Binomial Distribution). Combinations of this type are referred to verbally as "n choose r".

Consider the integer set S = {1, 2, ... , n}. A combination is a subset of S. We emphasize that a combination differs from a permutation in that elements in a combination have no specific ordering. The 2-element subsets of S = {1, 2, 3, 4} are {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4} , whereas the 2-permutations of S are more numerous with (1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3) . There are therefore fewer 2-element subsets of S than 2-permutations of S.

We can compute the number of k-element combinations of S = {1, 2, ... , n} as follows. First, we note that a k-permutation can be formed by first selecting k objects from S and then ordering them. There are k! distinct ways of ordering k objects. The number of k-permutations must then equal the number of k-element combinations multiplied by k!. Since the total number of k-permutations of S is n! / (n-k)!, the number of k-element combinations is found to be

${\frac {n!}{k!(n-k)!}}={\frac {n(n-1)\cdots (n-k+1)}{k!}}.$

This expression is called a binomial coefficient. Observe that selecting a k-element subset of S is equivalent to choosing the n-k elements that belong to its complement.

Example - Sampling without Replacement or Ordering: An urn contains n balls numbered one through n. A ball is drawn from the urn and placed in a jar. This process is repeated until the jar contains k balls, where $k\leq n$ . We wish to compute the number of distinct combinations the jar can hold after the completion of this experiment. Because there is no ordering in the jar, this amounts to counting the number of k-element subsets of a given n-element set, which is given by

${\frac {n!}{k!(n-k)!}}.$

Again, let S = {1, 2, ... , n}. Since a combination is also a subset and the number of k-element combinations of S is ${\frac {n!}{k!(n-k)!}}$ , the sum of the binomial coefficients over all values of k must be equal to the number of elements in the power set of S. Thus, we get

$\sum _{k=0}^{n}{\frac {n!}{k!(n-k)!}}=2^{n}.$

## Partitions

Abstractly, a combination is equivalent to partitioning a set into two subsets, one containing k objects and the other containing the n-k remaining objects. In general, the set S = {1, 2, ... , n } can be partitioned into r subsets. Let $n_{1},n_{2},\ldots ,n_{r}$  be nonnegative integers such that $\sum _{p=1}^{r}n_{p}=n.$

Consider the following iterative algorithm that leads to a partition of S. First, we select a subset of $n_{1}$  elements from S. Having chosen the first subset, we select a second subset containing $n_{2}$  elements from the remaining $n-n_{1}$  elements. We continue this procedure by successively choosing subsets of $n_{p}$  elements from the remaining $n-n_{1}-\cdots -n_{p-1}$  elements, until no element remains. This algorithm yields a partition of S into r subsets, with the p-th subset containing exactly $n_{p}$  elements.

We wish to count the number of such partitions. We know that there are ${\frac {n!}{n_{1}!(n-n_{1})!}}$  ways to form the first subset. Examining our algorithm, we see that there are

${\frac {(n-n_{1}-\cdots -n_{p-1})!}{n_{p}!(n-n_{1}-\cdots -n_{p})!}}$

ways to form the p-th subset. Using the counting principle, the total number of partitions is then given by

${\frac {n!}{n_{1}!n_{2}!\cdots n_{r}!}}.$

This expression is called a multinomial coefficient.

Example: A die is rolled nine times. We wish to compute the number of outcomes for which every odd number appears three times. The number of distinct sequences in which one, three, and five each appear three times is equal to the number of partitions of {1, 2, ... , 9} into three subsets of size three, namely

${\frac {9!}{3!3!3!}}=1680.$

In the above analysis, we assume that the size of each subset is fixed, three subsets of size three. Suppose instead that we are interested in counting the number of ways of picking the sizes of the subsets, r subsets of $n_{1},n_{2},\ldots ,n_{r}$  size, where the sum of the sizes is a constant. Specifically, we wish to compute the number of ways integers $n_{1},n_{2},\ldots ,n_{r}$  can be selected such that every integer is nonnegative and

$\sum _{p=1}^{r}n_{p}=n$ .

We can visualize the number of possible assignments as follows. Picture n balls spaced out on a straight line and consider r-1 vertical markers, each of which can be put between two consecutive balls, before the first ball, or after the last ball. For instance, if there are five balls and two markers then one possible assignement is illustrated below.

The number of objects in the first subset corresponds to the number balls before the first marker. Similarly, the number of objects in the p-th subset is equal to the number of balls between the p-th marker and the preceding one. Finally, the number of objects in the last subset is simply the number of balls after the last marker. In the figure, the integer assignment is

$(n_{1},n_{2},n_{3})=(0,2,3).$

Two consecutive markers imply that the corresponding subset is empty. There is a natural bijection between an integer assignment and the graphical representation depicted above. To count the number of possible integer assignments, it then suffices to compute the number of ways to position the markers and the balls. In particular, there are n + r - 1 positions, n balls, and r - 1 markers. The number of ways to assign the markers is equal to the number of n-combinations of n + r - 1 elements,

${\frac {(n+r-1)!}{n!(r-1)!}}.$

Example - Sampling with Replacement, without Ordering: An urn contains r balls numbered one through r. A ball is drawn from the urn and its number is recorded. The ball is then returned to the urn. This procedure is repeated a total of n times. The outcome of this experiment is a table that contains the number of times each ball has come in sight. We are interested in computing the number of distinct outcomes. This is equivalent to counting the ways a set with n elements can be partitioned into r subsets. The number of possible outcomes is therefore given by

${\frac {(n+r-1)!}{n!(r-1)!}}.$

## Computing Probabilities

In this section, we present two straightforward applications of combinatorics to computing the probability of winning the lottery.

Example - Pick 3 Texas Lottery: The Texas Lottery game Pick 3 is easy to play. A player must pick three numbers from zero to nine, and choose how to play them: exact order, or any order. The Pick 3 balls are drawn using three air-driven machines. These machines use compressed air to mix and select each ball.

The probability of winning while playing the exact order is

${\frac {1}{10}}{\frac {1}{10}}{\frac {1}{10}}={\frac {1}{1000}}.$

The probability of winning while playing any order depends on the numbers selected. If three distinct numbers are selected then the probability of winning is 3/500. If a number is repeated twice, the probability of winning is 3/1000. While, if the same number is selected three times, the probability of winning becomes 1/1000.

Example - Mega Millions Texas Lottery: To play the Mega Millions game, a player must select five numbers from 1 to 56 in the upper white play area of the play board, and one Mega Ball number from 1 to 46 in the lower yellow play area of the play board. All drawing equipment is stored in a secured on-site storage room. Only authorized drawings department personnel have keys to this door. Upon entry of the secured room to begin the drawing process, a lottery drawing specialist examines the security seal to determine if any unauthorized access has occurred. For each drawing, the Lotto Texas balls are mixed by four acrylic mixing paddles rotating clockwise. High speed is used for mixing and low speed for ball selection. As each ball is selected, it rolls down a chute into an official number display area. We wish to compute the probability of winning the Mega Millions Grand Prize, which require the correct selection of the five white balls plus the gold Mega ball. The probability of winning the Mega Millions Grand Prize is

${\frac {51!5!}{56!}}{\frac {1}{46}}={\frac {1}{175711536}}.$

# Conditional Probability

In some situations we need a new kind of probability. You are standing in front of three closed doors, and you know that there is a tiger behind one of them. To your relief, door #1 is opened and there is no tiger. What is the conditional probability that the tiger is behind door #2, given that there was no tiger behind door #1?

## Definition

The conditional probability of an event A, given the occurrence event B, is written as P(A|B). What Wikipedia calls Bayes' theorem is,

$P(A|B)={\frac {P(A\cap B)}{P(B)}}\,\Leftrightarrow$    $P(A\cap B)=P(A|B)P(B)=P(B|A)P(A)$

provided P(B) > 0. In the case where P(B) = 0, the conditional probability of A given B is meaningless, but for completeness we define P(A|B) = 0 for P(B) = 0. This formula is equivalent to: P(A|B) P(B) = P(A and B). The expression,P(A and B),is often written as P(A & B)

Independent events

Intuitively, we would like two events, A and B, to be independent if P(A|B) = P(A), that is, if A is as likely to occur whether or not B has occurred. In this case, the occurrence of A is independent of the occurrence of B.

However, P(A|B) = P(A) $\iff$  P(A and B)/P(B) = P(A) $\iff$  P(A and B) = P(A)P(B).

Therefore, A is independent from B $\iff$  B is independent from A $\iff$  P(A and B) = P(A)P(B).

### Examples

#### Even and prime numbers

Consider the integers {1,2,3,4,5}. Using the notation shown in the chart below, the set of odd numbers { 1, 3, 5 } = e'. The integer "1" not considered a prime number, while "2" is prime, so that: pe = { 2 }.

Category elements set compliment
Prime numbers 2,3,5 p denote "prime" p' denote "not prime"
Even number 2,4,6 e denotes "even" e' denotes "not even"

The set {1,2,3,4,5} contains three prime numbers, but only one of them ("2") is even. Therefore, the conditional probability that an element of the set {1,2,3,4,5} is even, given that it is prime, is P(even | prime) = 1/3.

The set {1,2,3,4,5} contains two even numbers, but only one of them is prime. Therefore, the conditional probability that an element of the set {1,2,3,4,5} is prime, given that it is even, is P(prime | even) = 1/2.

What is the probability that a randomly selected member of this set is both prime and even? There are three ways to answer this question. The first is direct and makes no use of conditional probability. Since the only number in the set that is both prime and even is "2", we have:

1)   $P({\text{even }}\&{\text{ prime}})=P(2)={\frac {1}{5}}$

The other two methods use P(A & B) = P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B):

2)   $P({\text{even }}\&{\text{ prime}})=P({\text{even}})\cdot P({\text{prime}}|{\text{even}})={\frac {2}{5}}\cdot {\frac {1}{2}}={\frac {1}{5}}$

3)   $P({\text{even }}\&{\text{ prime}})=P({\text{prime}})\cdot P({\text{even}}|{\text{prime}})={\frac {3}{5}}\cdot {\frac {1}{3}}={\frac {1}{5}}$

Can you use the three different methods to calculate P(even & prime) for elements in the set {1,2,3,4,5,6,7,8,9,10}?

#### Dice

We shall throw two dice: the probability of getting double six is 1/36. We have thrown the dice, but they are still under the cup: the probability of double six still is considered to be 1/36. Now we have someone else look under the cup at the result, and they tell us that at least one of the dice is a 6. What will the probability be of double six? Although we ask for the "probability" of double six, it's not the original probability we're talking about. It's a new kind of probability of the event "double six" under the condition that we know that another event has occurred, i.e. the event that one of the dice shows six. We call it "conditional probability". It indicates how likely the event "double six" is amongst all events in which (at least) one of the dice shows six. In 11 of the 36 possibilities at least one of the dice shows six. Thus, the probability of one of the dice showing six is 11/36 and the probability of double six is 1/36. Hence, the conditional probability of double six, given one die shows six is (1/36)/(11/36) = 1/11. We write this conditional probability as:

P( "double six" | "at least one six" ),

using the same letter P as for the original probability, and separating the event of which the conditional probability is indicated from the event that states the condition by a vertical line ("|").

#### Gender and occupation

As another example we pick at random an inhabitant of the UK. The probability of the chosen person to be a woman (W) is 1/2, it's nothing else than the fraction women amongst the British. What if we are informed that the chosen person is working in a hospital (H)? It now comes down to the fraction of women amongst hospital staff, let's say 0.7. So we have:

P(W) = 0.5

and

P(W|H) = 0.7.

# Random Variables

## Random Variables: Definitions

Formally, a random variable on a probability space $(\Omega ,\Sigma ,P)$  is a measurable real function X defined on $\Omega$  (the set of possible outcomes)

$X:\Omega \ \to \ \mathbb {R}$ ,

where the property of measurability means that for all real x the set

$\{X\leq x\}:=\{\omega \in \Omega |X(\omega )\leq x\}\in \Sigma$ , i.e. is an event in the probability space.

### Discrete variables

If X can take a finite or countable number of different values, then we say that X is a discrete random variable and we define the mass function of X, p($x_{i}$ ) = P(X = $x_{i}$ ), which has the following properties:

• p($x_{i}$ ) $\geq$  0
• $\sum _{i}p(x_{i})=1$

Any function which satisfies these properties can be a mass function.

Variables
We need some way to talk about the objects of interest. In set theory, these objects will be sets; in number theory, they will be integers; in functional analysis, they will be functions. For these objects, we will use lower-case letters: a, b, c, etc. If we need more than 26 of them, we’ll use subscripts.
Random Variable
an unknown value that may change everytime it is inspected. Thus, a random variable can be thought of as a function mapping the sample space of a random process to the real numbers. A random variable has either a associated probability distribution (discrete random variable) or a probability density function (continuous random variable).
Random Variable "X"
formally defined as a measurable function (probability space over the real numbers).
Discrete variable
takes on one of a set of specific values, each with some probability greater than zero (0). It is a finite or countable set whose probability is equal to 1.0.
Continuous variable
can be realized with any of a range of values (ie a real number, between negative infinity and positive infinity) that have a probability greater than zero (0) of occurring. Pr(X=x)=0 for all X in R. Non-zero probability is said to be finite or countably infinite.

### Continuous variables

If X can take an uncountable number of values, and X is such that for all (measurable) A:

$P(X\in A)=\int _{A}f(x)dx$ ,

we say that X is a continuous variable. The function f is called the (probability) density of X. It satisfies:

• $f(x)\geq \ 0\ \forall x\in \ \mathbb {R}$
• $\int _{-\infty }^{\infty }f(x)dx=1$

### Cumulative Distribution Function

The (cumulative) distribution function (c.d.f.) of the r.v. X, $F_{X}$  is defined for any real number x as:

$F_{X}(x)=P(X\leq x)={\begin{cases}\sum _{i:x_{i}\leq \ x}p(x_{i}),&{\mbox{if }}X{\mbox{ is discrete}}\\\,\\\int _{-\infty }^{x}f(y)dy,&{\mbox{if }}X{\mbox{ is continuous}}\end{cases}}$

The distribution function has a number of properties, including:

• $\lim _{x\to -\infty }F(x)=0$  and $\lim _{x\to \infty }F(x)=1$
• if x < y, then F(x) ≤ F(y) -- that is, F(x) is a non-decreasing function.
• F is right-continuous, meaning that F(x+h) approaches F(x) as h approaches zero from the right.

# Important Distributions

## Uniform Distribution

The uniform distribution is a model for "no preference". For example, if we roll a fair dice numbered from 1 to 6, each side has equal probability of facing up, and the probability of each number would be 1/6. For the continuous analog on of such a distribution, there is an equal probability of picking any range of a given size. For example, for a uniform distribution from 0 to 10, the probability of picking a real number between 1 and 2 is 1/10, the same as for picking a real number between 5.43 and 6.43.

This simple distribution forms the basis of much of the study of probability. It mirrors the type of activity observed from rolling a fair dice, or flipping a coin. This intuitively relates to many standard problems. A number of other distributions derive from this distribution. For instance, while any one roll of a dice has a uniform distribution, summing up the totals of rolling a dice lots of time, or taking their average, does not have a uniform distribution, but approximates a Gaussian distribution, which we will discuss later.

The uniform distribution on the interval [a,b] is a continuous distribution with probability density f given by:

$f(x)=\left\{{\begin{matrix}{\frac {1}{b-a}}&\ \ \ {\mbox{for }}ab,\\\\\mathrm {see} \ \mathrm {below} &\ \ \ {\mbox{for }}x=a{\mbox{ or }}x=b.\end{matrix}}\right.$

The value of the density at a or b can be 0, ${\frac {1}{b-a}}$ , or any other finite value without changing finite probabilities. For a more complete discussion of this point and for more information on the uniform distribution, see the Uniform distribution (continuous) page at Wikipedia.

A random variable X with this distribution is called a random number between a and b.

## Binomial Distribution

Many random experiments are of the type where one counts the number of successes in a series of a fixed number of independently repeated trials which may result in either success or failure. The distribution of the number of successes is a binomial distribution. It is a discrete probability distribution with two parameters, traditionally indicated by n, the number of trials, and p, the probability of success.

A well known example of such an experiment is the repeated tossing of a coin and counting the number of times "heads" comes up.

One says that the random variable X follows the binomial distribution with parameters n and p, and writes X ~ B(n, p) if the probability of getting exactly k successes is given by the probability mass function:

$p(k)={n \choose k}p^{k}(1-p)^{n-k}\,$

for $k=0,1,2,\dots ,n$  and where

${n \choose k}={\frac {n!}{k!(n-k)!}}$

is the binomial coefficient.

The formula is easily derived from the fact that from n trials k should be successful, having probability $p^{k}$  and nk failures, with probability $(1-p)^{n-k}$ . This gives a specific outcome a probability of $p^{k}(1-p)^{n-k}$  An outcome with exactly k successes, however, may occur in a number of specific orders, i.e. one has to choose k of the n outcomes to be successful. This leads to the binomial coefficient as factor to be multiplied.

For more information see the binomial distribution article at Wikipedia.

## The Poisson Distribution

The Poisson distribution with parameter $\lambda >0\,$  is a discrete probability distribution with mass function given by:

$p(k)={\frac {\lambda ^{k}}{k!}}e^{-\lambda }$

where k is a nonnegative integer.

The Poisson distribution is sometimes called: "the distribution of rare events", meaning it indicates the distribution of the number of events occurring in a fixed time interval, or fixed spatial area, where the occurrence of an event is "rare" in the sense that no two events may happen together.

The random variable X indicating this number of occurrences is called a Poisson random variable. The Poisson random variable is of fundamental importance in communication and queuing theory.

It is possible to obtain the Poisson distribution as the limit of a sequence of binomial distributions.

Let $\lambda$  be fixed and consider the binomial distributions with parameters n and $\lambda /n$ :

$p_{n}(k)={\frac {n!}{k!(n-k)!}}\left({\frac {\lambda }{n}}\right)^{k}\left(1-{\frac {\lambda }{n}}\right)^{n-k}={\frac {n!}{n^{k}(n-k)!}}{\frac {\lambda ^{k}}{k!}}\left(1-{\frac {\lambda }{n}}\right)^{n-k}.$

In the limit, as n approaches infinity, the binomial probabilities tend to a Poisson probability with parameter λ:

$\lim _{n\rightarrow \infty }p_{n}(k)=\lim _{n\rightarrow \infty }{\frac {n!}{n^{k}(n-k)!}}{\frac {\lambda ^{k}}{k!}}\left(1-{\frac {\lambda }{n}}\right)^{n-k}={\frac {\lambda ^{k}}{k!}}e^{-\lambda }.$

This result shows that the Poisson PMF can be used to approximate the PMF of a binomial distribution. Suppose that Y is a binomial random variable with parameters n and p. If n is large and p is small, the probability that Y equals k is not easily calculated, but may be approximated by a Poisson probability with parameter np:

$P(Y=k)={n \choose k}p^{k}\left(1-p\right)^{n-k}\approx {\frac {(np)^{k}}{k!}}e^{-np},$ .

The above-mentioned limiting process may be illustrated with the following example. Count the yearly number X of car accidents on an important roundabout. Let us assume on the average 8 accidents happen yearly. To calculate the distribution, we consider the 12 months and count a success when an accident has happened that month and else a failure. The number of successes will constitute a binomial random variable $X_{12}$  with parameters n=12 and p=8/12. As the average number is 8, it is still likely some months will meet more than one accident, so we consider weeks and count the number of weeks with an accident as successful. This number $X_{52}$  may be considered binomial distributed with parameters n=52 and p=8/52. While there may still be weeks with two accidents, we turn to days. The number $X_{365}$  of days with success may be considered binomial distributed with parameters n=365 and p=8/365. The limiting distribution by considering hours, minutes, seconds, etc. gives the Poisson distribution of X with parameter 8. We also note an important condition on the occurrence of the events (accidents): we only count them correctly if in the end we may separate them in disjoint time intervals. So it is not allowed that two accidents happen the same moment.

## Normal Distribution

The normal or Gaussian distribution is a thing of beauty, appearing in many places in nature. This probably is a result of the normal distribution resulting from the law of large numbers, by which a sum of many random variables (with finite variance) becomes a normally distributed random variable according to the central limit theorem.

Also known as the bell curve, the normal distribution has been applied to many social situations, but it should be noted that its applicability is generally related to how well or how poorly the situation satisfies the property mentioned above, whereby many finitely varying, random inputs result in a normal output.

The formula for the density f of a normal distribution with mean $\mu$  and standard deviation $\sigma$  is:

$f(x)={\frac {1}{\sigma {\sqrt {2\pi }}}}\,e^{-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}}$ .

A rather thorough article in Wikipedia could be summarized to extend the usefulness of this book: Normal distribution from Wikipedia.

# Links

Hyperlinks to external resources on probability:

1. Note the mixed notations, P(A & B) = P(A ∩ B). The former means the element is in A 'and the element is in B, while the latter means the element is in the intersection of sets A and B.