This Quantum World/Appendix/Probability

Probability

Basic Concepts

Probability is a numerical measure of likelihood. If an event has a probability equal to 1 (or 100%), then it is certain to occur. If it has a probability equal to 0, then it will definitely not occur. And if it has a probability equal to 1/2 (or 50%), then it is as likely as not to occur.

You probably know that a toss of a fair coin yields heads with probability 1/2, and that a cast of a fair die yields a 1 with probability 1/6. How do we know this?

There is a principle known as the principle of indifference, which states: if there are n mutually exclusive and jointly exhaustive possibilities, and if, as far as we know, there are no differences between the n possibilities apart from their names (such as "heads" or "tails"), then each possibility should be assigned a probability equal to 1/n. (Mutually exclusive: at most one possibility can be realized in a single trial. Jointly exhaustive: at least one possibility is realized in a single trial. Mutually exclusive and jointly exhaustive: exactly one possibility is realized in a single trial.)

Since this principle appeals to what we know, it concerns epistemic probabilities (a.k.a. subjective probabilities) or degrees of belief. If you are certain of the truth of a proposition, then you assign to it a probability equal to 1. If you are certain that a proposition is false, then you assign to it a probability equal to 0. And if you have no information that makes you believe that the truth of a proposition is more likely (or less likely) than its falsity, then you assign to it probability 1/2. Subjective probabilities are therefore also known as ignorance probabilities: if you are ignorant of any differences between the possibilities, you assign to them equal probabilities.

If we assign probability 1 to a proposition because we believe that it is true, we assign a subjective probability, and if we assign probability 1 to an event because it is certain that it will occur, we assign an objective probability. Until the advent of quantum mechanics, the only objective probabilities known were relative frequencies.

The advantage of the frequentist definition of probability is that it allows us to measure probabilities, at least approximately. The trouble with it is that it refers to ensembles. You can't measure the probability of heads by tossing a single coin. You get better and better approximations to the probability of heads by tossing a larger and larger number $N$ of coins and dividing the number $N_{H}$ of heads by $N.$ The exact probability of heads is the limit

p(H)=\lim _{N\rightarrow \infty }{\frac {N_{H}}{N}}.

The meaning of this formula is that for any positive number $\epsilon ,$ however small, you can find a (sufficiently large but finite) number $N_{0}$ such that, for every $N>N_{0},$

\left|p(H)-{\frac {N_{H}}{N}}\right|<\epsilon .
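The approach of $N_{H}/N$ to $p(H)$ can be illustrated with a small simulation. The following is only a sketch: it assumes that Python's pseudo-random generator is an adequate stand-in for a fair coin, and the function name `relative_frequency` and the fixed seed are illustrative choices, not part of the text above.

```python
import random

def relative_frequency(num_tosses, seed=0):
    """Return N_H / N for num_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# The relative frequency of heads for increasingly large N.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

A finite simulation can only suggest the limit; it never reaches it, which is exactly the trouble with ensembles noted above.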

The probability that $m$ events from a mutually exclusive and jointly exhaustive set of $n$ possible events happen is the sum of the probabilities of the $m$ events. Suppose, for example, you win if you cast either a 1 or a 6. The probability of winning is

p(1{\hbox{ or }}6)=p(1)+p(6)={\frac {1}{6}}+{\frac {1}{6}}={\frac {1}{3}}.

In frequentist terms, this is virtually self-evident. $N(1)/N$ approximates $p(1),$ $N(6)/N$ approximates $p(6),$ and $[N(1)+N(6)]/N$ approximates $p(1{\hbox{ or }}6).$
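The sum rule for the die can be checked with exact fractions. This is a minimal sketch; the names `p`, `winning`, and `p_win` are illustrative, and the probabilities of 1/6 come from the principle of indifference.

```python
from fractions import Fraction

# Principle of indifference: each of the six faces gets probability 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

# The winning faces are mutually exclusive, so their probabilities add.
winning = {1, 6}
p_win = sum(p[face] for face in winning)
print(p_win)  # 1/3
```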

The probability that two independent events happen is the product of the probabilities of the individual events. Suppose, for example, you cast two dice and you win if the total is 12. Then

p(6{\hbox{ and }}6)=p(6)\times p(6)={\frac {1}{6}}\times {\frac {1}{6}}={\frac {1}{36}}.

By the principle of indifference, there are now $6\times 6=36$ equiprobable possibilities, and casting a total of 12 with two dice is one of them.
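The counting argument can be spelled out by listing all ordered pairs of faces. This is a sketch with illustrative names (`outcomes`, `favourable`), assuming two fair dice that do not influence each other.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die): 6 x 6 = 36 possibilities.
outcomes = list(product(range(1, 7), repeat=2))

# Pairs whose total is 12; only (6, 6) qualifies.
favourable = [pair for pair in outcomes if sum(pair) == 12]

p_total_12 = Fraction(len(favourable), len(outcomes))  # by counting
p_product = Fraction(1, 6) * Fraction(1, 6)            # by the product rule
print(len(outcomes), p_total_12, p_product)  # 36 1/36 1/36
```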

It is important to remember that the joint probability $p(A,B)=p(A{\hbox{ and }}B)$ of two events $A,B$ equals the product of the individual probabilities $p(A)$ and $p(B)$ only if the two events are independent, meaning that the probability of one does not depend on whether or not the other happens. In terms of propositions: the probability that the conjunction $P_{1}{\hbox{ and }}P_{2}$ is true is the probability that $P_{1}$ is true times the probability that $P_{2}$ is true only if the probability that either proposition is true does not depend on whether the other is true or false. Ignoring this can have the most tragic consequences.
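A single die already supplies an example in which the product of the individual probabilities gives the wrong answer. In the sketch below, the events "even" and "greater than 3" are chosen purely for illustration; they are not independent, because knowing that the result is even makes a result greater than 3 more likely.

```python
from fractions import Fraction

faces = set(range(1, 7))
A = {f for f in faces if f % 2 == 0}  # even: {2, 4, 6}
B = {f for f in faces if f > 3}       # greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event for one cast of a fair die, by counting faces."""
    return Fraction(len(event), len(faces))

p_joint = prob(A & B)        # both happen: {4, 6}
p_naive = prob(A) * prob(B)  # product rule, wrongly applied
print(p_joint, p_naive)  # 1/3 1/4
```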

The general rule for the joint probability of two events is

p(A,B)=p(B|A)\,p(A)=p(A|B)\,p(B).

$p(B|A)$ is a conditional probability: the probability of $B$ given that $A$ happens or is true.

To see this, let $N(A,B)$ be the number of trials in which both $A$ and $B$ happen or are true. $N(A,B)/N$ approximates $p(A,B),$ $N(A,B)/N(A)$ approximates $p(B|A),$ and $N(A)/N$ approximates $p(A).$ But

p(A,B)\;{\stackrel {N\rightarrow \infty }{\longleftarrow }}\;{\frac {N(A,B)}{N}}={\frac {N(A,B)}{N(A)}}\times {\frac {N(A)}{N}}\;{\stackrel {N\rightarrow \infty }{\longrightarrow }}\;p(B|A)\,p(A).
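The identity between the counts holds exactly for any finite number of trials, as a simulated run can show. This sketch assumes a pseudo-random die; the events $A$ ("even") and $B$ ("greater than 3"), the seed, and the variable names are illustrative assumptions.

```python
import random
from fractions import Fraction

rng = random.Random(1)  # fixed seed (an assumption) for reproducibility
N = 10_000
rolls = [rng.randint(1, 6) for _ in range(N)]

n_a = sum(1 for r in rolls if r % 2 == 0)             # N(A): result is even
n_ab = sum(1 for r in rolls if r % 2 == 0 and r > 3)  # N(A,B): even and > 3

lhs = Fraction(n_ab, N)                        # approximates p(A,B)
rhs = Fraction(n_ab, n_a) * Fraction(n_a, N)   # approximates p(B|A) p(A)
print(lhs == rhs)  # True
print(float(lhs))  # close to the exact value 1/3 for large N
```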

An immediate consequence of this is Bayes' theorem:

p(B|A)={\frac {p(A|B)}{p(A)}}p(B).
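Bayes' theorem can be verified by counting on a single fair die. In this sketch the events $A$ ("even") and $B$ ("greater than 4") and the helpers `prob` and `cond` are illustrative choices.

```python
from fractions import Fraction

faces = set(range(1, 7))
A = {2, 4, 6}  # even
B = {5, 6}     # greater than 4

def prob(event):
    """Probability of an event for one cast of a fair die."""
    return Fraction(len(event), len(faces))

def cond(event, given):
    """p(event | given), obtained by counting within the faces of 'given'."""
    return Fraction(len(event & given), len(given))

direct = cond(B, A)                         # p(B|A) by counting
via_bayes = cond(A, B) / prob(A) * prob(B)  # p(A|B) p(B) / p(A)
print(direct, via_bayes)  # 1/3 1/3
```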

The following is just as readily established:

p(X)=p(X|Y)\,p(Y)+p(X|{\overline {Y}})\,p({\overline {Y}}),

where ${\overline {Y}}$ happens or is true whenever $Y$ does not happen or is false. The generalization to $n>2$ mutually exclusive and jointly exhaustive possibilities should be obvious.
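The same counting check works for this rule, including its generalization to more than two possibilities. The sketch below uses a fair die with illustrative events, and the helper `total_probability` is a hypothetical name; it accepts any list of mutually exclusive and jointly exhaustive events.

```python
from fractions import Fraction

faces = set(range(1, 7))

def prob(event):
    """Probability of an event for one cast of a fair die."""
    return Fraction(len(event), len(faces))

def cond(event, given):
    """p(event | given), obtained by counting within the faces of 'given'."""
    return Fraction(len(event & given), len(given))

def total_probability(event, partition):
    """Sum of p(event | part) * p(part) over all parts of the partition."""
    return sum(cond(event, part) * prob(part) for part in partition)

X = {4, 5, 6}     # greater than 3
Y = {2, 4, 6}     # even
Y_bar = faces - Y # odd

print(prob(X), total_probability(X, [Y, Y_bar]))  # 1/2 1/2
```

Passing a finer partition, such as `[{1, 2}, {3, 4}, {5, 6}]`, gives the same result.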

Given a random variable, which is a set $X=\{x_{1},\dots ,x_{n}\}$ of random numbers, we may want to know the arithmetic mean

\langle X\rangle ={\frac {1}{n}}\sum _{k=1}^{n}x_{k}={\frac {x_{1}+\cdots +x_{n}}{n}}

as well as the standard deviation, which is the root-mean-square deviation from the arithmetic mean,

\sigma (X)={\sqrt {{\frac {1}{n}}\sum _{k=1}^{n}(x_{k}-\langle X\rangle )^{2}}}.

The standard deviation is an important measure of statistical dispersion.
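Both formulas translate directly into code. The data set in this sketch is made up for illustration; the cross-check uses `statistics.pstdev` from the Python standard library, which computes the same root-mean-square deviation (dividing by $n$).

```python
import math
import statistics

xs = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, not from the text
n = len(xs)

mean = sum(xs) / n                                      # arithmetic mean
sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) # RMS deviation

print(mean, sigma)  # 5.0 2.0
print(math.isclose(sigma, statistics.pstdev(xs)))  # True
```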

Given $n$ possible measurement outcomes $x_{1},\dots ,x_{n}$ of $X$ with probabilities $p_{k}=p(x_{k}),$ we have a probability distribution $\{p_{1},\dots ,p_{n}\},$ and we may want to know the expected value of $X,$ defined by

\langle X\rangle =\sum _{k=1}^{n}p_{k}x_{k}

as well as the corresponding standard deviation

\sigma (X)={\sqrt {\sum _{k=1}^{n}p_{k}(x_{k}-\langle X\rangle )^{2}}},

which is a handy measure of the fuzziness of $X.$
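As a worked example, take $X$ to be the result of casting a fair die, so that every outcome has probability 1/6. The sketch below evaluates the two formulas with exact fractions; the variable names are illustrative.

```python
import math
from fractions import Fraction

outcomes = range(1, 7)
p = {x: Fraction(1, 6) for x in outcomes}  # fair die: equal probabilities

mean = sum(p[x] * x for x in outcomes)                    # expected value
variance = sum(p[x] * (x - mean) ** 2 for x in outcomes)  # squared deviation
sigma = math.sqrt(variance)                               # standard deviation

print(mean, variance)  # 7/2 35/12
print(sigma)           # about 1.708
```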

We have defined probability as a numerical measure of likelihood. So what is likelihood? What is probability apart from being a numerical measure? The frequentist definition covers some cases, the epistemic definition covers others, but which definition would cover all cases? It seems that probability is one of those concepts that are intuitively meaningful to us, but — just like time or the experience of purple — cannot be explained in terms of other concepts.
