Statistics/Numerical Methods/Random Number Generation

Definition and Usage

Random Numbers → ”A sequence of integers or group of numbers which show absolutely no relationship to each other anywhere in the sequence. At any point, all integers have an equal chance of occurring, and they occur in an unpredictable fashion”

Many statistical methods rely on random numbers. The use of random numbers, however, has expanded beyond random sampling or random assignment of treatments to experimental units. More common uses now are in simulation studies of physical processes, of analytically intractable mathematical expressions, or of a population resampling from a given sample from that population. These three general areas of application are sometimes called simulation, Monte Carlo, and resampling.

Uniform Random Number Generation

Given an infinite sequence $R_{0},R_{1},...,R_{n},...$ most people’s notion of random would include that the numbers be uniformly distributed in the interval (0, 1). We denote this distribution by U(0, 1). I will present in this paper the two ways of generating Uniform Random Numbers : Linear Congruential Generators and Tausworthe Generators.

Linear Congruential Generators

D. H. Lehmer in 1948 proposed the linear congruential generator as a source of random numbers. In this generator, each single number determines its successor. The form of the generator is

                        $X_{i+1}$  = (a $X_{i}$  + c) mod m   , with  0 ≤  $X_{i}$  ≤ m .

m is called modulus. $X_{0}$ , a, and c are known as the seed, multiplier, and the increment respectively.

For example, consider m = 31, a = 7, c = 0 and begin with $X_{0}$ = 19. The next integers in the sequence are

                      9, 1, 7, 18, 2, 14, 5, 4, 28, 10, 8, 25, 20, 16, 19,

so, of course, at this point the sequence begins to repeat.

If we would have taken a = 3 instead of a = 7, we would have got:

     26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28, 22, 4, 12, 5, 15,
              14, 11, 2, 6, 18, 23, 7, 21, 1, 3, 9, 27, 19

From the simple example above we can guess that modulus, multiplier, and increment play a role in the period length of the linear congruential generator.

The Period → The period is the smallest positive integer λ for which $X_{\lambda }=X_{0}$ . The period can be no greater than m. Therefore, m is chosen to be equal or nearly equal to the largest representable integer in the computer to get a long period.

A full period generator is one in which the period is m, and it is obtained iff:

1. c is relatively prime to m; 2. (a-1) is a multiple of q, for each prime factor q of m; 3. (a-1) is a multiple of 4, if m is.

The Increment →

If c > 0 , we can achieve a full period by such :

1- m = $2^{b}$ → faster computer arithmetic,

2- Set (a-1) as a multiple of 4,

3- c should be odd-valued,

4- Set b as high as possible. For example b=31 in a 32-bit computer.

If c = 0, a full period is not possible. A maximum number of random variables, then, can be achieved by such :

1-A maximum period generator, with λ = m − 1, is one in which a is a primitive element modulo m, if m is prime.

                           i. a mod m 6= 0
                           ii. a(m−1)/q modm 6= 1 , for each prime q of (m-1)

2-Given preceding comments, m is often set to the largest prime number less than 2b. The most commonly used modulus is the Mersenne prime $2^{31}-1$ .

maximum period → m = prime, a = primitive element modulo m

Structure in the Generated Numbers

The idea behind the generated random number is that the numbers should be really random. That means that the numbers should appear to be distributionally independent of each other; that is, the serial correlations should be small. How bad the structure of a sequence is (that is, how much this situation causes the output to appear nonrandom) depends on the structure of the lattice.

Consider the output of the generator with m =31 and a =3 that begins with $x_{0}$ =19. Plot the successive pairs

                               (27, 19), (19, 26), (26, 16)...

As it can be seen easily from the Figure 1.2, the successive pairs of random numbers lie only on three lines. The generated numbers does not seem to be random. They rather seem to be correlated. From a visual perspective we can conclude that a generator with small number of lines does not cover the space well and has a bad lattice structure.

MacLaren and Marsaglia (1965) suggest that the output stream of a linear congruential random number generator be shuffled by using another generator to permuate subsequences from the original generator.

By this way the period can be increased and the shuffling of the output can also break up the bad lattice structure.

Bays-Durham Shuffling of Uniform Deviates

Bays and Durham suggest using a single generator to fill a table of length k and then using a single stream to select a number from the table and replenish the table. After initializing a table T to contain $x_{1},x_{2},...,x_{k},$ set i = k+1 and generate $x_{i}$ to use as an index to the table. Then update the table with $x_{i+1}$ .

For example, with the generator used in Figure 1.3, which yielded the sequence

             27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28, 22, 4, 12, 5, 15,
                           14, 11, 2, 6, 18, 23, 7, 21, 1, 3, 9,

we select k = 8, and initialize the table as

                                 27, 19, 26, 16, 17, 20, 29, 25.

We then use the next number, 13, as the first value in the output stream, and also to form a random index into the table. If we form the index as 13 mod8 + 1, we get the sixth tabular value, 20, as the second number in the output stream. We generate the next number in the original stream, 8, and put it in the table, so we now have the table

                                 27, 19, 26, 16, 17, 8, 29, 25.

Now we use 20 as the index to the table and get the fifth tabular value, 17, as the third number in the output stream. By continuing in this manner to yield 10,000 deviates, and plotting the successive pairs, we get Figure 1.6.

Tausworthe Generators

Tausworthe (1965) introduced a generator based on a sequence of 0`s and 1`s generated by a recurrence of the form

$b_{i}\equiv (a_{p}b_{i-p}+a_{p-1}b_{i-p+1}+...+a_{1}b_{i-1}){\textit {mod}}2$

where all variables take on values of either 0 or 1.

For computational efficiency, most of the a`s in the equation set to be zero. Then we get,

                                 $b_{i}\equiv (b_{i-p}+b_{i-p+q}){\textit {mod}}2$

After this recurrence has been evaluated a sufficient number of times, say l, the l-tuple of a`s is interpreted as a number in base 2. This is referred to as an l-wise decimation of the sequence of a`s.

As an example take a primitive trinomial modulo 2,

                                         x4 + x + 1,

and begin with the bit sequence

                                         1, 0, 1, 0.

For this polynomial, p = 4 and q = 3 in the recurrence. Operating the generator, we obtain

                          1, 1, 0, 0,1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0,

at which point the sequence repeats. A 4-wise decimation yields the numbers

                                        12, 8, 15, 5, …

As with the linear congruential generators, different values of the c`s and even the starting values of the a`s can yield either good or bad generators.

Goodness-of-Fit Tests

In this part, I will introduce you the basic Goodness-of-Fit Tests, with which we can evaluate the sequence of random number we created using the above mentioned methods.

Chi-Squared Goodness-of-Fit Tests

             $\chi ^{2}=\sum _{i=1}^{k}{\frac {(o_{i}-e_{i})^{2}}{e_{i}}}$

→ The null hypothesis: A random variable has a uniform (0,1) distribution

→ Count the number of observation in each of the ten intervals

                      (0, 0.1], (0.1, 0.2], ..., (0.9, 1.0)

→ Compare those counts with the expected numbers.

→ If the observed numbers are significantly different from the expected numbers, we have reason to reject the null hypothesis.

The Kolmogorov-Smirnov Test

This test compares the empirical cumulative distribution function (c.d.f) ${\hat {F}}(.)$ with a theoretical c.d.f F(.).

→ H0: F(x) = x, 0 _ x < 1

→ Rank the observation so that $R^{(1)}\leq R^{(2)}...\leq R^{n}$

${\hat {F}}(x)={\frac {i}{n}}$ ,and $R^{(n+1)}\equiv 1$

→ where $R^{(0)}\equiv 0$ and $R^{(n+1)}\equiv 1$ .

→ The test statistic measures the size of the largest difference between these two:

$D_{n}=max_{0\leq x<1}$ $\left|{\hat {F}}(x)-F(x)\right|$

Linear Dependence Test

→One way to test for dependencies between numbers in a sequence is to restrict such examination to linear dependence between observations which are separated by k numbers.

→ Given a realization of n random numbers $R_{0},R_{1},\ldots ,R_{n}$ the sample covariance of lag k is

$C_{k}=(n-k)^{-1}\sum _{i=1}^{n-k}(R_{i}-{\frac {1}{2}})(R_{i+k}-{\frac {1}{2}}).$

→ Under randomness $E[C_{k}]=0$ .