General Genetics/Population Genetics

Population genetics is the field of genetics that studies allele distributions and genetic variation in populations. Population geneticists study the effects of mutation, migration, natural selection and genetic drift on populations, and in doing so they study evolution as it occurs.

Editor's note
Chapter or book approach: This chapter (and book) works through a series of models. The first is the Hardy-Weinberg model; subsequent models progressively move away from its premises.

Foundations of Population Genetics

Templeton states that population genetics rests on three premises:

  1. Templates/DNA can replicate
  2. Templates/DNA can mutate and recombine
  3. Phenotypes emerge from the interaction of templates/DNA and environment.

Replication of Populations

There are three major properties that a population must maintain through replication:

  1. They are composed of reproducing individuals
  2. They are distributed over space and time
  3. They host a population of genes

The first property indicates that the individuals of a population must reproduce for the population to persist. This is necessary because individuals break down over time: entropy accumulates, and an individual cannot indefinitely remove the disorder introduced by its environment. To maintain the population, individuals must therefore pass their DNA, or organizational encoding, to the next generation. Through continuous reproduction, a population can persist far longer than any of the individuals that comprise it. In addition, continuous reproduction over time allows the population to have properties and components of its own.

The second property is that a population is distributed over space. Populations can exist as:

  • small isolated groups
  • a collection of groups with a varied amount of genetic exchange
  • a large interbreeding population that exists over a vast space

In general, though, a population can be divided into a primary group whose members interbreed and a secondary group that mates only occasionally with the primary group. Population geneticists generally study the primary group because it is relatively stable. They define it as a group of interbreeding individuals that share a common system of mating. The secondary group is generally ignored and treated as noise in the system, unless it has a major effect on the primary group.

The third property that must be maintained through reproduction is the population's gene pool: the collection of all the genes (organizational templates) in the population that can be used to create new individuals. By studying the gene pool, geneticists can determine the frequencies of alleles, and/or groups of alleles, in the population and how these frequencies change over time. From the resulting patterns, geneticists can then begin to infer which forces are acting on the population.

Template Mutations

Change is a requirement of evolution. One way of introducing change is to modify the templates used to sustain the population. In living organisms, these templates are genes.

Sources of Mutation:

  • Insertions
  • Deletions
  • Single Nucleotide Substitutions (sometimes changing the protein sequences and sometimes not)
  • Transpositions
  • Duplications

An allele is an alternative form of a template; in biological systems, an allele is a form of a gene. Zooming further out, a particular version of a region of templates is called a haplotype. In a biological system this region is a stretch of nucleotide sequence, while in a computer system it would be a sequence of linked objects.

Modeling Evolution

Initially we consider populations with simple genetic architectures: at most two loci, each with two alleles.

Hardy-Weinberg Model

The genetic architecture of the Hardy-Weinberg model is one locus with two alleles (Templeton, p. 35).

Hardy-Weinberg Equilibrium

If a randomly mating population has no evolutionary forces acting on it, it is in Hardy-Weinberg equilibrium. Quantitatively, if the proportions of two alleles A and a are denoted p and q (with p + q = 1), then the genotype proportions are p^2 for the homozygote AA, 2pq for the heterozygote Aa, and q^2 for the homozygote aa.

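Testing for Hardy-Weinberg

Whether a population is in Hardy-Weinberg proportions is commonly tested with a chi-square (\chi^2) goodness-of-fit test. Here observed and expected are the numbers of individuals (counts) of each genotype:

 \chi ^2 = \sum \frac{(observed - expected)^2}{expected}

The degrees of freedom equal the number of genotypes minus the number of alleles (3 - 2 = 1 for one locus with two alleles). If \chi^2 is less than the tabulated critical value, we fail to reject the hypothesis that the population is in Hardy-Weinberg equilibrium.

Note that the superscript in \chi^2 is part of the statistic's traditional name. \chi^2 is computed directly from the sum above, not by squaring some quantity \chi.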

Concise Chi-Square Table

Critical values at the 5% significance level:

Degrees of Freedom 1 2 3
χ² Value 3.84 5.99 7.81


Example

Consider a population with genotype proportions AA: 0.1, Aa: 0.4, aa: 0.5.

So the allele proportions are  p = 0.1 + \frac{1}{2} 0.4 = 0.3

 q = 0.5 + \frac{1}{2} 0.4 = 0.7

So our expected values are AA: 0.09 Aa: 0.42 aa: 0.49.

 \chi ^2 = \sum \frac{(observed - expected)^2}{expected}
 \chi ^2 = \frac{(0.1-0.09)^2}{0.09} + \frac{(0.4 - 0.42)^2}{0.42}+ \frac{(0.5 - 0.49)^2}{0.49}
 \chi ^2 = 0.002 < 3.84

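Therefore we fail to reject the hypothesis that the population is in Hardy-Weinberg equilibrium, and there is no evidence of evolutionary forces acting on these alleles. Note that the test should properly be applied to counts of individuals. For a sample of N individuals, using counts instead of proportions multiplies \chi^2 by N, so the conclusion depends on the sample size.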
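The following Python sketch of the test is not from the source. It works from genotype counts, because the statistic is defined on counts. The sample size N = 100, used to turn the example's proportions into counts, is a hypothetical assumption, and the name hw_chi_square is illustrative.

```python
def hw_chi_square(counts):
    """Chi-square goodness-of-fit to Hardy-Weinberg for one biallelic locus.

    counts: observed numbers of (AA, Aa, aa) individuals.
    """
    n_dom, n_het, n_rec = counts
    n = n_dom + n_het + n_rec
    p = (2 * n_dom + n_het) / (2 * n)        # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

CRITICAL_1DF = 3.84   # 5% critical value for 1 degree of freedom (concise table)

n = 100               # hypothetical sample size
counts = tuple(round(f * n) for f in (0.1, 0.4, 0.5))   # (10, 40, 50)
chi2 = hw_chi_square(counts)
print(round(chi2, 4), chi2 < CRITICAL_1DF)
```

The resulting statistic, about 0.227, is 100 times the value of about 0.0023 obtained from proportions. The conclusion is the same here, but a much larger sample with the same proportions could exceed the critical value.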

Two Autosomal Loci, Two Allele Model

We now extend the Hardy-Weinberg model to two autosomal loci, each with two alleles.

For the purpose of this discussion, the first locus will have alleles A and a and the second locus will have B and b. From this we can get the following gamete types and their frequencies through recombination:

Gamete Frequency
AB FreqAB
Ab FreqAb
aB FreqaB
ab Freqab
Sum 1

A population producing the above four gametes, with gametes uniting at random, can produce the following genotypes:

Gametes AB Ab aB ab
AB FreqAB . FreqAB FreqAB . FreqAb FreqAB . FreqaB FreqAB . Freqab
Ab FreqAb . FreqAB FreqAb . FreqAb FreqAb . FreqaB FreqAb . Freqab
aB FreqaB . FreqAB FreqaB . FreqAb FreqaB . FreqaB FreqaB . Freqab
ab Freqab . FreqAB Freqab . FreqAb Freqab . FreqaB Freqab . Freqab

Examining the table above shows that there are only ten distinct genotypes, because a pair such as AB/Ab is the same genotype as Ab/AB. Four of these genotypes are homozygous (the same gamete type twice), and the remaining six are heterozygous.
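The following short Python check is not from the source. It collapses the 4 × 4 table into distinct genotypes by treating each pair of gametes as unordered. The names GAMETES and unordered_genotypes are illustrative.

```python
from itertools import product

GAMETES = ("AB", "Ab", "aB", "ab")

def unordered_genotypes(gametes=GAMETES):
    """Collapse ordered gamete pairs into distinct diploid genotypes.

    AB/Ab and Ab/AB are the same genotype, so each pair is stored sorted.
    """
    return {tuple(sorted(pair)) for pair in product(gametes, repeat=2)}

genotypes = unordered_genotypes()
homozygous = {g for g in genotypes if g[0] == g[1]}
print(len(genotypes), len(homozygous), len(genotypes) - len(homozygous))  # 10 4 6
```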

[Figure: Population Genetics - Gamete Mix (4 to 10)]

Notice that the sum of FreqAB, FreqAb, FreqaB, and Freqab is one. This follows from the earlier model where the sum of p and q equaled one.

Recombination

(This section will discuss how the recombination term r is obtained.)

Homozygous Case

AB/AB or ab/ab or aB/aB or Ab/Ab

Single Heterozygous Case

AB/Ab or AB/aB or aB/ab or Ab/ab

Double Heterozygous Case

AB/ab or Ab/aB

Note that recombination is only noticeable in double heterozygotes.
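The following Python sketch is not from the source. It computes the gametes produced by a single diploid individual, assuming that r is the probability that a gamete carries a recombinant combination of the two loci (0 ≤ r ≤ 0.5). It shows why recombination is only noticeable in double heterozygotes: in a single heterozygote, the recombinant gametes are identical to the parental ones. The name gametes_from is hypothetical.

```python
def gametes_from(hap1, hap2, r):
    """Gamete frequencies produced by a diploid with haplotypes hap1/hap2.

    hap1, hap2: two-letter haplotypes such as "AB" (locus 1, then locus 2).
    r: recombination fraction between the two loci.
    """
    out = {}
    # Parental (non-recombinant) gametes: probability (1 - r) / 2 each.
    for g in (hap1, hap2):
        out[g] = out.get(g, 0.0) + (1 - r) / 2
    # Recombinant gametes swap the second-locus allele: probability r / 2 each.
    for g in (hap1[0] + hap2[1], hap2[0] + hap1[1]):
        out[g] = out.get(g, 0.0) + r / 2
    return out

single = gametes_from("AB", "Ab", r=0.3)   # AB 0.5, Ab 0.5 whatever r is
double = gametes_from("AB", "ab", r=0.3)   # AB, ab 0.35 each; Ab, aB 0.15 each
print(single, double)
```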

Linkage Disequilibrium

(This section will discuss how the linkage disequilibrium term D is obtained.)

Linkage disequilibrium measure, \delta

Formally, if we define pairwise LD, we consider indicator variables on alleles at two loci, say I_1, I_2. We define the LD parameter \delta as:

\delta := \operatorname{cov}(I_1, I_2) = h_{12} - p_1 p_2 = h_{AB}h_{ab}-h_{Ab}h_{aB}

Here p_1 and p_2 denote the marginal frequencies of the two alleles (for example A and B) at the two loci, and h_{12} denotes the frequency of the haplotype carrying both alleles (for example h_{AB}). Various measures derived from this parameter have been developed. In the genetics literature, saying that two alleles "are in LD" usually implies \delta \ne 0; conversely, linkage equilibrium denotes the case \delta = 0.
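The following Python sketch is not from the source. It computes δ both as h_{12} - p_1 p_2 (the covariance of the allele indicators) and as h_{AB}h_{ab} - h_{Ab}h_{aB}. The two forms agree whenever the four haplotype frequencies sum to 1. The haplotype frequencies used below are hypothetical.

```python
def linkage_disequilibrium(h):
    """Return delta in covariance form and in cross-product form.

    h: haplotype frequencies with keys "AB", "Ab", "aB", "ab" summing to 1.
    """
    freq_A = h["AB"] + h["Ab"]            # p_1: allele A at locus 1
    freq_B = h["AB"] + h["aB"]            # p_2: allele B at locus 2
    cov_form = h["AB"] - freq_A * freq_B  # h_12 - p_1 p_2
    cross_form = h["AB"] * h["ab"] - h["Ab"] * h["aB"]
    return cov_form, cross_form

in_ld = linkage_disequilibrium({"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4})
in_le = linkage_disequilibrium({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25})
print(in_ld, in_le)   # both forms about 0.15, then both 0 (linkage equilibrium)
```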

Pulling Information from the Model

Is Evolution Occurring

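If r > 0 and D \ne 0, then evolution is occurring. Under random mating, recombination changes the gamete frequencies each generation, and D decays by a factor of (1 - r) per generation.

If r = 0 or D = 0, then recombination does not change the gamete frequencies, and (with no other forces acting) no evolution is occurring at these two loci.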

Mutation and Disequilibrium - Normalized Linkage Disequilibrium

Mating Systems

To do:
To be created.

Selection

To do:
To be created.

Absolute Fitness

Relative Fitness

Frequency before selection

Frequency after selection

Genetic Drift

Editor's note
This section is in a total flux right now and is being outlined.

The first experiments were done in the early 1930s. German was a major scientific language before the Second World War.

Genetic drift is a function of population size N.

The effects of genetic drift are inversely proportional to population size: as the population grows, the deviation from expected allele frequencies decreases (Templeton, p. 84).
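The following Python sketch is not from the source. It makes "inversely proportional" concrete under the Wright-Fisher model, an idealization assumed here rather than stated in the text. In that model the next generation's 2N gene copies are a binomial sample, so the variance of the allele frequency after one generation is pq/(2N). The sketch checks this by simulation. The name drift_variance and the replicate count are hypothetical.

```python
import random

def drift_variance(p, n, replicates=5000, seed=1):
    """Simulated variance of allele frequency after one generation of drift.

    Each replicate draws 2N gene copies, each carrying A with probability p.
    """
    rng = random.Random(seed)
    two_n = 2 * n
    freqs = [sum(rng.random() < p for _ in range(two_n)) / two_n
             for _ in range(replicates)]
    mean = sum(freqs) / replicates
    return sum((f - mean) ** 2 for f in freqs) / replicates

for n in (10, 100):
    simulated = drift_variance(0.5, n)
    theory = 0.5 * 0.5 / (2 * n)    # pq / (2N)
    print(n, round(simulated, 5), theory)
```

Increasing N tenfold reduces the variance roughly tenfold.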

Genetic drift is non-directional.

There is no tendency to return to ancestral allele frequencies.

Genetic drift is a cumulative function. Changes in allele frequencies from the previous generations are added to the changes that occur in the current generation.

Genetic drift occurs only when there is genetic variability (more than one allele at a locus).

To study genetic drift this section will create a simplified model and expand upon it. The model will just have two states, a) genetic drift is occurring and b) genetic drift is not occurring.

  • a -> b: fixation
  • b -> a: mutations causing differentiable patterns
  • b -> a: reintroduction of differentiable patterns

Genetic drift can be broken down into two simplified states to start. The f

Example

In each population, 2N = 32 and p_0 = q_0 = 0.5 initially. Allele frequencies were then followed for 19 generations (Buri, P. 1956. "Gene frequency in small populations of mutant Drosophila." Evolution 10: 367-402).

(See Figure 6.3 from Hedrick, P.W. 2005, Genetics of populations, 3rd edition. Jones and Bartlett, Sudbury, MA)

Note that in this figure the variance of allele frequency among populations increases, while the mean allele frequency over populations stays roughly the same.

Simulation of Genetic Drift

Genetic drift can be simulated using a Monte Carlo simulation (see Figure 6.2 in Hedrick).

The proportion of populations expected to go to fixation for a given allele is equal to the initial frequency of that allele.

Drift changes only the allele frequency within each population and the distribution of allele frequencies among populations. The mean allele frequency over multiple replicate populations does not change due to genetic drift.

Heterozygosity, or the variance of allele frequency among replicate populations, can be used to quantify genetic drift.

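We can calculate the number of generations needed to reduce heterozygosity to a given fraction of its initial value:

t = -2N \ln(x), where x = H_t/H_0 is the fraction of the initial heterozygosity remaining and N is the population size. This follows from the approximation H_t \approx H_0 e^{-t/(2N)}.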

Coalescence Theory

To do:
To be created.

Further Reading in Wikipedia

Glossary

Gene Pool - p. 37 and p. 38 of Templeton's book

Formulas

Hardy-Weinberg

Example 1

Homozygotes Heterozygotes Sum
Observed 85 15 100
Expected 60 40 100

The probability of observing these numbers is:


 P=\frac{N!}{(N_{homo,~observed}!)\cdot(N_{hetero,~observed}!)}\cdot freq(Homo_{expected})^{N_{homo,~observed}}\cdot freq(Hetero_{expected})^{N_{hetero,~observed}}


P=\frac{100!}{(85!)\cdot(15!)}\cdot (0.6)^{85}\cdot (0.4)^{15}
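The following Python sketch is not from the source. It evaluates this binomial probability, using math.comb (Python 3.8+) to supply N!/(N_homo! N_hetero!). The name binomial_probability is hypothetical.

```python
from math import comb

def binomial_probability(n_homo_obs, n_hetero_obs, freq_homo_exp, freq_hetero_exp):
    """Probability of exactly this homozygote/heterozygote split."""
    n = n_homo_obs + n_hetero_obs
    return (comb(n, n_homo_obs)
            * freq_homo_exp ** n_homo_obs
            * freq_hetero_exp ** n_hetero_obs)

prob = binomial_probability(85, 15, 0.6, 0.4)
print(prob)
```

The result is very small, because 85 homozygotes lies far from the 60 expected.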

Example 2

Genotype Count
Homodom 20
Hetero 50
Homorec 30

What are the observed frequencies of p and q?

p = \frac{2\cdot Homo_{dom} + Hetero}{2\cdot N}

q = \frac{2\cdot Homo_{rec} + Hetero}{2\cdot N}


p = \frac{2\cdot 20 + 50}{2\cdot 100} = 0.45

q = \frac{2\cdot 30 + 50}{2\cdot 100} = 0.55

What are the expected number of genotypes?

freq_{homo,~dom}=p^2, \quad N_{homo,~dom}=freq_{homo,~dom}\cdot N
freq_{hetero}=2pq, \quad N_{hetero}=freq_{hetero}\cdot N
freq_{homo,~rec}=q^2, \quad N_{homo,~rec}=freq_{homo,~rec}\cdot N

Do the observed genotypes fit H-W expectation?

\chi^2=\sum\limits^{n}_{i=1}\frac{(observed_i-expected_i)^2}{expected_i}

G=2\sum\limits^{n}_{i=1}observed_i \cdot \ln{\frac{observed_i}{expected_i}}

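If \chi^2 (or G) is greater than the critical value for the appropriate degrees of freedom (3.84 for one degree of freedom at the 5% level), the null hypothesis, in this case that the genotypes fit Hardy-Weinberg proportions, can be rejected.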
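The following Python sketch is not from the source. It works through this example: it estimates p and q from the counts, forms Hardy-Weinberg expected numbers, and computes both \chi^2 and G. The name hw_tests is hypothetical. Classes with zero observed count are skipped in G, since o·ln(o/e) tends to 0.

```python
import math

def hw_tests(n_dom, n_het, n_rec):
    """Allele frequencies, expected counts, chi-square and G for one locus."""
    n = n_dom + n_het + n_rec
    p = (2 * n_dom + n_het) / (2 * n)
    q = (2 * n_rec + n_het) / (2 * n)
    observed = (n_dom, n_het, n_rec)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return p, q, expected, chi2, g

p, q, expected, chi2, g = hw_tests(20, 50, 30)
print(p, q, [round(e, 2) for e in expected], round(chi2, 4), round(g, 4))
```

Both statistics come out near 0.01, far below 3.84, so the Hardy-Weinberg hypothesis is not rejected.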

Example 3 - Next Generation

For an autosomal locus

p_{next~generation}=homo_{dom} + \frac{1}{2}\cdot hetero

q_{next~generation}=1-p_{next~generation}

Example 4 - Sex Linked (1 locus, 2 allele)

p_{next,~male}=p_{female}

q_{next,~male}=q_{female}

p_{next,~female}=\frac{p_{female}+p_{male}}{2}

q_{next,~female}=\frac{q_{female}+q_{male}}{2}

It is possible to solve for the number of generations t needed for the initial difference between female and male allele frequencies (deviation_{initial} = p_{female} - p_{male}) to shrink to or below a particular threshold. By the recursion above, the deviation is multiplied by -1/2 each generation:

\frac{deviation_{t}}{deviation_{initial}}=(-0.5)^{t}


t=\left\lceil \frac{\log{}\frac{|deviation_{threshold}|}{|deviation_{initial}|}}{\log0.5}\right\rceil
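The following Python sketch is not from the source. It iterates the sex-linked recursion directly and compares the result with the closed form, taking the smallest integer t such that 0.5^t ≤ threshold/|deviation_initial|. The function names and the starting values p_female = 1, p_male = 0 are hypothetical.

```python
import math

def sex_linked_generations(p_female, p_male, threshold):
    """Generations until |p_female - p_male| <= threshold, by direct iteration."""
    t = 0
    while abs(p_female - p_male) > threshold:
        # Males get their X from their mothers; females get one X from each parent.
        p_female, p_male = (p_female + p_male) / 2, p_female
        t += 1
    return t

def sex_linked_formula(deviation_initial, threshold):
    """Smallest integer t with 0.5**t <= threshold / |deviation_initial|."""
    return math.ceil(math.log(threshold / abs(deviation_initial)) / math.log(0.5))

print(sex_linked_generations(1.0, 0.0, 0.01), sex_linked_formula(1.0, 0.01))  # 7 7
```

The closed form can be off by one when the ratio is an exact power of 0.5, because of floating-point rounding. The iteration does not have that problem.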


Example 5 - Inbreeding - Self Pollination

Genotype Count
Homodom 20
Hetero 50
Homorec 30

Calculate F Value

p = \frac{2\cdot homo_{dom} + hetero}{2N}

q = \frac{2\cdot homo_{rec} + hetero}{2N}

Heterozygosity_{observed}=H_{o}=\frac{Hetero_{observed}}{N}


Heterozygosity_{ expected } =H_{e} = 2pq\,


F=1-\left( \frac{H_{observed}}{H_{expected}}\right)

Chi-squared then can be calculated by:

\chi^{2}=F^{2}N\,
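The following Python sketch is not from the source. It applies these formulas to the table above. The name inbreeding_f is hypothetical.

```python
def inbreeding_f(n_dom, n_het, n_rec):
    """Inbreeding coefficient F = 1 - Ho/He and its chi-square, F^2 N."""
    n = n_dom + n_het + n_rec
    p = (2 * n_dom + n_het) / (2 * n)
    q = (2 * n_rec + n_het) / (2 * n)
    h_obs = n_het / n
    h_exp = 2 * p * q
    f = 1 - h_obs / h_exp
    return f, f * f * n

f, chi2 = inbreeding_f(20, 50, 30)
print(round(f, 4), round(chi2, 4))
```

F is slightly negative here (about -0.01), which indicates a small excess of heterozygotes rather than inbreeding. Its \chi^2 of about 0.01 is far below 3.84.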

Example 6 - Inbreeding (1 Locus / 2 alleles)

Given the frequency of the dominant allele and an F value:

Allele Frequency
p 0.6
q 0.4


Heterozygosity_{expected}=H_{e}=2pq\,


Heterozygosity_{observed}=H_{o}=(1-F)\cdot H_{e}

What sample size would be necessary to detect an effect of F = 0.01 at the S% significance level?

  1. Look up the \chi^2 critical value for the S% significance level with one degree of freedom (usually S = 5%, giving 3.84)
  2. Use the following formula:

N=\frac{\chi^2 }{F^2}
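The following Python sketch is not from the source. It computes the required sample size and the expected observed heterozygosity for this example. The function names are hypothetical, and the default critical value of 3.84 is the 5%, 1-degree-of-freedom value.

```python
def sample_size_for_f(f, chi2_critical=3.84):
    """Sample size N at which chi^2 = F^2 N reaches the critical value."""
    return chi2_critical / f ** 2

def heterozygosity_with_f(p, f):
    """Observed heterozygosity under inbreeding: Ho = (1 - F) * 2pq."""
    q = 1 - p
    return (1 - f) * 2 * p * q

print(round(sample_size_for_f(0.01)))   # 38400 individuals at the 5% level
print(heterozygosity_with_f(0.6, 0.01))
```

Because N scales as 1/F^2, very small departures from Hardy-Weinberg proportions require very large samples to detect.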

Example 7 - Null Allele

Genotype Count
Homodom 20
Hetero 50
Homorec 30

p = \frac{2\cdot homo_{dom} + hetero}{2N}

q = \frac{2\cdot homo_{rec} + hetero}{2N}

Heterozygosity_{observed}=H_{o}=\frac{Hetero_{observed}}{N}

Heterozygosity_{ expected } =H_{e} = 2pq\,

p_{null}=\frac{H_E-H_O}{1+H_E}


p_{adjusted}=p \cdot (1-p_{null})

q_{adjusted}=q \cdot (1-p_{null})
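The following Python sketch is not from the source. It applies the null-allele estimator given above to the table. The name null_allele_adjustment is hypothetical.

```python
def null_allele_adjustment(n_dom, n_het, n_rec):
    """Null-allele frequency p_null = (He - Ho) / (1 + He) and adjusted p, q."""
    n = n_dom + n_het + n_rec
    p = (2 * n_dom + n_het) / (2 * n)
    q = (2 * n_rec + n_het) / (2 * n)
    h_obs = n_het / n
    h_exp = 2 * p * q
    p_null = (h_exp - h_obs) / (1 + h_exp)
    return p_null, p * (1 - p_null), q * (1 - p_null)

p_null, p_adj, q_adj = null_allele_adjustment(20, 50, 30)
print(round(p_null, 5), round(p_adj, 5), round(q_adj, 5))
```

With these counts, observed heterozygosity slightly exceeds expected heterozygosity, so p_null comes out slightly negative (about -0.003). That points to a heterozygote excess rather than evidence for a null allele. In every case, p_adjusted + q_adjusted + p_null = 1.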

Selection Part 1

Example 1

Genotypes Homodom Heterozygotes Homorec
Viability 0.7 1 0.6
Fertility 11 6 3


Genotypes Homodom Heterozygotes Homorec
Viability (v) 0.7 1 0.6
Fertility (f) 11 6 3
Absolute Fitness (W) v_{homo,~dom}\cdot f_{homo,~dom} v_{hetero}\cdot f_{hetero} v_{homo,~rec}\cdot f_{homo,~rec}
Relative Fitness (w) \frac{W_{abs,~homo,~dom}}{W_{abs,~max}} \frac{W_{abs,~hetero}}{W_{abs,~max}} \frac{W_{abs,~homo,~rec}}{W_{abs,~max}}

What are the selection coefficients?

sel_{homo,~dom}=1-w_{rel,~homo,~dom}

sel_{hetero}=1-w_{rel,~hetero}

sel_{homo,~rec}=1-w_{rel,~homo,~rec}
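The following Python sketch is not from the source. It performs Example 1's calculations, taking absolute fitness as viability times fertility as in the table. The name fitness_table is hypothetical.

```python
def fitness_table(viability, fertility):
    """Absolute fitness W = v*f, relative fitness w = W/W_max, selection s = 1 - w."""
    absolute = [v * f for v, f in zip(viability, fertility)]
    w_max = max(absolute)
    relative = [w / w_max for w in absolute]
    selection = [1 - w for w in relative]
    return absolute, relative, selection

absolute, relative, selection = fitness_table((0.7, 1.0, 0.6), (11, 6, 3))
print([round(x, 3) for x in absolute])    # Homodom, Heterozygotes, Homorec
print([round(x, 3) for x in relative])
print([round(x, 3) for x in selection])
```

The dominant homozygote has the highest absolute fitness (7.7), so its relative fitness is 1 and its selection coefficient is 0.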


Example 2 - Mean Fitness Calculation

Genotypes Homodom Heterozygotes Homorec
Freq 0.3 0.6 0.1
Rel. Fitness 1 0.6 0.1

What is the mean fitness of this population?

w_{mean}=w_{rel,~homo,~dom}\cdot Homo_{dom} +w_{rel,~hetero}\cdot Hetero + w_{rel,~homo,~rec} \cdot Homo_{rec}

Example 3 - Natural Selection Analysis

Genotypes Homodom Heterozygotes Homorec
Freq 0.3 0.6 0.1
Rel. Fitness 1 0.6 0.1

What is the mean fitness of this population?

w_{mean}=w_{rel,~homo,~dom}\cdot Homo_{dom} +w_{rel,~hetero}\cdot Hetero + w_{rel,~homo,~rec} \cdot Homo_{rec}

p = Homo_{dom} + \frac{Hetero}{2}

q = Homo_{rec} + \frac{Hetero}{2}

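Assuming random mating, the zygotes of the next generation are in Hardy-Weinberg proportions (p^2, 2pq, q^2). The mean fitness used to normalize them is therefore computed from those proportions, so that the next-generation frequencies sum to 1:

w_{mean,~HW}=w_{rel,~homo,~dom}\cdot p^2+w_{rel,~hetero}\cdot 2pq+w_{rel,~homo,~rec}\cdot q^2

Homo_{dom,~next}=\frac{w_{rel,~homo,~dom}\cdot p^2}{w_{mean,~HW}}

Hetero_{next}=\frac{w_{rel,~hetero}\cdot 2pq}{w_{mean,~HW}}

Homo_{rec,~next}=\frac{w_{rel,~homo,~rec}\cdot q^2}{w_{mean,~HW}}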

p_{next} = Homo_{dom,~next} + \frac{Hetero_{next}}{2}

q_{next} = Homo_{rec,~next} + \frac{Hetero_{next}}{2}

\Delta p = p_{next} - p\,
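The following Python sketch is not from the source. It runs one generation of selection for this example. It reports the mean fitness of the current population, and it normalizes the next generation by the mean fitness of Hardy-Weinberg zygotes so that the new genotype frequencies sum to 1. The name select_one_generation is hypothetical.

```python
def select_one_generation(genotype_freqs, rel_fitness):
    """One generation of selection following random mating.

    genotype_freqs: current (AA, Aa, aa) frequencies; rel_fitness: (w_AA, w_Aa, w_aa).
    """
    f_dom, f_het, f_rec = genotype_freqs
    # Mean fitness of the current population.
    w_mean_current = sum(w * f for w, f in zip(rel_fitness, genotype_freqs))
    p = f_dom + f_het / 2
    q = f_rec + f_het / 2
    # Zygotes formed by random mating are assumed to be in Hardy-Weinberg proportions.
    zygotes = (p * p, 2 * p * q, q * q)
    w_mean_hw = sum(w * z for w, z in zip(rel_fitness, zygotes))
    dom_next, het_next, rec_next = (w * z / w_mean_hw for w, z in zip(rel_fitness, zygotes))
    p_next = dom_next + het_next / 2
    return w_mean_current, p, p_next, p_next - p

w_mean, p, p_next, delta_p = select_one_generation((0.3, 0.6, 0.1), (1.0, 0.6, 0.1))
print(w_mean, p, round(p_next, 4), round(delta_p, 4))
```

Here the current mean fitness is 0.67, and p rises from 0.6 to about 0.759 in one generation.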

Selection Part 2

Example 1

Homodom Heterozygotes Homorec
Relative Fitness (females) 0.6 0.8 1
Relative Fitness (males) 1 0.8 0.6

What are the equilibrium allele frequencies p_f and q_f in females, given that heterozygotes have intermediate fitness?

selection_{female}=s_{f}=1-fitness_{homo,~dom,~female}=1-w_{11,~f}

selection_{male}=s_{m}=1-fitness_{homo,~rec,~male}=1-w_{22,~m}

In this example s_f = 1 - 0.6 = 0.4 and s_m = 1 - 0.6 = 0.4.

q_{equilibrium,~female}=\frac{s_f - 1}{s_f}+\left(\frac{s_m \cdot s_f - s_f - s_m + 2}{2s_f \cdot s_m}\right)^{1/2}

p_{equilibrium,~female}=1-q_{equilibrium, female}\,


What are the equilibrium allele frequencies p_m and q_m in males, given that heterozygotes have intermediate fitness?

q_{equilibrium,~male}=\frac{1}{s_m}-\left(\frac{s_m \cdot s_f - s_f - s_m + 2}{2s_f \cdot s_m}\right)^{1/2}

p_{equilibrium,~male}=1-q_{equilibrium, male}\,

What range of s_f values will give a stable polymorphism?

\frac{s_m}{1-s_m}>s_f>\frac{s_m}{1+s_m}

What range of s_m values will give a stable polymorphism?

\frac{s_f}{1-s_f}>s_m>\frac{s_f}{1+s_f}
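The following Python sketch is not from the source. It iterates the allele frequencies numerically and compares the result with the closed-form equilibrium. It assumes additive fitnesses in each sex, matching the table: for AA, Aa, aa, females have (1 - s_f, 1 - s_f/2, 1) and males have (1, 1 - s_m/2, 1 - s_m), so s_m is read from the male aa fitness. Frequencies are measured in adults after selection. The function names are hypothetical.

```python
import math

def iterate_sex_dependent(s_f, s_m, p_f=0.5, p_m=0.5, generations=2000):
    """Iterate adult frequencies of allele A under opposing selection in the sexes.

    Returns (q_f, q_m), the frequencies of allele a in adult females and males.
    """
    w_fem = (1 - s_f, 1 - s_f / 2, 1.0)
    w_mal = (1.0, 1 - s_m / 2, 1 - s_m)
    for _ in range(generations):
        # Random union of eggs (A with prob p_f) and sperm (A with prob p_m).
        z = (p_f * p_m, p_f * (1 - p_m) + p_m * (1 - p_f), (1 - p_f) * (1 - p_m))
        new = []
        for w in (w_fem, w_mal):
            w_mean = sum(wi * zi for wi, zi in zip(w, z))
            new.append((w[0] * z[0] + 0.5 * w[1] * z[1]) / w_mean)
        p_f, p_m = new
    return 1 - p_f, 1 - p_m

def q_equilibrium_formula(s_f, s_m):
    """Closed-form equilibrium (q_f, q_m) from the expressions above."""
    root = math.sqrt((s_m * s_f - s_f - s_m + 2) / (2 * s_f * s_m))
    return (s_f - 1) / s_f + root, 1 / s_m - root

q_f, q_m = iterate_sex_dependent(0.4, 0.4)
print(round(q_f, 4), round(q_m, 4), [round(x, 4) for x in q_equilibrium_formula(0.4, 0.4)])
```

For s_f = s_m = 0.4, the stability condition holds (0.667 > 0.4 > 0.286). The iteration settles at q_f ≈ 0.5616 and q_m ≈ 0.4384, in agreement with the formulas.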

Example 2 - Complete Dominance w/ Frequency Dependent Selection

Homodom Heterozygotes Homorec
Relative Fitness (w) 1.3 1.3 1.7

What are the selection coefficients s_{homo,~dom} and s_{hetero} with q = 0.02?

w_{homo,~dom}=w_{hetero}=1+\frac{s_{homo,~dom}}{1-q^2}

(w_{homo,~dom}-1) \cdot (1-q^2) = s_{homo,~dom}

s_{homo,~dom}=s_{hetero} = (w_{homo,~dom}-1) \cdot (1-q^2)


w_{homo,~rec}=1+\frac{s_{homo,~rec}}{q^2}

(w_{homo,~rec}-1) \cdot (q^2) = s_{homo,~rec}

s_{homo,~rec}= (w_{homo,~rec}-1) \cdot (q^2)

Calculate the stable equilibrium frequencies q_e and p_e, writing s_1 = s_{homo,~dom} = s_{hetero} and s_2 = s_{homo,~rec}:

q_e = (\frac{s_2}{s_1+s_2})^{(1/2)}

p_e = 1-q_e\,

Calculate the mean fitness assuming HW equilibrium and using:

(a) current allele frequencies

w_{mean}=w_{homo,~dom}\cdot homo_{dom}+w_{hetero}\cdot hetero + w_{homo,~rec}\cdot homo_{rec}

w_{mean}=w_{homo,~dom}\cdot (p)^2+w_{hetero}\cdot 2pq+ w_{homo,~rec}\cdot (q)^2

(b) stable equilibrium allele frequencies

w_{mean}=w_{homo,~dom}\cdot homo_{dom}+w_{hetero}\cdot hetero + w_{homo,~rec}\cdot homo_{rec}

w_{mean}=w_{homo,~dom}\cdot (p_e)^2+w_{hetero}\cdot 2 p_e q_e+ w_{homo,~rec}\cdot (q_e)^2

(c) general formula

w_{mean} = 1 + s_1 + s_2
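The following Python sketch is not from the source. It works through this example and confirms that the mean fitness under Hardy-Weinberg proportions equals 1 + s_1 + s_2 at any q. The reason is that w_{dom}(1 - q^2) = 1 - q^2 + s_1 and w_{rec} q^2 = q^2 + s_2. The name frequency_dependent is hypothetical.

```python
import math

def frequency_dependent(w_dominant, w_recessive, q):
    """Selection coefficients, equilibrium q_e and mean fitness for Example 2.

    Model: w_AA = w_Aa = 1 + s1/(1 - q^2) and w_aa = 1 + s2/q^2.
    """
    s1 = (w_dominant - 1) * (1 - q * q)
    s2 = (w_recessive - 1) * q * q
    q_e = math.sqrt(s2 / (s1 + s2))

    def mean_fitness(qq):
        # Under Hardy-Weinberg, a fraction 1 - qq^2 of zygotes shows the dominant phenotype.
        w_dom = 1 + s1 / (1 - qq * qq)
        w_rec = 1 + s2 / (qq * qq)
        return w_dom * (1 - qq * qq) + w_rec * qq * qq

    return s1, s2, q_e, mean_fitness(q), mean_fitness(q_e)

s1, s2, q_e, w_now, w_eq = frequency_dependent(1.3, 1.7, 0.02)
print(round(s1, 5), round(s2, 5), round(q_e, 5), round(w_now, 5), round(w_eq, 5))
```

This gives s_1 ≈ 0.29988, s_2 = 0.00028 and q_e ≈ 0.0305, with mean fitness 1.30016 both now and at equilibrium.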

Example 3 - Self Incompatibility Locus

Given initial frequencies:

(a) What are the frequencies of the S1, S2, and S3 alleles initially, and after one and two generations?

p_1=(1/2)\cdot (P_{S_{1}S_{2}}+P_{S_{1}S_{3}})

p_2=(1/2)\cdot (P_{S_{1}S_{2}}+P_{S_{2}S_{3}})

p_3=(1/2)\cdot (P_{S_{1}S_{3}}+P_{S_{2}S_{3}})

(b) After one generation?

P_{next,~S_{1}S_{2}}=(1/2)\cdot (1-P_{S_{1}S_{2}})

P_{next,~S_{1}S_{3}}=(1/2)\cdot (1-P_{S_{1}S_{3}})

P_{next,~S_{2}S_{3}}=(1/2)\cdot (1-P_{S_{2}S_{3}})


p_{next, ~1}=(1/2)\cdot (P_{next,~S_{1}S_{2}}+P_{next,~S_{1}S_{3}})

p_{next, ~2}=(1/2)\cdot (P_{next,~S_{1}S_{2}}+P_{next,~S_{2}S_{3}})

p_{next, ~3}=(1/2)\cdot (P_{next,~S_{1}S_{3}}+P_{next,~S_{2}S_{3}})

(c) After two generations?

P_{next,~S_{1}S_{2}}=(1/2)\cdot (1-P_{prev,~S_{1}S_{2}})

P_{next,~S_{1}S_{3}}=(1/2)\cdot (1-P_{prev,~S_{1}S_{3}})

P_{next,~S_{2}S_{3}}=(1/2)\cdot (1-P_{prev,~S_{2}S_{3}})


p_{next, ~1}=(1/2)\cdot (P_{next,~S_{1}S_{2}}+P_{next,~S_{1}S_{3}})

p_{next, ~2}=(1/2)\cdot (P_{next,~S_{1}S_{2}}+P_{next,~S_{2}S_{3}})

p_{next, ~3}=(1/2)\cdot (P_{next,~S_{1}S_{3}}+P_{next,~S_{2}S_{3}})
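The following Python sketch is not from the source. It iterates these recursions. The example's initial frequencies are not given in the text, so the starting values below are hypothetical. The function names are also hypothetical.

```python
def si_generation(P):
    """One generation at a three-allele S locus.

    P: genotype frequencies {"S1S2", "S1S3", "S2S3"} summing to 1.
    Each genotype's next frequency is half of (1 - its current frequency),
    i.e. half the combined frequency of the other two genotypes.
    """
    return {g: 0.5 * (1 - P[g]) for g in P}

def si_allele_freqs(P):
    """Frequencies of S1, S2 and S3 from the genotype frequencies."""
    return (0.5 * (P["S1S2"] + P["S1S3"]),
            0.5 * (P["S1S2"] + P["S2S3"]),
            0.5 * (P["S1S3"] + P["S2S3"]))

P0 = {"S1S2": 0.5, "S1S3": 0.3, "S2S3": 0.2}   # hypothetical starting values
P1 = si_generation(P0)
P2 = si_generation(P1)
for P in (P0, P1, P2):
    print([round(x, 4) for x in si_allele_freqs(P)])
```

The genotype frequencies oscillate toward 1/3 each, so the allele frequencies approach equality.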


Example 4 - Fitness / Overdominance

Homodom Heterozygotes Homorec
Relative Fitness 0.4 1.0 0.8

To find the equilibrium of p:

s_{homo,~dom}=1-w_{homo,~dom}

s_{hetero}=1-w_{hetero}

s_{homo,~rec}=1-w_{homo,~rec}


p_{eq}=\frac{s_{homo,~rec}}{s_{homo,~dom}+s_{homo,~rec}}

q_{eq}=1-p_{eq}\,
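The following Python sketch is not from the source. It computes p_eq and checks it by iterating the standard one-locus selection recursion p' = (w_{AA}p^2 + w_{Aa}pq)/w_{mean}. It assumes that the heterozygote is the fittest genotype and rescales fitnesses so that w_{hetero} = 1. The function name and starting frequency are hypothetical.

```python
def overdominance(w_dom, w_het, w_rec, p0=0.9, generations=500):
    """Equilibrium p from the selection coefficients, checked by iteration."""
    s_dom = 1 - w_dom / w_het      # selection against AA (rescaled so w_het = 1)
    s_rec = 1 - w_rec / w_het      # selection against aa
    p_eq = s_rec / (s_dom + s_rec)
    p = p0
    for _ in range(generations):
        q = 1 - p
        w_mean = (1 - s_dom) * p * p + 2 * p * q + (1 - s_rec) * q * q
        p = ((1 - s_dom) * p * p + p * q) / w_mean
    return p_eq, p

p_eq, p_iterated = overdominance(0.4, 1.0, 0.8)
print(round(p_eq, 4), round(p_iterated, 4))
```

With s_{homo,~dom} = 0.6 and s_{homo,~rec} = 0.2, both the formula and the iteration give p_eq = 0.25.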

Example 5 - Selection and Viability

Genotypes Homodom Heterozygotes Homorec
Zygotic Frequencies 0.05 0.33 0.62
Viabilities 1 0.6 0.59


Genotypes Homodom Heterozygotes Homorec
Zygotic Frequencies (Z) 0.05 0.33 0.62
Viabilities (V) 1 0.6 0.59
Adult Freq After Selection \frac{Z_{homo,~dom}\cdot V_{homo,~dom}}{w_{mean}} \frac{Z_{hetero}\cdot V_{hetero}}{w_{mean}} \frac{Z_{homo,~rec}\cdot V_{homo,~rec}}{w_{mean}}

w_{mean}=Z_{homo,~dom}\cdot V_{homo,~dom}+Z_{hetero}\cdot V_{hetero}+Z_{homo,~rec}\cdot V_{homo,~rec}

p_{sel}=Homo_{dom,~sel} + \frac{Hetero_{sel}}{2}

q_{sel}=Homo_{rec,~sel} + \frac{Hetero_{sel}}{2}

or

q_{sel}=1-p_{sel}\,

Predicted Genotype Frequencies

Homo_{dom, exp}=p_{sel}^2\,

Hetero_{exp}=2p_{sel}q_{sel}\,

Homo_{rec, exp}=q_{sel}^2\,
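The following Python sketch is not from the source. It works through this example: it applies viability selection to the zygotic frequencies, obtains allele frequencies among the surviving adults, and predicts the next generation's Hardy-Weinberg genotype frequencies. The name viability_selection is hypothetical.

```python
def viability_selection(zygotes, viabilities):
    """Adult genotype frequencies, allele frequencies and next-generation HW predictions."""
    w_mean = sum(z * v for z, v in zip(zygotes, viabilities))
    adults = [z * v / w_mean for z, v in zip(zygotes, viabilities)]
    p_sel = adults[0] + adults[1] / 2
    q_sel = adults[2] + adults[1] / 2
    predicted = (p_sel ** 2, 2 * p_sel * q_sel, q_sel ** 2)
    return w_mean, adults, p_sel, q_sel, predicted

w_mean, adults, p_sel, q_sel, predicted = viability_selection(
    (0.05, 0.33, 0.62), (1.0, 0.6, 0.59))
print(round(w_mean, 4), [round(a, 4) for a in adults], round(p_sel, 4))
print([round(x, 4) for x in predicted])
```

The mean fitness is 0.6138, and selection raises p from 0.215 in the zygotes to about 0.243 in the adults.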

Example 6

Homodom Heterozygotes Homorec
Observed Numbers 750 1000 250

What are the absolute and relative fitness for each?


Homodom Heterozygotes Homorec
Observed Numbers 700 1000 200
Predicted Numbers 500 1000 500

w_{abs,~homo,~dom}=\frac{Homo_{dom,~obs}}{Homo_{dom,~exp}}

w_{abs,~hetero}=\frac{Hetero_{obs}}{Hetero_{exp}}

w_{abs,~homo,~rec}=\frac{Homo_{rec,~obs}}{Homo_{rec,~exp}}


w_{abs,~max} = \max(w_{abs,~homo,~dom}, w_{abs,~hetero}, w_{abs,~homo,~rec})


w_{rel,~homo,~dom}=\frac{w_{abs,~homo,~dom}}{w_{abs,~max}}

w_{rel,~hetero}=\frac{w_{abs,~hetero}}{w_{abs,~max}}

w_{rel,~homo,~rec}=\frac{w_{abs,~homo,~rec}}{w_{abs,~max}}
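The following Python sketch is not from the source. It applies these formulas to the numbers in the worked table (observed 700, 1000, 200; predicted 500, 1000, 500). The name fitness_from_counts is hypothetical.

```python
def fitness_from_counts(observed, predicted):
    """Absolute fitness = observed / predicted; relative fitness = absolute / max."""
    absolute = [o / e for o, e in zip(observed, predicted)]
    w_max = max(absolute)
    relative = [w / w_max for w in absolute]
    return absolute, relative

absolute, relative = fitness_from_counts((700, 1000, 200), (500, 1000, 500))
print(absolute, [round(w, 3) for w in relative])   # Homodom, Heterozygotes, Homorec
```

The dominant homozygote has the highest absolute fitness (1.4), so the relative fitnesses are 1, about 0.714 and about 0.286.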

Last modified on 6 March 2011, at 04:48