Statistical [R]ecipes: Hardy-Weinberg Genotype Frequencies

An important principle in population genetics is the Hardy-Weinberg principle (also called H-W equilibrium or the H-W law), which "describes the equilibrium state of a single locus in a randomly mating diploid population that is free of other evolutionary forces such as mutation, migration, and genetic drift" [Population Genetics: A Concise Guide by John Gillespie, 2nd edition]. In an ideal world, this means the allele frequencies and genotype frequencies at a single locus are in equilibrium and stay constant from generation to generation. In the real world, they change when outside forces act on them, such as mutation, migration, genetic drift (see post on simulating genetic drift) or natural selection.

Let's try to put some notation to all this. If we have two alleles $A$ and $a$ with genotype frequencies
\begin{eqnarray*}
P(AA) & = & u \\
P(Aa) & = & v \\
P(aa) & = & w
\end{eqnarray*}
then, without any further assumptions, the allele frequencies follow directly, since each heterozygote carries one copy of each allele:
\[ P(A) = u + 0.5v = p \hspace{2in} P(a) = w + 0.5v = q \]
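As a quick illustration of the calculation above, here is a minimal sketch in R. The genotype frequencies `u`, `v` and `w` are made-up values chosen only for the example; any three non-negative values that sum to 1 would work.

```r
# Hypothetical genotype frequencies (illustrative values; they must sum to 1)
u <- 0.30  # P(AA)
v <- 0.50  # P(Aa)
w <- 0.20  # P(aa)

# Allele frequencies: each heterozygote contributes half its frequency to each allele
p <- u + 0.5 * v  # P(A)
q <- w + 0.5 * v  # P(a)

p
q
p + q  # should be 1, since there are only two alleles
```

No H-W assumptions are used here: the allele frequencies are just a bookkeeping consequence of the genotype frequencies.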
If we have the allele frequencies, we need a few more assumptions to calculate the genotype frequencies. This is where the Hardy-Weinberg principle comes into play.

H-W Assumptions:

diploid organism, infinite population, discrete generations
random mating
no outside forces at play (e.g. no selection, no migration, no mutation)
equal (or unequal)* initial genotype frequencies in the two sexes

When a population is in H-W equilibrium, the two alleles that make up a genotype can be thought of as independent random draws from the population's pool of alleles. Thus, we can calculate the genotype frequencies from the allele frequencies as follows:
\begin{eqnarray*}
P(AA) & = & P(A) P(A) = p^2 \\
P(Aa) & = & 2P(A) P(a) = 2pq \\
P(aa) & = & P(a) P(a) = q^2
\end{eqnarray*}
*In dioecious species (each individual is either male or female), H-W equilibrium is reached in one generation if the initial genotype frequencies are equal in the two sexes, and in just two generations if they are unequal.
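The two-generation case in the footnote can be sketched in R as well. This is only an illustration under stated assumptions: the locus is taken to be autosomal, the starting genotype frequencies in `males` and `females` are made-up values, and the helper functions `freq.A` and `mate` are hypothetical names introduced here, not part of any package.

```r
# Hypothetical starting genotype frequencies (AA, Aa, aa) in each sex
males   <- c(AA = 0.5, Aa = 0.4, aa = 0.1)
females <- c(AA = 0.1, Aa = 0.2, aa = 0.7)

# Frequency of allele A from a vector of genotype frequencies
freq.A <- function(g) unname(g["AA"] + 0.5 * g["Aa"])

# One round of random mating at an autosomal locus (assumption):
# each offspring gets one allele from a random male and one from a random female
mate <- function(m, f) {
  p.m <- freq.A(m)
  p.f <- freq.A(f)
  c(AA = p.m * p.f,
    Aa = p.m * (1 - p.f) + (1 - p.m) * p.f,
    aa = (1 - p.m) * (1 - p.f))
}

# After generation 1 both sexes share the same genotype frequencies
gen.1 <- mate(males, females)
gen.2 <- mate(gen.1, gen.1)
gen.3 <- mate(gen.2, gen.2)

# H-W proportions implied by the allele frequency in generation 1
p.1 <- freq.A(gen.1)
hw  <- c(AA = p.1^2, Aa = 2 * p.1 * (1 - p.1), aa = (1 - p.1)^2)

rbind(gen.1, gen.2, gen.3, hw)
```

In this sketch, `gen.1` differs from the H-W proportions because the two sexes started with different allele frequencies, while `gen.2` and `gen.3` both match `hw`.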

R code:
Assume equal genotype frequencies in the two sexes and show that H-W equilibrium is reached in one generation.

> # Start with two alleles A, a with respective allele frequencies p.0, q.0

> p.0 = 0.2
> q.0 = 1 - p.0
>
> # After one round of mating
> # Calculate genotype frequencies (Assumes HWE)
> AA = p.0*p.0
> AA
[1] 0.04
>
> Aa = 2*p.0*q.0
> Aa
[1] 0.32
>
> aa = q.0*q.0
> aa
[1] 0.64
>
>
> # Calculate new allele frequencies (Does not assume HWE)
> p.1 = AA + 0.5*Aa
> p.1
[1] 0.2
>
> q.1 = 1 - p.1
> q.1
[1] 0.8
>
> # After two rounds of mating
> # Calculate genotype frequencies (Assumes HWE)
> AA = p.1*p.1
> AA
[1] 0.04
>
> Aa = 2*p.1*q.1
> Aa
[1] 0.32
>
> aa = q.1*q.1
> aa
[1] 0.64

Therefore, we see the genotype frequencies after one round of mating were equal to the genotype frequencies after two rounds of mating (i.e. H-W equilibrium is attained).

Now this is all nice, but the H-W assumptions are almost never met in the real world. For example, there may be small population sizes or deviations from random mating (e.g. assortative mating, inbreeding), and there are more than likely outside forces at play such as mutation, migration and selection. To test for deviations from H-W equilibrium, we can use the $\chi^2$ goodness-of-fit test or exact tests (better for small sample sizes), available in standard software tools such as R or PLINK.

Example: If we have two alleles $A$ and $a$, we can compare the observed genotype counts with values expected under H-W equilibrium:

\[
\begin{array}{lll}
\text{Genotype} & \text{Observed} & \text{Expected} \\
\hline
AA & n_{AA} & np^2 \\
Aa & n_{Aa} & 2np(1-p) \\
aa & n_{aa} & n(1-p)^2
\end{array}
\]
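where $n = n_{AA} + n_{Aa} + n_{aa}$ is the total number of individuals sampled and $p = P(A) = (2n_{AA} + n_{Aa}) / (2n)$ is the observed frequency of allele $A$.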
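The comparison of observed and expected counts can be turned into a $\chi^2$ goodness-of-fit test with a few lines of base R. The sketch below uses made-up genotype counts (`n.AA`, `n.Aa`, `n.aa`) purely for illustration. It computes the statistic by hand and uses one degree of freedom, because the allele frequency $p$ is estimated from the same data.

```r
# Hypothetical observed genotype counts (made up for illustration)
n.AA <- 30
n.Aa <- 50
n.aa <- 20
n <- n.AA + n.Aa + n.aa

# Observed frequency of allele A
p <- (2 * n.AA + n.Aa) / (2 * n)

# Observed counts and counts expected under H-W equilibrium
observed <- c(AA = n.AA, Aa = n.Aa, aa = n.aa)
expected <- c(AA = n * p^2, Aa = 2 * n * p * (1 - p), aa = n * (1 - p)^2)

# Chi-square goodness-of-fit statistic
chi.sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: 3 genotype classes - 1 - 1 estimated allele frequency = 1
p.value <- pchisq(chi.sq, df = 1, lower.tail = FALSE)

chi.sq
p.value
```

A large p-value means the data give no evidence of a deviation from H-W equilibrium. Note that calling `chisq.test()` directly on these counts with the expected proportions would use two degrees of freedom, which is why the p-value is computed with `pchisq()` here.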

Tuesday, January 29, 2013