Statistical [R]ecipes: February 2012

Wednesday, February 29, 2012

A Weekend Adventure in College Station

This past weekend my dad and I traveled to College Station, TX to visit my sister, Vanessa Hicks, and her boyfriend, Cory Laird. She's a first-year PhD student at Texas A&M and has very little time to cook, so I offered to help her learn a few quick, healthy recipes and then cook many more meals to freeze.

Since the Oscars were on Sunday, we also decided to catch up on a few of the nominated movies. Here we were watching the lovely Brad Pitt in Moneyball. It's a very good movie, especially since it's about stats (and baseball). :)

Instead of posting all the pictures, I made a quick collage with Picasa of many of the dishes we bought/made.

Things we made that are pictured above include a spinach cheesy pasta and cabbage soup in the crockpot, turkey burgers, 13 rolls of sushi (yes, 13!), and a Louisiana gumbo (recipe from Pirate's Pantry).

We went out to eat at Blue Baker, The Lemon Wedge, and Sweet Eugene's in College Station and Bryan, TX.

Here's a list of everything else we made or bought that is not shown in the picture above:
- lemon pie from The Lemon Wedge (delicious!)
- hummus
- salmon
- mojitos
- green juices

I'd say our weekend was a great success!

Friday, February 24, 2012

Simulating Genetic Drift

In population genetics, the neutral theory of evolution "claims that most of DNA sequence difference between alleles within a population or between species are due to neutral mutations" [Population Genetics: A Concise Guide by John Gillespie, 2nd edition]. Under this model, mutations introduce genetic variation into populations and are countered by genetic drift, which eliminates genetic variation from populations. A very simple version of genetic drift can be simulated using the Wright-Fisher model. Below is a simulation of genetic drift using the Wright-Fisher model, written in R.

Consider N = 20 diploid individuals and a single locus with two alleles, A1 and A2, with initial frequencies 0.20 and 0.80, respectively. Let X be the number of A1 alleles among the 2N = 40 chromosomes in the population.

## Simulate Genetic Drift (using Wright-Fisher model)
library(ggplot2)
library(reshape)

# Set up parameters
N = 20 # number of diploid individuals
N.chrom = 2*N # number of chromosomes
p = .2; q = 1-p
N.gen = 100 # number of generations
N.sim = 5 # number of simulations

# Simulation
X = array(0, dim=c(N.gen,N.sim))
X[1,] = rep(N.chrom*p,N.sim) # initialize number of A1 alleles in first generation
for(j in 1:N.sim){
  for(i in 2:N.gen){
    X[i,j] = rbinom(1,N.chrom,prob=X[i-1,j]/N.chrom)
  }
}
X = data.frame(X/N.chrom)

# Reshape data and plot the 5 simulations
sim_data <- melt(X)
sim_data$generation <- rep(1:N.gen, N.sim) # generation index within each simulation
# ggtitle() replaces opts(title = ...), which was removed from ggplot2
ggplot(sim_data, aes(x = generation, y = value, colour = variable)) + geom_line() + ggtitle("Simulations of Genetic Drift") + xlab("Generation") + ylab("Allele Frequency") + ylim(0,1) + labs(colour = "Simulations")

We plot the frequency of the A1 allele over 100 generations for 5 different simulations, each starting from an initial A1 allele frequency of 0.20. The plot shows that 3 out of the 5 simulations resulted in the A1 allele being lost from the population and 1 out of the 5 resulted in the A2 allele being lost from the population. In the remaining simulation, both alleles were still in the population after 100 generations. This is an example of how genetic drift removes variation from populations.
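Five simulations give only a rough picture, so here is a minimal sketch that repeats the same Wright-Fisher sampling step over many replicates and summarizes the outcomes after 100 generations. It also compares the average heterozygosity, 2p(1-p), with the standard Wright-Fisher expectation that heterozygosity shrinks by a factor of (1 - 1/(2N)) each generation. The number of replicates, the seed, and the names n_rep, counts, mean_het and expected_het are my own choices for illustration and are not part of the code above.

```r
# Sketch: many replicates of the same Wright-Fisher model (illustrative only)
set.seed(2012)           # arbitrary seed, only so the run is reproducible
N.chrom <- 40            # 2N chromosomes, as in the simulation above
p0      <- 0.2           # initial frequency of A1
N.gen   <- 100           # number of generations
n_rep   <- 2000          # number of replicate populations (assumed value)

counts   <- rep(round(N.chrom * p0), n_rep)  # A1 count in each replicate
mean_het <- numeric(N.gen)                   # average heterozygosity per generation
mean_het[1] <- mean(2 * (counts / N.chrom) * (1 - counts / N.chrom))

for (i in 2:N.gen) {
  # one generation of binomial sampling, applied to all replicates at once
  counts <- rbinom(n_rep, N.chrom, prob = counts / N.chrom)
  freq   <- counts / N.chrom
  mean_het[i] <- mean(2 * freq * (1 - freq))
}

# Fate of A1 in each replicate after N.gen generations
outcome <- c(lost        = mean(counts == 0),
             fixed       = mean(counts == N.chrom),
             segregating = mean(counts > 0 & counts < N.chrom))
print(outcome)

# Expected heterozygosity: H_t = H_0 * (1 - 1/(2N))^t, with t = generation - 1
expected_het <- 2 * p0 * (1 - p0) * (1 - 1 / N.chrom)^(0:(N.gen - 1))
print(round(c(simulated = mean_het[N.gen], expected = expected_het[N.gen]), 4))
```

The average allele frequency across replicates should stay close to 0.20 even though individual populations drift to 0 or 1, while the average heterozygosity falls toward zero. That is the sense in which drift removes variation without favoring either allele.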

Note: there are many other tools with available software (e.g. Hudson, 2002; simuPOP) that will simulate more complicated versions of genetic drift.

Thursday, February 23, 2012

Spring Cleaning in the office!

So, our office has become a little cluttered this year.... This used to be our awesome puzzle table:

Garrett (one of my office mates) and I decided to take the initiative of cleaning up the mess this morning. Here are some after pictures:

I also decided to clean my desk, but I forgot to take a before picture. Here's the after:

But there is still a little work left to be done :)

Saturday, February 18, 2012

Summer Institute in Statistical Genetics

If you're looking for summer opportunities in statistical genetics, please consider the Summer Institute in Statistical Genetics (SISG) at the University of Washington in Seattle.

http://www.biostat.washington.edu/suminst/sisg/general

I attended this program last summer and would highly recommend it for any student interested in statistical genetics. Scholarships for tuition and travel are available.

Friday, February 17, 2012

Loss-of-function mutations

A study by the Wellcome Trust Sanger Institute and Yale University, published in Science this month, set out to determine how many genuine loss-of-function mutations humans carry on average and how many genes are inactivated by those mutations. Depending on what definition you use, humans carry ~20,000 genes. A loss-of-function mutation causes the encoded protein to lose its structure or function. For example, one of the most common cancer genes, TP53, is called a tumor suppressor gene because it controls the cell cycle. When TP53 is mutated, tumor cells can replicate uncontrollably because TP53 has lost its ability to control (or suppress) the cell cycle properly.

Using the three pilot phases of the 1000 Genomes Project, the researchers suggest humans carry ~100 loss-of-function (or deleterious) mutations and ~20 genes that have been inactivated (that's ~0.1% of your ~20,000 genes)! This is such an interesting topic because, up until now, whenever researchers found these loss-of-function mutations, they normally assumed the mutations were somehow disease-causing. This is no longer the case. This news article from GenomeWeb states "as more and more apparently healthy individuals have their genomes and exomes sequenced, he added, investigators have unearthed a raft of apparent loss-of-function variants that are both intriguing and puzzling." The article in Science suggests that we should expect humans to carry a certain number of loss-of-function mutations (~100). What is still unclear is how to differentiate between the loss-of-function mutations that are disease-causing and the ones that are more benign. As personalized medicine becomes an increasingly important topic, this type of research will be critical when whole-genome sequencing becomes cost-effective.

Welcome!

Welcome to my new blog! My name is Stephanie Hicks and I'm a fifth-year Ph.D. student in the Department of Statistics at Rice University in Houston, TX. Over the years as a graduate student I wanted to create a space to post helpful things for other students in our department, but it always seemed too specific. I'm now at a point where I would like to post not only department-specific things, but also things related to research in statistics, statistical genetics, R, and a personal hobby of mine: cooking! My hope is that this blog will be a venue for me to post about all of the above.