Monday, October 14, 2013

Simple Cutout Sugar Cookies: Ghosts and Pumpkin Pi

Fall is in full swing and (it's hard to believe) even Halloween is just around the corner.  I'm starting to get in the spirit by getting out the Halloween decorations and making yummy treats!  Cutout sugar cookies are some of favorite to make in the holiday season because they are so versatile.  Rolling out the dough can be tricky if the dough is not right, so this has been my favorite go-to recipe for cutout sugar cookies.   As for the icing, the usual ingredients are confectioners' sugar, food coloring and water.  I found a recipe for the icing which hardens nicely and contains vanilla extra and light corn syrup (gives it an extra sheen). Here are my ghosts and pumpkin 'pi'!  and yes, that is a mouse :)


For the cookies:
- 6 tbsp of soft unsalted butter
- 1/2 cup sugar
- 1 large egg
- 1/2 tsp vanilla extract
- 1 1/2 cups all-purpose flour
- 1/2 tsp baking powder
- 1/2 tsp salt
- chocolate chips, mini chocolate chips (optional)

For the icing:
- 1 cup confectioners' sugar
- 1 tbsp light corn syrup
- 1/2 tsp vanilla extract
- food coloring


1) Preheat the oven to 350 degrees. In one bowl, mix butter, sugar, egg and vanilla.  In a second bowl, combine the flour, baking powder, salt.  Add the dry ingredients to the wet and mix gently.  You can add more flour if the dough seems too sticky, but be careful to not add too much. Wrap the dough in plastic wrap and let rest for one hour.

2) Sprinkle flour on a clean surface.  With a rolling pin, roll out the dough to a thickness of 1/4-inch.  If the dough is sticking to the rolling pin, sprinkle more flour on the dough.

2) With your favorite cookie cutters, cut out the cookies and place on a baking sheet.

3) The recipe says bake the cookies for 8-12 mins.  My cookies were definitely done at 9 mins.   Let the cookies cool completely before adding the icing.  

4) To prepare the icing mix the sugar, corn syrup, food coloring and vanilla.  If the icing still seems thick, add 1-2 drops of water a time.

5) Decorate your cookies!  My ghosts were made with plain white icing, mini chocolate chips for the eyes and a regular sized chocolate chip for the mouth.  The pumpkins begin with orange icing and are finished with a bit of pipe work.

Sunday, September 8, 2013

Boston or Bust

This past week Chris and I celebrated our first month of being official Bostonians.  For those who don't know, while finishing my PhD at Rice this spring, I accepted a postdoctoral research fellow position starting in August in Rafael Irizarry's lab in the Biostatistics & Computational Biology department at the Dana-Farber Cancer Institute with a joint appointment in the Harvard School of Public Health.  Since graduating in May, this summer has been such a whirlwind of events.  We did quite a bit of traveling in July and spent about a week driving from Texas to Massachusetts.  Some additional good news is Chris just finished his second week at his new job as a Validation Engineer at Bose!  Now that things are starting to settle down a bit, I decided to show some highlights of our trip and a few fun outings we have had exploring around Boston.

To begin, I will be upfront and say no matter how you move, moving is a pain in the butt.  The only part of moving I liked was seeing the 'Podzilla' move the POD around!  Seriously, it's one of the coolest things. :)

We began our very long drive by stopping to visit some family members in Texas & Louisiana.

Next up was Mississippi.  We only spent a short time there, but we found Elvis who sang our blues away about leaving such good friends and family behind.

In Tennessee we stopped at Corky's BBQ for some 'Memphis-style' BBQ which consisted of ribs with a very dry rub & an enormous amount of sauce on top.  I think I prefer North Carolina or Texas style BBQ, but overall it was a good experience! 

Next up was West Virgina, Maryland & Pennsylvania.  We had a wonderful time staying with some of my family in PA!  At the end of the night, we even enjoyed some s'mores under the beautiful Pennsylvania night sky. I can't remember the last time I made s'mores over a real fire pit, but it is something I wish I could do more often.

The next day, we drove through Dallas, PA where my dad grew up!  This is a picture of the house when my dad lived there:

And this was the picture the day we drove by.  So pretty!

The next day was New York, Connecticut & finally Massachusetts!  After our 1900+ mi drive, we rewarded ourselves with clam chowder the night we arrived.  We're determined to find the best new england clam chowder in Boston, so suggestions are welcomed.  

Since arriving in Boston, we've spent most of our time settling into the new apartment, but we have made a few excursions to explore the city.  We've tried a pretty famous local BBQ place called Blue Ribbon BBQ and found some very excited LSU alumni fans!  One of my favorite outings has been to visit Cape Cod. On our drive there, we saw so many windmills!   The beaches were also phenomenal.

Hopefully I'll be able to keep documenting a few of our adventures.  :) 

Friday, August 30, 2013

Comparing Methods to Detect Somatic Mutations

A nice comparison of four methods (VarScanSomaticSniper, JointSNVMix and Strelka) to detect somatic single nucleotide variants (SNVs) was published this week in Bioinformatics.  This is clearly a very hot topic in the analysis of cancer genomes because as I was reading through the article, I was thinking to myself, I can think of at least three other methods which have been published in 2013 alone. This particular article was received by Bioinformatics in March 2013, accepted in June and published first online in July, therefore the more recent methods could not have made it in into this comparison paper, but I would have loved to see the authors take on the three methods discussed in this post.

For those less familiar with the topic, you might ask why is this such an important topic?  One of the first steps in analyzing cancer sequencing data, is to accurately detect the genetic variants with some level of sensitivity and specificity.  The second step would be to understand the genetic (or epigenetic) effect followed by associating the mutational profiles with outcome.  Two genetic variants of interest in matched normal and tumor samples are called somatic and germline single nucleotide polymorphisms (SNPs) where
  • 'Somatic' refers to a variant observed in tumor sample and not observed in the normal sample and the normal allele matches the reference allele
  • 'Germline' refers to the same variant observed in both the normal and tumor samples but the normal allele does not match the reference allele
One straightforward approach is the "subtraction method" in which variants are called separately from normal and tumor samples using a tool such as GATK.  Next, you just subtract the high quality variant calls from the tumor and the high quality 'no-calls' in normal leaving the somatic calls.

The main problem with this approach is when detecting somatic variants from the next-generation sequencing data the "sensitive and accurate detection of somatic singlenucleotide variants (sSNVs) from next-generation sequencing data is still a challenging informatic problem because of differing levels of purity and numbers of subclonal populations represented in tumor samples" [Hansen et al. (2013)].  This implies the 'subtraction method' may not be the best approach to detect true somatic and true germline variants because the somatic variants called from a given method may vary depending on tumor purity or clonality.
  • Carter et al. (2012)Koboldt et al. (2012) and Hansen et al. (2013) agree "the necessary read depth coverage from the NGS data to detect a mutation accurately depends on 
    • the prevalence of the mutation in that sample which depends on 
      • levels of copy number variation, 
      • levels of stromal contamination and 
      • prevalence of mutation among subcones within the sample".
    • The authors note: "Current software does not account for the how deep coverage should be to attain a specific sensitivity and accuracy".
  • Cibulskis et al. (2013) writes "the sensitivity and specificity of any somatic mutation–calling method varies along the genome and depends on several factors, including 
    • the depth of sequence coverage in the tumor and a patient-matched normal sample, 
    • the local sequencing error rate, 
    • the allelic fraction of the mutation and
    • the evidence thresholds used to declare a mutation". 
    • The authors note: "Characterizing how sensitivity and specificity depend on these factors is necessary for designing experiments with adequate power to detect mutations at a given allelic fraction, as well as for inferring the mutation frequency along the genome, which is a key parameter for understanding mutational processes and significance analysis".
Recently, several groups have developed methods specifically to deal with this problem (detecting somatic variants using next-generation sequencing data).  Rather than calling variants separately from the normal and tumor samples, one idea is to simultaneously analyze the reads from the matched normal and tumor samples together resulting in increased confidence in the germline and somatic variants.  Some studies have shown this can lead to better discrimination of germline variants from sequencing errors.  I refer you to the comparison paper published in Bioinformatics this week for a detailed discussion on VarScanSomaticSniperJointSNVMix and Strelka. In this post, I will discuss three more recently published tools to detect somatic mutations ShimmerCake and MuTect.

Shimmer [Hansen et al. (2013)]
  • Method: Uses Fisher's exact test at each genomic position covered by both the tumor and normal sample and corrects for multiple testing.  
  • Documentation:
  • Input: samtools BAM files of the matched normal and tumor samples and reference genome
  • Output: Shimmer will produce a list of somatic variants and somatic indels.  Additionally, Shimmer will provide the output in the ANNOVAR format (and will annotate the file if ANNOVAR is locally installed on the system).
  • Reference paper shows Shimmer is superior to other methods in large amounts of stromal contamination, heterogenity and copy number alterations, but comparable accuracy in high tumor purity.  
Installation and Usage: 
Shimmer is available in a perl script on github:
$ git clone git://
$ cd Shimmer
$ perl Build.PL
$ ./Build
$ ./Build test
$ ./Build install

Basic usage:
$ [normal_bamfile.bam] [tumor_bamfile.bam] --ref reference.fa 

Cake [Rashid et al. (2013)]
  • Method: Uses set of mutations called from at least two callers (VarScan2, CaVEMan, Bambino and SAMtools mpileup). Applies post-processing filters. 
  • Documentation:
  • Input: samtools BAM files of the matched normal and tumor samples and reference genome
  • Output: Cake generates several output files/directory depending on what the user defined, but it will report the somatic variants (from each individual caller) in a VCF format. 
  • The Cake_User_Guide.pdf provides extensive documentation on how to download and install Cake, VarScan2, CaVEMan and Bambino. It also provides documentation on how to install all the prerequistes such as samtools, vcftools, tabix, and additional perl modules.  
Installation and Usage: 
Cake is available in a perl script here:
$ mkdir cake
$ cd cake
$ wget
$ tar xfz Cake_1.0.tar.gz

Basic Usage:
$ perl -s [Sample_File] -species human -callers pileup,varscan,bambino, somatic sniper -o [output_directory] -separator [,|\t] -mode [FULL|CALLING|FILTERING] 

MuTect [Cibulskis et al. (2013)]
  • Method: Pre-processes the aligned reads from the matched normal and tumor samples separately.  Calculates two log odds ratios (LODs) - see below.  Applies an additional set of post-processing filters.   
    • 1) LODs of the tumor containing a non-reference allele for a given position 
    • 2) If the position is called non-reference, the LODs of the normal sample containing the reference allele
  • Documentation:
  • Input: samtools BAM files of the matched normal and tumor samples and wiggle file containing the coverage output
  • Output: A list of somatic calls (use grep to filter the calls not considered confident by MuTect)
  • Reference paper compares MuTect to SomaticSniper, JointSNVMix and Strelka  
  • Can use Cosmic & dbSNP databases for annotation

Installation and Usage: 
To run, must have java installed.  MuTect is available from here: and should be run in the command line.

Basic usage:
$ java -Xmx2g -jar muTect-XXXX-XX-XX.jar --analysis_type MuTect --reference_sequence <reference> --input_file: normal [normal_bamfile.bam] --input_file:tumor [tumor_bamfile.bam] --out [output_file.out] --coverage_file [coverages.wig.txt]

To only see the 'confident' calls from MuTect, use grep to filter for lines not containing REJECT
$ grep -v REJECT [my.output_file.out]

Three additional tools (from 2011 and 2012):
  • deepSNV [Gerstung et al. (2012)] 
    • Based on a likelihood ratio test at each genomic position and corrects for multiple testing.
    • Uses samtools with heavy physical memory requirements
  • CaVEMan [Stephens et al. 2012]
    • Rashid et al. (2013) says this method is similar to SAMtools mpileup which "computes the genotype likelihood of nucleotide positions in tumor and normal genome sequences by use of an expectation-maximization method"
  • Bambino [Edmonson et al. 2011]
    • Bayesian scoring model (as opposed to Fisher's exact testing) which compares alternate allele frequency between tumor and normal samples
    • Similar to VarScan2 (except Bayesian)

Wednesday, June 19, 2013

Getting Started with SAMtools and Rsamtools

Have you heard of this thing called 'samtools', but not really sure what it is or how to get started?   Maybe you are a postdoc in a wet-lab that is now starting to use whole-genome and whole-exome sequencing, but there isn't an in-house bioinformatician to help sift though the computational side of the project?  If you're like me (a statistician by training and working with genomic data), learning about tools to handle next-generation sequencing data is essential.  This tutorial is meant give a little background about samtools and then provide some technical (and hopefully helpful) information on the purpose of samtools, its relation to Rsamtools and examples of how to utilize both.

Beginning with SAMtools

What does SAM mean? How is SAM different from BAM?
The Sequence Alignment/Map (SAM) format was developed to store aligned nucleotide sequence reads produced from genome aligners (e.g. BWA) in a somewhat human readable tab-deliminted text file.  Practically speaking, this format has become the standard alignment format widely used by groups such as 1000 Genomes Project.  The Binary Alignment/Map (BAM) format is a binary version of SAM  which contains the same information and simply compresses the data to save storage space (and BAM files are faster to manipulate).  In the SAM format, there is a header section (all header rows begin with '@') and an alignment section (all alignment rows contain 11 mandatory fields + optional fields).

What is SAMtools and where did SAMtools come from? 
The SAM/BAM format and SAMtools were published by Li et al. 2009. The purpose of samtools is provide a way to manipulate and interact with the aligned sequence reads in SAM/BAM format.  Samtools does not work with the entire file of aligned reads, but rather works in a stream which allows Unix commands to be used in conjunction with the SAM files.

How do you install samtools?
The easiest way to do this to check out the most recent samtools source code from the github project page.  If you're not familiar with git or github, I would suggest this wonderful tutorial by Karl Broman. The best description of github I've heard is this tutorial, Karl writes "Github is like facebook for programmers", which couldn't be more true.   So, if you've set yourself up with a github account, then simply check out the lastest samtools source code

git clone git://
cd samtools
make razip

This last line is to compile razip which is a tool to compress files and still make them randomly accessible by samtools (not normally possible with gzip files).  Finally, the last step is to make samtools accessible to all the users on the machine which I extracted from this tutorial.

sudo cp ~/current/working/directory/samtools /usr/local/samtools-v0.1.19
sudo ln -s /usr/local/samtools-v0.1.19 /usr/local/samtools

The second line makes a symbolic link to the latest version directory. When a new version of samtools becomes available (e.g. samtools-v0.1.20), you can install it in the same directory and then just update the /usr/local/samtools symbolic link.  You can check to make sure samtools was installed in your $PATH with the following command:

which samtools

If nothing happens, samtools was not properly put in your $PATH.

Examples of how to use SAMtools

There are many tutorials on samtools.  The standard format of samtools requires a command and then allows for additional options.

samtools [command] <options>

If you type just 'samtools' in the terminal window,  a list of available command (and their descriptions) will be provided.  We begin by creating a folder which will contain our SAM file and reference genome in fasta format which we used to align the reads.

mkdir MyProject
cp ~/current/working/directory/test.sam MyProject
cp ~/current/working/directory/ref.fa MyProject

samtools view
To convert from SAM to BAM format (option -b) when header is available and input is SAM format (option -S)
samtools view -Sb test.sam > test.bam

To convert from SAM to BAM format when header is not available, but ref.fa was used to map the reads
samtools view -b ref.fa test.sam > test.bam

To filter out unmapped reads from BAM files (option -F)
samtools view -bF test.bam > mappedtest.bam

samtools sort
To sort BAM file by chromosomal coordinates (default) or read names (option -n) and create a test.sorted.bam file
samtools sort test.bam test.sorted

An example of using a Unix pipe to go from the SAM format to the sorted BAM format
samtools view -Sb test.sam | samtools sort test.sorted

samtools index 
To create an indexed bam file (.bai) from the test.sorted.bam file 
samtools index test.sorted.bam

samtools faidx
To create an indexed reference genome in FASTA format (creates a .fai file)
samtools faidx ref.fa 

Examples of using SAM/BAM files 

'samtools mpileup' and 'bcftools view'
To create a Binary Call Format (bcf) file (option -u) by mapping the bases using the indexed reference genome (option -f) and call genomic variants
samtools mpileup -u -f ref.fa test.sorted.bam > test.bcf

To convert the bcf file to a human readable Variant Call Format (vcf) file
bcftools view -v -g test.bcf > test.vcf

From here, you can also check out the Integrative Genomics Viewer (IGV) to visually look at the aligned reads or try importing the BAM files into tools like Rsamtools.

Examples of how to use Rsamtools

The purpose of Rsamtools is to provide an interface between R and BAM files produced by the tools samtools, bcftools, and tabix (not discussed here). To install Rsamtools in R, use


The main purpose is to import (indexed!) BAM files into R using the scanBam() function.  The vignette for Rsamtools can be found here.  The entire BAM file should not be read into R, but rather just the portion of the genome of interest using the what and which arguments in ScanBamParam(which=which, what=what).  There are other packages which can read in BAM files include ShortRead and GenomicRanges.

To extract the header information, use scanBamHeader().  Use filterBam() to filter reads from BAM file according to the criteria defined in ScanBamParam().

Another great use of Rsamtools is to access multiple BAM files using the BamViews class in Rsamtools. This allows you to obtain metadata by 'viewing' the BAM files rather than importing each BAM individually.

Thursday, June 13, 2013

US Supreme court says NO to patenting natural human genes

Today the US Supreme Court rendered a unanimous decision in the case Association for Molecular Pathology v. Myriad Genetics: "A naturally occurring DNA segment is a product of nature and not patent eligible merely because it has been isolated, but cDNA is patent eligible because it is not naturally occurring."

In the court ruling, Justice Clarence Thomas rejects several claims including "the idea that by isolating the gene and separating it from the surrounding chromosome, Myriad had created something new".  Thomas makes the argument cDNA is patentable because it is not a "product of nature". He writes "cDNA does not present the same obstacles to patentability as naturally occurring, isolated DNA segments. Its creation results in an exons-only molecule, which is not naturally occurring. Its order of the exons may be dictated by nature, but the lab technician unquestionably creates something new when introns are removed from a DNA sequence to make cDNA".

This led to one of my favorite tweets from this afternoon:

What does this mean for a patient seeking a genetic test (e.g. the BRCA1, BRCA2 genes)?  Competing companies offering genetic tests no longer have to pay the high licensing fees and hopefully can make the cost accessible to the consumer.

As expected, this decision has major implications for the medical genetics community.  Twitter exploded with responses from various groups in the genetics and genomics community with hashtags #cDNA and #SCOTUS trending.

Interestingly, by this afternoon Myriad released a statement claiming victory: "'We believe the Court appropriately upheld our claims on cDNA, and underscored the patent eligibility of our method claims, ensuring strong intellectual property protection for our BRACAnalysis test moving forward,' said Peter D. Meldrum, president and CEO".   Of course, not everyone agrees with this statement including Myriad's stock holders it seems...

Overall, today was a wonderful day for the genetics community and for, most importantly, the patients and their families.

Monday, May 20, 2013

Graduation and Other Oddities

The last few months have been filled with excitement and new adventures.  I sincerely apologize for the lack of posts.  I promise more statistical/scientific posts will follow very soon. For now, let me try to share a few highlights from the last few months.

In March, we welcomed a new addition to our household: a 2013 Scion FR-S (color 'hot lava')!  We are still working on naming it, but currently the top two contenders are the 'Atomic Carrot' and the 'CarrotMonster'.  Here is a picture of Chris grinning ear to ear in the car lot the day we drove it home:

After a few months of getting to know the car and taking 'an extreme driving course', Chris put the car to the test at the Sports Car Club of America (SCCA) autocross. Here a group of cars getting lined up to race: 

Later in March Rice University was host to the Conference of Texas Statisticians (COTS). Keith Baggerly gave a beautiful keynote talk discussing his famous talk When is Reproducibility an Ethical Issue? Genomics, personalized medicine, and human error.  See his appearance on 60 Minutes for further details related to the story about Potti at Duke University

April was a very exciting month around here!  I ran in the Digital Run 5K Houston with a wonderful group of friends.  It was advertised as a 'nocturnal wonderland' filled with lots of LED lights, fun music and an after party!  Amazingly it was quite cold the night we ran, so here we are all bundled up with our cool glasses: 

Finally, the main reason the frequency of posts have come to a halt was the successful completion of my Ph.D. defense on April 12th!  The Hooding ceremony was held on May 10th in which my research advisor Marek Kimmel (pictured below) was able to hood me: 

Rice University's 100th Commencement ceremony was held May 11th and we were fortunate enough to have Neil deGrasse Tyson as our commencement speaker! :)  Here are a few pictures of my amazing family: 

To close, I leave you with a fun Pi Day fact to ponder (courtesy of my Dad) :)

Saturday, March 16, 2013

Healthy Shepherd's Pie

In the spirit of St. Patrick's Day, we made a delicious and lightened up shepherd's pie this weekend.  We've made them in the past, but they were always so heavy that I never really enjoyed it.  This time we had a goal of making it not only delicious, but enjoyable.  I found Ellie Krieger's Shepherd's Pie recipe which seemed to have potential.  With a little bit more searching, I found another recipe from a blogger (Kristin from Iowa) that also looked fantastic.  Overall, this recipe turned out amazing!  I would highly recommend it to anyone who's looking to have a shepherd's pie full of flavor but without all the fat.


Lightened Up Mashed Potatoes (topping)
- 2 russet potatoes
- 1 head of cauliflower
- 2 tablespoons butter, melted
- 2 cups of skim milk
- salt, pepper

Shepherd's Pie Filling (base)
- 1 lb of ground turkey
- 2 tablespoons canola oil
- 1 onion, diced
- 2 carrots (or 1 large carrot), diced
- 8 oz mushrooms, sliced
- 2 teaspoons thyme, chopped
- 3 tablespoons all-purpose flour
- 2 cups of chicken stock
- 2 tablespoons soy sauce
- 2 teaspoons worcestershire sauce
- 1/2 cup frozen peas
- salt, pepper

1) To make the lightened up mashed potatoes, use both russet potatoes and cauliflower.  Cut the potatoes into 2 in cubes and break the cauliflower into florets.

2) Add the chopped potatoes to a steamer.  Steam with the lid on for 10 mins.

3) After 10 mins, add the cauliflower and steam for another 15 mins. Should be able to stick a knife through it cleanly.

3) Add the steamed potatoes and cauliflower to a food processor.  Add melted butter and milk.  Add salt and pepper to taste.

Mix with a 'whipping blade' until smooth texture.  Set aside.

4) To make the shepherd's pie filling, pull together the following ingredients.

5) Chop onion and carrots.  Add to a saute pan and cook for 8 minutes.

6) Chop mushrooms and thyme. Add to saute pan with onions and carrots and cook for another 8 minutes or until mushrooms are soft.  Set aside cooked vegetables.

7) Brown ground turkey (~5 mins).  Add 3 flour and cook for 1 minute.

8) Add tomato paste, soy sauce, worcestershire sauce, chicken broth, salt, pepper.  Cook for 10 minutes.  Add frozen peas.

9) Pour the filling into a pan. Top with lightened up mashed potatoes topping.  Add a few tablespoons of butter on top.

10)  Bake at 375 for 25 minutes. Broil for an additional 3-5 minutes.

Suggested: pair the shepherd's pie with a guinness!  :)

Sunday, February 10, 2013

Simple Mardi Gras King Cake

.... as they say in Louisiana, laissez le bon temps roulet!  Let the good times roll!

As Mardi Gras season is in full swing, I set out to make a king cake from a kit my sister brought me from New Orleans.  It seemed pretty authentic, so I was very excited to try it out.

The kit contained the 'king cake mix', yeast, praline sugar, glaze, three packets of colored sprinkles, and of course the baby!  I added 1/3 cup of warm water, 3 eggs and 2 egg yolks, and 1 1/2 sticks of butter.

Recipe for king cake (following a kit):
1) Preheat oven to 350. Mix king cake mix, yeast, eggs, warm water and room temperature butter.  Approx 3-4 mins

2) Cover dough with plastic wrap and set in warm place for 30 minutes to rise.  Next, knead dough on a lightly floured surface.  This dough was very sticky and not easy to work with, so I kept adding more flour, probably more than I should have. This made the dough a bit drier than I would have liked.

3) Brush 2 tablespoons of melted butter over the surface and sprinkle the praline sugar topping all over. Next, start to roll up the dough like a jelly roll.  Pinch ends closed.

 4) Place dough on a cookie sheet and bring the two ends together to form a circle or oval. Let rise for 30 minutes until the dough has doubled in size.

5) Bake for 20-30 minutes until lightly browned. Remove from oven and let cool completely.  At this point put the baby somewhere under the cake.

6) To decorate, mix glaze with a bit of water until the consistency is of a thick icing.  Add decorative sprinkle or sugars.

Happy Mardi Gras!