Statistical [R]ecipes: October 2012

Sunday, October 28, 2012

Savory Roasted Pumpkin Seeds

With fall in full swing and Halloween around the corner, we always have lots of squash and/or pumpkins around. Whether you are carving a jack-o-lantern or making a pumpkin pie, there are leftover winter pumpkin seeds which can be turned into delicious, savory roasted snacks!

Ingredients:
- pumpkin seeds (or any winter squash seeds such as my personal favorite: acorn squash)
- olive oil
- salt, pepper
- my seasoning blend: garlic powder, paprika, chipotle powder, Worcestershire sauce

Recipe:
1) Rinse seeds under water and remove any squash pulp. Pat dry.

2) Add olive oil.

3) Add seasoning blend. Mix well. Spread seasoned seeds on a baking sheet.

4) Bake at 350°F for 15 minutes. Halfway through, move seeds around so they don't stick. Let cool and enjoy!

And in case you were curious, here is our jack-o-lantern this year: pumpkin 'pi' :)

Thursday, October 18, 2012

UCSC Genome Browser: A few useful tips

Most of my research revolves around analyzing next-generation sequencing data such as whole-exome sequencing data. As a statistician, I always appreciate finding useful bioinformatic tricks/tips from various tools that make me more efficient. Here are three examples of using the UCSC Genome Browser that I've found helpful.

Tip #1: How do you find a list of chromosome positions given a list of dbSNP identifiers? (Taken from the Guide to the UCSC Genome Browser FAQ by Nature)
Use the 'Variation and Repeats' group in the Table Browser and the SNPs track of your choice. Specify the genome (e.g. Human) and assembly (e.g. hg19). For 'Region', if you want the chromosomal positions for a specific region, click 'position' and specify the region; otherwise, click 'genome' and upload a list of dbSNP identifiers. Finally, choose your output format (e.g. GTF, BED) and click 'get output'.
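Once the Table Browser output is saved, the positions can be matched back to the original identifiers in R. Here is a minimal sketch, assuming the output was saved in BED format with the rsID in the fourth (name) column. The inline text stands in for a downloaded file, and the identifiers and coordinates are made-up placeholders, not real dbSNP records.

```r
# Toy stand-in for a BED file downloaded from the Table Browser.
# Identifiers and coordinates are placeholders for illustration only.
bed.text <- "chr1\t1000\t1001\trs0000001\t0\t+
chr2\t2500\t2501\trs0000002\t0\t-
chr7\t900\t901\trs0000003\t0\t+"

snp.bed <- read.table(text = bed.text, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE,
                      col.names = c("chrom", "chromStart", "chromEnd",
                                    "name", "score", "strand"))

# The identifiers we asked for (rs0000004 is deliberately absent from the output)
my.snps <- c("rs0000003", "rs0000001", "rs0000004")

# match() keeps the order of my.snps and returns NA for identifiers not found
idx <- match(my.snps, snp.bed$name)
snp.pos <- data.frame(rsid  = my.snps,
                      chrom = snp.bed$chrom[idx],
                      # BED starts are 0-based, so chromEnd is the 1-based
                      # position of a single-base SNP
                      pos   = snp.bed$chromEnd[idx],
                      stringsAsFactors = FALSE)
print(snp.pos)
```

Identifiers that come back as NA are worth checking by hand, since they may have been merged or retired in the chosen SNP track.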

Tip #2: How do you lift over a set of genomic coordinates from hg18 to hg19?
Use the Batch Coordinate Conversion (liftOver) tool under Utilities. Select the original and new assemblies, upload the original genomic coordinates (in BED format) and submit.
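BED files use 0-based, half-open intervals, which is easy to get wrong when starting from 1-based positions. Here is a minimal sketch of preparing such a file in R. The data frame, its column names, the positions and the output file name are all hypothetical.

```r
# Hypothetical hg18 variants with 1-based positions (placeholder values)
variants <- data.frame(chrom = c("chr1", "chr2", "chrX"),
                       pos   = c(1001, 2501, 901),
                       id    = c("var1", "var2", "var3"),
                       stringsAsFactors = FALSE)

# BED intervals are 0-based and half-open: a single base at 1-based
# position p becomes chromStart = p - 1, chromEnd = p.
# Converting to integer avoids scientific notation (e.g. 1e+06) in the file.
bed <- data.frame(chrom      = variants$chrom,
                  chromStart = as.integer(variants$pos - 1),
                  chromEnd   = as.integer(variants$pos),
                  name       = variants$id,
                  stringsAsFactors = FALSE)

# Tab-delimited, no header, no quotes, no row names
bed.file <- file.path(tempdir(), "variants_hg18.bed")
write.table(bed, file = bed.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(bed.file)
```

Keeping an identifier in the name column makes it easier to pair the converted coordinates with the original variants, and to spot any that fail to convert.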

Tip #3: How can you extract data from the UCSC browser and use it in R?
For this, we need the Bioconductor package rtracklayer. Here is an example of how to extract recombination rates:

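library(rtracklayer)
# Start a session with the UCSC Genome Browser (requires an internet connection)
my.session <- browserSession()
# Select the hg19 assembly
genome(my.session) <- "hg19"
# Query the recombRate track and download its table as a data frame
recomb.rates <- getTable(ucscTableQuery(my.session, "recombRate"))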
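
getTable() returns a data frame, so the result can be summarized with base R. Here is a minimal sketch using a toy data frame in place of the real download. The column names chrom and decodeAvg are assumed to match the recombRate table (check with names(recomb.rates)), and the values are made up.

```r
# Toy stand-in for recomb.rates. Real column names should be checked
# with names(recomb.rates); these values are placeholders.
recomb.toy <- data.frame(chrom      = c("chr1", "chr1", "chr2", "chr2"),
                         chromStart = c(0, 1000000, 0, 1000000),
                         chromEnd   = c(1000000, 2000000, 1000000, 2000000),
                         decodeAvg  = c(1.0, 3.0, 0.5, 1.5),
                         stringsAsFactors = FALSE)

# Mean recombination rate per chromosome
mean.by.chrom <- tapply(recomb.toy$decodeAvg, recomb.toy$chrom, mean)
print(mean.by.chrom)

# Windows with a rate above the median of the (toy) distribution
hot <- recomb.toy[recomb.toy$decodeAvg > median(recomb.toy$decodeAvg), ]
print(hot)
```

The same two steps should work on the real table after swapping recomb.toy for recomb.rates, provided the column names match.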

Tuesday, October 16, 2012

Perfect Tomatillo Salsa Verde

If you want to know how to make the perfect salsa verde with those weird green tomato-looking things with a peel (aka tomatillos), you should definitely try this recipe! I adapted this recipe from one that looked promising:

Ingredients:
- 1-2 lbs tomatillos
- 1/2 onion
- 1 jalapeno (seeds removed unless you like a lot of heat)
- 2 cloves of garlic
- 1/4 cup of cilantro
- a pinch of cumin, paprika, salt and pepper

Recipe for Tomatillo Salsa Verde:

1) Begin with the tomatillos. Peel the skin off and rinse the tomatillos under some water. Cut in half and place the cut-side down on a pan. Place under the broiler for 4-5 minutes until there is a nice char on the outside of the tomatillos.

2) Dump the broiled tomatillos into a food processor with onion, jalapeno, cilantro, garlic, cumin, paprika, salt and pepper. Pulse until desired chunkiness. I prefer mine to be fairly smooth.

Enjoy on top of some huevos rancheros, pulled pork tacos, fish tacos or chips.

Tuesday, October 9, 2012

Rice University's Centennial Celebration

Rice Institute President Edgar Odell Lovett opened the Rice Institute on October 12, 1912, and this week Rice University will hold its centennial celebration, October 10-14, 2012!

There are many, many events going on this week including the Centennial Lecture Series, homecoming, alumni reunion weekend, performances and parties. I just want to highlight one of the lectures scheduled for tomorrow:

Centennial Lecture Series #1: J. Craig Venter - Wed Oct 10 3-4pm in the Tudor Fieldhouse
Abstract: Perhaps most famous for being among the first to sequence the human genome, Dr. Venter in 2010 created the first cell with a synthetic genome. He has been listed as one of the world's most influential people by both Time magazine and the British New Statesman. Venter also is tackling energy (stating that algae show promise); last year he published a high-profile paper on the first creation of synthetic life that included then-Rice student Thomas Segall-Shapiro as an author.

Calendar of all events: Centennial calendar
Google Calendar specifically for all the graduate student events: tinyurl.com/GSAcentennial

Happy Centennial Rice University!

Tuesday, October 2, 2012

Beyond the Genome 2012: Highlights and Discussion Points

This past week I attended the Beyond the Genome conference, held September 27-29 in Boston at Harvard Medical School. Genome Medicine and Genome Biology have hosted this conference for three years now, and this was my first time attending. For more details on individual talks, check out the blog posts from Oliver Hofmann tagged with btg2012, or search for tweets with the hashtag #BtG2012. I want to give a few highlights from the conference, but first I'll begin with a cartoon that was a very fitting way to describe the conference!

Here are a few key discussion points I took away from the conference:

- Genomic or bioinformatic tools used to analyze next-generation sequencing data need to be reproducible, accessible, fast, interactive and web-based. James Taylor from Galaxy advocated for making bioinformatics more reproducible and accessible, and gave the example of a review in which only 7 out of 50 papers using BWA provided all the parameters necessary to reproduce their research. Gabor Marth argued that bioinformatic tools should be usable not only by informaticians but also by biologists.
- Bioinformaticians are re-inventing the wheel. I found two recent papers reviewing batch annotation tools for variants obtained using next-generation sequencing (Sifrim et al. 2012; Lyon and Wang 2012) with a list of over 19 tools published just since 2010! At the conference, several of the speakers noted in their talks, "As bioinformaticians we apparently like to keep reinventing the wheel". I would agree with this statement. In addition to genomic tools needing to be reproducible, accessible, fast, interactive and web-based, I would argue we need to create a standardized tool or format for annotating variants.
- With mutations, context matters. Why do some of the "right mutations" not respond to treatment? Josh Stuart advocated for using pathway-based analyses to assess the impact of mutations. The focus should be on recurrent, rare variants (most likely to be impactful), but we also need to make sure the background mutation rate is correct and determine whether mutations are in key domains such as DNA-binding or conserved domains. Daniel MacArthur said we need to analyze variants in the context of tens of thousands of genomes, because the human genomic landscape is dominated by ultra-rare variation, and that we need to require consistent variant calls across studies. He also gave a set of recommendations on establishing causality from the NHGRI (see picture below). Lynda Chin argued that even if we know a mutation changes the function of the protein, we don't necessarily know the biological consequence. She therefore argued that we need a model-systems approach to characterize and interpret a complete catalogue of driver mutations with functional validation.
- The idea of using whole exome sequencing as a diagnostic test in the clinic has arrived (especially for rare genetic disorders), but we are just now starting to deal with all the challenges that come along with it. Sharon Plon discussed her first-year experience with clinical whole exome sequencing and said almost 25% of sequenced patients have "medically actionable" findings (mostly cardiac, but some cancer). Elizabeth Worthey cautioned that "medically actionable" doesn't always mean "can be treated". With secondary variants and findings in the clinic, Leslie Biesecker argued that context matters. In his experience, the response of the patient varies depending on prior history: with a previous diagnosis, a mundane response; with prior family history, mild surprise; without family history, dazed and confused. Amy McGuire gave a beautiful talk from the legal perspective on the reasons to disclose or not disclose incidental findings and included survey results from actual GWAS researchers. In terms of drug discovery, Stuart Schreiber gave many examples relating genomic alterations in cancer to small-molecule sensitivity.

Finally, I thought the talk by Richard Gibbs deserved its own paragraph. His talk focused on whom we should be sequencing when it comes to human genetic diseases. Though next-generation sequencing technology has come a long way, it's still not perfect. Performing whole exome sequencing as a service can be done effectively by experts, but this is inefficient and not scalable. Automating the process of interpretation often ends with dumb results. Sequencing healthy individuals should be a low priority, especially when phenotypes are not tied to the control individuals (crazy!). Sequencing individuals with complex diseases is still tricky because of the debate between the common disease-common variant (CDCV) and common disease-rare variant (CDRV) hypotheses. He argued that if complex diseases are caused by rare variants, then we should focus within families and not on broader populations. Sequencing individuals with Mendelian disease should be high on the priority list because there is huge value in obtaining a molecular diagnosis even without a treatment. Finally, recreational sequencing is low on the priority list. He has even coined the term 'narcciss-ome'! Overall, he predicts sequencing will become standard of care and soon all the excitement will be passé.

Final thoughts: The conference was filled with fantastic talks given by world-renowned speakers. The level of genetic complexity of diseases never ceases to amaze me, but at the same time seeing such great research being produced to answer some tough genetic questions is always exciting. I learned a great deal and would highly recommend Beyond the Genome to future participants!