Statistical [R]ecipes: June 2012

Friday, June 29, 2012

EM Algorithm: Confidence Regions

A previous post related to estimation using the EM algorithm described how to calculate a 100(1 - \alpha)$\%$ confidence intervals for $\theta$ (with an application to binomial mixtures). This blogpost will be discuss two additional topics:

1) How to calculate 100(1-$\alpha$)$\%$ confidence regions of $\theta$?

If we consider the information matrix
\[ I(\theta) = - \frac{\partial^2 Q}{\partial \theta^2 } |_{\theta = \hat{\theta} } \]
where $Q(\theta | \theta^{(t)}) = E_{Z_i | Y_i, \theta^{(t)}} [ L_0(\theta | X) ]$, then it was previously shown a $100(1-\alpha)\%$ confidence interval for $\theta$

\[ [\hat{\theta} - Z_{\alpha/2} (\frac{1}{\sqrt{I(\theta)}}) , \hat{\theta} + Z_{\alpha/2} (\frac{1}{\sqrt{I(\theta)}}) ] \]

If we consider a p-dimensional $\theta$, a 100(1-\alpha)$\%$ likelihood-ratio confidence region would be given by

\[ \{ \theta: 2(l (\hat{\theta}) - l(\theta)) \leq \chi^2_{p, \alpha} \} \]

On the other than a 100(1-\alpha)$\%$ Wald confidence region would given by

\[ \{ \theta: (\theta - \hat{\theta})^{T} V^{-1} (\theta - \hat{\theta}) \leq \chi^2_{p, \alpha} \} \]
where $V^{-1}$ is given by $- Q^{''}(\hat{\theta})$.

2) How to calculate 100(1-$\alpha$)$\%$ confidence regions of $h(\theta)$ where $\frac{\partial h(\theta)}{\partial \theta}$ exists and is non-zero?

For this question, we are interested in confidence regions of $h(\theta)$ which is a q-dimensional where $q \leq p$ Using the Delta Method and asymptotic efficiency of MLEs [see Casella and Berger (2002) Second Edition, pg 473], the variance of $h(\hat{\theta})$ is approximated by

\[ \text{Var}[h(\hat{\theta})] \approx [ \frac{\partial h(\theta)}{\partial \theta} |_{\theta = \hat{\theta} } ]^T [I(\theta) |_{\theta = \hat{\theta} } ]^{-1} [ \frac{\partial h(\theta)}{\partial \theta} |_{\theta = \hat{\theta} } ] \]

Therefore, a 100(1-\alpha)$\%$ Wald confidence region for $h(\theta)$ would given by

\[ \{ \theta: (h(\theta) - h(\hat{\theta}))^{T} V^{-1} (h(\theta) - h(\hat{\theta})) \leq \chi^2_{p, \alpha} \} \]

where

\[ V^{-1} = [\text{Var}(h(\hat{\theta}))]^{-1} \]

For p or q = 2, these confidence regions can easily be plotted using the contour2d() function in R.

Friday, June 1, 2012

Private Genetics Company Announces First Mutation Patent

Last fall when I attended the American Society of Human Genetics in Montreal, there was a buzz about the company genetics company 23andMe. With the cost of exome sequencing falling, this company was offering the opportunity for anyone to have their exome sequenced quite inexpensively. On Monday 23andMe announced it's first patent titled "Polymorphisms Related to Parkinson's Disease". They discovered the G2019S variant in the SGK1 gene and suggest it may be protective against the disease. Interestingly, they say "our patent is an important step in ensuring that we've done all we can towards successful translation of this discovery" and they "want to those discoveries to move from the realm of academic publishing to the world of impacting lives by preventing, treating or curing disease". The response from the bloggers has been quite controversial. At ASHG the idea of patenting genes or mutations was discussed in an open forum which was equally controversial. Nature posted a blog about it yesterday which caught my eye.

The main problem with this announcement is the company is advertising the idea of patenting a gene or mutation as being beneficial because it will give individuals access to their genomes. In reality, these are two separate issues. The idea of patenting a gene or mutation is beneficial for the company to make money. Giving people access to their genome is a nice thought, but as a researcher who works in the field of interpreting the variants discovered in exomes or genomes, there is not a sufficient amount of information available yet to properly interpret the variants. We are just now discovering ways to efficiently sequence genomes and only at the very beginning of interpreting them. There is this idea of a $1,000 genome, but $1 million interpretation. I'm curious to see how researchers in both academia and industry respond to this new world of patenting genes and mutations.