Wednesday, April 4, 2012

Applying the EM Algorithm: Binomial Mixtures

Last month I made a post about the EM algorithm and how to estimate the confidence intervals for the parameter estimates out of the EM algorithm.  In this post, I give the code for estimating the parameters of a binomial mixture and their confidence intervals.

Consider the following problem:  We have two coins. The probability of observing a head with the first coin is $p$ and the second coin is $q$. We do not observe which coin was used.  We only observe $Y_i \in \{0,1\}$ which records whether we see a heads or tails.   Let $Z_i \in \{0,1\}$ be the missing information of which coin was used on each flip. The probability of using the first coin is $P(Z_i = 0) = \pi$ and the second coin is $P(Z_i = 1) = 1-\pi$. Then, the complete information is given by $X = (Y, Z)$.  Using the law of total probability, we know
$P[Y] = P[Y | Z = 0] P[Z=0] + P[Y| Z = 1] P[Z = 1]$
Therefore, the complete likelihood is given by
$L_0(\theta | X) = \prod_i [p^{y_i}(1-p)^{(1-y_i)} \pi ]^{1-z_i} [q^{y_i}(1-q)^{(1-y_i)} (1-\pi) ]^{z_i}$
where $\theta = (\pi, p, q)$.  We can compute the expectation of $Z_i$ conditional on the observed information and the current parameter estimates $\theta^{(t)}$:
$w_i = E_{Z_i | Y_i, \theta^{(t)}} [ Z_i ] = \frac{ (p^{(t)})^{y_i}(1-(p^{(t)}))^{(1-y_i)} (\pi^{(t)}) }{ (p^{(t)})^{y_i}(1-(p^{(t)}))^{(1-y_i)} (\pi^{(t)}) + (q^{(t)})^{y_i}(1-(q^{(t)}))^{(1-y_i)} (1-(\pi^{(t)})) }$
Taking the gradient of $Q(\theta | \theta^{(t)}) = E_{Z_i | Y_i, \theta^{(t)}} [ L_0(\theta | X) ]$ with respect to the three parameters $\pi, p, q$ and solving for $\pi, p, q$ yield the
\begin{eqnarray*}
\hat{\pi} & = & \frac{1}{n} \sum_{i = 1}^n w_i \\
\hat{p} & = &  \frac{  \sum_{i = 1}^n w_i y_i }{  \sum_{i = 1}^n w_i} \\
\hat{q} & = &  \frac{  \sum_{i = 1}^n (1-w_i) y_i }{  \sum_{i = 1}^n (1-w_i)}
\end{eqnarray*}

The confidence intervals associated with the EM estimates can be found by calculating
$I(\theta) = - \frac{\partial^2 Q}{\partial \theta^2 } |_{\theta = \hat{\theta} }$
To calculate a $100(1-\alpha)\%$ confidence interval for $\theta$, we use
$[\hat{\theta} - Z_{\alpha/2} (\frac{1}{\sqrt{I(\theta)}}) , \hat{\theta} + Z_{\alpha/2} (\frac{1}{\sqrt{I(\theta)}}) ]$

Here is my R code:
## Expectation Step
estep <- function(obs,pi,p,q){
pi_estep <- (pi*dbinom(obs,1,p)) / ( pi*dbinom(obs,1,p) + (1-pi)*dbinom(obs,1,q) )
return(pi_estep)
}

## Maximization Step
mstep <- function(obs,e.step){

# estimate pi
pi_temp <- mean(e.step)

# estimate p,q
p_temp <- sum(obs*e.step) / sum(e.step)
q_temp <- sum(obs*(1-e.step)) / sum(1-e.step)

list(pi_temp,p_temp,q_temp)
}

## EM Algorithm
em.algo <- function(obs,pi_inits,p_inits,q_inits,maxit=1000,tol=1e-6){
# Initial parameter estimates
flag <- 0
pi_cur <- pi_inits; p_cur <- p_inits; q_cur <- q_inits

# Iterate between expectation and maximization steps
for(i in 1:maxit){
cur <- c(pi_cur,p_cur,q_cur)
new <- mstep(obs,estep(obs, pi_cur, p_cur, q_cur))
pi_new <- new[[1]]; p_new <- new[[2]]; q_new <- new[[3]]
new_step <- c(pi_new,p_new,q_new)

# Stop iteration if the difference between the current and new estimates is less than a tolerance level
if( all(abs(cur - new_step) < tol) ){ flag <- 1; break}

# Otherwise continue iteration
pi_cur <- pi_new; p_cur <- p_new; q_cur <- q_new
}
if(!flag) warning("Didn’t converge\n")

list(pi_cur, p_cur, q_cur)
}

## Calculate Information matrix
Info.Mat.function <- function(obs, pi.est, p.est, q.est){
estep.est <- estep(obs,pi.est, p.est, q.est)
info.mat <- matrix(rep(0,9),3,3)
info.mat[1,1] <- - sum(estep.est)/(pi.est^2)  - sum((1-estep.est))/((1-pi.est)^2)
info.mat[2,2] <- - sum(estep.est*obs)/(p.est^2) - sum(estep.est*(1-obs))/((1-p.est)^2)
info.mat[3,3] <- - sum((1-estep.est)*obs)/(q.est^2) - sum((1-estep.est)*(1-obs))/((1-q.est)^2)
return(-info.mat)
}

## Generate sample data
n <- 5000
pi_true <- 0.90 # prob of using first coin
p_true <-  0.60 # the first coin has P(heads) = 0.60
q_true <-  0.50 # the second coin has P(heads) = 0.50
true <- c(pi_true,p_true,q_true)
u <- ifelse(runif(n)<pi_true, rbinom(n,1,p_true),rbinom(n,1,q_true))

## Set parameter estimates
pi_init = 0.70; p_init = 0.70; q_init = 0.60

## Run EM Algorithm
output <- em.algo(u,pi_init,p_init,q_init)

## Calculate Confidence Intervals
sd.out <- sqrt(diag(solve(Info.Mat.function(u,output[[1]],output[[2]],output[[3]]))))
data.frame("Truth" = true, "EM Estimate" = unlist(output), "Lower CI" = unlist(output) - qnorm(.975)*sd.out, "Upper CI" = unlist(output) + qnorm(.975)*sd.out)

Example output:
Truth EM.Estimate  Lower.CI  Upper.CI
1   0.9   0.8941314 0.8856034 0.9026594
2   0.6   0.6081113 0.5938014 0.6224211
3   0.5   0.4483729 0.4060064 0.4907393

4. I believe there is an error in the definition of $Q$