Wednesday, April 4, 2012

Applying the EM Algorithm: Binomial Mixtures

Last month I made a post about the EM algorithm and how to estimate the confidence intervals for the parameter estimates out of the EM algorithm.  In this post, I give the code for estimating the parameters of a binomial mixture and their confidence intervals.

Consider the following problem:  We have two coins. The probability of observing a head with the first coin is $p$ and the second coin is $q$. We do not observe which coin was used.  We only observe $Y_i \in \{0,1\}$ which records whether we see a heads or tails.   Let $Z_i \in \{0,1\}$ be the missing information of which coin was used on each flip. The probability of using the first coin is $P(Z_i = 0) = \pi$ and the second coin is $P(Z_i = 1) = 1-\pi$. Then, the complete information is given by $X = (Y, Z)$.  Using the law of total probability, we know
\[ P[Y] =  P[Y | Z = 0] P[Z=0] + P[Y| Z = 1] P[Z = 1] \]
Therefore, the complete likelihood is given by
\[ L_0(\theta | X) = \prod_i [p^{y_i}(1-p)^{(1-y_i)} \pi ]^{1-z_i} [q^{y_i}(1-q)^{(1-y_i)}  (1-\pi)  ]^{z_i} \]
where $\theta = (\pi, p, q)$.  We can compute the expectation of $Z_i$ conditional on the observed information and the current parameter estimates $\theta^{(t)}$:
\[ w_i = E_{Z_i | Y_i, \theta^{(t)}} [ Z_i ] = \frac{ (p^{(t)})^{y_i}(1-(p^{(t)}))^{(1-y_i)} (\pi^{(t)}) }{ (p^{(t)})^{y_i}(1-(p^{(t)}))^{(1-y_i)} (\pi^{(t)}) + (q^{(t)})^{y_i}(1-(q^{(t)}))^{(1-y_i)} (1-(\pi^{(t)})) }  \]
Taking the gradient of $Q(\theta | \theta^{(t)}) = E_{Z_i | Y_i, \theta^{(t)}} [ L_0(\theta | X) ]$ with respect to the three parameters $\pi, p, q$ and solving for $\pi, p, q$ yield the
\begin{eqnarray*}
\hat{\pi} & = & \frac{1}{n} \sum_{i = 1}^n w_i \\
\hat{p} & = &  \frac{  \sum_{i = 1}^n w_i y_i }{  \sum_{i = 1}^n w_i} \\
\hat{q} & = &  \frac{  \sum_{i = 1}^n (1-w_i) y_i }{  \sum_{i = 1}^n (1-w_i)}
\end{eqnarray*}

The confidence intervals associated with the EM estimates can be found by calculating
\[ I(\theta) = - \frac{\partial^2 Q}{\partial \theta^2 } |_{\theta = \hat{\theta} } \]
To calculate a $100(1-\alpha)\%$ confidence interval for $\theta$, we use
\[ [\hat{\theta} - Z_{\alpha/2} (\frac{1}{\sqrt{I(\theta)}}) , \hat{\theta} + Z_{\alpha/2} (\frac{1}{\sqrt{I(\theta)}}) ] \]



Here is my R code:
## Expectation Step
estep <- function(obs,pi,p,q){
  pi_estep <- (pi*dbinom(obs,1,p)) / ( pi*dbinom(obs,1,p) + (1-pi)*dbinom(obs,1,q) )
  return(pi_estep)
}


## Maximization Step
mstep <- function(obs,e.step){
  
  # estimate pi
  pi_temp <- mean(e.step)
  
  # estimate p,q
  p_temp <- sum(obs*e.step) / sum(e.step)
  q_temp <- sum(obs*(1-e.step)) / sum(1-e.step)
  
  list(pi_temp,p_temp,q_temp)   
}



## EM Algorithm
em.algo <- function(obs,pi_inits,p_inits,q_inits,maxit=1000,tol=1e-6){
  # Initial parameter estimates
  flag <- 0
  pi_cur <- pi_inits; p_cur <- p_inits; q_cur <- q_inits
   
  # Iterate between expectation and maximization steps
  for(i in 1:maxit){
    cur <- c(pi_cur,p_cur,q_cur)
    new <- mstep(obs,estep(obs, pi_cur, p_cur, q_cur))
    pi_new <- new[[1]]; p_new <- new[[2]]; q_new <- new[[3]]
    new_step <- c(pi_new,p_new,q_new)
    
    # Stop iteration if the difference between the current and new estimates is less than a tolerance level
    if( all(abs(cur - new_step) < tol) ){ flag <- 1; break}


    # Otherwise continue iteration
    pi_cur <- pi_new; p_cur <- p_new; q_cur <- q_new
  }
  if(!flag) warning("Didn’t converge\n")
  
  list(pi_cur, p_cur, q_cur)
}



## Calculate Information matrix
Info.Mat.function <- function(obs, pi.est, p.est, q.est){
  estep.est <- estep(obs,pi.est, p.est, q.est)
  info.mat <- matrix(rep(0,9),3,3)
  info.mat[1,1] <- - sum(estep.est)/(pi.est^2)  - sum((1-estep.est))/((1-pi.est)^2) 
  info.mat[2,2] <- - sum(estep.est*obs)/(p.est^2) - sum(estep.est*(1-obs))/((1-p.est)^2)
  info.mat[3,3] <- - sum((1-estep.est)*obs)/(q.est^2) - sum((1-estep.est)*(1-obs))/((1-q.est)^2)
  return(-info.mat)
}


## Generate sample data
n <- 5000 
pi_true <- 0.90 # prob of using first coin
p_true <-  0.60 # the first coin has P(heads) = 0.60
q_true <-  0.50 # the second coin has P(heads) = 0.50
true <- c(pi_true,p_true,q_true)
u <- ifelse(runif(n)<pi_true, rbinom(n,1,p_true),rbinom(n,1,q_true))


## Set parameter estimates
pi_init = 0.70; p_init = 0.70; q_init = 0.60


## Run EM Algorithm
output <- em.algo(u,pi_init,p_init,q_init)


## Calculate Confidence Intervals
sd.out <- sqrt(diag(solve(Info.Mat.function(u,output[[1]],output[[2]],output[[3]]))))
data.frame("Truth" = true, "EM Estimate" = unlist(output), "Lower CI" = unlist(output) - qnorm(.975)*sd.out, "Upper CI" = unlist(output) + qnorm(.975)*sd.out)

Example output: 
  Truth EM.Estimate  Lower.CI  Upper.CI
1   0.9   0.8941314 0.8856034 0.9026594
2   0.6   0.6081113 0.5938014 0.6224211
3   0.5   0.4483729 0.4060064 0.4907393



4 comments:

  1. Glad to run into your post, the 3rd item on google for "binomial mixture em algorithm" and my personal favorite of the three! (-Gao :P)

    ReplyDelete
    Replies
    1. Thanks Gao! I'm glad you found it useful. :)

      Delete
  2. This comment has been removed by a blog administrator.

    ReplyDelete