Consider the following problem: We have two coins. The probability of observing heads is $p$ for the first coin and $q$ for the second. We do not observe which coin was used; we only observe $Y_i \in \{0,1\}$, which records whether flip $i$ lands heads ($Y_i = 1$) or tails ($Y_i = 0$). Let $Z_i \in \{0,1\}$ be the missing indicator of which coin was used on flip $i$: the probability of using the first coin is $P(Z_i = 0) = \pi$ and of using the second is $P(Z_i = 1) = 1-\pi$. The complete data are then $X = (Y, Z)$. Using the law of total probability, we know
\[ P[Y] = P[Y | Z = 0] P[Z=0] + P[Y| Z = 1] P[Z = 1] \]
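Substituting the Bernoulli densities of the two coins, the observed-data density of a single flip is
\[ P(Y_i = y_i) = \pi \, p^{y_i}(1-p)^{(1-y_i)} + (1-\pi) \, q^{y_i}(1-q)^{(1-y_i)} \]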
Therefore, the complete likelihood is given by
\[ L_0(\theta | X) = \prod_i [p^{y_i}(1-p)^{(1-y_i)} \pi ]^{1-z_i} [q^{y_i}(1-q)^{(1-y_i)} (1-\pi) ]^{z_i} \]
where $\theta = (\pi, p, q)$. In the E-step we compute the posterior probability that the first coin was used on flip $i$, i.e. the conditional expectation of $1 - Z_i$ given the observed data and the current parameter estimates $\theta^{(t)}$:
\[ w_i = P(Z_i = 0 \mid Y_i, \theta^{(t)}) = \frac{ \pi^{(t)} (p^{(t)})^{y_i} (1-p^{(t)})^{(1-y_i)} }{ \pi^{(t)} (p^{(t)})^{y_i} (1-p^{(t)})^{(1-y_i)} + (1-\pi^{(t)}) (q^{(t)})^{y_i} (1-q^{(t)})^{(1-y_i)} } \]
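For instance, with the starting values used in the code below, $\theta^{(0)} = (0.70, 0.70, 0.60)$, an observed heads ($y_i = 1$) gives
\[ w_i = \frac{0.70 \times 0.70}{0.70 \times 0.70 + 0.30 \times 0.60} = \frac{0.49}{0.67} \approx 0.731 \]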
Taking the expectation of the complete log-likelihood with respect to this posterior gives
\[ Q(\theta \mid \theta^{(t)}) = E_{Z \mid Y, \theta^{(t)}} \left[ \log L_0(\theta \mid X) \right] = \sum_{i=1}^n \left\{ w_i \left[ \log \pi + y_i \log p + (1-y_i) \log(1-p) \right] + (1-w_i) \left[ \log(1-\pi) + y_i \log q + (1-y_i) \log(1-q) \right] \right\} \]
Taking the gradient of $Q(\theta \mid \theta^{(t)})$ with respect to the three parameters $\pi, p, q$ and setting it to zero yields the updates
\begin{eqnarray*}
\hat{\pi} & = & \frac{1}{n} \sum_{i = 1}^n w_i \\
\hat{p} & = & \frac{ \sum_{i = 1}^n w_i y_i }{ \sum_{i = 1}^n w_i} \\
\hat{q} & = & \frac{ \sum_{i = 1}^n (1-w_i) y_i }{ \sum_{i = 1}^n (1-w_i)}
\end{eqnarray*}
Approximate confidence intervals for the EM estimates can be obtained from the information matrix
\[ I(\hat{\theta}) = - \left. \frac{\partial^2 Q}{\partial \theta \, \partial \theta^{\top}} \right|_{\theta = \hat{\theta}} \]
(Strictly, this is the complete-data information; because it ignores the uncertainty in the unobserved $Z_i$, the resulting intervals can be slightly narrower than ones based on the observed information.) To calculate a $100(1-\alpha)\%$ confidence interval for each component of $\theta$, we use
\[ \left[ \hat{\theta} - z_{\alpha/2} \, \frac{1}{\sqrt{I(\hat{\theta})}} , \; \hat{\theta} + z_{\alpha/2} \, \frac{1}{\sqrt{I(\hat{\theta})}} \right] \]
where $I(\hat{\theta})$ denotes the diagonal entry of the information matrix for that parameter.
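As a quick sanity check of the $\pi$ entry: at the M-step fixed point we have $\sum_{i=1}^n w_i = n\hat{\pi}$, so the diagonal term computed in the code below reduces to the familiar binomial information,
\[ I_{\pi\pi} = \frac{\sum_i w_i}{\hat{\pi}^2} + \frac{\sum_i (1-w_i)}{(1-\hat{\pi})^2} = \frac{n}{\hat{\pi}} + \frac{n}{1-\hat{\pi}} = \frac{n}{\hat{\pi}(1-\hat{\pi})} \]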
Here is my R code:
## Expectation step: posterior probability that the first coin was used
estep <- function(obs, pi, p, q){
  pi_estep <- (pi * dbinom(obs, 1, p)) /
    (pi * dbinom(obs, 1, p) + (1 - pi) * dbinom(obs, 1, q))
  return(pi_estep)
}
## Maximization Step
mstep <- function(obs, e.step){
  # estimate pi
  pi_temp <- mean(e.step)
  # estimate p and q
  p_temp <- sum(obs * e.step) / sum(e.step)
  q_temp <- sum(obs * (1 - e.step)) / sum(1 - e.step)
  list(pi_temp, p_temp, q_temp)
}
## EM Algorithm
em.algo <- function(obs, pi_inits, p_inits, q_inits, maxit = 1000, tol = 1e-6){
  # Initial parameter estimates
  flag <- 0
  pi_cur <- pi_inits; p_cur <- p_inits; q_cur <- q_inits
  # Iterate between expectation and maximization steps
  for(i in 1:maxit){
    cur <- c(pi_cur, p_cur, q_cur)
    new <- mstep(obs, estep(obs, pi_cur, p_cur, q_cur))
    pi_new <- new[[1]]; p_new <- new[[2]]; q_new <- new[[3]]
    new_step <- c(pi_new, p_new, q_new)
    # Stop iterating once every estimate changes by less than the tolerance
    if( all(abs(cur - new_step) < tol) ){ flag <- 1; break }
    # Otherwise continue iterating
    pi_cur <- pi_new; p_cur <- p_new; q_cur <- q_new
  }
  if(!flag) warning("Did not converge\n")
  list(pi_cur, p_cur, q_cur)
}
## Calculate the information matrix (complete-data information):
## Q separates in pi, p, and q, so the off-diagonal second derivatives are zero
Info.Mat.function <- function(obs, pi.est, p.est, q.est){
  estep.est <- estep(obs, pi.est, p.est, q.est)
  info.mat <- matrix(0, nrow = 3, ncol = 3)
  info.mat[1,1] <- -sum(estep.est) / (pi.est^2) - sum(1 - estep.est) / ((1 - pi.est)^2)
  info.mat[2,2] <- -sum(estep.est * obs) / (p.est^2) - sum(estep.est * (1 - obs)) / ((1 - p.est)^2)
  info.mat[3,3] <- -sum((1 - estep.est) * obs) / (q.est^2) - sum((1 - estep.est) * (1 - obs)) / ((1 - q.est)^2)
  return(-info.mat)
}
## Generate sample data
n <- 5000
pi_true <- 0.90   # probability of using the first coin
p_true <- 0.60    # the first coin has P(heads) = 0.60
q_true <- 0.50    # the second coin has P(heads) = 0.50
true <- c(pi_true, p_true, q_true)
# Draw a flip from each coin, then select elementwise according to which coin was used
u <- ifelse(runif(n) < pi_true, rbinom(n, 1, p_true), rbinom(n, 1, q_true))
## Set initial parameter estimates
pi_init <- 0.70; p_init <- 0.70; q_init <- 0.60
## Run EM Algorithm
output <- em.algo(u, pi_init, p_init, q_init)
## Calculate Confidence Intervals
sd.out <- sqrt(diag(solve(Info.Mat.function(u, output[[1]], output[[2]], output[[3]]))))
data.frame("Truth" = true,
           "EM Estimate" = unlist(output),
           "Lower CI" = unlist(output) - qnorm(.975) * sd.out,
           "Upper CI" = unlist(output) + qnorm(.975) * sd.out)
Example output:
Truth EM.Estimate Lower.CI Upper.CI
1 0.9 0.8941314 0.8856034 0.9026594
2 0.6 0.6081113 0.5938014 0.6224211
3 0.5 0.4483729 0.4060064 0.4907393
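As a further sanity check, we can evaluate the observed-data log-likelihood directly; EM guarantees it never decreases across iterations, so it should be larger at the EM estimates than at the starting values. A minimal sketch using a small helper:
## Observed-data log-likelihood of the two-coin mixture
loglik <- function(obs, pi, p, q){
  sum(log(pi * dbinom(obs, 1, p) + (1 - pi) * dbinom(obs, 1, q)))
}
loglik(u, pi_init, p_init, q_init)                # at the starting values
loglik(u, output[[1]], output[[2]], output[[3]])  # at the EM estimates (should be larger)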