Tuesday, January 15, 2013

Easy Introduction to Markov Chains in R

Markov chains are an important concept in probability and many other areas of research.  In this post, I provide the basic Markov property and then a few examples including R code to give an example of how they work.  Most of the background can be found in one of my favorite books on this topic which is called Introduction to Probability Models by Sheldon Ross.

We define a stochastic process $\{X_n, n = 0,1,2, \ldots \}$ that takes on a finite or countable number of possible values.  Let the possible values be nonnegative integers (i.e. $X_n \in \mathbb{Z}_+ $).  If $X_n = i$ then the process is said to be in state $i$ at time $n$.

Now, we define the 'Markov property' in discrete time:
\[ P_{ij} = P[X_{n+1} = j | X_n = i_n, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0] = P[X_{n+1} = j | X_n = i_n] \hspace{.2in} \forall i, j \in \mathbb{Z}_+ \]
Such a stochastic process is known as Markov chain.  We call $P_{ij}$ a 1-step transition probability because we moved from time $n$ to time $n+1$.  It is a first-order Markov chain because the probability of being in state $j$ at the $(n+1)$ time point only depends on the state at the the $n$th time point.  Since proabilities are nonnegative and the process must make a transition into some state, we have these properties:
  1. $P_{ij} \geq 0, \hspace{.2in} \forall i,j \geq 0$
  2. $\sum_{j \geq 0} P_{ij} = 1, \hspace{.2in} i = 0, 1, 2, \ldots$
The $n$-step transition probability $P_{ij}^n$ is defined as
\[ P_{ij}^n = P[X_{n+k} = j | X_k = i] \hspace{.2in} \forall n \geq 0, \hspace{.2in} i,j \geq 0 \]
An important set of equations called the Chapman-Kolmogorov equations allow us to compute these $n$-step transition probabilities.  It states
\[ P_{ij}^{n+m} = \sum_{k=0}^{\infty} P_{ik}^n P_{kj}^m \hspace{.5in} \forall n, m \geq 0, \hspace{.2in} \forall i, j \geq 0\]
Let's try an example.  Consider two states: 0 = rain and 1 = no rain.  Define two probabilities $\alpha$ = the probability it will rain tomorrow given it rained today and $\beta$ = the probability it will rain tomorrow given it didn't rain today.  We are interested in asking what is the probability it will rain the day after tomorrow given it rained today?

First, we must set up the transition probability matrix. Translating $\alpha$ and $\beta$ to probabilities leads to $\alpha = P_{00} = P[X_{n+1} = 0 | X_n = 0]$ and $\beta = P_{01} = P[X_{n+1} = 1 | X_n = 0]$.  Because the rows must sum to 1, this leads to the transition probability matrix:
\[ P = \left[ \begin{array}{cc}
\alpha & 1-\alpha  \\
\beta & 1- \beta \\
\end{array} \right]  \]
If we assign numbers to $\alpha$ = 0.7 and $\beta$= 0.4, then we can compute the probability it will rain the day after tomorrow (aka 2 steps) given it rained today.
\[ P^2 = \left[ \begin{array}{cc}
\alpha & 1-\alpha  \\
\beta & 1- \beta \\
\end{array} \right]  \left[ \begin{array}{cc}
\alpha & 1-\alpha  \\
\beta & 1- \beta \\
\end{array} \right]  = \left[ \begin{array}{cc}
.61 & .39 \\
.52 & .48 \\
\end{array} \right]  \]
R code:
> trans.mat <- matrix(c(0.7, 0.3, 0.4, 0.6), 2,2, byrow = TRUE)
> trans.mat %*% trans.mat
     [,1] [,2]
[1,] 0.61 0.39
[2,] 0.52 0.48

These probabilities are conditional probabilities because they depend on what happened the previous day.  We can also ask: 'What is the unconditional probability of rain at time $n$? '   This can be easily found by computing the unconditional or marginal distribution of the state at time $n$:
\[  P[X_n = j] = \sum_{i = 0}^{\infty} P[X_n = j| X_0 = i] P[X_0 = i] = \sum_{i = 0}^{\infty} P_{ij}^n \alpha_i \]
where $\alpha_i = P[X_0 = i]$, $\forall i \geq 0$ and note $\sum_{i = 0}^{\infty} \alpha_i = 1$ because we must start in some state.

Back to the example. If we let $\alpha_0 = .4$ and $\alpha_1 = 0.6$ in our rain example, then the unconditional or marginal probability it will rain in two days is
\[ P[X_2 = 0] = 0.4 P_{00}^2 + 0.6 P_{10}^2 = 0.4 (0.61) + 0.6 (0.52) = 0.566 \]
> twostep.trans <- trans.mat %*% trans.mat 
> init.dist = matrix(c(0.4, 0.6), 1, 2)
> init.dist %*% twostep.trans
      [,1]  [,2]
[1,] 0.556 0.444

There are many more fun questions to ask, but for now I hope this was a basic introduction to discrete time Markov chains. Stay tuned for a blogpost about stationary distributions and second-order Markov chains!


  1. I already one first order Markov Chain model.

    I am trying to build a second-order Markov Chain model.

    How do evaluate the Markov Chain models? i.e. Should I choose the first order model or second order model? Is there any standard?

    Can you suggest any code I can do it? (preferable in R or Matlab)

    Thank you!

  2. For the Chapman Kolmogorov equations shouldn't the P from i state to k be in n-k steps, instead of k.

  3. Hi Andres, thanks for your comment! I have corrected the equation.