Thursday, January 9, 2014

Easy introduction to meta-analyses in R

My incredibly intelligent younger sister is in the middle of her third year of a PhD program in clinical psychology.  While I have always considered myself to be logical, analytical and drawn to the more science and math-oriented topics, she has always been more artistic, intuitive and definitely the writer in the family.  Part of her curriculum has included several statistics courses (which she aced!) so you can imagine the statistician in me is beaming with pride!

Recently, she started searching for potential thesis topics and mentioned that meta-analyses were particularly interesting to her.  To help her, and hopefully other non-statisticians who are interested in performing a meta-analysis, I have put together a short tutorial on running a simple meta-analysis in R.

Performing a search for the words 'meta-analysis' or 'meta analysis' on CRAN and Bioconductor currently yields 34 and 6 available R packages, respectively. Some are meant for a specific type of data (e.g. genomic data such as microarrays), but in general the majority of these R packages are meant for combining summary statistics of discrete or continuous data extracted from a set of carefully selected published studies.  

Before starting a meta-analysis, there are many important questions to be answered such as
  1. How to pick which studies to include in the meta-analysis? What are possible biases in selecting the studies?
  2. What effect are you interested in measuring? What data needs to be extracted from the papers? 
  3. What type of meta-analysis should be performed? 
  4. What software tools/packages are available to perform a meta-analysis? 
  5. How do I interpret the results from the output of the software tool used? How do I know if the meta-analysis yielded anything statistically valid and significant?
I want to preface this tutorial with the statement: the first two questions are extremely important and should be answered before starting any meta-analysis.  Because an entire course could focus on meta-analyses, I've limited the focus of this tutorial to discussing the last three questions: (1) basic types of meta-analyses, (2) statistical tools/packages available to perform the meta-analysis and (3) interpreting the results.   

Generally speaking there are four types of meta-analyses: 

  • univariate meta-analysis
    • n studies comparing two treatments (e.g. case/control) with a dichotomous outcome
  • multivariate meta-analysis
    • n studies comparing two treatments with multiple outcomes
  • meta-regression
    • n studies comparing two treatments with a dichotomous outcome but can investigate the impact of additional "moderator" or explanatory variables (e.g. year of study) on the outcome
  • network meta-analysis (also known as multiple treatment meta-analysis)
    • n studies comparing multiple treatments with a dichotomous outcome
All of these types of meta-analyses can easily be run in R with freely available packages such as meta, mvmeta, mvtmeta, metafor, rmeta and gemtc.

For example, here is a brief summary of the meta R package 
  • Description: Simple package to estimate fixed-effects and random-effects models for binary and continuous data in a univariate meta-analysis. Meta-regression is also available. 
  • Documentation: http://cran.r-project.org/web/packages/meta/index.html
  • Useful Notes: Use metabin() for binary data and metacont() for continuous data.  With continuous data you can estimate a mean difference; with binary data you can estimate a risk ratio, odds ratio, risk difference or arcsine difference, chosen with the "sm = " argument.  Try print(), summary(), forest(), funnel(), labbe() and metabias() for analyzing the results from the meta-analysis. Use metareg() for meta-regression.
Simulated data example
Consider a univariate meta-analysis with n = 10 studies comparing two treatments (drug A and drug B) and a dichotomous outcome (e.g. death, no death).  The goal is to estimate an overall odds ratio of death in the drug B group relative to the drug A group.

In the first study, 109 individuals were in the control group, which received drug A, and 107 individuals were in the case group, which received drug B.  Of the 109 individuals who received drug A, 14 died, compared with 52 of the 107 individuals who received drug B.  The odds ratio of death for study 1 (drug B relative to drug A) is 6.42 with a 95% confidence interval of (3.26, 12.63), which is statistically significant because the interval does not contain 1.  Below is a forest plot of all the studies used in the meta-analysis.
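As a quick check, the study 1 numbers quoted above can be reproduced by hand in base R. This is only a sketch: the variable names are my own, and it assumes the interval is the usual Wald interval on the log odds ratio scale.

```r
# Study 1 counts as given in the text
deaths_A <- 14; n_A <- 109   # drug A (control group)
deaths_B <- 52; n_B <- 107   # drug B (case group)

# Odds of death in each group: deaths / survivors
odds_A <- deaths_A / (n_A - deaths_A)
odds_B <- deaths_B / (n_B - deaths_B)

# Odds ratio of death, drug B relative to drug A
or <- odds_B / odds_A

# Standard error of log(OR) and a 95% Wald confidence interval
se_log_or <- sqrt(1 / deaths_A + 1 / (n_A - deaths_A) +
                  1 / deaths_B + 1 / (n_B - deaths_B))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(c(OR = or, lower = ci[1], upper = ci[2]), 2))
```

Rounded to two decimals this gives 6.42 with an interval of (3.26, 12.63), matching the values reported for study 1.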


Based on the simulated data, the pooled odds ratio of death using drug B relative to drug A is 8.15, which is statistically significant because the 95% confidence interval of (6.44, 10.30) using a fixed-effects model does not contain the value 1.
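For readers who want to try this themselves, here is a minimal sketch of how an analysis like this could be set up with metabin() from the meta package. The seed, group sizes and death probabilities below are my own assumptions for illustration, not the data behind the numbers above, so the pooled odds ratio will not match 8.15.

```r
library(meta)  # install.packages("meta") if it is not installed yet

set.seed(1)    # arbitrary seed, only so the sketch is reproducible
n_studies <- 10

# Hypothetical group sizes for each study (assumption)
dat <- data.frame(
  study = paste("Study", seq_len(n_studies)),
  n_A   = sample(80:120, n_studies, replace = TRUE),
  n_B   = sample(80:120, n_studies, replace = TRUE)
)

# Hypothetical death probabilities: 15% on drug A, 55% on drug B (assumption)
dat$deaths_A <- rbinom(n_studies, dat$n_A, 0.15)
dat$deaths_B <- rbinom(n_studies, dat$n_B, 0.55)

# Drug B goes in the "experimental" slot and drug A in the "control" slot,
# so the odds ratio is for death on drug B relative to drug A
m <- metabin(event.e = deaths_B, n.e = n_B,
             event.c = deaths_A, n.c = n_A,
             studlab = study, data = dat, sm = "OR")

print(summary(m))  # pooled estimates and heterogeneity statistics
forest(m)          # forest plot of the study-level and pooled odds ratios
```

Swapping the two groups in the call would give the reciprocal odds ratio (drug A relative to drug B), so it is worth double-checking which group is which before interpreting the output.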

For a further discussion of the differences between a fixed-effects and a random-effects model, Wikipedia has a fairly easy to understand description.  The main thing to understand is that if your studies are considered to be "heterogeneous", then you will need to use a random-effects model.  Otherwise you can use a fixed-effects model.  The usual ways to assess heterogeneity are the Cochran Q test and the $I^2$ statistic, which describes the percentage of the variation across studies that is due to heterogeneity rather than chance.  The forest plot reports a heterogeneity test in which the null hypothesis is that there is no study heterogeneity, in which case the fixed-effects model is appropriate.  Because the p-value (p = 0.6586) was greater than a significance level of $\alpha$ = 0.05, we fail to reject the null hypothesis and use the fixed-effects model for the meta-analysis.
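To make the Cochran Q test and $I^2$ a little more concrete, here is a rough base R sketch using the standard inverse-variance formulas. The function name heterogeneity() and the example log odds ratios and standard errors are hypothetical, made up purely for illustration; in practice the meta package reports these quantities for you.

```r
# Cochran's Q and I^2 from study-level effects (e.g. log odds ratios)
# yi  = effect estimate from each study
# sei = standard error of each estimate
heterogeneity <- function(yi, sei) {
  w      <- 1 / sei^2                    # inverse-variance weights
  pooled <- sum(w * yi) / sum(w)         # fixed-effects pooled estimate
  Q      <- sum(w * (yi - pooled)^2)     # Cochran's Q statistic
  df     <- length(yi) - 1
  list(
    pooled  = pooled,
    Q       = Q,
    df      = df,
    p_value = pchisq(Q, df, lower.tail = FALSE),  # test of no heterogeneity
    I2      = max(0, (Q - df) / Q)                # proportion, truncated at 0
  )
}

# Hypothetical log odds ratios and standard errors from five studies
log_or <- c(1.9, 2.2, 2.0, 2.3, 1.8)
se     <- c(0.35, 0.40, 0.30, 0.45, 0.38)

h <- heterogeneity(log_or, se)
print(h)
```

A large p-value here would mean no evidence against the null hypothesis of no heterogeneity, and multiplying I2 by 100 gives the percentage usually shown on a forest plot.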

For an overview of the R packages CRAN has to offer, a Task View dedicated specifically to meta-analyses is available. Another good resource is from the 2013 UseR conference.  
Note: there are also many other software tools available outside of R which may be of interest: MetaEasy in Excel and similar functions in Stata, SPSS and SAS.  
