Recently, she has started searching for potential thesis topics and mentioned that meta-analyses were particularly interesting to her. To help her, and hopefully other non-statisticians who are interested in performing a meta-analysis, I have put together a short tutorial on running a simple meta-analysis in R.
Performing a search for the words 'meta-analysis' or 'meta analysis' on CRAN and Bioconductor currently yields 34 and 6 available R packages, respectively. Some are meant for a specific type of data (e.g. genomic data such as microarrays), but in general the majority of these R packages are meant for combining summary statistics of discrete or continuous data extracted from a set of carefully selected published studies.
Before starting a meta-analysis, there are many important questions to answer, such as:
- How to pick which studies to include in the meta-analysis? What are possible biases in selecting the studies?
- What effect are you interested in measuring? What data needs to be extracted from the papers?
- What type of meta-analysis should be performed?
- What software tools/packages are available to perform a meta-analysis?
- How do I interpret the results from the output of the software tool used? How do I know if the meta-analysis yielded anything statistically valid and significant?
Generally speaking, there are four types of meta-analyses:
- univariate meta-analysis
  - n studies comparing two treatments (e.g. case/control) with a dichotomous outcome
- multivariate meta-analysis
  - n studies comparing two treatments with multiple outcomes
- meta-regression
  - n studies comparing two treatments with a dichotomous outcome, where the impact of additional "moderator" or explanatory variables (e.g. year of study) on the outcome can also be investigated
- network meta-analysis (also known as multiple treatment meta-analysis)
  - n studies comparing multiple treatments with a dichotomous outcome
For example, here is a brief summary of the meta R package:
- Description: Simple package to estimate fixed-effects and random-effects models for binary and continuous data in a univariate meta-analysis. Meta-regression is also available.
- Documentation: http://cran.r-project.org/web/packages/meta/index.html
- Useful Notes: Use metabin() for binary data and metacont() for continuous data. With continuous data, you can estimate the mean difference; with binary data, you can estimate the risk ratio, odds ratio, risk difference or arcsine difference by setting the "sm = " argument. Try print(), summary(), forest(), funnel(), labbe() and metabias() to examine the results of the meta-analysis. Use metareg() for meta-regression.
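As a rough illustration of the functions listed above, here is a minimal sketch of a call to metabin(). The data frame `dat`, its column names, the seed and the event probabilities are all hypothetical and simulated here only for illustration; they are not the data behind the results discussed in this post. The call is guarded so that the script still runs if the meta package is not installed.

```r
# Hypothetical simulated data: 10 studies comparing drug A with drug B.
# All names and numbers below are assumptions made for illustration.
set.seed(1)
k <- 10
dat <- data.frame(
  study = paste("Study", 1:k),
  n.A   = sample(90:120, k, replace = TRUE),  # patients on drug A
  n.B   = sample(90:120, k, replace = TRUE)   # patients on drug B
)
dat$dead.A <- rbinom(k, size = dat$n.A, prob = 0.15)  # deaths on drug A
dat$dead.B <- rbinom(k, size = dat$n.B, prob = 0.50)  # deaths on drug B

if (requireNamespace("meta", quietly = TRUE)) {
  # Odds ratio of death for drug B ("experimental" arm)
  # relative to drug A ("control" arm)
  m <- meta::metabin(event.e = dead.B, n.e = n.B,
                     event.c = dead.A, n.c = n.A,
                     studlab = study, data = dat, sm = "OR")
  print(summary(m))
  # meta::forest(m)  # draws the forest plot
  # meta::funnel(m)  # draws the funnel plot
}
```

Changing `sm = "OR"` to `"RR"` or `"RD"` should switch the summary measure to the risk ratio or risk difference, respectively.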
Simulated data example:
Consider a univariate meta-analysis with n = 10 studies comparing two treatments (drug A and drug B) and a dichotomous outcome (e.g. death, no death). The goal is to estimate the overall odds ratio of death in the drug B group relative to the drug A group.
In the first study, 109 individuals were in the control group, which received drug A, and 107 individuals were in the case group, which received drug B. Of the 109 individuals who received drug A, 14 died, compared to 52 of the 107 individuals who received drug B. The odds ratio for study 1 is 6.42 with a 95% confidence interval of (3.26, 12.63), which is statistically significant. Below is a forest plot of all the studies used in the meta-analysis.
Based on the simulated data, the odds ratio of death using drug B relative to drug A is 8.15, which is statistically significant because the 95% confidence interval of (6.44, 10.30) from the fixed-effects model does not contain the value 1.
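The numbers for study 1 can be checked by hand in base R. This sketch uses the counts given above and the usual Wald interval on the log odds ratio; the variable names are my own.

```r
# Counts for study 1, as given in the text
dead.B <- 52; n.B <- 107   # drug B
dead.A <- 14; n.A <- 109   # drug A

# Odds of death in each group and their ratio (drug B relative to drug A)
odds.B <- dead.B / (n.B - dead.B)
odds.A <- dead.A / (n.A - dead.A)
or <- odds.B / odds.A

# Standard error of log(OR) and the 95% Wald confidence interval
se.log.or <- sqrt(1 / dead.B + 1 / (n.B - dead.B) +
                  1 / dead.A + 1 / (n.A - dead.A))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se.log.or)

print(round(c(OR = or, lower = ci[1], upper = ci[2]), 2))
# OR = 6.42, lower = 3.26, upper = 12.63
```

Because the interval excludes 1, the odds ratio for this single study is significant at the 5% level.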
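To show what a fixed-effects model does with the individual studies, here is a base R sketch of inverse-variance pooling of the log odds ratios. Two assumptions to note: the data are simulated here with a hypothetical seed and hypothetical event probabilities, so the result will not match the 8.15 reported above, and inverse-variance weighting is only one of several fixed-effects methods and may not be the one used to produce the forest plot.

```r
# Hypothetical simulated data: 10 studies (not the data used in the text)
set.seed(1)
k <- 10
n.A <- sample(90:120, k, replace = TRUE)
n.B <- sample(90:120, k, replace = TRUE)
dead.A <- rbinom(k, size = n.A, prob = 0.15)
dead.B <- rbinom(k, size = n.B, prob = 0.50)

# Per-study log odds ratio (drug B relative to drug A) and its variance
log.or <- log((dead.B / (n.B - dead.B)) / (dead.A / (n.A - dead.A)))
v <- 1 / dead.B + 1 / (n.B - dead.B) + 1 / dead.A + 1 / (n.A - dead.A)

# Inverse-variance weights: more precise studies count for more
w <- 1 / v
pooled <- sum(w * log.or) / sum(w)
se.pooled <- sqrt(1 / sum(w))

pooled.or <- exp(pooled)
ci <- exp(pooled + c(-1, 1) * qnorm(0.975) * se.pooled)

# Significant at the 5% level if the interval excludes 1
significant <- ci[1] > 1 || ci[2] < 1
print(round(c(OR = pooled.or, lower = ci[1], upper = ci[2]), 2))
```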
For a further discussion of the differences between a fixed-effects and a random-effects model, Wikipedia has a fairly easy-to-understand description. The main thing to understand is that if your studies are considered to be "heterogeneous", a random-effects model is generally the appropriate choice; otherwise, a fixed-effects model can be used. Heterogeneity is commonly tested with Cochran's Q test and quantified with the $I^2$ statistic, which estimates the percentage of the variation across studies that is due to heterogeneity rather than chance. The forest plot reports the result of the heterogeneity test, in which the null hypothesis is that there is no heterogeneity between studies. Because the p-value (p = 0.6586) is greater than the significance level of $\alpha = 0.05$, we fail to reject the null hypothesis and use the fixed-effects model for the meta-analysis.
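The heterogeneity statistics can also be computed by hand. The sketch below uses hypothetical simulated data (so the p-value will differ from the 0.6586 quoted above) to compute Cochran's Q, its p-value and $I^2$. It then adds the DerSimonian-Laird estimate of the between-study variance, which is one common way, though not the only one, to fit a random-effects model.

```r
# Hypothetical simulated data: 10 studies (not the data used in the text)
set.seed(1)
k <- 10
n.A <- sample(90:120, k, replace = TRUE)
n.B <- sample(90:120, k, replace = TRUE)
dead.A <- rbinom(k, size = n.A, prob = 0.15)
dead.B <- rbinom(k, size = n.B, prob = 0.50)

log.or <- log((dead.B / (n.B - dead.B)) / (dead.A / (n.A - dead.A)))
v <- 1 / dead.B + 1 / (n.B - dead.B) + 1 / dead.A + 1 / (n.A - dead.A)

# Fixed-effects (inverse-variance) pooled log odds ratio
w <- 1 / v
fixed <- sum(w * log.or) / sum(w)

# Cochran's Q: weighted squared deviations from the pooled estimate,
# compared with a chi-squared distribution on k - 1 degrees of freedom
Q <- sum(w * (log.or - fixed)^2)
df <- k - 1
p.value <- pchisq(Q, df = df, lower.tail = FALSE)

# I^2: percentage of variation attributed to heterogeneity
I2 <- max(0, (Q - df) / Q) * 100

# DerSimonian-Laird between-study variance and random-effects estimate
tau2 <- max(0, (Q - df) / (sum(w) - sum(w^2) / sum(w)))
w.re <- 1 / (v + tau2)
random <- sum(w.re * log.or) / sum(w.re)

print(round(c(Q = Q, p = p.value, I2 = I2, tau2 = tau2,
              OR.fixed = exp(fixed), OR.random = exp(random)), 4))
```

When the estimated between-study variance `tau2` is zero, the random-effects weights reduce to the fixed-effects weights and the two pooled estimates coincide.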
For an overview of the R packages CRAN has to offer, a Task View dedicated specifically to meta-analyses is available. Another good resource is from the 2013 UseR conference.
Note: there are also many other software tools available outside of R which may be of interest: MetaEasy in Excel and similar functions in Stata, SPSS and SAS.