Systematic Reviews


What are systematic reviews?

A systematic review is a review of the evidence on a clearly formulated question. It uses systematic and explicit methods to identify, select and critically appraise relevant primary research, and to extract and analyse data from the studies included in the review. Systematic reviews differ from narrative reviews, in which work is described but not systematically identified, assessed for quality and synthesised. Statistical methods (meta-analysis) may be used to combine the results of the studies into a summary measure. Systematic reviews are commonly used in developing clinical guidelines, so you may find that some include recommendations with Roman numerals attached, indicating the level of evidence on which each recommendation is based.

A systematic review is not the same as a meta-analysis. Quite often a subject area is new and developing, or lacks funding, so the evidence contains no experimental trials and a meta-analysis is difficult to conduct. Alternatively, the study results may be so heterogeneous that a meta-analysis is not possible. In such cases the results of a systematic review can be reported qualitatively. The key is that the review is conducted systematically and is therefore replicable (i.e. if someone else conducted the review using your protocol, they should arrive at the same results as you).

Aims of Systematic Reviews

  1. They seek to minimise bias by using a replicable, scientific and transparent approach.
  2. They seek to summarise results of otherwise unmanageable quantities of research. They also combine studies, which gives more statistical power.
  3. They do not reflect the views of 'experts', as they generate balanced inferences based on a collation and analysis of the available evidence.
  4. They establish whether scientific findings are consistent and can be generalised across populations, or whether findings vary significantly by particular subsets.


Advantages of Systematic Reviews

  1. Unbiased review of existing evidence.
  2. Pooling data increases precision. By combining data, SRs improve the ability to study the consistency of results. Individual studies are sometimes too few or too small to detect important effects; combining them improves statistical power, so these effects can be detected.
  3. Combining studies lets you see whether effects are similar across settings and designs, which provides evidence of the robustness of the results and their transferability to other settings. Where there is variation, you can examine the reasons for it.
  4. May highlight key design issues, especially power and comparator.
  5. If you are using a systematic review to establish the need for an RCT, it may show that no further research is needed and so reduce wasted resources. A systematic review is an essential step before designing an RCT.


Disadvantages of Systematic Reviews

  1. Methodologically, the increased power means that small biases can produce an apparent effect.
  2. From a user's point of view, SRs sometimes tell you only whether interventions are effective and the size of the effect. This is not always useful, especially when you are deciding which programme to commission and whether it would be applicable to your population. For example, evidence on violence prevention programmes in schools is mostly from the USA, and as a public health commissioner I am interested in which programme I should use and whether it is applicable to my population.
  3. Sometimes there isn't enough higher level evidence to conduct a meta-analysis and you end up doing a qualitative SR, i.e. using thematic analysis or summarising findings instead of a meta-analysis.

Stages of a Systematic Review

  1. Identify the need for the review
  2. Develop a protocol & formulate review questions
  3. Conduct searches (i.e. find the relevant titles, abstracts and papers)
  4. Select studies according to selection criteria
  5. Assess study quality and bias
  6. Extract data & conduct data synthesis
  7. See if answers are applicable to your review questions
  8. Write report and disseminate findings

Appraising a Systematic Review

  1. Is the topic well defined?
  2. Was the search for papers thorough?
  3. Were the criteria for inclusion of studies clearly described and fairly applied?
  4. Was study quality assessed by blinded or independent reviewers?
  5. Was missing information sought from the original study investigators?
  6. Do the included studies seem to indicate similar effects?
  7. Were the overall findings assessed for their robustness? (Think bias, chance, confounding, real effects)
  8. Are the recommendations based firmly on the quality of evidence presented?

Types of Systematic Reviews

Systematic reviews of randomised controlled trials

  • These reviews seek to evaluate effectiveness.
  • A drawback is that individual studies may lack power.

Systematic reviews of analytical (observational) studies

  • These reviews seek to evaluate risks (causes)
  • A common problem encountered when conducting this type of review is bias.

Systematic reviews of qualitative studies

  • Also known as Qualitative Evidence Synthesis (QES), this provides a vital supplement or extension to intervention reviews, by exploring why an intervention works (or not), in which populations and in what circumstances.  See for more details.

Question Specific Methods

Sometimes you will hear people say "This will be an intervention type of SR" or "we're going to use a diagnostic SR to find the answers for this question".  This is linked to the above section and graph.  People are basically saying that depending upon which type of question you're asking, you'll use slightly different approaches.  Common question specific methods are:

Interventions

  • What are the effects of X?
  • Ideal study type - RCTs (and SRs of RCTs)

Frequency/rate (burden of illness)

  • How common is a particular condition or disease in a specified group?
  • Ideal study type - cross-sectional studies with a standardised measurement in a representative sample of people.  For a rate, the sample would need to be followed over time. 
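The difference between a frequency and a rate can be sketched in a few lines of Python. This is only an illustration: the function names and the figures are hypothetical and do not come from any particular study.

```python
def prevalence(cases, sample_size):
    """Proportion of a representative sample with the condition at one point in time."""
    return cases / sample_size


def incidence_rate(new_cases, person_years):
    """New cases per person-year; needs a sample that is followed over time."""
    return new_cases / person_years


# Hypothetical figures, for illustration only
point_prevalence = prevalence(cases=30, sample_size=1200)
rate = incidence_rate(new_cases=12, person_years=4000.0)
print(f"Prevalence: {point_prevalence:.1%}")
print(f"Incidence rate: {rate * 1000:.1f} per 1,000 person-years")
```

The prevalence needs only a single cross-sectional measurement, whereas the rate has follow-up time in its denominator.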

Diagnostic Accuracy

  • How accurate is X test in predicting the true diagnosis category of a patient?
  • Ideal study type - cross-sectional in which the results of tests  on consecutively attending patients are cross-classified against disease status determined by a reference (gold) standard.
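Cross-classifying test results against the reference standard gives a 2x2 table, from which the usual accuracy measures follow. The sketch below assumes the four cell counts are already known; the function name and the counts are hypothetical.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Summarise a 2x2 table of index test result against reference standard.

    tp: test positive, disease present    fp: test positive, disease absent
    fn: test negative, disease present    tn: test negative, disease absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly detected
        "specificity": tn / (tn + fp),  # non-diseased patients correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, for illustration only
for measure, value in diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170).items():
    print(f"{measure}: {value:.2f}")
```

Sensitivity and specificity describe the test itself, while the predictive values also depend on how common the disease is in the patients studied.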

Aetiology and risk factors

  • Are there known risk factors that increase the risk of X disease or X outcome?  A clear association between the factor and the disease needs to be first established before doing this SR.
  • Ideal study type - cohort studies 


Prediction and prognosis

  • Can the risk for a patient be predicted? Or, based on one or more risk factors, what is the level of risk of a particular outcome for the person?
  • Ideal study type - cohort studies

Economic Evaluation

  • Can be done from SRs of intervention studies
  • Looks at cost analysis, cost-effectiveness analyses, cost-utility analyses and cost-benefit analyses.  


Meta-analysis

A meta-analysis is the analytical or statistical part of a systematic review and is used to combine and summarise results quantitatively. Combining data increases power, which enables small differences to be seen and gives a more precise estimate of a treatment effect. The precision with which the size of any effect can be estimated depends upon the number of patients studied. So combining trials leads to (1) more patients and more power to detect small but clinically significant effects and (2) more precise estimates of the size of effects. Meta-analysis differs from data-pooling in that each trial is weighted before the results are combined. The types of studies used in meta-analysis are randomised controlled trials, analytical (observational) studies or a single multi-centre observational study. Estimates, confidence intervals and/or standard errors are usually reported.

There are statistical programmes that can help you to do the meta-analysis, such as RevMan (a Cochrane programme that is free to download from their website).
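To make the idea of weighting concrete, here is a minimal Python sketch of one common scheme, inverse-variance weighting, in which each study is weighted by 1/SE². The text above does not say which weighting scheme is meant, so this choice is an assumption; the function name and the figures are hypothetical, and the estimates are taken to be on a scale where they are roughly normally distributed (e.g. log odds ratios).

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted average of study estimates with a 95% CI."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical log odds ratios and standard errors from three trials
pooled, pooled_se, ci = fixed_effect_pool([-0.5, -0.2, -0.35], [0.25, 0.10, 0.20])
print(f"Pooled estimate {pooled:.3f}, SE {pooled_se:.3f}, "
      f"95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

The pooled standard error is smaller than that of any single trial, which is the gain in precision described above. Larger, more precise trials also contribute more to the combined estimate.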

Forest Plot

This is a visual representation of the contribution of each study, displaying estimates (symbols) and confidence intervals (shown by the length of the lines). It also gives a visual representation of the combined estimate and of the heterogeneity of the studies.
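The layout can be illustrated with a rough text-only sketch in Python. It is not how RevMan or any other package draws the plot: the function name, the axis limits and the studies are hypothetical, and a 95% confidence interval of estimate ± 1.96 × standard error is assumed.

```python
def text_forest_plot(studies, low=-1.0, high=1.0, width=41):
    """Return one text line per study.

    '-' spans the 95% CI, 'o' marks the estimate and '|' marks no effect (0).
    studies: list of (name, estimate, standard_error) tuples.
    """
    def column(x):
        x = min(max(x, low), high)  # clip values that fall outside the axis
        return round((x - low) / (high - low) * (width - 1))

    lines = []
    for name, estimate, se in studies:
        row = [" "] * width
        for c in range(column(estimate - 1.96 * se), column(estimate + 1.96 * se) + 1):
            row[c] = "-"
        row[column(0.0)] = "|"
        row[column(estimate)] = "o"
        lines.append(f"{name:<10}{''.join(row)}")
    return lines


# Hypothetical studies: (name, estimate, standard error)
for line in text_forest_plot([("Trial A", 0.5, 0.1), ("Trial B", -0.2, 0.3)]):
    print(line)
```

A combined estimate can be shown by passing it in as one more row. Studies whose lines overlap little with the others are a visual hint of heterogeneity.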


Heterogeneity

Heterogeneity is variation between the estimates over and above the natural sampling variation. A test for heterogeneity examines the null hypothesis that all studies are evaluating the same effect. It is performed at the same time as the estimates are combined (usually given as a chi-squared statistic). The overall estimate and confidence interval can take heterogeneity into account, as in a random effects meta-analysis. A random effects model assumes that, in addition to the presence of random error, differences between studies can also result from real differences between the study populations and procedures. Fixed effects meta-analysis, on the other hand, assumes that all studies are random samples of one large common study and that differences between study outcomes result only from random error. Pooling is then simple, as it consists of calculating a weighted average of the individual study results. It is worth noting that there is no agreement amongst statisticians on whether fixed or random effects meta-analysis is better. Reasons for heterogeneity should be investigated.

When different studies have largely different results, this can be due to random error or to heterogeneity. To test homogeneity, use a chi-square test, or Fisher's exact test for small studies. The power of the test tends to be low, but it offers guidance. A basic but informative method is to produce a graph in which the individual outcomes are plotted together with their 95% confidence intervals (CI).

Cochran's Q can be used to test heterogeneity. It is interpreted together with its degrees of freedom (df). If Cochran's Q is statistically significant, there is heterogeneity and it must be explored. If it is not statistically significant but the ratio Q/df is greater than 1, heterogeneity is possible and should still be explored. If Cochran's Q is not statistically significant and Q/df is less than 1, heterogeneity is unlikely.
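The calculation behind Cochran's Q, and one way of carrying heterogeneity into a random effects estimate, can be sketched as follows. The DerSimonian-Laird estimate of the between-study variance is used here as an assumption (it is a common choice, but the text above does not name a method), and the function names and figures are hypothetical.

```python
import math


def cochran_q(estimates, standard_errors):
    """Cochran's Q and its degrees of freedom (number of studies minus one)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    return q, len(estimates) - 1


def random_effects_pool(estimates, standard_errors):
    """Random effects pooled estimate using the DerSimonian-Laird tau squared."""
    q, df = cochran_q(estimates, standard_errors)
    weights = [1.0 / se ** 2 for se in standard_errors]
    scale = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / scale)  # between-study variance, never negative
    re_weights = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2


# Hypothetical log odds ratios that disagree with one another
estimates, ses = [-0.8, 0.1, -0.3], [0.25, 0.10, 0.20]
q, df = cochran_q(estimates, ses)
pooled, pooled_se, tau2 = random_effects_pool(estimates, ses)
print(f"Q = {q:.2f} on {df} df (Q/df = {q / df:.2f})")
print(f"Random effects estimate {pooled:.3f}, SE {pooled_se:.3f}, tau^2 {tau2:.3f}")
```

Under the null hypothesis of no heterogeneity, Q is compared with a chi-squared distribution on df degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(q, df)` gives the p-value. When Q is no larger than df, the estimated between-study variance is zero and the random effects result equals the fixed effects one.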

Publication Bias

Research with statistically significant results is potentially more likely to be submitted, published or published more rapidly than work with non-significant results. Positive results are more likely to be published than negative results, and false positive results are more likely to be published than false negative ones. Meta-analysis could, therefore, lead to an incorrect conclusion. Publication bias can be difficult to detect, so try to avoid it by improving your searches. You can also use a funnel plot to check for publication bias. Also check the appendix of the Cochrane Handbook for trial registries.

Funnel Plot

This plot displays effect size against sample size. The plot will be funnel shaped if all studies that estimate the same quantity have been identified. It will be asymmetrical if trials are missing, usually smaller studies showing no effect, so asymmetry can indicate publication bias. You can then estimate the number of missing studies that would change the conclusion. Asymmetry can also be due to the tendency for smaller studies to show larger treatment effects (they tend to have less rigorous methodology). Relative risks and odds ratios are plotted on a logarithmic scale so that effects of the same magnitude but in opposite directions are equidistant from 1.0 (e.g. 2 and 0.5). They are plotted against precision (1/standard error), which emphasises differences between larger studies.
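The coordinates of each point on such a plot can be sketched in Python. The example assumes the effect measure is a relative risk calculated from event counts in two arms, and uses the standard large-sample formula for the standard error of a log relative risk; the function name and the counts are hypothetical.

```python
import math


def funnel_point(events_treated, n_treated, events_control, n_control):
    """Return (log relative risk, precision) for one study, where precision = 1/SE."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    return math.log(rr), 1 / se_log_rr


# Hypothetical studies: relative risks of 2 and 0.5, plus a study ten times larger
harmful = funnel_point(20, 100, 10, 100)
protective = funnel_point(10, 100, 20, 100)
large = funnel_point(200, 1000, 100, 1000)
for label, (x, y) in [("RR 2", harmful), ("RR 0.5", protective), ("RR 2, large", large)]:
    print(f"{label}: log RR = {x:+.3f}, precision = {y:.2f}")
```

On the log scale, relative risks of 2 and 0.5 sit at equal distances either side of zero (which corresponds to a relative risk of 1.0), and the larger study sits higher up the plot because its standard error is smaller.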

Sensitivity Analysis

This looks at how the results of the review are affected by the quality of the included studies (bias, error, reporting and power). It examines the results in relation to decisions made in the systematic review, e.g. the inclusion/exclusion criteria for studies, the impact of each study and the choice between random and fixed effects.
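One simple way to examine the impact of each study is to leave each one out in turn and recalculate the combined estimate. The sketch below assumes a fixed effects, inverse-variance pooled estimate, which is only one of several possible choices; the function names and figures are hypothetical.

```python
def pooled_estimate(estimates, standard_errors):
    """Fixed effects, inverse-variance weighted average."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def leave_one_out(studies):
    """Pooled estimate with each study omitted in turn.

    studies: list of (name, estimate, standard_error) tuples.
    Returns a dict mapping the omitted study's name to the pooled estimate.
    """
    results = {}
    for i, (name, _, _) in enumerate(studies):
        remaining = studies[:i] + studies[i + 1:]
        results[name] = pooled_estimate(
            [est for _, est, _ in remaining],
            [se for _, _, se in remaining],
        )
    return results


# Hypothetical log odds ratios and standard errors
studies = [("Trial A", -0.5, 0.25), ("Trial B", -0.2, 0.10), ("Trial C", -0.35, 0.20)]
for omitted, estimate in leave_one_out(studies).items():
    print(f"Without {omitted}: pooled estimate {estimate:.3f}")
```

If the combined estimate changes markedly when one study is removed, the conclusions depend heavily on that study and its quality deserves particular scrutiny. The same approach can be used to repeat the analysis under different inclusion criteria.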

Note on Hierarchy of Evidence

While randomised controlled trials top the ranks of evidence, it may not always be appropriate to restrict systematic reviews to RCTs alone. For example, for studying risk factors, cohort studies are more applicable than RCTs. Some interventions, such as defibrillation for ventricular fibrillation, have an impact so large that observational data are sufficient to show it. Rare or infrequent adverse outcomes would only be detected by RCTs so large that they are rarely conducted. Thus, observational methods such as post-marketing surveillance of medicines are the only alternative. Sometimes, observational data provide a realistic means of assessing the long-term outcome of interventions beyond the timescale of trials.

Tips on Doing a Systematic Review

  1. Don't underestimate how long a systematic review will take. Typically, a review will take between 3 and 9 months (average 6 months), depending on how easy it is to get hold of the articles for quality assessment.
  2. Don't restrict your searches to just Medline, EMBASE and the other 'big name' databases. A 1997 study found that Medline and EMBASE only covered 6,000 of 20,000 journals. Check what databases are available to you at your library and use them all. You should also include hand searches, snowballing (i.e. using the list of references at the end of an article to generate more literature) and talk to people in the field as this will help you to locate grey literature.
  3. Spend time on your timeline (the date range of your searches). Look at the subject at hand, as you may have to go back into 20th century publications, depending on your questions and the gold standards that you may be using as comparators, which could have been established years ago. Equally, be aware that technology advances, so sometimes you have to restrict yourself to the last 10 or 20 years.
  4. Be thorough! This is a systematic review of all the evidence on a certain topic. You need to take great care in finding all relevant studies (published and unpublished), assess each study, synthesise the findings from individual studies in an unbiased way and present a balanced and impartial summary. Make use of snowballing (i.e. looking through the references of articles you have sourced) and hand searches.
  5. Spend time getting the right question. Keep asking yourself: why is this important to answer? How are you going to use the results? What outcomes do you want? Think population, intervention, comparative intervention and outcomes.
  6. Spend time refining your review questions. These are what will drive your quality assessment and help shape your data synthesis. They will also help flag up gaps in the literature.
  7. Make sure you establish a clear need for the review. When preparing a protocol, undertake a preliminary assessment of the literature. Have any reviews been done on this topic before? If so, see if you can build upon these reviews, for example, if a review was done 5 years ago, you could argue that it needs to be updated.

Further Reading

For doing a systematic review, it's hard to beat 'Undertaking Systematic Reviews of Research on Effectiveness: CRD's Guidance for those carrying out or commissioning reviews'. Download it at

On levels of evidence, see Grimes, D.A. & Schulz, K.F. (2002) "An overview of clinical research: the lay of the land." The Lancet, 359, 145-49.

To see systematic reviews in action, check out Cochrane Review.  They also have some useful online learning resources.  

Campbell Collaboration is another useful database of systematic reviews, especially on the social sciences. 

Glasziou P, Irwig L, Bain C & Colditz G. 2001. Systematic Reviews in Health Care: A Practical Guide. Cambridge University Press: Cambridge.