Systematic Reviews


What are systematic reviews?

A systematic review is a review of the evidence on a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant primary research, and to extract and analyse data from the studies that are included in the review. In other words, systematic reviews use explicit and rigorous methods to identify, critically appraise, include and synthesise relevant research studies. Systematic reviews differ from narrative reviews, in which work is described but not systematically identified, assessed for quality and synthesised. Statistical methods (meta-analysis) may be used to combine the results of the studies into a summary measure. Systematic reviews are commonly used as part of developing clinical guidelines, so you may find that some systematic reviews include recommendations with Roman numerals attached, indicating the level of evidence on which the recommendation is based.

A systematic review is not the same thing as a meta-analysis. Quite often a subject area is new and developing, or lacks funding, so the evidence does not contain experimental trials, which makes a meta-analysis difficult to conduct. Alternatively, the study results can be so heterogeneous that a meta-analysis is not possible. Results of systematic reviews can be reported qualitatively. The key is that the review is conducted systematically and is therefore replicable (i.e. if someone else conducted the review using your protocol, they should come up with the same results as you).

Aims of Systematic Reviews

  1. They seek to minimise bias by using a replicable, scientific and transparent approach.
  2. They seek to summarise results of otherwise unmanageable quantities of research. They also combine studies, which gives more statistical power.
  3. They do not reflect the views of 'experts' as they generate balanced inferences based on a collation and analysis of the available evidence.
  4. They establish whether scientific findings are consistent and can be generalised across populations or whether findings vary significantly by particular subsets.


Advantages of Systematic Reviews

  1. Unbiased review of existing evidence.
  2. Pooling data increases precision. By combining data, SRs improve the ability to study the consistency of results. Sometimes studies are too few or too small to detect important effects, so combining studies improves the statistical power and makes it possible to detect these effects.
  3. By combining studies, you can see whether effects are similar across settings and designs, which provides evidence of the robustness and transferability of the results to other settings. Where there is variation, you will be able to examine the reasons for it.
  4. May highlight key design issues, especially power and comparator.
  5. If you are using a systematic review to research the need for an RCT, it may show that no further research is needed and therefore reduce resource wastage. It is essential to do one before designing an RCT.


Disadvantages of Systematic Reviews

  1. Methods-wise, improving the power can allow small biases to result in an apparent effect.
  2. From a user point of view, SRs sometimes only tell you whether interventions are effective and the size of the effect. This is not always that useful, especially when you need to decide which programme to commission and whether it would be applicable to your population. For example, evidence on violence prevention programmes in schools is mostly from the USA, and as a public health commissioner I am interested in which programme I should use and whether it is applicable to my population.
  3. Sometimes there isn't enough higher-level evidence to conduct a meta-analysis and you end up doing a qualitative SR, i.e. using thematic analysis or summarising findings instead of a meta-analysis.

Stages of a Systematic Review

  1. Identify the need for the review
  2. Develop a protocol & formulate review questions
  3. Conduct searches (i.e. find the relevant titles, abstracts and papers)
  4. Select studies according to selection criteria
  5. Assess study quality and bias
  6. Extract data & conduct data synthesis
  7. See if answers are applicable to your review questions
  8. Write report and disseminate findings

Pointers on Doing the Searches

Have a clear search strategy, listing all the search terms and combinations you will use (remember the idea is that if someone else undertook the same review using your protocol, they should get the same results). List the resources you will look at, and please don't restrict yourself to just one or two databases. Be mindful too of the time frame you set. Sometimes you will have to go back in time, but if you're looking at updates since a published systematic review or at recent interventions, you may only want to look at the last five to ten years. If you restrict your time frame, state why in your protocol. Don't forget the grey literature (i.e. papers at conferences, unpublished data etc). Internet searches, 'snowballing' and hand searches of journals will help locate any unpublished studies or other grey literature. You will also need to consider whether you will restrict languages, or restrict your studies to relevant countries.
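One way to keep the search terms and combinations explicit is to hold them in a small script that builds the search string for you. The sketch below is only an illustration: the concept names and terms are hypothetical, and the Boolean syntax (OR within a concept, AND between concepts, quoted phrases, * for truncation) is an assumption that you would need to adapt to each database.

```python
# Hypothetical concept groups for illustration; replace with your own terms.
concepts = {
    "population": ["child*", "preschool*", "infant*"],
    "intervention": ["physical activity", "exercise", "active play"],
}


def build_query(concepts):
    """OR the synonyms within each concept, then AND the concepts together."""
    groups = []
    for terms in concepts.values():
        # Quote multi-word phrases so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = build_query(concepts)
print(query)
```

Pasting the generated string into your protocol alongside the database name and search date makes it easier for someone else to rerun the same search.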

Pointers on Checking Quality of Studies

After scanning the titles and abstracts of citations identified from the electronic database searches, you will need to assess them for eligibility. (Hint: don't forget to record the number of hits you got when you scanned titles and then abstracts, as you will need these for writing up your methodology.) Obtain full copies of those studies that definitely, or possibly, meet the pre-defined inclusion criteria. Studies that fulfil the inclusion criteria should be retrieved for more detailed evaluation, and their quality will need to be examined on the basis of methodology, bias, and internal and external validity. Methodological quality will then need to be assessed. I have in the past used the system suggested by the Oxford Centre for Evidence-based Medicine. I have sometimes modified this but also explicitly explained my criteria in the protocol so it is clear how I assessed the quality of the methodology. The reasons for excluding any study should be documented in the final report.
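Keeping a simple screening log makes it much easier to report the numbers at each stage and the reasons for exclusion. The following is a minimal sketch, assuming four screening stages and a handful of made-up records; the stage names, record ids and exclusion reasons are all hypothetical.

```python
from collections import Counter

STAGES = ["title", "abstract", "full_text", "included"]

# Hypothetical records: (id, stage at which the record left the process, reason).
# A stage of "included" means the study made it into the review.
decisions = [
    ("r1", "title", "not relevant to question"),
    ("r2", "abstract", "wrong population"),
    ("r3", "full_text", "no comparison group"),
    ("r4", "included", None),
    ("r5", "abstract", "wrong population"),
    ("r6", "included", None),
]


def flow_counts(decisions):
    """Count how many records entered each stage of screening."""
    reached = [STAGES.index(stage) for _, stage, _ in decisions]
    return {stage: sum(1 for r in reached if r >= i) for i, stage in enumerate(STAGES)}


def exclusion_reasons(decisions):
    """Tally the documented reasons for exclusion."""
    return Counter(reason for _, _, reason in decisions if reason is not None)


counts = flow_counts(decisions)
reasons = exclusion_reasons(decisions)
print(counts)
print(reasons.most_common())
```

The counts give you the figures for the methods section, and the tally of reasons can go straight into the table of excluded studies.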

I have often found that when doing an SR in public health, much of the literature pertains to the USA or Australia, and I then have to assess how applicable the evidence is to the population I'm working with. I have detailed and used a system similar to the following:

"Applicability to UK population will also be considered. It is likely that the interventions will be coded in relation to:

    1. The intervention has been delivered within the UK (UK)
    2. The intervention has not been delivered within the UK but it had been delivered in a similar population and could be adapted (Non-UK+)
    3. The intervention has been delivered in a different population and may not be appropriate for adaptation (Non-UK-)"

Pointers on Synthesis  

Sometimes it is not possible to do a meta-analysis. You may then need to do a narrative synthesis (which can be a thematic analysis). However, you can still grade the impact of the interventions as illustrated below.

Coding frame for intervention impact

A = Positive impact (at least 50% of the outcomes measured proved significantly positive in favour of the intervention)

B = Possible positive impact (some, but less than 50%, of the measures proved significantly positive in favour of the intervention)

C= Impact unlikely

D= Negative impact (significant findings in favour of the control group, i.e. the intervention proved harmful compared with the control)
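The coding frame above can be written down as a small function, which helps you apply it consistently across studies. This is a sketch under stated assumptions: the function name and arguments are hypothetical, "C" is taken to mean that no outcome was significantly in favour of either group, and "D" is only assigned when there are significant findings in favour of the control and none in favour of the intervention. The frame itself does not say how to handle mixed results, so adapt the rule and document it in your protocol.

```python
def impact_code(n_outcomes, n_sig_positive, n_sig_negative=0):
    """Return A, B, C or D for one study, following the coding frame.

    n_outcomes      number of outcome measures reported
    n_sig_positive  measures significantly in favour of the intervention
    n_sig_negative  measures significantly in favour of the control
    """
    if n_outcomes <= 0:
        raise ValueError("a study needs at least one outcome measure")
    # Assumption: harm is coded only when nothing favours the intervention.
    if n_sig_negative > 0 and n_sig_positive == 0:
        return "D"
    if n_sig_positive / n_outcomes >= 0.5:
        return "A"
    if n_sig_positive > 0:
        return "B"
    return "C"


# Hypothetical studies: (outcomes measured, significant positive, significant negative)
examples = [(4, 2, 0), (4, 1, 0), (4, 0, 0), (4, 0, 1)]
codes = [impact_code(*e) for e in examples]
print(codes)
```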

Following the use of narrative synthesis, you may want to make recommendations for use for the interventions you found.  These too can be classified in terms of strength of evidence, for example: 

I = Strong recommendation for use, arising from strong evidence obtained from systematic reviews with meta-analysis or randomised controlled trials.

II = Recommendation for use that is based on experimental studies.

III = Suggestion for use that is based on controlled observational studies.

Appraising a Systematic Review

  1. Is the topic well defined?
  2. Was the search for papers thorough?
  3. Were the criteria for inclusion of studies clearly described and fairly applied?
  4. Was study quality assessed by blinded or independent reviewers?
  5. Was missing information sought from the original study investigators?
  6. Do the included studies seem to indicate similar effects?
  7. Were the overall findings assessed for their robustness? (Think bias, chance, confounding, real effects)
  8. Are the recommendations based firmly on the quality of evidence presented?

Types of Systematic Reviews

Systematic reviews of randomised controlled trials

  • These reviews seek to evaluate effectiveness.
  • A drawback is that there may be a lack of power of individual studies.

Systematic reviews of analytical (observational) studies

  • These reviews seek to evaluate risks (causes)
  • A common problem encountered when conducting this type of review is bias.

Systematic reviews of qualitative studies

  • Also known as Qualitative Evidence Synthesis (QES), this provides a vital supplement or extension to intervention reviews by exploring why an intervention works (or not), in which populations and in what circumstances.

Question Specific Methods

Sometimes you will hear people say "This will be an intervention type of SR" or "we're going to use a diagnostic SR to find the answers for this question".  This is linked to the above section and graph.  People are basically saying that depending upon which type of question you're asking, you'll use slightly different approaches.  Common question specific methods are:

Intervention (or Effectiveness)
  • What are the effects of X?
  • What types of intervention and programme are effective in increasing physical activity levels among children aged under five years?
  • Ideal study type - RCTs (and SRs of RCTs)

Frequency/rate (burden of illness)

  • How common is a particular condition or disease in a specified group?
  • Ideal study type - cross-sectional studies with a standardised measurement in a representative sample of people.  For a rate, the sample would need to be followed over time. 

Diagnostic Accuracy

  • How accurate is X test in predicting the true diagnosis category of a patient?
  • What are the most appropriate methods/instruments for identifying X in patients?
  • Ideal study type - cross-sectional in which the results of tests  on consecutively attending patients are cross-classified against disease status determined by a reference (gold) standard.
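To make the cross-classification concrete, the sketch below takes the four cells of a 2x2 table (test result against the reference standard) and works out the usual accuracy measures. The counts are hypothetical, and the choice of sensitivity, specificity and predictive values as the summary measures is an assumption; a given review may report others.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Summarise a 2x2 table of test result against reference standard.

    tp: test positive, disease present   fp: test positive, disease absent
    fn: test negative, disease present   tn: test negative, disease absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly identified
        "specificity": tn / (tn + fp),  # non-diseased patients correctly identified
        "ppv": tp / (tp + fp),          # positive results that are true positives
        "npv": tn / (tn + fn),          # negative results that are true negatives
    }


# Hypothetical counts from 300 consecutively attending patients.
result = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```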

Aetiology and risk factors

  • Are there known risk factors that increase the risk of X disease or X outcome?  A clear association between the factor and the disease needs to be first established before doing this SR.
  • Ideal study type - cohort studies 


Prediction and prognosis

  • Can the risk for a patient be predicted? Or, based on one or more risk factors, what is the level of risk of a particular outcome for the person?
  • Ideal study type - cohort studies

Economic Evaluation

  • Can be done from SRs of intervention studies
  • Looks at cost analysis, cost-effectiveness analyses, cost-utility analyses and cost-benefit analyses.  


  • Views and experiences of patients/users
  • Service delivery - what is the most appropriate hospital inpatient bed capacity for elderly people with dementia? 
  • Barriers/facilitators - what are the barriers and facilitators to delivering effective services for children with learning disabilities? 
  • Implementation - what is the most effective way to help GP practices to identify informal carers?

Developing Review Questions

The National Institute for Health and Care Excellence (NICE) recommends that you use an appropriate framework to develop review questions. There are a number of different frameworks, but typically PICO or SPICE is used.

PICO stands for Population, Intervention, Comparison and Outcome and is usually recommended for questions about effectiveness.  

Example: In young children (under 5) is paracetamol more effective than ibuprofen in reducing fever?


Population: Children aged under 5 with a fever

Intervention: Paracetamol

Comparison: Ibuprofen

Outcome: Reduction in fever

SPICE stands for Setting, Perspective, Intervention, Comparison and Outcome.  


Meta-analysis

A meta-analysis is the analytical or statistical part of a systematic review and is used to combine and summarise results quantitatively. Meta-analysis is essentially a statistical technique for combining a number of studies in order to obtain a larger sample size and produce a more stable estimate of an effect. It is retrospective. It involves taking a weighted average of the results of a number of individual studies.

It is good for synthesising the results of many studies.  Data combination increases power which enables small differences to be seen and makes a more precise estimate of a treatment effect. The precision with which the size of any effect can be estimated depends upon the number of patients studied. Combining trials leads to (1) more patients and more power to detect small but clinically significant effects and (2) more precise estimates of the size of effects. 

Meta-analysis differs from data-pooling as each trial is weighted before the results are combined. The types of studies used in meta-analysis are randomised controlled trials, analytical (observational) studies or a single multi-centre observational study. Estimates, confidence intervals and/or standard errors are usually reported. The results of a meta-analysis are only as solid as the data that go into it. Quality is affected by publication bias and by biases and confounding factors in the studies (e.g. recall bias). A meta-analysis can end up sounding authoritative while actually combining poor-quality data. However, meta-analysis can be used to probe the quality of data, suggest possible associations and identify weaknesses.
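To show what "a weighted average of the results" can look like in practice, here is a minimal sketch of one common weighting scheme, inverse-variance weighting, in which each study is weighted by 1 divided by the square of its standard error. The choice of this scheme is an assumption for illustration, and the three studies (log odds ratios and standard errors) are made up.

```python
import math

# Hypothetical studies: (name, log odds ratio, standard error).
studies = [
    ("Study A", -0.40, 0.30),
    ("Study B", -0.10, 0.20),
    ("Study C", -0.25, 0.10),
]


def fixed_effect(studies):
    """Inverse-variance weighted average with a 95% confidence interval."""
    weights = [1 / se ** 2 for _, _, se in studies]
    total = sum(weights)
    pooled = sum(w * est for w, (_, est, _) in zip(weights, studies)) / total
    se_pooled = math.sqrt(1 / total)
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, se_pooled, ci


pooled, se_pooled, ci = fixed_effect(studies)
print(f"pooled log OR = {pooled:.3f}, SE = {se_pooled:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The pooled standard error comes out smaller than that of any single study, which is the gain in precision described above. In practice you would use dedicated software rather than hand-rolled code.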

There are statistical programmes that can help you to do the meta-analysis, such as RevMan (a Cochrane programme that is free to download from their website).

Forest Plot

This is a visual representation of the contribution of each study, displaying the estimates (as symbols) and the confidence intervals (as the length of the lines). The combined estimate and the heterogeneity of the studies are also shown.


Heterogeneity

Heterogeneity is variation between the estimates over and above the natural sampling variation. A test for heterogeneity examines the null hypothesis that all studies are evaluating the same effect. It is performed at the same time as the estimates are combined (usually given as a chi-squared statistic). The overall estimate and confidence interval can take heterogeneity into account, as in a random effects meta-analysis. The two main models for meta-analysis are fixed effect and random effects. They make different assumptions about heterogeneity.

A random effects model assumes that, in addition to random error (i.e. chance), differences between studies can also result from real differences between the study populations and procedures. It assumes the treatment effect varies between studies. It estimates the mean of the distribution of effects, weighted for both within-study and between-study variation (tau-squared, τ²).

Fixed effects meta-analysis assumes that all studies are measuring the same treatment effect. It treats each study as a random sample from one large common study, so that differences between study outcomes result only from random error; in other words, all results would be identical if not for random sampling error. Pooling is then simple, as it consists of calculating a weighted average of the individual study results.

Random-effects meta-analyses are almost identical to fixed-effect ones when there is no heterogeneity. When there is heterogeneity of the sort assumed by the random-effects model, they are similar to fixed-effect analyses but with wider confidence intervals. The two differ when results are related to study size, because the random-effects model gives relatively more weight to smaller studies. It is worth noting that there is no agreement among statisticians on whether fixed or random effects meta-analysis is better. Reasons for heterogeneity should be investigated.
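The difference between the two models can be seen by running both on the same data. The sketch below assumes inverse-variance weights and uses the DerSimonian-Laird method to estimate the between-study variance; that estimator is one common choice picked here for illustration, not something this guide prescribes, and the four studies are made up.

```python
import math


def pool(estimates, ses, tau2=0.0):
    """Inverse-variance pooling; tau2 = 0 gives the fixed-effect result."""
    weights = [1 / (se ** 2 + tau2) for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, estimates)) / total
    shares = [w / total for w in weights]  # relative weight of each study
    return estimate, math.sqrt(1 / total), shares


def dersimonian_laird_tau2(estimates, ses):
    """Between-study variance estimated with the DerSimonian-Laird method."""
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    w = [1 / se ** 2 for se in ses]
    fixed, _, _ = pool(estimates, ses)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)  # truncated at zero


# Hypothetical effect estimates and standard errors (last study is the smallest).
estimates = [0.10, 0.60, -0.20, 0.45]
ses = [0.10, 0.25, 0.15, 0.30]

tau2 = dersimonian_laird_tau2(estimates, ses)
fixed_est, fixed_se, fixed_share = pool(estimates, ses)
random_est, random_se, random_share = pool(estimates, ses, tau2)

print(f"tau-squared = {tau2:.3f}")
print(f"fixed:  {fixed_est:.3f} (SE {fixed_se:.3f})")
print(f"random: {random_est:.3f} (SE {random_se:.3f})")
print(f"weight of smallest study: {fixed_share[3]:.1%} fixed, {random_share[3]:.1%} random")
```

With these heterogeneous data the random-effects standard error is larger (a wider confidence interval) and the smallest study gets a bigger share of the weight. When the estimated tau-squared is zero the two models give the same answer.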

When different studies have largely different results, this can be due to random error or to heterogeneity. To test homogeneity, use a chi-square test, or Fisher's exact test for small studies. The power of the test tends to be low when there are few studies, but it offers guidance; with many studies it may detect clinically unimportant differences. The narrow yes/no question isn't useful if heterogeneity is inevitable. A basic but informative method is to produce a graph in which the individual outcomes are plotted together with their 95% CIs.

Cochran's Q can be used to test heterogeneity. It is compared against a chi-squared distribution with degrees of freedom (df) equal to the number of studies minus one. If Cochran's Q is statistically significant, there is positive heterogeneity and it must be explored. If it is not statistically significant but the ratio Q/df is greater than 1, there is possible heterogeneity, so explore it. Heterogeneity is very unlikely if Cochran's Q is not statistically significant and Q/df is less than 1.
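As a worked illustration, the sketch below computes Cochran's Q as the weighted sum of squared differences between each study and the pooled estimate, takes df as the number of studies minus one, and then applies a three-way reading of the result. Several things here are assumptions: inverse-variance weights, a 0.05 significance level, and a rule of thumb that compares the ratio Q/df with 1. The data are made up, and the small chi-squared helper is only there to keep the example free of extra libraries.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail area plus a finite series.
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


def cochran_q(estimates, ses):
    """Return Q and its degrees of freedom (number of studies minus one)."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, estimates))
    return q, len(estimates) - 1


def describe_heterogeneity(estimates, ses, alpha=0.05):
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    q, df = cochran_q(estimates, ses)
    p = chi2_sf(q, df)
    if p < alpha:
        label = "positive heterogeneity"
    elif q / df > 1:
        label = "possible heterogeneity"
    else:
        label = "heterogeneity unlikely"
    return q, df, p, label


# Hypothetical effect estimates and standard errors.
q, df, p, label = describe_heterogeneity([0.10, 0.60, -0.20, 0.45], [0.10, 0.25, 0.15, 0.30])
print(f"Q = {q:.2f}, df = {df}, p = {p:.3f}: {label}")
```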

Publication Bias

Research with statistically significant results is potentially more likely to be submitted, published or published more rapidly than work with non-significant results. Positive results are more likely to be published than negative results, and false positive results are more likely to be published than false negative ones. Meta-analysis could, therefore, lead to an incorrect conclusion. Publication bias can be difficult to detect, so try to avoid it by improving your searches. You can also use a funnel plot to check for publication bias. Also check the appendix of the Cochrane Handbook for trial registries.

Funnel Plot

This plot displays effect size by sample size. The plot will be funnel shaped if all studies that estimate the same quantity have been identified. It will be asymmetrical if trials are missing, usually smaller studies showing no effect. You can then estimate the number of missing studies that would change the conclusion. Funnel plots can also show publication bias (if asymmetrical). Asymmetry can also be due to the tendency for smaller studies to show larger treatment effects (they tend to have less rigorous methodology). Relative risks and odds ratios are plotted on a logarithmic scale so that effects of the same magnitude but in opposite directions are equidistant from 1.0 (e.g. 2 and 0.5). They are plotted against precision: 1/standard error. This emphasises differences between larger studies.
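The coordinates of a funnel plot are easy to work out by hand, which helps when reading one. The sketch below converts each trial's odds ratio to the log scale and pairs it with its precision (1/standard error). The three trials are hypothetical, and no plotting library is assumed; the points could be passed to whichever charting tool you use.

```python
import math

# Hypothetical trials: (name, odds ratio, standard error of the log odds ratio).
trials = [
    ("Large trial", 0.90, 0.08),
    ("Medium trial", 0.80, 0.20),
    ("Small trial", 0.50, 0.45),
]


def funnel_points(trials):
    """Return (name, log odds ratio, precision), with precision = 1 / standard error."""
    return [(name, math.log(odds_ratio), 1 / se) for name, odds_ratio, se in trials]


points = funnel_points(trials)
for name, log_or, precision in points:
    print(f"{name}: log OR = {log_or:.2f}, precision = {precision:.1f}")

# On the log scale, odds ratios of 2 and 0.5 sit the same distance from no effect.
distance_up = math.log(2)
distance_down = -math.log(0.5)
print(distance_up, distance_down)
```

Larger trials have higher precision and so sit at the top of the funnel, while small trials spread out along the bottom.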

Sensitivity Analysis

This looks at how the results are affected by study quality, i.e. by bias, error, reporting and power. It examines the results in relation to decisions made in the systematic review, e.g. the inclusion/exclusion criteria for studies, the impact of each individual study, and the choice between random and fixed effects models.
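One simple way to look at the impact of each study is to repeat the pooling with each study left out in turn. The sketch below does this with an inverse-variance weighted average; the weighting scheme is an assumption for illustration and the three studies are made up.

```python
def pooled_estimate(estimates, ses):
    """Inverse-variance weighted average of the study estimates."""
    weights = [1 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def leave_one_out(names, estimates, ses):
    """Pooled estimate recalculated with each study removed in turn."""
    results = {}
    for i, name in enumerate(names):
        rest_est = estimates[:i] + estimates[i + 1:]
        rest_se = ses[:i] + ses[i + 1:]
        results[name] = pooled_estimate(rest_est, rest_se)
    return results


# Hypothetical log odds ratios and standard errors.
names = ["Study A", "Study B", "Study C"]
estimates = [-0.40, -0.10, -0.25]
ses = [0.30, 0.20, 0.10]

overall = pooled_estimate(estimates, ses)
loo = leave_one_out(names, estimates, ses)
print(f"all studies: {overall:.3f}")
for name, value in loo.items():
    print(f"without {name}: {value:.3f} (change {value - overall:+.3f})")
```

If dropping one study changes the pooled estimate a lot, the conclusion depends heavily on that study and its quality deserves a closer look.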

Note on Hierarchy of Evidence

While randomised controlled trials top the ranks of evidence, it may not always be appropriate to restrict systematic reviews to RCTs alone. For example, for studying risk factors, cohort studies are more applicable than RCTs. Some interventions, such as defibrillation for ventricular fibrillation, have an impact so large that observational data are sufficient to show it. Rare or infrequent adverse outcomes would only be detected by RCTs so large that they are rarely conducted. Thus, observational methods such as post-marketing surveillance of medicines are the only alternative. Sometimes, observational data provide a realistic means of assessing the long-term outcome of interventions beyond the timescale of trials.

Tips on Doing a Systematic Review

  1. Don't underestimate how long a systematic review will take. Typically, a review will take between 3 and 9 months (average 6 months), depending on how easy it is to get hold of the articles for quality assessment.
  2. Don't restrict your searches to just Medline, EMBASE and the other 'big name' databases. A 1997 study found that Medline and EMBASE only covered 6,000 of 20,000 journals. Check what databases are available to you at your library and use them all. You should also include hand searches, snowballing (i.e. using the list of references at the end of an article to generate more literature) and talk to people in the field as this will help you to locate grey literature.
  3. Spend time on your timeline. Look at the subject at hand, as you may have to go back into 20th century publications, depending on your questions and the gold standards that you may be using as comparators, which could have been established years ago. Equally, be aware that technology advances, so sometimes you have to restrict yourself to the last 10 or 20 years.
  4. Be thorough! This is a systematic review of all the evidence on a certain topic. You need to take great care in finding all relevant studies (published and unpublished), assess each study, synthesise the findings from individual studies in an unbiased way and present a balanced and impartial summary. Make use of snowballing (i.e. looking through the references of articles you have sourced) and hand searches.
  5. Spend time getting the right question - keep asking yourself why is this important to answer? And how are you going to use the results?  What outcomes do you want? Think population, intervention, comparative intervention and outcomes.
  6. Spend time refining your review questions. These are what will drive your quality assessment and help shape your data synthesis. They will also help flag up gaps in the literature.
  7. Make sure you establish a clear need for the review. When preparing a protocol, undertake a preliminary assessment of the literature. Have any reviews been done on this topic before? If so, see if you can build upon these reviews, for example, if a review was done 5 years ago, you could argue that it needs to be updated.

Further Reading

For doing a systematic review, it's hard to beat 'Undertaking Systematic Reviews of Research on Effectiveness: CRD's Guidance for those carrying out or commissioning reviews'.

On levels of evidence, see Grimes, D.A. & Schulz, K.F. (2002) "An overview of clinical research: the lay of the land." The Lancet, 359, 145-49.

To see systematic reviews in action, check out Cochrane Reviews. They also have some useful online learning resources.

Campbell Collaboration is another useful database of systematic reviews, especially on the social sciences. 

Glasziou P, Irwig L, Bain C & Colditz G.  2001. Systematic Reviews in Health Care: A Practical Guide Cambridge University Press: Cambridge.