GRADE guidelines: 7. Rating the quality of evidence--inconsistency

Gordon H Guyatt; Andrew D Oxman; Regina Kunz; James Woodcock; Jan Brozek; Mark Helfand; Pablo Alonso-Coello; Paul Glasziou; Roman Jaeschke; Elie A Akl; Susan Norris; Gunn Vist; Philipp Dahm; Vijay K Shukla; Julian Higgins; Yngve Falck-Ytter; Holger J Schünemann; GRADE Working Group

doi:10.1016/j.jclinepi.2011.03.017

J Clin Epidemiol. 2011 Dec;64(12):1294-302. doi: 10.1016/j.jclinepi.2011.03.017. Epub 2011 Jul 31.

Affiliation

¹ Department of Clinical Epidemiology and Biostatistics, McMaster University, 1200 Main Street, West Hamilton, Ontario L8N 3Z5, Canada. guyatt@mcmaster.ca

PMID: 21803546

Abstract

This article deals with inconsistency of relative (rather than absolute) treatment effects in binary/dichotomous outcomes. A body of evidence is not rated up in quality if studies yield consistent results, but may be rated down in quality if results are inconsistent. Criteria for evaluating consistency include similarity of point estimates, extent of overlap of confidence intervals, and statistical criteria including tests of heterogeneity and I². To explore heterogeneity, systematic review authors should generate and test a small number of a priori hypotheses related to patients, interventions, outcomes, and methodology. When inconsistency is large and unexplained, rating down quality for inconsistency is appropriate, particularly if some studies suggest substantial benefit and others no effect or harm (rather than only large vs. small effects). Apparent subgroup effects may be spurious. Credibility is increased if subgroup effects are based on a small number of a priori hypotheses with a specified direction; subgroup comparisons come from within rather than between studies; tests of interaction generate low P-values; and the effects have a biological rationale.
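The statistical criteria named in the abstract (a test of heterogeneity, I², and a test of interaction) can be illustrated with a minimal Python sketch. It assumes each study contributes a relative effect on the log scale (for example a log risk ratio) together with its standard error. It uses inverse-variance (fixed-effect) weights to compute Cochran's Q and I² = max(0, (Q − df)/Q), and it compares two independent subgroup estimates with a simple z test. The function names are hypothetical, and the sketch is not a procedure given in the article.

```python
import math


def heterogeneity(log_effects, std_errors):
    """Return (pooled estimate, Cochran's Q, df, I^2) for log-scale study estimates.

    Assumes inverse-variance fixed-effect weights w_i = 1 / se_i^2.
    """
    if len(log_effects) != len(std_errors) or len(log_effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Q: weighted sum of squared deviations of each study from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity, floored at 0
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i_squared


def interaction_test(log_effect_a, se_a, log_effect_b, se_b):
    """z test for the difference between two independent subgroup estimates."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (log_effect_a - log_effect_b) / se_diff
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Hypothetical log risk ratios and standard errors, for illustration only
    print(heterogeneity([math.log(0.5), math.log(0.6), math.log(1.1)], [0.2, 0.25, 0.15]))
    print(interaction_test(1.0, 0.3, 0.0, 0.4))
```

A low interaction P-value from such a test addresses only one of the credibility criteria listed above; it does not show whether the subgroup hypothesis was specified a priori, whether the comparison is within studies, or whether it has a biological rationale.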

MeSH terms

Confidence Intervals*
Evidence-Based Medicine / standards*
Humans
Observer Variation*
Practice Guidelines as Topic*
Randomized Controlled Trials as Topic / standards
Research Design
Sample Size

Grants and funding

MC_U105285807 / Medical Research Council (MRC) / United Kingdom