Department of Clinical Epidemiology and Biostatistics, McMaster University, 1200 Main Street West, Hamilton, Ontario L8N 3Z5, Canada.
J Clin Epidemiol. 2011 Dec;64(12):1294-302. doi: 10.1016/j.jclinepi.2011.03.017. Epub 2011 Jul 31.
This article deals with inconsistency of relative (rather than absolute) treatment effects in binary/dichotomous outcomes. A body of evidence is not rated up in quality if studies yield consistent results, but may be rated down in quality if results are inconsistent. Criteria for evaluating consistency include similarity of point estimates, extent of overlap of confidence intervals, and statistical criteria including tests of heterogeneity and I². To explore heterogeneity, systematic review authors should generate and test a small number of a priori hypotheses related to patients, interventions, outcomes, and methodology. When inconsistency is large and unexplained, rating down quality for inconsistency is appropriate, particularly if some studies suggest substantial benefit and others suggest no effect or harm (rather than only large vs. small effects). Apparent subgroup effects may be spurious. Credibility is increased if subgroup effects are based on a small number of a priori hypotheses with a specified direction; if subgroup comparisons come from within rather than between studies; if tests of interaction generate low P-values; and if the effects have a biological rationale.
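The statistical criteria mentioned above (a test of heterogeneity and I²) can be illustrated with a minimal sketch. It assumes relative risks pooled on the log scale with fixed-effect inverse-variance weights, Cochran's Q as the heterogeneity statistic, and I² computed as (Q - df)/Q truncated at zero. The function names and the example numbers are hypothetical and are not taken from the article.

```python
import math


def log_rr_and_var(events_t, n_t, events_c, n_c):
    """Log relative risk and its large-sample variance for one study.

    Assumes no zero cells; a continuity correction would be needed otherwise.
    """
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var


def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, degrees of freedom, I² in percent)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I² is the share of total variation attributed to between-study
    # differences rather than chance; it is set to 0 when Q <= df.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, df, i2


if __name__ == "__main__":
    # Hypothetical studies: (events, total) in treatment, then in control.
    studies = [(10, 100, 20, 100), (15, 120, 18, 115), (30, 150, 28, 148)]
    pairs = [log_rr_and_var(*s) for s in studies]
    effects = [p[0] for p in pairs]
    variances = [p[1] for p in pairs]
    pooled, q, df, i2 = heterogeneity(effects, variances)
    print(f"pooled RR={math.exp(pooled):.2f}, Q={q:.2f}, df={df}, I2={i2:.0f}%")
```

A P-value for the heterogeneity test would come from comparing Q with a chi-square distribution on df degrees of freedom. As the abstract stresses, these statistics are only one input alongside the similarity of point estimates and the overlap of confidence intervals.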