Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, USA.
Department of Health Services, Policy & Practice, Brown University School of Public Health, Providence, USA.
J Gen Intern Med. 2020 May;35(5):1396-1404. doi: 10.1007/s11606-020-05713-5. Epub 2020 Mar 19.
Observational analysis methods can be refined by benchmarking against randomized trials. We reviewed studies systematically comparing observational analyses using propensity score methods against randomized trials to explore whether intervention or outcome characteristics predict agreement between designs.
We searched PubMed (from January 1, 2000, to April 30, 2017), the AHRQ Scientific Resource Center Methods Library, reference lists, and bibliographies to identify systematic reviews that compared estimates from observational analyses using propensity scores against randomized trials across three or more clinical topics; reported extractable relative risk (RR) data; and were published in English. One reviewer extracted data from all eligible systematic reviews; a second reviewer verified the extracted data.
Six systematic reviews, published between 2012 and 2016, that matched published observational studies to randomized trials met our inclusion criteria. The reviews reported on 127 comparisons overall, in cardiology (29 comparisons), surgery (49), critical care medicine and sepsis (46), nephrology (2), and oncology (1). Disagreements were large (relative RR < 0.7 or > 1.43) in 68 (54%) and statistically significant in 12 (9%) of the comparisons. The degree of agreement varied among reviews but was not strongly associated with intervention or outcome characteristics.
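The "large disagreement" threshold above is symmetric on the log scale (1/0.7 ≈ 1.43): a relative RR, the ratio of the observational RR to the trial RR, counts as a large disagreement if it falls outside (0.7, 1.43) in either direction. A minimal sketch of that comparison, using hypothetical RR values for illustration:

```python
def relative_rr(rr_observational: float, rr_trial: float) -> float:
    """Ratio of the observational-study RR to the randomized-trial RR."""
    return rr_observational / rr_trial

def is_large_disagreement(rel_rr: float, lower: float = 0.7) -> bool:
    """Flag relative RRs outside the symmetric bounds (0.7, 1/0.7 ~= 1.43)."""
    return rel_rr < lower or rel_rr > 1 / lower

# Hypothetical example: observational RR = 0.90, trial RR = 1.40
rel = relative_rr(0.90, 1.40)        # ~= 0.64, below the 0.7 bound
print(is_large_disagreement(rel))    # True
print(is_large_disagreement(1.0))    # False: perfect agreement
```

Because the bounds are reciprocal, the flag treats an observational estimate 43% larger than the trial estimate the same as one 30% smaller, which is equivalent to a fixed cutoff on the absolute log relative RR.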
Disagreements between observational studies using propensity score methods and randomized trials can arise for many reasons, and the available data cannot discern the cause of any specific disagreement. Better benchmarking of observational analyses using propensity scores (and other causal inference methods) is possible using observational studies that explicitly attempt to emulate target trials.