Andrew Anglemyer, Hacsi T Horvath, Lisa Bero
Global Health Sciences, University of California, San Francisco, San Francisco, California, USA, 94105.
Cochrane Database Syst Rev. 2014 Apr 29;2014(4):MR000034. doi: 10.1002/14651858.MR000034.pub2.
BACKGROUND: Researchers and organizations often use evidence from randomized controlled trials (RCTs) to determine the efficacy of a treatment or intervention under ideal conditions. Observational studies are often used to measure the effectiveness of an intervention in 'real world' scenarios. Numerous study designs and modifications of existing designs, both randomized and observational, are used for comparative effectiveness research in an attempt to give an unbiased estimate of whether one treatment is more effective or safer than another for a particular population. A systematic analysis of study design features, risk of bias, parameter interpretation, and effect size for all types of randomized and non-experimental observational studies is needed to identify specific differences in design types and potential biases. This review summarizes the results of methodological reviews that compare the outcomes of observational studies with randomized trials addressing the same question, as well as methodological reviews that compare the outcomes of different types of observational studies.
OBJECTIVES: To assess the impact of study design (including RCTs versus observational study designs) on the effect measures estimated. To explore methodological variables that might explain any differences identified. To identify gaps in the existing research comparing study designs.
SEARCH METHODS: We searched seven electronic databases from January 1990 to December 2013. Along with MeSH terms and relevant keywords, we used the sensitivity-specificity balanced version of a validated strategy to identify reviews in PubMed, augmented with one term ("review" in article titles) so that it better targeted narrative reviews. No language restrictions were applied.
SELECTION CRITERIA: We examined systematic reviews that were designed as methodological reviews to compare quantitative effect size estimates measuring the efficacy or effectiveness of interventions tested in trials with those tested in observational studies. Comparisons included RCTs versus observational studies (including retrospective cohorts, prospective cohorts, case-control designs, and cross-sectional designs). Reviews were not eligible if they compared randomized trials with other studies that had used some form of concurrent allocation.
DATA COLLECTION AND ANALYSIS: In general, outcome measures included relative risks or rate ratios (RR), odds ratios (OR), and hazard ratios (HR). Using results from observational studies as the reference group, we examined the published estimates to see whether there was a relatively larger or smaller effect in the ratio of odds ratios (ROR). Within each identified review, if an estimate comparing results from observational studies with RCTs was not provided, we pooled the estimates for observational studies and RCTs. Then, we estimated the ratio of ratios (risk ratio or odds ratio) for each identified review, using observational studies as the reference category. Across all reviews, we synthesized these ratios to obtain a pooled ROR comparing results from RCTs with results from observational studies.
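A minimal sketch of this ratio-of-ratios calculation is given below, assuming fixed-effect inverse-variance pooling on the log scale; the study estimates, standard errors, and function names are hypothetical illustrations and are not taken from the review, whose own synthesis followed the methods described above.

```python
# Illustrative sketch of a ratio-of-odds-ratios (ROR) calculation.
# All inputs are hypothetical; this is not the review's actual code.
import math

def pool_log_effects(effects, ses):
    """Fixed-effect inverse-variance pooling on the log scale.
    effects: ratio estimates (OR/RR/HR); ses: standard errors of log(effect)."""
    weights = [1.0 / se**2 for se in ses]
    log_pooled = sum(w * math.log(e) for w, e in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return log_pooled, se_pooled

def ror(rct_effects, rct_ses, obs_effects, obs_ses):
    """ROR = pooled RCT effect / pooled observational effect,
    with observational studies as the reference category, plus a 95% CI."""
    log_rct, se_rct = pool_log_effects(rct_effects, rct_ses)
    log_obs, se_obs = pool_log_effects(obs_effects, obs_ses)
    log_ror = log_rct - log_obs
    se_ror = math.sqrt(se_rct**2 + se_obs**2)  # independent pooled estimates
    lo = math.exp(log_ror - 1.96 * se_ror)
    hi = math.exp(log_ror + 1.96 * se_ror)
    return math.exp(log_ror), (lo, hi)

# Hypothetical example: two RCTs and two observational studies of one question.
estimate, ci = ror([0.80, 0.90], [0.10, 0.12], [0.75, 0.85], [0.15, 0.20])
print(f"ROR = {estimate:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

An ROR above 1 means the RCT-based effect estimate is larger (further from the null in the ratio sense) than the observational estimate; a confidence interval crossing 1 indicates no significant difference between the two designs.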
MAIN RESULTS: Our initial search yielded 4406 unique references. Fifteen reviews met our inclusion criteria, 14 of which were included in the quantitative analysis. The included reviews analyzed data from 1583 meta-analyses that covered 228 different medical conditions. The mean number of included studies per paper was 178 (range 19 to 530). Eleven (73%) reviews had low risk of bias for explicit criteria for study selection, nine (60%) were at low risk of bias for investigators' agreement for study selection, five (33%) included a complete sample of studies, seven (47%) assessed the risk of bias of their included studies, seven (47%) controlled for methodological differences between studies, eight (53%) controlled for heterogeneity among studies, nine (60%) analyzed similar outcome measures, and four (27%) were judged to be at low risk of reporting bias. Our primary quantitative analysis, including 14 reviews, showed that the pooled ROR comparing effects from RCTs with effects from observational studies was 1.08 (95% confidence interval (CI) 0.96 to 1.22). Of the 14 reviews included in this analysis, 11 (79%) found no significant difference between observational studies and RCTs. One review suggested observational studies had larger effects of interest, and two reviews suggested observational studies had smaller effects of interest. Similar to the effect across all included reviews, effects from reviews comparing RCTs with cohort studies had a pooled ROR of 1.04 (95% CI 0.89 to 1.21), with substantial heterogeneity (I² = 68%). Three reviews compared effects of RCTs and case-control designs (pooled ROR 1.11, 95% CI 0.91 to 1.35). No significant difference in point estimates was noted across the heterogeneity, pharmacological intervention, or propensity score adjustment subgroups. No reviews had compared RCTs with observational studies that used two of the most common causal inference methods, instrumental variables and marginal structural models.
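For context, the I² statistic reported above has the standard definition based on Cochran's Q; this conventional formula is supplied here for readers and is not reproduced from the review:

\[
Q = \sum_{i=1}^{k} w_i\,(\hat{\theta}_i - \hat{\theta})^2, \qquad
I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%,
\]

where \(\hat{\theta}_i\) are the study-level log effect estimates, \(w_i\) their inverse-variance weights, \(\hat{\theta}\) the pooled log effect, and \(k\) the number of studies. On this scale, I² = 68% means that roughly two-thirds of the observed variability in the cohort-study RORs is attributable to between-study heterogeneity rather than chance.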
AUTHORS' CONCLUSIONS: Our results across all reviews (pooled ROR 1.08) are very similar to results reported by similarly conducted reviews. As such, we have reached similar conclusions: on average, there is little evidence for significant effect estimate differences between observational studies and RCTs, regardless of specific observational study design, heterogeneity, or inclusion of studies of pharmacological interventions. Factors other than study design per se need to be considered when exploring reasons for a lack of agreement between results of RCTs and observational studies. Our results underscore that it is important for review authors to consider not only study design but also the level of heterogeneity in meta-analyses of RCTs or observational studies. A better understanding of how these factors influence study effects might yield estimates reflective of true effectiveness.