Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.
Now with Optum, Boston, Massachusetts.
JAMA. 2023 Apr 25;329(16):1376-1385. doi: 10.1001/jama.2023.4221.
IMPORTANCE: Nonrandomized studies using insurance claims databases can be analyzed to produce real-world evidence on the effectiveness of medical products. Because these studies lack baseline randomization and are subject to measurement issues, concerns exist about whether they produce unbiased treatment effect estimates.
OBJECTIVE: To emulate the design of 30 completed and 2 ongoing randomized clinical trials (RCTs) of medications with database studies using observational analogues of the RCT design parameters (population, intervention, comparator, outcome, time [PICOT]) and to quantify agreement in RCT-database study pairs.
DESIGN, SETTING, AND PARTICIPANTS: New-user cohort studies with propensity score matching using 3 US claims databases (Optum Clinformatics, MarketScan, and Medicare). Inclusion-exclusion criteria for each database study were prespecified to emulate the corresponding RCT. RCTs were explicitly selected based on feasibility, including power, key confounders, and end points more likely to be emulated with real-world data. All 32 protocols were registered on ClinicalTrials.gov before conducting analyses. Emulations were conducted from 2017 through 2022.
EXPOSURES: Therapies for multiple clinical conditions were included.
MAIN OUTCOMES AND MEASURES: Database study emulations focused on the primary outcome of the corresponding RCT. Findings of database studies were compared with RCTs using predefined metrics, including Pearson correlation coefficients and binary metrics based on statistical significance agreement, estimate agreement, and standardized difference.
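The agreement metrics named above can be sketched as follows. The abstract does not spell out their definitions, so the rules below are assumptions chosen for illustration: significance agreement means both studies reach the same conclusion (benefit, harm, or null), estimate agreement means the database point estimate falls within the RCT 95% CI, and standardized difference agreement means the difference in log effect estimates is within 1.96 pooled standard errors. The `Estimate` container, the function names, and the example numbers are all hypothetical.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


@dataclass
class Estimate:
    """Ratio effect estimate (e.g. hazard ratio) with its 95% CI. Hypothetical container."""
    hr: float
    lower: float
    upper: float

    @property
    def log_hr(self) -> float:
        return math.log(self.hr)

    @property
    def se(self) -> float:
        # Standard error on the log scale, back-calculated from the CI width.
        return (math.log(self.upper) - math.log(self.lower)) / (2 * Z_95)


def direction(e: Estimate) -> int:
    """-1 if the CI lies below 1, +1 if above 1, 0 if it includes 1."""
    if e.upper < 1:
        return -1
    if e.lower > 1:
        return 1
    return 0


def significance_agreement(rct: Estimate, db: Estimate) -> bool:
    # Assumed rule: same direction and same statistical significance.
    return direction(rct) == direction(db)


def estimate_agreement(rct: Estimate, db: Estimate) -> bool:
    # Assumed rule: database point estimate lies inside the RCT 95% CI.
    return rct.lower <= db.hr <= rct.upper


def standardized_difference(rct: Estimate, db: Estimate) -> float:
    # Difference in log estimates divided by the pooled standard error.
    return (db.log_hr - rct.log_hr) / math.sqrt(db.se ** 2 + rct.se ** 2)


def standardized_difference_agreement(rct: Estimate, db: Estimate) -> bool:
    # Assumed rule: no statistically significant difference between estimates.
    return abs(standardized_difference(rct, db)) < Z_95


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of paired values (e.g. log effect estimates)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from the study.
    rct = Estimate(hr=0.80, lower=0.70, upper=0.92)
    db = Estimate(hr=0.85, lower=0.78, upper=0.93)
    print(significance_agreement(rct, db))
    print(estimate_agreement(rct, db))
    print(standardized_difference_agreement(rct, db))
```

Computing each binary metric per RCT-database pair and averaging across pairs would give percentages of the kind reported in the results. The three rules can disagree for the same pair, which is one reason concordance depends on the metric chosen.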
RESULTS: In these highly selected RCTs, the overall observed agreement between the RCT and the database emulation results was a Pearson correlation of 0.82 (95% CI, 0.64-0.91), with 75% meeting statistical significance agreement, 66% estimate agreement, and 75% standardized difference agreement. In a post hoc analysis limited to 16 RCTs with closer emulation of trial design and measurements, concordance was higher (Pearson r, 0.93; 95% CI, 0.79-0.97; 94% meeting statistical significance agreement, 88% estimate agreement, 88% standardized difference agreement). Weaker concordance occurred among 16 RCTs for which close emulation of certain design elements that define the research question (PICOT) with data from insurance claims was not possible (Pearson r, 0.53; 95% CI, 0.00-0.83; 56% meeting statistical significance agreement, 50% estimate agreement, 69% standardized difference agreement).
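The abstract reports confidence intervals for each Pearson correlation but does not say how they were computed. One common approximation is the Fisher z-transformation, sketched below as an assumption rather than as the authors' method. Intervals obtained this way need not match the reported ones exactly. The function name `fisher_ci` is hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.959963984540054) -> tuple[float, float]:
    """Approximate CI for a Pearson correlation via the Fisher z-transformation.

    Assumes roughly bivariate-normal data; n is the number of paired observations.
    """
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                      # transform r to the z scale
    half_width = z_crit / math.sqrt(n - 3)  # standard error is 1 / sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


if __name__ == "__main__":
    # r and n taken from the abstract: overall correlation across 32 pairs.
    low, high = fisher_ci(0.82, 32)
    print(f"{low:.2f} to {high:.2f}")
```

With only 16 pairs in each subgroup, the same calculation yields much wider intervals, which is consistent with the wide interval reported for the less closely emulated trials.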
CONCLUSIONS AND RELEVANCE: Real-world evidence studies can reach conclusions similar to those of RCTs when design and measurements can be closely emulated, but this may be difficult to achieve. Concordance in results varied depending on the agreement metric. Emulation differences, chance, and residual confounding can contribute to divergence in results and are difficult to disentangle.