External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning.

Affiliation

Owkin France, Paris, France.

Publication information

BMC Med Res Methodol. 2022 Dec 28;22(1):335. doi: 10.1186/s12874-022-01799-z.

DOI:10.1186/s12874-022-01799-z
PMID:36577946
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9795588/
Abstract

BACKGROUND

An external control arm is a cohort of control patients drawn from data external to a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients in the single arm and the external arm should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy by comparing the outcomes of single-arm patients with machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), but they have not been sufficiently evaluated for External Control Arm (ECA) analysis.

METHODS

We use both numerical simulations and a trial replication procedure to evaluate four statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five type 2 diabetes randomized clinical trials made available through the Yale University Open Data Access (YODA) project. From this pool of five trials, observational experiments are built artificially by replacing the control arm of one trial with an arm from another trial that contains similarly treated patients.
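To make the compared estimators concrete, here is a minimal sketch on simulated data. It is not the paper's code: the data-generating process, the function names, and the choice of plain logistic and linear models are all assumptions made for illustration, and the true effect is constant so that the average effect in the whole sample and in the treated coincide. The last estimator is a cross-fitted doubly robust (AIPW) estimator, which is the general construction behind double/debiased machine learning; the paper's exact DDML implementation and learners are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def design(x):
    # Add an intercept column to the covariates.
    return np.column_stack([np.ones(len(x)), x])

def fit_logistic(X, t, n_iter=25):
    # Logistic regression fitted by Newton-Raphson (propensity model).
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, X.T @ (t - p))
    return beta

def fit_linear(X, y):
    # Ordinary least squares (outcome model).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n, rng, effect=1.0):
    # Hypothetical data: t = 1 for single-arm patients, t = 0 for external controls.
    # Arm membership depends on covariates, which confounds a naive comparison.
    x = rng.normal(size=(n, 2))
    t = rng.binomial(1, sigmoid(0.8 * x[:, 0] - 0.5 * x[:, 1]))
    y = effect * t + 1.5 * x[:, 0] + 1.0 * x[:, 1] + rng.normal(size=n)
    return x, t, y

def iptw(x, t, y):
    # Weight each patient by the inverse probability of the arm they are in
    # (normalized weights), then compare weighted outcome means.
    X = design(x)
    e = sigmoid(X @ fit_logistic(X, t))
    w1, w0 = t / e, (1 - t) / (1 - e)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def g_computation(x, t, y):
    # Fit one outcome model per arm, predict both potential outcomes
    # for every patient, and average the difference.
    X = design(x)
    b1 = fit_linear(X[t == 1], y[t == 1])
    b0 = fit_linear(X[t == 0], y[t == 0])
    return np.mean(X @ b1 - X @ b0)

def cross_fitted_aipw(x, t, y, rng, n_folds=2):
    # Doubly robust score with nuisance models fitted on the other fold(s).
    X = design(x)
    folds = rng.permutation(len(y)) % n_folds
    psi = np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        Xtr, ttr, ytr = X[train], t[train], y[train]
        e = sigmoid(X[test] @ fit_logistic(Xtr, ttr))
        mu1 = X[test] @ fit_linear(Xtr[ttr == 1], ytr[ttr == 1])
        mu0 = X[test] @ fit_linear(Xtr[ttr == 0], ytr[ttr == 0])
        tt, yt = t[test], y[test]
        psi[test] = mu1 - mu0 + tt * (yt - mu1) / e - (1 - tt) * (yt - mu0) / (1 - e)
    return psi.mean()

rng = np.random.default_rng(0)
x, t, y = simulate(5000, rng, effect=1.0)
estimates = {
    "naive": y[t == 1].mean() - y[t == 0].mean(),
    "iptw": iptw(x, t, y),
    "g_computation": g_computation(x, t, y),
    "cross_fitted_aipw": cross_fitted_aipw(x, t, y, rng),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the unadjusted difference in means is biased by the covariate imbalance between arms, while the three adjusted estimators should land close to the simulated effect of 1.0. Propensity score matching is omitted to keep the sketch short.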

RESULTS

In the numerical simulations, DDML has the smallest bias, followed by G-computation. G-computation usually achieves the lowest mean squared error (MSE). The MSE of DDML varies relative to the other methods and improves with increasing sample size. For hypothesis testing, all methods control type I error, and DDML is the most conservative. G-computation has the highest statistical power; DDML has comparable power at [Formula: see text] but lower power for smaller sample sizes. The replication procedure also indicates that G-computation minimizes MSE, whereas DDML performs between G-computation and the propensity score approaches. G-computation yields the narrowest confidence intervals, whereas DDML yields the widest for small sample sizes, which confirms its conservative nature.
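The results above rank the methods differently on bias and on mean squared error. The standard decomposition below shows why that is possible; the symbols $\theta$ (true treatment effect) and $\hat{\theta}$ (its estimator) are notation introduced here, not taken from the paper.

```latex
\[
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta,
\qquad
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta} - \theta)^{2}\right]
  = \operatorname{Bias}(\hat{\theta})^{2} + \operatorname{Var}(\hat{\theta}).
\]
```

An estimator with the smallest bias can therefore still have a larger MSE than a competitor if its variance is higher, which is consistent with DDML having the smallest bias while G-computation usually has the lowest MSE.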

CONCLUSIONS

For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared to propensity score approaches.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/899a/9795588/2c8948c1236a/12874_2022_1799_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/899a/9795588/3ad0859c0a5c/12874_2022_1799_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/899a/9795588/9d25809e84d0/12874_2022_1799_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/899a/9795588/5e848e951d8a/12874_2022_1799_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/899a/9795588/38ad4af3df38/12874_2022_1799_Fig5_HTML.jpg
