External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning.

Affiliation

Owkin France, Paris, France.

Publication information

BMC Med Res Methodol. 2022 Dec 28;22(1):335. doi: 10.1186/s12874-022-01799-z.

DOI:10.1186/s12874-022-01799-z

PMID:36577946

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9795588/

Abstract

BACKGROUND

An external control arm is a cohort of control patients drawn from data external to a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients in the single arm and the external arm should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy by comparing the outcomes of single-arm patients with machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), but they have not been sufficiently evaluated for External Control Arm (ECA) analysis.
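The outcome-prediction idea described above can be illustrated with a minimal G-computation sketch. It is not the authors' implementation: the single covariate, the linear outcome model, the simulated data, and names such as `fit_line`, `g_computation_att`, and `TRUE_EFFECT` are all assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def g_computation_att(x_trial, y_trial, x_ext, y_ext):
    """Mean observed outcome of single-arm patients minus their predicted control outcome."""
    a, b = fit_line(x_ext, y_ext)  # outcome model fitted on external controls only
    predicted_control = [a + b * x for x in x_trial]
    return sum(y_trial) / len(y_trial) - sum(predicted_control) / len(predicted_control)

# Simulated data (hypothetical): trial patients have a higher baseline covariate
# than external controls, so a naive comparison of means is confounded.
rng = random.Random(0)
TRUE_EFFECT = -0.5
x_trial = [rng.gauss(8.5, 1.0) for _ in range(500)]
x_ext = [rng.gauss(7.5, 1.0) for _ in range(500)]
y_trial = [0.8 * x + TRUE_EFFECT + rng.gauss(0, 0.5) for x in x_trial]
y_ext = [0.8 * x + rng.gauss(0, 0.5) for x in x_ext]

naive = sum(y_trial) / len(y_trial) - sum(y_ext) / len(y_ext)
adjusted = g_computation_att(x_trial, y_trial, x_ext, y_ext)
print(f"naive difference in means: {naive:.2f}")
print(f"G-computation estimate:    {adjusted:.2f}")
```

In this toy setting the naive difference absorbs the baseline imbalance, while the model-based estimate should land close to the simulated effect. With a misspecified outcome model that would no longer be guaranteed.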

METHODS

We use both numerical simulations and a trial replication procedure to evaluate four statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five type 2 diabetes randomized clinical trials, with data access granted by the Yale University Open Data Access (YODA) project. From this pool of five trials, observational experiments are built artificially by replacing the control arm of one trial with an arm from another trial that contains similarly treated patients.
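As a rough illustration of one of the propensity score approaches listed above, the following sketch applies IPTW to simulated data. It is an assumption-laden toy, not the procedure used in the study: it uses one standardized covariate, a logistic propensity model fitted by Newton-Raphson, and weights that target the effect in the single-arm population (weight 1 for trial patients, e/(1-e) for external controls). The names `fit_logistic`, `iptw_att`, and `TRUE_EFFECT` are hypothetical.

```python
import math
import random

def fit_logistic(xs, ts, iters=25):
    """Newton-Raphson fit of P(T=1 | x) = sigmoid(a + b*x); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, ts):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += t - p            # gradient of the log-likelihood
            g1 += (t - p) * x
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def iptw_att(x_trial, y_trial, x_ext, y_ext):
    """Trial mean outcome minus the propensity-weighted mean of external controls."""
    xs = x_trial + x_ext
    ts = [1] * len(x_trial) + [0] * len(x_ext)
    a, b = fit_logistic(xs, ts)
    # For a logistic model, the odds e/(1-e) simplify to exp(a + b*x).
    weights = [math.exp(a + b * x) for x in x_ext]
    weighted_control = sum(w * y for w, y in zip(weights, y_ext)) / sum(weights)
    return sum(y_trial) / len(y_trial) - weighted_control

# Simulated data (hypothetical): the covariate is shifted between the two sources.
rng = random.Random(1)
TRUE_EFFECT = -0.5
x_trial = [rng.gauss(0.5, 1.0) for _ in range(2000)]
x_ext = [rng.gauss(-0.5, 1.0) for _ in range(2000)]
y_trial = [0.8 * x + TRUE_EFFECT + rng.gauss(0, 0.5) for x in x_trial]
y_ext = [0.8 * x + rng.gauss(0, 0.5) for x in x_ext]

naive = sum(y_trial) / len(y_trial) - sum(y_ext) / len(y_ext)
weighted = iptw_att(x_trial, y_trial, x_ext, y_ext)
print(f"naive difference in means: {naive:.2f}")
print(f"IPTW estimate:             {weighted:.2f}")
```

The weighting reshapes the external cohort so that its covariate distribution resembles that of the trial patients. Large weights on a few external patients reduce the effective sample size, which is one reason weighting estimators can be noisier than outcome-model estimators.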

RESULTS

Numerical simulations show that DDML has the smallest bias among the statistical approaches, followed by G-computation. G-computation usually has the smallest mean squared error. Compared to the other methods, DDML has variable mean squared error performance that improves with increasing sample size. For hypothesis testing, all methods control type I error, and DDML is the most conservative. G-computation is the best method in terms of statistical power; DDML has comparable power at [Formula: see text] but lower power for smaller sample sizes. The replication procedure also indicates that G-computation minimizes mean squared error, whereas DDML performs between G-computation and the propensity score approaches. The confidence intervals of G-computation are the narrowest, whereas those obtained with DDML are the widest for small sample sizes, which confirms its conservative nature.
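One way to read the bias and mean squared error findings together is the standard decomposition of mean squared error into squared bias and variance. The notation is an assumption introduced here, not taken from the abstract: \hat{\theta} is an estimator of the treatment effect \theta.

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2 + \mathrm{Var}(\hat{\theta})
```

Under this identity, a method can have the smallest bias without having the smallest mean squared error when its variance is larger. That reading would be consistent with the wider DDML confidence intervals reported for small sample sizes.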

CONCLUSIONS

For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared to propensity score approaches.
