INSERM UMR 1246 - SPHERE, Nantes University, Tours University, 22 Boulevard Bénoni Goullin, 44200, Nantes, France.
IDBC-A2COM, Pacé, France.
Sci Rep. 2021 Jan 14;11(1):1435. doi: 10.1038/s41598-021-81110-0.
In clinical research, there is growing interest in propensity score-based methods for estimating causal effects. G-computation is an attractive alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary, and that can deal with small samples. Through simulations, we evaluated the performance of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner. We proposed six scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure status, and outcome. We also illustrated these methods by applying them to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. When G-computation was used to estimate the individual outcome probabilities in the two counterfactual worlds, the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation combined with the super learner performed well for drawing causal inferences, even from small samples.
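The G-computation step described above (predicting each individual's outcome probability in the two counterfactual worlds, then averaging) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single confounder, uses simulated data that does not come from the paper, and swaps in a plain logistic regression fitted by gradient ascent where the paper uses machine-learning methods such as a super learner. All function names (`fit_logistic`, `predict_proba`, `g_computation`, `simulate`) are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """P(Y = 1 | x) under weights w, where w[0] is the intercept."""
    return expit(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Stand-in outcome model: logistic regression by gradient ascent.

    Assumption: any learner returning outcome probabilities could be
    used here instead (the paper evaluates several).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def g_computation(A, L, Y):
    """Return (p1, p0, p1 - p0) for binary exposure A and binary outcome Y.

    1. Fit an outcome model on (exposure, covariates).
    2. Predict every individual's outcome probability with exposure set
       to 1, then to 0 (the two counterfactual worlds).
    3. Average each set of predictions and take the difference.
    """
    X = [[a] + list(l) for a, l in zip(A, L)]
    w = fit_logistic(X, Y)
    n = len(L)
    p1 = sum(predict_proba(w, [1] + list(l)) for l in L) / n
    p0 = sum(predict_proba(w, [0] + list(l)) for l in L) / n
    return p1, p0, p1 - p0


def simulate(n=400, seed=0):
    """Hypothetical data: one confounder L affecting both A and Y."""
    rng = random.Random(seed)
    A, L, Y = [], [], []
    for _ in range(n):
        l = rng.gauss(0.0, 1.0)
        a = int(rng.random() < expit(0.8 * l))
        y = int(rng.random() < expit(-0.5 + 1.5 * a + 1.0 * l))
        A.append(a)
        L.append([l])
        Y.append(y)
    return A, L, Y


A, L, Y = simulate()
p1, p0, ate = g_computation(A, L, Y)
print(f"P(Y=1 | do(A=1)) = {p1:.3f}")
print(f"P(Y=1 | do(A=0)) = {p0:.3f}")
print(f"Marginal risk difference = {ate:.3f}")
```

Only `fit_logistic` and `predict_proba` depend on the choice of outcome model, so a different learner can be substituted without changing the averaging step in `g_computation`.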