Pirracchio Romain, Yue John K, Manley Geoffrey T, van der Laan Mark J, Hubbard Alan E
1 Department of Anesthesia and Perioperative Care, UCSF, San Francisco General Hospital, San Francisco, CA, USA.
2 Brain and Spinal Injury Center, San Francisco, CA, USA.
Stat Methods Med Res. 2018 Jan;27(1):286-297. doi: 10.1177/0962280215627335. Epub 2016 Jun 29.
Standard statistical practice for determining the relative importance of competing causes of disease typically relies on ad hoc methods, often byproducts of machine learning procedures (stepwise regression, random forest, etc.). A causal inference framework combined with data-adaptive methods may help tailor parameters to match the clinical question and free the analyst from arbitrary modeling assumptions. Our focus is on implementations of such semiparametric methods for a variable importance measure (VIM). We propose a fully automated procedure for VIM based on collaborative targeted maximum likelihood estimation (cTMLE), a method that optimizes the estimate of an association in the presence of potentially numerous competing causes. We applied the approach to data from traumatic brain injury patients enrolled in a prospective, observational study at three US Level-1 trauma centers. The primary outcome was a disability score, the Glasgow Outcome Scale-Extended (GOSE), collected three months post-injury. We identified clinically important predictors among a set of risk factors using a variable importance analysis based on targeted maximum likelihood estimators (TMLE) and on cTMLE. Via a parametric bootstrap, we demonstrate that the latter procedure has the potential for robust automated estimation of variable importance measures based upon machine-learning algorithms. Compared with TMLE, the cTMLE estimator was associated with substantially less positivity bias and higher coverage of the 95% confidence interval. This study confirms the power of an automated cTMLE procedure that can target model selection via machine learning to estimate VIMs in complicated, high-dimensional data.
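To make the estimation step more concrete, the following is a minimal sketch of plain TMLE (not cTMLE) for one variable importance measure. It assumes the VIM for a binary exposure A, adjusted for covariates W, is the average-treatment-effect-type parameter psi = E[E(Y|A=1,W) - E(Y|A=0,W)], and that the outcome Y is binary. Main-terms logistic regressions stand in for the machine-learning fits, and truncating the propensity score at 0.025/0.975 is an ad hoc positivity safeguard; both are assumptions. The function names (`tmle_vim`, `fit_logistic`) and the simulated data are hypothetical and are not the authors' implementation or the TBI data. As we understand it, cTMLE would instead build a sequence of increasingly rich propensity-score estimators and pick one using a cross-validated loss on the targeted outcome fit.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; returns coefficients."""
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = expit(offset + X @ beta)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(hess + 1e-8 * np.eye(p), grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def tmle_vim(W, A, Y, g_bounds=(0.025, 0.975)):
    """Plain TMLE of psi = E[E(Y|A=1,W) - E(Y|A=0,W)] for binary A and binary Y."""
    n = len(Y)
    ones = np.ones((n, 1))
    # Initial outcome regression Q(A, W): main-terms logistic model (assumption).
    XQ = np.hstack([ones, A[:, None], W])
    bQ = fit_logistic(XQ, Y)
    Q_AW = expit(XQ @ bQ)
    Q1 = expit(np.hstack([ones, ones, W]) @ bQ)      # counterfactual A = 1
    Q0 = expit(np.hstack([ones, 0 * ones, W]) @ bQ)  # counterfactual A = 0
    # Propensity score g(W) = P(A=1|W), truncated to limit positivity problems.
    Xg = np.hstack([ones, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), *g_bounds)
    # Targeting step: logistic fluctuation along the "clever covariate" H.
    H = A / g - (1 - A) / (1 - g)
    eps = fit_logistic(H[:, None], Y, offset=logit(Q_AW))[0]
    Q_AW_star = expit(logit(Q_AW) + eps * H)
    Q1_star = expit(logit(Q1) + eps / g)
    Q0_star = expit(logit(Q0) - eps / (1 - g))
    psi = np.mean(Q1_star - Q0_star)
    # Standard error from the estimated influence curve; Wald-type 95% CI.
    ic = H * (Y - Q_AW_star) + Q1_star - Q0_star - psi
    se = np.std(ic, ddof=1) / np.sqrt(n)
    return psi, se, (psi - 1.96 * se, psi + 1.96 * se)

# Hypothetical simulated data (not the study data).
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.4 * W[:, 0] - 0.3 * W[:, 1])).astype(float)
Y = rng.binomial(1, expit(-0.5 + 1.0 * A + 0.8 * W[:, 0] + 0.5 * W[:, 1])).astype(float)
psi, se, ci = tmle_vim(W, A, Y)

# Monte Carlo approximation of the true psi under this simulation design.
W_big = rng.normal(size=(10**6, 2))
lin = -0.5 + 0.8 * W_big[:, 0] + 0.5 * W_big[:, 1]
true_psi = np.mean(expit(lin + 1.0) - expit(lin))
print(f"TMLE psi = {psi:.3f} (SE {se:.3f}), truth ~ {true_psi:.3f}")
```

Repeating this over many data sets simulated from a fitted model, in the spirit of the parametric bootstrap described above, would give empirical bias and CI coverage. Tightening or loosening `g_bounds` shows how strongly the estimate depends on near-positivity violations.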