Faculty of Epidemiology and Population Health, Department of Non-Communicable Disease Epidemiology, Cancer Survival Group, London School of Hygiene and Tropical Medicine, London, United Kingdom.
Laboratory for Psychiatric Biostatistics, McLean Hospital, Belmont, Massachusetts.
Am J Epidemiol. 2018 Apr 1;187(4):871-878. doi: 10.1093/aje/kwx317.
In this paper, we propose a structural framework for population-based cancer epidemiology and evaluate the performance of double-robust estimators for a binary exposure in cancer mortality. We conduct numerical analyses to study the bias and efficiency of these estimators. Furthermore, we compare 2 model selection strategies, based on 1) Akaike's Information Criterion and the Bayesian Information Criterion and 2) machine learning algorithms, and we illustrate the performance of double-robust estimators in a real-world setting. In simulations with correctly specified models and near-positivity violations, all but the naive estimators performed relatively well, although the augmented inverse-probability-of-treatment weighting estimator showed the largest relative bias. Under dual model misspecification and near-positivity violations, all double-robust estimators were biased. Nevertheless, the targeted maximum likelihood estimator showed the best bias-variance trade-off, more precise estimates, and appropriate 95% confidence interval coverage, supporting the use of data-adaptive model selection strategies based on machine learning algorithms. We applied these methods to estimate adjusted 1-year mortality risk differences in 183,426 lung cancer patients in England (2006-2013), comparing those diagnosed after admission to an emergency department with those who had a nonemergency cancer diagnosis. The adjusted mortality risk for patients diagnosed with lung cancer after admission to an emergency department was 16% higher in men and 18% higher in women, suggesting the importance of interventions targeting early detection of lung cancer signs and symptoms.
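As a rough illustration of the double-robust property the abstract refers to, the following minimal sketch computes a risk difference with an augmented inverse-probability-of-treatment weighting (AIPTW) estimator on simulated data. It is not the authors' code or data: the single binary confounder `w`, the simulation probabilities (which fix the true risk difference at 0.15 by construction), and all function names are assumptions chosen for illustration. The outcome model is deliberately misspecified (it ignores `w`) while the propensity model is correct, so the augmentation term is what removes the confounding. The targeted maximum likelihood estimator and machine-learning model selection discussed in the abstract are not shown.

```python
import random


def simulate(n, seed=0):
    """Simulate (w, a, y): binary confounder, exposure, and outcome.

    Hypothetical data-generating process; the true risk difference
    for exposure a is 0.15 by construction.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < 0.2 + 0.5 * w else 0
        y = 1 if rng.random() < 0.3 + 0.15 * a + 0.3 * w else 0
        rows.append((w, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_rd(rows):
    """Unadjusted risk difference: exposed minus unexposed."""
    exposed = mean(y for _, a, y in rows if a == 1)
    unexposed = mean(y for _, a, y in rows if a == 0)
    return exposed - unexposed


def propensity_by_stratum(rows):
    """Estimate P(A=1 | W=w) by the observed proportion in each stratum."""
    return {w: mean(a for w2, a, _ in rows if w2 == w) for w in (0, 1)}


def standardized_rd(rows):
    """Risk difference standardized over strata of w (saturated model)."""
    total = 0.0
    for w in (0, 1):
        stratum = [(a, y) for w2, a, y in rows if w2 == w]
        risk1 = mean(y for a, y in stratum if a == 1)
        risk0 = mean(y for a, y in stratum if a == 0)
        total += len(stratum) / len(rows) * (risk1 - risk0)
    return total


def aiptw_rd(rows, q, g):
    """AIPTW risk difference.

    q(a, w) returns the predicted outcome risk; g[w] is P(A=1 | W=w).
    Each term is the outcome-model contrast plus a weighted residual.
    """
    terms = []
    for w, a, y in rows:
        q1, q0 = q(1, w), q(0, w)
        augmentation = a / g[w] * (y - q1) - (1 - a) / (1 - g[w]) * (y - q0)
        terms.append(q1 - q0 + augmentation)
    return mean(terms)


rows = simulate(20000)
g_hat = propensity_by_stratum(rows)

# Deliberately misspecified outcome model: crude risks that ignore w.
crude_risk = {a: mean(y for _, a2, y in rows if a2 == a) for a in (0, 1)}


def q_wrong(a, w):
    return crude_risk[a]


rd_naive = naive_rd(rows)
rd_aiptw = aiptw_rd(rows, q_wrong, g_hat)
rd_std = standardized_rd(rows)

print("naive risk difference:       ", round(rd_naive, 3))
print("AIPTW risk difference:       ", round(rd_aiptw, 3))
print("standardized risk difference:", round(rd_std, 3))
```

In this toy setting the AIPTW estimate coincides with the stratum-standardized estimate even though the outcome model is wrong, because the correctly estimated propensity weights rebalance the residuals within each stratum. With a poorly estimated propensity model as well (the "dual misspecification" scenario in the abstract), that protection is lost.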