Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
Stat Med. 2020 Oct 30;39(24):3299-3312. doi: 10.1002/sim.8666. Epub 2020 Jul 6.
Many diseases, such as cancer and heart disease, are heterogeneous, and it is of great interest to study subtype-specific disease risk in relation to genetic and environmental risk factors. However, for logistical and cost reasons, the subtype information is missing for some subjects. In this article, we investigate methods for multinomial logistic regression with missing outcome data, including bootstrap hot deck multiple imputation (BHMI), simple inverse probability weighted (SIPW), augmented inverse probability weighted (AIPW), and expected estimating equation (EEE) estimators. These methods are important approaches for missing data regression. The BHMI modifies the standard hot deck multiple imputation method so that it provides valid confidence interval estimation. When the covariates are discrete, the SIPW, AIPW, and EEE estimators are numerically identical. When the covariates are continuous, nonparametric smoothers can be applied to estimate the selection probabilities and the estimating scores. These methods perform similarly. Extensive simulations show that all of these methods yield unbiased estimators, whereas the complete-case (CC) analysis can be biased if the missingness depends on the observed data. Our simulations also demonstrate that these methods can gain substantial efficiency compared with the CC analysis. The methods are applied to a colorectal cancer study in which cancer subtype data are missing for some study individuals.
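To make the SIPW idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a single binary covariate `x`, a hypothetical binary auxiliary variable `a` that is observed for everyone and drives the missingness, and selection probabilities estimated by the observed fraction within each `(x, a)` cell. The function names, the simulated data, and all parameter values are illustrative assumptions. The weighted multinomial logit is fitted by Newton-Raphson with outcome category 0 as the reference.

```python
import numpy as np


def fit_weighted_multinomial(X, y, w, n_classes, n_iter=50, tol=1e-10):
    """Solve the weighted multinomial-logit score equations by Newton-Raphson.

    Category 0 is the reference. Returns a (p, K-1) coefficient matrix.
    """
    n, p = X.shape
    K = n_classes
    beta = np.zeros((p, K - 1))
    Y = np.eye(K)[y][:, 1:]  # one-hot outcome without the reference column
    for _ in range(n_iter):
        expo = np.exp(X @ beta)
        P = expo / (1.0 + expo.sum(axis=1, keepdims=True))
        score = X.T @ (w[:, None] * (Y - P))  # shape (p, K-1)
        # Weighted information matrix, one (p x p) block per pair of categories.
        H = np.zeros((p * (K - 1), p * (K - 1)))
        for j in range(K - 1):
            for k in range(K - 1):
                d = w * P[:, j] * ((j == k) - P[:, k])
                H[j * p:(j + 1) * p, k * p:(k + 1) * p] = X.T @ (d[:, None] * X)
        step = np.linalg.solve(H, score.T.ravel())
        beta += step.reshape(K - 1, p).T
        if np.max(np.abs(step)) < tol:
            break
    return beta


def cell_selection_probs(cells, observed):
    """Observed fraction within each discrete cell, returned per subject."""
    totals = np.bincount(cells)
    seen = np.bincount(cells, weights=observed.astype(float))
    return (seen / totals)[cells]


# Simulated data (all values below are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n, K = 200_000, 3
x = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([[-0.5, -1.0],   # intercepts for categories 1 and 2
                      [0.8, -0.4]])   # slopes for categories 1 and 2
expo = np.exp(X @ beta_true)
denom = 1.0 + expo.sum(axis=1, keepdims=True)
P = np.hstack([1.0 / denom, expo / denom])
u = rng.random(n)
y = (u[:, None] > np.cumsum(P, axis=1)[:, :-1]).sum(axis=1)

# Hypothetical auxiliary variable correlated with the outcome; the outcome
# is observed with a probability that depends on it.
a = (rng.random(n) < np.where(y == 2, 0.8, 0.2)).astype(int)
observed = rng.random(n) < np.where(a == 1, 0.9, 0.4)

# Selection probabilities from cell frequencies over (x, a).
pi_hat = cell_selection_probs(2 * x + a, observed)

# Complete-case fit (unit weights) versus SIPW fit (weights 1 / pi_hat).
beta_cc = fit_weighted_multinomial(
    X[observed], y[observed], np.ones(observed.sum()), K)
beta_ipw = fit_weighted_multinomial(
    X[observed], y[observed], 1.0 / pi_hat[observed], K)

print("true:\n", beta_true)
print("complete-case:\n", beta_cc)
print("SIPW:\n", beta_ipw)
```

In this simulated setting the missingness depends on the outcome only through the fully observed `(x, a)`, so weighting each complete case by the inverse of its estimated selection probability should recover the true coefficients, while the unweighted complete-case fit is expected to distort the intercept of the category that is over-represented among the observed outcomes.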