Grupo de Epidemiologia de Cancro, Centro de Investigação do IPO Porto (CI-IPOP), Instituto Português de Oncologia do Porto (IPO Porto), Porto, Portugal.
EPI-UNIT - Instituto de Saúde Pública, Universidade do Porto, Porto, Portugal.
Stat Methods Med Res. 2021 Oct;30(10):2256-2268. doi: 10.1177/09622802211031615. Epub 2021 Sep 2.
Missing data is a common issue in epidemiological databases. Among the different ways of dealing with missing data, multiple imputation has become widely available in common statistical software packages. However, incompatibility between the imputation model and the substantive model can lead to invalid inference. It can arise when the associations between variables in the substantive model are not taken into account in the imputation models, or when the substantive model is itself nonlinear. With the aim of analysing population-based cancer survival data, we extended the substantive model compatible fully conditional specification (SMC-FCS) multiple imputation approach, proposed by Bartlett et al. in 2015, to accommodate excess hazard regression models. The proposed approach was compared with the standard fully conditional specification multiple imputation procedure and with complete-case analysis in a simulation study. The SMC-FCS approach produced unbiased estimates in both scenarios tested, while the standard fully conditional specification produced biased estimates and poor empirical coverage probabilities. The SMC-FCS algorithm was then used to handle missing data in the evaluation of socioeconomic inequalities in survival of colorectal cancer patients diagnosed in the North Region of Portugal. The analysis using SMC-FCS showed a clearer trend towards higher excess hazards for patients from more deprived areas. The proposed algorithm was implemented in R and is presented as Supplementary Material.
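To make the idea of compatibility concrete, the following is a minimal Python sketch of a single SMC-FCS-style imputation draw for one missing binary covariate under an excess hazard model. It is not the authors' R implementation. It assumes, purely for illustration, a baseline excess hazard that is constant in time, a known population (expected) hazard at the observed time, and fixed parameter values; in the full algorithm the parameters would be redrawn at each iteration. All function and argument names are hypothetical. The point it illustrates is that the imputation probability is weighted by the substantive model likelihood, in which the total hazard is the sum of the population hazard and the excess hazard.

```python
import math
import random


def excess_hazard_loglik(time, event, x, pop_hazard, log_base_rate, beta):
    """Log-likelihood contribution of one patient.

    Assumes (for illustration only) an excess hazard exp(log_base_rate + beta * x)
    that is constant in time. Terms that do not depend on x are dropped.
    """
    excess = math.exp(log_base_rate + beta * x)  # excess hazard
    cum_excess = excess * time                   # cumulative excess hazard up to `time`
    # An observed death contributes the log of the total hazard
    # (population + excess); every patient contributes -cumulative excess hazard.
    return event * math.log(pop_hazard + excess) - cum_excess


def impute_binary_covariate(time, event, pop_hazard, prior_p1,
                            log_base_rate, beta, rng):
    """Draw a missing binary covariate compatibly with the excess hazard model.

    prior_p1 is P(X = 1 | other covariates) from the covariate model.
    Returns the imputed value and the probability used for the draw.
    """
    w1 = prior_p1 * math.exp(
        excess_hazard_loglik(time, event, 1, pop_hazard, log_base_rate, beta))
    w0 = (1.0 - prior_p1) * math.exp(
        excess_hazard_loglik(time, event, 0, pop_hazard, log_base_rate, beta))
    p1 = w1 / (w0 + w1)  # P(X = 1 | outcome, other covariates)
    return (1 if rng.random() < p1 else 0), p1


if __name__ == "__main__":
    rng = random.Random(0)
    value, p1 = impute_binary_covariate(
        time=2.0, event=1, pop_hazard=0.01, prior_p1=0.5,
        log_base_rate=math.log(0.1), beta=math.log(2.0), rng=rng)
    print(value, round(p1, 4))
```

Because the covariate is binary, the conditional probability can be computed directly. For continuous covariates, the SMC-FCS approach of Bartlett et al. instead relies on rejection sampling, which this sketch does not cover.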