Institut de Recherche pour le Développement (IRD), Université Montpellier 1, UMI 233, Montpellier, France; Ecole Nationale Supérieure Polytechnique (ENSP), Université Yaoundé 1, Yaoundé, Cameroun.
Stat Med. 2013 Nov 20;32(26):4651-65. doi: 10.1002/sim.5854. Epub 2013 May 28.
Multiple imputation is commonly used to impute missing covariates in the Cox semiparametric regression setting. It replaces each missing value with several plausible values drawn via a Gibbs sampling procedure, with an imputation model specified for each incomplete variable. This method is implemented in several software packages, which offer imputation models chosen according to the form of the variable to be imputed, but all of these imputation models assume that covariate effects are linear. This assumption often does not hold in practice, because covariates can have nonlinear effects. A linearity assumption can therefore lead to misleading conclusions, since the imputation model should reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous, time-invariant covariates in the imputation model, we propose a method based on B-spline functions. To assess the performance of this method, we conducted a simulation study comparing multiple imputation using a Bayesian spline imputation model with multiple imputation using a Bayesian linear imputation model in the survival analysis setting. We also evaluated the proposed method on the motivating data set, which was collected from HIV-infected patients enrolled in an observational cohort study in Senegal and contains several incomplete variables. We found that our method performs well in estimating hazard ratios compared with the linear imputation methods when data are missing completely at random or missing at random.
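The contrast between a linear and a B-spline imputation model can be illustrated with a minimal sketch. This is not the authors' implementation: it performs a single stochastic regression imputation by least squares rather than a Bayesian Gibbs sampler, it does not fit a Cox model, and the simulated data, the knot placement, and the names `bspline_basis` and `impute` are all assumptions made for illustration.

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.zeros((x.size, t.size - 1))
    for i in range(t.size - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the last non-empty interval so x equal to the right boundary is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    # Raise the degree step by step; terms with a zero denominator are dropped.
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, t.size - 1 - d))
        for i in range(t.size - 1 - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B

# Simulated data (assumed for illustration): x depends nonlinearly on z.
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(-2.0, 2.0, n)          # fully observed covariate
x = z**2 + rng.normal(0.0, 0.3, n)     # covariate that will have missing values
missing = rng.random(n) < 0.3          # missing completely at random

# Clamped cubic knot vector with four interior knots (an arbitrary choice).
degree = 3
interior = np.linspace(-2.0, 2.0, 6)[1:-1]
knots = np.concatenate([np.repeat(-2.0, degree + 1), interior,
                        np.repeat(2.0, degree + 1)])

def impute(design):
    """Fit on observed rows, then draw imputations as prediction plus noise."""
    obs = ~missing
    coef, *_ = np.linalg.lstsq(design[obs], x[obs], rcond=None)
    resid = x[obs] - design[obs] @ coef
    sigma = resid.std(ddof=design.shape[1])
    return design[missing] @ coef + rng.normal(0.0, sigma, missing.sum())

linear_design = np.column_stack([np.ones(n), z])
spline_design = bspline_basis(z, knots, degree)  # rows sum to 1, so no extra intercept

def rmse(imputed):
    return np.sqrt(np.mean((imputed - x[missing]) ** 2))

rmse_linear = rmse(impute(linear_design))
rmse_spline = rmse(impute(spline_design))
print(f"RMSE of imputed values, linear model:   {rmse_linear:.3f}")
print(f"RMSE of imputed values, B-spline model: {rmse_spline:.3f}")
```

In a full multiple imputation analysis, the imputation step would be repeated to produce several completed data sets, the imputation model would typically also condition on the survival outcome, and the resulting hazard ratio estimates would be pooled across data sets; this sketch only shows how replacing the linear design matrix with a B-spline basis lets the imputation model follow a nonlinear relationship.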