1 Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands.
2 Department of Epidemiology, Erasmus MC, Rotterdam, The Netherlands.
Stat Methods Med Res. 2019 Feb;28(2):555-568. doi: 10.1177/0962280217730851. Epub 2017 Oct 25.
Studies involving large observational datasets commonly face the challenge of dealing with multiple missing values. The most popular approach to this challenge, multiple imputation using chained equations, has however been shown to be sub-optimal in complex settings, specifically in settings with longitudinal outcomes, which cannot be easily and adequately included in the imputation models. Bayesian methods avoid this difficulty by specifying a joint distribution and thus offer an alternative. A popular choice for that joint distribution is the multivariate normal distribution. In more complicated settings, as in our two motivating examples that involve time-varying covariates, additional issues require consideration: the endo- or exogeneity of the covariate and its functional relation with the outcome. In such situations, the implied assumptions of standard methods may be violated, resulting in bias. In this work, we extend and study a more flexible Bayesian alternative to the multivariate normal approach, to better handle complex incomplete longitudinal data. We discuss and compare the assumptions of the two Bayesian approaches about the endo- or exogeneity of the covariates and the functional form of their association with the outcome, and illustrate and evaluate the consequences of violating those assumptions using simulation studies and two real data examples.