Review and evaluation of imputation methods for multivariate longitudinal data with mixed-type incomplete variables.

Affiliations

Department of Biostatistics, Brown University, Providence, Rhode Island, USA.

Section of Geriatrics, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA.

Publication

Stat Med. 2022 Dec 30;41(30):5844-5876. doi: 10.1002/sim.9592. Epub 2022 Oct 11.

DOI: 10.1002/sim.9592

PMID: 36220138

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9771917/

Abstract

Estimating relationships between multiple incomplete patient measurements requires methods to cope with missing values. Multiple imputation (MI) is one approach: it addresses missing data by filling in plausible values for those that are missing. MI procedures can be classified into two broad types: joint modeling (JM) and fully conditional specification (FCS). JM fits a multivariate distribution for the entire set of variables, but it may be complex to define and implement. FCS imputes missing data variable by variable from a set of conditional distributions. In many studies, FCS is easier to define and implement than JM, but it may be based on incompatible conditional models. Imputation methods based on multilevel modeling show improved operating characteristics when imputing longitudinal data, but they can be computationally intensive, especially when imputing multiple variables simultaneously. We review current MI methods for incomplete longitudinal data and their implementation in widely accessible software. Using simulated data from the National Health and Aging Trends Study, we compare their performance for monotone and intermittent missing data patterns. Our simulations demonstrate that in a longitudinal study with a limited number of repeated observations and time-varying variables, FCS-Standard is a computationally efficient imputation method that is accurate and precise for univariate single-level and multilevel regression models. When the analyses comprise multivariate multilevel models, FCS-LMM-latent is a statistically valid procedure with more accurate estimates overall, but it requires more intensive computation. Imputation methods based on generalized linear multilevel models can lead to biased subject-level variance estimates when the statistical analyses involve hierarchical models.
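The abstract describes FCS only in words ("variable by variable from a set of conditional distributions"). The following is a minimal, hypothetical sketch of that chained-equations loop for two continuous variables: missing entries start at the observed mean, then each variable is repeatedly regressed on the other and its missing entries are redrawn from the fitted line plus Gaussian noise. The names `fcs_impute`, `_fit_line`, `x`, `y` and the toy data are illustrative assumptions, not taken from the paper. The sketch ignores the subject-level (multilevel) structure and does not draw the regression coefficients from their posterior, so it is neither FCS-Standard nor FCS-LMM-latent as evaluated in the study; it only illustrates the variable-by-variable cycle.

```python
import random
import statistics


def _fit_line(xs, ys):
    """Least-squares fit ys ~ a + b * xs; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Residual SD on n - 2 degrees of freedom (guarded for tiny samples).
    sd = (sum(r * r for r in resid) / max(len(resid) - 2, 1)) ** 0.5
    return a, b, sd


def fcs_impute(data, n_iter=10, seed=0):
    """Return one completed copy of `data` using a two-variable FCS loop.

    `data` maps each of exactly two variable names to a list of floats,
    with None marking a missing value. The input is not modified.
    Hypothetical helper for illustration only.
    """
    names = list(data)
    if len(names) != 2:
        raise ValueError("this sketch handles exactly two variables")
    rng = random.Random(seed)
    missing = {v: [val is None for val in data[v]] for v in names}

    # Initialization: replace each missing value with its variable's observed mean.
    filled = {}
    for v in names:
        obs_mean = statistics.fmean(val for val in data[v] if val is not None)
        filled[v] = [obs_mean if val is None else val for val in data[v]]

    # Chained equations: cycle through the variables, imputing each from the other.
    pairs = ((names[0], names[1]), (names[1], names[0]))
    for _ in range(n_iter):
        for target, predictor in pairs:
            # Fit on rows where the target was observed (predictor may be imputed).
            rows = [i for i, m in enumerate(missing[target]) if not m]
            a, b, sd = _fit_line([filled[predictor][i] for i in rows],
                                 [filled[target][i] for i in rows])
            # Redraw missing target values: prediction plus Gaussian residual noise.
            for i, m in enumerate(missing[target]):
                if m:
                    filled[target][i] = a + b * filled[predictor][i] + rng.gauss(0.0, sd)
    return filled


# Toy data (hypothetical): None marks a missing measurement.
data = {
    "x": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "y": [2.1, None, 6.2, 8.1, None, 12.0],
}
# m = 5 completed datasets, one per random seed.
completed = [fcs_impute(data, n_iter=10, seed=s) for s in range(5)]
for d in completed:
    print([round(v, 2) for v in d["y"]])
```

Running the loop with different seeds yields the m completed datasets of multiple imputation; each would then be analyzed separately and the estimates pooled. Established software such as the R package `mice` adds the posterior parameter draws and the multilevel imputation models that this sketch leaves out.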
