Imputation of missing longitudinal data: a comparison of methods.

Author information

Engels Jean Mundahl, Diehr Paula

Affiliation

Departments of Biostatistics and Health Services, University of Washington, 1959 Northeast Pacific Avenue, Box 357232, Seattle, WA 98195, USA.

Publication information

J Clin Epidemiol. 2003 Oct;56(10):968-76. doi: 10.1016/s0895-4356(03)00170-7.

Abstract

BACKGROUND AND OBJECTIVES

Missing information is inevitable in longitudinal studies, and can result in biased estimates and a loss of power. One approach to this problem is to impute the missing data to yield a more complete data set. Our goal was to compare the performance of 14 methods of imputing missing data on depression, weight, cognitive functioning, and self-rated health in a longitudinal cohort of older adults.

METHODS

We identified situations where a person had a known value following one or more missing values, and treated the known value as a "missing value." This "missing value" was imputed using each method and compared to the observed value. Methods were compared on the root mean square error, mean absolute deviation, bias, and relative variance of the estimates.
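
The four comparison criteria can be sketched in a few lines of Python. This is only an illustration: the abstract does not give the formulas, so the definitions below are assumptions (error taken as imputed minus observed, "mean absolute deviation" taken as the mean absolute error, and "relative variance" taken as the variance of the imputed values divided by the variance of the observed values). The function name `evaluate_imputation` is hypothetical.

```python
import math
from statistics import mean, pvariance


def evaluate_imputation(imputed, observed):
    """Compare imputed values against the held-out observed values.

    Assumed definitions (not taken from the paper):
      error             = imputed - observed
      rmse              = sqrt(mean(error^2))
      mad               = mean(|error|)
      bias              = mean(error)
      relative_variance = var(imputed) / var(observed)
    """
    if len(imputed) != len(observed) or len(imputed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    errors = [i - o for i, o in zip(imputed, observed)]
    observed_variance = pvariance(observed)
    if observed_variance == 0:
        raise ValueError("observed values have zero variance")

    return {
        "rmse": math.sqrt(mean(e * e for e in errors)),
        "mad": mean(abs(e) for e in errors),
        "bias": mean(errors),
        "relative_variance": pvariance(imputed) / observed_variance,
    }


# Hypothetical data: four held-out observed values and their imputations.
result = evaluate_imputation(imputed=[4, 3, 4, 2], observed=[3, 2, 4, 1])
print(result)
```

Under these assumed definitions, a positive bias means the imputed values are on average higher than the observed ones; whether that reads as "too healthy" depends on the direction of the scale. A relative variance below 1 means the imputed values vary less than the observed values.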

RESULTS

Most imputation methods were biased toward estimating the "missing value" as too healthy, and most estimates had a variance that was too low. Imputed values based on a person's values before and after the "missing value" were superior to other methods, followed by imputations based on a person's values before the "missing value." Imputations that used no information specific to the person, such as using the sample mean, had the worst performance.
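
To make the three families of methods concrete, here is a minimal Python sketch with one example of each. The abstract does not list the 14 methods, so these three functions are illustrative stand-ins chosen here, not the authors' implementations: a sample mean (no person-specific information), the most recent earlier value carried forward (values before), and the average of the nearest earlier and later values (values before and after). Missing observations are represented as `None`, and all function names are hypothetical.

```python
from statistics import mean


def impute_sample_mean(sample_values):
    """No person-specific information: mean of values observed in the sample."""
    return mean(sample_values)


def impute_from_before(series, t):
    """Carry forward the person's most recent observed value before index t."""
    for value in reversed(series[:t]):
        if value is not None:
            return value
    return None  # nothing observed before t


def impute_from_before_and_after(series, t):
    """Average the person's nearest observed values on each side of index t."""
    before = impute_from_before(series, t)
    after = next((v for v in series[t + 1:] if v is not None), None)
    if before is None or after is None:
        return None  # needs an observed value on both sides
    return (before + after) / 2


# Hypothetical person: observed at waves 0, 2, and 3; wave 1 is missing.
# Wave 2 (true value 3) is held out and treated as the "missing value".
person = [4, None, 3, 2]
held_out_index = 2

print(impute_sample_mean([5, 4, 4, 3]))                      # sample-level guess
print(impute_from_before(person, held_out_index))            # uses wave 0
print(impute_from_before_and_after(person, held_out_index))  # uses waves 0 and 3
```

Each imputed value would then be compared with the held-out observed value, as described in the Methods.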

CONCLUSIONS

We conclude that, in longitudinal studies where the overall trend is for worse health over time and where missing data can be assumed to be primarily related to worse health, missing data in a longitudinal sequence should be imputed from the available longitudinal data for that person.
