A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

Author information

De Silva Anurika Priyanjali, Moreno-Betancur Margarita, De Livera Alysha Madhu, Lee Katherine Jane, Simpson Julie Anne

Affiliations

Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia.

Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia.

Publication information

BMC Med Res Methodol. 2017 Jul 25;17(1):114. doi: 10.1186/s12874-017-0372-y.

Abstract

BACKGROUND

Missing data are a common problem in epidemiological studies and are particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods (fully conditional specification (FCS) and multivariate normal imputation (MVNI)) treat each repeated measurement of the same time-dependent variable as just another distinct variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks, so that the imputation model includes only measurements taken at the specified and adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time, a commonly encountered scenario in epidemiological studies.
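The time-block idea can be illustrated with a minimal Python sketch. It only lists which waves would contribute measurements to the imputation model for a given wave. The function name `time_block`, the 1-based wave numbering, and the `width` parameter are assumptions made for illustration and are not taken from any published two-fold FCS implementation.

```python
def time_block(t, n_waves, width=1):
    """Return the waves (1-based) in the time block centred on wave t.

    Hypothetical helper: in a two-fold FCS style scheme, a variable
    measured at wave t would be imputed using only measurements from
    the waves in this block, i.e. wave t itself and the waves within
    `width` steps on either side, truncated at the first and last wave.
    """
    if not 1 <= t <= n_waves:
        raise ValueError("t must be between 1 and n_waves")
    first = max(1, t - width)
    last = min(n_waves, t + width)
    return list(range(first, last + 1))


# With 5 waves and a width of one, wave 3 draws on waves 2, 3 and 4.
print(time_block(3, 5, width=1))
# A wider window pulls in more of the trajectory.
print(time_block(3, 5, width=2))
```

A standard FCS or MVNI model would, by contrast, use measurements from all waves at once, which is why the number of predictors grows quickly as waves are added.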

METHODS

We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms (missing completely at random (MCAR), and weak and strong missing at random (MAR) scenarios) were used to impose missingness on body mass index (BMI)-for-age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% missing data when assessing the association between childhood obesity and sleep problems.
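As a rough illustration of the difference between the mechanisms, the sketch below imposes MCAR or MAR missingness on a list of values. The abstract does not give the models used in the study, so the logistic form, the single fully observed auxiliary variable `aux`, and the `strength` coefficient (larger values standing in for a "stronger" MAR scenario) are all assumptions.

```python
import math
import random


def impose_missingness(values, aux, mechanism="MCAR",
                       p_missing=0.5, strength=1.0, seed=0):
    """Return a copy of `values` with some entries replaced by None.

    Hypothetical helper, not the study's procedure.
    MCAR: every value is deleted with the same probability p_missing.
    MAR:  the deletion probability follows an assumed logistic model in
          the fully observed variable `aux`; it equals p_missing when
          aux == 0 and rises with aux at a rate set by `strength`.
    """
    rng = random.Random(seed)
    intercept = math.log(p_missing / (1.0 - p_missing))
    result = []
    for value, a in zip(values, aux):
        if mechanism == "MCAR":
            p = p_missing
        else:
            p = 1.0 / (1.0 + math.exp(-(intercept + strength * a)))
        result.append(None if rng.random() < p else value)
    return result


# Example: z-scores with an auxiliary variable centred on zero.
data_rng = random.Random(42)
z_scores = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
aux = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
weak = impose_missingness(z_scores, aux, "MAR", strength=0.5)
strong = impose_missingness(z_scores, aux, "MAR", strength=2.0)
```

Under MCAR the deleted values are a random subset, whereas under MAR individuals with high `aux` lose more data, so analyses that ignore `aux` can be biased.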

RESULTS

The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared to the standard width of one.
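Bias and precision in a simulation study of this kind are typically summarised across the simulated datasets. The sketch below shows one common way to do that. The abstract does not state which precision measure was used, so the empirical standard error here is an assumption, and the function name `performance` is hypothetical.

```python
import statistics


def performance(estimates, true_value):
    """Summarise point estimates obtained from repeated simulations.

    bias:         mean of the estimates minus the true value
    empirical_se: sample standard deviation of the estimates
                  (an assumed precision measure)
    """
    mean_estimate = statistics.fmean(estimates)
    return {
        "bias": mean_estimate - true_value,
        "empirical_se": statistics.stdev(estimates),
    }


# Toy input: three estimates of a parameter whose true value is 2.0.
summary = performance([1.0, 2.0, 3.0], true_value=2.0)
print(summary)
```

Computing these two quantities once per imputation method, on the same simulated datasets, gives the kind of side-by-side comparison the results describe.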

CONCLUSION

We recommend FCS or MVNI in similar longitudinal settings. When convergence issues arise because of a large number of time points or variables with missing values, we recommend two-fold FCS, with exploration of a suitable time window width.
