[Simulation study on missing data imputation methods for longitudinal data in cohort studies].

Authors

Li Y M, Zhao P, Yang Y H, Wang J X, Yan H, Chen F Y

Affiliation

Department of Epidemiology and Biostatistics, School of Public Health of Xi'an Jiaotong University Health Science Center, Xi'an 710061, China.

Publication

Zhonghua Liu Xing Bing Xue Za Zhi. 2021 Oct 10;42(10):1889-1894. doi: 10.3760/cma.j.cn112338-20201130-01363.

PMID:34814629
Abstract

Missing data are an unavoidable problem in cohort studies. To provide a practical reference for handling missing data in longitudinal studies, this paper uses a simulation study to compare the performance of eight common imputation methods applied to incomplete longitudinal data. The simulation was implemented in R, and incomplete longitudinal data were generated by the Monte Carlo method. The mean absolute deviation, mean relative deviation, and Type I error of the regression analyses following each imputation method were compared to evaluate both how well each method imputed the missing longitudinal data and how it affected subsequent multivariate analysis. Mean imputation, k-nearest neighbors (KNN), regression imputation, and random forest performed similarly and stably. Hot deck imputation performed worse than these four methods, and K-means clustering and the expectation-maximization (EM) algorithm performed worst and were unstable. Mean imputation, the EM algorithm, random forest, KNN, and regression imputation controlled the Type I error, whereas multiple imputation, hot deck, and K-means clustering did not. For missing longitudinal data under a missing-at-random (MAR) mechanism, mean imputation, KNN, regression imputation, and random forest are the preferred imputation methods. When the missing proportion is not too large, multiple imputation and hot deck can also perform well, but K-means clustering and the EM algorithm are not recommended.

Similar articles

1. Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets.
   BMC Med Res Methodol. 2024 Feb 16;24(1):41. doi: 10.1186/s12874-024-02173-x.
2. Missing value imputation in high-dimensional phenomic data: imputable or not, and how?
   BMC Bioinformatics. 2014 Nov 5;15(1):346. doi: 10.1186/s12859-014-0346-6.
3. A wide range of missing imputation approaches in longitudinal data: a simulation study and real data analysis.
   BMC Med Res Methodol. 2023 Jul 6;23(1):161. doi: 10.1186/s12874-023-01968-8.
4. Identifying reprioritization response shift in a stroke caregiver population: a comparison of missing data methods.
   Qual Life Res. 2015 Mar;24(3):529-40. doi: 10.1007/s11136-014-0824-3. Epub 2014 Oct 26.
5. Advanced methods for missing values imputation based on similarity learning.
   PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
6. A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.
   BMC Med Res Methodol. 2017 Jul 25;17(1):114. doi: 10.1186/s12874-017-0372-y.
7. Multiple imputation methods for handling missing values in a longitudinal categorical variable with restrictions on transitions over time: a simulation study.
   BMC Med Res Methodol. 2019 Jan 10;19(1):14. doi: 10.1186/s12874-018-0653-0.
8. Implementing Multiple Imputation for Missing Data in Longitudinal Studies When Models are Not Feasible: An Example Using the Random Hot Deck Approach.
   Clin Epidemiol. 2022 Nov 15;14:1387-1403. doi: 10.2147/CLEP.S368303. eCollection 2022.
9. NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data.
   Metabolomics. 2018 Nov 23;14(12):153. doi: 10.1007/s11306-018-1451-8.

Cited by

1. A validation study of three early warning scores in early identification of gastric cancer patients with deteriorating condition after gastrectomy.
   BMC Gastroenterol. 2025 Feb 24;25(1):108. doi: 10.1186/s12876-024-03586-0.
2. Predictive model of risk factors for 28-day mortality in patients with sepsis or sepsis-associated delirium based on the MIMIC-IV database.
   Sci Rep. 2024 Aug 13;14(1):18751. doi: 10.1038/s41598-024-69332-4.
3. Association between activities of daily living and depressive symptoms among older adults in China: evidence from the CHARLS.
   Front Public Health. 2023 Nov 16;11:1249208. doi: 10.3389/fpubh.2023.1249208. eCollection 2023.
4. Mediating Effects of Academic Self-Efficacy and Depressive Symptoms on Prosocial/Antisocial Behavior Among Youths.
   Prev Sci. 2024 Jul;25(5):711-723. doi: 10.1007/s11121-023-01611-4. Epub 2023 Nov 8.
5. Prediction of the risk of developing end-stage renal diseases in newly diagnosed type 2 diabetes mellitus using artificial intelligence algorithms.
   BioData Min. 2023 Mar 10;16(1):8. doi: 10.1186/s13040-023-00324-2.
6. Predicting Six-Month Re-Admission Risk in Heart Failure Patients Using Multiple Machine Learning Methods: A Study Based on the Chinese Heart Failure Population Database.
   J Clin Med. 2023 Jan 21;12(3):870. doi: 10.3390/jcm12030870.
7. Dynamic nomogram for predicting generalized conversion in adult-onset ocular myasthenia gravis.
   Neurol Sci. 2023 Apr;44(4):1383-1391. doi: 10.1007/s10072-022-06519-5. Epub 2022 Dec 5.
8. Associations among Body Mass Index, Waist-to-Hip Ratio, and Cognitive Impairment Tend to Follow an Opposite Trend and Are Sex Specific: A Population-Based Cross-Sectional Study in a Rural Area of Xi'an, China.
   Neuroepidemiology. 2023;57(2):100-111. doi: 10.1159/000527444. Epub 2022 Oct 13.
9. The association between social engagement and depressive symptoms in middle-aged and elderly Chinese: A longitudinal subgroup identification analysis under causal inference frame.
   Front Aging Neurosci. 2022 Sep 1;14:934801. doi: 10.3389/fnagi.2022.934801. eCollection 2022.