Comparison of imputation methods for missing laboratory data in medicine.

Affiliation

Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

BMJ Open. 2013 Aug 1;3(8):e002847. doi: 10.1136/bmjopen-2013-002847.

DOI:10.1136/bmjopen-2013-002847
PMID:23906948
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3733317/
Abstract

OBJECTIVES

Missing laboratory data is a common issue, but the optimal method of imputing missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for laboratory data that are missing completely at random, and to compare the effect of the imputed values on the accuracy of two clinical predictive models.

DESIGN

Retrospective cohort analysis of two large data sets.

SETTING

A tertiary level care institution in Ann Arbor, Michigan.

PARTICIPANTS

The Cirrhosis cohort had 446 patients and the Inflammatory Bowel Disease cohort had 395 patients.

METHODS

Non-missing laboratory data were randomly removed at varying frequencies from two large data sets. We then compared the ability of four methods (missForest, mean imputation, nearest neighbour imputation, and multivariate imputation by chained equations, MICE) to impute the simulated missing data. We characterised the accuracy of the imputation and the effect of the imputation on predictive ability in both data sets.

RESULTS

MissForest had the least imputation error for both continuous and categorical variables at each frequency of missingness, and it had the smallest prediction difference when models used imputed laboratory values. In both data sets, MICE had the second least imputation error and prediction difference, followed by nearest neighbour imputation and then mean imputation.

CONCLUSIONS

MissForest is a highly accurate method of imputation for missing laboratory data and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in two clinical predictive models.
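
The evaluation design described under METHODS (remove known values completely at random, impute them, then score the imputed values against the held-back truth) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the data are synthetic rather than clinical, only mean imputation and a simple hand-written nearest neighbour imputation are shown (missForest and MICE are omitted), the error metric is a plain root mean squared error, and the function names `make_mcar_mask`, `mean_impute`, `knn_impute`, and `compare` are hypothetical.

```python
import numpy as np


def make_mcar_mask(shape, frac, rng):
    """Boolean mask, True where a value is removed completely at random."""
    return rng.random(shape) < frac


def mean_impute(X_miss):
    """Replace each NaN with the observed mean of its column."""
    col_means = np.nanmean(X_miss, axis=0)
    return np.where(np.isnan(X_miss), col_means, X_miss)


def knn_impute(X_miss, k=5):
    """Replace each NaN with the column mean over the k nearest donor rows.

    Distances use only the columns observed in both rows; a row with no
    usable donor falls back to the column mean.
    """
    X_out = X_miss.copy()
    observed = ~np.isnan(X_miss)
    col_means = np.nanmean(X_miss, axis=0)
    for i in np.flatnonzero(~observed.all(axis=1)):
        shared = observed & observed[i]  # columns observed in both rows
        diff = np.where(shared, X_miss - X_miss[i], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.full(len(X_miss), np.inf)
        ok = n_shared > 0
        dist[ok] = np.sqrt((diff[ok] ** 2).sum(axis=1) / n_shared[ok])
        dist[i] = np.inf  # a row is never its own donor
        order = np.argsort(dist)
        for j in np.flatnonzero(~observed[i]):
            donors = [r for r in order
                      if observed[r, j] and np.isfinite(dist[r])][:k]
            X_out[i, j] = X_miss[donors, j].mean() if donors else col_means[j]
    return X_out


def rmse_on_removed(X_true, X_imp, mask):
    """Root mean squared error over the entries that were removed."""
    return float(np.sqrt(np.mean((X_true[mask] - X_imp[mask]) ** 2)))


def compare(fractions=(0.1, 0.2, 0.3), n=300, p=6, seed=0):
    """Score both imputers at several simulated missingness frequencies."""
    rng = np.random.default_rng(seed)
    # Synthetic, correlated "laboratory" columns driven by one latent factor.
    z = rng.normal(size=(n, 1))
    X_true = z * np.linspace(0.8, 1.2, p) + 0.3 * rng.normal(size=(n, p))
    results = {}
    for frac in fractions:
        mask = make_mcar_mask(X_true.shape, frac, rng)
        X_miss = np.where(mask, np.nan, X_true)
        results[frac] = {
            "mean": rmse_on_removed(X_true, mean_impute(X_miss), mask),
            "knn": rmse_on_removed(X_true, knn_impute(X_miss), mask),
        }
    return results


if __name__ == "__main__":
    for frac, errs in compare().items():
        print(f"{frac:.0%} missing: mean={errs['mean']:.3f}  knn={errs['knn']:.3f}")
```

Because the true values are retained before masking, the error of each method can be measured directly at every missingness frequency, which is the comparison the abstract reports. Extending the sketch to the study's second outcome would mean fitting a predictive model on the complete data and on each imputed copy and comparing their predictions; that step is not shown here.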

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af6/3733317/722abf542ac6/bmjopen2013002847f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af6/3733317/8953873c68ba/bmjopen2013002847f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af6/3733317/5ab19971e722/bmjopen2013002847f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5af6/3733317/1d6f3611c971/bmjopen2013002847f04.jpg
