
Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.

Authors

Ren Wenhui, Liu Zheng, Wu Yanqiu, Zhang Zhilong, Hong Shenda, Liu Huixin

Affiliations

Department of Clinical Epidemiology and Biostatistics, Peking University People's Hospital, Beijing, China.

National Institute of Health Data Science, Peking University, Beijing, China.

Publication details

Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.

DOI: 10.34133/hds.0176
PMID: 39635227
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11615160/
Abstract

Missing data in electronic health records (EHRs) presents significant challenges in medical studies. Many methods have been proposed, but uncertainty exists regarding the current state of missing data addressing methods applied for EHR and which strategy performs better within specific contexts. All studies referencing EHR and missing data methods published from their inception until 2024 March 30 were searched via the MEDLINE, EMBASE, and Digital Bibliography and Library Project databases. The characteristics of the included studies were extracted. We also compared the performance of various methods under different missingness scenarios. After screening, 46 studies published between 2010 and 2024 were included. Three missingness mechanisms were simulated when evaluating the missing data methods: missing completely at random (29/46), missing at random (20/46), and missing not at random (21/46). Multiple imputation by chained equations (MICE) was the most popular statistical method, whereas generative adversarial network-based methods and the k nearest neighbor (KNN) classification were the common deep-learning-based or traditional machine-learning-based methods, respectively. Among the 26 articles comparing the performance among medical statistical and machine learning approaches, traditional machine learning or deep learning methods generally outperformed statistical methods. Med.KNN and context-aware time-series imputation performed better for longitudinal datasets, whereas probabilistic principal component analysis and MICE-based methods were optimal for cross-sectional datasets. Machine learning methods show significant promise for addressing missing data in EHRs. However, no single approach provides a universally generalizable solution. Standardized benchmarking analyses are essential to evaluate these methods across different missingness scenarios.
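The review compares imputation methods under three simulated missingness mechanisms (MCAR, MAR, MNAR). The following is a minimal, hypothetical sketch of how such a comparison can be set up with NumPy alone; it is not the authors' benchmarking code. It masks one variable of a synthetic two-column dataset under each mechanism, then scores column-mean imputation and a simple k-nearest-neighbour (KNN) imputer by RMSE on the masked entries. The data-generating model, the rank-based missingness probabilities, and the function names `make_mask`, `mean_impute`, `knn_impute`, `rmse`, and `benchmark` are illustrative assumptions.

```python
import numpy as np


def make_mask(X, mechanism, rate, rng):
    """Hypothetical helper: boolean mask (True = missing) for column 1 of X.

    MCAR: every row has the same probability `rate` of being missing.
    MAR:  probability rises with the always-observed column 0.
    MNAR: probability rises with the (hidden) value of column 1 itself.
    Rank-based probabilities average exactly `rate` (needs rate <= 0.5).
    """
    if mechanism not in ("MCAR", "MAR", "MNAR"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    n = X.shape[0]
    if mechanism == "MCAR":
        p = np.full(n, rate)
    else:
        driver = X[:, 0] if mechanism == "MAR" else X[:, 1]
        rank = np.argsort(np.argsort(driver))  # 0 = smallest value
        p = 2 * rate * rank / (n - 1)
    mask = np.zeros(X.shape, dtype=bool)
    mask[:, 1] = rng.random(n) < p
    return mask


def mean_impute(X_obs):
    """Replace each NaN with the mean of the observed values in its column."""
    X_imp = X_obs.copy()
    mask = np.isnan(X_obs)
    col_means = np.nanmean(X_obs, axis=0)
    X_imp[mask] = col_means[np.nonzero(mask)[1]]
    return X_imp


def knn_impute(X_obs, k=5):
    """Fill NaNs with the average of the k nearest fully observed rows.

    Distances use only the columns observed in the incomplete row.
    """
    X_imp = X_obs.copy()
    mask = np.isnan(X_obs)
    donors = X_obs[~mask.any(axis=1)]  # complete rows only
    for i in np.nonzero(mask.any(axis=1))[0]:
        obs = ~mask[i]
        dist = np.sqrt(((donors[:, obs] - X_obs[i, obs]) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        X_imp[i, ~obs] = donors[nearest][:, ~obs].mean(axis=0)
    return X_imp


def rmse(X_true, X_imp, mask):
    """Root-mean-square error over the entries that were masked."""
    return float(np.sqrt(np.mean((X_true[mask] - X_imp[mask]) ** 2)))


def benchmark(seed=0, n=500, rate=0.3):
    """Compare mean vs KNN imputation under MCAR, MAR and MNAR masks."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n)
    x1 = 0.8 * x0 + 0.6 * rng.normal(size=n)  # x1 correlated with x0
    X_true = np.column_stack([x0, x1])
    results = {}
    for mech in ("MCAR", "MAR", "MNAR"):
        mask = make_mask(X_true, mech, rate, rng)
        X_obs = np.where(mask, np.nan, X_true)
        results[mech] = {
            "mean": rmse(X_true, mean_impute(X_obs), mask),
            "knn": rmse(X_true, knn_impute(X_obs, k=5), mask),
        }
    return results


if __name__ == "__main__":
    for mech, scores in benchmark().items():
        print(mech, {m: round(v, 3) for m, v in scores.items()})
```

`benchmark()` returns a nested dictionary of RMSE values per mechanism and method; the exact numbers depend on the seed, so none are quoted here. Because only one column is masked, the KNN donor pool stays large; real EHR data are far more irregular, which is one reason the review calls for standardized benchmarks across missingness scenarios.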


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea13/11615160/c07bb8f0e53b/hds.0176.fig.001.jpg

Similar articles

1. Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.
   Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.
2. Performance of Multiple Imputation Using Modern Machine Learning Methods in Electronic Health Records Data.
   Epidemiology. 2023 Mar 1;34(2):206-215. doi: 10.1097/EDE.0000000000001578. Epub 2022 Dec 9.
3. Imputation and Missing Indicators for Handling Missing Longitudinal Data: Data Simulation Analysis Based on Electronic Health Record Data.
   JMIR Med Inform. 2025 Mar 13;13:e64354. doi: 10.2196/64354.
4. Extremely missing numerical data in Electronic Health Records for machine learning can be managed through simple imputation methods considering informative missingness: A comparative of solutions in a COVID-19 mortality case study.
   Comput Methods Programs Biomed. 2023 Dec;242:107803. doi: 10.1016/j.cmpb.2023.107803. Epub 2023 Sep 7.
5. Addressing Missing Data Challenges in Geriatric Health Monitoring: A Study of Statistical and Machine Learning Imputation Methods.
   Sensors (Basel). 2025 Jan 21;25(3):614. doi: 10.3390/s25030614.
6. A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets.
   Comput Biol Med. 2023 Sep;163:107188. doi: 10.1016/j.compbiomed.2023.107188. Epub 2023 Jun 22.
7. Missing value imputation in high-dimensional phenomic data: imputable or not, and how?
   BMC Bioinformatics. 2014 Nov 5;15(1):346. doi: 10.1186/s12859-014-0346-6.
8. Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis.
   JMIR Med Inform. 2018 Feb 23;6(1):e11. doi: 10.2196/medinform.8960.
9. Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.
   BMC Med Res Methodol. 2024 Aug 28;24(1):188. doi: 10.1186/s12874-024-02310-6.
10. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Cited by

1. Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases.
   Proc Mach Learn Res. 2025 Jun;287:480-501.
2. Missing data imputation of climate time series: A review.
   MethodsX. 2025 Jun 19;15:103455. doi: 10.1016/j.mex.2025.103455. eCollection 2025 Dec.
3. Leveraging Kaizen with Process Mining in Healthcare Settings: A Conceptual Framework for Data-Driven Continuous Improvement.
