A Pragmatic Ensemble Strategy for Missing Values Imputation in Health Records.

Authors

Batra Shivani, Khurana Rohan, Khan Mohammad Zubair, Boulila Wadii, Koubaa Anis, Srivastava Prakash

Affiliations

Department of Computer Science and Engineering, KIET Group of Institutions, Delhi-NCR, Ghaziabad 201206, India.

Department of Computer Science and Information, Taibah University, Medina 42353, Saudi Arabia.

Publication information

Entropy (Basel). 2022 Apr 10;24(4):533. doi: 10.3390/e24040533.

DOI: 10.3390/e24040533
PMID: 35455196
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9030272/

Abstract

Efficient computer modelling for medical decision-making requires clean and trustworthy data, yet healthcare data are frequently incomplete. Missing values may occur not just in training data but also in testing data, which might contain a single undiagnosed episode or participant. This study evaluates imputation and regression procedures, selected on the basis of regressor performance and computational expense, for handling missing values in both training and testing datasets. Several procedures for dealing with missing values have been introduced in healthcare, but which imputation strategy works better in specific cases is still debated. This research proposes an ensemble imputation model that is trained using a combination of simple mean imputation, k-nearest neighbour imputation, and iterative imputation, and that then selects the best of these strategies based on the attribute correlations of the features with missing values. We introduce an Ensemble Strategy for Missing Values to analyse healthcare data with considerable missingness and to obtain unbiased and accurate statistical prediction models. Performance metrics were generated using the eXtreme Gradient Boosting regressor, random forest regressor, and support vector regressor. Experiments on real-world healthcare data, together with simulations with varying feature-wise missing frequencies, indicate that the proposed technique surpasses standard missing value imputation approaches, as well as the approach of dropping records with missing values, in terms of accuracy.
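
The abstract says that one of three imputers is chosen per feature based on attribute correlations, but it does not state the selection rule. The following is a minimal sketch of how such a rule could look, not the authors' implementation. The thresholds `low` and `high`, all function names, and the use of a single-predictor linear regression as a stand-in for iterative imputation are assumptions made for illustration. Missing values are represented as `None`, and each column is assumed to have at least one observed value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 when undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def strongest_partner(rows, j):
    """Return (|r|, index) of the column most correlated with column j."""
    best = (0.0, None)
    for c in range(len(rows[0])):
        pairs = [(r[j], r[c]) for r in rows
                 if c != j and r[j] is not None and r[c] is not None]
        r_abs = abs(pearson([p[0] for p in pairs], [p[1] for p in pairs]))
        if r_abs > best[0]:
            best = (r_abs, c)
    return best


def knn_value(rows, i, j, k):
    """Mean of column j over the k rows nearest to row i on shared features."""
    scored = []
    for m, other in enumerate(rows):
        if m == i or other[j] is None:
            continue
        shared = [c for c in range(len(other))
                  if c != j and rows[i][c] is not None and other[c] is not None]
        if shared:
            sq = sum((rows[i][c] - other[c]) ** 2 for c in shared)
            scored.append((math.sqrt(sq / len(shared)), other[j]))
    scored.sort(key=lambda t: t[0])
    nearest = [v for _, v in scored[:k]]
    return sum(nearest) / len(nearest) if nearest else None


def fit_line(rows, j, c):
    """Least-squares slope and intercept for predicting column j from column c."""
    pairs = [(r[c], r[j]) for r in rows if r[j] is not None and r[c] is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return slope, my - slope * mx


def ensemble_impute(rows, low=0.3, high=0.7, k=3):
    """Pick an imputer per column from its strongest correlation (assumed rule)."""
    filled = [list(r) for r in rows]
    chosen = {}
    for j in range(len(rows[0])):
        missing = [i for i, r in enumerate(rows) if r[j] is None]
        if not missing:
            continue
        observed = [r[j] for r in rows if r[j] is not None]
        mean_j = sum(observed) / len(observed)
        r_abs, partner = strongest_partner(rows, j)
        if partner is not None and r_abs >= high:
            chosen[j] = "regression"  # stand-in for iterative imputation
            slope, intercept = fit_line(rows, j, partner)
        elif partner is not None and r_abs >= low:
            chosen[j] = "knn"
        else:
            chosen[j] = "mean"
        for i in missing:
            value = None
            if chosen[j] == "regression" and rows[i][partner] is not None:
                value = intercept + slope * rows[i][partner]
            elif chosen[j] == "knn":
                value = knn_value(rows, i, j, k)
            # Fall back to the column mean when the chosen method cannot answer.
            filled[i][j] = mean_j if value is None else value
    return filled, chosen


# Toy data: column 1 tracks column 0 closely, column 2 is nearly unrelated.
rows = [
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 1.0],
    [3.0, None, 4.0],
    [4.0, 8.1, None],
    [5.0, 10.0, 2.0],
    [6.0, 12.1, 5.0],
]
filled, chosen = ensemble_impute(rows)
print(chosen)
print(filled[2], filled[3])
```

On this toy data the strongly correlated column is filled by regression and the weakly correlated one by its mean. In practice the three imputers would more likely come from a library such as scikit-learn, and the selection thresholds would need to be tuned or validated against downstream regressor performance, as the abstract's evaluation suggests.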

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/bc32fd77c530/entropy-24-00533-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/4827697cec89/entropy-24-00533-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/48c1d5f1141f/entropy-24-00533-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/a7e2327a0d2a/entropy-24-00533-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/97c07dac124e/entropy-24-00533-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/3dae327f9fbb/entropy-24-00533-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/2b04658b186f/entropy-24-00533-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e28/9030272/e8553a8d916c/entropy-24-00533-g008.jpg
