A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction.

Affiliation

Department of Industrial, Manufacturing and Systems Engineering, Texas Tech University, Lubbock, TX, United States of America.

Publication information

PLoS One. 2020 Sep 21;15(9):e0237724. doi: 10.1371/journal.pone.0237724. eCollection 2020.

DOI:10.1371/journal.pone.0237724

PMID:32956366

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7505424/

Abstract

BACKGROUND

The wide adoption of electronic health record (EHR) systems has provided vast opportunities to advance health care services. However, the prevalence of missing values in EHR systems poses a great challenge to the data analysis that supports clinical decision-making. The objective of this study is to develop a new methodological framework that addresses the missing data challenge and provides a reliable tool to predict hospital readmission among heart failure patients.

METHODS

We used a Gaussian process latent variable model (GPLVM) to impute the missing values. Specifically, a lower-dimensional embedding was learned from a small complete dataset and then used to impute the missing values in the incomplete dataset. The GPLVM-based imputation provides both a mean estimate and the uncertainty associated with that estimate. To incorporate this uncertainty in prediction, a constrained support vector machine (cSVM) was developed to obtain robust predictions. We first sampled multiple datasets from the distributions of input uncertainty and trained a support vector machine on each dataset. An optimal classifier was then identified by selecting the support vectors that maximize the separation margin of a newly sampled dataset and minimize the similarity with the pre-trained support vectors.
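The sampling step described above can be sketched roughly as follows. This is a minimal illustration only and not the authors' cSVM. It assumes that each imputed value comes with a mean and a standard deviation, that the uncertainty is Gaussian and independent across features, and that a plain linear SVM trained by subgradient descent on the hinge loss is an acceptable stand-in for the SVM solver used in the study. The function names (`sample_dataset`, `train_linear_svm`, `decision_value`) and the toy data are hypothetical. The final selection step, which compares support vectors across the trained models, is not reproduced here.

```python
import random

def sample_dataset(means, stds, rng):
    """Draw one plausible dataset: each value ~ Normal(mean, std).
    A std of 0.0 marks an observed (not imputed) value, kept as-is."""
    return [[rng.gauss(m, s) if s > 0 else m for m, s in zip(row_m, row_s)]
            for row_m, row_s in zip(means, stds)]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.
    Labels must be +1 / -1. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # this point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def decision_value(model, x):
    """Signed score of x under a (weights, bias) model; the sign is the class."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Toy data (hypothetical): imputed means, their standard deviations, labels.
means = [[2.0, 2.0], [2.5, 1.5], [1.5, 2.5], [3.0, 2.0],
         [-2.0, -2.0], [-2.5, -1.5], [-1.5, -2.5], [-3.0, -2.0]]
stds = [[0.0, 0.3], [0.3, 0.0], [0.0, 0.0], [0.3, 0.3],
        [0.0, 0.3], [0.3, 0.0], [0.0, 0.0], [0.3, 0.3]]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

rng = random.Random(42)
models = []
for _ in range(10):  # one SVM per sampled dataset
    X_sampled = sample_dataset(means, stds, rng)
    models.append(train_linear_svm(X_sampled, labels))

# Spread of decision values for one new point across the trained models.
values = [decision_value(m, [1.0, 1.5]) for m in models]
print(min(values), max(values))
```

The spread printed at the end is only one simple way to see how input uncertainty propagates into the classifier. The study instead derives a single classifier from the pre-trained models.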

RESULTS

The proposed model was derived and validated using the PhysioNet MIMIC-III clinical database. The GPLVM imputation yielded normalized mean absolute errors of 0.11 and 0.12 when 20% and 30% of instances, respectively, contained missing values, and the confidence bounds of the estimates captured 97% of the true values. The cSVM model achieved an average area under the curve (AUC) of 0.68, which improved the prediction accuracy by 7% compared with some existing classifiers.
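For readers unfamiliar with the three metrics reported here, the sketch below shows one common way to compute each. The abstract does not define its normalization or the width of its confidence bounds, so two choices are assumptions: the mean absolute error is divided by the range of the true values, and the bounds are taken as mean ± 1.96 standard deviations. The AUC is computed as the fraction of positive/negative pairs that the scores rank correctly, with ties counted as one half. All function names and toy numbers are hypothetical and are not taken from the study.

```python
def normalized_mae(true, pred):
    """Mean absolute error divided by the range of the true values
    (the normalization is an assumption, not the paper's definition)."""
    mae = sum(abs(t - p) for t, p in zip(true, pred)) / len(true)
    return mae / (max(true) - min(true))

def coverage(true, mean, std, z=1.96):
    """Fraction of true values that fall inside mean +/- z * std."""
    hits = sum(1 for t, m, s in zip(true, mean, std) if abs(t - m) <= z * s)
    return hits / len(true)

def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.
    Labels are 1 (positive) or 0 (negative); ties count as 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values (hypothetical, not from the study).
true_vals = [10.0, 12.0, 15.0, 20.0]
imputed = [11.0, 12.0, 14.0, 18.0]
imputed_sd = [1.0, 0.5, 0.4, 1.5]

print(normalized_mae(true_vals, imputed))        # MAE 1.0 over a range of 10
print(coverage(true_vals, imputed, imputed_sd))  # 3 of 4 values inside bounds
print(auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))  # 3 of 4 pairs ranked correctly
```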

CONCLUSIONS

The proposed method provides accurate imputation of missing values and achieves better prediction performance than existing models that can only handle deterministic inputs.
