A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction.

Affiliation

Department of Industrial, Manufacturing and Systems Engineering, Texas Tech University, Lubbock, TX, United States of America.

Publication information

PLoS One. 2020 Sep 21;15(9):e0237724. doi: 10.1371/journal.pone.0237724. eCollection 2020.

Abstract

BACKGROUND

The wide adoption of electronic health record (EHR) systems has provided vast opportunities to advance health care services. However, the prevalence of missing values in EHR data poses a great challenge to the data analysis that supports clinical decision-making. The objective of this study is to develop a new methodological framework that addresses the missing data challenge and provides a reliable tool for predicting hospital readmission among heart failure patients.

METHODS

We used a Gaussian Process Latent Variable Model (GPLVM) to impute the missing values. Specifically, a lower-dimensional embedding was learned from a small complete dataset and then used to impute the missing values in the incomplete dataset. The GPLVM-based imputation provides both a mean estimate and the uncertainty associated with that estimate. To incorporate this uncertainty into prediction, a constrained support vector machine (cSVM) was developed to obtain robust predictions. We first sampled multiple datasets from the distributions of input uncertainty and trained a support vector machine on each dataset. An optimal classifier was then identified by selecting the support vectors that maximize the separation margin on a newly sampled dataset and minimize the similarity with the pre-trained support vectors.
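The sample-then-train step described above can be illustrated with a minimal sketch. It assumes each imputed entry is summarized by a Gaussian mean and standard deviation, uses a simple linear SVM fitted by subgradient descent, and combines the models by averaging their decision values. The function names, the toy data, and the averaging rule are all hypothetical; in particular, the averaging is a plain ensemble standing in for the paper's cSVM selection step, which is not reproduced here.

```python
import random

def sample_datasets(mean, std, n_datasets, rng):
    """Draw n_datasets copies of the data, each entry ~ Normal(mean, std).
    Observed (non-imputed) entries carry std 0 and are copied unchanged."""
    return [
        [[rng.gauss(m, s) if s > 0 else m for m, s in zip(m_row, s_row)]
         for m_row, s_row in zip(mean, std)]
        for _ in range(n_datasets)
    ]

def decision_value(model, x):
    """Signed score w . x + b of a linear model (w, b)."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, rng, lam=0.01, lr=0.01, epochs=100):
    """Linear SVM via stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision_value((w, b), X[i]) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # regularization shrink
            if violated:  # hinge-loss subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

# Toy stand-in for imputed data (hypothetical values, not from MIMIC-III).
mean = [[2.0, 2.0], [2.5, 1.5], [1.5, 2.5], [3.0, 3.0],
        [-2.0, -2.0], [-2.5, -1.5], [-1.5, -2.5], [-3.0, -3.0]]
std = [[0.3, 0.0], [0.0, 0.3], [0.0, 0.0], [0.3, 0.3],
       [0.0, 0.3], [0.3, 0.0], [0.0, 0.0], [0.3, 0.3]]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

rng = random.Random(0)
datasets = sample_datasets(mean, std, n_datasets=5, rng=rng)
models = [train_linear_svm(X, labels, rng) for X in datasets]

def ensemble_predict(x):
    """Average the decision values of all SVMs (a plain ensemble, NOT the cSVM)."""
    score = sum(decision_value(m, x) for m in models) / len(models)
    return 1 if score >= 0 else -1

predictions = [ensemble_predict(x) for x in mean]
```

Entries with zero standard deviation play the role of observed values, so only the imputed entries vary across the sampled datasets.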

RESULTS

The proposed model was derived and validated using the PhysioNet MIMIC-III clinical database. The GPLVM imputation yielded normalized mean absolute errors of 0.11 and 0.12 when 20% and 30% of instances contained missing values, respectively, and the confidence bounds of the estimates captured 97% of the true values. The cSVM model achieved an average area under the curve (AUC) of 0.68, improving prediction accuracy by 7% compared with some existing classifiers.
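For readers who want to see how the two imputation metrics reported above might be computed, here is a minimal sketch. The abstract does not state how the error is normalized or how wide the confidence bounds are, so two assumptions are made: the mean absolute error is divided by the range of the true values, and the bounds are taken as mean ± 1.96 standard deviations. The function names and toy numbers are hypothetical.

```python
def normalized_mae(true_vals, imputed_vals):
    """Mean absolute error divided by the range of the true values
    (assumed normalization; the abstract does not define it)."""
    value_range = max(true_vals) - min(true_vals)
    mae = sum(abs(t - p) for t, p in zip(true_vals, imputed_vals)) / len(true_vals)
    return mae / value_range

def coverage(true_vals, means, stds, z=1.96):
    """Fraction of true values falling inside mean +/- z * std
    (z = 1.96 is an assumed bound width)."""
    hits = sum(1 for t, m, s in zip(true_vals, means, stds) if abs(t - m) <= z * s)
    return hits / len(true_vals)

# Toy values for illustration only (not results from the paper).
true_vals = [1.0, 2.0, 3.0, 5.0]
imputed_means = [1.5, 2.0, 2.5, 4.0]
imputed_stds = [0.3, 0.3, 0.3, 0.3]

nmae = normalized_mae(true_vals, imputed_means)          # 0.125
cov = coverage(true_vals, imputed_means, imputed_stds)   # 0.75
```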

CONCLUSIONS

The proposed method provides accurate imputation of missing values and achieves better prediction performance than existing models that can only handle deterministic inputs.
