The application of unsupervised deep learning in predictive models using electronic health records.

Author affiliations

School of Statistics, Renmin University of China, 59 Zhong Guan Cun Ave, Hai Dian District, Beijing, People's Republic of China.

Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 851 S Morgan St, Chicago, IL, 60607, USA.

Publication information

BMC Med Res Methodol. 2020 Feb 26;20(1):37. doi: 10.1186/s12874-020-00923-1.

DOI:10.1186/s12874-020-00923-1

PMID:32101147

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7043035/

Abstract

BACKGROUND

The main goal of this study is to explore the use in predictive modeling of features that represent patient-level electronic health record (EHR) data and are generated by an autoencoder, an unsupervised deep learning algorithm. Because autoencoder features are learned without reference to any particular response, this paper focuses on how their general lower-dimensional representation of EHR information performs across a wide variety of predictive tasks.

METHODS

We compare the model with autoencoder features to traditional models: a logistic model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from the autoencoder (Enhanced Reg). We performed the study first on simulated data that mimic real-world EHR data and then on actual EHR data from eight Advocate hospitals.
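To make the feature construction concrete, the following is a minimal sketch of how autoencoder features and the Enhanced Reg design matrix might be assembled. It is not the authors' implementation: the one-hidden-layer architecture, the tanh activation, the learning rate, the toy binary data, and the choice of columns 0 and 3 as "response-specific" variables are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, lr=0.2, epochs=500, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on mean squared reconstruction error.
    Hypothetical helper; the architecture is an assumption."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoded (lower-dimensional) features
        err = (H @ W2 + b2) - X           # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size      # gradient of the loss w.r.t. the reconstruction
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def encode(Z):
        return np.tanh(np.asarray(Z, dtype=float) @ W1 + b1)

    return encode, losses

# Toy stand-in for patient-level EHR indicators: 6 binary columns driven by
# 2 latent factors, with 5% of entries flipped to mimic incorrect categories.
rng = np.random.default_rng(1)
latent = rng.random((200, 2)) < 0.5
X = np.repeat(latent, 3, axis=1).astype(float)
flip = rng.random(X.shape) < 0.05
X = np.where(flip, 1.0 - X, X)

encode, losses = train_autoencoder(X)
ae_features = encode(X)                          # features learned without any response
specific = X[:, [0, 3]]                          # assumed response-specific variables
enhanced = np.hstack([specific, ae_features])    # Enhanced Reg style design matrix
print(ae_features.shape, enhanced.shape)
```

Because the encoder is trained without any response variable, the same `ae_features` matrix could be reused across different prediction tasks, with only the small `specific` block changing per task.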

RESULTS

On simulated data with incorrect categories and missing data, the precision of the autoencoder model is 24.16% when recall is fixed at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% for Simple Reg and improves to 24.89% for Enhanced Reg. When real EHR data are used to predict the 30-day readmission rate, the precision of the autoencoder model is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can achieve prediction performance competitive with LASSO. In addition, the results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper.
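The metric reported above, precision at a fixed recall, can be sketched as follows. This is an illustrative computation under stated assumptions, not the authors' evaluation code: the function name `precision_at_recall` is hypothetical, cases are ranked by predicted score, ties are broken by input order, and the cutoff is the smallest number of top-ranked cases whose recall reaches the target.

```python
import numpy as np

def precision_at_recall(y_true, scores, target_recall=0.7):
    """Precision at the first score cutoff whose recall reaches target_recall.
    Hypothetical helper; ties in scores are broken by input order."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.cumsum(y_true[order] == 1)    # true positives among the top-k cases
    if tp[-1] == 0:
        raise ValueError("y_true contains no positive cases")
    recall = tp / tp[-1]
    k = int(np.argmax(recall >= target_recall))  # first index reaching the target
    return float(tp[k] / (k + 1))

# Ten cases ranked by score; 4 positives, so recall >= 0.7 needs 3 of them.
y = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
print(precision_at_recall(y, s))  # 0.75
```

In this toy example the top four cases contain three of the four positives (recall 0.75), so the precision at that cutoff is 3/4.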

CONCLUSIONS

We conclude that the autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Combined with important response-specific predictors, these features allow us to derive efficient and robust predictive models with less labor in data extraction and model training.
