The application of unsupervised deep learning in predictive models using electronic health records.

Affiliations

School of Statistics, Renmin University of China, 59 Zhong Guan Cun Ave, Hai Dian District, Beijing, People's Republic of China.

Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, 851 S Morgan St, Chicago, IL, 60607, USA.

Publication information

BMC Med Res Methodol. 2020 Feb 26;20(1):37. doi: 10.1186/s12874-020-00923-1.

DOI: 10.1186/s12874-020-00923-1
PMID: 32101147
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7043035/
Abstract

BACKGROUND

The main goal of this study is to explore the use of features that represent patient-level electronic health record (EHR) data, generated by an autoencoder (an unsupervised deep learning algorithm), in predictive modeling. Because autoencoder features are learned without supervision, this paper focuses on their value as a general lower-dimensional representation of EHR information across a wide variety of predictive tasks.

METHODS

We compare a model built on autoencoder features with two traditional models: a logistic model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model that uses a small subset of response-specific variables (Simple Reg) and a model that combines these variables with autoencoder features (Enhanced Reg). We performed the study first on simulated data that mimics real-world EHR data and then on actual EHR data from eight Advocate hospitals.
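The Enhanced Reg idea described above (response-specific variables joined with autoencoder features) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the network architecture or training procedure, so the one-hidden-layer `tanh` autoencoder, the helper names `train_autoencoder` and `enhanced_design`, the hyperparameters, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def train_autoencoder(X, n_hidden=4, lr=0.1, epochs=300, seed=0):
    """Fit a one-hidden-layer autoencoder by full-batch gradient descent.

    Hypothetical helper: returns an `encode` function and the list of
    per-epoch mean squared reconstruction errors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder: lower-dimensional features
        R = H @ W2 + b2                   # linear decoder: reconstruction of X
        err = R - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error, computed before any update.
        g_R = 2.0 * err / (n * d)
        g_W2 = H.T @ g_R
        g_b2 = g_R.sum(axis=0)
        g_Z = (g_R @ W2.T) * (1.0 - H ** 2)   # back through tanh
        g_W1 = X.T @ g_Z
        g_b1 = g_Z.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def encode(X_new):
        return np.tanh(X_new @ W1 + b1)

    return encode, losses


def enhanced_design(X, encode, specific_cols):
    """Join response-specific columns with the unsupervised features."""
    return np.hstack([X[:, specific_cols], encode(X)])


# Synthetic stand-in for standardized patient-level data (not real EHR data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 12)) + 0.1 * rng.normal(size=(200, 12))
X = (X - X.mean(axis=0)) / X.std(axis=0)

encode, losses = train_autoencoder(X)
design = enhanced_design(X, encode, specific_cols=[0, 5])
print(design.shape, losses[0], losses[-1])
```

The resulting `design` matrix could then be passed to any classifier. Because the encoder is trained without the response, the same encoded features could in principle be reused for several prediction targets, with only `specific_cols` changing per task.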

RESULTS

On simulated data with incorrect categories and missing data, the precision of the autoencoder model is 24.16% when recall is fixed at 0.7, which is higher than that of Random Forest (23.61%) and lower than that of LASSO (25.32%). The precision is 20.92% for Simple Reg and improves to 24.89% for Enhanced Reg. When real EHR data are used to predict the 30-day readmission rate, the precision of the autoencoder model is 19.04%, which again is higher than that of Random Forest (18.48%) and lower than that of LASSO (19.70%). The precisions of Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can achieve prediction performance competitive with LASSO. In addition, the results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper.
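The metric reported above, precision with recall fixed at 0.7, can be computed along the following lines. This is a sketch under stated assumptions: the function name `precision_at_recall` is hypothetical, the threshold is taken as the smallest number of top-ranked patients whose recall reaches the target, and tied scores are not treated specially. The abstract does not say how the authors handled these details.

```python
import numpy as np


def precision_at_recall(y_true, scores, target_recall=0.7):
    """Precision at the first ranking cutoff whose recall reaches the target.

    Hypothetical helper: y_true holds 0/1 outcomes, scores holds predicted
    risks (higher means more likely positive).
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    n_pos = y_true.sum()
    if n_pos == 0:
        raise ValueError("recall is undefined without positive cases")
    order = np.argsort(-scores, kind="stable")   # highest risk first
    hits = np.cumsum(y_true[order])              # true positives in the top k
    k = np.arange(1, len(y_true) + 1)            # patients flagged at each cutoff
    recall = hits / n_pos
    precision = hits / k
    first = int(np.argmax(recall >= target_recall))  # smallest k meeting the target
    return float(precision[first])


# Toy example: 4 positives among 10 patients, already sorted by score.
y = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_recall(y, s, target_recall=0.7))
```

In the toy example, flagging the top five patients captures three of the four positives (recall 0.75), so the precision at that cutoff is 3/5. Fixing recall this way lets models be compared on how many flagged patients are true positives at the same level of case capture.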

CONCLUSIONS

We conclude that an autoencoder can create useful features that represent the entire space of EHR data and that are applicable to a wide array of predictive tasks. Combined with important response-specific predictors, these features yield efficient and robust predictive models with less labor in data extraction and model training.

