Autoencoder-Based Representation Learning for Similar Patients Retrieval From Electronic Health Records: Comparative Study.

Author information

Li Deyi, Shukla Aditi, Chandaka Sravani, Taylor Bradley, Xu Jie, Liu Mei

Affiliations

Department of Health Outcomes & Biomedical Informatics, University of Florida, 1889 Museum Rd, 7th Floor, Suite 7000, Room 7012, Gainesville, FL, 32611, United States, 1 352-627-9143.

Department of Mathematics, College of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, United States.

Publication information

JMIR Med Inform. 2025 Jul 24;13:e68830. doi: 10.2196/68830.

DOI:10.2196/68830

PMID:40706557

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12289314/

Abstract

BACKGROUND

By analyzing electronic health record snapshots of similar patients, physicians can proactively predict disease onsets, customize treatment plans, and anticipate patient-specific trajectories. However, the modeling of electronic health record data is inherently challenging due to its high dimensionality, mixed feature types, noise, bias, and sparsity. Patient representation learning using autoencoders (AEs) presents promising opportunities to address these challenges. A critical question remains: how do different AE designs and distance measures impact the quality of retrieved similar patient cohorts?

OBJECTIVE

This study aims to evaluate the performance of 5 common AE variants (vanilla autoencoder, denoising autoencoder, contractive autoencoder, sparse autoencoder, and robust autoencoder) in retrieving similar patients. Additionally, it investigates the impact of different distance measures and hyperparameter configurations on model performance.
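To make the idea of an AE variant concrete, the following is a minimal sketch of a denoising autoencoder, which learns to reconstruct the clean input from a corrupted copy. It is an illustration only: the abstract does not describe the study's architecture or training setup, so the single tanh hidden layer, the Gaussian corruption, the plain gradient descent loop, and every name and default value here (`train_denoising_autoencoder`, `latent_dim`, `noise_std`, `lr`, `epochs`) are assumptions.

```python
import numpy as np


def train_denoising_autoencoder(x, latent_dim=2, noise_std=0.1,
                                lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer denoising autoencoder (illustrative sketch).

    Returns (encode, losses): a function mapping rows to latent vectors and
    the per-epoch mean squared reconstruction error against the clean input.
    All hyperparameter defaults are hypothetical, not taken from the study.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_features, latent_dim))
    b_enc = np.zeros(latent_dim)
    w_dec = rng.normal(0.0, 0.1, (latent_dim, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Corrupt the input; the target stays the clean data.
        x_noisy = x + rng.normal(0.0, noise_std, x.shape)
        h = np.tanh(x_noisy @ w_enc + b_enc)   # latent representation
        x_hat = h @ w_dec + b_dec              # linear reconstruction
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        grad_out = 2.0 * err / err.size
        grad_w_dec = h.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ w_dec.T) * (1.0 - h ** 2)
        grad_w_enc = x_noisy.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        w_enc -= lr * grad_w_enc
        b_enc -= lr * grad_b_enc
        w_dec -= lr * grad_w_dec
        b_dec -= lr * grad_b_dec

    def encode(rows):
        """Map clean rows to their latent representation."""
        return np.tanh(rows @ w_enc + b_enc)

    return encode, losses
```

Setting `noise_std=0` in this sketch reduces it to a vanilla autoencoder. The learning rate appears as an explicit argument because the study reports that learning rates significantly influenced performance.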

METHODS

We tested the 5 AE variants on 2 real-world datasets, from the University of Kansas Medical Center (n=13,752) and the Medical College of Wisconsin (n=9568), across 168 different hyperparameter configurations. To retrieve similar patients based on the AE-produced latent representations, we applied k-nearest neighbors (k-NN) using Euclidean and Mahalanobis distances. Two prediction targets were evaluated: acute kidney injury onset and postdischarge 1-year mortality.
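The retrieval step described above can be sketched as follows. This is a hedged illustration, not the study's implementation: the function names (`knn_indices`, `knn_predict`), the choice to estimate the covariance from the candidate pool, the use of a pseudo-inverse, and the neighbor-averaging prediction rule are all assumptions made for the example.

```python
import numpy as np


def knn_indices(latent, query, k, metric="euclidean"):
    """Return indices of the k rows of `latent` closest to `query`.

    `latent` is an (n_patients, n_dims) array of latent vectors and `query`
    is a single latent vector. Names and behavior are hypothetical.
    """
    diff = latent - query
    if metric == "euclidean":
        dist2 = np.einsum("ij,ij->i", diff, diff)
    elif metric == "mahalanobis":
        # Assumption: covariance is estimated from the candidate pool itself;
        # the pseudo-inverse tolerates a singular covariance matrix.
        cov = np.atleast_2d(np.cov(latent, rowvar=False))
        inv_cov = np.linalg.pinv(cov)
        dist2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    else:
        raise ValueError(f"unknown metric: {metric}")
    # Squared distances preserve the neighbor ordering.
    return np.argsort(dist2, kind="stable")[:k]


def knn_predict(latent, labels, query, k, metric="euclidean"):
    """Estimate an outcome as the fraction of positive labels among neighbors."""
    idx = knn_indices(latent, query, k, metric)
    return float(np.asarray(labels)[idx].mean())
```

Mahalanobis distance rescales each direction by the data's covariance, so a dimension with large spread contributes less than it does under Euclidean distance. The two metrics can therefore retrieve different cohorts from the same latent space.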

RESULTS

Our findings demonstrate that (1) denoising autoencoders outperformed other AE variants when paired with Euclidean distance (P<.001), followed by vanilla autoencoders and contractive autoencoders; (2) learning rates significantly influenced the performance of AE variants; and (3) Mahalanobis distance-based k-NN frequently outperformed Euclidean distance-based k-NN when applied to latent representations. However, whether AE models are superior in transforming raw data into latent representations, compared with applying Mahalanobis distance-based k-NN directly to raw data, appears to be data-dependent.

CONCLUSIONS

This study provides a comprehensive analysis of the performance of different AE variants in retrieving similar patients and evaluates the impact of various hyperparameter configurations on model performance. The findings lay the groundwork for future development of AE-based patient similarity estimation and personalized medicine.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e61/12289314/d7b5d9d55295/medinform-v13-e68830-g001.jpg
