Semi-supervised learning of the electronic health record for phenotype stratification.

Author information

Beaulieu-Jones Brett K, Greene Casey S

Affiliations

Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, United States; Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, United States.

Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, United States; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, United States; Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Perelman School of Medicine, University of Pennsylvania, United States.

Publication information

J Biomed Inform. 2016 Dec;64:168-178. doi: 10.1016/j.jbi.2016.10.007. Epub 2016 Oct 12.

DOI:10.1016/j.jbi.2016.10.007

PMID:27744022

Abstract

Patient interactions with health care providers result in entries to electronic health records (EHRs). EHRs were built for clinical and billing purposes but contain many data points about an individual. Mining these records provides opportunities to extract electronic phenotypes, which can be paired with genetic data to identify genes underlying common human diseases. This task remains challenging: high quality phenotyping is costly and requires physician review; many fields in the records are sparsely filled; and our definitions of diseases are continuing to improve over time. Here we develop and evaluate a semi-supervised learning method for EHR phenotype extraction using denoising autoencoders for phenotype stratification. By combining denoising autoencoders with random forests we find classification improvements across multiple simulation models and improved survival prediction in ALS clinical trial data. This is particularly evident in cases where only a small number of patients have high quality phenotypes, a common scenario in EHR-based research. Denoising autoencoders perform dimensionality reduction enabling visualization and clustering for the discovery of new subtypes of disease. This method represents a promising approach to clarify disease subtypes and improve genotype-phenotype association studies that leverage EHRs.
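The denoising autoencoder named in the abstract can be illustrated with a small, standard-library-only sketch. This is not the authors' implementation: the masking-noise corruption, single hidden layer, tied weights, sigmoid units, toy binary "records", and every hyperparameter and name below (`DenoisingAutoencoder`, `make_toy_records`, `demo`) are assumptions chosen for brevity. The model is trained to reconstruct the clean input from a corrupted copy, and the hidden layer then gives a low-dimensional code for each record.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, rate, rng):
    """Masking noise: zero each input independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]


class DenoisingAutoencoder:
    """One hidden layer, tied weights, sigmoid units (illustrative choices)."""

    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.b_o = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(self.n_hidden))
                        + self.b_o[i])
                for i in range(self.n_in)]

    def train_step(self, x, rate, lr):
        x_noisy = corrupt(x, rate, self.rng)
        h = self.encode(x_noisy)
        y = self.decode(h)
        # Cross-entropy loss against the CLEAN input; with sigmoid outputs
        # the output-layer error term is y - x.
        d_o = [yi - xi for yi, xi in zip(y, x)]
        # Backpropagate to the hidden layer before touching the weights.
        d_h = [sum(self.W[j][i] * d_o[i] for i in range(self.n_in))
               * h[j] * (1.0 - h[j]) for j in range(self.n_hidden)]
        for j in range(self.n_hidden):
            for i in range(self.n_in):
                # Tied weights: decoder gradient plus encoder gradient.
                self.W[j][i] -= lr * (d_o[i] * h[j] + d_h[j] * x_noisy[i])
            self.b_h[j] -= lr * d_h[j]
        for i in range(self.n_in):
            self.b_o[i] -= lr * d_o[i]

    def reconstruction_error(self, data):
        """Mean squared error when reconstructing uncorrupted inputs."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / self.n_in
        return total / len(data)


def make_toy_records(n, rng):
    """Hypothetical binary records: two subtypes with different active features."""
    records = []
    for k in range(n):
        if k % 2 == 0:
            base = [1.0] * 4 + [0.0] * 4
        else:
            base = [0.0] * 4 + [1.0] * 4
        # Flip each field with 5% probability to mimic noisy entries.
        records.append([1.0 - v if rng.random() < 0.05 else v for v in base])
    return records


def demo(epochs=100, rate=0.2, lr=0.1, seed=0):
    rng = random.Random(seed)
    data = make_toy_records(60, rng)
    dae = DenoisingAutoencoder(n_in=8, n_hidden=2, seed=seed)
    before = dae.reconstruction_error(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            dae.train_step(x, rate, lr)
    after = dae.reconstruction_error(data)
    codes = [dae.encode(x) for x in data]  # 2-D code per record
    return before, after, codes


if __name__ == "__main__":
    before, after, codes = demo()
    print(f"reconstruction MSE before: {before:.3f}, after: {after:.3f}")
```

Training uses no labels, so all records can contribute. In a semi-supervised setup of the kind the abstract describes, the hidden codes for the labeled subset would then be passed to a supervised classifier such as a random forest; that step is omitted here, and a practical version would rely on a numerical library rather than nested Python lists.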

Similar articles

Semi-supervised learning of the electronic health record for phenotype stratification.

J Biomed Inform. 2016 Dec;64:168-178. doi: 10.1016/j.jbi.2016.10.007. Epub 2016 Oct 12.

Weakly Semi-supervised phenotyping using Electronic Health records.

J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.

Automated feature selection of predictors in electronic medical records data.

Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.

Mapping Patient Trajectories using Longitudinal Extraction and Deep Learning in the MIMIC-III Critical Care Database.

Pac Symp Biocomput. 2018;23:123-132.

EHR-based phenotyping: Bulk learning and evaluation.

J Biomed Inform. 2017 Jun;70:35-51. doi: 10.1016/j.jbi.2017.04.009. Epub 2017 Apr 12.

Relational machine learning for electronic health record-driven phenotyping.

J Biomed Inform. 2014 Dec;52:260-70. doi: 10.1016/j.jbi.2014.07.007. Epub 2014 Jul 15.

High-throughput phenotyping with temporal sequences.

J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781. doi: 10.1093/jamia/ocaa288.

Performance and clinical utility of a new supervised machine-learning pipeline in detecting rare ciliopathy patients based on deep phenotyping from electronic health records and semantic similarity.

Orphanet J Rare Dis. 2024 Feb 10;19(1):55. doi: 10.1186/s13023-024-03063-7.

Semi-supervised Double Deep Learning Temporal Risk Prediction (SeDDLeR) with Electronic Health Records.

J Biomed Inform. 2024 Sep;157:104685. doi: 10.1016/j.jbi.2024.104685. Epub 2024 Jul 14.

Deep Phenotyping of Chinese Electronic Health Records by Recognizing Linguistic Patterns of Phenotypic Narratives With a Sequence Motif Discovery Tool: Algorithm Development and Validation.

J Med Internet Res. 2022 Jun 3;24(6):e37213. doi: 10.2196/37213.

Cited by

Optimizing deep learning models to combat amyotrophic lateral sclerosis (ALS) disease progression.

Digit Health. 2025 Jun 30;11:20552076251349719. doi: 10.1177/20552076251349719. eCollection 2025 Jan-Dec.

Towards automated phenotype definition extraction using large language models.

Genomics Inform. 2024 Oct 31;22(1):21. doi: 10.1186/s44342-024-00023-2.

Machine learning and brain-computer interface approaches in prognosis and individualized care strategies for individuals with amyotrophic lateral sclerosis: A systematic review.

MethodsX. 2024 May 25;13:102765. doi: 10.1016/j.mex.2024.102765. eCollection 2024 Dec.

PATIENT RECRUITMENT USING ELECTRONIC HEALTH RECORDS UNDER SELECTION BIAS: A TWO-PHASE SAMPLING FRAMEWORK.

Ann Appl Stat. 2024 Sep;18(3):1858-1878. doi: 10.1214/23-aoas1860. Epub 2024 Aug 5.

Generating Complex Explanations for Artificial Intelligence Models: An Application to Clinical Data on Severe Mental Illness.

Life (Basel). 2024 Jun 26;14(7):807. doi: 10.3390/life14070807.

LATTE: Label-efficient incident phenotyping from longitudinal electronic health records.

Patterns (N Y). 2023 Dec 27;5(1):100906. doi: 10.1016/j.patter.2023.100906. eCollection 2024 Jan 12.

Clinical Phenotyping with an Outcomes-driven Mixture of Experts for Patient Matching and Risk Estimation.

ACM Trans Comput Healthc. 2023 Oct;4(4):1-18. doi: 10.1145/3616021. Epub 2023 Sep 13.

Machine Learning and Pharmacogenomics at the Time of Precision Psychiatry.

Curr Neuropharmacol. 2023;21(12):2395-2408. doi: 10.2174/1570159X21666230808170123.

Improving an Electronic Health Record-Based Clinical Prediction Model Under Label Deficiency: Network-Based Generative Adversarial Semisupervised Approach.

JMIR Med Inform. 2023 Jun 13;11:e47862. doi: 10.2196/47862.

A flexible symbolic regression method for constructing interpretable clinical prediction models.

NPJ Digit Med. 2023 Jun 5;6(1):107. doi: 10.1038/s41746-023-00833-8.
