EHR-based phenotyping: Bulk learning and evaluation.

Authors

Chiu Po-Hsiang, Hripcsak George

Affiliation

Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.

Publication information

J Biomed Inform. 2017 Jun;70:35-51. doi: 10.1016/j.jbi.2017.04.009. Epub 2017 Apr 12.

Abstract

In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.
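The two-level idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, and the names `X`, `surrogate`, `abstract`, and `annotated`, the 10% code-noise rate, the sample sizes, and the choice of logistic regression at both levels are all assumptions made for illustration. Base models are trained per disease on surrogate labels (standing in for diagnostic codes), their out-of-fold probabilities form a low-dimensional abstract feature space, and a second-level model is trained and evaluated on a small annotated sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in for EHR data (hypothetical): 600 cases, 40 raw
# features, and 3 diseases making up the "bulk learning set".
n_cases, n_raw, n_diseases = 600, 40, 3
X = rng.normal(size=(n_cases, n_raw))
W = rng.normal(size=(n_raw, n_diseases))
noise = rng.normal(scale=0.5, size=(n_cases, n_diseases))
true_status = (X @ W + noise) > 0

# Surrogate labels: diagnostic codes modeled as noisy copies of the true
# disease status (assumption: 10% of codes are flipped).
flip = rng.random(size=true_status.shape) < 0.10
surrogate = np.where(flip, ~true_status, true_status).astype(int)

# Level 1: one base model per disease, trained on surrogate labels only.
# Out-of-fold probabilities give one abstract feature per disease.
abstract = np.column_stack([
    cross_val_predict(
        LogisticRegression(max_iter=1000), X, surrogate[:, d],
        cv=5, method="predict_proba",
    )[:, 1]
    for d in range(n_diseases)
])

# Sparse annotation: only 60 cases carry a gold label for disease 0.
annotated = rng.choice(n_cases, size=60, replace=False)
gold = true_status[annotated, 0].astype(int)

# Level 2: train and evaluate within the abstract feature space, using
# cross-validation over the same sparse annotated sample.
meta_pred = cross_val_predict(
    LogisticRegression(), abstract[annotated], gold, cv=5
)
accuracy = float((meta_pred == gold).mean())
print(abstract.shape, round(accuracy, 3))
```

The second-level model sees three features instead of forty, which is why a sample as small as 60 annotated cases can be enough to fit and assess it in this toy setting. Real EHR features, code noise, and disease overlap would behave differently.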
