EHR-based phenotyping: Bulk learning and evaluation.

Author information

Chiu Po-Hsiang, Hripcsak George

Affiliation

Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.

Publication information

J Biomed Inform. 2017 Jun;70:35-51. doi: 10.1016/j.jbi.2017.04.009. Epub 2017 Apr 12.

Abstract

In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training makes it possible to evaluate the resulting models using only a sparsely annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.
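The abstract describes the hierarchy only at a high level. The Python sketch below illustrates one plausible reading of it, assuming a stacking-style design: first-level models are trained per disease on surrogate labels (presence of the diagnostic code), their predicted probabilities form a low-dimensional abstract feature space, and a second-level model is fitted in that space from a small chart-reviewed sample. Logistic regression at both levels, the toy cohort, the disease names A/B/C and the helper functions (`train_logreg`, `train_surrogate_models`, `abstract_features`) are all hypothetical; the paper's actual base learners, features and ensemble structure are not specified here.

```python
import math

# Assumption: a stacking-style sketch of "bulk learning", not the authors' implementation.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    """Probability of the positive class under a (weights, bias) logistic model."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logreg(X, y, lr=1.0, epochs=1000):
    """Fit logistic regression with plain batch gradient descent (hypothetical helper)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_surrogate_models(X_raw, code_labels):
    """Level 1: one model per disease, trained on surrogate labels
    (whether the disease's diagnostic code appears in the record)."""
    return {dz: train_logreg(X_raw, labels) for dz, labels in code_labels.items()}

def abstract_features(models, diseases, x):
    """Project a raw record into the low-dimensional abstract space:
    one predicted probability per disease in the bulk learning set."""
    return [predict_proba(models[dz], x) for dz in diseases]

# Hypothetical toy cohort: 8 patients, 3 diseases and 6 binary raw features;
# disease A drives features 0-1, B drives 2-3, C drives 4-5.
diseases = ["A", "B", "C"]
patients = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
X_raw = [[a, a, b, b, c, c] for a, b, c in patients]
# Surrogate labels: here the code simply mirrors the disease; real codes are noisy.
code_labels = {dz: [p[i] for p in patients] for i, dz in enumerate(diseases)}

level1 = train_surrogate_models(X_raw, code_labels)
Z = [abstract_features(level1, diseases, x) for x in X_raw]

# Level 2: sparse gold-standard annotation for target phenotype A
# (patient index -> chart-reviewed label), fitted in the abstract space.
annotated = {1: 0, 2: 0, 4: 1, 7: 1}
level2 = train_logreg([Z[i] for i in annotated], list(annotated.values()))
scores = [predict_proba(level2, z) for z in Z]
```

In this toy setting, four annotated patients suffice because the second-level model sees only three abstract features rather than the raw record. With real EHR data the surrogate codes would be noisy, and the sparse annotation would also be needed to estimate how well the surrogate-trained models agree with the gold standard.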

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cda/5934756/e0925a80bde6/nihms962422f1.jpg

Similar articles

1
Weakly Semi-supervised phenotyping using Electronic Health records.
J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.
2
Enabling phenotypic big data with PheNorm.
J Am Med Inform Assoc. 2018 Jan 1;25(1):54-60. doi: 10.1093/jamia/ocx111.
3
Feature extraction for phenotyping from semantic and knowledge resources.
J Biomed Inform. 2019 Mar;91:103122. doi: 10.1016/j.jbi.2019.103122. Epub 2019 Feb 7.

Cited by

1
Artificial Intelligence in Biomedical Sciences: A Scoping Review.
Br J Biomed Sci. 2025 Aug 5;82:14362. doi: 10.3389/bjbs.2025.14362. eCollection 2025.
2
Defining Phenotypes from Clinical Data to Drive Genomic Research.
Annu Rev Biomed Data Sci. 2018 Jul;1:69-92. doi: 10.1146/annurev-biodatasci-080917-013335. Epub 2018 Apr 25.
3
High-throughput phenotyping with temporal sequences.
J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781. doi: 10.1093/jamia/ocaa288.
