High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP).

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.

Publication information

Nat Protoc. 2019 Dec;14(12):3426-3444. doi: 10.1038/s41596-019-0227-6. Epub 2019 Nov 20.

Abstract

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 days if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
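
The three final products named in the abstract (an algorithm, a per-patient probability, and a yes/no classification) can be illustrated with a minimal sketch. This is not the PheCAP software or its API. It assumes two hypothetical features per patient (log-transformed counts of diagnosis codes and of NLP mentions), a small invented set of chart-review labels, an L2-penalized logistic regression as a simple stand-in for the "machine learning approaches" the abstract mentions, and an arbitrary 0.5 classification threshold.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    # L2-penalized logistic regression fitted by batch gradient descent.
    # Stand-in model only; the intercept is left unpenalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(X, w, b):
    # Probability of the phenotype for each patient.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Hypothetical chart-reviewed patients: [log(1 + code count), log(1 + NLP mention count)].
labeled_X = [[0.0, 0.0], [0.7, 0.0], [0.0, 0.7], [1.1, 0.7],
             [0.7, 0.7], [1.6, 1.4], [2.1, 1.8], [2.4, 2.2]]
labeled_y = [0, 0, 0, 0, 0, 1, 1, 1]  # invented gold-standard labels

# 1) The "phenotype algorithm": fitted weights and intercept.
w, b = fit_logistic(labeled_X, labeled_y)

# 2) Probability of the phenotype for all patients, labeled or not.
all_X = [[0.0, 0.0], [1.1, 0.7], [1.8, 1.6], [3.0, 3.0]]
probs = predict_proba(all_X, w, b)

# 3) Yes/no classification at an assumed threshold of 0.5.
THRESHOLD = 0.5
labels = [p >= THRESHOLD for p in probs]

for x, p, lab in zip(all_X, probs, labels):
    print(x, round(p, 3), "yes" if lab else "no")
```

In practice the threshold would be chosen against the gold-standard labels (for example, to reach a target specificity) rather than fixed at 0.5.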
