A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes.

Authors

Munzir Syed I, Hier Daniel B, Oommen Chelsea, Carrithers Michael D

Affiliations

Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, USA.

Kummer Institute, Missouri University of Science and Technology, Rolla, MO, USA.

Publication information

AMIA Annu Symp Proc. 2025 May 22;2024:838-846. eCollection 2024.

Abstract

High-throughput phenotyping, the automated mapping of patient signs and symptoms to standardized ontology concepts, is essential for realizing value from electronic health records (EHR) in support of precision medicine. Despite technological advances, high-throughput phenotyping remains a challenge. This study compares three computational approaches to high-throughput phenotyping: a large language model (LLM) incorporating generative AI, a deep learning (DL) approach utilizing span categorization, and a machine learning (ML) approach with word embeddings. The LLM approach that implemented GPT-4 demonstrated superior performance, suggesting that large language models are poised to become the preferred method for high-throughput phenotyping of physician notes.

Similar articles

2
High Throughput Phenotyping of Physician Notes with Large Language and Hybrid NLP Models.
Annu Int Conf IEEE Eng Med Biol Soc. 2024 Jul;2024:1-5. doi: 10.1109/EMBC53108.2024.10782119.
4
Engineering of Generative Artificial Intelligence and Natural Language Processing Models to Accurately Identify Arrhythmia Recurrence.
Circ Arrhythm Electrophysiol. 2025 Jan;18(1):e013023. doi: 10.1161/CIRCEP.124.013023. Epub 2024 Dec 16.
6
Integrating large language models with human expertise for disease detection in electronic health records.
Comput Biol Med. 2025 Jun;191:110161. doi: 10.1016/j.compbiomed.2025.110161. Epub 2025 Apr 7.
7
Natural Language Processing for EHR-Based Computational Phenotyping.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):139-153. doi: 10.1109/TCBB.2018.2849968. Epub 2018 Jun 25.
8
A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
9
Predicting mortality in critically ill patients with diabetes using machine learning and clinical notes.
BMC Med Inform Decis Mak. 2020 Dec 30;20(Suppl 11):295. doi: 10.1186/s12911-020-01318-4.
