A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes.

Authors

Syed I. Munzir, Daniel B. Hier, Chelsea Oommen, Michael D. Carrithers

Affiliations

Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, USA.

Kummer Institute, Missouri University of Science and Technology, Rolla, MO, USA.

Publication information

AMIA Annu Symp Proc. 2025 May 22;2024:838-846. eCollection 2024.

PMID: 40417529

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12099424/

Abstract

High-throughput phenotyping, the automated mapping of patient signs and symptoms to standardized ontology concepts, is essential for realizing value from electronic health records (EHR) in support of precision medicine. Despite technological advances, high-throughput phenotyping remains a challenge. This study compares three computational approaches to high-throughput phenotyping: a large language model (LLM) incorporating generative AI, a deep learning (DL) approach utilizing span categorization, and a machine learning (ML) approach with word embeddings. The LLM approach that implemented GPT-4 demonstrated superior performance, suggesting that large language models are poised to become the preferred method for high-throughput phenotyping of physician notes.
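The abstract describes the ML approach only as one "with word embeddings". As a rough illustration of how embedding-based concept mapping can work in general, here is a minimal sketch, assuming a nearest-neighbour strategy: a phrase is embedded by averaging its word vectors and assigned to the ontology concept whose label vector has the highest cosine similarity, provided the score clears a threshold. The word vectors, concept list, and threshold below are hypothetical toy values, not the authors' model, ontology, or data.

```python
import math

# Hypothetical toy word vectors (3-dimensional for readability). A real system
# would load pretrained embeddings trained on clinical or biomedical text.
WORD_VECTORS = {
    "headache": [0.9, 0.1, 0.0],
    "head": [0.8, 0.2, 0.1],
    "pain": [0.7, 0.3, 0.0],
    "seizure": [0.0, 0.9, 0.2],
    "convulsion": [0.1, 0.85, 0.25],
    "weakness": [0.1, 0.1, 0.95],
    "muscle": [0.2, 0.0, 0.9],
}

# Hypothetical ontology concepts mapped to their text labels (illustrative only).
CONCEPT_LABELS = {
    "Headache": "headache",
    "Seizure": "seizure",
    "Muscle weakness": "muscle weakness",
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_to_concept(phrase, threshold=0.8):
    """Return (concept, score) for the most similar concept label, or None."""
    query = embed(phrase)
    if query is None:
        return None  # no known words, so the phrase cannot be embedded
    best = None
    for concept, label in CONCEPT_LABELS.items():
        label_vec = embed(label)
        if label_vec is None:
            continue
        score = cosine(query, label_vec)
        if best is None or score > best[1]:
            best = (concept, score)
    # Reject weak matches so unrelated phrases are not forced onto a concept.
    if best is not None and best[1] >= threshold:
        return best
    return None


print(map_to_concept("head pain"))   # closest concept: Headache
print(map_to_concept("convulsion"))  # closest concept: Seizure
print(map_to_concept("fever"))       # None: no known words
```

The threshold reflects a design choice: without it, every phrase would be mapped to some concept, even one with no plausible match.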

