Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT.

Authors

Yang Jingye, Liu Cong, Deng Wendy, Wu Da, Weng Chunhua, Zhou Yunyun, Wang Kai

Affiliations

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.

Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA.

Publication details

Patterns (N Y). 2023 Dec 5;5(1):100887. doi: 10.1016/j.patter.2023.100887. eCollection 2024 Jan 12.

DOI: 10.1016/j.patter.2023.100887
PMID: 38264716
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10801236/
Abstract

To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also show strong performance in case studies on biomedical literature. We evaluate the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adcb/10801236/0858ca2c61f9/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adcb/10801236/6f95e8156fab/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adcb/10801236/680e09ea33ac/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adcb/10801236/1806375331d7/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adcb/10801236/40c68ffb5bee/gr5.jpg
