Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT.

Authors

Yang Jingye, Liu Cong, Deng Wendy, Wu Da, Weng Chunhua, Zhou Yunyun, Wang Kai

Affiliations

Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.

Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA.

Publication information

Patterns (N Y). 2023 Dec 5;5(1):100887. doi: 10.1016/j.patter.2023.100887. eCollection 2024 Jan 12.

Abstract

To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes because of the limitations of traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also show strong performance in case studies on biomedical literature. We evaluate the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses of human diseases.
