Yang Jingye, Liu Cong, Deng Wendy, Wu Da, Weng Chunhua, Zhou Yunyun, Wang Kai
Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA.
Patterns (N Y). 2023 Dec 5;5(1):100887. doi: 10.1016/j.patter.2023.100887. eCollection 2024 Jan 12.
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes because of the limitations of traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses of human diseases.