Jui Jayati H, Hauskrecht Milos
University of Pittsburgh, PA 15260, USA.
AMIA Annu Symp Proc. 2025 May 22;2024:571-580. eCollection 2024.
Recent advances in Large Language Models (LLMs) have opened new possibilities for knowledge extraction in biological and clinical natural language processing (NLP). We present an approach for extracting the regulatory effects of genes and medications on biological processes central to wound healing. Using OpenAI's Generative Pre-trained Transformer (GPT) models, specifically GPT-3.5 and GPT-4, we developed a pipeline that identifies and grounds biological process concepts and extracts the regulatory relations involving them. Both models were evaluated against a manually annotated corpus of 104 PubMed titles on their ability to identify and ground biological process concepts and to extract the relevant regulatory relations from the text. GPT-4 outperformed GPT-3.5 on all tasks, which suggests that it can support biomedical knowledge extraction without model fine-tuning.
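The abstract does not state which metrics were used to compare model output with the annotated corpus. As a minimal sketch, assuming extracted relations are represented as (regulator, effect, process) triples and scored by exact match with precision, recall, and F1, the comparison could look like the following. The `Relation` type, the `score` function, the effect labels, and the placeholder entity names are all hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One regulatory relation (hypothetical representation)."""
    regulator: str  # gene or medication mentioned in the title
    effect: str     # direction of regulation; this label set is an assumption
    process: str    # grounded biological process concept


def score(predicted: set[Relation], gold: set[Relation]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 of predicted vs. gold relations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder data only; these are not results or entities from the study.
gold = {
    Relation("GENE_A", "positive", "process_1"),
    Relation("DRUG_B", "negative", "process_2"),
}
predicted = {
    Relation("GENE_A", "positive", "process_1"),
    Relation("DRUG_B", "positive", "process_2"),  # wrong effect direction
}

metrics = score(predicted, gold)
print(metrics)
```

Exact matching on the full triple is the strictest choice: a relation with the right entities but the wrong effect direction counts as both a false positive and a false negative. A more lenient scheme could score concept identification, grounding, and relation extraction separately.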