Zhang Jeffrey, Wibert Maxwell, Zhou Huixue, Peng Xueqing, Chen Qingyu, Keloth Vipina K, Hu Yan, Zhang Rui, Xu Hua, Raja Kalpana
Section for Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, USA.
Institute for Health Informatics, University of Minnesota, Twin Cities, USA.
AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:391-400. eCollection 2024.
Relation Extraction (RE) is a natural language processing (NLP) task for extracting semantic relations between biomedical entities. Recent developments in pre-trained large language models (LLMs) have motivated NLP researchers to apply them to various NLP tasks. We investigated GPT-3.5-turbo and GPT-4 on extracting relations from three standard datasets: EU-ADR, the Gene Associations Database (GAD), and ChemProt. Unlike existing approaches, which use datasets with masked entities, we used three versions of each dataset in our experiments: a version with masked entities, a second version with the original entities (unmasked), and a third version with abbreviations replaced with the original terms. We developed prompts for the various versions and used the chat completion model from the GPT API. Our approach achieved F1-scores ranging from 0.498 to 0.809 with GPT-3.5-turbo, and a highest F1-score of 0.84 with GPT-4. For certain experiments, the performance of GPT, BioBERT, and PubMedBERT is almost the same.
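The abstract describes prompting a chat completion model with masked or unmasked entity versions of each sentence. A minimal sketch of how such prompts might be constructed is below; the mask tokens (`@GENE$`, `@DISEASE$`), the yes/no question framing, and the helper name are assumptions for illustration, not the paper's actual prompts.

```python
# Hypothetical prompt builder for biomedical relation extraction with a
# chat-completion LLM. The masked-entity convention (@GENE$ / @DISEASE$)
# follows the style of GAD/EU-ADR preprocessing, but the exact prompts
# used in the paper may differ.

def build_re_prompt(sentence: str, entity1: str, entity2: str,
                    masked: bool = True) -> list:
    """Return chat messages asking whether two entities are related."""
    if masked:
        # Replace the surface forms with placeholder tokens.
        text = sentence.replace(entity1, "@GENE$").replace(entity2, "@DISEASE$")
        question = "Is there a relation between @GENE$ and @DISEASE$?"
    else:
        # Keep the original (unmasked) entity mentions.
        text = sentence
        question = f"Is there a relation between {entity1} and {entity2}?"
    return [
        {"role": "system",
         "content": "You are a biomedical relation extraction assistant. "
                    "Answer only 'yes' or 'no'."},
        {"role": "user",
         "content": f"Sentence: {text}\nQuestion: {question}"},
    ]
```

The resulting message list would then be sent to the chat completion endpoint (e.g. `openai` client, `model="gpt-3.5-turbo"` or `"gpt-4"`) and the yes/no answer compared against the gold relation label to compute F1.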