Biomedical Relation Extraction Using Dependency Graph and Decoder-Enhanced Transformer Model.

Authors

Kim Seonho, Yoon Juntae, Kwon Ohyoung

Affiliations

Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea.

VAIV Company, Seoul 04107, Republic of Korea.

Publication Information

Bioengineering (Basel). 2023 May 12;10(5):586. doi: 10.3390/bioengineering10050586.

Abstract

The identification of drug-drug and chemical-protein interactions is essential for understanding unpredictable changes in the pharmacological effects of drugs and the mechanisms of diseases, and for developing therapeutic drugs. In this study, we extract drug-related interactions from the DDI (Drug-Drug Interaction) Extraction-2013 Shared Task dataset and the BioCreative ChemProt (Chemical-Protein) dataset using various pretrained transformer models. We propose a BERT-based model that uses a graph attention network (GAT) to take the local structure of sentences and the embedding features of nodes into account under the self-attention scheme, and we investigate whether incorporating syntactic structure helps relation extraction. In addition, we suggest a modified T5 (text-to-text transfer transformer) that adapts the autoregressive generation task of T5 to the relation classification problem by removing the self-attention layer in the decoder block. Furthermore, we evaluated the potential of GPT-3 (Generative Pre-trained Transformer) for biomedical relation extraction using GPT-3 variant models. As a result, the modified T5, whose decoder is tailored to classification problems within the T5 architecture, demonstrated very promising performance on both tasks: we achieved an accuracy of 91.15% on the DDI dataset and 94.29% for the CPR (Chemical-Protein Relation) class group on the ChemProt dataset. However, the GAT-augmented BERT did not show a significant performance improvement for relation extraction. We demonstrated that transformer-based approaches that focus only on the relationships between words are implicitly able to understand language well without additional knowledge such as structural information.
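The two architectural ideas summarized above can be illustrated with short, self-contained sketches. The first is a minimal single-head graph attention layer that restricts attention to dependency-graph neighbors of each token and is applied on top of contextual token embeddings such as BERT's last hidden states. This is only a sketch of the general technique described in the abstract, not the authors' implementation; the class and parameter names (`DependencyGATLayer`, `in_dim`, `out_dim`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DependencyGATLayer(nn.Module):
    """Single-head graph attention over a token-level dependency adjacency matrix."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)   # node feature projection
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)    # edge scoring a([z_i || z_j])

    def forward(self, hidden: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # hidden: (B, L, in_dim) contextual token embeddings (e.g. BERT hidden states)
        # adj:    (B, L, L) binary adjacency built from dependency arcs, with self-loops
        z = self.proj(hidden)                                  # (B, L, D)
        B, L, D = z.shape
        zi = z.unsqueeze(2).expand(B, L, L, D)                 # features of node i
        zj = z.unsqueeze(1).expand(B, L, L, D)                 # features of node j
        scores = F.leaky_relu(self.attn(torch.cat([zi, zj], dim=-1)).squeeze(-1), 0.2)
        scores = scores.masked_fill(adj == 0, float("-inf"))   # attend only along dependency edges
        alpha = torch.softmax(scores, dim=-1)                  # attention over syntactic neighbors
        return F.elu(torch.bmm(alpha, z))                      # aggregated neighbor features

# Toy usage with random tensors standing in for BERT outputs and a parsed sentence.
hidden = torch.randn(2, 10, 768)
adj = torch.eye(10).expand(2, 10, 10)               # self-loops only, for illustration
out = DependencyGATLayer(768, 256)(hidden, adj)     # (2, 10, 256) syntax-aware features
```

The second sketch shows a T5-style decoder block with the self-attention sub-layer removed, so that a single learned label query attends directly to the encoder output and feeds a relation classifier, mirroring the abstract's description of adapting the autoregressive generation task to classification. All names (`SlimDecoderBlock`, `RelationClassifier`, `label_query`) and dimensions are again assumptions; the paper's model starts from a pretrained T5 checkpoint, which this toy module does not load.

```python
import torch
import torch.nn as nn

class SlimDecoderBlock(nn.Module):
    """Decoder block with only cross-attention over encoder states plus a feed-forward net."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, query: torch.Tensor, enc_out: torch.Tensor) -> torch.Tensor:
        # query: (B, 1, d_model) learned label query; enc_out: (B, L, d_model) encoder states
        attn_out, _ = self.cross_attn(query, enc_out, enc_out)
        x = self.norm1(query + attn_out)        # residual + norm around cross-attention
        return self.norm2(x + self.ffn(x))      # residual + norm around feed-forward

class RelationClassifier(nn.Module):
    """Single-step 'decoder' that maps encoder output to relation-label logits."""

    def __init__(self, d_model: int = 512, n_classes: int = 5):
        super().__init__()
        self.label_query = nn.Parameter(torch.randn(1, 1, d_model))  # one decoding step only
        self.block = SlimDecoderBlock(d_model)
        self.head = nn.Linear(d_model, n_classes)   # e.g. DDI classes incl. "no relation"

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        q = self.label_query.expand(enc_out.size(0), -1, -1)
        return self.head(self.block(q, enc_out).squeeze(1))         # (B, n_classes) logits
```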

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5045/10215465/7089cdf167c3/bioengineering-10-00586-g001.jpg
