Kim Seonho, Yoon Juntae, Kwon Ohyoung
Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea.
VAIV Company, Seoul 04107, Republic of Korea.
Bioengineering (Basel). 2023 May 12;10(5):586. doi: 10.3390/bioengineering10050586.
The identification of drug-drug and chemical-protein interactions is essential for understanding unpredictable changes in the pharmacological effects of drugs and the mechanisms of diseases, and for developing therapeutic drugs. In this study, we extract drug-related interactions from the DDI (Drug-Drug Interaction) Extraction-2013 Shared Task dataset and the BioCreative ChemProt (Chemical-Protein) dataset using various transformer models. We propose a BERT variant that uses a graph attention network (GAT) to take the local structure of sentences and the embedding features of nodes into account under the self-attention scheme, and we investigate whether incorporating syntactic structure helps relation extraction. In addition, we propose a T5 variant that adapts the autoregressive generation task of T5 (Text-to-Text Transfer Transformer) to the relation classification problem by removing the self-attention layer in the decoder block. Furthermore, we evaluated the potential of GPT-3 (Generative Pre-trained Transformer) variant models for biomedical relation extraction. As a result, the T5 variant, whose decoder is tailored to classification problems within the T5 architecture, demonstrated very promising performance on both tasks: we achieved an accuracy of 91.15% on the DDI dataset and an accuracy of 94.29% for the CPR (Chemical-Protein Relation) class group in the ChemProt dataset. However, the GAT-augmented BERT did not show a significant performance improvement for relation extraction. We demonstrated that transformer-based approaches that focus only on the relationships between words can implicitly understand language well without additional knowledge such as structural information.
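The graph-attention mechanism mentioned in the abstract can be illustrated with a minimal, self-contained sketch of a single GAT layer (Veličković et al., 2018) in NumPy. This is not the paper's implementation; the node features here would correspond to token embeddings and the adjacency matrix to dependency-parse edges, but all names and shapes below are illustrative assumptions.

```python
import numpy as np

def gat_layer(h, adj, W, a, alpha=0.2):
    """One graph-attention layer: attend only over graph neighbours.

    h   : (N, F)  node features (e.g. token embeddings)
    adj : (N, N)  0/1 adjacency with self-loops (e.g. dependency edges)
    W   : (F, Fp) shared linear transform
    a   : (2*Fp,) attention vector for concatenated node pairs
    """
    z = h @ W                                   # (N, Fp) transformed features
    n = z.shape[0]
    e = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            s = a @ np.concatenate([z[i], z[j]])  # pairwise score a^T[z_i || z_j]
            e[i, j] = s if s > 0 else alpha * s   # LeakyReLU
    e = np.where(adj > 0, e, -1e9)              # mask non-neighbours
    e = e - e.max(axis=1, keepdims=True)        # numerically stable softmax
    att = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    return att @ z                              # (N, Fp) neighbour-weighted features
```

Because non-edges are masked before the softmax, each node's output is a convex combination of its graph neighbours only, which is how syntactic structure would constrain attention in a GAT-augmented encoder.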