Chemical-protein interaction extraction via Gaussian probability distribution and external biomedical knowledge.

Author information

School of Computer Science and Technology.

School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China.

Publication information

Bioinformatics. 2020 Aug 1;36(15):4323-4330. doi: 10.1093/bioinformatics/btaa491.

Abstract

MOTIVATION

The biomedical literature contains a wealth of chemical-protein interactions (CPIs). Automatically extracting CPIs described in biomedical literature is essential for drug discovery, precision medicine, as well as basic biomedical research. Most existing methods focus only on the sentence sequence to identify these CPIs. However, the local structure of sentences and external biomedical knowledge also contain valuable information. Effective use of such information may improve the performance of CPI extraction.

RESULTS

In this article, we propose a novel neural network-based approach to improve CPI extraction. Specifically, the approach first employs BERT to generate high-quality contextual representations of the title sequence, instance sequence and knowledge sequence. Then, the Gaussian probability distribution is introduced to capture the local structure of the instance. Meanwhile, the attention mechanism is applied to fuse the title information and biomedical knowledge, respectively. Finally, the related representations are concatenated and fed into the softmax function to extract CPIs. We evaluate our proposed model on the CHEMPROT corpus. Our proposed model is superior in performance as compared with other state-of-the-art models. The experimental results show that the Gaussian probability distribution and external knowledge are complementary to each other. Integrating them can effectively improve the CPI extraction performance. Furthermore, the Gaussian probability distribution can effectively improve the extraction performance of sentences with overlapping relations in biomedical relation extraction tasks.
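The pipeline described above can be illustrated with a small NumPy sketch. The abstract does not say how the Gaussian distribution is applied or how attention is computed, so everything here is an assumption made for illustration: a Gaussian over token positions centered on each entity mention, plain dot-product attention for fusing title and knowledge, random arrays standing in for BERT outputs, and hypothetical names such as `gaussian_position_weights`, `attend` and `classify`. It is not the authors' implementation, which is in the repository linked under Availability and Implementation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()


def gaussian_position_weights(seq_len, center, sigma=2.0):
    """Unnormalized Gaussian weight per token position, peaked at `center`.

    Assumption: "local structure" is modeled as emphasis on tokens
    near an entity mention. `sigma` controls the window width.
    """
    positions = np.arange(seq_len)
    return np.exp(-((positions - center) ** 2) / (2.0 * sigma ** 2))


def local_representation(token_vectors, center, sigma=2.0):
    """Gaussian-weighted average of token vectors around one entity."""
    weights = gaussian_position_weights(len(token_vectors), center, sigma)
    return (weights / weights.sum()) @ token_vectors


def attend(query, memory):
    """Dot-product attention: summarize rows of `memory` relative to `query`."""
    scores = memory @ query / np.sqrt(query.shape[0])
    return softmax(scores) @ memory


def classify(instance, title, knowledge, chem_pos, prot_pos, W, b):
    """Concatenate local, title and knowledge features, then apply softmax."""
    chem = local_representation(instance, chem_pos)
    prot = local_representation(instance, prot_pos)
    query = (chem + prot) / 2.0            # hypothetical choice of query vector
    title_ctx = attend(query, title)       # fuse title information
    know_ctx = attend(query, knowledge)    # fuse external knowledge
    features = np.concatenate([chem, prot, title_ctx, know_ctx])
    return softmax(W @ features + b)


# Random stand-ins for BERT outputs; all sizes are illustrative only.
rng = np.random.default_rng(0)
hidden, n_classes = 8, 6
instance = rng.normal(size=(12, hidden))
title = rng.normal(size=(5, hidden))
knowledge = rng.normal(size=(4, hidden))
W = rng.normal(size=(n_classes, 4 * hidden))
b = np.zeros(n_classes)

probs = classify(instance, title, knowledge, chem_pos=3, prot_pos=9, W=W, b=b)
print(probs.round(3))
```

In this sketch a smaller `sigma` narrows the context around each entity, which is one way a model could keep apart the contexts of different entity pairs that share a sentence.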

AVAILABILITY AND IMPLEMENTATION

Data and code are available at https://github.com/CongSun-dlut/CPI_extraction.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

