
EGFI: drug-drug interaction extraction and generation with fusion of enriched entity and sentence information.

Author Information

Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.

School of Artificial Intelligence, Jilin University, China.

Publication Information

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab451.

Abstract

MOTIVATION

The rapid growth of the biomedical literature accumulates diverse yet comprehensive knowledge that remains hidden and waiting to be mined, such as drug-drug interactions. However, it is difficult to extract this heterogeneous knowledge so that the latest and novel findings can be retrieved, or even discovered, in an efficient manner. To address this problem, we propose EGFI for extracting and consolidating drug interactions from large-scale medical literature text data. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI builds on the language model BioBERT, which has been comprehensively pretrained on biomedical corpora. In particular, we propose a multihead self-attention mechanism and a packed BiGRU to fuse multiple sources of semantic information for rigorous context modeling. In the generation part, EGFI utilizes another pretrained language model, BioGPT-2, and the generated sentences are selected based on filtering rules.
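
The abstract describes the classification architecture as a BioBERT encoder whose outputs are fused through multihead self-attention and a packed BiGRU. Below is a minimal PyTorch sketch of such a classification head, assuming a standard Hugging Face BioBERT checkpoint; the layer sizes, the fusion scheme, and the number of relation classes are illustrative assumptions rather than the authors' exact configuration.

```python
# A minimal sketch of the classification part, under assumed hyperparameters.
import torch
import torch.nn as nn
from transformers import AutoModel


class DDIClassifier(nn.Module):
    def __init__(self, encoder_name="dmis-lab/biobert-base-cased-v1.1",
                 hidden_size=768, num_classes=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # BioBERT encoder
        # Multihead self-attention over the token representations.
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads=8,
                                               batch_first=True)
        # Bidirectional GRU fuses the attended sequence into a context vector.
        self.bigru = nn.GRU(hidden_size, hidden_size // 2,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mask padded positions so attention only attends to real tokens.
        attended, _ = self.self_attn(hidden, hidden, hidden,
                                     key_padding_mask=~attention_mask.bool())
        _, final = self.bigru(attended)                  # (2, batch, hidden//2)
        fused = torch.cat([final[0], final[1]], dim=-1)  # concat both directions
        return self.classifier(fused)                    # DDI-type logits
```

The concatenation of the forward and backward GRU states is one simple way to obtain a fixed-size sentence representation; the paper's exact fusion of entity and sentence information may differ.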

RESULTS

We evaluated the classification part on the 'DDIs 2013' dataset and the 'DTIs' dataset, achieving F1 scores of 0.842 and 0.720, respectively. Moreover, we applied the classification part to distinguish high-quality generated sentences and verified the filtered sentences against the existing ground truth. The generated sentences that are not recorded in DrugBank or the DDIs 2013 dataset demonstrate the potential of EGFI to identify novel drug relationships.
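
As a rough illustration of how the classification part can serve as a filter for generated sentences, the sketch below scores each candidate with a trained classifier and keeps only sentences confidently predicted to express a drug-drug interaction. The `classifier` and `tokenizer` objects, the NONE-class index, and the confidence threshold are assumptions for illustration, not values reported in the paper.

```python
# Hypothetical filtering step: keep generated sentences that the trained
# classifier assigns to a real DDI type with high confidence.
import torch


def filter_generated_sentences(sentences, classifier, tokenizer,
                               none_class=0, threshold=0.9):
    kept = []
    classifier.eval()
    with torch.no_grad():
        for sent in sentences:
            enc = tokenizer(sent, return_tensors="pt", truncation=True)
            logits = classifier(enc["input_ids"], enc["attention_mask"])
            probs = torch.softmax(logits, dim=-1).squeeze(0)
            pred = int(probs.argmax())
            # Discard sentences predicted as "no interaction" or low confidence.
            if pred != none_class and probs[pred] >= threshold:
                kept.append((sent, pred, float(probs[pred])))
    return kept
```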

AVAILABILITY

The source code is publicly available at https://github.com/Layne-Huang/EGFI.
