
EGFI: drug-drug interaction extraction and generation with fusion of enriched entity and sentence information.

Author Information

Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.

School of Artificial Intelligence, Jilin University, China.

Publication Information

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab451.

Abstract

MOTIVATION

The rapid growth of the biomedical literature accumulates diverse yet comprehensive knowledge that remains hidden and waiting to be mined, such as drug-drug interactions. However, it is difficult to extract this heterogeneous knowledge so that the latest and novel findings can be retrieved, or even discovered, in an efficient manner. To address this problem, we propose EGFI for extracting and consolidating drug interactions from large-scale medical literature text data. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI builds on the language model BioBERT, which has been comprehensively pretrained on biomedical corpora. In particular, we propose a multihead self-attention mechanism and a packed BiGRU to fuse multiple sources of semantic information for rigorous context modeling. In the generation part, EGFI utilizes another pretrained language model, BioGPT-2, and the generated sentences are selected based on filtering rules.
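
The abstract describes the classification architecture as a BioBERT encoder whose outputs are fused through multihead self-attention and a packed BiGRU. Below is a minimal PyTorch sketch of such a classification head, assuming a standard Hugging Face BioBERT checkpoint; the layer sizes, the fusion scheme, and the number of relation classes are illustrative assumptions rather than the authors' exact configuration.

```python
# A minimal sketch of the classification part, under assumed hyperparameters.
import torch
import torch.nn as nn
from transformers import AutoModel


class DDIClassifier(nn.Module):
    def __init__(self, encoder_name="dmis-lab/biobert-base-cased-v1.1",
                 hidden_size=768, num_classes=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # BioBERT encoder
        # Multihead self-attention over the token representations.
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads=8,
                                               batch_first=True)
        # Bidirectional GRU fuses the attended sequence into a context vector.
        self.bigru = nn.GRU(hidden_size, hidden_size // 2,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mask padded positions so attention only attends to real tokens.
        attended, _ = self.self_attn(hidden, hidden, hidden,
                                     key_padding_mask=~attention_mask.bool())
        _, final = self.bigru(attended)                  # (2, batch, hidden//2)
        fused = torch.cat([final[0], final[1]], dim=-1)  # concat both directions
        return self.classifier(fused)                    # DDI-type logits
```

The concatenation of the forward and backward GRU states is one simple way to obtain a fixed-size sentence representation; the paper's exact fusion of entity and sentence information may differ.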

RESULTS

We evaluated the classification part on the 'DDIs 2013' dataset and the 'DTIs' dataset, achieving F1 scores of 0.842 and 0.720, respectively. Moreover, we applied the classification part to distinguish high-quality generated sentences and verified the filtered sentences against the existing ground truth. The generated sentences that are not recorded in DrugBank or the DDIs 2013 dataset demonstrate the potential of EGFI to identify novel drug relationships.
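
As a rough illustration of how the classification part can serve as a filter for generated sentences, the sketch below scores each candidate with a trained classifier and keeps only sentences confidently predicted to express a drug-drug interaction. The `classifier` and `tokenizer` objects, the NONE-class index, and the confidence threshold are assumptions for illustration, not values reported in the paper.

```python
# Hypothetical filtering step: keep generated sentences that the trained
# classifier assigns to a real DDI type with high confidence.
import torch


def filter_generated_sentences(sentences, classifier, tokenizer,
                               none_class=0, threshold=0.9):
    kept = []
    classifier.eval()
    with torch.no_grad():
        for sent in sentences:
            enc = tokenizer(sent, return_tensors="pt", truncation=True)
            logits = classifier(enc["input_ids"], enc["attention_mask"])
            probs = torch.softmax(logits, dim=-1).squeeze(0)
            pred = int(probs.argmax())
            # Discard sentences predicted as "no interaction" or low confidence.
            if pred != none_class and probs[pred] >= threshold:
                kept.append((sent, pred, float(probs[pred])))
    return kept
```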

AVAILABILITY

The source code is publicly available at https://github.com/Layne-Huang/EGFI.
