Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.

Author information

Aldahdooh Jehad, Tanoli Ziaurrehman, Tang Jing

Affiliations

Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki 00290, Finland.

Doctoral Programme in Computer Science, University of Helsinki, Helsinki 00290, Finland.

Publication information

Bioinform Adv. 2024 Jul 22;4(1):vbae106. doi: 10.1093/bioadv/vbae106. eCollection 2024.

Abstract

MOTIVATION

Drug-target interactions (DTIs) play a pivotal role in drug discovery, which aims to identify potential drug targets and elucidate their mechanisms of action. In recent years, natural language processing (NLP), particularly when combined with pre-trained language models, has gained considerable momentum in the biomedical domain, with the potential to mine vast amounts of text and so extract DTIs from the literature efficiently.

RESULTS

In this article, we frame DTI extraction as an entity-relationship extraction problem and use different pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach that combines gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD) is critical for achieving optimal performance. The proposed model achieves an F1 score of 80.6 on the hidden DrugProt test set, the top-ranked performance among all submitted models in the official evaluation. Furthermore, we conduct a comparative analysis of gene textual descriptions sourced from the Entrez Gene and UniProt databases to gain insight into their impact on performance. Our findings highlight the potential of NLP-based text mining that uses gene and chemical descriptions to improve drug-target extraction tasks.
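
To make the setup concrete, the following is a minimal sketch of how a relation-extraction input enriched with descriptions might be assembled and how several models' outputs might be combined. It is not the authors' implementation: the abstract does not say how descriptions are attached to the sentence or how the ensemble merges predictions. The marker tokens (`@CHEM$`, `@GENE$`), the `[SEP]` concatenation, the function names, the label names, and the use of probability averaging (soft voting) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def mark_entities(text: str, chem_span: Tuple[int, int], gene_span: Tuple[int, int]) -> str:
    """Wrap the chemical and gene mentions in marker tokens.

    The marker strings are hypothetical. Spans are (start, end) character
    offsets and are assumed not to overlap.
    """
    spans = sorted(
        [(chem_span, "@CHEM$", "@/CHEM$"), (gene_span, "@GENE$", "@/GENE$")],
        key=lambda item: item[0][0],
        reverse=True,  # insert from right to left so earlier offsets stay valid
    )
    for (start, end), open_tag, close_tag in spans:
        text = f"{text[:start]}{open_tag} {text[start:end]} {close_tag}{text[end:]}"
    return text


def build_input(marked_text: str, chem_description: str, gene_description: str) -> str:
    """Append external descriptions to the marked sentence (assumed layout)."""
    return f"{marked_text} [SEP] {chem_description} [SEP] {gene_description}"


def ensemble_average(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities across models (soft voting, an assumption)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]


def predict_label(prob_lists: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Return the label with the highest averaged probability."""
    averaged = ensemble_average(prob_lists)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best_index]


# Illustrative sentence and placeholder descriptions, not taken from the dataset.
sentence = "Aspirin inhibits COX-1 activity."
marked = mark_entities(sentence, chem_span=(0, 7), gene_span=(17, 22))
model_input = build_input(marked, "chemical description text", "gene description text")

# Made-up probabilities from two hypothetical models over two illustrative labels.
label = predict_label([[0.25, 0.75], [0.5, 0.5]], labels=["NONE", "INHIBITOR"])
```

In a real pipeline, `model_input` would be tokenized and passed to each fine-tuned transformer, and the per-model probabilities fed to `predict_label` would come from those models rather than being written by hand.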

AVAILABILITY AND IMPLEMENTATION

Datasets utilized in this study are accessible at https://dtis.drugtargetcommons.org/.


