Suppr 超能文献



Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.

Authors

Aldahdooh Jehad, Tanoli Ziaurrehman, Tang Jing

Affiliations

Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki 00290, Finland.

Doctoral Programme in Computer Science, University of Helsinki, Helsinki 00290, Finland.

Publication

Bioinform Adv. 2024 Jul 22;4(1):vbae106. doi: 10.1093/bioadv/vbae106. eCollection 2024.

DOI: 10.1093/bioadv/vbae106
PMID: 39092007
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11293871/
Abstract

MOTIVATION

Drug-target interactions (DTIs) play a pivotal role in drug discovery, where the aim is to identify potential drug targets and elucidate their mechanisms of action. In recent years, the application of natural language processing (NLP), particularly when combined with pre-trained language models, has gained considerable momentum in the biomedical domain, with the potential to mine vast amounts of text and thereby facilitate the efficient extraction of DTIs from the literature.

RESULTS

In this article, we approach DTI extraction as an entity-relationship extraction problem, utilizing different pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach, combining gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD), is critical for achieving optimal performance. The proposed model achieves an F1 score of 80.6 on the hidden DrugProt test set, the top-ranked performance among all models submitted to the official evaluation. Furthermore, we conduct a comparative analysis of gene textual descriptions sourced from the Entrez Gene and UniProt databases to gain insights into their impact on performance. Our findings highlight the potential of NLP-based text mining using gene and chemical descriptions to improve drug-target extraction tasks.
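The description-augmented input described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the entity-marker format, helper name, and example descriptions are assumptions for demonstration only. The key idea is that the sentence containing the tagged chemical and gene mentions is concatenated with external textual descriptions of those entities before being fed to a BERT-style relation classifier.

```python
def build_model_input(sentence: str, drug: str, gene: str,
                      chem_desc: str, gene_desc: str) -> str:
    """Mark the two entity mentions in the sentence, then append their
    external descriptions (e.g. from CTD and Entrez Gene), separated by
    [SEP] tokens as a BERT-style sequence classifier would expect.
    The marker format is a hypothetical convention for this sketch."""
    marked = (sentence
              .replace(drug, f"@CHEMICAL$ {drug} @/CHEMICAL$")
              .replace(gene, f"@GENE$ {gene} @/GENE$"))
    return f"{marked} [SEP] {chem_desc} [SEP] {gene_desc}"

# Hypothetical example sentence and descriptions:
example = build_model_input(
    sentence="Imatinib potently inhibits ABL1 kinase activity.",
    drug="Imatinib",
    gene="ABL1",
    chem_desc="Imatinib is a tyrosine kinase inhibitor.",
    gene_desc="ABL1 encodes a protein tyrosine kinase.",
)
print(example)
```

In a real pipeline the returned string would be tokenized and classified into one of the DrugProt relation types; here it only shows how the descriptions extend the model's context beyond the sentence itself.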

AVAILABILITY AND IMPLEMENTATION

Datasets utilized in this study are accessible at https://dtis.drugtargetcommons.org/.
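The ensembling step mentioned in the Results can be illustrated with a minimal probability-averaging sketch. Averaging per-class probabilities across models is one common ensembling scheme; the paper's exact combination rule may differ, and the class labels below are hypothetical.

```python
from typing import List

def ensemble_predict(prob_lists: List[List[float]]) -> int:
    """Average class probabilities from several relation classifiers
    (e.g. models trained with different entity descriptions) and
    return the index of the highest-scoring class."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

# Three hypothetical models scoring classes [NONE, INHIBITOR, AGONIST]:
best_class = ensemble_predict([
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.4, 0.4, 0.2],
])
print(best_class)  # 1: INHIBITOR has the highest averaged probability
```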


Figures 1-8 (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/d99643046999/vbae106f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/eb7dffcf9edf/vbae106f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/441045374b2a/vbae106f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/24367c92d8d8/vbae106f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/666c53a78f0f/vbae106f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/bd323484ce89/vbae106f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/f7757b1a1640/vbae106f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d15e/11293871/5529c1fdbf88/vbae106f8.jpg

Similar articles

1
Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.
Bioinform Adv. 2024 Jul 22;4(1):vbae106. doi: 10.1093/bioadv/vbae106. eCollection 2024.
2
Chemical-protein relation extraction with ensembles of carefully tuned pretrained language models.
Database (Oxford). 2022 Nov 18;2022. doi: 10.1093/database/baac098.
3
BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
4
Do syntactic trees enhance Bidirectional Encoder Representations from Transformers (BERT) models for chemical-drug relation extraction?
Database (Oxford). 2022 Aug 25;2022. doi: 10.1093/database/baac070.
5
A sequence labeling framework for extracting drug-protein relations from biomedical literature.
Database (Oxford). 2022 Jul 19;2022. doi: 10.1093/database/baac058.
6
BioGPT: generative pre-trained transformer for biomedical text generation and mining.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac409.
7
Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text.
ArXiv. 2023 Dec 13:arXiv:2303.17728v2.
8
Evaluating GPT and BERT models for protein-protein interaction identification in biomedical text.
Bioinform Adv. 2024 Sep 11;4(1):vbae133. doi: 10.1093/bioadv/vbae133. eCollection 2024.
9
BERT-GT: cross-sentence n-ary relation extraction with BERT and Graph Transformer.
Bioinformatics. 2021 Apr 5;36(24):5678-5685. doi: 10.1093/bioinformatics/btaa1087.
10
Bioformer: an efficient transformer language model for biomedical text mining.
ArXiv. 2023 Feb 3:arXiv:2302.01588v1.

Cited by

1
Advances and challenges in drug repurposing in precision therapeutics of colorectal cancer.
World J Gastrointest Oncol. 2025 Jul 15;17(7):107681. doi: 10.4251/wjgo.v17.i7.107681.

References

1
Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical-protein relations.
Database (Oxford). 2023 Nov 28;2023. doi: 10.1093/database/baad080.
2
Annotation of biologically relevant ligands in UniProtKB using ChEBI.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac793.
3
Integrating heterogeneous knowledge graphs into drug-drug interaction extraction from the literature.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac754.
4
Chemical-protein relation extraction with ensembles of carefully tuned pretrained language models.
Database (Oxford). 2022 Nov 18;2022. doi: 10.1093/database/baac098.
5
PubChem 2023 update.
Nucleic Acids Res. 2023 Jan 6;51(D1):D1373-D1380. doi: 10.1093/nar/gkac956.
6
BioGPT: generative pre-trained transformer for biomedical text generation and mining.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac409.
7
A sequence labeling framework for extracting drug-protein relations from biomedical literature.
Database (Oxford). 2022 Jul 19;2022. doi: 10.1093/database/baac058.
8
Using BERT to identify drug-target interactions from whole PubMed.
BMC Bioinformatics. 2022 Jun 21;23(1):245. doi: 10.1186/s12859-022-04768-x.
9
Relation classification via BERT with piecewise convolution and focal loss.
PLoS One. 2021 Sep 10;16(9):e0257092. doi: 10.1371/journal.pone.0257092. eCollection 2021.
10
BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.