GeneralizedDTA: combining pre-training and multi-task learning to predict drug-target binding affinity for unknown drug discovery.

Affiliations

Faculty of Information Technology, Beijing University of Technology, No. 100, Pingleyuan, Chaoyang District, Beijing, 100124, China.

Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing University of Technology, No. 100, Pingleyuan, Chaoyang District, Beijing, 100124, China.

Publication information

BMC Bioinformatics. 2022 Sep 7;23(1):367. doi: 10.1186/s12859-022-04905-6.

DOI:10.1186/s12859-022-04905-6
PMID:36071406
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9449940/
Abstract

BACKGROUND

Accurately predicting drug-target binding affinity (DTA) in silico plays an important role in drug discovery. Most computational methods developed for predicting DTA use machine learning models, especially deep neural networks, and depend on large-scale labelled data. However, it is difficult to learn adequate feature representations for tens of millions of compounds and hundreds of thousands of proteins based only on relatively limited labelled drug-target data. A large number of unknown drugs never appear in the labelled drug-target data. This is an out-of-distribution problem in biomedicine. Some recent studies adopted self-supervised pre-training tasks to learn structural information from amino acid sequences and thereby enhance the feature representation of proteins. However, the task gap between pre-training and DTA prediction causes catastrophic forgetting, which hinders the full use of the learned feature representation in DTA prediction and seriously affects the generalization capability of models for unknown drug discovery.

RESULTS

To address these problems, we propose GeneralizedDTA, a new DTA prediction model oriented to unknown drug discovery that combines pre-training and multi-task learning. We introduce self-supervised protein and drug pre-training tasks to learn richer structural information from the amino acid sequences of proteins and the molecular graphs of drug compounds, in order to alleviate the high variance caused by encoding with deep neural networks and to accelerate the convergence of the prediction model on small-scale labelled data. We also develop a multi-task learning framework with a dual adaptation mechanism to narrow the task gap between pre-training and prediction, preventing overfitting and improving the generalization capability of the DTA prediction model in unknown drug discovery. To validate the effectiveness of our model, we construct an unknown drug data set to simulate the scenario of unknown drug discovery. Compared with existing DTA prediction models, the experimental results show that our model has higher generalization capability in the DTA prediction of unknown drugs.
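The abstract does not say how the unknown drug data set is built. As a minimal sketch of the general idea only, the snippet below holds out whole drugs so that no test drug ever appears in training. The `Pair` record, the `split_by_unknown_drugs` function, the `holdout_fraction` parameter, and the toy SMILES strings and affinity values are all illustrative assumptions, not the authors' procedure or data.

```python
import random
from collections import namedtuple

# Hypothetical record type: one labelled drug-target pair and its affinity.
Pair = namedtuple("Pair", ["drug", "target", "affinity"])


def split_by_unknown_drugs(pairs, holdout_fraction=0.2, seed=0):
    """Split pairs so that every drug in the test set is absent from training.

    This is an assumed "cold drug" split, not the paper's exact protocol.
    """
    drugs = sorted({p.drug for p in pairs})  # sort first for reproducibility
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_holdout = max(1, int(len(drugs) * holdout_fraction))
    unknown = set(drugs[:n_holdout])  # drugs treated as "unknown"
    train = [p for p in pairs if p.drug not in unknown]
    test = [p for p in pairs if p.drug in unknown]
    return train, test


# Toy data: SMILES strings and affinity values are made up for illustration.
pairs = [
    Pair("CCO", "P1", 5.1),
    Pair("CCO", "P2", 6.3),
    Pair("CCN", "P1", 4.8),
    Pair("c1ccccc1", "P2", 7.2),
    Pair("CC(=O)O", "P1", 5.9),
    Pair("CC(=O)O", "P3", 6.0),
]

train, test = split_by_unknown_drugs(pairs, holdout_fraction=0.25)
train_drugs = {p.drug for p in train}
test_drugs = {p.drug for p in test}
print("held-out drugs:", sorted(test_drugs))
print("overlap with training drugs:", train_drugs & test_drugs)
```

Splitting by drug rather than by pair is what makes the evaluation out-of-distribution: a random split over pairs would let the same compound appear on both sides.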

CONCLUSIONS

The advantages of our model are mainly attributed to the two kinds of pre-training tasks and the multi-task learning framework, which learn richer structural information about proteins and drugs from large-scale unlabelled data and then effectively integrate it into the downstream prediction task to obtain high-quality DTA predictions in unknown drug discovery.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/564e/9450265/e722c1364715/12859_2022_4905_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/564e/9450265/69245b166cfd/12859_2022_4905_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/564e/9450265/a74c17c11262/12859_2022_4905_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/564e/9450265/28d019bc42d1/12859_2022_4905_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/564e/9450265/b65f47b1fb60/12859_2022_4905_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/564e/9450265/62e5fbbd9e13/12859_2022_4905_Fig6_HTML.jpg
