Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction.

Author information

Kim QHwan, Ko Joon-Hyuk, Kim Sunghoon, Park Nojun, Jhe Wonho

Affiliation

Department of Physics and Astronomy, Institute of Applied Physics, Seoul National University, Gwanak-gu, Seoul 08826, Republic of Korea.

Publication information

Bioinformatics. 2021 Oct 25;37(20):3428-3435. doi: 10.1093/bioinformatics/btab346.

DOI: 10.1093/bioinformatics/btab346

PMID: 33978713

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8545317/

Abstract

MOTIVATION

Characterizing drug-protein interactions (DPIs) is crucial to high-throughput screening for drug discovery. Deep learning-based approaches have attracted attention because they can predict DPIs without human trial and error. However, because data labeling requires significant resources, the amount of labeled protein data available is relatively small, which limits model performance. Here, we propose two methods for constructing a deep learning framework that achieves superior performance with a small labeled dataset.

RESULTS

First, we use transfer learning to encode protein sequences with a pretrained model, which learns general sequence representations in an unsupervised manner. Second, we use a Bayesian neural network to build a robust model by estimating the data uncertainty. The resulting model outperforms previous baselines at predicting interactions between molecules and proteins. We also show that the uncertainty quantified by Bayesian inference is related to prediction confidence and can be used to screen DPI data points.
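
As an illustration of how a quantified uncertainty can be used for screening, the following is a minimal sketch and not the authors' implementation. The abstract does not state how the Bayesian inference is approximated, so the sketch assumes a model that outputs a mean and a log-variance for each drug-protein pair over several stochastic forward passes (for example with dropout kept active at prediction time). All function names are hypothetical.

```python
import numpy as np


def gaussian_nll(y_true, mean, log_var):
    """Heteroscedastic Gaussian negative log-likelihood (constant term dropped).

    Assumption: the network predicts a mean and a log-variance per sample;
    the log-variance plays the role of a learned data uncertainty.
    """
    y_true, mean, log_var = (np.asarray(a, dtype=float) for a in (y_true, mean, log_var))
    return float(np.mean(0.5 * (log_var + (y_true - mean) ** 2 / np.exp(log_var))))


def decompose_uncertainty(means, log_vars):
    """Combine T stochastic forward passes, each of shape (N,).

    Returns the averaged prediction, the data (aleatoric) uncertainty and
    the model (epistemic) uncertainty, each of shape (N,).
    """
    means = np.asarray(means, dtype=float)        # shape (T, N)
    log_vars = np.asarray(log_vars, dtype=float)  # shape (T, N)
    prediction = means.mean(axis=0)
    aleatoric = np.exp(log_vars).mean(axis=0)     # mean predicted variance
    epistemic = means.var(axis=0)                 # spread of the predicted means
    return prediction, aleatoric, epistemic


def screen_by_uncertainty(total_uncertainty, keep_fraction):
    """Return indices of the most confident predictions, lowest uncertainty first."""
    total_uncertainty = np.asarray(total_uncertainty, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(total_uncertainty))))
    return np.argsort(total_uncertainty)[:n_keep]


if __name__ == "__main__":
    # Toy numbers standing in for model outputs: 3 passes over 4 drug-protein pairs.
    means = [[5.0, 6.1, 7.0, 4.2], [5.1, 6.9, 7.1, 4.0], [4.9, 5.3, 6.9, 4.1]]
    log_vars = [[-2.0, -0.5, -2.0, -1.0]] * 3
    pred, alea, epi = decompose_uncertainty(means, log_vars)
    print(screen_by_uncertainty(alea + epi, keep_fraction=0.5))
```

Ranking by the sum of the two uncertainty terms is one simple choice; ranking by either term alone is equally possible.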

AVAILABILITY AND IMPLEMENTATION

The code is available at https://github.com/QHwan/PretrainDPI.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f7f/8545317/400d64663984/btab346f1.jpg
