

Structure-Aware Multimodal Deep Learning for Drug-Protein Interaction Prediction.

Author information

Wang Penglei, Zheng Shuangjia, Jiang Yize, Li Chengtao, Liu Junhong, Wen Chang, Patronov Atanas, Qian Dahong, Chen Hongming, Yang Yuedong

Affiliations

School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.

School of Data and Computer Science, Sun Yat-Sen University, Guangzhou 510275, China.

Publication information

J Chem Inf Model. 2022 Mar 14;62(5):1308-1317. doi: 10.1021/acs.jcim.2c00060. Epub 2022 Feb 24.

Abstract

Identifying drug-protein interactions (DPIs) is crucial in drug discovery, and a number of machine learning methods have been developed to predict them. Existing methods are usually trained on unrealistic data sets with hidden bias, which limits the accuracy of virtual screening. Meanwhile, most DPI prediction methods focus on molecular representation while paying little attention to protein representation and to high-level associations between different instances. To address these issues, we present STAMP-DPI, a novel structure-aware multimodal deep learning model for DPI prediction, trained on a curated industry-scale benchmark data set. We built this high-quality benchmark data set, named GalaxyDB, for DPI prediction; together with an unbiased training procedure, it yields a more robust benchmark study. For informative protein representation, we constructed a structure-aware graph neural network method that works from the protein sequence by combining predicted contact maps with graph neural networks. By further integrating this structure-based representation with high-level pretrained embeddings of molecules and proteins, our model effectively captures the features of the interactions between them. As a result, STAMP-DPI outperformed state-of-the-art DPI prediction methods, reducing the mean squared error (MSE) by 7.00% on the Davis data set and improving the area under the curve (AUC) by 8.89% on the GalaxyDB data set. Moreover, the model is interpretable through its transformer-based interaction mechanism, which can accurately reveal the binding sites between molecules and proteins.
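The abstract does not say how the predicted contact map is turned into a graph or which graph layer is used. As a minimal sketch under stated assumptions, the general idea of deriving a residue graph from a predicted contact map and propagating per-residue features over it can be illustrated in plain Python. The contact threshold of 0.5, the symmetric-normalized graph convolution, the mean-pooling readout, the toy inputs, and all function names (`contact_map_to_adjacency`, `normalize_adjacency`, `gcn_layer`, `mean_pool`) are hypothetical illustrations, not details taken from STAMP-DPI.

```python
import math
from typing import List

Matrix = List[List[float]]

# ASSUMPTION: 0.5 is an illustrative cut-off, not a value from STAMP-DPI.
CONTACT_THRESHOLD = 0.5


def contact_map_to_adjacency(contact_probs: Matrix,
                             threshold: float = CONTACT_THRESHOLD) -> Matrix:
    """Hypothetical helper: binarize an L x L predicted contact map into a
    symmetric adjacency matrix with self-loops (one node per residue)."""
    n = len(contact_probs)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Symmetrize by taking the larger of the two predicted probabilities.
            p = max(contact_probs[i][j], contact_probs[j][i])
            if i == j or p >= threshold:
                adj[i][j] = 1.0
    return adj


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops keep every degree >= 1."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    n, d_in, d_out = len(h), len(w), len(w[0])
    # Aggregate each residue's features from its predicted contact neighbours.
    agg = [[sum(a_norm[i][k] * h[k][f] for k in range(n)) for f in range(d_in)]
           for i in range(n)]
    # Project with the weight matrix and apply ReLU.
    return [[max(0.0, sum(agg[i][f] * w[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]


def mean_pool(h: Matrix) -> List[float]:
    """Average residue embeddings into one protein-level vector (assumed readout)."""
    n = len(h)
    return [sum(row[f] for row in h) / n for f in range(len(h[0]))]


# Toy example: 4 residues, 2 input features, 3 output features (all made up).
contact_probs = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.1],
    [0.1, 0.8, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
residue_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
weights = [[0.5, -1.0, 0.2], [0.3, 0.4, -0.6]]

adjacency = contact_map_to_adjacency(contact_probs)
a_norm = normalize_adjacency(adjacency)
residue_embeddings = gcn_layer(a_norm, residue_features, weights)
protein_embedding = mean_pool(residue_embeddings)
print(len(protein_embedding))  # 3
```

In this sketch only residue pairs predicted to be in contact exchange information, which is what makes the sequence-derived graph "structure-aware". The mean-pooled vector stands in for whatever readout the actual model uses before combining protein features with molecule and pretrained embeddings in its interaction module.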
