Improving protein-protein interaction prediction using protein language model and protein network features.

Affiliations

College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.

Publication information

Anal Biochem. 2024 Oct;693:115550. doi: 10.1016/j.ab.2024.115550. Epub 2024 Apr 26.

Abstract

Interactions between proteins are ubiquitous across a wide variety of biological processes. Accurately identifying protein-protein interactions (PPIs) is important for understanding the mechanisms of protein function and for facilitating drug discovery. Although wet-lab methods are the best way to identify PPIs, they are time-consuming, costly, and labor-intensive. Hence, considerable effort has gone into developing computational methods that improve PPI prediction performance. In this study, we propose a novel hybrid computational method, KSGPPI, that aims to improve PPI prediction by extracting discriminative information from protein sequences and interaction networks. The KSGPPI model comprises two feature extraction modules. In the first module, a large protein language model, ESM-2, is employed to exploit the global complex patterns concealed within protein sequences. Feature representations are then further extracted through CKSAAP, and a two-dimensional convolutional neural network (CNN) is used to capture local information. In the second module, the query protein is matched to a similar protein in the STRING database using the sequence alignment tool NW-align, and a graph embedding feature for the query protein is then derived from the similar protein's interaction network using the Node2vec algorithm. Finally, the features from the two modules are fused and fed into a multilayer perceptron to predict PPIs. Five-fold cross-validation on the benchmark datasets shows that KSGPPI achieves an average prediction accuracy of 88.96%. Additionally, the average Matthews correlation coefficient (0.781) of KSGPPI is significantly higher than those of state-of-the-art PPI prediction methods. The standalone package of KSGPPI is freely available for download at https://github.com/rickleezhe/KSGPPI.
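
The CKSAAP step mentioned in the abstract (composition of k-spaced amino acid pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cksaap`, the maximum gap `k_max=3`, and the handling of non-standard residues are assumptions made for illustration, and the abstract does not state which settings KSGPPI uses. For each gap k, the sketch counts ordered residue pairs separated by k positions and normalizes the counts to frequencies.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(sequence, k_max=3):
    """Return k-spaced pair frequencies for gaps k = 0..k_max as a flat list.

    The result has 400 * (k_max + 1) entries; k_max=3 is an assumed default,
    not a value taken from the KSGPPI paper.
    """
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        total = 0
        # Pair residue i with the residue k positions further along (i + k + 1).
        for i in range(len(sequence) - k - 1):
            pair = sequence[i] + sequence[i + k + 1]
            idx = PAIR_INDEX.get(pair)
            if idx is not None:  # skip pairs with non-standard residues (assumption)
                counts[idx] += 1
                total += 1
        # Normalize counts to frequencies; all zeros if no valid pair exists.
        features.extend(c / total if total else 0.0 for c in counts)
    return features
```

Calling `cksaap("ACAC", k_max=1)` returns 800 values: the first 400 hold the frequencies of adjacent pairs, the next 400 those of pairs separated by one residue. One plausible way to feed such features to a two-dimensional CNN is to reshape each block of 400 values into a 20 x 20 grid, although whether KSGPPI does this is not stated in the abstract.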
