Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071.
Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371.
J Chem Inf Model. 2022 Sep 12;62(17):3961-3969. doi: 10.1021/acs.jcim.2c00580. Epub 2022 Aug 30.
Protein-protein interactions (PPIs) are involved in almost all biological processes in the cell. Understanding them is key to understanding biological function and disease and to developing therapeutics. Recently, artificial intelligence (AI) models have demonstrated great power in PPI analysis. However, a key issue for all AI-based PPI models is efficient molecular representation and featurization. Here, we propose, for the first time, a Hom-complex-based PPI representation and Hom-complex-based machine learning models for predicting PPI binding affinity changes upon mutation. In our model, various Hom complexes Hom(G1, G) can be generated for the graph representation G of a protein-protein complex by using different graphs G1, which reveal G1-related inner connections within G. Further, for a specific graph G1, a series of nested Hom complexes is generated to give a multiscale characterization of the PPIs. Their persistent homology and persistent Euler characteristic are used as molecular descriptors and combined with a machine learning model, in particular a gradient boosting tree (GBT). We systematically test our model on the two most commonly used data sets, SKEMPI and AB-Bind. To the best of our knowledge, our model outperforms all existing models on these data sets, which demonstrates its great potential for the analysis of PPIs. Our model can be used for the analysis and design of efficient antibodies for SARS-CoV-2.
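The idea of tracking an Euler characteristic across a series of nested complexes can be illustrated with a minimal sketch. This is not the construction used in the paper: as a stand-in, it assumes a clique complex built on a distance-cutoff graph over a toy point set, and the function names (`clique_complex`, `euler_characteristic`, `euler_curve`) are hypothetical. Raising the cutoff gives nested complexes, and the alternating count of simplices at each cutoff gives one value of the curve, which could then serve as a feature vector for a learner such as a gradient boosting tree.

```python
from itertools import combinations
from math import dist


def clique_complex(points, cutoff, max_dim=3):
    """Simplices (as index tuples) of the clique complex of the graph
    that links every pair of points at distance <= cutoff.
    Assumption: this stands in for the complexes of the abstract."""
    n = len(points)
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= cutoff
    }
    simplices = [(i,) for i in range(n)]  # 0-simplices (vertices)
    for k in range(2, max_dim + 2):  # k vertices span a (k-1)-simplex
        for verts in combinations(range(n), k):
            # keep the vertex set only if all its pairs are linked
            if all(pair in close for pair in combinations(verts, 2)):
                simplices.append(verts)
    return simplices


def euler_characteristic(simplices):
    """Alternating sum over simplices: +1 for even dimension, -1 for odd."""
    return sum((-1) ** (len(s) - 1) for s in simplices)


def euler_curve(points, cutoffs, max_dim=3):
    """Euler characteristic of the nested complexes, one value per cutoff."""
    return [
        euler_characteristic(clique_complex(points, c, max_dim))
        for c in cutoffs
    ]


# Toy input: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
curve = euler_curve(square, cutoffs=[0.5, 1.0, 1.5])
print(curve)  # [4, 0, 1]
```

On this toy input the curve reads 4 (four isolated points), then 0 (the four sides form a cycle), then 1 (all points are linked and the complex is a filled tetrahedron). The brute-force enumeration is only practical for small point sets; a real pipeline would more plausibly rely on a dedicated topological data analysis library.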