School of Basic Medical Sciences, Anhui Medical University, 81 Meishan Road, Shushan District, Hefei 230032, China.
State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, 38 Life Science Park, Changping District, Beijing 102206, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae274.
Protein-protein interactions (PPIs) are the basis of many important biological processes, and protein complexes are the key form in which these interactions are carried out. Understanding protein complexes and their functions is critical for elucidating the mechanisms of life processes, for disease diagnosis and treatment, and for drug development. However, experimental methods for identifying protein complexes have many limitations, so computational methods are needed to predict them. Protein sequences indicate the structure and biological functions of proteins and also determine their ability to bind other proteins, which influences the formation of protein complexes. Integrating these characteristics to predict protein complexes is very promising, but there is currently no effective framework that uses both protein sequence and PPI network topology for complex prediction. To address this challenge, we have developed HyperGraphComplex, a method based on a hypergraph variational autoencoder that captures expressive features from protein sequences without feature engineering while also considering the topological properties of PPI networks. Experimental results demonstrate that HyperGraphComplex achieves satisfactory predictive performance compared with state-of-the-art methods. Further bioinformatics analysis shows that the predicted protein complexes have attributes similar to those of known complexes. Moreover, case studies corroborate the predictive capability of our model in identifying protein complexes, including three complexes that were not only experimentally validated by recent studies but also received high-confidence structural predictions from AlphaFold-Multimer.
We believe that the HyperGraphComplex algorithm and the proteome-wide, high-confidence protein complex prediction dataset we provide will help elucidate how proteins regulate cellular processes in the form of complexes, and will facilitate disease diagnosis and treatment as well as drug development. Source code is available at https://github.com/LiDlab/HyperGraphComplex.
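To make the hypergraph view concrete, the following minimal sketch shows one way a set of protein groups could be encoded as hyperedges and how node features could be smoothed over them. It is an illustration under stated assumptions, not the HyperGraphComplex implementation: the protein names, the helper functions `incidence_matrix` and `hypergraph_propagate`, and the choice of the widely used normalization $D_v^{-1/2} H D_e^{-1} H^{\top} D_v^{-1/2} X$ with unit hyperedge weights are all hypothetical choices made here; the actual architecture is in the linked repository.

```python
from math import sqrt

def incidence_matrix(proteins, hyperedges):
    """Return H where H[i][j] = 1.0 if protein i belongs to hyperedge j."""
    index = {p: i for i, p in enumerate(proteins)}
    H = [[0.0] * len(hyperedges) for _ in proteins]
    for j, members in enumerate(hyperedges):
        for p in members:
            H[index[p]][j] = 1.0
    return H

def hypergraph_propagate(H, X):
    """One smoothing step: Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X.

    Assumption: unit hyperedge weights and no learned parameters.
    """
    n, m, d = len(H), len(H[0]), len(X[0])
    node_deg = [sum(row) for row in H]
    edge_deg = [sum(H[i][j] for i in range(n)) for j in range(m)]

    # Scale each node's features by the inverse square root of its degree.
    scaled = [
        [X[i][k] / sqrt(node_deg[i]) if node_deg[i] else 0.0 for k in range(d)]
        for i in range(n)
    ]
    # Average the scaled member features inside each hyperedge.
    edge_feat = [
        [
            sum(H[i][j] * scaled[i][k] for i in range(n)) / edge_deg[j]
            if edge_deg[j] else 0.0
            for k in range(d)
        ]
        for j in range(m)
    ]
    # Send hyperedge features back to member nodes and rescale.
    return [
        [
            sum(H[i][j] * edge_feat[j][k] for j in range(m)) / sqrt(node_deg[i])
            if node_deg[i] else 0.0
            for k in range(d)
        ]
        for i in range(n)
    ]

# Hypothetical toy data: four proteins, two candidate complexes.
proteins = ["A", "B", "C", "D"]
hyperedges = [{"A", "B", "C"}, {"C", "D"}]
H = incidence_matrix(proteins, hyperedges)
X = [[1.0], [1.0], [1.0], [1.0]]  # placeholder for sequence-derived features
smoothed = hypergraph_propagate(H, X)
print(smoothed)
```

Unlike an ordinary graph edge, a hyperedge can join any number of proteins, which is why a hypergraph is a natural container for multi-protein complexes. In a real pipeline, `X` would hold learned sequence embeddings rather than constants.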