Lin Peicong, Yan Yumeng, Huang Sheng-You
School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac499.
Protein-protein interactions play an important role in many biological processes. Although structure prediction for monomeric proteins has made great progress with the advent of advanced deep learning algorithms such as AlphaFold, structure prediction for protein-protein complexes remains an open problem. Taking advantage of the Transformer model of ESM-MSA, we have developed a deep learning-based model, named DeepHomo2.0, to predict protein-protein interactions of homodimeric complexes by leveraging direct-coupling analysis (DCA) and Transformer features of sequences together with the structure features of monomers. DeepHomo2.0 was extensively evaluated on diverse test sets and compared with eight state-of-the-art methods, including protein language model-based, DCA-based and machine learning-based methods. For the top 10 predicted contacts on the test sets, DeepHomo2.0 achieved a high precision of >70% with experimental monomer structures and >60% with predicted monomer structures, and it outperformed the other eight methods. Moreover, even the version that uses no structure information, named DeepHomoSeq, still achieved a good precision of >55% for the top 10 predicted contacts. Integrating the predicted contacts into protein docking significantly improved the structure prediction of realistic Critical Assessment of Protein Structure Prediction (CASP) homodimeric complexes. DeepHomo2.0 and DeepHomoSeq are available at http://huanglab.phys.hust.edu.cn/DeepHomo2/.
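The "precision for the top 10 predicted contacts" quoted above is a ranking metric: the fraction of the 10 highest-scoring residue pairs that are true inter-chain contacts. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `top_k_precision`, the dictionary-based input format, and the toy scores are all hypothetical. How a "true contact" is defined (for example, a distance cutoff between residues of the two chains) is not stated in the abstract and is assumed to have been decided beforehand.

```python
def top_k_precision(scores, true_contacts, k=10):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores        -- dict mapping (i, j) residue-index pairs to predicted scores
                     (hypothetical input format, chosen for illustration)
    true_contacts -- set of (i, j) pairs in contact in the reference complex;
                     the contact definition itself is an assumption made upstream
    k             -- number of top-ranked predictions to evaluate
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank pairs from highest to lowest predicted score and keep the first k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    # Divide by the number of pairs actually evaluated (may be fewer than k).
    return hits / len(ranked)


# Toy data, invented purely to exercise the function.
example_scores = {(1, 5): 0.9, (2, 7): 0.8, (3, 3): 0.4, (4, 9): 0.2}
example_true = {(1, 5), (3, 3)}

print(top_k_precision(example_scores, example_true, k=2))
```

With the toy data, the two highest-scoring pairs are (1, 5) and (2, 7), of which only the first is a true contact. This sketch assumes each pair is stored in one consistent orientation; a real evaluation of homodimers would also need to account for the symmetry of the inter-chain contact map.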