Zhou Xiaozhou, Song Haoyu, Li Jingyuan
Zhejiang Province Key Laboratory of Quantum Technology and Device, Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, Zhejiang, China.
J Phys Chem B. 2022 Mar 3;126(8):1719-1727. doi: 10.1021/acs.jpcb.1c10525. Epub 2022 Feb 16.
Studying protein-protein interactions (PPIs) is essential for understanding protein function. However, transient PPIs remain difficult to investigate experimentally, so computational PPI prediction is drawing growing attention. Statistics-based features are widely used in protein structure prediction and protein folding studies, but because experimental PPI data are scarce, conventional statistical features are difficult to construct for PPI prediction, and such features have seen very limited use in this field. In this paper, we explore the application of frustration, a statistical potential, to PPI prediction. The frustration index of a residue pair is obtained by comparing the extra stabilization energy contributed by that pair in the native protein with the statistics of the energies. Counting the residue pairs with a high frustration index then yields the highly frustrated density, a residue-frustration-based feature that describes the tendency of residues to be involved in PPIs. The highly frustrated density, together with structure-based features, was used to describe protein residues and was combined with a long short-term memory (LSTM) neural network to predict PPI residue pairs. Our model correctly predicted 75% of dimers when only the top 2‰ of residue pairs were selected in each dimer. Because it relies on statistics-based features, our model differs significantly from models based on the chemical features of residues. We found that frustration effectively describes the tendency of residues to be involved in PPIs, and that frustration-based features can replace chemical features in combination with machine learning to achieve better PPI prediction performance. These results reveal the great potential of statistical potentials such as frustration in PPI prediction.
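The two quantities described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: the frustration index is taken to be a z-score of the native pair energy against a set of reference ("decoy") energies, with lower energy meaning more stable; the cutoff of -1 for "highly frustrated" is a placeholder; and the density is taken to be a simple per-residue count of highly frustrated pairs. The function names `frustration_index` and `highly_frustrated_density` are hypothetical.

```python
import numpy as np


def frustration_index(native_energy, decoy_energies):
    """Z-score of a residue pair's native energy against reference energies.

    Assumed convention: lower energy is more stable, so a negative index
    means the native pair is less stable than the reference average.
    """
    decoys = np.asarray(decoy_energies, dtype=float)
    spread = decoys.std()
    if spread == 0.0:
        raise ValueError("reference energies must not all be identical")
    return (decoys.mean() - native_energy) / spread


def highly_frustrated_density(n_residues, pairs, indices, cutoff=-1.0):
    """Count, for each residue, the highly frustrated pairs it takes part in.

    `cutoff` is a placeholder threshold (an assumption, not from the paper):
    a pair counts as highly frustrated when its index falls below it.
    """
    density = np.zeros(n_residues, dtype=int)
    for (i, j), index in zip(pairs, indices):
        if index < cutoff:
            density[i] += 1
            density[j] += 1
    return density


# Toy usage with made-up energies for three residue pairs.
pairs = [(0, 1), (1, 2), (0, 2)]
indices = [-1.5, 0.5, -2.0]
density = highly_frustrated_density(3, pairs, indices)
```

In a pipeline like the one the abstract outlines, a per-residue value of this kind would sit alongside structure-based features as input to the LSTM; how the features are actually assembled is not specified in the abstract.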