Teimouri Hamid, Medvedeva Angela, Kolomeisky Anatoly B
Department of Chemistry, Rice University, Houston, Texas, United States.
Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States.
bioRxiv. 2024 Mar 1:2024.02.27.582345. doi: 10.1101/2024.02.27.582345.
The ability to accurately predict protein-protein interactions is critically important for understanding major cellular processes. However, current experimental and computational approaches for identifying these interactions are technically challenging and have had limited success. We propose a new computational method for predicting protein-protein interactions using only primary sequence information. It uses the concept of physical-chemical similarity to determine which interactions are most likely to occur. In our approach, the physical-chemical features of proteins from different organisms are extracted using bioinformatics tools and then used in a machine-learning method that identifies successful protein-protein interactions via correlation analysis. We find that, for all studied organisms, the property that correlates most strongly with protein-protein interactions is the dipeptide amino acid composition. The analysis is specifically applied to the bacterial two-component system, which comprises histidine kinases and transcriptional response regulators. Our theoretical approach provides a simple and robust method for quantifying the important details of complex mechanisms of biological processes.
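To make the central feature concrete, the following is a minimal sketch of how a dipeptide composition vector can be computed from a primary sequence and how two such vectors can be compared by Pearson correlation. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the feature-extraction tools, the machine-learning model, or the exact correlation measure, and the function names `dipeptide_composition`, `pearson`, and `similarity_score` are hypothetical.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids and all 400 ordered dipeptides built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for one sequence.

    Overlapping adjacent pairs are counted; pairs containing a
    non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def similarity_score(seq_a, seq_b):
    """Assumed similarity measure: correlation of the two composition vectors."""
    return pearson(dipeptide_composition(seq_a), dipeptide_composition(seq_b))
```

A call such as `similarity_score(kinase_seq, regulator_seq)`, where both arguments are amino acid strings supplied by the user, returns a value between -1 and 1. How such a score would be thresholded or combined with other physical-chemical features is not stated in the abstract and is left open here.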