Institute of Technical Biology & Agriculture Engineering, Chinese Academy of Sciences, Science Island, Hefei City, Anhui Province 230031, China; Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Science Island, Hefei City, Anhui Province 230031, China; University of Science and Technology of China, Hefei City, Anhui Province 230026, China.
Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Science Island, Hefei City, Anhui Province 230031, China.
Math Biosci. 2019 Jul;313:41-47. doi: 10.1016/j.mbs.2019.04.002. Epub 2019 Apr 25.
Protein-protein interactions (PPIs) play a crucial role in the life-sustaining activities of organisms. Although various methods for predicting PPIs have been developed over the past decades, their robustness and prediction accuracy still need improvement. In this paper, we propose a new sequence-based approach that combines a deep neural network (DNN) with conjoint triad auto covariance (CTAC) encoding. CTAC combines the advantages of the conjoint triad and auto covariance encodings and can therefore extract more PPI-relevant information from amino acid sequences. On the human dataset, the resulting model, DNNCTAC, achieved an accuracy of 98.37%, a recall of 99.41%, an area under the curve (AUC) of 99.24%, and a loss of 22.7%. These results indicate that DNNCTAC can improve the accuracy of PPI prediction and may be a useful complement to future proteomics research. The source code and all datasets are available at https://github.com/smalltalkman/hppi-tensorflow.
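The abstract names the two ingredients of CTAC but does not define how they are combined, so the following is only an illustrative sketch of each ingredient, not the authors' encoding. It assumes the seven residue classes commonly used for conjoint triad descriptors, plain relative frequencies as normalization, and Kyte-Doolittle hydropathy as a stand-in physicochemical property. The `encode` helper and its simple concatenation are hypothetical.

```python
from itertools import product

# Seven residue classes commonly used for conjoint triad encoding
# (grouped by dipole and side-chain volume). Assumed, not from the abstract.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# Kyte-Doolittle hydropathy, used only as a stand-in per-residue property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def conjoint_triad(seq):
    """Return 343 relative frequencies of class triads over consecutive residues."""
    counts = {triad: 0 for triad in product(range(7), repeat=3)}
    classes = [CLASS_OF[aa] for aa in seq if aa in CLASS_OF]
    for i in range(len(classes) - 2):
        counts[tuple(classes[i:i + 3])] += 1
    total = max(len(classes) - 2, 1)  # avoid division by zero on short input
    return [counts[triad] / total for triad in sorted(counts)]


def auto_covariance(values, max_lag):
    """Return the auto covariance of a numeric sequence for lags 1..max_lag.

    The sequence must be longer than max_lag.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode(seq, max_lag=3):
    """Hypothetical descriptor: triad frequencies followed by auto covariance terms."""
    hydropathy = [HYDROPATHY[aa] for aa in seq if aa in HYDROPATHY]
    return conjoint_triad(seq) + auto_covariance(hydropathy, max_lag)


features = encode("MKTAYIAKQR")
print(len(features))
```

In a pairwise setting, one such vector per protein would typically be computed and the two vectors joined before being passed to a classifier; how DNNCTAC actually does this is documented in the linked repository rather than in the abstract.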