School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario L8S 4L8, Canada.
J Chem Inf Model. 2022 Jan 24;62(2):258-273. doi: 10.1021/acs.jcim.1c00982. Epub 2022 Jan 10.
Protein-protein interactions (PPIs) provide a physical basis of molecular communication for a wide range of biological processes in living cells. Establishing the PPI network has become a fundamental and essential task for better understanding biological events and disease pathogenesis. Although many machine learning algorithms have been employed to predict PPIs, models trained with only protein sequence information as features suffer from low robustness and low prediction accuracy. In this study, a new deep-learning-based framework named the Structural Gated Attention Deep (SGAD) model was proposed to improve the performance of PPI network reconstruction (PINR). The improved predictive performance was achieved by combining multiple protein sequence descriptors with the topological features and information flow of the PPI network, together with a gating mechanism that improves robustness to noise. On 11 independent test data sets and one combined data set, SGAD yielded area under the curve values of approximately 0.83-0.93, outperforming other models. Furthermore, the SGAD ensemble can learn more characteristic information about protein pairs through a two-layer neural network, serving as a powerful tool in the exploration of PPI biological space.
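The abstract names a gating mechanism but gives no equations for it. As a rough illustration only, and not the SGAD architecture itself, the Python sketch below shows one common form of gated fusion: a sigmoid gate computed from the concatenated inputs decides, per dimension, how much of a sequence-derived feature versus a network-topology feature to pass on. The function and parameter names (gated_fusion, seq_feat, topo_feat, gate_weights, gate_bias) and the assumption that both feature vectors have the same length are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    seq_feat: Sequence[float],
    topo_feat: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Blend two equal-length feature vectors with a learned gate.

    Hypothetical illustration, not the published model. For each
    dimension i:
        g_i     = sigmoid(gate_weights[i] . [seq_feat; topo_feat] + gate_bias[i])
        fused_i = g_i * seq_feat[i] + (1 - g_i) * topo_feat[i]
    """
    if len(seq_feat) != len(topo_feat):
        raise ValueError("seq_feat and topo_feat must have the same length")

    # The gate sees both inputs at once (concatenation).
    concat = list(seq_feat) + list(topo_feat)

    fused = []
    for i, (s, t) in enumerate(zip(seq_feat, topo_feat)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)
        # g near 1 keeps the sequence feature; g near 0 keeps the topology feature.
        fused.append(g * s + (1.0 - g) * t)
    return fused


if __name__ == "__main__":
    seq = [1.0, 2.0]
    topo = [3.0, 6.0]
    zero_weights = [[0.0] * 4 for _ in range(2)]
    # With zero weights and zero bias the gate is 0.5, so the result is the mean.
    print(gated_fusion(seq, topo, zero_weights, [0.0, 0.0]))  # [2.0, 4.0]
```

In a trained model the gate weights and bias would be learned, so a noisy input channel can be down-weighted for each protein pair rather than by a fixed ratio. That is one plausible reading of how a gate could improve robustness to noise, not a claim about how SGAD implements it.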