Li Feifei, Zhu Fei, Ling Xinghong, Liu Quan
School of Computer Science and Technology, Soochow University, Suzhou, China.
Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China.
Front Bioeng Biotechnol. 2020 May 5;8:390. doi: 10.3389/fbioe.2020.00390. eCollection 2020.
Protein interactions play an essential role in the study of living systems and life phenomena. A considerable body of literature has been published on analyzing and predicting protein interactions using approaches such as support vector machine methods, homology-based methods, and similarity-based methods, each with its own strengths and weaknesses. Most existing methods for predicting protein interactions require prior domain knowledge, which makes it difficult to extract protein features effectively. A single method alone is unsatisfactory for predicting protein interactions, indicating the need for a comprehensive method that combines the advantages of various methods. On this basis, we propose a deep ensemble learning method called EnAmDNN (Ensemble Deep Neural Networks with Attention Mechanism) to predict protein interactions; as an ensemble, it is well suited to comprehensive learning, combining multiple models and drawing on the advantages of various methods. Specifically, it encodes protein sequences using the local descriptor, auto covariance, conjoint triad, and pseudo amino acid composition, and combines these encodings with the vector representation of each protein in the protein interaction network. It then uses multi-layer convolutional neural networks to extract protein features automatically and constructs an attention mechanism to analyze deep-seated relationships between proteins. We set up deep learning models with four different structures. In the ensemble learning model, second-layer data sets are generated from the base learners with five-fold cross-validation, and the protein interaction network is then predicted by combining 16 models. Results on five independent protein-protein interaction (PPI) data sets demonstrate that EnAmDNN achieves better prediction performance than the other methods compared.
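Of the sequence encodings listed above, the conjoint triad is the simplest to illustrate. The sketch below is a minimal Python version under stated assumptions: it uses the commonly cited seven-class grouping of the 20 standard amino acids from the original conjoint triad method, skips non-standard residues, and normalizes triad counts to frequencies. The abstract does not give EnAmDNN's exact preprocessing, so the normalization, the hypothetical function names `conjoint_triad` and `pair_features`, and concatenation as the way of forming a protein-pair input are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Commonly used seven-class grouping of the 20 standard amino acids
# (by dipole and side-chain volume) from the original conjoint triad method.
AA_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(AA_CLASSES) for aa in group}

# All 7**3 = 343 ordered class triads, each mapped to a fixed vector position.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence: str) -> list[float]:
    """Return a 343-dim vector of triad frequencies for one protein sequence.

    Assumption: residues outside the 20 standard amino acids are skipped;
    other implementations may handle them differently.
    """
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * len(TRIAD_INDEX)
    # Slide a window of three consecutive residues and count each class triad.
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    total = sum(counts)
    # Normalize counts to frequencies so sequences of different length are comparable.
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Hypothetical pair input: concatenate two 343-dim encodings (686 dims)."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

For example, `conjoint_triad("AGVC")` maps the sequence to class labels 0, 0, 0, 6, yields the two triads (0, 0, 0) and (0, 0, 6), and returns a vector with 0.5 at positions 0 and 6 and zeros elsewhere.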
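The second-layer data sets mentioned in the abstract follow the usual stacking pattern: each training example is scored by base learners that did not see it during training (out-of-fold predictions), and those scores become the features of the next layer. The Python sketch below shows that pattern with five folds. It is a sketch under assumptions: the two base learners (`nearest_centroid`, `first_feature_threshold`) are hypothetical toy stand-ins for EnAmDNN's deep networks, averaging the fold models' test predictions is one common convention, and the meta-learner that consumes `Z_train` is left out because the abstract does not specify it.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A base learner trains on (X, y) and returns a scoring function x -> P(interaction).
Learner = Callable[[Sequence[Vector], Sequence[int]], Callable[[Vector], float]]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def second_layer_data(learners: Sequence[Learner], X_train: Sequence[Vector],
                      y_train: Sequence[int], X_test: Sequence[Vector],
                      k: int = 5) -> Tuple[List[Vector], List[Vector]]:
    """Build stacking features (one column per base learner).

    Each training row is scored by a model fit on the other k-1 folds;
    each test row gets the average score of the k fold models.
    """
    n, m = len(X_train), len(learners)
    Z_train = [[0.0] * m for _ in range(n)]
    Z_test = [[0.0] * m for _ in X_test]
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        X_fit = [X_train[i] for i in range(n) if i not in held_out]
        y_fit = [y_train[i] for i in range(n) if i not in held_out]
        for j, learner in enumerate(learners):
            score = learner(X_fit, y_fit)
            for i in fold:  # out-of-fold predictions for the held-out rows
                Z_train[i][j] = score(X_train[i])
            for t, x in enumerate(X_test):
                Z_test[t][j] += score(x) / k
    return Z_train, Z_test


def nearest_centroid(X, y):
    """Hypothetical toy learner; assumes both classes occur in every training fold."""
    def centroid(label):
        rows = [x for x, c in zip(X, y) if c == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def score(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        # Closer to the class-1 centroid gives a score nearer 1.
        return d0 / (d0 + d1) if d0 + d1 else 0.5
    return score


def first_feature_threshold(X, y):
    """Hypothetical toy learner: 1.0 if feature 0 exceeds its training mean."""
    mean0 = sum(x[0] for x in X) / len(X)
    return lambda x: 1.0 if x[0] > mean0 else 0.0
```

A meta-learner (for example, logistic regression) would then be fit on `Z_train` with the original labels and applied to `Z_test`. The abstract reports combining 16 models, whereas this sketch uses only two stand-ins.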