School of Computer Science and Engineering, Central South University, Changsha, 410083, People's Republic of China.
Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.
BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):506. doi: 10.1186/s12859-019-3076-y.
Essential proteins are crucial for cellular life, so identifying them is an important and challenging problem for researchers. Many computational approaches have recently been proposed to address it. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem, yet few current shallow machine learning-based methods are designed to handle this class imbalance.
We develop DeepEP, a deep learning framework that combines the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, node2vec is applied to automatically learn topological and semantic features for each protein in the protein-protein interaction (PPI) network. Gene expression profiles are treated as images, and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the class imbalance: in each training epoch it draws the same number of majority-class and minority-class samples, so training is not biased toward either class. The experimental results show that DeepEP outperforms both traditional centrality methods and shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by node2vec contribute substantially to the improved performance, indicating that node2vec effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins.
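The per-epoch balanced sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DeepEP's released code: it assumes binary labels (1 = essential, 0 = non-essential), keeps every minority-class sample in each epoch, and pairs them with a freshly drawn random subset of the majority class of the same size (sampled without replacement). The function names `balanced_epoch_indices` and `balanced_epochs` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence


def balanced_epoch_indices(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Return shuffled sample indices for one class-balanced training epoch.

    Hypothetical helper (assumption: labels are 1 = essential, 0 = non-essential).
    Every sample of the smaller class is kept, and the same number of samples
    is drawn without replacement from the larger class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Whichever class is smaller is treated as the minority class.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    epoch = minority + rng.sample(majority, len(minority))
    rng.shuffle(epoch)  # mix classes so mini-batches are not class-sorted
    return epoch


def balanced_epochs(labels: Sequence[int], n_epochs: int, seed: int = 0) -> Iterator[List[int]]:
    """Yield one balanced index list per epoch; the majority subset is redrawn each time."""
    rng = random.Random(seed)
    for _ in range(n_epochs):
        yield balanced_epoch_indices(labels, rng)


if __name__ == "__main__":
    # Toy imbalanced data: 3 essential vs 10 non-essential proteins.
    labels = [1] * 3 + [0] * 10
    for epoch, idx in enumerate(balanced_epochs(labels, n_epochs=3, seed=7)):
        n_pos = sum(labels[i] for i in idx)
        # Each line reports 3 essential and 3 non-essential samples.
        print(f"epoch {epoch}: {n_pos} essential, {len(idx) - n_pos} non-essential")
```

Because a different majority subset is drawn in every epoch, the model can see most non-essential proteins over the course of training while each individual epoch stays 50/50. This resampling choice belongs to the sketch; the paper's exact sampling scheme may differ in detail.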
We demonstrate that DeepEP improves prediction performance by integrating multiple deep learning techniques with a sampling method, and that it is more effective than the existing methods it was compared against.