School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China.
Int J Mol Sci. 2020 Aug 9;21(16):5710. doi: 10.3390/ijms21165710.
Mitochondrial proteins are physiologically active in different compartments, and their mislocalization can trigger human mitochondrial pathologies. Correctly identifying submitochondrial locations can therefore inform the study of disease pathogenesis and drug design. A mitochondrion has four submitochondrial compartments: the matrix, the outer membrane, the inner membrane, and the intermembrane space. However, many existing studies ignore the intermembrane space. Most researchers have used traditional machine learning methods to predict mitochondrial protein localization. Those predictors require expert-level knowledge of biology to be encoded as features, rather than allowing the underlying predictor to extract features through a data-driven procedure. Moreover, few researchers have considered class imbalance in the datasets. In this paper, we propose DeepPred-SubMito, a novel end-to-end predictor that employs deep neural networks for protein submitochondrial location prediction. First, we use random over-sampling to reduce the influence of unbalanced datasets. Next, we train a multi-channel bilayer convolutional neural network on multiple subsequences to learn high-level features. Finally, a fully connected layer outputs the prediction. The performance of the predictor is measured by 10-fold cross-validation on the SM424-18 dataset and by 5-fold cross-validation on the SubMitoPred dataset. Experimental results show that the predictor outperforms state-of-the-art predictors. In addition, the prediction results on the M983 dataset confirm its effectiveness in predicting submitochondrial locations.
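The random over-sampling step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_oversample`, the toy sequences, the class labels, and the fixed seed are all assumptions made for illustration. The sketch duplicates randomly chosen minority-class samples until every class matches the size of the largest class.

```python
import random
from collections import Counter, defaultdict


def random_oversample(sequences, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority samples.

    Hypothetical helper for illustration; not taken from DeepPred-SubMito.
    """
    rng = random.Random(seed)

    # Group the sequences by their class label.
    by_class = defaultdict(list)
    for seq, lab in zip(sequences, labels):
        by_class[lab].append(seq)

    # Every class is grown to the size of the largest class.
    target = max(len(seqs) for seqs in by_class.values())

    out_sequences, out_labels = [], []
    for lab, seqs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(seqs) for _ in range(target - len(seqs))]
        for seq in seqs + extra:
            out_sequences.append(seq)
            out_labels.append(lab)
    return out_sequences, out_labels


# Toy, made-up data: three classes with 3, 2, and 1 samples.
toy_sequences = ["MKV", "MLA", "MAT", "MSS", "MQR", "MPL"]
toy_labels = ["inner", "inner", "inner", "matrix", "matrix", "outer"]

balanced_sequences, balanced_labels = random_oversample(toy_sequences, toy_labels)
print(Counter(balanced_labels))
```

When over-sampling is combined with k-fold cross-validation, a common precaution is to over-sample only the training portion of each fold, so that duplicated samples do not appear in both the training and validation splits. The abstract does not state how this was handled.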