School of Artificial Intelligence, Jilin University, Jilin, China.
Information Science and Technology, Northeast Normal University, Jilin, China.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab288.
Mitochondria are membrane-bound organelles containing over 1000 different proteins involved in mitochondrial function, gene expression and metabolic processes. Accurate localization of these proteins in the mitochondrial compartments is critical to their operation. A few computational methods have been developed for predicting submitochondrial localization from protein sequences. Unfortunately, most of these methods rely on biological features or evolutionary information to extract sequence features, which greatly limits the performance of subsequent identification. Moreover, the efficiency of most computational models remains underexplored, especially with respect to deep learning features, which are promising but require improvement. To address these limitations, we propose a novel computational method called iDeepSubMito to predict the localization of mitochondrial proteins to the submitochondrial compartments. First, we adopted a coding scheme that uses ProteinELMo to model the probability distribution over protein sequences and then represents the sequences as continuous vectors. Then, we proposed and implemented a convolutional neural network architecture based on a bidirectional LSTM with a self-attention mechanism to effectively explore contextual information and protein sequence semantic features. To demonstrate the effectiveness of iDeepSubMito, we performed cross-validation on two datasets containing 424 and 570 proteins, respectively, each covering four mitochondrial compartments (matrix, inner membrane, outer membrane and intermembrane regions). Experimental results revealed that our method outperformed other computational methods. In addition, we tested iDeepSubMito on the M187, M983 and MitoCarta3.0 datasets to further verify the efficiency of our method. Finally, motif analysis and interpretability analysis were conducted to reveal novel insights into the subcellular biological functions of mitochondrial proteins. The iDeepSubMito source code is available on GitHub at https://github.com/houzl3416/iDeepSubMito.
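To make the self-attention step more concrete, the following is a minimal sketch in plain Python and is not the iDeepSubMito implementation. It assumes each protein is already a list of per-residue vectors (random numbers stand in for ProteinELMo embeddings here), omits the convolutional and bidirectional LSTM layers entirely, uses scaled dot-product self-attention with identity projections, and classifies with untrained random weights. All function names are hypothetical.

```python
import math
import random

# Compartment names taken from the abstract.
COMPARTMENTS = ["matrix", "inner membrane", "outer membrane", "intermembrane region"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(residues):
    """Scaled dot-product self-attention with identity projections (Q = K = V).

    Assumption: a trained model would apply learned projections first.
    """
    dim = len(residues[0])
    attended = []
    for query in residues:
        # Attention weights of this residue over every residue in the sequence.
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in residues])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, residues)) for i in range(dim)]
        )
    return attended


def mean_pool(vectors):
    """Average per-residue vectors into one fixed-length protein vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict_compartment(residues, class_weights, class_biases):
    """Return (label, probabilities) for one protein."""
    pooled = mean_pool(self_attention(residues))
    logits = [dot(row, pooled) + b for row, b in zip(class_weights, class_biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return COMPARTMENTS[best], probs


def toy_example(seed=0, length=12, dim=8):
    """Random stand-ins for the embeddings and classifier weights (hypothetical)."""
    rng = random.Random(seed)
    residues = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(length)]
    weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in COMPARTMENTS]
    biases = [0.0] * len(COMPARTMENTS)
    return predict_compartment(residues, weights, biases)
```

Calling `toy_example()` returns one of the four compartment labels together with a probability for each compartment. Because the weights are random, the prediction carries no biological meaning; the sketch only shows how attention lets every residue contribute context to every other residue before a fixed-length vector is classified.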