Jiao Ya-Sen, Du Pu-Feng
School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
J Theor Biol. 2017 Mar 7;416:81-87. doi: 10.1016/j.jtbi.2016.12.026. Epub 2017 Jan 8.
The prediction of protein submitochondrial locations has been studied for about ten years, and about a dozen methods have been developed for this purpose. Although a mitochondrion has four submitochondrial compartments, all existing studies considered only three of them; mitochondrial intermembrane space proteins were always excluded. However, the recent release of the UniProt database contains over 50 mitochondrial intermembrane space proteins, and we think it is time to incorporate these proteins into the prediction of protein submitochondrial locations. We propose the functional domain enrichment score, which can be used to enhance our positional-specific physicochemical properties method. We constructed a high-quality working dataset from the UniProt database. This dataset contains proteins from all four submitochondrial locations, including proteins with multiple submitochondrial locations. On this dataset, our method achieved over 70% prediction accuracy for proteins with a single location. On the M3-317 benchmarking dataset, our method achieved prediction performance comparable to that of other state-of-the-art methods. Our results indicate that intermembrane space proteins can be incorporated into the prediction of protein submitochondrial locations. By evaluating our method on proteins that have multiple submitochondrial locations, we conclude that it is capable of predicting multiple submitochondrial locations. This is the first report of an ab initio method that can identify intermembrane space proteins, and also the first attempt to incorporate proteins with multiple submitochondrial locations. The benchmarking dataset can be obtained by emailing the corresponding author.
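The abstract does not define the functional domain enrichment score, so the following is only an illustrative sketch of how a score of this general kind could be computed; it is not the authors' formulation. It assumes each training protein is annotated with a set of functional domain identifiers and one or more submitochondrial locations, and it scores a domain for a location as the log2 ratio of its smoothed frequency within that location to its smoothed overall frequency. The function names, the pseudocount smoothing, the averaging step, and the domain identifiers used below are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# The four submitochondrial compartments named in the abstract.
LOCATIONS = ["inner membrane", "intermembrane space", "matrix", "outer membrane"]


def domain_enrichment(training, pseudocount=1.0):
    """Hypothetical enrichment score, NOT the paper's definition.

    training: list of (domains, locations) pairs, each a set of strings.
    Returns {(domain, location): log2(freq in location / overall freq)}.
    """
    n_total = len(training)
    n_loc = Counter()               # proteins per location
    dom_total = Counter()           # proteins carrying each domain
    dom_loc = defaultdict(Counter)  # per location: proteins carrying each domain
    for domains, locations in training:
        for loc in locations:       # multi-location proteins count once per location
            n_loc[loc] += 1
        for dom in domains:
            dom_total[dom] += 1
            for loc in locations:
                dom_loc[loc][dom] += 1

    scores = {}
    for loc in LOCATIONS:
        for dom in dom_total:
            # Pseudocounts keep the ratio finite for unseen (domain, location) pairs.
            in_loc = (dom_loc[loc][dom] + pseudocount) / (n_loc[loc] + 2 * pseudocount)
            overall = (dom_total[dom] + pseudocount) / (n_total + 2 * pseudocount)
            scores[(dom, loc)] = math.log2(in_loc / overall)
    return scores


def protein_features(domains, scores):
    """Average the scores of a protein's known domains, one value per location."""
    features = []
    for loc in LOCATIONS:
        values = [scores[(dom, loc)] for dom in domains if (dom, loc) in scores]
        features.append(sum(values) / len(values) if values else 0.0)
    return features


# Toy data with made-up domain identifiers.
toy_training = [
    ({"DOM_A"}, {"matrix"}),
    ({"DOM_A"}, {"matrix"}),
    ({"DOM_B"}, {"inner membrane"}),
    ({"DOM_B"}, {"inner membrane"}),
]
toy_scores = domain_enrichment(toy_training)
toy_features = protein_features({"DOM_A"}, toy_scores)
```

In this sketch the resulting four-element vector could be appended to sequence-derived features before classification. Because a protein with several locations contributes to every one of them, the same counting scheme extends to multi-location data, although how the paper actually handles such proteins is not stated in the abstract.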