Kumar Manish, Verma Ruchi, Raghava Gajendra P S
Institute of Microbial Technology, Sector 39-A, Chandigarh 160036, India.
J Biol Chem. 2006 Mar 3;281(9):5357-63. doi: 10.1074/jbc.M511061200. Epub 2005 Dec 8.
Mitochondria are among the core organelles of eukaryotic cells; hence, predicting mitochondrial proteins is one of the major challenges in genome annotation. This study describes MitPred, a method developed for predicting mitochondrial proteins with high accuracy. The data set was obtained from Guda, C., Fahy, E. & Subramaniam, S. (2004) Bioinformatics 20, 1785-1794. First, support vector machine-based modules were developed using the amino acid composition and the dipeptide composition of proteins; these achieved accuracies of 78.37% and 79.38%, respectively. Accuracy improved further to 83.74% when the split amino acid composition of proteins (25 N-terminal, 25 C-terminal, and remaining residues) was used. Next, a BLAST search was combined with the support vector machine-based method, giving 88.22% accuracy. Finally, we developed a hybrid approach that combines hidden Markov model profiles of domains found exclusively in mitochondrial proteins with the support vector machine-based method. With this approach we were able to predict mitochondrial proteins with 100% specificity at 56.36% sensitivity, and with 80.50% specificity at 98.95% sensitivity. The method estimated that 9.01, 6.35, 4.84, 3.95, and 4.25% of proteins are mitochondrial in the Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, mouse, and human proteomes, respectively. MitPred is based on this hybrid approach.
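The three sequence encodings named in the abstract (amino acid composition, dipeptide composition, and split amino acid composition) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the use of percentages as the normalization, the ordering of the feature blocks, and the handling of sequences shorter than 50 residues are all assumptions made for illustration, and the support vector machine training step is not shown.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return 20 values: the percentage of each standard amino acid in seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 values: the percentage of each overlapping residue pair."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1  # number of overlapping pairs in the sequence
    if total <= 0:
        return [0.0] * len(counts)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [100.0 * c / total for c in counts.values()]


def split_aa_composition(seq, n_term=25, c_term=25):
    """Return 60 values: composition of the N-terminal part, the C-terminal
    part, and the remaining residues, concatenated in that order.

    Assumption: for short sequences the N-terminal part is filled first,
    then the C-terminal part; an empty part yields 20 zeros.
    """
    seq = seq.upper()
    head = seq[:n_term]
    rest = seq[n_term:]
    tail = rest[-c_term:] if c_term > 0 else ""
    middle = rest[:len(rest) - len(tail)]
    return aa_composition(head) + aa_composition(tail) + aa_composition(middle)
```

Each function returns a fixed-length numeric vector (20, 400, and 60 values, respectively) that could be passed to a classifier. Splitting the sequence lets the terminal regions contribute their own composition rather than being averaged into the whole protein.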