Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan.
J Comput Aided Mol Des. 2013 Jan;27(1):91-103. doi: 10.1007/s10822-012-9628-0. Epub 2013 Jan 3.
The function of a protein is generally related to its subcellular localization, so knowing where a protein resides helps in understanding its potential functions and roles in biological processes. This work develops a hybrid method, called EuLoc, for computationally predicting the subcellular localization of eukaryotic proteins. EuLoc combines hidden Markov models (HMMs), a homology search, and support vector machines (SVMs), and fuses several new features into Chou's pseudo-amino acid composition. The SVM module compensates for the weakness of the homology search approach, which fails when a query protein has only low-homology or non-homologous matches in a database annotated with subcellular localizations. The HMM modules compensate for the weakness of the SVM on localizations for which few protein sequences are available. Several features of a protein sequence are considered, including sequence-based features, biological features derived from PROSITE, NLSdb and Pfam, post-transcriptional modification features, and others. The overall accuracy and location accuracy of EuLoc are 90.5% and 91.2%, respectively, a better predictive performance than that obtained elsewhere. Although the amount of data differs markedly among the subcellular location groups in the benchmark dataset, the accuracies of EuLoc across the 12 subcellular localizations range from 82.5% to 100%, indicating that the tool is much more balanced than other tools and offers high predictive power for each localization. EuLoc is available on the web at http://euloc.mbc.nctu.edu.tw/.
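The abstract reports two figures, an overall accuracy and a location accuracy, without defining them. The sketch below assumes the definitions commonly used in the subcellular localization literature: overall accuracy is the fraction of all proteins predicted correctly, and location accuracy is the unweighted mean of the per-location accuracies. The function name and the toy labels are hypothetical and are not taken from EuLoc or its benchmark dataset.

```python
from collections import Counter


def overall_and_location_accuracy(true_labels, predicted_labels):
    """Return (overall accuracy, location accuracy, per-location accuracies).

    Assumed definitions (not confirmed by the abstract):
      overall accuracy  = correct predictions / all proteins
      location accuracy = unweighted mean of the per-location accuracies
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    # Number of proteins annotated with each true location.
    totals = Counter(true_labels)
    # Number of correctly predicted proteins per true location.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    per_location = {loc: correct[loc] / n for loc, n in totals.items()}
    overall = sum(correct.values()) / len(true_labels)
    location = sum(per_location.values()) / len(per_location)
    return overall, location, per_location


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced toy data: 8 nuclear proteins
    # (7 predicted correctly) and 2 peroxisomal proteins (1 correct).
    true = ["nucleus"] * 8 + ["peroxisome"] * 2
    pred = ["nucleus"] * 7 + ["cytoplasm"] + ["peroxisome", "nucleus"]

    overall, location, per_location = overall_and_location_accuracy(true, pred)
    print(f"overall accuracy:  {overall:.4f}")
    print(f"location accuracy: {location:.4f}")
    print(per_location)
```

On this toy data the overall accuracy is 0.8 while the location accuracy is 0.6875, because the small peroxisome group counts as much as the large nuclear group. Under these assumed definitions, a location accuracy close to the overall accuracy on an imbalanced benchmark is what a balanced predictor would be expected to show.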