Petsalaki Evangelia I, Bagos Pantelis G, Litou Zoi I, Hamodrakas Stavros J
Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens 15701, Greece.
Genomics Proteomics Bioinformatics. 2006 Feb;4(1):48-55. doi: 10.1016/S1672-0229(06)60016-8.
The ability to predict the subcellular localization of a protein from its sequence is of great importance, as it provides information about the protein's function. We present a computational tool, PredSL, which uses neural networks, Markov chains, profile hidden Markov models, and scoring matrices to predict the subcellular localization of proteins in eukaryotic cells from the N-terminal amino acid sequence. It aims to classify proteins into five groups: chloroplast, thylakoid, mitochondrion, secretory pathway, and "other". When tested in a five-fold cross-validation procedure, PredSL achieves 86.7% and 87.1% overall accuracy on the plant and non-plant datasets, respectively. In most cases its results are comparable to those of TargetP, the most widely used method to date, and of LumenP. When tested on the experimentally verified proteins of the Saccharomyces cerevisiae genome, PredSL performs comparably to, if not better than, any available algorithm for the same task. Furthermore, PredSL is the only method capable of predicting these subcellular localizations that is available as a stand-alone application, through the URL: http://bioinformatics.biol.uoa.gr/PredSL/.
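To make the evaluation figures concrete, the sketch below shows one common way to compute an overall accuracy under five-fold cross-validation: the labelled proteins are shuffled into five folds, a model is trained on four folds and used to predict the held-out fold, and the held-out predictions are pooled so that overall accuracy is the fraction of proteins assigned to their correct class. This is a minimal, hypothetical sketch, not PredSL's procedure. It assumes simple random folds (the abstract does not say whether the folds were stratified or homology-reduced), and the names `k_fold_indices`, `cross_validate`, and `majority_class_trainer` are illustrative only. The trainer is a trivial placeholder standing in for PredSL's actual predictors.

```python
import random
from collections import Counter


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of proteins whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists differ in length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def cross_validate(sequences, labels, train_fn, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, and pool all
    held-out predictions into a single overall accuracy."""
    folds = k_fold_indices(len(sequences), k, seed)
    y_true, y_pred = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [(sequences[i], labels[i])
                 for i in range(len(sequences)) if i not in held_out]
        model = train_fn(train)  # returns a callable: sequence -> class label
        for i in test_idx:
            y_true.append(labels[i])
            y_pred.append(model(sequences[i]))
    return overall_accuracy(y_true, y_pred)


def majority_class_trainer(train):
    """Hypothetical baseline: always predict the most common training label
    (a placeholder for a real N-terminal sequence classifier)."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda seq: most_common
```

Pooling the held-out predictions before dividing, rather than averaging per-fold accuracies, weights every protein equally even when the folds differ slightly in size. A real evaluation would replace `majority_class_trainer` with a model over the five classes (chloroplast, thylakoid, mitochondrion, secretory pathway, "other").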