Mak Man-Wai, Guo Jian, Kung Sun-Yuan
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong.
IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):416-22. doi: 10.1109/TCBB.2007.70256.
The subcellular locations of proteins are important functional annotations, and proteomics research needs an effective and reliable method for predicting them. This paper introduces a new method, PairProSVM, that automatically predicts the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. PairProSVM was found to outperform methods based on sequence alignment and amino-acid composition, even when most of the homologous sequences have been removed. This paper also demonstrates that the performance of PairProSVM is sensitive (and somewhat proportional) to the degree to which its kernel matrix satisfies Mercer's condition. PairProSVM was evaluated on the protein datasets of Reinhardt and Hubbard, Huang and Li, and Gardy et al. The overall accuracies on these three datasets reach 99.3%, 76.5%, and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and by the other methods compared in this paper.
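The following is a minimal sketch of two ideas in the abstract: forming feature vectors from pairwise scores, and quantifying how closely a kernel matrix meets Mercer's condition (positive semidefiniteness). It is not the authors' implementation. The function names, the toy scoring function `toy_score` (a stand-in for a real profile-alignment score), the linear kernel built on the score vectors, and the violation measure are all assumptions made for illustration.

```python
import numpy as np


def pairwise_score_matrix(items, score):
    """Return S with S[i, j] = score(items[i], items[j])."""
    n = len(items)
    S = np.empty((n, n), dtype=float)
    for i in range(n):
        for j in range(n):
            S[i, j] = score(items[i], items[j])
    return S


def mercer_violation(K, tol=1e-9):
    """Share of total eigenvalue magnitude carried by negative eigenvalues.

    0.0 means the symmetrized matrix is positive semidefinite (Mercer's
    condition holds on this sample); larger values mean a larger violation.
    This measure is an illustrative assumption, not the paper's definition.
    """
    K_sym = 0.5 * (K + K.T)          # eigvalsh expects a symmetric matrix
    eig = np.linalg.eigvalsh(K_sym)
    total = np.abs(eig).sum()
    if total == 0.0:
        return 0.0
    negative = np.abs(eig[eig < -tol * total]).sum()
    return float(negative / total)


def toy_score(a, b):
    """Hypothetical stand-in for a profile-alignment score:
    counts positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


# Toy "training set"; real inputs would be PSI-BLAST profiles.
seqs = ["MKTAY", "MKTAA", "GGSLL", "GGSLV"]

# Row i of S is the feature vector of sequence i: its scores against
# every training sequence.
S = pairwise_score_matrix(seqs, toy_score)

# A linear kernel on those feature vectors is a Gram matrix, so it is
# positive semidefinite by construction.
K = S @ S.T

# An indefinite matrix for contrast (eigenvalues -1 and +1).
K_bad = np.array([[0.0, 1.0], [1.0, 0.0]])

print(mercer_violation(K))
print(mercer_violation(K_bad))
```

A matrix such as `K` could be handed to any SVM implementation that accepts a precomputed kernel. Using raw alignment scores directly as the kernel would not carry the same guarantee, which is one reason a check like `mercer_violation` can be useful.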