Wang Xiao, Zhang Jun, Li Guo-Zheng
BMC Bioinformatics. 2015;16 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-16-S12-S1. Epub 2015 Aug 25.
Predicting the subcellular locations of bacterial proteins with computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, most of them can handle only single-location proteins. Unfortunately, bacterial cells contain many multi-location proteins. Moreover, multi-location proteins have special biological functions that can aid the development of new drugs. It is therefore necessary to develop new computational methods that accurately predict the subcellular locations of multi-location bacterial proteins.
In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label Gram-positive and Gram-negative bacterial proteins, respectively. Both predictors construct GO vectors from the GO terms of the query proteins' homologs and then apply a powerful multi-label ensemble classifier to make the final multi-label prediction. The two predictors have the following advantages: (1) they improve prediction performance on multi-label proteins by taking the correlations among different labels into account; (2) they combine multiple classifier chain (CC) classifiers and obtain better prediction results through ensemble learning; and (3) they construct GO vectors from the occurrence frequencies of GO terms in the typical homologous set instead of from 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label Gram-positive and Gram-negative bacterial proteins, respectively.
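To make the two ingredients above concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes that "CC" denotes classifier chains and that each homolog contributes one count per GO term it carries. It also treats the following as illustrative choices rather than the predictors' published settings: the homolog input format, the toy nearest-centroid base learner, the bootstrap resampling, and the 0.5 voting threshold. All function and class names are hypothetical.

```python
import random
from collections import Counter


def go_frequency_vector(homolog_go_terms, vocabulary):
    """GO feature vector for one query protein.

    homolog_go_terms: hypothetical input, one list of GO IDs per homolog.
    Each entry counts how many homologs carry that GO term (assumption),
    instead of recording plain 0/1 presence.
    """
    counts = Counter(t for terms in homolog_go_terms for t in set(terms))
    return [float(counts[t]) for t in vocabulary]


class CentroidClassifier:
    """Toy binary base learner standing in for a real classifier."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.constant = None
        if not pos or not neg:
            # Only one class in the training data: always predict it.
            self.constant = 1 if pos else 0
            return self
        self.mu_pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.mu_neg = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def predict(self, x):
        if self.constant is not None:
            return self.constant
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.mu_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.mu_neg))
        return 1 if d_pos <= d_neg else 0


class ClassifierChain:
    """One binary model per label; each also sees the labels earlier in the chain."""

    def __init__(self, order):
        self.order = order  # label indices in chain order

    def fit(self, X, Y):
        self.models = []
        for pos, j in enumerate(self.order):
            prev = self.order[:pos]
            # Augment features with the true values of the preceding labels.
            Xa = [x + [float(y[k]) for k in prev] for x, y in zip(X, Y)]
            self.models.append(CentroidClassifier().fit(Xa, [y[j] for y in Y]))
        return self

    def predict(self, x):
        out = [0] * len(self.order)
        extra = []  # predicted values of the preceding labels
        for j, model in zip(self.order, self.models):
            out[j] = model.predict(x + extra)
            extra.append(float(out[j]))
        return out


class EnsembleOfChains:
    """Several chains with random label orders, combined by per-label voting."""

    def __init__(self, n_chains=5, threshold=0.5, seed=0):
        self.n_chains, self.threshold = n_chains, threshold
        self.rng = random.Random(seed)

    def fit(self, X, Y):
        n_labels = len(Y[0])
        self.chains = []
        for _ in range(self.n_chains):
            order = list(range(n_labels))
            self.rng.shuffle(order)
            idx = [self.rng.randrange(len(X)) for _ in X]  # bootstrap sample
            self.chains.append(ClassifierChain(order).fit(
                [X[i] for i in idx], [Y[i] for i in idx]))
        return self

    def predict(self, x):
        votes = [chain.predict(x) for chain in self.chains]
        means = [sum(col) / len(votes) for col in zip(*votes)]
        return [1 if m >= self.threshold else 0 for m in means]
```

Voting over chains with different random label orders is what lets the ensemble exploit label correlations without depending on any single, arbitrary chain order.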
Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve the prediction accuracy of subcellular localization for multi-location Gram-positive and Gram-negative bacterial proteins, respectively. The online web servers for the Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/, respectively.