Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble.

Author information

Wang Xiao, Zhang Jun, Li Guo-Zheng

Publication information

BMC Bioinformatics. 2015;16 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-16-S12-S1. Epub 2015 Aug 25.

Abstract

BACKGROUND

Predicting the subcellular locations of bacterial proteins with computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, most of them can handle only single-location proteins. However, many bacterial proteins reside in more than one subcellular location. Moreover, multi-location proteins have special biological functions that can help the development of new drugs. It is therefore necessary to develop new computational methods that accurately predict the subcellular locations of multi-location bacterial proteins.

RESULTS

In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins, respectively. Both predictors construct GO vectors from the GO terms of proteins homologous to the query protein and then apply a multi-label ensemble classifier to make the final multi-label prediction. The two predictors have the following advantages: (1) they improve prediction performance on multi-label proteins by taking the correlations among different labels into account; (2) they combine multiple CC classifiers and obtain better prediction results through ensemble learning; and (3) they construct the GO vectors from the frequency of occurrence of GO terms in the typical homologous set instead of 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can effectively predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins, respectively.
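
The two ideas in the paragraph above, frequency-valued GO vectors and an ensemble of chained classifiers, can be illustrated with a small Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the normalization by the number of homologs, the reading of "CC" as a classifier chain, the 1-nearest-neighbour base classifier, the omission of any training-set subsampling, and all function names and toy data.

```python
import random
from collections import Counter

def go_frequency_vector(homolog_go_terms, vocabulary):
    """For each GO term in the vocabulary, return the fraction of homologs
    annotated with it (assumed normalization), instead of a 0/1 flag."""
    if not homolog_go_terms:
        return [0.0] * len(vocabulary)
    counts = Counter(t for terms in homolog_go_terms for t in set(terms))
    n = len(homolog_go_terms)
    return [counts[t] / n for t in vocabulary]

def nearest_label(train_x, train_y, x):
    """Stand-in base classifier: 1-nearest neighbour, squared Euclidean distance."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]

class ClassifierChain:
    """One binary model per label; each model also sees the labels that
    come earlier in the chain, which is how label correlations enter."""
    def __init__(self, order):
        self.order = order

    def fit(self, X, Y):
        self.models = []
        for pos, label in enumerate(self.order):
            prev = self.order[:pos]
            augmented = [x + [y[p] for p in prev] for x, y in zip(X, Y)]
            self.models.append((augmented, [y[label] for y in Y]))
        return self

    def predict(self, x):
        pred = {}
        for pos, label in enumerate(self.order):
            augmented, target = self.models[pos]
            feats = x + [pred[p] for p in self.order[:pos]]
            pred[label] = nearest_label(augmented, target, feats)
        return [pred[label] for label in range(len(self.order))]

def ensemble_predict(X, Y, x, n_chains=5, threshold=0.5, seed=0):
    """Train several chains with random label orders and average their votes."""
    rng = random.Random(seed)
    n_labels = len(Y[0])
    votes = [0] * n_labels
    for _ in range(n_chains):
        order = list(range(n_labels))
        rng.shuffle(order)
        chain = ClassifierChain(order).fit(X, Y)
        for label, value in enumerate(chain.predict(x)):
            votes[label] += value
    return [1 if v / n_chains >= threshold else 0 for v in votes]

# Toy data (hypothetical): three GO terms, two location labels.
vocab = ["GO:0005886", "GO:0005737", "GO:0005576"]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
Y = [[1, 0], [0, 1], [1, 1]]  # columns: membrane, cytoplasm

query = go_frequency_vector(
    [["GO:0005886", "GO:0005737"], ["GO:0005886", "GO:0005737"], ["GO:0005886"]],
    vocab,
)
print(ensemble_predict(X, Y, query))  # [1, 1]
```

In this toy run the query's homologs mostly carry both GO terms, so every chain assigns both locations regardless of label order. With real data the chains can disagree, and averaging their votes is what the ensemble step contributes.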

CONCLUSIONS

Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can effectively improve the prediction accuracy of subcellular localization for multi-location gram-positive and gram-negative bacterial proteins, respectively. The online web servers for the Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/, respectively.
