Zhou Miaomiao, Boekhorst Jos, Francke Christof, Siezen Roland J
Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, PO Box 9101, 6500 HB Nijmegen, The Netherlands.
BMC Bioinformatics. 2008 Mar 27;9:173. doi: 10.1186/1471-2105-9-173.
Over the past decades, various protein subcellular-location (SCL) predictors have been developed. Most of these predictors, such as TMHMM 2.0, SignalP 3.0, PrediSi and Phobius, aim to identify one or a few SCLs, whereas others, such as CELLO and Psortb v.2.0, aim at a broader classification. Although these tools and pipelines predict signal peptides and transmembrane helices with high precision, they are much less accurate where other sequence characteristics are concerned. For instance, it has proved notoriously difficult to predict the fate of proteins carrying a putative type I signal peptidase (SPIase) cleavage site, as many of those proteins are retained in the cell membrane as N-terminally anchored membrane proteins. Moreover, most SCL classifiers are based on the SCL classification in the Swiss-Prot database and have consequently inherited its inconsistencies. As accurate and detailed SCL prediction on a genome scale is highly desired by experimental researchers, we decided to construct a new SCL prediction pipeline: LocateP.
LocateP combines many of the existing high-precision SCL identifiers with our own newly developed identifiers for specific SCLs. The pipeline was designed to mimic protein targeting and secretion processes. It distinguishes 7 different SCLs within Gram-positive bacteria: intracellular, multi-transmembrane, N-terminally membrane anchored, C-terminally membrane anchored, lipid-anchored, LPxTG-type cell-wall anchored, and secreted/released proteins. Moreover, it distinguishes pathways for Sec- or Tat-dependent secretion and alternative secretion of bacteriocin-like proteins. The pipeline was tested on data sets extracted from the literature, including experimental proteomics studies. The tests showed that LocateP performs as well as, or even slightly better than, other SCL predictors for some locations, and that it outperforms current tools especially where N-terminally anchored and SPIase-cleaved secreted proteins are concerned. Overall, the accuracy of LocateP was always higher than 90%. LocateP was then used to predict the SCLs of all proteins encoded by completed Gram-positive bacterial genomes. The results are stored in the database LocateP-DB (http://www.cmbi.ru.nl/locatep-db).
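To make the idea of a pipeline that mimics protein targeting more concrete, the following is a minimal sketch of a decision cascade over the seven SCLs named above. It is not the LocateP implementation: the feature names (`has_signal_peptide`, `has_lipobox`, `has_lpxtg_motif`, `spi_cleavage_predicted`, `extra_tm_helices`, `has_c_terminal_anchor`), the `Features` and `SCL` types, and above all the order of the checks are hypothetical, chosen only for illustration. In a real pipeline these flags would come from upstream identifiers such as those mentioned in the text.

```python
from dataclasses import dataclass
from enum import Enum


class SCL(Enum):
    """The seven subcellular locations listed in the abstract."""
    INTRACELLULAR = "intracellular"
    MULTI_TRANSMEMBRANE = "multi-transmembrane"
    N_TERMINALLY_ANCHORED = "N-terminally membrane anchored"
    C_TERMINALLY_ANCHORED = "C-terminally membrane anchored"
    LIPID_ANCHORED = "lipid-anchored"
    LPXTG_CELL_WALL_ANCHORED = "LPxTG-type cell-wall anchored"
    SECRETED = "secreted/released"


@dataclass
class Features:
    """Hypothetical per-protein flags produced by upstream predictors."""
    has_signal_peptide: bool = False
    has_lipobox: bool = False
    has_lpxtg_motif: bool = False
    spi_cleavage_predicted: bool = False
    extra_tm_helices: int = 0  # helices other than an N-terminal signal/anchor
    has_c_terminal_anchor: bool = False


def classify(f: Features) -> SCL:
    """Assign one SCL with an illustrative (assumed) order of checks."""
    if f.has_signal_peptide:
        if f.has_lipobox:
            return SCL.LIPID_ANCHORED
        if f.has_lpxtg_motif:
            return SCL.LPXTG_CELL_WALL_ANCHORED
        if f.extra_tm_helices >= 1:
            return SCL.MULTI_TRANSMEMBRANE
        if f.spi_cleavage_predicted:
            return SCL.SECRETED
        # Signal peptide present but not cleaved: retained in the membrane.
        return SCL.N_TERMINALLY_ANCHORED
    if f.extra_tm_helices >= 2:
        return SCL.MULTI_TRANSMEMBRANE
    if f.has_c_terminal_anchor:
        return SCL.C_TERMINALLY_ANCHORED
    return SCL.INTRACELLULAR


if __name__ == "__main__":
    protein = Features(has_signal_peptide=True, spi_cleavage_predicted=False)
    print(classify(protein).value)
```

The sketch shows why the SPIase case is the hard one: the same signal-peptide evidence leads to either a secreted or an N-terminally anchored call, depending entirely on whether cleavage is predicted.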
LocateP is by far the most accurate and detailed protein SCL predictor for Gram-positive bacteria currently available.