Gardy Jennifer L, Spencer Cory, Wang Ke, Ester Martin, Tusnády Gábor E, Simon István, Hua Sujun, deFays Katalin, Lambert Christophe, Nakai Kenta, Brinkman Fiona S L
Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada V5A 1S6.
Nucleic Acids Res. 2003 Jul 1;31(13):3613-7. doi: 10.1093/nar/gkg602.
Automated prediction of bacterial protein subcellular localization is an important tool for genome annotation and drug discovery. PSORT has been one of the most widely used computational methods for such bacterial protein analysis; however, it has not been updated since it was introduced in 1991. In addition, neither PSORT nor any of the other computational methods available make predictions for all five of the localization sites characteristic of Gram-negative bacteria. Here we present PSORT-B, an updated version of PSORT for Gram-negative bacteria, which is available as a web-based application at http://www.psort.org. PSORT-B examines a given protein sequence for amino acid composition, similarity to proteins of known localization, presence of a signal peptide, transmembrane alpha-helices and motifs corresponding to specific localizations. A probabilistic method integrates these analyses, returning a list of five possible localization sites with associated probability scores. PSORT-B, designed to favor high precision (specificity) over high recall (sensitivity), attained an overall precision of 97% and recall of 75% in 5-fold cross-validation tests, using a dataset we developed of 1443 proteins of experimentally known localization. This dataset, the largest of its kind, is freely available, along with the PSORT-B source code (under GNU General Public License).
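The abstract states only two things about how the modules' outputs are used: a probabilistic method integrates them into scores for five sites, and PSORT-B is tuned to favor precision over recall. The sketch below shows one way those two ideas can fit together. It rests on stated assumptions. Per-module site probabilities are combined by a naive-Bayes-style product and then renormalized; this is an assumption, not the published integration method. The predictor declines to answer ("Unknown") when the top score falls below a threshold. Overall precision is computed over proteins that received a prediction, and recall over all proteins, which is one plausible reading of how abstentions affect the two figures. The function names, the threshold value, and the example scores are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# The five localization sites of a Gram-negative bacterial cell.
SITES = (
    "cytoplasmic",
    "cytoplasmic membrane",
    "periplasmic",
    "outer membrane",
    "extracellular",
)


def combine_module_scores(module_scores: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-module site probabilities into one distribution.

    Hypothetical naive-Bayes-style rule (an assumption, not PSORT-B's published
    method): multiply each module's score for a site, then renormalize so the
    five combined scores sum to 1. A site a module does not score is neutral (1.0).
    """
    combined = {site: 1.0 for site in SITES}
    for scores in module_scores:
        for site in SITES:
            combined[site] *= scores.get(site, 1.0)
    total = sum(combined.values())
    if total == 0.0:
        # Every site was zeroed by some module; fall back to a uniform distribution.
        return {site: 1.0 / len(SITES) for site in SITES}
    return {site: value / total for site, value in combined.items()}


def predict(scores: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return the top-scoring site, or None ("Unknown") if its score is below
    the hypothetical threshold. Abstaining trades recall for precision."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def precision_recall(predictions: List[Optional[str]], truth: List[str]) -> Tuple[float, float]:
    """Overall precision over proteins that received a prediction, and recall
    over all proteins (abstentions count against recall only)."""
    made = [(p, t) for p, t in zip(predictions, truth) if p is not None]
    correct = sum(1 for p, t in made if p == t)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative scores from two hypothetical modules for one protein.
    scores = combine_module_scores([
        {"cytoplasmic": 0.6, "cytoplasmic membrane": 0.1, "periplasmic": 0.1,
         "outer membrane": 0.1, "extracellular": 0.1},
        {"cytoplasmic": 0.5, "cytoplasmic membrane": 0.2, "periplasmic": 0.1,
         "outer membrane": 0.1, "extracellular": 0.1},
    ])
    print(predict(scores))  # cytoplasmic
```

Raising `threshold` makes the sketch abstain more often. This typically raises precision and lowers recall, which is the trade-off the abstract describes PSORT-B as being designed toward.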