Yu Chin-Sheng, Lin Chih-Jen, Hwang Jenn-Kang
Department of Biological Science and Technology, National Chiao Tung University, HsinChu 30050, Taiwan.
Protein Sci. 2004 May;13(5):1402-6. doi: 10.1110/ps.03479604.
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase in sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization has become increasingly important. We present an approach to predict subcellular localization for Gram-negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can be easily extended to other organisms and should be a useful tool for the high-throughput and large-scale analysis of proteomic and genomic data.
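As an illustration of what an n-peptide composition feature might look like, the following is a minimal sketch, not the authors' implementation. It assumes the simplest form of the idea: the normalized frequency of every contiguous n-residue window over the 20 standard amino acids. The function names `n_peptide_composition` and `feature_vector`, the choice of orders (1 and 2), and the handling of non-standard residues are all assumptions made for this example; the abstract does not specify the exact encodings used.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so features are reproducible.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n=2):
    """Return the normalized frequency of every possible n-peptide.

    The result has 20**n entries, ordered by itertools.product over
    AMINO_ACIDS. Windows containing non-standard residues are skipped
    (an assumption of this sketch).
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than n, or no valid windows.
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]


def feature_vector(sequence, orders=(1, 2)):
    """Concatenate compositions of several orders into one feature vector."""
    vec = []
    for n in orders:
        vec.extend(n_peptide_composition(sequence, n))
    return vec


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    fv = feature_vector("MKKLLAVSLLA")
    print(len(fv))  # 20 amino acid features + 400 dipeptide features
```

A vector of this kind could then be passed to any support vector machine implementation (for example, scikit-learn's `sklearn.svm.SVC`), with one of the five localization sites as the class label. How the paper combines its multiple feature vectors and classifiers is not described in the abstract, so that step is left out here.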