King Brian R, Vural Suleyman, Pandey Sanjit, Barteau Alex, Guda Chittibabu
Department of Computer Science, Bucknell University, One Dent Drive, Lewisburg, PA 17837, USA.
BMC Res Notes. 2012 Jul 10;5:351. doi: 10.1186/1756-0500-5-351.
Understanding protein subcellular localization is a necessary component toward understanding the overall function of a protein. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many are limited by predicting only a small number of organelles in the cell. Additionally, the majority of methods predict only a single location for a sequence, even though it is known that a large fraction of the proteins in eukaryotic species shuttle between locations to carry out their function.
We present a software package and a web server for predicting the subcellular localization of protein sequences based on the ngLOC method. ngLOC is an n-gram-based Bayesian classifier that predicts subcellular localization of proteins both in prokaryotes and eukaryotes. The overall prediction accuracy varies from 89.8% to 91.4% across species. This program can predict 11 distinct locations each in plant and animal species. ngLOC also predicts 4 and 5 distinct locations on gram-positive and gram-negative bacterial datasets, respectively.
ngLOC is a generic method that can be trained by data from a variety of species or classes for predicting protein subcellular localization. The standalone software is freely available for academic use under GNU GPL, and the ngLOC web server is also accessible at http://ngloc.unmc.edu.
了解蛋白质的亚细胞定位是理解蛋白质整体功能的必要组成部分。在过去十年中,已经发表了许多计算方法,其成功程度各不相同。尽管该领域已发表的方法众多,但可供研究人员在自己的研究中使用的却很少。在那些可用的方法中,许多方法仅限于预测细胞中的少数细胞器。此外,大多数方法只为一个序列预测单一位置,尽管已知真核生物中的很大一部分蛋白质会在不同位置之间穿梭以执行其功能。
我们展示了一个基于ngLOC方法预测蛋白质序列亚细胞定位的软件包和一个网络服务器。ngLOC是一种基于n元语法的贝叶斯分类器,可预测原核生物和真核生物中蛋白质的亚细胞定位。跨物种的总体预测准确率在89.8%至91.4%之间。该程序可以分别预测植物和动物物种中的11个不同位置。ngLOC还分别在革兰氏阳性和革兰氏阴性细菌数据集上预测4个和5个不同位置。
ngLOC是一种通用方法,可以通过来自各种物种或类别的数据进行训练,以预测蛋白质的亚细胞定位。独立软件在GNU GPL许可下可免费用于学术用途,ngLOC网络服务器也可通过http://ngloc.unmc.edu访问。