Shen Hong-Bin, Chou Kuo-Chen
Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, 1954 Hua-Shan Road, Shanghai 200030, China.
Protein Eng Des Sel. 2007 Nov;20(11):561-7. doi: 10.1093/protein/gzm057. Epub 2007 Nov 10.
The life processes of a eukaryotic cell are guided by its nucleus. In addition to the genetic material, the nucleus contains many proteins located in its different compartments, called subnuclear locations. Information about their localization within the nucleus is indispensable for in-depth studies in systems biology because, in addition to helping determine their functions, it can provide insights into how, and in what kinds of microenvironments, these subnuclear proteins interact with each other and with other molecules. Facing the deluge of protein sequences generated in the post-genomic age, we are challenged to develop an automated method for quickly and effectively annotating the subnuclear locations of the many newly found nuclear protein sequences. In view of this, a new classifier, called Nuc-PLoc, has been developed that can assign nuclear proteins to the following nine subnuclear locations: (1) chromatin, (2) heterochromatin, (3) nuclear envelope, (4) nuclear matrix, (5) nuclear pore complex, (6) nuclear speckle, (7) nucleolus, (8) nucleoplasm and (9) nuclear promyelocytic leukaemia (PML) body. Nuc-PLoc is built around an ensemble classifier formed by fusing the evolutionary information of a protein with its pseudo-amino acid composition. On the same benchmark data set and under the same testing procedure, the overall jackknife cross-validation accuracy of Nuc-PLoc is significantly higher than that of the existing methods. As a user-friendly web server, Nuc-PLoc is freely accessible to the public at http://chou.med.harvard.edu/bioinf/Nuc-PLoc.
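The abstract names two general techniques, an ensemble classifier and the jackknife (leave-one-out) cross-validation test, without giving Nuc-PLoc's actual feature definitions or ensemble members. The sketch below only illustrates those two ideas under loudly stated assumptions: it substitutes plain 20-D amino acid composition for the evolutionary-information and pseudo-amino acid composition features, and it fuses hypothetical k-nearest-neighbour members (k = 1, 3, 5) by majority vote. None of the function names or parameter choices come from the paper; this is not the authors' method.

```python
from collections import Counter
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Hypothetical feature: fraction of each standard residue (20-D vector).

    A simplified stand-in for the paper's fused features; non-standard
    letters are ignored.
    """
    counts = Counter(seq)
    n = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / n for a in AMINO_ACIDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train, query, k):
    """Majority label among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Hypothetical ensemble: fuse several k-NN members by majority vote.

    On a tie, Counter.most_common keeps the label first encountered,
    i.e. the vote of the member with the smallest k.
    """
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(dataset):
    """Jackknife test: each sample is predicted by a model built on all the others."""
    correct = 0
    for i, (x, y) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]  # leave sample i out
        if ensemble_predict(train, x) == y:
            correct += 1
    return correct / len(dataset)
```

For example, with toy sequences labelled by hypothetical locations, `jackknife_accuracy([(aa_composition(s), loc) for s, loc in pairs])` returns the fraction of sequences whose held-out prediction matches their label. The jackknife is used here because, for a fixed data set, it yields a single deterministic result, which makes accuracies from different methods on the same benchmark directly comparable.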