Shen Hong-Bin, Chou Kuo-Chen
Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China.
Anal Biochem. 2009 Nov 15;394(2):269-74. doi: 10.1016/j.ab.2009.07.046. Epub 2009 Aug 3.
Predicting the subcellular localization of human proteins is a challenging problem, particularly when query proteins may have a multiplex character, i.e., simultaneously exist at, or move between, two or more different subcellular location sites. In a previous study, we developed a predictor called "Hum-mPLoc" to deal with the multiplex problem for the human protein system. However, Hum-mPLoc has the following shortcomings. (1) An accession number for the query protein must be supplied in order to select the higher-level prediction pathway and thereby obtain a higher expected success rate; but many proteins, such as synthetic and hypothetical proteins as well as newly discovered proteins not yet deposited in databanks, do not have accession numbers. (2) Neither functional domain information nor sequential evolution information was taken into account in Hum-mPLoc, and hence its power may be reduced accordingly. To address these shortcomings, we implemented a top-down strategy. The resulting predictor is called Hum-mPLoc 2.0, and it no longer requires an accession number as input. Moreover, both the functional domain information and the sequential evolution information have been fused into the predictor by an ensemble classifier. As a consequence, the prediction power has been significantly enhanced. The web server of Hum-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/hum-multi-2/.
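The idea of fusing two information sources through an ensemble and allowing more than one location per protein can be illustrated with a minimal sketch. This is not the Hum-mPLoc 2.0 algorithm, whose details are not given in the abstract; the averaging rule, the `ratio` threshold, the function names, and the example scores are all assumptions made purely for illustration.

```python
from typing import Dict, List

Scores = Dict[str, float]  # location name -> non-negative score (hypothetical format)


def fuse_scores(score_sets: List[Scores]) -> Scores:
    """Average per-location scores across base predictors (assumed fusion rule)."""
    locations = set().union(*score_sets)
    return {
        loc: sum(s.get(loc, 0.0) for s in score_sets) / len(score_sets)
        for loc in locations
    }


def predict_locations(fused: Scores, ratio: float = 0.8) -> List[str]:
    """Return every location whose fused score is at least `ratio` times the top score.

    Keeping all near-top locations, rather than only the best one, is one simple
    way to let a protein carry multiple labels (the multiplex case).
    """
    top = max(fused.values())
    return sorted(loc for loc, score in fused.items() if score >= ratio * top)


# Made-up scores from two hypothetical base predictors for a single query protein.
domain_scores = {"nucleus": 0.9, "cytoplasm": 0.7, "mitochondrion": 0.1}
evolution_scores = {"nucleus": 0.7, "cytoplasm": 0.8, "mitochondrion": 0.2}

fused = fuse_scores([domain_scores, evolution_scores])
labels = predict_locations(fused)
print(labels)
```

With these made-up inputs, both the nucleus and the cytoplasm clear the threshold, so the query is assigned two locations. A real predictor would derive the base scores from the sequence alone, which is what removes the need for an accession number.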