Chou Kuo-Chen, Cai Yu-Dong
Gordon Life Science Institute, San Diego, CA 92130, USA.
Bioinformatics. 2005 Apr 1;21(7):944-50. doi: 10.1093/bioinformatics/bti104. Epub 2004 Oct 28.
Most existing methods for predicting protein subcellular location handle only two to five localizations, and only a few can be effectively extended to cover 12-14 localizations. This is because the success rate tends to fall as the number of candidate locations grows. In addition, some proteins occur in several different subcellular locations, i.e. they have 'multiplex locations'. So far no method has been able to treat this difficult multiplex location problem effectively. The present study was initiated to address (1) how to efficiently identify the localization of a query protein among many possible subcellular locations, and (2) how to deal with the case of multiplex locations.
By hybridizing the gene ontology, functional domain and pseudo amino acid composition approaches, a new method has been developed that can predict the subcellular localization of proteins with the multiplex location feature. A global analysis of the budding yeast proteins classified into 22 locations was performed with the new method using jack-knife cross-validation. The overall success rate thus obtained was 70%. In contrast, the corresponding rates obtained by some other existing methods were only 13-14%, indicating that the new method is very powerful and promising. Furthermore, predictions were made for the four proteins whose localizations could not be determined by experiments, as well as for the 236 proteins whose localizations in budding yeast were ambiguous according to experimental observations. According to the predicted results, many of these 'ambiguous proteins' had the same score and ranking for several different subcellular locations, implying that they may exist simultaneously in these locations or move among them. This finding is intriguing because it reflects the dynamic behavior of these proteins in a cell, which may be associated with some special biological functions.
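The evaluation procedure described above can be illustrated with a minimal sketch. It is not the authors' predictor: the abstract does not specify the classifier, so a nearest-neighbor rule is assumed here, and plain amino acid composition stands in for the hybrid gene ontology, functional domain and pseudo amino acid composition features. The function names, the toy sequences and the location labels are all hypothetical. The sketch shows how a jack-knife (leave-one-out) success rate is computed and how tied top scores can be read as candidate multiplex locations.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional amino acid frequency vector of a sequence.

    Assumption: a simplified stand-in for the hybrid features in the abstract.
    """
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_neighbor_label(query, training):
    """Return the label of the training sample closest to the query vector.

    Assumption: the abstract does not name the classifier; this is illustrative.
    """
    _, label = min(training, key=lambda item: dist(query, item[0]))
    return label


def jackknife_success_rate(samples):
    """Hold out each sample in turn, predict it from the rest, return accuracy."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(vector, rest) == label:
            correct += 1
    return correct / len(samples)


def top_ranked_locations(scores):
    """Return every location sharing the best score (a possible multiplex case)."""
    best = max(scores.values())
    return sorted(loc for loc, s in scores.items() if s == best)


# Toy data: hypothetical sequences and labels, for illustration only.
toy = [
    (amino_acid_composition("AAAAAG"), "nucleus"),
    (amino_acid_composition("AAAAGG"), "nucleus"),
    (amino_acid_composition("WWWWWY"), "vacuole"),
    (amino_acid_composition("WWWWYY"), "vacuole"),
]
print(jackknife_success_rate(toy))  # 1.0
print(top_ranked_locations({"nucleus": 0.8, "cytoplasm": 0.8, "vacuole": 0.1}))
# ['cytoplasm', 'nucleus']
```

In the jack-knife test every protein is predicted by a model that has never seen it, so the reported rate is not inflated by self-matching. Returning all locations that tie for the top score, instead of picking one arbitrarily, mirrors the observation that some 'ambiguous proteins' rank equally for several locations.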