Chou Kuo-Chen, Cai Yu-Dong
Gordon Life Science Institute, Torrey Del Mar Drive, San Diego, California 92130, USA.
J Cell Biochem. 2004 Apr 15;91(6):1197-203. doi: 10.1002/jcb.10790.
Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, many efforts have been made to develop methods for predicting protein subcellular location. In this study, based on the strategy of hybridizing the functional domain composition and the pseudo-amino acid composition (Cai and Chou [2003]: Biochem. Biophys. Res. Commun. 305:407-411), the Intimate Sorting Algorithm (ISort predictor) was developed for predicting protein subcellular location. As a showcase, the same plant and non-plant protein datasets studied by previous investigators were used for demonstration. The overall success rate by the jackknife test was 85.4% for the plant protein dataset and 91.9% for the non-plant protein dataset. These are the highest success rates achieved so far for the two datasets under a rigorous cross-validation test procedure, further confirming that such a hybrid approach may become a very useful high-throughput tool in bioinformatics, proteomics, and molecular cell biology.
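The jackknife (leave-one-out) test mentioned in the abstract can be illustrated with a minimal sketch: each protein is removed in turn, predicted from the remaining proteins, and the fraction of correct predictions is the overall success rate. The abstract does not specify how ISort scores similarity or how the hybrid feature vectors are built, so the cosine similarity, the nearest-neighbor rule, and all function names below are assumptions chosen for illustration only, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def predict_nearest(query, vectors, labels):
    """Assign the label of the most similar training vector (hypothetical rule)."""
    best = max(range(len(vectors)),
               key=lambda i: cosine_similarity(query, vectors[i]))
    return labels[best]


def jackknife_success_rate(vectors, labels):
    """Leave each sample out once, predict it from the rest, return the hit fraction."""
    correct = 0
    for i in range(len(vectors)):
        train_vectors = vectors[:i] + vectors[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if predict_nearest(vectors[i], train_vectors, train_labels) == labels[i]:
            correct += 1
    return correct / len(vectors)


# Toy data: made-up 3-dimensional vectors and placeholder location labels.
toy_vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
toy_labels = ["loc_A", "loc_A", "loc_B", "loc_B"]
print(jackknife_success_rate(toy_vectors, toy_labels))
```

In a real setting each vector would hold the hybrid functional domain and pseudo-amino acid composition features of one protein, and the reported 85.4% and 91.9% correspond to the value this kind of procedure returns on the plant and non-plant datasets.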