Zhang Shao-Wu, Zhang Yun-Long, Yang Hui-Fang, Zhao Chun-Hui, Pan Quan
College of Automation, Northwestern Polytechnical University, No. 127 Youyi West Road, Xi'an 710072, China.
Amino Acids. 2008 May;34(4):565-72. doi: 10.1007/s00726-007-0010-9. Epub 2007 Dec 11.
The rapidly increasing number of sequences entering genome databanks calls for automated methods to analyze them. Information on the subcellular localization of newly found protein sequences is important for revealing their functions in a timely manner and for conducting systems biology studies at the cellular level. Based on the concept of Chou's pseudo-amino acid composition, a range of information and techniques, such as residue conservation scores, von Neumann entropies, multi-scale energy, and a weighted auto-correlation function, were used to generate the pseudo-amino acid components that represent the protein samples. On this basis, a hybridization predictor was developed for assigning uncharacterized proteins to one of the following 12 subcellular localizations: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, and vacuole. Compared with the results reported by previous investigators, higher success rates were obtained, suggesting that the current approach is promising and may become a useful high-throughput tool in the relevant areas.
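To make the idea of pseudo-amino acid composition concrete, the following is a minimal Python sketch of the classic formulation: 20 normalized amino acid frequencies followed by a small number of sequence-order correlation factors. It is an illustration under stated assumptions, not the predictor described in the abstract. The function name `pseaac`, the parameters `lam` and `w`, and the use of a single property (the Kyte-Doolittle hydropathy scale) are choices made here for illustration; the abstract's own components (residue conservation scores, von Neumann entropies, multi-scale energy, weighted auto-correlation) are not reproduced.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed property, chosen for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale property values to zero mean and unit variance over the 20 residues."""
    values = list(prop.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pseaac(seq, lam=3, w=0.05, prop=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    lam: number of sequence-order correlation tiers (hypothetical default).
    w:   weight of the sequence-order terms (hypothetical default).
    """
    seq = seq.upper()
    if any(aa not in prop for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = standardize(prop)

    # Conventional amino acid composition: occurrence frequencies summing to 1.
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]

    # Tier-k correlation factor: mean squared property difference
    # between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Calling `pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)` yields a 23-component vector whose entries sum to 1. The first 20 entries carry composition and the last 3 carry sequence-order information, which plain amino acid composition discards. A vector of this kind could then be passed to any standard classifier.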