Diao Y, Ma D, Wen Z, Yin J, Xiang J, Li M
College of Chemistry, Sichuan University, Chengdu, Sichuan, China.
Amino Acids. 2008 Jan;34(1):111-7. doi: 10.1007/s00726-007-0550-z. Epub 2007 May 23.
Transmembrane (TM) proteins account for about 20-30% of the protein sequences in higher eukaryotes and play important roles in a wide range of cellular functions. Knowledge of their topology often provides crucial hints about their function. Because experimental structure determination of TM proteins is difficult, theoretical methods that predict the topology of newly found TM proteins from their primary sequences are highly desirable in both basic research and drug discovery. This paper builds on the concept of pseudo amino acid composition (PseAA), which incorporates sequence-order information of a protein sequence and thereby remarkably enhances the power of discrete models (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246-255). Cellular automata and Lempel-Ziv complexity are introduced to predict the TM regions of integral membrane proteins, including both alpha-helical and beta-barrel membrane proteins, and the predictions are validated by a jackknife test. The results are quite promising, indicating that the approach may serve as a high-throughput tool in the post-genomic era. The source code and dataset are available to academic users from liml@scu.edu.cn.
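As a rough illustration of the Lempel-Ziv complexity measure mentioned above, the following minimal sketch counts the phrases in the classic (1976) Lempel-Ziv exhaustive parsing of a symbol string. It is not the authors' implementation: the abstract does not say how sequences are encoded (for example, via the cellular automaton step) before the complexity is computed, so the function name `lempel_ziv_complexity` and the choice to apply it directly to a raw string are assumptions made purely for illustration.

```python
def lempel_ziv_complexity(seq: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of seq.

    Hypothetical helper for illustration only; the paper's actual
    sequence encoding and feature construction are not specified here.
    """
    n = len(seq)
    start = 0   # index where the current phrase begins
    count = 0   # number of phrases found so far
    while start < n:
        length = 1
        # Grow the phrase while it can still be copied from the text
        # that precedes its last symbol (overlapping copies allowed).
        while (start + length <= n
               and seq[start:start + length] in seq[:start + length - 1]):
            length += 1
        count += 1
        start += length
    return count


if __name__ == "__main__":
    # Textbook binary example, parsed as 0|001|10|100|1000|101.
    print(lempel_ziv_complexity("0001101001000101"))
    # An arbitrary short amino-acid string, used only as sample input.
    print(lempel_ziv_complexity("MKTLLLTLVVVTIVCLDLGYT"))
```

A highly repetitive segment yields few phrases, while a more irregular segment yields more, which is why such a count can serve as a sequence-order-aware feature alongside composition-based descriptors.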