Gromiha M Michael, Ahmad Shandar, Suwa Makiko
Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.
Comput Biol Chem. 2005 Apr;29(2):135-42. doi: 10.1016/j.compbiolchem.2005.02.006.
Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important problem, both for detecting outer membrane proteins in genomic sequences and for successfully predicting their secondary and tertiary structures. In this work, we systematically analyzed the distribution of amino acid residues in the sequences of globular and outer membrane proteins. We observed that the occurrence of two neighboring aliphatic and polar residues is significantly higher in outer membrane proteins than in globular proteins. Using this dipeptide composition information, we devised a statistical method for discriminating outer membrane proteins from other globular and membrane proteins. Our approach correctly identified the outer membrane proteins with an accuracy of 95% for the training set of 337 proteins. The method also correctly excluded globular proteins with an accuracy of 79% in a non-redundant dataset of 674 proteins. Furthermore, it correctly excludes alpha-helical membrane proteins with an accuracy of up to 87%. These accuracy levels are comparable to those of other methods in the literature. The influence of protein size and structural class on discrimination is discussed.
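The abstract does not spell out the statistical rule, so the following is only a minimal sketch of one way a dipeptide-composition discriminator could work. It assumes that each class is summarized by its mean dipeptide composition and that a query sequence is assigned to the class whose profile it deviates from least (sum of absolute differences). The function names, the nearest-profile rule, and the toy sequences are illustrative assumptions, not the authors' published procedure or data.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the percentage of each dipeptide among overlapping residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Ignore pairs containing non-standard residue codes (e.g. X, B, Z).
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs)
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: 100.0 * counts[dp] / total for dp in DIPEPTIDES}


def mean_composition(seqs):
    """Average the dipeptide composition over a set of sequences (a class profile)."""
    comps = [dipeptide_composition(s) for s in seqs]
    return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}


def deviation(comp, profile):
    """Sum of absolute differences between a composition and a class profile."""
    return sum(abs(comp[dp] - profile[dp]) for dp in DIPEPTIDES)


def classify(seq, omp_profile, other_profile):
    """Assumed rule: pick the class whose profile is closer to the query."""
    comp = dipeptide_composition(seq)
    if deviation(comp, omp_profile) < deviation(comp, other_profile):
        return "OMP"
    return "other"


if __name__ == "__main__":
    # Toy placeholder sequences, NOT real proteins or the paper's datasets.
    omp_profile = mean_composition(["AVAV"])
    other_profile = mean_composition(["GGGG"])
    print(classify("AVAVAV", omp_profile, other_profile))
    print(classify("GGGGG", omp_profile, other_profile))
```

In practice the two profiles would be built from curated sets of outer membrane and globular (or alpha-helical membrane) protein sequences, and the fractions of correctly identified and correctly excluded proteins would be reported separately, as the abstract does.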