Park Keun-Joon, Gromiha M Michael, Horton Paul, Suwa Makiko
Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.
Bioinformatics. 2005 Dec 1;21(23):4223-9. doi: 10.1093/bioinformatics/bti697. Epub 2005 Oct 4.
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task, both for identifying OMPs in genomic sequences and for the successful prediction of their secondary and tertiary structures.
We have developed a method based on support vector machines (SVMs) that uses amino acid composition and residue pair information. Using amino acid composition alone, the method correctly predicted OMPs with a cross-validated accuracy of 94% in a set of 208 proteins. It also correctly excluded 633 of 673 globular proteins and 191 of 206 alpha-helical membrane proteins. Overall, we obtained an accuracy of 92% for identifying OMPs in a dataset of 1087 proteins covering the different types of globular and membrane proteins. Adding residue pair information improved the accuracy from 92 to 94%. This accuracy is higher than that of other methods in the literature, and the method could be used for identifying OMPs in genomic sequences.
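The two feature types named above, amino acid composition and residue pair information, can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so the percentage scaling, the use of adjacent residue pairs, the handling of non-standard residues, and the function names `amino_acid_composition` and `residue_pair_composition` are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs (assumed here to mean adjacent dipeptides).
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-element list: percentage of each standard residue."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(a) / n for a in AMINO_ACIDS]


def residue_pair_composition(seq):
    """Return a 400-element list: percentage of each adjacent residue pair.

    Assumption: a pair is counted only when both neighbours are standard
    residues; pairs containing any other character are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [100.0 * counts[p] / total for p in PAIRS]


def feature_vector(seq):
    """Concatenate both feature types into one 420-element vector."""
    return amino_acid_composition(seq) + residue_pair_composition(seq)
```

A vector produced by `feature_vector` could then be passed to any SVM implementation for training and cross-validation; the kernel and parameters used in the original work are not stated in this abstract.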
Discrimination results are available at http://tmbeta-svm.cbrc.jp.