Kumar Arun V, Ali Rehana F M, Cao Yu, Krishnan V V
Department of Computer Science, California State University, Fresno, CA 93740, United States.
Department of Chemistry, California State University, Fresno, CA 93740, United States; Department of Pathology and Laboratory Medicine, School of Medicine, University of California, Davis, CA 95616, United States.
Biochim Biophys Acta. 2015 Oct;1854(10 Pt A):1545-52. doi: 10.1016/j.bbapap.2015.02.016. Epub 2015 Mar 7.
The number of protein sequences derived from genome sequencing projects is outpacing our knowledge of the functions of these proteins. As the gap between experimentally characterized and uncharacterized proteins continues to widen, new computational methods and tools are needed to obtain protein structural information that is directly related to function. Nuclear magnetic resonance (NMR) provides a powerful means of determining the three-dimensional structures of proteins in the solution state. However, translating NMR spectral parameters into even low-resolution structural information, such as protein class, requires multiple time-consuming steps. In this paper, we present an unorthodox method that uses machine learning algorithms to predict the protein structural class directly from the residues' averaged chemical shifts (ACS). Experimental chemical shift information for 1491 proteins obtained from the Biological Magnetic Resonance Bank (BMRB), together with their respective protein structural classes derived from the Structural Classification of Proteins (SCOP), was used to construct a data set with 119 attributes and 5 different classes. Twenty-four different classification schemes were evaluated using several performance measures. Overall, the residue-based ACS values predict protein structural classes with 80% accuracy, as measured by the Matthews correlation coefficient (MCC). Specifically, the protein classes defined by mixed αβ or small proteins are classified with >90% correlation. Our results indicate that this NMR-based method can be used as a low-resolution tool for identifying protein structural classes without any prior chemical shift assignments.
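The two ingredients named in the abstract, per-residue averaged chemical shifts as classifier features and the Matthews correlation coefficient as the evaluation measure, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The abstract does not list the 119 attributes, so the residue and atom sets below (`RESIDUES`, `ATOMS`) are hypothetical placeholders, and the function names are illustrative. The MCC is computed per class in a one-vs-rest fashion, which is one common way to score individual classes in a multi-class problem.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical feature layout: the 119 attributes are not listed in the
# abstract, so these illustrative subsets stand in for the real ones.
RESIDUES = ["ALA", "GLY", "LEU"]   # illustrative subset of the 20 residues
ATOMS = ["CA", "CB", "H", "N"]     # illustrative subset of observed nuclei


def averaged_chemical_shifts(shifts):
    """Average chemical shifts per (residue type, atom) pair for one protein.

    `shifts` is a list of (residue, atom, ppm) tuples. Pairs with no
    observations become None so a downstream classifier can treat them
    as missing attributes.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for residue, atom, ppm in shifts:
        sums[(residue, atom)] += ppm
        counts[(residue, atom)] += 1
    features = []
    for residue in RESIDUES:
        for atom in ATOMS:
            n = counts[(residue, atom)]
            features.append(sums[(residue, atom)] / n if n else None)
    return features


def mcc_one_vs_rest(y_true, y_pred, positive):
    """Matthews correlation coefficient treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    protein = [("ALA", "CA", 52.0), ("ALA", "CA", 53.0), ("GLY", "N", 109.0)]
    print(averaged_chemical_shifts(protein))
    print(mcc_one_vs_rest(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a"))
```

Averaging over all residues of a given type yields one fixed-length vector per protein regardless of sequence length, which is what lets a standard classifier consume the data directly; it also discards sequence position, which is consistent with the abstract's point that no per-residue chemical shift assignment is required.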