Chatterjee S, Ghosh S, Vishveshwara S
Molecular Biophysics Unit, Indian Institute of Science, Bangalore - 560012, India.
Mol Biosyst. 2013 Jul;9(7):1774-88. doi: 10.1039/c3mb70157c. Epub 2013 May 22.
Protein structure space is believed to consist of a finite set of discrete folds, unlike protein sequence space, which is astronomically large. This suggests that proteins from the available sequence space are likely to adopt one of the many folds already observed. Despite extensive sequence-structure correlation data, protein structure prediction remains an open problem, and researchers have tried both experimental and computational approaches. One of the challenges of protein structure prediction is to identify native protein structures from a milieu of decoys/models. In this work, a rigorous investigation of Protein Structure Networks (PSNs) has been performed to detect native structures among decoys/models. Ninety-four parameters obtained from network studies have been optimally combined using Support Vector Machines (SVM) to derive a general metric that distinguishes decoys/models from native protein structures with an accuracy of 94.11%. We recently showed, for the first time in the literature, that PSNs are capable of distinguishing native proteins from decoys. A major difference from that previous study is that the present work explores the transition profiles at different strengths of non-covalent interactions, and the SVM has indeed identified this as an important parameter. Additionally, the SVM-trained algorithm is applied to the recent CASP10 predicted models. The novelty of the network approach is that it is based on general network properties of native protein structures, so a given model can be assessed independently of any reference structure. Thus, the approach presented in this paper can be valuable in validating predicted structures. A web server has been developed for this purpose and is freely available.
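To illustrate what a transition profile might look like, the following minimal sketch tracks how the largest connected cluster of a residue network changes as the contact criterion is varied. It is not the authors' implementation: the abstract does not define the interaction-strength measure, so a plain distance cutoff between residue coordinates is used here as a stand-in assumption, and the function names `largest_cluster_size` and `transition_profile` and the toy coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def largest_cluster_size(coords, cutoff):
    """Size of the largest connected component when any two residues
    closer than `cutoff` are linked (assumed contact criterion)."""
    parent = list(range(len(coords)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every residue pair that satisfies the distance criterion.
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            parent[find(i)] = find(j)

    # Count members per component root.
    sizes = {}
    for i in range(len(coords)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def transition_profile(coords, cutoffs):
    """Largest-cluster size at each cutoff, in the order given."""
    return [largest_cluster_size(coords, c) for c in cutoffs]


# Toy coordinates (hypothetical, one point per residue).
toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
profile = transition_profile(toy, [2.0, 4.0, 15.0])
print(profile)
```

In a pipeline of the kind the abstract describes, values like these would be among the network parameters passed to a classifier such as an SVM. Note that a larger distance cutoff here admits more contacts, whereas a stricter interaction-strength threshold would admit fewer.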