Predicting secondary structures, contact numbers, and residue-wise contact orders of native protein structures from amino acid sequences using critical random networks.

Author information

Kinjo Akira R, Nishikawa Ken

Affiliations

Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Mishima 411-8540, Japan; Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan.

Publication information

Biophysics (Nagoya-shi). 2005 Nov 22;1:67-74. doi: 10.2142/biophysics.1.67. eCollection 2005.

Abstract

Predictions of one-dimensional protein structures such as secondary structures and contact numbers are useful for predicting three-dimensional structure and important for understanding the sequence-structure relationship. Here we present a new machine-learning method, critical random networks (CRNs), for predicting one-dimensional structures, and apply it, with position-specific scoring matrices, to the prediction of secondary structures (SS), contact numbers (CN), and residue-wise contact orders (RWCO). The present method achieves, on average, accuracy of 77.8% for SS, and correlation coefficients of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS prediction is comparable to that obtained with other state-of-the-art methods, and accuracy of the CN prediction is a significant improvement over that with previous methods. We give a detailed formulation of the critical random networks-based prediction scheme, and examine the context-dependence of prediction accuracies. In order to study the nonlinear and multi-body effects, we compare the CRNs-based method with a purely linear method based on position-specific scoring matrices. Although not superior to the CRNs-based method, the surprisingly good accuracy achieved by the linear method highlights the difficulty in extracting structural features of higher order from an amino acid sequence beyond the information provided by the position-specific scoring matrices.
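The two kinds of figures quoted above (a percentage accuracy for SS, correlation coefficients for CN and RWCO) can be illustrated with a minimal sketch. It assumes, since the abstract does not spell out the definitions, that SS accuracy is the per-residue fraction of correctly assigned states (often called Q3) and that the correlation is Pearson's coefficient between predicted and observed values. The function names `q3_accuracy` and `pearson_correlation` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (e.g. H, E, C) equals the observed one.

    Assumption: a per-residue match rate; the abstract does not define its accuracy measure.
    """
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def pearson_correlation(predicted, observed):
    """Pearson correlation between predicted and observed per-residue values.

    Assumption: this is the correlation coefficient meant for CN and RWCO.
    """
    n = len(observed)
    if len(predicted) != n or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_o)


# Toy data, invented purely for illustration.
ss_pred = "HHHEECCC"
ss_obs = "HHHEECCH"
cn_pred = [1.0, 2.0, 3.0]
cn_obs = [2.0, 4.0, 6.0]

ss_score = q3_accuracy(ss_pred, ss_obs)          # 7 of 8 residues match
cn_score = pearson_correlation(cn_pred, cn_obs)  # perfectly linear toy series
```

Whether such scores are averaged per protein or pooled over all residues changes the reported number; the abstract says only "on average", so the sketch leaves that choice open.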

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4639/5036631/7ea515f129ed/1_67f1.jpg
