Predicting absolute contact numbers of native protein structure from amino acid sequence.

Author information

Kinjo Akira R, Horimoto Katsuhisa, Nishikawa Ken

Affiliation

Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Mishima, Japan.

Publication information

Proteins. 2005 Jan 1;58(1):158-65. doi: 10.1002/prot.20300.

DOI:10.1002/prot.20300

PMID:15523668

Abstract

The contact number of an amino acid residue in a protein structure is defined by the number of C(beta) atoms around the C(beta) atom of the given residue, a quantity similar to, but different from, solvent accessible surface area. We present a method to predict the contact numbers of a protein from its amino acid sequence. The method is based on a simple linear regression scheme and predicts the absolute values of contact numbers. When single sequences are used for both parameter estimation and cross-validation, the present method predicts the contact numbers with a correlation coefficient of 0.555 on average. When multiple sequence alignments are used, the correlation increases to 0.627, which is a significant improvement over previous methods. In terms of discrete states prediction, the accuracies for 2-, 3-, and 10-state predictions are, respectively, 71.4%, 54.1%, and 18.9% with residue type-dependent unbiased thresholds, and 76.3%, 59.2%, and 21.8% with residue type-independent unbiased thresholds. The difference between accessible surface area and contact number from a prediction viewpoint and the application of contact number prediction to three-dimensional structure prediction are discussed.
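
As an illustration of the definition above, the following is a minimal sketch that counts, for each residue, the other C(beta) atoms lying within a fixed radius, and computes a Pearson correlation of the kind used to compare predicted and observed contact numbers. The abstract does not state the cutoff radius or the exact counting function, so the 12 angstrom hard cutoff, the function names `contact_numbers` and `pearson`, and the toy coordinates are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def contact_numbers(cb_coords, cutoff=12.0):
    """Count, for each residue, the other C-beta atoms closer than `cutoff`.

    `cutoff` (in angstroms) is an assumed value; the abstract does not give one.
    """
    coords = np.asarray(cb_coords, dtype=float)
    # Pairwise distance matrix between all C-beta atoms.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = dist < cutoff
    np.fill_diagonal(within, False)  # a residue is not its own contact
    return within.sum(axis=1)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical toy input: four C-beta positions on a line, 5 angstroms apart.
toy = [[0, 0, 0], [5, 0, 0], [10, 0, 0], [15, 0, 0]]
counts = contact_numbers(toy, cutoff=12.0)
print(counts.tolist())  # [2, 3, 3, 2]
```

In this toy chain the two interior positions each see three neighbours within the radius and the two end positions see two, which reflects how buried residues tend to have larger contact numbers than exposed ones.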
