Efficient prediction of nucleic acid binding function from low-resolution protein structures.

Author information

Szilágyi András, Skolnick Jeffrey

Affiliation

Center of Excellence in Bioinformatics, University at Buffalo, State University of New York, 901 Washington St, Buffalo, NY 14203, USA.

Publication information

J Mol Biol. 2006 May 5;358(3):922-33. doi: 10.1016/j.jmb.2006.02.053. Epub 2006 Mar 10.

DOI:10.1016/j.jmb.2006.02.053

PMID:16551468

Abstract

Structural genomics projects and ab initio protein structure prediction methods provide structures of proteins that have no sequence or fold similarity to proteins with known functions. These are often low-resolution structures that may include only the positions of the Cα atoms. We present a fast and efficient method to predict DNA-binding proteins from just the amino acid sequences and low-resolution, Cα-only protein models. The method uses three kinds of quantities: the relative proportions of certain amino acids in the protein sequence, the asymmetry of the spatial distribution of certain other amino acids, and the dipole moment of the molecule. These quantities are combined in a linear formula whose coefficients are derived from logistic regression on a training set, and a protein is predicted to be DNA-binding if the result is above a certain threshold. We show that the method is insensitive to errors in the atomic coordinates and gives correct predictions even on inaccurate protein models. We demonstrate that the method can identify DNA-binding proteins with novel binding site motifs, as well as proteins whose structures were solved in an unbound state. The accuracy of our method is close to that of another published method that uses all-atom structures, time-consuming calculations and information on conserved residues.
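The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which amino acids are used, how asymmetry is defined, or what the coefficients and threshold are, so the residue sets, the unit charges in `CHARGE`, the asymmetry definition (centroid offset divided by radius of gyration), and every function name below are assumptions made purely for illustration.

```python
import math

# Assumed unit charges placed at C-alpha positions (hypothetical choice,
# not taken from the paper).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def proportion(seq, residues):
    """Fraction of sequence positions occupied by any of the given residues."""
    return sum(aa in residues for aa in seq) / len(seq)


def centroid(points):
    """Geometric center of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def asymmetry(seq, coords, residues):
    """Assumed asymmetry measure: distance between the centroid of the
    selected residues and the centroid of all residues, divided by the
    radius of gyration of the C-alpha trace."""
    center = centroid(coords)
    rg = math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / len(coords))
    chosen = [p for aa, p in zip(seq, coords) if aa in residues]
    if not chosen or rg == 0.0:
        return 0.0
    return math.dist(centroid(chosen), center) / rg


def dipole_moment(seq, coords):
    """Magnitude of sum(q_i * (r_i - center)) over all residues, using the
    assumed CHARGE table and the geometric center as origin."""
    center = centroid(coords)
    d = [0.0, 0.0, 0.0]
    for aa, p in zip(seq, coords):
        q = CHARGE.get(aa, 0.0)
        for i in range(3):
            d[i] += q * (p[i] - center[i])
    return math.sqrt(sum(x * x for x in d))


def linear_score(features, coefficients, intercept):
    """Linear combination of features; in the paper's scheme the
    coefficients would come from logistic regression on a training set."""
    return intercept + sum(coefficients[name] * value
                           for name, value in features.items())


def predicts_binding(features, coefficients, intercept, threshold):
    """Predict DNA binding when the linear score exceeds the threshold."""
    return linear_score(features, coefficients, intercept) > threshold


# Toy input: a four-residue sequence with C-alpha atoms on a line.
toy_seq = "KKAA"
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_features = {
    "frac_KR": proportion(toy_seq, "KR"),
    "asym_K": asymmetry(toy_seq, toy_coords, "K"),
    "dipole": dipole_moment(toy_seq, toy_coords),
}
```

Because only a sequence and one coordinate triple per residue are needed, each feature is computed in a single pass over the chain, which is consistent with the speed the abstract claims. Real coefficients and a threshold would have to be fitted on labeled binding and non-binding structures.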
