Huang Yuanyuan, Bonett Stephen, Kloczkowski Andrzej, Jernigan Robert, Wu Zhijun
Program on Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50014, USA.
J Struct Funct Genomics. 2011 Jul;12(2):119-36. doi: 10.1007/s10969-011-9104-4. Epub 2011 Mar 31.
The atomic-level structural properties of proteins, such as bond lengths, bond angles, and torsion angles, have been well studied and understood based on either chemistry knowledge or statistical analysis. Similar properties at the residue level, such as the distances between two residues and the angles formed by short sequences of residues, can be equally important for structural analysis and modeling, but these have not been examined and documented on a similar scale. While these properties are difficult to measure experimentally, they can be statistically estimated in meaningful ways based on their distributions in known protein structures. Residue-level structural properties, including various types of residue distances and angles, are estimated statistically. A software package is built to provide direct access to the statistical data for the properties, including some important correlations not previously investigated. The distributions of residue distances and angles may vary with the sequence but, in most cases, are concentrated in a few high-probability ranges, corresponding to their frequent occurrence in either α-helices or β-sheets. Strong correlations among neighboring residue angles, similar to those between neighboring torsion angles at the atomic level, are revealed based on their statistical measures. Residue-level statistical potentials can be defined using the statistical distributions and correlations of the residue distances and angles. Ramachandran-like plots for strongly correlated residue angles are generated and analyzed. Their applications to structural evaluation and refinement are demonstrated. With the increase in both the number and quality of known protein structures, many structural properties can be derived from sets of protein structures by statistical analysis and data mining, and these can even be used as a supplement to the experimental data for structure determination.
Indeed, the statistical measures of various types of residue distances and angles provide more systematic and quantitative assessments of these properties, which can otherwise be estimated only individually and qualitatively. Their distributions and correlations in known protein structures show their importance for providing insights into how proteins may fold naturally into various residue-level structures.
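As a rough illustration of the quantities discussed above, the following Python sketch computes a residue distance, a residue (virtual bond) angle, and a residue torsion angle from representative residue coordinates, and then converts an observed angle distribution into a simple histogram-based potential. Several things here are assumptions and are not taken from the paper or its software package: each residue is represented by a single point (for example its Cα atom), the potential uses the common inverse-Boltzmann form with a uniform reference distribution and energies in units of kT, and all function names and parameters (`residue_distance`, `residue_angle`, `residue_torsion`, `histogram_potential`, `pseudocount`) are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def residue_distance(p, q):
    """Distance between two residues, each given as one (x, y, z) point."""
    return _norm(_sub(p, q))


def residue_angle(p1, p2, p3):
    """Angle in degrees at p2 formed by three consecutive residue points."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_a = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))


def residue_torsion(p1, p2, p3, p4):
    """Torsion angle in degrees, in (-180, 180], for four consecutive points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = _norm(b2)
    b2_unit = (b2[0] / b2_len, b2[1] / b2_len, b2[2] / b2_len)
    m1 = _cross(n1, b2_unit)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def histogram_potential(values, lo, hi, n_bins, pseudocount=1.0):
    """Assumed inverse-Boltzmann potential: E = -ln(p_observed / p_uniform).

    The pseudocount keeps empty bins finite; energies are in units of kT.
    """
    counts = [pseudocount] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    p_uniform = 1.0 / n_bins
    return [-math.log((c / total) / p_uniform) for c in counts]
```

In this sketch, bins where angles are observed frequently (such as those typical of α-helices or β-sheets) receive lower energies than sparsely populated bins, which is the general idea behind scoring a model structure against the observed distributions. A correlated potential over pairs of neighboring angles could be built the same way from a two-dimensional histogram.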