Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials.

Author information

Rykunov Dmitry, Fiser András

Affiliation

Department of Biochemistry, Seaver Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York 10461, USA.

Publication information

Proteins. 2007 May 15;67(3):559-68. doi: 10.1002/prot.21279.

Abstract

Distance-dependent statistical pair potentials are frequently used in folding, threading, and modeling studies of proteins. The applicability of such potentials is tightly connected to the reliability of the underlying statistical observations. We explored the possible origin and extent of false positive signals in statistical potentials by analyzing their distance dependence in a variety of randomized protein-like models. Although potentials derived from such models are expected to average zero at every distance, we demonstrate that systematic and significant distortions exist. At short distances these distortions originate from the limited statistical counts in the local environments of proteins, and at large distances from the limited size of protein structures. We suggest that these systematic errors in statistical potentials are connected to the dependence of amino acid composition on protein size and to the variation in protein sizes. Additionally, atom-based potentials are dominated by a false positive signal caused by correlation among the distances measured from atoms of one residue to atoms of another residue. We assessed the significance of residue-based pairwise potentials at various spatial pair separations and found that only about 50% of potential values were statistically significant at distances below 4 Å, and at most about 80% were significant at larger pair separations. A new definition of the reference state, free of the observed systematic errors, is suggested. It has been demonstrated to generate statistical potentials that compare favorably with other publicly available ones.
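
The effect of sparse counts on a log-ratio potential can be illustrated with a toy calculation. The sketch below is not the authors' procedure or their randomized protein-like models; it assumes the common inverse-Boltzmann form, E = -ln(N_obs / N_ref) in units of kT, and a simple binomial sampling model in which the true potential is exactly zero. The names `pair_potential`, `mean_random_potential`, and `n_pairs` are hypothetical.

```python
import math
import random


def pair_potential(n_obs, n_ref):
    """Inverse-Boltzmann energy (in kT) for one pair type in one distance bin."""
    return -math.log(n_obs / n_ref)


def mean_random_potential(expected_count, n_pairs=400, trials=2000, seed=0):
    """Average potential when observed counts contain only sampling noise.

    Assumption: each of n_pairs candidate pairs falls into the bin
    independently with probability expected_count / n_pairs, so the
    mean observed count equals the reference count and the "true"
    potential is zero. Trials with zero observations are skipped
    because the logarithm is undefined there.
    """
    rng = random.Random(seed)
    p = expected_count / n_pairs
    total = 0.0
    used = 0
    for _ in range(trials):
        n_obs = sum(1 for _ in range(n_pairs) if rng.random() < p)
        if n_obs == 0:
            continue
        total += pair_potential(n_obs, expected_count)
        used += 1
    return total / used


if __name__ == "__main__":
    # A sparsely populated bin versus a well populated bin.
    print("mean potential, expected count 4:  ", mean_random_potential(4))
    print("mean potential, expected count 200:", mean_random_potential(200))
```

Under these assumptions, the averaged potential for the sparsely populated bin comes out noticeably above zero, while the well populated bin stays close to zero. Because the logarithm is concave, the average of -ln(N_obs / N_ref) over noisy counts does not vanish even when the average count matches the reference, which is one general way that limited statistics can produce a systematic signal.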
