Disentangling direct from indirect co-evolution of residues in protein alignments.

Affiliation

Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Basel, Switzerland.

Publication information

PLoS Comput Biol. 2010 Jan;6(1):e1000633. doi: 10.1371/journal.pcbi.1000633. Epub 2010 Jan 1.

Abstract

Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are contacting in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments.
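
The abstract's central argument has two parts. First, a chain of directly coupled columns A-B-C makes the distal pair A-C look strongly co-evolving. Second, a tree-structured Bayesian network can separate direct from indirect links. The sketch below illustrates both under stated assumptions. It is not the authors' implementation. It uses plug-in mutual information as the co-evolution score and simulates toy columns over a hypothetical four-letter alphabet. It then gives every spanning tree T over the columns a weight equal to the product of edge weights w_ij = exp(BETA * MI_ij). From those weights, the matrix-tree theorem yields the posterior probability that each pair is directly linked. The names `simulate_chain`, `edge_posteriors` and the tempering parameter `BETA` are hypothetical. The sketch includes no phylogenetic correction and no informative prior.

```python
import math
import random
from collections import Counter


def mutual_information(col_x, col_y):
    """Plug-in mutual information (nats) between two alignment columns."""
    n = len(col_x)
    joint = Counter(zip(col_x, col_y))
    px, py = Counter(col_x), Counter(col_y)
    # Sum over observed pairs of p(x,y) * log[p(x,y) / (p(x) p(y))], written with raw counts.
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def simulate_chain(n_seqs, p_copy=0.9, alphabet="ACDE", seed=0):
    """Hypothetical toy columns with contact chain A-B-C: B copies A and C copies B
    with probability p_copy, otherwise a random residue is drawn. A and C never interact."""
    rng = random.Random(seed)
    cols = ([], [], [])
    for _ in range(n_seqs):
        a = rng.choice(alphabet)
        b = a if rng.random() < p_copy else rng.choice(alphabet)
        c = b if rng.random() < p_copy else rng.choice(alphabet)
        for col, residue in zip(cols, (a, b, c)):
            col.append(residue)
    return cols


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edge_posteriors(weights, n_nodes):
    """P(edge i-j in T) when P(T) is proportional to the product of T's edge weights.

    Matrix-tree theorem: the marginal equals w_ij times the effective resistance
    between i and j, read off the inverse of the Laplacian with node 0 removed.
    """
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j), w in weights.items():
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    inv = invert([row[1:] for row in lap[1:]])

    def g(u, v):  # the grounded node 0 has zero potential
        return inv[u - 1][v - 1] if u > 0 and v > 0 else 0.0

    return {(i, j): w * (g(i, i) + g(j, j) - 2.0 * g(i, j))
            for (i, j), w in weights.items()}


cols = simulate_chain(2000)
names = "ABC"
mi = {(i, j): mutual_information(cols[i], cols[j])
      for i in range(3) for j in range(i + 1, 3)}
BETA = 50.0  # hypothetical tempering parameter, not taken from the paper
top = max(mi.values())
# Subtracting `top` rescales every spanning tree by the same factor, so posteriors are unchanged.
weights = {e: math.exp(BETA * (m - top)) for e, m in mi.items()}
post = edge_posteriors(weights, 3)
for (i, j), m in mi.items():
    print(f"{names[i]}-{names[j]}: MI = {m:.2f} nats, P(direct) = {post[(i, j)]:.3f}")
```

With the fixed seed, the printout should show a sizable mutual information for the indirect pair A-C, though lower than for A-B and B-C. The posterior probability of a direct A-C link should still come out near zero. Setting `BETA` to the number of sequences would correspond to plugging maximum-likelihood estimates into a tree model's likelihood ratio. Weights that extreme would need log-space or otherwise stabilized elimination, not this plain Gauss-Jordan inverse.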
