Evolutionary profiles derived from the QR factorization of multiple structural alignments gives an economy of information.

Author information

O'Donoghue Patrick, Luthey-Schulten Zaida

Affiliation

Department of Chemistry, University of Illinois at Urbana-Champaign, 600 S. Mathews, Urbana, IL 61801, USA.

Publication information

J Mol Biol. 2005 Feb 25;346(3):875-94. doi: 10.1016/j.jmb.2004.11.053. Epub 2005 Jan 22.

Abstract

We present a new algorithm, based on the multidimensional QR factorization, that removes redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax=b, was designed to re-order the columns of A by increasing linear dependence. Removing the most linearly dependent columns from A yields a minimal basis set that spans the phase space of the problem at hand well. By recasting the problem of redundancy in multiple structural alignments in this framework, with the matrix A now describing the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures that best spans the evolutionary (phase) space. Initial results show that the non-redundant, representative profiles obtained from this procedure, termed evolutionary profiles, outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q(H), is presented. When the effect and presence of gaps are properly accounted for, a phylogenetic tree computed using this metric is congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to the evolution of structure among key, universally distributed proteins involved in translation, and to the selection of representatives from an ensemble of NMR structures, are also discussed.
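
The column re-ordering that the abstract attributes to the classical pivoted QR factorization can be illustrated on an ordinary matrix. The sketch below is not the authors' multidimensional algorithm: it uses modified Gram-Schmidt with column pivoting as a simple stand-in for a Householder-based QR, and the function name `pivoted_qr_order`, the tolerance `tol`, and the toy columns are all assumptions made for illustration.

```python
import math


def pivoted_qr_order(columns, tol=1e-12):
    """Return column indices ordered by increasing linear dependence.

    `columns` is a list of equal-length vectors (lists of numbers).
    Hypothetical helper: Gram-Schmidt with column pivoting, used here
    only to illustrate the ordering produced by a pivoted QR.
    """
    residual = [[float(x) for x in col] for col in columns]
    remaining = list(range(len(columns)))
    order = []
    while remaining:
        # Norm of what is left of each column after removing the part
        # already explained by the columns chosen so far.
        norms = {j: math.sqrt(sum(x * x for x in residual[j])) for j in remaining}
        pivot = max(remaining, key=lambda j: norms[j])
        order.append(pivot)
        remaining.remove(pivot)
        if norms[pivot] <= tol:
            # All remaining columns are numerically dependent on the chosen ones.
            order.extend(remaining)
            break
        q = [x / norms[pivot] for x in residual[pivot]]
        # Remove the new direction q from every column not yet chosen.
        for j in remaining:
            dot = sum(a * b for a, b in zip(q, residual[j]))
            residual[j] = [a - dot * b for a, b in zip(residual[j], q)]
    return order


# Toy data (assumed): column 1 is a combination of columns 0 and 2.
cols = [
    [3.0, 0.0, 0.0],
    [2.9, 0.1, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
order = pivoted_qr_order(cols)
print(order)  # [0, 2, 3, 1]
```

Column 1 lands last because it is fully explained by columns 0 and 2. Keeping only the leading entries of `order` drops the most redundant columns first, which is the sense in which the abstract speaks of a minimal basis set.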
