A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.

Author information

Bastien Olivier, Ortet Philippe, Roy Sylvaine, Maréchal Eric

Affiliation

UMR 5019 CNRS-CEA-INRA-Université Joseph Fourier, Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, CEA Grenoble, 17 rue des Martyrs, F-38054, Grenoble cedex 09, France.

Publication information

BMC Bioinformatics. 2005 Mar 10;6:49. doi: 10.1186/1471-2105-6-49.

Abstract

BACKGROUND

Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction.
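As an illustration of the Z-scores mentioned above, the following is a minimal sketch of a Monte Carlo Z-score for one pair of sequences: the observed alignment score is compared with the scores obtained after shuffling one of the two sequences. It is not the authors' implementation. The function names (`local_alignment_score`, `z_score`), the toy match/mismatch/gap values, the use of a plain Smith-Waterman local score, the choice to shuffle only the second sequence, and the default of 200 shuffles are all assumptions made for this example. A real analysis would normally use a substitution matrix and affine gap penalties.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (toy scoring scheme, linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_score(a, b, n_shuffles=200, seed=0):
    """Z = (observed - mean) / std, with mean and std estimated from
    alignments of `a` against shuffled versions of `b` (assumed protocol)."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = local_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # shuffling preserves length and composition
        shuffled_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    std = statistics.pstdev(shuffled_scores)
    if std == 0:
        raise ValueError("shuffled scores have zero variance; Z-score undefined")
    return (observed - mean) / std
```

Because shuffling keeps the length and amino-acid composition of the second sequence, a large Z-score indicates that the observed score depends on residue order rather than on composition alone. The estimate varies with the number of shuffles and the random seed, so both should be reported alongside any Z-score.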

RESULTS

We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny.

CONCLUSION

The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8236/555736/d55f75969c7c/1471-2105-6-49-1.jpg
