Bastien Olivier, Ortet Philippe, Roy Sylvaine, Maréchal Eric
UMR 5019 CNRS-CEA-INRA-Université Joseph Fourier, Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, CEA Grenoble, 17 rue des Martyrs, F-38054, Grenoble cedex 09, France.
BMC Bioinformatics. 2005 Mar 10;6:49. doi: 10.1186/1471-2105-6-49.
Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction.
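As an illustration of the Z-score mentioned above, the following minimal sketch estimates one by Monte Carlo shuffling: the real alignment score is compared with the distribution of scores obtained after shuffling one of the two sequences. The scoring scheme (a simple match/mismatch/linear-gap Smith-Waterman), the function names, and the number of shuffles are assumptions chosen for brevity. They are not the substitution matrix, alignment program, or parameters used by the authors.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    The match/mismatch/gap values are illustrative placeholders, not a
    real protein substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def z_score(a, b, n_shuffles=200, seed=0):
    """Monte Carlo Z-score of the alignment of a against b.

    Z = (s_real - mean(s_shuffled)) / stdev(s_shuffled), where the
    shuffled scores come from aligning a against random permutations
    of b (same length and composition as b).
    """
    rng = random.Random(seed)
    real = local_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z is undefined")
    return (real - mean) / sd


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to exercise the functions.
    seq_a = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    seq_b = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    print(z_score(seq_a, seq_b))
```

Because the shuffled sequences keep the length and composition of the original, the Z-score measures how far the real score stands out from what composition alone would produce. In this sketch only one of the two sequences is shuffled; shuffling the other, or both, is an equally plausible variant.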
We built a spatial representation of protein sequences using the concept of configuration space from particle physics, subject to a set of constraints deduced from the information-theoretic properties of pair-wise alignment scores. The resulting configuration space of homologous proteins (CSHP) represents both real and shuffled sequences and, from there, yields an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction method that uses Z-scores. The resulting trees, called TULIP trees, are consistent with trees based on multiple alignments. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny.
The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, whose topology is based on a spatial representation that is not reordered when sequences are added or removed. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.