Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs.

Authors

Dutheil Julien, Boussau Bastien

Affiliation

BiRC - Bioinformatics Research Center, University of Aarhus, C. F. Møllers Alle, DK-8000 Århus C, Denmark.

Publication

BMC Evol Biol. 2008 Sep 22;8:255. doi: 10.1186/1471-2148-8-255.

Abstract

BACKGROUND

Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, whether phylogenetic relationships, substitution rates or ancestral states; it is also crucial for simulating realistic data sets. Such simulations are needed to estimate the null distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. Homologous sequences often vary widely in their nucleotide or amino-acid compositions, revealing that the process of sequence evolution has changed substantially among lineages and may therefore be most appropriately modeled with non-homogeneous models. Several programs implementing such models have been developed, but they are limited in scope: only a few particular models are available for likelihood optimization, and data sets cannot easily be generated from the resulting parameter estimates.

RESULTS

We present a general implementation of non-homogeneous substitution models. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first, Bio++ Maximum Likelihood (BppML), estimates the parameters of any non-homogeneous model; the second, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences under these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required.
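To make the idea of simulating under a non-homogeneous model concrete, here is a minimal Python sketch. It does not use Bio++ or BppSeqGen, and it is not the authors' implementation. It assumes a two-branch tree and an F81 substitution model, chosen only because F81 has closed-form transition probabilities. The equilibrium frequencies differ between the two branches, which is what makes the process non-homogeneous. All function names and parameter values are hypothetical.

```python
import math
import random

NUCS = "ACGT"


def f81_probs(state, pi, t):
    """Transition probabilities from `state` after branch length t under F81.

    F81 closed form: P_ij(t) = exp(-beta*t) * [i == j] + (1 - exp(-beta*t)) * pi_j,
    with beta chosen so that t is in expected substitutions per site.
    """
    beta = 1.0 / (1.0 - sum(p * p for p in pi.values()))
    decay = math.exp(-beta * t)
    return {j: (decay if j == state else 0.0) + (1.0 - decay) * pi[j] for j in NUCS}


def evolve(seq, pi, t, rng):
    """Evolve every site of `seq` independently along one branch."""
    # Precompute one row of transition weights per starting nucleotide.
    rows = {s: [f81_probs(s, pi, t)[j] for j in NUCS] for s in NUCS}
    return "".join(rng.choices(NUCS, weights=rows[s])[0] for s in seq)


def gc_content(seq):
    return sum(c in "GC" for c in seq) / len(seq)


def simulate_pair(root_pi, pi_a, pi_b, t, length, rng):
    """Draw a root sequence, then evolve it along two branches that use
    different equilibrium frequencies (assumed values, for illustration)."""
    root = "".join(rng.choices(NUCS, weights=[root_pi[j] for j in NUCS], k=length))
    return evolve(root, pi_a, t, rng), evolve(root, pi_b, t, rng)


if __name__ == "__main__":
    uniform = {n: 0.25 for n in NUCS}
    at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
    gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
    a, b = simulate_pair(uniform, at_rich, gc_rich, 0.5, 1000, random.Random(42))
    print("GC content, branch A:", gc_content(a))
    print("GC content, branch B:", gc_content(b))
```

The two descendant sequences share an ancestor but drift toward different compositions, the kind of pattern that a single homogeneous model cannot reproduce.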

CONCLUSION

We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous model of sequence evolution, including heterotachous ones, while remaining computationally efficient. We further illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to a previously published ribosomal RNA data set.
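The logic of parametric bootstrapping mentioned above can be sketched generically in Python. This is only an illustration of the resampling scheme, not the test used in the paper. The null model here is a toy stand-in (two sequences drawn site by site from one shared composition, with no tree), and the statistic (absolute difference in GC content) is an assumption chosen for brevity. All names are hypothetical.

```python
import random


def gc_content(seq):
    return sum(c in "GC" for c in seq) / len(seq)


def gc_difference(pair):
    """Toy test statistic: absolute GC-content difference between two sequences."""
    a, b = pair
    return abs(gc_content(a) - gc_content(b))


def simulate_homogeneous_pair(rng, length=500, gc=0.5):
    """Toy null model (assumption): both sequences share one composition,
    with sites drawn independently. A real analysis would simulate along
    a tree under the fitted homogeneous model."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T

    def draw():
        return "".join(rng.choices("ACGT", weights=weights, k=length))

    return draw(), draw()


def bootstrap_p_value(observed, simulate_null, statistic, n_rep, rng):
    """Estimate P(statistic >= observed) under the null by simulation.

    The +1 in numerator and denominator counts the observed value itself,
    so the estimate is never exactly zero.
    """
    null = [statistic(simulate_null(rng)) for _ in range(n_rep)]
    exceed = sum(value >= observed for value in null)
    return (exceed + 1) / (n_rep + 1)


if __name__ == "__main__":
    rng = random.Random(7)
    p = bootstrap_p_value(0.15, simulate_homogeneous_pair, gc_difference, 200, rng)
    print("parametric bootstrap p-value:", p)
```

A small p-value indicates that the observed statistic is unlikely under the simulated null model, which is the reasoning behind a simulation-based test of non-homogeneity.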
