Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs.

Author information

Dutheil Julien, Boussau Bastien

Affiliation

BiRC - Bioinformatics Research Center, University of Aarhus, C.F. Møllers Alle, DK-8000 Århus C, Denmark.

Publication information

BMC Evol Biol. 2008 Sep 22;8:255. doi: 10.1186/1471-2148-8-255.

DOI:10.1186/1471-2148-8-255

PMID:18808672

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2559849/

Abstract

BACKGROUND

Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial for simulating realistic data sets. Such simulation procedures are needed to estimate the null distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that the process of sequence evolution has differed substantially among lineages, and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in scope: only a few particular models are available for likelihood optimization, and data sets cannot easily be generated from the resulting parameter estimates.
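As a rough illustration of the parametric bootstrapping idea described above, the following Python sketch estimates the null distribution of a statistic by simulation. Everything in it is an assumption made for illustration and is not taken from the paper: the statistic (absolute GC-content difference between two sequences), the null model (independent sites sharing one pooled GC content, which ignores the shared ancestry that a real phylogenetic null model would include), and all function names.

```python
import random

def gc_content(seq):
    """Fraction of G and C characters in a sequence."""
    return sum(c in "GC" for c in seq) / len(seq)

def gc_difference(seq_a, seq_b):
    """Test statistic: absolute difference in GC content."""
    return abs(gc_content(seq_a) - gc_content(seq_b))

def simulate_null(length, gc, rng):
    """Draw one sequence with i.i.d. sites and GC content `gc` (toy null model)."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # order: A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))

def parametric_bootstrap_p(seq_a, seq_b, replicates, rng):
    """Estimate a p-value for the observed statistic under the fitted null."""
    observed = gc_difference(seq_a, seq_b)
    pooled_gc = gc_content(seq_a + seq_b)  # fit the null model to the data
    hits = 0
    for _ in range(replicates):
        sim_a = simulate_null(len(seq_a), pooled_gc, rng)
        sim_b = simulate_null(len(seq_b), pooled_gc, rng)
        if gc_difference(sim_a, sim_b) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (replicates + 1)

rng = random.Random(1)
seq_a = "AT" * 150 + "GC" * 50   # toy AT-rich sequence
seq_b = "GC" * 150 + "AT" * 50   # toy GC-rich sequence
p_value = parametric_bootstrap_p(seq_a, seq_b, replicates=200, rng=rng)
print(p_value)
```

In a real analysis the null model would be a homogeneous substitution model fitted on the tree, and the replicates would be simulated along that tree rather than site by site.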

RESULTS

We hereby present a general implementation of non-homogeneous models of substitutions. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first one, Bio++ Maximum Likelihood (BppML), estimates parameters of any non-homogeneous model and the second one, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences from these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required.
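To make the notion of a non-homogeneous model concrete, here is a minimal Python sketch that evolves one ancestral sequence along two branches, each with its own substitution model. It does not use the Bio++ classes, BppML or BppSeqGen, and it does not reproduce their property-file syntax; the choice of the T92 model, the parameter values, and all function names are assumptions made for illustration. Substitutions are simulated event by event with exponential waiting times, so no matrix exponential is needed.

```python
import random

NUCS = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def t92_rates(kappa, theta):
    """T92 substitution rates {from: {to: rate}} with equilibrium GC content
    theta, scaled to one expected substitution per unit time at equilibrium."""
    pi = {"A": (1 - theta) / 2, "T": (1 - theta) / 2,
          "C": theta / 2, "G": theta / 2}
    q = {i: {j: (kappa if (i, j) in TRANSITIONS else 1.0) * pi[j]
             for j in NUCS if j != i} for i in NUCS}
    scale = sum(pi[i] * sum(q[i].values()) for i in NUCS)
    return {i: {j: r / scale for j, r in q[i].items()} for i in NUCS}

def evolve_site(state, rates, length, rng):
    """Evolve one site along a branch of the given length."""
    t = 0.0
    while True:
        t += rng.expovariate(sum(rates[state].values()))
        if t >= length:
            return state
        targets = list(rates[state])
        weights = [rates[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]

def evolve_sequence(seq, rates, length, rng):
    """Evolve every site independently along one branch."""
    return "".join(evolve_site(s, rates, length, rng) for s in seq)

def gc_content(seq):
    return sum(c in "GC" for c in seq) / len(seq)

rng = random.Random(42)
root = "".join(rng.choice(NUCS) for _ in range(2000))
# Non-homogeneity: each branch below the root has its own T92 parameters.
at_rich = evolve_sequence(root, t92_rates(kappa=2.0, theta=0.2), 5.0, rng)
gc_rich = evolve_sequence(root, t92_rates(kappa=2.0, theta=0.8), 5.0, rng)
print(round(gc_content(at_rich), 2), round(gc_content(gc_rich), 2))
```

The two descendants start from the same ancestor but drift toward different base compositions, which is the kind of compositional heterogeneity a homogeneous model cannot represent.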

CONCLUSION

We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous model of sequence evolution, including heterotachous ones, while remaining computationally efficient. We furthermore illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to a previously published ribosomal RNA data set.
