The impact of modelling rate heterogeneity among sites on phylogenetic estimates of intraspecific evolutionary rates and timescales.

Author information

School of Biological Sciences, University of Sydney, Sydney, New South Wales, Australia.

Publication information

PLoS One. 2014 May 5;9(5):e95722. doi: 10.1371/journal.pone.0095722. eCollection 2014.

Abstract

Phylogenetic analyses of DNA sequence data can provide estimates of evolutionary rates and timescales. Nearly all phylogenetic methods rely on accurate models of nucleotide substitution. A key feature of molecular evolution is the heterogeneity of substitution rates among sites, which is often modelled using a discrete gamma distribution. A widely used derivative of this is the gamma-invariable mixture model, which assumes that a proportion of sites in the sequence are completely resistant to change, while substitution rates at the remaining sites are gamma-distributed. For data sampled at the intraspecific level, however, biological assumptions involved in the invariable-sites model are commonly violated. We examined the use of these models in analyses of five intraspecific data sets. We show that using 6-10 rate categories for the discrete gamma distribution of rates among sites is sufficient to provide a good approximation of the marginal likelihood. Increasing the number of gamma rate categories did not have a substantial effect on estimates of the substitution rate or coalescence time, unless rates varied strongly among sites in a non-gamma-distributed manner. The assumption of a proportion of invariable sites provided a better approximation of the asymptotic marginal likelihood when the number of gamma categories was small, but had minimal impact on estimates of rates and coalescence times. However, the estimated proportion of invariable sites was highly susceptible to changes in the number of gamma rate categories. The concurrent use of gamma and invariable-site models for intraspecific data is not biologically meaningful and has been challenged on statistical grounds; here we have found that the assumption of a proportion of invariable sites has no obvious impact on Bayesian estimates of rates and timescales from intraspecific data.
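
The discrete gamma and gamma-invariable models described in the abstract can be illustrated with a short sketch. This is not the authors' code. It assumes the commonly used discretization in which the gamma distribution (shape alpha, mean 1) is cut into k equal-probability categories and each category is represented by its mean rate. The function names (`reg_lower_gamma`, `gamma_quantile`, `discrete_gamma_rates`, `gamma_invariant_mixture`) are hypothetical, and rescaling the variable-site rates by 1/(1 - p_inv) so that the overall mean rate stays at 1 is one common convention, assumed here.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    n = 0
    # Series: sum over n of x^n / ((a+1)(a+2)...(a+n)); all terms are positive.
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))


def gamma_quantile(a, p):
    """Quantile of a Gamma(shape=a, scale=1) distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rate of each of k equal-probability categories (overall mean 1)."""
    # Category boundaries on the Gamma(alpha, scale=1) scale.
    cuts = [0.0] + [gamma_quantile(alpha, i / k) for i in range(1, k)]
    # P(alpha + 1, cut) gives the partial mean up to each boundary;
    # the final boundary is infinity, where the function equals 1.
    partial = [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [k * (partial[i + 1] - partial[i]) for i in range(k)]


def gamma_invariant_mixture(alpha, k, p_inv):
    """Rates and weights for a gamma + invariable-sites mixture (assumed convention)."""
    rates = discrete_gamma_rates(alpha, k)
    # Rescale variable-site rates so the mixture as a whole has mean rate 1.
    scaled = [r / (1.0 - p_inv) for r in rates]
    weights = [p_inv] + [(1.0 - p_inv) / k] * k
    return [0.0] + scaled, weights


if __name__ == "__main__":
    # Compare a small and a larger number of categories for the same shape.
    for k in (4, 10):
        print(k, [round(r, 4) for r in discrete_gamma_rates(0.5, k)])
    print(gamma_invariant_mixture(0.5, 4, 0.2))
```

In this sketch, raising k only refines how the same gamma density is approximated, whereas p_inv adds a separate rate class fixed at zero. With few categories the lowest gamma category already sits near zero when alpha is small, which gives some intuition for why the estimated proportion of invariable sites can shift as k changes.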

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ee6/4010409/4dae58ebd836/pone.0095722.g001.jpg
