Hadamard conjugations and modeling sequence evolution with unequal rates across sites.

Author information

Waddell P J, Penny D, Moore T

Affiliation

Department of Plant Biology and Biotechnology, School of Biological Sciences, Massey University, Palmerston North, New Zealand.

Publication information

Mol Phylogenet Evol. 1997 Aug;8(1):33-50. doi: 10.1006/mpev.1997.0405.

Abstract

This paper considers the many different distributions that may approximate the distribution of site rates in DNA sequences and shows how the Hadamard conjugation may be modified to take these into account. This is done for both 2-state and 4-state data. Distributions which give simple closed forms include the gamma (Γ) distribution, the inverse Gaussian distribution (which is similar to the lognormal), and a mixture of either of these with a proportion of sites which cannot change (invariant sites). It is seen that the tail of a distribution can have major effects upon the coefficient of variation of site rates. Because the Hadamard conjugation can be used either to correct data or to predict the data given the model (i.e., the likelihood of site patterns), light is shed on properties of maximum likelihood tree selection with unequal site rates. Analysis of rRNA shows how unequal rates across sites can change the optimal tree. Maximum likelihood analysis also shows that distinct distributions fit each data set, with the gamma often not being the best. Analyzing both these data and a long stretch of primate mtDNA reveals evidence of many "hidden" multiple substitutions, while signals not corresponding to the preferred biological tree generally decrease when unequal rates are allowed for. Last, we discuss the expected behavior of sequences evolving by models where stabilizing selection alone explains unequal site rates. Such models do not explain "synapomorphies" or informative changes in ancient molecules, because while stabilizing selection can vastly decrease change at a site, it will also vastly accelerate back-substitution (leaving only a covarion model to explain old synapomorphies). When and why models allowing a continuous distribution of site rates (e.g., gamma) will approximate covarion evolution requires further study.
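
To make the "correct data or predict the data" idea concrete, here is a minimal sketch of a 2-state Hadamard conjugation in which the exponential is swapped for the moment generating function of a gamma distribution of site rates. It is an illustration under stated assumptions, not the authors' code: the function names, the three-taxon edge lengths, and the choice of a gamma distribution with mean 1 and shape `shape` are all hypothetical, and the relations used (`s = H^-1 M(H q)` for prediction, `q = H^-1 M^-1(H s)` for correction) are the commonly stated form of the conjugation rather than something spelled out in the abstract.

```python
import math

def hadamard(m):
    """Sylvester Hadamard matrix of order m (m must be a power of two)."""
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(m)]
            for i in range(m)]

def matvec(H, v):
    """Plain matrix-vector product."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def gamma_mgf(x, shape):
    """MGF of a gamma distribution with mean 1 (assumed), evaluated at x <= 0."""
    return (1.0 - x / shape) ** (-shape)

def gamma_mgf_inverse(r, shape):
    """Inverse of gamma_mgf, valid for r > 0."""
    return shape * (1.0 - r ** (-1.0 / shape))

def predict_spectrum(q, shape=None):
    """Edge-length spectrum q -> expected site-pattern spectrum s.

    shape=None means equal rates at all sites (plain exponential).
    """
    m = len(q)
    H = hadamard(m)
    rho = matvec(H, q)
    if shape is None:
        r = [math.exp(x) for x in rho]
    else:
        r = [gamma_mgf(x, shape) for x in rho]
    return [x / m for x in matvec(H, r)]  # H^-1 = H / m

def correct_spectrum(s, shape=None):
    """Site-pattern spectrum s -> corrected edge-length spectrum q."""
    m = len(s)
    H = hadamard(m)
    r = matvec(H, s)
    if shape is None:
        rho = [math.log(x) for x in r]
    else:
        rho = [gamma_mgf_inverse(x, shape) for x in r]
    return [x / m for x in matvec(H, rho)]

# Hypothetical three-taxon tree: one pendant edge per taxon.
edges = [0.1, 0.2, 0.3]
q = [-sum(edges)] + edges            # q[0] is minus the sum of the edge lengths

s_equal = predict_spectrum(q)             # equal rates across sites
s_gamma = predict_spectrum(q, shape=0.5)  # gamma-distributed rates
q_back = correct_spectrum(s_gamma, shape=0.5)
```

In this sketch the same edge lengths predict more constant sites under gamma-distributed rates than under equal rates, so correcting such data with an equal-rates model would underestimate the amount of change, which is one way to read the abstract's remark about "hidden" multiple substitutions. A mixture with invariant sites could be handled by substituting a different moment generating function, but that is not shown here.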
