Bioinformatics and Systems Biology Graduate Program, UC San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA.
Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA.
Syst Biol. 2018 May 1;67(3):475-489. doi: 10.1093/sysbio/syx088.
Models of tree evolution have mostly focused on capturing the cladogenesis processes behind speciation. Processes that drive the evolution of genomic elements, such as repeats, are not necessarily captured by these existing models. In this article, we design a model of tree evolution that we call the dual-birth model, and we show how it can be useful in studying the evolution of short Alu repeats, which are abundant in the human genome. The dual-birth model extends the traditional birth-only model to have two rates of propagation: one for active nodes, which propagate often, and another for inactive nodes, which activate at a lower rate and then start propagating. Adjusting the ratio of the two rates controls the expected tree balance. We present several theoretical results under the dual-birth model, introduce parameter estimation techniques, and study the properties of the model in simulations. We then use the dual-birth model to estimate the number of active Alu elements and their rates of propagation and activation in the human genome, based on a large phylogenetic tree that we build from close to one million Alu sequences.
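To make the two-rate mechanism concrete, the following is a minimal simulation sketch rather than the authors' implementation. It makes several assumptions that the abstract does not state: every split replaces a lineage with one active and one inactive child, the process starts from a single active lineage, and waiting times are exponential (a standard Gillespie-style scheme). The names `simulate_dual_birth`, `rate_active`, `rate_inactive`, and `mean_depth` are hypothetical.

```python
import random


def simulate_dual_birth(rate_active, rate_inactive, n_leaves, seed=None):
    """Grow a tree until it has n_leaves tips.

    Returns (tips, elapsed), where tips is a list of (depth, is_active)
    pairs and elapsed is the simulated time. Both rates must be positive.
    """
    rng = random.Random(seed)
    # ASSUMPTION: the process starts from one active lineage at depth 0.
    tips = [(0, True)]
    elapsed = 0.0
    while len(tips) < n_leaves:
        # Each tip splits at the rate that matches its state.
        weights = [rate_active if active else rate_inactive
                   for _, active in tips]
        # Waiting time to the next split is exponential in the total rate.
        elapsed += rng.expovariate(sum(weights))
        # Pick the splitting tip with probability proportional to its rate.
        idx = rng.choices(range(len(tips)), weights=weights)[0]
        depth, _ = tips.pop(idx)
        # ASSUMPTION: a split yields one active and one inactive child.
        tips.append((depth + 1, True))
        tips.append((depth + 1, False))
    return tips, elapsed


def mean_depth(tips):
    """Average number of edges from the root to a tip (a crude balance proxy)."""
    return sum(depth for depth, _ in tips) / len(tips)
```

Calling `simulate_dual_birth(1.0, 1.0, 200, seed=0)` and `simulate_dual_birth(100.0, 1.0, 200, seed=0)` and comparing `mean_depth` on the results gives a rough feel for how the rate ratio shifts tree balance: when inactive lineages rarely activate, splits concentrate on a few active lineages and the tree tends to become more ladder-like.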