Department of Applied Mathematics, School of Natural Sciences, University of California, Merced CA 95348, United States.
Math Biosci. 2018 Aug;302:46-66. doi: 10.1016/j.mbs.2018.05.009. Epub 2018 May 19.
Transposable elements (TEs), segments of DNA capable of self-replication, are abundant in the genomes of most organisms and thus serve as a record of past mutational events. While some work suggests TEs may serve a regulatory function for the host, most empirical and theoretical studies have shown that TEs often have deleterious effects on the host. Because TEs are not essential to the host, the host genome contains both full-length (actively replicating) and partial-length (inactive remnant) copies. We developed a novel mathematical formulation of TE dynamics by modeling the density of full- and partial-length copies resulting from mutations (insertions and deletions) and from TE replication within the host genome. More specifically, we model the time evolution of the complete TE length distribution (full and partial elements) in a genome using fragmentation equations in both a discrete and a continuous framework under two models of TE replication. In the first case, we assume that full-length TEs replicate at a constant rate regardless of the number of full-length TEs present in the genome. While this assumption simplifies the underlying biological processes, it allows us to derive an explicit analytical form of the time-varying TE density, as well as its steady-state behavior, under both the discrete and continuous formulations. Next, we account for the potential deleterious effects of TEs by modeling TE replication with a logistic growth equation. Under this assumption, the number of actively replicating TEs in a genome is limited by a carrying capacity. While we are still able to derive analytical forms for the time-varying TE density under both the discrete and continuous length formulations, these solutions are implicit. For all four proposed models, we prove existence and uniqueness of the solutions describing TE length distributions. Comparing the two replication models, we find that the logistic and exponential models agree at early times.
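To illustrate why the two replication models agree at early times, the sketch below compares the closed-form solutions of the textbook exponential equation dN/dt = rN and the textbook logistic equation dN/dt = rN(1 - N/K) for a count N of full-length TEs. These standard forms are assumed here; the symbols r, K and n0 and the function names are illustrative and are not taken from the paper. The exponential curve exceeds the logistic one by a relative amount n0(e^{rt} - 1)/K, which stays small while N is far below the carrying capacity K.

```python
import math

# Illustrative parameters (assumed, not from the paper):
#   r  = per-copy replication rate of full-length TEs
#   K  = carrying capacity of the logistic model
#   n0 = initial number of full-length TEs

def exponential(t, n0, r):
    """Closed-form solution of dN/dt = r N."""
    return n0 * math.exp(r * t)

def logistic(t, n0, r, K):
    """Closed-form solution of dN/dt = r N (1 - N / K)."""
    growth = math.exp(r * t)
    return K * n0 * growth / (K + n0 * (growth - 1.0))

def relative_gap(t, n0, r, K):
    """(exponential - logistic) / logistic, which equals n0 (e^{rt} - 1) / K."""
    return exponential(t, n0, r) / logistic(t, n0, r, K) - 1.0

if __name__ == "__main__":
    n0, r, K = 1.0, 0.5, 1000.0
    for t in (0.0, 2.0, 10.0, 30.0):
        print(f"t={t:5.1f}  exp={exponential(t, n0, r):12.3f}  "
              f"logistic={logistic(t, n0, r, K):9.3f}  "
              f"gap={relative_gap(t, n0, r, K):.4f}")
```

With n0 = 1, r = 0.5 and K = 1000, the gap is below 0.2% at t = 2, while by t = 30 the logistic curve has essentially saturated at K and the exponential curve has not.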
Since most TEs have not reached carrying capacity, we use the explicit exponential solution to quantify the ratio of replication rates to mutation rates. We apply our model to present-day annotated collections of TEs from the genomes of species of fruit flies, birds, and primates to uncover quantitative relationships in TE dynamics. With the increasing availability of complete genomes, our method is likely to help uncover the biological drivers of genomic variation in many species.
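The following is a toy discrete-length fragmentation model, included only to illustrate how a ratio of replication to mutation rates can shape a TE length distribution under constant-rate (exponential) replication. It is not the paper's model. As hypothetical simplifications, copies have integer lengths 1..L; only full-length copies (length L) replicate, at rate r; every copy of length n >= 2 suffers deletions at rate d, each producing a shorter copy whose length is uniform on 1..n-1; insertions are ignored. All function names and parameters are illustrative. The sketch integrates the model with forward Euler and compares the normalized result with the asymptotic shape obtained from the eigenvector recursion for growth rate r - d.

```python
# Toy discrete fragmentation model (hypothetical; not the paper's equations).
# u[n] = number of TE copies of length n, n = 1..L (u[0] is unused).
#   du_n/dt = r*u_L*[n == L] - d*u_n*[n >= 2] + d * sum_{m=n+1}^{L} u_m / (m - 1)

def euler_step(u, r, d, dt):
    """One forward-Euler step of the toy model."""
    L = len(u) - 1
    du = [0.0] * (L + 1)
    tail = 0.0  # sum over m > n of u[m] / (m - 1): deletion influx into length n
    for n in range(L, 0, -1):
        loss = d * u[n] if n >= 2 else 0.0  # length-1 copies cannot shrink further
        du[n] = d * tail - loss
        if n >= 2:
            tail += u[n] / (n - 1)
    du[L] += r * u[L]  # only full-length copies replicate
    return [x + dt * y for x, y in zip(u, du)]

def simulate(L, r, d, t_end, dt=0.01):
    """Start from one full-length copy; return the normalized length distribution."""
    u = [0.0] * (L + 1)
    u[L] = 1.0
    for _ in range(round(t_end / dt)):
        u = euler_step(u, r, d, dt)
    total = sum(u)
    return [x / total for x in u]

def steady_shape(L, r, d):
    """Asymptotic normalized distribution for r > d (eigenvector for growth rate r - d)."""
    lam = r - d
    if lam <= 0:
        raise ValueError("this sketch assumes r > d so that copy numbers grow")
    c = [0.0] * (L + 1)
    c[L] = 1.0
    tail = 0.0
    for n in range(L - 1, 0, -1):
        tail += c[n + 1] / n  # adds c_{n+1} / ((n + 1) - 1)
        c[n] = d * tail / (lam + d) if n >= 2 else d * tail / lam
    total = sum(c)
    return [x / total for x in c]

if __name__ == "__main__":
    L, d = 20, 1.0
    for r in (1.5, 3.0):
        p = steady_shape(L, r, d)
        print(f"r/d = {r / d:.1f}: fraction full-length = {p[L]:.3f}")
```

In this toy setting, raising r relative to d increases the asymptotic share of full-length copies, which is the kind of replication-to-mutation relationship the abstract describes estimating; the paper's own equations, insertion process and fitting procedure are not reproduced here.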