Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models.

Affiliations

Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada.

LaBRI, Université de Bordeaux, Talence, France.

Publication information

J Math Biol. 2020 Apr;80(5):1353-1388. doi: 10.1007/s00285-019-01465-x. Epub 2020 Feb 15.

Abstract

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes that evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation (S), gene duplication (D), gene loss (L), and horizontal gene transfer (T). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model, which considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the DLT-model). In this work, we introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the DLT-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the DLT-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
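
As an illustration of the general idea of counting and uniform sampling from a grammar by dynamic programming, here is a minimal sketch on a toy grammar. It is not the grammar from the paper: it assumes the simplest tree grammar, T -> leaf | (T, T), with no species tree and no duplication, loss or transfer events, and it takes "size" to mean the number of leaves. The names count_trees, sample_tree and leaves are hypothetical helpers introduced for this example only.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(n: int) -> int:
    """Count ordered rooted binary trees with n leaves.

    Toy grammar (an assumption, not the paper's): T -> leaf | (T, T).
    The recurrence splits the n leaves between the left and right subtrees.
    """
    if n == 1:
        return 1
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))


def sample_tree(n: int, rng: random.Random):
    """Draw one tree with n leaves uniformly at random.

    The left subtree gets k leaves with probability
    count_trees(k) * count_trees(n - k) / count_trees(n),
    then both subtrees are sampled recursively.
    """
    if n == 1:
        return "leaf"
    r = rng.randrange(count_trees(n))  # uniform integer in [0, count_trees(n))
    for k in range(1, n):
        weight = count_trees(k) * count_trees(n - k)
        if r < weight:
            return (sample_tree(k, rng), sample_tree(n - k, rng))
        r -= weight
    raise AssertionError("unreachable: weights sum to count_trees(n)")


def leaves(tree) -> int:
    """Number of leaves of a tree built by sample_tree."""
    if tree == "leaf":
        return 1
    return leaves(tree[0]) + leaves(tree[1])


if __name__ == "__main__":
    rng = random.Random(0)
    print([count_trees(n) for n in range(1, 8)])
    print(sample_tree(5, rng))
```

The same two-pass pattern (tabulate counts bottom-up, then sample top-down with probabilities proportional to those counts) extends to richer grammars, at the cost of one count table per non-terminal.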

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a8f/7052048/47e0fd14215f/285_2019_1465_Fig1_HTML.jpg