Centre for Anthropobiology & Genomics of Toulouse, Université Paul Sabatier, Toulouse, France.
Departament de Genètica, Microbiologia i Estadística, and Institut de Recerca de la Biodiversitat, Universitat de Barcelona, Barcelona, Spain.
Methods Mol Biol. 2022;2569:213-232. doi: 10.1007/978-1-0716-2691-7_10.
Estimating gene gains and losses is paramount to understanding the molecular mechanisms underlying adaptive evolution. Despite the advent of high-throughput sequencing, such analyses have so far been hampered by the poor contiguity of genome assemblies. The increasing affordability of long-read sequencing technologies will, however, revolutionize our capacity to identify gene gains and losses at unprecedented resolution, even in non-model organisms. To thoroughly exploit all such multigene family variation, the software BadiRate implements a collection of birth-and-death stochastic models that estimate, by maximum likelihood, the gene turnover rates along the internal and external branches of a given phylogenetic species tree. Its statistical framework is also versatile enough to infer the gene family content at internal phylogenetic nodes (and to estimate the minimum number of gene gains and losses in each branch), to statistically contrast competing hypotheses (e.g., accelerations of the gene turnover rates at pre-defined clades), and to pinpoint gene family expansions or contractions likely driven by natural selection. In this chapter we review the theoretical models implemented in BadiRate and illustrate their applicability by analyzing a hypothetical data set of 14 microbial species.
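To make the idea of a birth-and-death model fitted by maximum likelihood more concrete, here is a minimal Python sketch. It is not BadiRate's implementation. It uses the classical transition probability of a linear birth-and-death process (per-gene birth rate λ, death rate μ, branch length t) and rests on loudly stated simplifications: the gene count at the start of each branch is assumed known (a tree-wide analysis must instead account for unobserved ancestral family sizes), births and deaths share a single turnover rate (λ = μ), and the rate is found by a coarse grid search. The function names `bd_transition_prob`, `log_likelihood` and `grid_mle` are hypothetical.

```python
import math


def bd_transition_prob(n, m, lam, mu, t):
    """P(m genes at the end of a branch | n genes at its start) under a
    linear birth-and-death process with per-gene birth rate lam and
    death rate mu over branch length t (no immigration)."""
    if n == 0:
        # With no immigration, an empty family stays empty.
        return 1.0 if m == 0 else 0.0
    if math.isclose(lam, mu):
        # Limit of the general formulas when lam == mu.
        a = b = lam * t / (1.0 + lam * t)
    else:
        e = math.exp((lam - mu) * t)
        denom = lam * e - mu
        a = mu * (e - 1.0) / denom  # probability a single lineage dies out
        b = lam * (e - 1.0) / denom
    total = 0.0
    for j in range(min(n, m) + 1):
        total += (math.comb(n, j) * math.comb(n + m - j - 1, n - 1)
                  * a ** (n - j) * b ** (m - j) * (1.0 - a - b) ** j)
    return total


def log_likelihood(rate, branches):
    """Sum of log transition probabilities over independent branches.
    branches: list of (n_start, m_end, branch_length); assumes lam = mu = rate."""
    ll = 0.0
    for n, m, t in branches:
        p = bd_transition_prob(n, m, rate, rate, t)
        if p <= 0.0:
            return float("-inf")
        ll += math.log(p)
    return ll


def grid_mle(branches, grid):
    """Return the grid value of the turnover rate with the highest likelihood."""
    return max(grid, key=lambda r: log_likelihood(r, branches))


if __name__ == "__main__":
    # Made-up toy counts for illustration only (not real data).
    toy = [(3, 5, 1.0), (3, 2, 1.0), (4, 4, 0.5)]
    grid = [i * 0.01 for i in range(201)]
    print("grid ML turnover rate:", grid_mle(toy, grid))
```

A grid search keeps the sketch dependency-free; a real analysis would use a numerical optimizer and, for hypothesis testing, compare the maximized likelihoods of nested models (for example, one shared rate versus a separate rate for a pre-defined clade).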