Department of Biostatistics, UCLA School of Public Health, Los Angeles, CA 90095, USA.
Syst Biol. 2010 Jan;59(1):27-41. doi: 10.1093/sysbio/syp076. Epub 2009 Nov 9.
Evolutionary biologists have introduced numerous statistical approaches that use collections of Markov-dependent gene trees to explore nonvertical evolution, such as horizontal gene transfer, recombination, and genomic reassortment. These tree collections allow for inference of nonvertical evolution, but only indirectly, which makes findings difficult to interpret and models difficult to generalize. An alternative approach to exploring nonvertical evolution relies on phylogenetic networks. These networks provide a framework for modeling nonvertical evolution but leave questions unanswered, such as the statistical significance of specific nonvertical events. In this paper, we begin to correct the shortcomings of both approaches by introducing the "stochastic model for reassortment and transfer events" (SMARTIE), which draws upon ancestral recombination graphs (ARGs). ARGs are directed graphs that allow for formal probabilistic inference on vertical speciation events and nonvertical evolutionary events. Because we apply SMARTIE to phylogenetic data, we can typically infer a single most probable ARG, avoiding coarse population-dynamic summary statistics. In addition, a focus on phylogenetic data suggests novel probability distributions on ARGs. To perform inference under our model, we develop a reversible jump Markov chain Monte Carlo sampler that approximates the posterior distribution under SMARTIE. Built on the BEAST phylogenetic software, the sampler employs a parallel computing approach that allows for inference on large-scale data sets. To demonstrate SMARTIE, we explore two separate phylogenetic applications, one involving pathogenic Leptospirochete and the other Saccharomyces.
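The abstract names a reversible jump Markov chain Monte Carlo sampler but does not describe its moves. As a hedged illustration only, the Python sketch below shows the generic birth/death mechanism that lets a reversible-jump sampler add or remove nonvertical events, so that the number of events is inferred along with everything else. It is not SMARTIE's sampler, which operates on ARGs within BEAST. Here the state is just a list of hypothetical event positions in [0, 1], the prior assumes a Poisson(lam) number of events with uniform positions, and the function name `rjmcmc_event_count` and the caller-supplied `log_lik` are hypothetical.

```python
import math
import random
from typing import Callable, List


def rjmcmc_event_count(
    log_lik: Callable[[List[float]], float],
    lam: float,
    n_iter: int,
    seed: int = 0,
) -> List[int]:
    """Toy birth/death reversible-jump sampler (hypothetical, not SMARTIE).

    State: a list of event positions in [0, 1], standing in for nonvertical
    events. Prior: number of events k ~ Poisson(lam), positions i.i.d.
    Uniform(0, 1). Returns the sampled event count k at every iteration.
    """
    rng = random.Random(seed)
    events: List[float] = []
    cur_ll = log_lik(events)
    trace: List[int] = []
    for _ in range(n_iter):
        k = len(events)
        r = rng.random()
        if r < 1 / 3:
            # Birth (dimension k -> k + 1): draw a position from its prior and
            # insert it into one of the k + 1 slots chosen uniformly.
            proposal = events.copy()
            proposal.insert(rng.randrange(k + 1), rng.random())
            new_ll = log_lik(proposal)
            # Prior ratio p(k+1)/p(k) = lam/(k+1); proposal terms cancel.
            log_a = math.log(lam / (k + 1)) + new_ll - cur_ll
        elif r < 2 / 3:
            # Death (dimension k -> k - 1): delete one event chosen uniformly.
            if k == 0:
                trace.append(k)  # nothing to delete: stay put
                continue
            proposal = events.copy()
            del proposal[rng.randrange(k)]
            new_ll = log_lik(proposal)
            # Prior ratio p(k-1)/p(k) = k/lam; proposal terms cancel.
            log_a = math.log(k / lam) + new_ll - cur_ll
        else:
            # Fixed-dimension move: redraw one position from its Uniform prior,
            # so the acceptance ratio reduces to the likelihood ratio.
            if k == 0:
                trace.append(k)
                continue
            proposal = events.copy()
            proposal[rng.randrange(k)] = rng.random()
            new_ll = log_lik(proposal)
            log_a = new_ll - cur_ll
        # Metropolis-Hastings-Green acceptance step.
        if log_a >= 0 or rng.random() < math.exp(log_a):
            events, cur_ll = proposal, new_ll
        trace.append(len(events))
    return trace
```

With a flat likelihood (`log_lik=lambda ev: 0.0`), the sampled counts should follow the Poisson(lam) prior. This is a simple sanity check that the birth acceptance ratio lam/(k + 1) and the death acceptance ratio k/lam are consistent with each other. A real model would replace `log_lik` with the likelihood of the sequence data given the graph.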