Department of Biological Sciences & Museum of Natural History, Auburn University, 101 Rouse Life Sciences Building, Auburn, AL 36849, USA.
Syst Biol. 2019 May 1;68(3):371-395. doi: 10.1093/sysbio/syy063.
A challenge to understanding biological diversification is accounting for community-scale processes that cause multiple, co-distributed lineages to co-speciate. Such processes predict non-independent, temporally clustered divergences across taxa. Approximate-likelihood Bayesian computation (ABC) approaches to inferring such patterns from comparative genetic data are very sensitive to prior assumptions and often biased toward estimating shared divergences. We introduce a full-likelihood Bayesian approach, ecoevolity, which takes full advantage of information in genomic data. By analytically integrating over gene trees, we are able to directly calculate the likelihood of the population history from genomic data, and efficiently sample the model-averaged posterior via Markov chain Monte Carlo algorithms. Using simulations, we find that the new method is much more accurate and precise at estimating the number and timing of divergence events across pairs of populations than existing approximate-likelihood methods. Our full Bayesian approach also requires several orders of magnitude less computational time than existing ABC approaches. We find that despite assuming unlinked characters (e.g., unlinked single-nucleotide polymorphisms), the new method performs better if this assumption is violated in order to retain the constant characters of whole linked loci. In fact, retaining constant characters allows the new method to robustly estimate the correct number of divergence events with high posterior probability in the face of character-acquisition biases, which commonly plague loci assembled from reduced-representation genomic libraries. We apply our method to genomic data from four pairs of insular populations of Gekko lizards from the Philippines that are not expected to have co-diverged. Despite all four pairs diverging very recently, our method strongly supports that they diverged independently, and these results are robust to very disparate prior assumptions.
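To illustrate the model space behind "the number of divergence events across pairs of populations", the sketch below enumerates every way that four pairs could be assigned to shared or independent divergence events. It assumes that each candidate model is a set partition of the pairs, which the abstract does not state explicitly. The pair labels and the function name `set_partitions` are hypothetical, and this is not the ecoevolity implementation.

```python
from typing import Dict, Iterator, List


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every grouping of `items` into non-empty, unordered blocks.

    Each block stands for one divergence event shared by the pairs in it
    (an illustrative assumption, not taken from the abstract).
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give `first` a divergence event of its own.
        yield [[first]] + partition


# Hypothetical labels for the four population pairs.
pairs = ["pair1", "pair2", "pair3", "pair4"]
models = list(set_partitions(pairs))

# Tally how many models imply 1, 2, 3, or 4 divergence events.
by_events: Dict[int, int] = {}
for model in models:
    by_events[len(model)] = by_events.get(len(model), 0) + 1

print(len(models))
print(sorted(by_events.items()))
```

Under this assumption there are 15 candidate models for four pairs, and exactly one of them (four separate events) corresponds to fully independent divergence, the scenario the abstract reports as strongly supported.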