Kornai Daniel, Jiao Xiyun, Ji Jiayi, Flouri Tomáš, Yang Ziheng
Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK.
Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong 518055, China.
Syst Biol. 2024 Nov 29;73(6):1015-1037. doi: 10.1093/sysbio/syae050.
The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for 1 species versus MSC for 2 species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to oversplitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here, we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (gdi) and implement them in a Python pipeline called hhsd. We characterize the behavior of the gdi under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as 3 empirical datasets. Our tests suggest that the new approaches produce sensible results and are less prone to oversplitting than Bayesian model selection. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
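As an illustration of the kind of heuristic criterion the abstract refers to, the sketch below computes the gdi in its simplest form, the no-gene-flow case, where it is commonly written as gdi = 1 - exp(-2τ/θ), with τ the population split time and θ the population size parameter of the population being tested. The formula and the rule-of-thumb thresholds (0.2 and 0.7) come from the wider gdi literature rather than from this abstract, and the function names `gdi` and `classify` are hypothetical; this is not the hhsd interface.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """gdi for a population with size parameter theta that split tau ago.

    Assumes no gene flow after the split; tau and theta are both measured
    in expected mutations per site, as in MSC parameter estimates.
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify(value: float, lower: float = 0.2, upper: float = 0.7) -> str:
    """Apply the commonly quoted rule of thumb (thresholds are assumptions)."""
    if value < lower:
        return "single species"
    if value > upper:
        return "distinct species"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise each outcome.
    for tau, theta in [(0.001, 0.01), (0.005, 0.01), (0.01, 0.01)]:
        g = gdi(tau, theta)
        print(f"tau={tau}, theta={theta}: gdi={g:.3f} -> {classify(g)}")
```

A hierarchical merge procedure of the sort described would, roughly, evaluate such an index for sister populations on a guide tree and merge pairs whose values fall below the lower threshold; the details of how hhsd does this are not given in the abstract.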