Thawornwattana Yuttapong, Flouri Tomáš, Mallet James, Yang Ziheng
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK.
Mol Biol Evol. 2025 Jun 4;42(6). doi: 10.1093/molbev/msaf121.
Thanks to genomic data, interspecific gene flow is increasingly recognized as a major evolutionary force that shapes biodiversity. Two models have been developed in the multispecies coalescent (MSC) framework to infer gene flow from genomic data, assuming either constant-rate continuous migration (MSC-M) or discrete introgression/hybridization (MSC-I). The extreme simplicity of these models raises concerns about their usefulness as they represent misspecified models when applied to real data. Here, we study inference of gene flow under the MSC-M model, considering mis-assignment of gene flow onto incorrect parental or daughter lineages, misspecification of the direction of gene flow, and misspecification of the mode of gene flow. Mis-assignment of gene flow to an incorrect lineage causes large biases in the estimated rates. The Bayesian test has high power for inferring both recent and ancient gene flow, between either sister lineages or nonsister lineages, although misspecification of the direction of gene flow may make it hard to distinguish early divergence with gene flow from recent complete isolation. Misspecification of the mode of gene flow (MSC-I versus MSC-M) has small local effects, and gene flow is detected with high power despite the misspecification. We analyze a genomic dataset from the purple cone spruce (Picea spp., Pinaceae), which putatively arose through homoploid hybrid speciation, to demonstrate practical implications of our theoretical analyses. Overall, we find that the extremely idealized models of gene flow (in particular the discrete MSC-I model) are very effective for extracting information about species divergence and gene flow from genomic data.