Ji Jiayi, Roberts Thomas, Flouri Tomáš, Yang Ziheng
Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK.
Syst Biol. 2025 May 27. doi: 10.1093/sysbio/syaf019.
Analysis of genomic data over the past two decades has highlighted the prevalence of introgression as an important evolutionary force in both plants and animals. The genus Drosophila has received much attention recently, and an analysis of genomic sequence data revealed widespread introgression across the species phylogeny for the genus. However, the methods used in that study are based on data summaries for species triplets and are unable to infer gene flow between sister lineages or to identify the direction of gene flow. Hence, we reanalyze a subset of the data using the Bayesian program bpp, which is a full-likelihood implementation of the multispecies coalescent model and can provide more powerful inference of gene flow between species, including its direction, timing, and strength. While our analysis supports the presence of gene flow in the species group, the results differ from those of the previous study: we infer gene flow between sister lineages that was not detected previously, whereas most gene-flow events inferred in the previous study are rejected in our tests. To verify our conclusions, we performed simulations to examine the properties of Bayesian and summary methods. Bpp was found to have high power to detect gene flow, high accuracy in estimated rates of gene flow, and robustness under misspecification of the mode of gene flow. In contrast, summary methods had low power and produced biased estimates of introgression probability. Our results highlight an urgent need to improve the statistical properties of summary methods and the computational efficiency of likelihood methods for inferring gene flow from genomic sequence data.