Research Unit of Mathematical Sciences, University of Oulu, 90014 Oulu, Finland.
Organismal and Evolutionary Biology Research Programme, University of Helsinki, 00014 Helsinki, Finland.
Mol Biol Evol. 2024 Nov 1;41(11). doi: 10.1093/molbev/msae198.
Over the past 15 years, the D-statistic, a four-taxon test for organismal admixture (hybridization or introgression) that uses single nucleotide polymorphism data with the allelic patterns ABBA and BABA, has seen considerable use. This statistic seeks to detect significant deviation either from a given species tree assumption or from the balanced incomplete lineage sorting that could otherwise defy this species tree. However, while the D-statistic can successfully discriminate admixture from incomplete lineage sorting, it is not a simple matter to determine the direction of admixture using only four-leaf tree models. Consequently, methods have been developed that use five-leaf trees to evaluate admixture. Among these, the DFOIL method ("FOIL" being a mnemonic for "First-Outer-Inner-Last"), which tests allelic patterns on the "symmetric" tree S=(((1,2),(3,4)),5), succeeds in inferring the direction of admixture in many five-taxon examples. However, DFOIL does not make full use of the available symmetry, nor can it function properly when ancient samples are included, because of its reliance on singleton patterns (such as BAAAA and ABAAA). Here, we take inspiration from DFOIL to develop a new and completely general family of five-leaf admixture tests, dubbed Δ-statistics, that can either incorporate or exclude the singleton allelic patterns depending on the individual taxa and sample ages chosen. We describe two new tree shapes that are also fully testable, namely the "asymmetric" tree A=((((1,2),3),4),5) and the "quasisymmetric" tree Q=(((1,2),3),(4,5)), which can considerably supplement the "symmetric" tree S=(((1,2),(3,4)),5) used by DFOIL. We demonstrate the consistency of Δ-statistics under various simulated scenarios, and provide empirical examples using data from black, brown and polar bears, the polar bear data also including two ancient samples from previous studies.
Recently, DFOIL and one of these ancient samples were used to argue for a predominantly polar bear → brown bear direction of introgression. However, using both this ancient polar bear sample and our own, we find that by far the strongest signal, with both DFOIL and Δ-statistics on tree S, is actually bidirectional gene flow of indistinguishable direction. Further experiments on trees A and Q instead highlight what were likely two phases of admixture: an ancient phase with stronger brown bear → polar bear introgression, and a more recent phase with predominantly polar bear → brown bear directionality.
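For readers new to the ABBA-BABA framework, the following Python sketch shows how the four-taxon D-statistic is conventionally computed, D = (n_ABBA - n_BABA) / (n_ABBA + n_BABA), after polarizing biallelic sites against an outgroup. It assumes that the outgroup (last column) carries the ancestral allele and that every site is called in all taxa. The function names and the `exclude_singletons` flag are hypothetical and only illustrate what dropping singleton patterns (such as BAAAA) means; the sketch does not implement DFOIL or the Δ-statistics, whose formulas are not given in this abstract.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def polarize(site: Sequence[str], outgroup_index: int = -1) -> str:
    """Recode one site as a pattern of A (ancestral) and B (derived) alleles.

    Assumption (hypothetical helper): the outgroup carries the ancestral
    allele. Sites that are not biallelic return an empty string. Missing
    data (e.g. 'N') is not handled and would be treated as an allele.
    """
    if len(set(site)) != 2:
        return ""
    ancestral = site[outgroup_index]
    return "".join("A" if allele == ancestral else "B" for allele in site)


def count_patterns(
    sites: Iterable[Sequence[str]], exclude_singletons: bool = False
) -> Counter:
    """Tally polarized site patterns, optionally skipping singletons.

    A singleton pattern has exactly one derived allele (e.g. BAAAA or ABAAA),
    i.e. a mutation private to a single sampled taxon.
    """
    counts: Counter = Counter()
    for site in sites:
        pattern = polarize(site)
        if not pattern:
            continue  # monomorphic or multiallelic site
        if exclude_singletons and pattern.count("B") == 1:
            continue  # hypothetical flag: drop private (singleton) mutations
        counts[pattern] += 1
    return counts


def d_statistic(counts: Mapping[str, int]) -> float:
    """Four-taxon D for taxa ordered (P1, P2, P3, Outgroup).

    Under incomplete lineage sorting alone, ABBA and BABA are expected in
    equal numbers, so D has expectation 0; an excess of ABBA points to
    P2-P3 allele sharing, an excess of BABA to P1-P3 allele sharing.
    """
    abba = counts.get("ABBA", 0)
    baba = counts.get("BABA", 0)
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Toy four-taxon sites (P1, P2, P3, Outgroup); illustrative only.
    toy_sites = [
        ("A", "G", "G", "A"),  # ABBA
        ("C", "T", "T", "C"),  # ABBA
        ("G", "A", "A", "G"),  # ABBA
        ("T", "C", "T", "C"),  # BABA
        ("A", "A", "A", "A"),  # monomorphic, skipped
        ("A", "G", "A", "A"),  # ABAA, a singleton
    ]
    counts = count_patterns(toy_sites)
    print(counts["ABBA"], counts["BABA"], d_statistic(counts))  # 3 1 0.5
    # A five-taxon site with a derived allele only in taxon 1:
    print(polarize(("T", "C", "C", "C", "C")))  # BAAAA
```

A real analysis would also assess whether D departs significantly from 0, typically with a block jackknife over genomic windows, which this sketch omits.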