Wiens Ben J, Colella Jocelyn P
Department of Ecology and Evolutionary Biology, Biodiversity Institute, University of Kansas, Lawrence, Kansas, USA.
Mol Ecol Resour. 2025 Apr;25(3):e14039. doi: 10.1111/1755-0998.14039. Epub 2024 Oct 28.
Describing naturally occurring genetic variation is a fundamental goal of molecular phylogeography and population genetics. Popular methods for this task include STRUCTURE, a model-based algorithm that assigns individuals to genetic clusters, and principal component analysis (PCA), a parameter-free method. The ability of STRUCTURE to infer mixed ancestry makes it popular for documenting natural hybridisation, which is of considerable interest to evolutionary biologists because hybridising systems provide a window into the speciation process. Yet STRUCTURE can produce misleading results when its underlying assumptions are violated, such as when genetic variation is distributed continuously across geographic space. To test the ability of STRUCTURE and PCA to accurately distinguish admixture from continuous variation, we use forward-time simulations to generate population genetic data under three demographic scenarios: two involving admixture and one with isolation by distance (IBD). STRUCTURE and PCA alone cannot distinguish admixture from IBD, but complementing these analyses with triangle plots, which visualise hybrid index against interclass heterozygosity, provides more accurate inference of demographic history, especially in cases of recent admixture. We demonstrate that triangle plots are robust to missing data, while STRUCTURE and PCA are not, and show that setting a low allele frequency difference threshold for ancestry-informative marker (AIM) identification allows accurate characterisation of the relationship between hybrid index and interclass heterozygosity across demographic histories of admixture and range expansion. While STRUCTURE and PCA provide useful summaries of genetic variation, their results should be paired with triangle plots before admixture is inferred.
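The two axes of a triangle plot can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`allele_freq`, `find_aims`, `triangle_coords`) are hypothetical, genotypes are assumed to be coded as 0, 1 or 2 copies of a counted allele with `None` for missing calls, and the two parental populations are assumed to be known in advance. AIMs are loci whose parental allele frequency difference meets a chosen threshold; hybrid index is scored here as the proportion of parent 2 alleles across called AIMs, and interclass heterozygosity as the proportion of called AIMs that are heterozygous.

```python
from typing import List, Optional, Tuple

# Assumed coding: 0, 1 or 2 copies of the counted allele; None = missing call.
Genotype = Optional[int]
Individual = List[Genotype]


def allele_freq(pop: List[Individual], locus: int) -> Optional[float]:
    """Frequency of the counted allele at one locus, ignoring missing calls."""
    calls = [ind[locus] for ind in pop if ind[locus] is not None]
    if not calls:
        return None
    return sum(calls) / (2 * len(calls))


def find_aims(p1: List[Individual], p2: List[Individual],
              threshold: float) -> List[Tuple[int, bool]]:
    """Return (locus, flip) for loci with parental frequency difference >= threshold.

    flip is True when the counted allele is more common in parent 1, so that
    after polarising, the scored allele is always the parent 2 allele.
    """
    aims = []
    for locus in range(len(p1[0])):
        f1, f2 = allele_freq(p1, locus), allele_freq(p2, locus)
        if f1 is None or f2 is None:
            continue  # locus not callable in one parental population
        if abs(f1 - f2) >= threshold:
            aims.append((locus, f1 > f2))
    return aims


def triangle_coords(ind: Individual,
                    aims: List[Tuple[int, bool]]) -> Optional[Tuple[float, float]]:
    """Return (hybrid index, interclass heterozygosity) over called AIMs."""
    p2_alleles = het = called = 0
    for locus, flip in aims:
        g = ind[locus]
        if g is None:
            continue  # missing data only shrinks the set of loci used
        if flip:
            g = 2 - g  # polarise so g counts parent 2 alleles
        p2_alleles += g
        het += 1 if g == 1 else 0
        called += 1
    if called == 0:
        return None
    return p2_alleles / (2 * called), het / called


# Toy data (invented for illustration): 4 loci, locus 2 is uninformative.
parent1 = [[0, 2, 1, 0], [0, 2, 1, None]]
parent2 = [[2, 0, 1, 2], [2, 0, 1, 2]]
aims = find_aims(parent1, parent2, threshold=0.8)

f1_hybrid = [1, 1, 0, 1]
backcross = [0, 1, 2, None]
print(triangle_coords(f1_hybrid, aims))   # (0.5, 1.0)
print(triangle_coords(backcross, aims))   # (0.25, 0.5)
```

When every AIM is a fixed difference, interclass heterozygosity cannot exceed twice the smaller of the hybrid index and one minus the hybrid index, which gives the plot its triangular shape: F1 hybrids sit at the apex (0.5, 1.0), parental individuals at the lower corners, and first-generation backcrosses along the edges. With a threshold below 1, the scored alleles are only approximately diagnostic, so these coordinates are estimates.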