STRUCTURE is more robust than other clustering methods in simulated mixed-ploidy populations.

Author information

Ecology, Department of Biology, University of Konstanz, Konstanz, Germany.

Department of Botany, Faculty of Science, Charles University in Prague, Prague, Czechia.

Publication information

Heredity (Edinb). 2019 Oct;123(4):429-441. doi: 10.1038/s41437-019-0247-6. Epub 2019 Jul 8.

DOI:10.1038/s41437-019-0247-6

PMID:31285566

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6781132/

Abstract

Analysis of population genetic structure has become a standard approach in population genetics. In polyploid complexes, clustering analyses can elucidate the origin of polyploid populations and patterns of admixture between different cytotypes. However, combining diploid and polyploid data can theoretically lead to biased inference with (artefactual) clustering by ploidy. We used simulated mixed-ploidy (diploid-autotetraploid) data to systematically compare the performance of k-means clustering and the model-based clustering methods implemented in STRUCTURE, ADMIXTURE, FASTSTRUCTURE and INSTRUCT under different scenarios of differentiation and with different marker types. Under scenarios of strong population differentiation, the tested applications performed equally well. However, when population differentiation was weak, STRUCTURE was the only method that allowed unbiased inference with markers with limited genotypic information (co-dominant markers with unknown dosage or dominant markers). Still, since STRUCTURE was comparatively slow, the much faster but less powerful FASTSTRUCTURE provides a reasonable alternative for large datasets. Finally, although bias makes k-means clustering unsuitable for markers with incomplete genotype information, for large numbers of loci (>1000) with known dosage k-means clustering was superior to FASTSTRUCTURE in terms of power and speed. We conclude that STRUCTURE is the most robust method for the analysis of genetic structure in mixed-ploidy populations, although alternative methods should be considered under some specific conditions.
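
The known-dosage, strong-differentiation case mentioned in the abstract can be illustrated with a small toy example. The Python sketch below is not the simulation design used in the study: the population allele frequencies (0.1 and 0.9), the sample sizes, the number of loci, and the helper names `simulate`, `sq_dist` and `kmeans2` are all assumptions made for illustration. It draws allele dosages for diploid and autotetraploid individuals from two populations, rescales each dosage by ploidy to obtain a per-individual allele frequency, and runs a plain two-cluster k-means with a deterministic farthest-point start (a simplification, not a standard initialization).

```python
import random
from collections import Counter


def simulate(n_per_group=10, n_loci=100, seed=1):
    """Toy mixed-ploidy sample: two populations, each with diploids and tetraploids."""
    rng = random.Random(seed)
    pop_freq = {0: 0.1, 1: 0.9}  # assumed allele frequency in each population
    freqs, pops, ploidies = [], [], []
    for pop in (0, 1):
        for ploidy in (2, 4):
            for _ in range(n_per_group):
                p = pop_freq[pop]
                # Dosage = number of copies of the allele at a locus (0..ploidy).
                dosage = [sum(rng.random() < p for _ in range(ploidy))
                          for _ in range(n_loci)]
                # With known dosage, dividing by ploidy puts both cytotypes
                # on the same 0..1 allele-frequency scale.
                freqs.append([d / ploidy for d in dosage])
                pops.append(pop)
                ploidies.append(ploidy)
    return freqs, pops, ploidies


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(points, n_iter=20):
    """Two-cluster Lloyd's algorithm with a deterministic farthest-point start."""
    c0 = points[0]
    c1 = max(points, key=lambda pt: sq_dist(pt, c0))
    centroids = [c0, c1]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            0 if sq_dist(pt, centroids[0]) <= sq_dist(pt, centroids[1]) else 1
            for pt in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


freqs, pops, ploidies = simulate()
labels = kmeans2(freqs)

# Cross-tabulate cluster label against population and ploidy.
print(Counter(zip(labels, pops, ploidies)))
```

In this deliberately easy setting the clusters follow population rather than ploidy. The sketch says nothing about the weakly differentiated or unknown-dosage scenarios, where the abstract reports that k-means clustering is biased.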
