Guidelines for standardizing the application of discriminant analysis of principal components to genotype data.

Author information

Thia Joshua A

Affiliation

Bio21 Institute, School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia.

Publication information

Mol Ecol Resour. 2023 Apr;23(3):523-538. doi: 10.1111/1755-0998.13706. Epub 2022 Sep 7.

DOI:10.1111/1755-0998.13706

PMID:36039574

Abstract

Despite the popularity of discriminant analysis of principal components (DAPC) for studying population structure, there has been little discussion of best practice for this method. In this work, I provide guidelines for standardizing the application of DAPC to genotype data sets. An often overlooked fact is that DAPC generates a model describing genetic differences among a set of populations defined by a researcher. Appropriate parameterization of this model is critical for obtaining biologically meaningful results. I show that the number of leading PC axes used as predictors of among-population differences, p, should not exceed the k-1 biologically informative PC axes that are expected for k effective populations in a genotype data set. This k-1 criterion for p specification is more appropriate compared to the widely used proportional variance criterion, which often results in a choice of p ≫ k-1. DAPC parameterized with no more than the leading k-1 PC axes: (i) is more parsimonious; (ii) captures maximal among-population variation on biologically relevant predictors; (iii) is less sensitive to unintended interpretations of population structure; and (iv) is more generally applicable to independent sample sets. Assessing model fit should be routine practice and aids interpretation of population structure. It is imperative that researchers articulate their study goals, that is, testing a priori expectations vs. studying de novo inferred populations, because this has implications on how their DAPC results should be interpreted. The discussion and practical recommendations in this work provide the molecular ecology community with a roadmap for using DAPC in population genetic investigations.
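The contrast between the two ways of choosing p can be illustrated with a small sketch. It is not taken from the paper: the function names, the 80% cumulative-variance threshold, and the eigenvalues are all assumptions made up for illustration. It only shows how a proportional variance rule can select far more PC axes than the k-1 criterion when a few leading axes are followed by a long tail of small ones.

```python
def p_by_k_minus_1(k, n_axes):
    """k-1 criterion: use at most k-1 leading PC axes (hypothetical helper)."""
    if k < 2:
        raise ValueError("need at least two populations")
    # Cannot use more axes than the PCA actually produced.
    return min(k - 1, n_axes)


def p_by_proportional_variance(eigenvalues, threshold=0.8):
    """Smallest number of leading PC axes whose cumulative share of
    total variance reaches `threshold` (hypothetical helper; the 0.8
    default is an assumed value, not one stated in the abstract)."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for n_kept, eigenvalue in enumerate(eigenvalues, start=1):
        cumulative += eigenvalue
        if cumulative / total >= threshold:
            return n_kept
    return len(eigenvalues)


# Made-up eigenvalues for illustration only: three large leading axes
# (as might be expected for k = 4 populations) and a long tail of
# small axes that carry no population-level signal.
k = 4
eigenvalues = [30.0, 20.0, 10.0] + [1.0] * 97

p_k = p_by_k_minus_1(k, len(eigenvalues))
p_var = p_by_proportional_variance(eigenvalues, threshold=0.8)
print(f"k-1 criterion: p = {p_k}")
print(f"proportional variance criterion: p = {p_var}")
```

In this toy case the three leading axes hold well under 80% of the total variance, so the proportional variance rule has to pull in many tail axes to reach the threshold, whereas the k-1 criterion stops at three.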
