Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America.
PLoS One. 2012;7(7):e40115. doi: 10.1371/journal.pone.0040115. Epub 2012 Jul 9.
With the availability of high-density genotype information, principal components analysis (PCA) is now routinely used to detect and quantify the genetic structure of populations in both population genetics and genetic epidemiology. An important issue is how to draw correct inferences about population relationships from PCA results, especially when admixed individuals are included in the analysis. We extend our recently developed theoretical formulation of PCA to allow for admixed populations. Because the sampled individuals are treated as features, our generalized formulation directly relates the pattern in the scatter plot of the top eigenvectors to the admixture proportions and to parameters reflecting the population relationships, and thus can provide valuable guidance on how to interpret PCA results in practice. Using our formulation, we theoretically justify the diagnostic of two-way admixture. More importantly, our theoretical investigations yield a diagnostic of multi-way admixture. For instance, we found that admixed individuals with three parental populations are distributed inside the triangle formed by their parental populations, and that each such individual divides the triangle into three smaller triangles whose areas, as fractions of the large triangle, equal the corresponding admixture proportions. We tested and illustrated these findings using simulated data and data from HapMap III and the Human Genome Diversity Project.
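The three-way triangle diagnostic can be illustrated with a minimal sketch. It is not the authors' code, and it rests on assumptions the abstract does not state: each parental population is summarized by a single point (for example, its centroid) in the plane of the top two eigenvectors, and the sub-triangle lying opposite a parental vertex is the one whose area fraction corresponds to that parent, as with barycentric coordinates. The function names `triangle_area` and `admixture_from_triangle` are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def triangle_area(p: Point, q: Point, r: Point) -> float:
    """Unsigned area of triangle pqr via the shoelace (cross-product) formula."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))


def admixture_from_triangle(x: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Estimate three-way admixture proportions for an individual at x.

    a, b, c are assumed to be the positions of the three parental
    populations in the PC1-PC2 plane (an assumption for illustration).
    The proportion attributed to a parent is the area of the sub-triangle
    opposite that parent's vertex, divided by the area of triangle abc.
    """
    total = triangle_area(a, b, c)
    if total == 0.0:
        raise ValueError("parental populations are collinear; no triangle is formed")
    return (
        triangle_area(x, b, c) / total,  # share attributed to parent at a
        triangle_area(a, x, c) / total,  # share attributed to parent at b
        triangle_area(a, b, x) / total,  # share attributed to parent at c
    )


# Toy example: an individual placed at 0.5*a + 0.3*b + 0.2*c.
props = admixture_from_triangle((0.3, 0.2), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(props)  # approximately (0.5, 0.3, 0.2)
```

Because unsigned areas are used, the three fractions sum to one only for points inside the triangle; a sum noticeably above one indicates that the point lies outside it and that the three-way model, under these assumptions, does not fit that individual.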