McVean Gil
Department of Statistics, University of Oxford, Oxford, United Kingdom.
PLoS Genet. 2009 Oct;5(10):e1000686. doi: 10.1371/journal.pgen.1000686. Epub 2009 Oct 16.
Principal components analysis (PCA) is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to draw inferences about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's F_ST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.
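The central claim of the abstract, that sample projections can be obtained from average pairwise coalescent times, can be illustrated with a small numerical sketch. It is not the paper's code. It assumes that the expected sample covariance is proportional to the double-centered matrix of pairwise times, M_ij = mean_i + mean_j - mean - t_ij, with the coalescent time of a genome with itself set to zero. The function names and the two-population time values (`within`, `between`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def pca_from_coalescent_times(t, n_components=2):
    """Project samples onto principal axes using only pairwise coalescent times.

    t is a symmetric (n, n) array of expected coalescent times between
    haploid genomes. Assumption: the expected covariance is proportional to
    the double-centered matrix row_mean + col_mean - grand_mean - t.
    """
    t = np.asarray(t, dtype=float)
    n = t.shape[0]
    if t.shape != (n, n) or not np.allclose(t, t.T):
        raise ValueError("t must be a square, symmetric matrix")

    row_mean = t.mean(axis=1, keepdims=True)
    col_mean = t.mean(axis=0, keepdims=True)
    grand_mean = t.mean()
    m = row_mean + col_mean - grand_mean - t

    # eigh returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(m)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Scale each axis by the square root of its (non-negative) eigenvalue.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return coords, eigvals


def two_population_times(n_a, n_b, within=1.0, between=3.0):
    """Toy matrix of coalescent times for two isolated populations.

    The values of within and between are illustrative, not estimates.
    """
    labels = np.array([0] * n_a + [1] * n_b)
    same = labels[:, None] == labels[None, :]
    t = np.where(same, within, between).astype(float)
    np.fill_diagonal(t, 0.0)  # a genome coalesces with itself at time zero
    return t, labels


if __name__ == "__main__":
    t, labels = two_population_times(5, 5)
    coords, eigvals = pca_from_coalescent_times(t)
    for label, (pc1, pc2) in zip(labels, coords):
        print(f"population {label}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

In this toy case the first axis separates the two populations, and the separation grows as the between-population time increases relative to the within-population time. That is the kind of demographic reading of a PCA plot the abstract describes.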