Witherspoon D J, Wooding S, Rogers A R, Marchani E E, Watkins W S, Batzer M A, Jorde L B
Department of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, Utah 84112, USA.
Genetics. 2007 May;176(1):351-9. doi: 10.1534/genetics.106.067355. Epub 2007 Mar 4.
The proportion of human genetic variation due to differences between populations is modest, and individuals from different populations can be genetically more similar than individuals from the same population. Yet sufficient genetic data can permit accurate classification of individuals into populations. Both findings can be obtained from the same data set, using the same number of polymorphic loci. This article explains why. Our analysis focuses on the frequency, omega, with which a pair of random individuals from two different populations is genetically more similar than a pair of individuals randomly selected from any single population. We compare omega to the error rates of several classification methods, using data sets that vary in number of loci, average allele frequency, populations sampled, and polymorphism ascertainment strategy. We demonstrate that classification methods achieve higher discriminatory power than omega because of their use of aggregate properties of populations. The number of loci analyzed is the most critical variable: with 100 polymorphisms, accurate classification is possible, but omega remains sizable, even when using populations as distinct as sub-Saharan Africans and Europeans. Phenotypes controlled by a dozen or fewer loci can therefore be expected to show substantial overlap between human populations. This provides empirical justification for caution when using population labels in biomedical settings, with broad implications for personalized medicine, pharmacogenetics, and the meaning of race.
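The contrast the abstract draws between omega and classification error can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' procedure: the allele frequencies are simulated rather than taken from real data, the distance is a simple sum of absolute differences in allele counts, and a nearest-centroid rule stands in for the "several classification methods" the abstract mentions. All function names and parameter values (`run_experiment`, the frequency ranges, sample sizes) are hypothetical choices made for illustration.

```python
import random


def simulate_population(freqs, n_individuals, rng):
    """Draw diploid genotypes: 0, 1 or 2 copies of an allele at each locus."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_individuals)]


def distance(a, b):
    """Assumed pairwise distance: summed absolute difference in allele counts."""
    return sum(abs(x - y) for x, y in zip(a, b))


def estimate_omega(pop1, pop2, n_trials, rng):
    """Fraction of trials in which a between-population pair is strictly
    more similar than a pair drawn from one randomly chosen population."""
    hits = 0
    for _ in range(n_trials):
        between = distance(rng.choice(pop1), rng.choice(pop2))
        home = rng.choice((pop1, pop2))
        a, b = rng.sample(home, 2)  # two distinct individuals
        if between < distance(a, b):
            hits += 1
    return hits / n_trials


def centroid(pop):
    """Mean allele count per locus: an aggregate property of the population."""
    n = len(pop)
    return [sum(ind[j] for ind in pop) / n for j in range(len(pop[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_error(train1, train2, test1, test2):
    """Error rate of assigning each test individual to the nearest centroid.
    Ties are counted as errors."""
    c1, c2 = centroid(train1), centroid(train2)
    wrong = sum(sq_dist(ind, c1) >= sq_dist(ind, c2) for ind in test1)
    wrong += sum(sq_dist(ind, c2) >= sq_dist(ind, c1) for ind in test2)
    return wrong / (len(test1) + len(test2))


def run_experiment(n_loci, seed=0, n_ind=100, n_trials=2000):
    """Return (omega, classification error) for simulated populations.
    The frequency ranges below are illustrative assumptions only."""
    rng = random.Random(seed)
    f1 = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    f2 = [min(0.95, max(0.05, p + rng.uniform(-0.3, 0.3))) for p in f1]
    train1 = simulate_population(f1, n_ind, rng)
    train2 = simulate_population(f2, n_ind, rng)
    test1 = simulate_population(f1, n_ind, rng)
    test2 = simulate_population(f2, n_ind, rng)
    omega = estimate_omega(test1, test2, n_trials, rng)
    error = centroid_error(train1, train2, test1, test2)
    return omega, error


if __name__ == "__main__":
    for n_loci in (10, 200):
        omega, error = run_experiment(n_loci)
        print(f"{n_loci} loci: omega={omega:.3f}, centroid error={error:.3f}")
```

In this sketch the classifier compares each individual with population averages, whereas omega compares individuals with one another, so sampling noise in both members of each pair enters the omega comparison. Under these simulated conditions that is why the classification error would be expected to fall faster than omega as loci are added; the specific numbers depend entirely on the assumed frequencies and should not be read as estimates for real populations.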