Department of Anthropology, Lehman College, The City University of New York, The Bronx, NY, USA.
Biotechniques. 2010 Jun;48(6):449-54. doi: 10.2144/000113426.
Whole-genome studies of genetic variation are now performed routinely and have accelerated the identification of disease-associated allelic variants, positive selection, recombination, and structural variation. However, these studies are sensitive to the presence of outlier data from individuals of different ancestry than the rest of the sample. Currently, the most common method of excluding outlier individuals is to collect a population sample and exclude outliers after genome-wide data have been collected. Here we show that a small collection of 20-27 polymorphic Alu insertions, selected using a principal component-based method with genetic ancestry estimates, may be used to easily assign Africans, East Asians, and Europeans to their population of origin. In addition, we show that samples from a geographically and genetically intermediate population (in our study, samples from India) can be identified within the original sample of Africans, East Asians, and Europeans. Finally, we show that outlier individuals from neighboring geographic regions (in our study, Yemen and sub-Saharan Africa) can be identified. These results will be of value in preselection of samples for more in-depth analysis as well as customized identification of maximally informative polymorphic markers for regional studies.
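The marker-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not specify in detail. It assumes that markers are ranked by their variance-weighted loadings on the top principal components, and that individuals are then assigned to the nearest population centroid. The data are simulated, and every name here (`simulate_genotypes`, `select_informative_markers`, `assign_population`) and every parameter value (three populations, insertion frequencies of 0.1/0.5/0.9) is hypothetical.

```python
import numpy as np


def simulate_genotypes(n_per_pop=60, n_informative=25, n_neutral=75, seed=0):
    """Simulate insertion genotypes (0, 1, or 2 copies) for three populations.

    Hypothetical toy data: informative markers have population-specific
    insertion frequencies; neutral markers share one frequency everywhere.
    """
    rng = np.random.default_rng(seed)
    base = np.array([0.1, 0.5, 0.9])
    # Rotate the frequency pattern so different markers separate different pairs.
    informative = np.stack(
        [np.roll(base, j % 3) for j in range(n_informative)], axis=1
    )
    neutral = np.full((3, n_neutral), 0.5)
    freqs = np.hstack([informative, neutral])        # (3, n_markers)
    labels = np.repeat(np.arange(3), n_per_pop)      # population of each individual
    genotypes = rng.binomial(2, freqs[labels])       # (n_individuals, n_markers)
    return genotypes, labels


def select_informative_markers(genotypes, n_markers=20, n_components=2):
    """Rank markers by their contribution to the top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes (loadings).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Weight squared loadings by the squared singular values of each component.
    scores = ((s[:n_components, None] * vt[:n_components]) ** 2).sum(axis=0)
    top = np.argsort(scores)[::-1][:n_markers]
    return np.sort(top)


def assign_population(reference, reference_labels, queries, markers):
    """Assign each query individual to the nearest reference-population centroid."""
    pops = np.unique(reference_labels)
    centroids = np.stack(
        [reference[reference_labels == p][:, markers].mean(axis=0) for p in pops]
    )
    diffs = queries[:, markers][:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)        # (n_queries, n_populations)
    return pops[distances.argmin(axis=1)]


if __name__ == "__main__":
    genotypes, labels = simulate_genotypes()
    markers = select_informative_markers(genotypes, n_markers=20)
    # Even rows act as the reference panel, odd rows as individuals to classify.
    predicted = assign_population(genotypes[::2], labels[::2], genotypes[1::2], markers)
    print("selected markers:", markers)
    print("assignment accuracy:", (predicted == labels[1::2]).mean())
```

In this toy setting the ranking needs no population labels, so it could in principle be run on a pilot sample before any ancestry information is available. The centroid step does need a labeled reference panel. Real Alu genotype data would also require handling of missing calls, which the sketch omits.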