Wall Jeffrey D, Cox Murray P, Mendez Fernando L, Woerner August, Severson Tesa, Hammer Michael F
Institute for Human Genetics and Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94143, USA.
Genome Res. 2008 Aug;18(8):1354-61. doi: 10.1101/gr.075630.107. Epub 2008 May 20.
While there are now extensive databases of human genomic sequences from both private and public efforts to catalog human nucleotide variation, there are very few large-scale surveys designed for the purpose of analyzing human population history. Demographic inference from patterns of SNP variation in current large public databases is complicated by ascertainment biases associated with SNP discovery and the ways that populations and regions of the genome are sampled. Here, we present results from a resequencing survey of 40 independent intergenic regions on the autosomes and X chromosome comprising ~210 kb from each of 90 humans from six geographically diverse populations (i.e., a total of ~18.9 Mb). Unlike other public DNA sequence databases, we include multiple indigenous populations that serve as important reservoirs of human genetic diversity, such as the San of Namibia, the Biaka of the Central African Republic, and Melanesians from Papua New Guinea. In fact, only 20% of the SNPs that we find are contained in the HapMap database. We identify several key differences in patterns of variability in our database compared with other large public databases, including higher levels of nucleotide diversity within populations, greater levels of differentiation between populations, and significant differences in the frequency spectrum. Because variants at loci included in this database are less likely to be subject to ascertainment biases or linked to sites under selection, these data will be more useful for accurately reconstructing past changes in size and structure of human populations.
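The abstract compares datasets by nucleotide diversity, the site frequency spectrum, and susceptibility to ascertainment bias. The sketch below illustrates these quantities on toy data. It is not the authors' pipeline. It computes nucleotide diversity (π) as the mean number of pairwise differences per site and builds a folded site frequency spectrum from an alignment. It then uses a simple binomial model to give the chance that a SNP at frequency p is polymorphic in a discovery panel of d chromosomes. The function names, the toy sequences, and the assumptions (aligned, equal-length, gap-free, biallelic sites) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi).

    Assumes aligned, equal-length, gap-free sequences from one population.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def folded_sfs(seqs):
    """Folded site frequency spectrum.

    Entry i (1-based) counts biallelic sites whose minor allele appears
    i times; sites with more than two alleles are skipped.
    """
    n = len(seqs)
    spectrum = [0] * (n // 2 + 1)
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) == 2:
            spectrum[min(counts.values())] += 1
    return spectrum[1:]


def detection_probability(p, panel_size):
    """Chance that a biallelic site with allele frequency p is polymorphic
    in a discovery panel of panel_size chromosomes (binomial sampling)."""
    return 1 - p ** panel_size - (1 - p) ** panel_size


if __name__ == "__main__":
    # Hypothetical toy alignment of four chromosomes, 8 bp each.
    toy = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
    print(nucleotide_diversity(toy))  # 7 differences / (6 pairs * 8 sites)
    print(folded_sfs(toy))            # [1, 1]
    for p in (0.05, 0.5):
        print(p, detection_probability(p, panel_size=2))
```

Under this simplified model, a variant at 5% frequency is found in a two-chromosome discovery panel with probability of about 0.095. A variant at 50% frequency is found with probability 0.5. SNPs discovered this way therefore over-represent common alleles, which distorts the frequency spectrum. Resequencing every sample, as in the survey described above, avoids this distortion.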