Zhang Chun, Bailey Dione K, Awad Tarif, Liu Guoying, Xing Guoliang, Cao Manqiu, Valmeekam Venu, Retief Jacques, Matsuzaki Hajime, Taub Margaret, Seielstad Mark, Kennedy Giulia C
Affymetrix Inc, 3380 Central Expressway, Santa Clara, CA 95051, USA.
Bioinformatics. 2006 Sep 1;22(17):2122-8. doi: 10.1093/bioinformatics/btl365. Epub 2006 Jul 15.
The identification of signatures of positive selection can provide important insights into recent evolutionary history in human populations. Current methods mostly rely on allele frequency determination or focus on one or a small number of candidate chromosomal regions per study. With the availability of large-scale genotype data, efficient approaches for an unbiased whole genome scan are becoming necessary.
We have developed a new method, the whole genome long-range haplotype test (WGLRH), which uses genome-wide distributions to test for recent positive selection. Adapted from the long-range haplotype (LRH) test, the WGLRH test uses patterns of linkage disequilibrium (LD) to identify regions with extremely low historic recombination. Common haplotypes with significantly longer-than-expected ranges of LD, given their frequencies, are identified as putative signatures of recent positive selection. In addition, we determined the ancestral alleles of SNPs by genotyping chimpanzee and gorilla DNA, and identified SNPs at which the non-ancestral alleles have risen to extremely high frequencies in human populations, termed 'flipped SNPs'. Combining the haplotype test and the flipped SNP determination, the WGLRH test serves as an unbiased genome-wide screen for regions under putative selection, and is potentially applicable to the study of other human populations.
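The abstract does not spell out the exact WGLRH statistic, so the following is only a minimal sketch of the ideas it names, under stated assumptions. It uses extended haplotype homozygosity (EHH), the LD measure of the original LRH test, to measure how far a core haplotype's LD extends. It then ranks that extent against genome-wide cores of similar frequency to obtain an empirical p-value, and flags 'flipped SNPs' from chimpanzee and gorilla genotypes. All function names, the EHH threshold (0.25), the frequency bin width (0.05), the flip cutoff (0.9), and the rule that chimpanzee and gorilla must agree on the ancestral allele are hypothetical choices, not values from the paper.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_start, core_end, end):
    """Extended haplotype homozygosity (EHH) between a core and marker `end`.

    `haplotypes` are equal-length allele strings, one per chromosome, that
    all carry the same core haplotype at marker indices core_start..core_end.
    EHH is the probability that two randomly drawn chromosomes are identical
    at every marker from the core out to `end`.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0  # undefined for fewer than two chromosomes; treated as 0 here
    lo, hi = (end, core_end) if end < core_start else (core_start, end)
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ehh_span(haplotypes, core_start, core_end, positions, threshold=0.25):
    """Distance in bp around the core over which EHH stays >= threshold.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    left = core_start
    while left > 0 and ehh(haplotypes, core_start, core_end, left - 1) >= threshold:
        left -= 1
    right = core_end
    last = len(positions) - 1
    while right < last and ehh(haplotypes, core_start, core_end, right + 1) >= threshold:
        right += 1
    return positions[right] - positions[left]


def empirical_p(freq, span, genome_wide, bin_width=0.05):
    """Share of genome-wide (frequency, span) cores in the same frequency bin
    whose EHH span is at least `span`; small values mean unusually long LD."""
    target_bin = int(freq / bin_width)
    same_bin = [s for f, s in genome_wide if int(f / bin_width) == target_bin]
    if not same_bin:
        return 1.0
    return sum(s >= span for s in same_bin) / len(same_bin)


def flipped_snp(human_counts, chimp_allele, gorilla_allele, cutoff=0.9):
    """True if the non-ancestral allele frequency in humans is >= cutoff.

    Assumption: the ancestral allele is the one shared by chimpanzee and
    gorilla; returns None when they disagree (ancestral state ambiguous).
    """
    if chimp_allele != gorilla_allele:
        return None
    total = sum(human_counts.values())
    derived = total - human_counts.get(chimp_allele, 0)
    return total > 0 and derived / total >= cutoff
```

Comparing each core only with cores in the same frequency bin mirrors the abstract's phrase "given their frequencies": because the expected extent of LD depends on haplotype frequency, a long span is only informative relative to haplotypes of similar frequency.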
Using WGLRH and high-density oligonucleotide arrays interrogating 116,204 SNPs, we rapidly identified putative regions of positive selection in three populations (Asian, Caucasian, African-American), and extended these observations to a fourth population, Yoruba, with data obtained from the International HapMap Consortium. We mapped significant regions to annotated genes. While some regions overlap with genes previously suggested to be under positive selection, many of the genes have not been previously implicated in natural selection and offer intriguing possibilities for further study.
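Mapping significant regions to annotated genes amounts to an interval-overlap query. The sketch below is a hypothetical illustration, not the authors' pipeline: the `Interval` type, the use of closed (inclusive) coordinates, and the gene names in the example are all assumptions made for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical record for a selected region or an annotated gene."""
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def genes_in_regions(regions, genes):
    """Map each region name to the sorted names of genes that overlap it.

    Two closed intervals on the same chromosome overlap when each one
    starts no later than the other ends.
    """
    hits = {}
    for r in regions:
        hits[r.name] = sorted(
            g.name for g in genes
            if g.chrom == r.chrom and g.start <= r.end and g.end >= r.start
        )
    return hits
```

For example, a region `Interval("r1", "chr2", 100, 500)` checked against genes spanning chr2:450-900 and chr2:600-700 picks up only the first gene. A linear scan like this is adequate for a few hundred regions; a genome-wide annotation set would typically be sorted by start position or indexed with an interval tree.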
The programs for the WGLRH algorithm are freely available and can be downloaded at http://www.affymetrix.com/support/supplement/WGLRH_program.zip.