Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
BMC Genet. 2012 Jun 26;13:49. doi: 10.1186/1471-2156-13-49.
Populations of the Arabian Peninsula have a complex genetic structure that reflects successive waves of migration, including the earliest human migrations from Africa and eastern Asia, migrations along ancient civilization trading routes, and the colonization history of recent centuries.
Here, we present a study of genome-wide admixture in this region, using 156 genotyped individuals from Qatar, a country located at the crossroads of these migration patterns. Because the haplotypes of these individuals could have originated from many different populations across the world, we developed a machine learning method, "SupportMix", to infer locus-specific genomic ancestry while simultaneously analyzing many possible ancestral populations. Simulations show that SupportMix is not only more accurate than other popular admixture discovery tools but is also the first admixture inference method that scales efficiently to the simultaneous analysis of 50-100 putative ancestral populations without requiring prior demographic information.
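To illustrate what "locus-specific" ancestry inference means, the sketch below splits a phased 0/1 haplotype into fixed-size windows and labels each window with the candidate population under which it is most likely, given per-SNP reference allele frequencies. This is a deliberately simplified stand-in, not SupportMix: it swaps in a naive per-window likelihood classifier for the paper's machine learning model, and the function names, the window size, and the toy allele frequencies are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def window_log_likelihood(haplotype: Sequence[int], freqs: Sequence[float]) -> float:
    """Log-probability of a 0/1 haplotype window given per-SNP alt-allele frequencies.

    Assumes SNPs are independent within the window (a simplifying assumption).
    """
    eps = 1e-6  # guard against log(0) when a reference allele is fixed
    total = 0.0
    for allele, p in zip(haplotype, freqs):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p if allele == 1 else 1.0 - p)
    return total


def assign_local_ancestry(
    haplotype: Sequence[int],
    reference_freqs: Dict[str, Sequence[float]],
    window_size: int,
) -> List[str]:
    """Hypothetical helper: label each window with its most likely reference population."""
    labels = []
    for start in range(0, len(haplotype), window_size):
        window = haplotype[start:start + window_size]
        best = max(
            reference_freqs,
            key=lambda pop: window_log_likelihood(
                window, reference_freqs[pop][start:start + window_size]
            ),
        )
        labels.append(best)
    return labels


# Toy example with two hypothetical reference populations, "A" and "B".
freqs = {"A": [0.9] * 4 + [0.1] * 4, "B": [0.1] * 4 + [0.9] * 4}
print(assign_local_ancestry([1] * 8, freqs, window_size=4))  # ['A', 'B']
```

Because each window is scored independently against every candidate, the cost grows only linearly with the number of reference populations, which gives some intuition for why a window-wise classifier can be run against dozens of panels at once; a real method would also need to handle unphased data, correlated SNPs, and smoothing across window boundaries.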
By simultaneously using the 55 world populations from the Human Genome Diversity Panel, SupportMix was able to extract the fine-scale ancestry of the Qatar population, yielding many new observations about the ancestry of the region. For example, in addition to recapitulating the three major sub-populations in Qatar, composed mainly of Arabic, Persian, and African ancestry, SupportMix traces the ancestry of the Persian group to populations sampled in Greater Persia rather than to China, and the ancestry of the African group to sub-Saharan origin rather than to Southern African Bantu origin, as previously thought.