Department of Statistics, Stanford University, Stanford, California, United States of America.
PLoS Genet. 2011 Dec;7(12):e1002410. doi: 10.1371/journal.pgen.1002410. Epub 2011 Dec 15.
For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study "virtual genomes" of admixed individuals. We apply this approach to a cohort of 492 parent-offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations (Africa, Europe, and America) vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10-15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci.
The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease-related phenotypes and will allow new insight into the adaptive and demographic history of indigenous people.
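To illustrate why ancestry block lengths carry information about admixture timing, here is a minimal sketch under a textbook single-pulse model. It is an assumption for illustration, not the authors' procedure. Recombination breakpoints accumulate at roughly T per Morgan over T generations, and each new tract draws its ancestry independently. Under those assumptions, tracts of an ancestry with genome-wide proportion m are approximately exponential with rate T(1 − m), so T ≈ 1 / (mean tract length in Morgans × (1 − m)). The function names and the simulated values are hypothetical.

```python
import random
from statistics import mean


def estimate_admixture_time(tract_lengths_cm, ancestry_proportion):
    """Estimate generations since a single admixture pulse (hypothetical helper).

    Assumes tracts of one ancestry, with genome-wide proportion m, are
    exponential with rate T * (1 - m) per Morgan. Ignores chromosome-end
    truncation, multiple pulses, and errors in calling ancestry segments.
    """
    if not tract_lengths_cm:
        raise ValueError("need at least one tract length")
    if not 0.0 <= ancestry_proportion < 1.0:
        raise ValueError("ancestry_proportion must be in [0, 1)")
    mean_morgans = mean(tract_lengths_cm) / 100.0  # centimorgans -> Morgans
    rate = 1.0 / mean_morgans  # maximum-likelihood estimate of the exponential rate
    return rate / (1.0 - ancestry_proportion)  # divide out (1 - m) to recover T


def simulate_tracts(generations, ancestry_proportion, n, seed=0):
    """Draw hypothetical tract lengths (cM) from the same exponential model."""
    rng = random.Random(seed)
    rate_per_morgan = generations * (1.0 - ancestry_proportion)
    return [100.0 * rng.expovariate(rate_per_morgan) for _ in range(n)]


if __name__ == "__main__":
    # Illustrative simulated values only, not data from the study.
    tracts = simulate_tracts(generations=12, ancestry_proportion=0.5, n=20000)
    print(round(estimate_admixture_time(tracts, 0.5), 1))
```

Real analyses usually compare the full block-length distribution against simulations or fit a likelihood that accounts for chromosome ends. This closed-form mean-based estimate is only meant to show the direction of the relationship: older admixture produces shorter blocks.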