Jin Wenfei, Li Ran, Zhou Ying, Xu Shuhua
Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
Eur J Hum Genet. 2014 Jul;22(7):930-7. doi: 10.1038/ejhg.2013.265. Epub 2013 Nov 20.
The ancestral chromosomal segments in admixed genomes are important for both population history inference and admixture mapping, because they provide the basic information for tracking genetic events. However, the distributions of the lengths of ancestral chromosomal segments (LACS) under some admixture models remain poorly understood. Here we introduced a theoretical framework for the distribution of LACS under two representative admixture models: the hybrid isolation (HI) model and the gradual admixture (GA) model. Although the distribution of LACS in the GA model differs from that in the HI model, we demonstrated that the mean LACS in the HI model is approximately half of that in the GA model when the admixture proportion and admixture time are identical in the two models. By analyzing African-American and Mexican empirical data, we showed that the theoretical framework greatly facilitates the inference and understanding of population admixture history. In addition, we found that the peak of association signatures in the HI model was much narrower and sharper than that in the GA model, indicating that identifying a putative causal allele is more efficient in the HI model than in the GA model. Admixture mapping with case-only data would therefore be a reasonable and economical choice in the HI model, owing to the weak background noise. However, according to our previous studies, many populations are likely to be gradually admixed and to have relatively high background linkage disequilibrium. We therefore suggest using a case-control approach rather than a case-only approach for admixture mapping, to retain statistical power in recently admixed populations.
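The stated relationship between the two mean segment lengths can be illustrated with a small numerical sketch. The abstract does not give the closed-form distributions, so the code below rests on an assumption that is not taken from this paper: under a single-pulse (HI-like) model, it uses the commonly cited exponential approximation in which segments from a source contributing proportion `m` have mean length `1 / ((1 - m) * t)` Morgans after `t` generations. The GA value is then obtained only by applying the "approximately half" relationship reported above; it is not the authors' derivation. The function names `mean_lacs_hi` and `mean_lacs_ga_from_abstract` are hypothetical.

```python
def mean_lacs_hi(m: float, t: float) -> float:
    """Approximate mean segment length (in Morgans) under a single-pulse model.

    ASSUMPTION (not from the abstract): segment lengths for a source with
    admixture proportion m, t generations after admixture, are roughly
    exponential with rate (1 - m) * t per Morgan.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    if t <= 0:
        raise ValueError("t must be positive")
    return 1.0 / ((1.0 - m) * t)


def mean_lacs_ga_from_abstract(m: float, t: float) -> float:
    """Rough GA-model mean, using only the abstract's statement that the
    HI mean is approximately half of the GA mean for the same m and t.
    This is an illustration, not the paper's GA distribution."""
    return 2.0 * mean_lacs_hi(m, t)


if __name__ == "__main__":
    m, t = 0.2, 10  # illustrative values, not estimates from the paper
    hi_cm = 100.0 * mean_lacs_hi(m, t)               # Morgans -> centimorgans
    ga_cm = 100.0 * mean_lacs_ga_from_abstract(m, t)
    print(f"HI mean LACS: {hi_cm:.1f} cM, GA mean LACS (approx.): {ga_cm:.1f} cM")
```

Read in the other direction, the sketch suggests why the choice of model matters for dating: a given observed mean segment length would imply a different admixture time depending on whether an HI-like or GA-like history is assumed.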