School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
Genetics. 2012 Aug;191(4):1295-308. doi: 10.1534/genetics.112.140228. Epub 2012 May 29.
We present a new haplotype-based approach for inferring local genetic ancestry of individuals in an admixed population. Most existing approaches for local ancestry estimation ignore the latent genetic relatedness between ancestral populations and treat them as independent. In this article, we exploit such information by building an inheritance model that describes both the ancestral populations and the admixed population jointly in a unified framework. Based on an assumption that the common hypothetical founder haplotypes give rise to both the ancestral and the admixed population haplotypes, we employ an infinite hidden Markov model to characterize each ancestral population and further extend it to generate the admixed population. Through an effective utilization of the population structural information under a principled nonparametric Bayesian framework, the resulting model is significantly less sensitive to the choice and the amount of training data for ancestral populations than state-of-the-art algorithms. We also improve the robustness under deviation from common modeling assumptions by incorporating population-specific scale parameters that allow variable recombination rates in different populations. Our method is applicable to an admixed population from an arbitrary number of ancestral populations and also performs competitively in terms of spurious ancestry proportions under a general multiway admixture assumption. We validate the proposed method by simulation under various admixing scenarios and present empirical analysis results from a worldwide-distributed dataset from the Human Genome Diversity Project.
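As a rough illustration of the local-ancestry idea, the sketch below decodes the most likely ancestry label at each SNP of a single admixed haplotype. It is a deliberately simplified, finite-state hidden Markov model and not the infinite hidden Markov model with shared founder haplotypes described above. The function name `viterbi_local_ancestry`, the per-population allele frequencies, the exponential switch model, and the `scales` argument (standing in for the population-specific scale parameters on recombination) are all assumptions made for this example.

```python
import math


def safe_log(p):
    # Guard against log(0) for zero probabilities.
    return math.log(max(p, 1e-12))


def viterbi_local_ancestry(haplotype, freqs, distances, scales, mix):
    """Most likely ancestry label per SNP under a toy finite-state HMM.

    haplotype: list of 0/1 alleles, length T
    freqs:     freqs[k][t] = frequency of allele 1 at SNP t in population k
    distances: distances[t] = genetic distance between SNP t and SNP t+1
    scales:    scales[k] = assumed population-specific recombination scale
    mix:       mix[k] = assumed admixture proportion of population k
    """
    K, T = len(freqs), len(haplotype)

    def log_emit(k, t):
        # Bernoulli emission from population k's allele frequency.
        p = freqs[k][t] if haplotype[t] == 1 else 1.0 - freqs[k][t]
        return safe_log(p)

    def log_trans(i, j, d):
        # With probability exp(-scale_i * d) no recombination occurs and the
        # ancestry is kept; otherwise a new ancestry is drawn from mix.
        stay = math.exp(-scales[i] * d)
        p = (1.0 - stay) * mix[j] + (stay if i == j else 0.0)
        return safe_log(p)

    # Viterbi recursion in log space.
    score = [safe_log(mix[k]) + log_emit(k, 0) for k in range(K)]
    back = []
    for t in range(1, T):
        d = distances[t - 1]
        new_score, pointers = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + log_trans(i, j, d))
            new_score.append(score[best_i] + log_trans(best_i, j, d) + log_emit(j, t))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(K), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Made-up toy data: two populations with opposite allele frequencies.
    freqs = [[0.9] * 6, [0.1] * 6]
    haplotype = [1, 1, 1, 0, 0, 0]
    print(viterbi_local_ancestry(haplotype, freqs, [0.1] * 5, [1.0, 1.0], [0.5, 0.5]))
```

Raising `scales[k]` for one population makes switches away from that ancestry more likely over the same genetic distance, which is one simple way to picture how population-specific scale parameters could allow recombination rates to vary between populations.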