Bayesian haplotype inference via the Dirichlet process.

Authors

Xing Eric P, Jordan Michael I, Sharan Roded

Affiliation

School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.

Publication

J Comput Biol. 2007 Apr;14(3):267-84. doi: 10.1089/cmb.2006.0102.

DOI:10.1089/cmb.2006.0102

PMID:17563311

Abstract

The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference in our model. The overall result is a flexible Bayesian method, referred to as DP-Haplotyper, that is reminiscent of parsimony methods in its preference for small haplotype pools. We further generalize the model to treat pedigree relationships (e.g., trios) between the population's genotypes. We apply DP-Haplotyper to the analysis of both simulated and real genotype data, and compare to extant methods.
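
To make the idea of a mixture with an unknown number of components concrete, the following is a minimal sketch of the Chinese restaurant process, a standard way to draw cluster assignments under a Dirichlet process prior. It is not the DP-Haplotyper algorithm and does not reproduce the paper's likelihood or its MCMC sampler. The function names `crp_assignments` and `genotype`, the concentration parameter `alpha`, and the 0/1 allele coding are all assumptions introduced here for illustration.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Illustrative only: alpha (> 0) is a hypothetical concentration parameter.
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster (a new haplotype in the pool)
        else:
            counts[k] += 1        # reuse an existing cluster
        labels.append(k)
    return labels

def genotype(h1, h2):
    """Unphased genotype of two haplotypes coded 0/1: allele count per SNP."""
    return [a + b for a, b in zip(h1, h2)]

# 20 individuals carry 40 chromosomes; each chromosome is assigned to a pool entry.
labels = crp_assignments(40, alpha=1.0)
pool_size = len(set(labels))  # not fixed in advance; it is determined by the draw
```

In this sketch, chromosomes tend to reuse haplotypes that are already common, so a small `alpha` favors a small pool, which loosely mirrors the preference for small haplotype pools described in the abstract. The `genotype` helper shows why phase is ambiguous: `genotype([0, 1], [1, 0])` and `genotype([0, 0], [1, 1])` both give `[1, 1]`.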
