Yin Junming, Jordan Michael I, Song Yun S
Computer Science Division and Department of Statistics, University of California, Berkeley, CA, USA.
Bioinformatics. 2009 Jun 15;25(12):i231-9. doi: 10.1093/bioinformatics/btp229.
Two known types of meiotic recombination are crossovers and gene conversions. Although they leave behind different footprints in the genome, it is a challenging task to tease apart their relative contributions to the observed genetic variation. In particular, for a given population SNP dataset, the joint estimation of the crossover rate, the gene conversion rate and the mean conversion tract length is widely viewed as a very difficult problem.
In this article, we devise a likelihood-based method using an interleaved hidden Markov model (HMM) that can jointly estimate the aforementioned three parameters fundamental to recombination. Our method significantly improves upon a recently proposed method based on a factorial HMM. We show that modeling overlapping gene conversions is crucial for improving the joint estimation of the gene conversion rate and the mean conversion tract length. We test the performance of our method on simulated data. We then apply our method to analyze real biological data from the telomere of the X chromosome of Drosophila melanogaster, and show that the ratio of the gene conversion rate to the crossover rate for the region may not be nearly as high as previously claimed.
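To make the idea of likelihood-based estimation with an HMM concrete, here is a minimal sketch. It is not the interleaved HMM of this article. It is a generic two-state discrete HMM evaluated with the scaled forward algorithm, followed by a coarse grid search over a single hypothetical switching probability. All names (`hmm_log_likelihood`, `toy_model`, `switch_prob`), the emission probabilities and the observation sequence are assumptions chosen for illustration only.

```python
import math


def hmm_log_likelihood(obs, init, trans, emit):
    """Return log P(obs) for a discrete HMM via the scaled forward algorithm."""
    n_states = len(init)
    # Unnormalised forward probabilities for the first observation.
    alpha = [init[k] * emit[k][obs[0]] for k in range(n_states)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    for symbol in obs[1:]:
        # Propagate through the transition matrix, then weight by the emission.
        alpha = [
            emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def toy_model(switch_prob):
    """Hypothetical two-state model; switch_prob is the parameter being estimated."""
    init = [0.5, 0.5]
    trans = [[1 - switch_prob, switch_prob], [switch_prob, 1 - switch_prob]]
    emit = [[0.9, 0.1], [0.1, 0.9]]  # assumed emission probabilities
    return init, trans, emit


# Made-up binary observations (illustrative only, not real SNP data).
obs = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]

# Coarse grid search for the switching probability with the highest likelihood.
grid = [0.01, 0.05, 0.1, 0.2, 0.4]
best = max(grid, key=lambda p: hmm_log_likelihood(obs, *toy_model(p)))
print("grid maximum-likelihood switch probability:", best)
```

Joint estimation of several parameters follows the same pattern, with the likelihood maximised over a multi-dimensional parameter space instead of a one-dimensional grid.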
A software implementation of the algorithms discussed in this article is available at http://www.cs.berkeley.edu/~yss/software.html.