Department of Biological Sciences, Dartmouth College, Hanover, NH, USA.
Mol Biol Evol. 2010 Jul;27(7):1673-85. doi: 10.1093/molbev/msq053. Epub 2010 Feb 25.
Recently, hidden Markov models have been applied to numerous problems in genomics. Here, we introduce an explicit population genetics hidden Markov model (popGenHMM) that uses single nucleotide polymorphism (SNP) frequency data to identify genomic regions that have experienced recent selection. Our popGenHMM assumes that SNP frequencies are emitted independently following diffusion approximation expectations but that neighboring SNP frequencies are partially correlated by selective state. We give results from the training and application of our popGenHMM to a set of early release data from the Drosophila Population Genomics Project (dpgp.org) that consists of approximately 7.8 Mb of resequencing from 32 North American Drosophila melanogaster lines. These results demonstrate the potential utility of our model, making predictions based on the site frequency spectrum (SFS) for regions of the genome that represent selected elements.
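The structure described above (hidden selective states, SNP frequencies emitted independently given the state) can be illustrated with a minimal two-state sketch. This is not the authors' implementation. The two states (neutral and selected), the scaled selection coefficient `gamma = -5`, the transition and start probabilities, the example SNP counts, and the function names `expected_sfs` and `viterbi` are all assumptions made for illustration. Emission probabilities come from a standard diffusion-approximation form of the expected site frequency spectrum, integrated numerically, and the most likely state path is decoded with the Viterbi algorithm. Parameter training is not shown.

```python
import math


def expected_sfs(n, gamma, grid=2000):
    """Normalized probabilities of derived-allele counts 1..n-1 among n sampled
    chromosomes, for scaled selection coefficient gamma (gamma = 0 is neutral).

    Assumed form: stationary diffusion density of allele frequency x,
    binomially sampled, integrated with the midpoint rule.
    """
    weights = []
    for i in range(1, n):
        total = 0.0
        for k in range(grid):
            x = (k + 0.5) / grid
            if abs(gamma) < 1e-8:
                density = 1.0 / x  # neutral limit of the expression below
            else:
                density = (1.0 - math.exp(-2.0 * gamma * (1.0 - x))) / (
                    (1.0 - math.exp(-2.0 * gamma)) * x * (1.0 - x)
                )
            total += density * math.comb(n, i) * x**i * (1.0 - x) ** (n - i)
        weights.append(total / grid)
    norm = sum(weights)
    return [w / norm for w in weights]


def viterbi(obs, emissions, trans, start):
    """Most likely hidden-state path for a sequence of derived-allele counts.

    obs: counts in 1..n-1; emissions[s][c - 1] is P(count c | state s).
    """
    n_states = len(start)
    score = [
        math.log(start[s]) + math.log(emissions[s][obs[0] - 1])
        for s in range(n_states)
    ]
    back = []
    for count in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(
                range(n_states), key=lambda p: score[p] + math.log(trans[p][s])
            )
            new_score.append(
                score[best_prev]
                + math.log(trans[best_prev][s])
                + math.log(emissions[s][count - 1])
            )
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


N_LINES = 32  # sample size, matching the 32 lines mentioned in the abstract
neutral = expected_sfs(N_LINES, 0.0)
selected = expected_sfs(N_LINES, -5.0)  # hypothetical selection strength
emissions = [neutral, selected]  # state 0 = neutral, state 1 = selected
trans = [[0.95, 0.05], [0.05, 0.95]]  # hypothetical "sticky" transitions
start = [0.5, 0.5]

# Hypothetical derived-allele counts at consecutive SNPs along a region.
snp_counts = [12, 20, 7, 15, 1, 1, 2, 1, 1, 1, 1, 2, 18, 9, 25]
path = viterbi(snp_counts, emissions, trans, start)
print(path)
```

In this sketch, neighboring SNPs are correlated only through the hidden state path: given the state, each count is drawn independently from that state's frequency spectrum, which mirrors the assumption stated in the abstract.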