BGI-Shenzhen, Shenzhen, China.
PLoS One. 2012;7(7):e37558. doi: 10.1371/journal.pone.0037558. Epub 2012 Jul 24.
We present a statistical framework for estimating and applying sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method to estimate the sample allele frequency at a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to other settings, including deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
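The dynamic programming step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diploid individuals, a biallelic site, and per-individual genotype likelihoods (for 0, 1, or 2 copies of the derived allele) that are already available. The function names `site_allele_count_likelihoods` and `posterior_allele_count`, and the use of a spectrum as the prior in the Bayesian step, are assumptions made for illustration.

```python
from math import comb


def site_allele_count_likelihoods(genotype_likelihoods):
    """Likelihood of the data at one site for each sample allele count.

    genotype_likelihoods: one (L0, L1, L2) tuple per diploid individual,
    where Lg is P(reads of that individual | genotype has g derived alleles).
    Returns a list of length 2n + 1 whose k-th entry is
    P(data | k derived alleles among the 2n sampled chromosomes).
    This is an illustrative sketch, not the published code.
    """
    h = [1.0]  # no individuals yet: zero derived alleles with weight 1
    for l0, l1, l2 in genotype_likelihoods:
        new = [0.0] * (len(h) + 2)
        for k, value in enumerate(h):
            new[k] += l0 * value            # individual carries 0 copies
            new[k + 1] += 2.0 * l1 * value  # heterozygote: two orderings
            new[k + 2] += l2 * value        # individual carries 2 copies
        h = new
    chromosomes = len(h) - 1
    # Divide by the number of ways to place k derived alleles on 2n chromosomes.
    return [h[k] / comb(chromosomes, k) for k in range(chromosomes + 1)]


def posterior_allele_count(site_likelihoods, prior):
    """Posterior over the sample allele count at one site.

    prior: assumed here to be an allele frequency spectrum (probabilities
    for counts 0..2n), for example one estimated by maximum likelihood.
    """
    weighted = [lik * p for lik, p in zip(site_likelihoods, prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Hypothetical genotype likelihoods for three individuals at one site.
    gls = [(0.90, 0.09, 0.01), (0.10, 0.80, 0.10), (0.70, 0.25, 0.05)]
    lik = site_allele_count_likelihoods(gls)
    flat_prior = [1.0 / len(lik)] * len(lik)  # placeholder prior, not an estimate
    print(posterior_allele_count(lik, flat_prior))
```

The recursion adds one individual at a time, so the cost grows quadratically with the number of individuals instead of enumerating all 3^n genotype configurations. For many individuals the products become very small, so a practical version would likely rescale or work in log space.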