Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
Mol Ecol Resour. 2019 Sep;19(5):1144-1152. doi: 10.1111/1755-0998.13019. Epub 2019 Jun 12.
Testing for deviations from Hardy-Weinberg equilibrium (HWE) is a common practice for quality control in genetic studies. Variable sites violating HWE may be identified as technical errors in the sequencing or genotyping process, or they may be of particular evolutionary interest. Large-scale genetic studies based on next-generation sequencing (NGS) methods have become more prevalent as costs decrease, but these methods are still associated with statistical uncertainty. Large-scale studies usually consist of samples from diverse ancestries, which makes some degree of population structure almost inevitable. Precautions are therefore needed when analysing these data sets, as population structure causes deviations from HWE. Here we propose a method that takes population structure into account when testing for HWE, so that other factors causing deviations from HWE can be detected. We show the effectiveness of PCAngsd in low-depth NGS data, as well as in genotype data, for both simulated and real data sets, where the use of genotype likelihoods enables us to model the uncertainty.
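To illustrate why population structure alone can produce a deviation from HWE, the following minimal sketch applies a standard one-degree-of-freedom chi-square HWE test to called genotype counts. It is not the method proposed here and does not use genotype likelihoods or PCAngsd; the function name `hwe_chi_square` and the example counts are hypothetical and chosen only for illustration. Two subpopulations that are each in HWE, but with different allele frequencies, are pooled and tested as if they were a single population.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for HWE from genotype counts at a biallelic site.

    Returns (chi2, p_value, f), where f is the inbreeding coefficient
    estimated as 1 - observed / expected heterozygosity.
    Illustrative helper only; the name and interface are assumptions.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    het_expected = 2 * n * p * q
    f = 1.0 - n_ab / het_expected if het_expected > 0 else 0.0
    return chi2, p_value, f


# Hypothetical genotype counts (AA, AB, BB) for two subpopulations of 1000
# individuals each, both exactly in HWE, with allele A at 0.1 and 0.9.
pop1 = (10, 180, 810)
pop2 = (810, 180, 10)
# Pooling the samples ignores the structure between the two subpopulations.
pooled = tuple(a + b for a, b in zip(pop1, pop2))

for label, counts in (("pop1", pop1), ("pop2", pop2), ("pooled", pooled)):
    chi2, p_value, f = hwe_chi_square(*counts)
    print(f"{label}: chi2={chi2:.2f} p={p_value:.3g} F={f:.2f}")
```

Each subpopulation passes the test on its own, whereas the pooled sample shows a strong heterozygote deficit (an inbreeding coefficient of about 0.64 with these counts) that is caused by structure alone. This is the kind of deviation that a structure-aware test is meant to account for.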