Department of Biology, Indiana University, Bloomington, IN 47405, USA.
Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA.
Genetics. 2021 Mar 3;217(1):1-10. doi: 10.1093/genetics/iyaa014.
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because these rates vary between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.
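The idea that genotypes at a biallelic locus can be miscalled in more than one way, each at its own rate, can be illustrated with a small simulation. The sketch below is not the pedigree-based estimator described in the abstract: it assumes the true genotypes are known, which is never the case in real data, and it only shows how separate rates per error type could be represented and tallied. The names `GENOTYPES`, `make_error_matrix`, `call_genotype`, and `observed_error_rates` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

# The three possible genotypes at a biallelic locus (labels are illustrative).
GENOTYPES = ("hom_ref", "het", "hom_alt")


def make_error_matrix(rates):
    """Build per-genotype calling probabilities from miscall rates.

    `rates` maps (true, called) pairs with true != called to a probability.
    Missing pairs default to 0; the probability of a correct call is set so
    that each row sums to 1.
    """
    matrix = {}
    for true in GENOTYPES:
        row = {g: rates.get((true, g), 0.0) for g in GENOTYPES if g != true}
        total = sum(row.values())
        if total > 1.0:
            raise ValueError(f"miscall rates for {true} exceed 1")
        row[true] = 1.0 - total
        matrix[true] = row
    return matrix


def call_genotype(true, matrix, rng):
    """Draw a called genotype given the true genotype and the error matrix."""
    weights = [matrix[true][g] for g in GENOTYPES]
    return rng.choices(GENOTYPES, weights=weights)[0]


def observed_error_rates(true_genotypes, called_genotypes):
    """Tally the observed rate of each miscall type.

    Each rate is the number of (true, called) mismatches divided by the
    number of sites with that true genotype. Only error types that were
    actually observed appear in the result.
    """
    totals = Counter(true_genotypes)
    errors = Counter(
        (t, c) for t, c in zip(true_genotypes, called_genotypes) if t != c
    )
    return {pair: n / totals[pair[0]] for pair, n in errors.items()}


if __name__ == "__main__":
    # Assumed, made-up rates: the two error types the abstract reports as
    # most common are given the largest values here purely for illustration.
    matrix = make_error_matrix({
        ("hom_ref", "het"): 0.02,
        ("het", "hom_ref"): 0.02,
        ("het", "hom_alt"): 0.005,
    })
    rng = random.Random(1)
    truth = [rng.choice(GENOTYPES) for _ in range(10_000)]
    calls = [call_genotype(t, matrix, rng) for t in truth]
    for pair, rate in sorted(observed_error_rates(truth, calls).items()):
        print(pair, round(rate, 4))
```

Keeping one rate per (true, called) pair, rather than a single overall error rate, is what allows asymmetries between error types to be seen at all. The method in the abstract infers such rates from inconsistent phase transmission in pedigrees instead of from known truth.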