Liu Nianjun, Zhang Dabao, Zhao Hongyu
Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Ala. 35294, USA.
Hum Hered. 2009;67(3):154-62. doi: 10.1159/000181153. Epub 2008 Dec 15.
Identifying genotyping errors is an important issue in genetic research, yet it has received relatively little study in samples of unrelated individuals. In this article, we take several genotyping error models that were originally proposed for pedigree data and apply them to unrelated population samples with single nucleotide polymorphism (SNP) genotype data. We investigate the mathematical constraints needed to detect genotyping errors without resampling replicates or genotyping relatives.
For the various proposed genotyping error models, we unveil the conditions under which the parameters are identifiable. These results are verified through applications to simulated and real SNP data.
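As a rough illustration of why identifiability needs constraints, consider a counting argument for a biallelic SNP. This is a sketch under stated assumptions, not the formulation used in the article: the symbols $p$ (allele frequency), $\theta$ (error parameters), and $\pi_j$ (observed genotype probabilities) are introduced here for illustration, and true genotypes are assumed to follow Hardy-Weinberg proportions.

```latex
% Assumed notation (not from the article):
%   g_i(p)        probability of true genotype i, i = 0, 1, 2 copies of the allele
%   e_{ij}(theta) probability of observing genotype j when the true genotype is i
%   pi_j          probability of observing genotype j
\begin{align*}
  g_0(p) &= (1-p)^2, \qquad g_1(p) = 2p(1-p), \qquad g_2(p) = p^2, \\
  \pi_j(p,\theta) &= \sum_{i=0}^{2} g_i(p)\, e_{ij}(\theta), \qquad j = 0, 1, 2, \\
  \pi_0 + \pi_1 + \pi_2 &= 1 .
\end{align*}
```

Unrelated individuals contribute only the three genotype counts, which are multinomial with two free proportions. The map $(p, \theta) \mapsto (\pi_0, \pi_1)$ therefore has to be one-to-one for the parameters to be identifiable, which in general allows at most two free parameters, for example one allele frequency and a single error rate.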
We show that, with constraints, two particular models yield both an identifiable error rate and identifiable allele frequencies for an SNP in unrelated population data. The simulation study shows that these two models give unbiased estimates of the allele frequencies. One of the models also gives an unbiased estimate of the genotyping error rate.
While the Hardy-Weinberg equilibrium test can be used to detect genotyping errors, a key advantage of these models is the explicit estimates of genotyping error rates and allele frequencies. This work may help researchers to estimate error rates and to use the estimates in their analysis to increase power and decrease bias, without the extra work of genotyping family members or replicates.
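For comparison, the Hardy-Weinberg equilibrium test mentioned above can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts. This is a minimal illustration, not code from the article; the function name `hwe_chi_square` and its arguments are hypothetical. Note that it returns only a test statistic and p-value, with no estimate of an error rate.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Hypothetical helper for illustration. Returns (p, stat, p_value), where
    p is the estimated frequency of allele A.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency estimated by counting alleles.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        raise ValueError("SNP is monomorphic; the test is undefined")

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p, stat, p_value
```

A small p-value flags a departure from Hardy-Weinberg proportions, which may reflect genotyping error but can also have other causes such as population structure.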